Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-5)
ACL HLT 2011 / SIGMT / SIGLEX Workshop
23 June 2011, Portland, Oregon, USA
The Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-5) seeks to build on the foundations established in the first four SSST workshops, which brought together a large number of researchers working on diverse aspects of structure and representation in relation to statistical machine translation. Its program each year has comprised high-quality papers discussing current work spanning topics including: new grammatical models of translation; new learning methods for syntax-based models; formal properties of synchronous/transduction grammars (hereafter S/TGs); discriminative training of models incorporating linguistic features; using S/TGs for semantics and generation; and syntax- and semantics-based evaluation of machine translation.
The need for structural mappings between languages is widely recognized in the fields of statistical machine translation and spoken language translation, and there is a growing consensus that these mappings are appropriately represented using a family of formalisms that includes synchronous/transduction grammars and their tree-transducer equivalents. To date, flat-structured models, such as the word-based IBM models of the early 1990s or the more recent phrase-based models, remain widely used. But tree-structured mappings arguably offer a much greater potential for learning valid generalizations about relationships between languages.
Within this area of research there is a rich diversity of approaches. There is active research ranging from formal properties of S/TGs to large-scale end-to-end systems. There are approaches that make heavy use of linguistic theory, and approaches that use little or none. There is theoretical work characterizing the expressiveness and complexity of particular formalisms, as well as empirical work assessing their modeling accuracy and descriptive adequacy across various language pairs. There is work being done to invent better translation models, and work to design better algorithms. Recent years have seen significant progress on all these fronts. In particular, systems based on these formalisms are now top contenders in MT evaluations.
At the same time, SMT has seen a movement toward semantics over the past five years, which has been reflected at recent SSST workshops. The issues of deep syntax and shallow semantics are closely linked. Semantic SMT research now includes semantic role labeling (SRL) for MT evaluation, SRL for SMT, and word sense disambiguation (WSD) for SMT.
In order to emphasize structure and representation at semantic and not only syntactic levels, “Semantics” has been explicitly added to the name of this year's Workshop (the acronym remains SSST), and is a special workshop theme. Special sessions will be devoted to the Semantics theme.
We invite papers on:
- syntax-based / semantics-based / tree-structured SMT
- machine learning techniques for inducing structured translation models
- algorithms for training, decoding, and scoring with semantic representation structure
- empirical studies on adequacy and efficiency of formalisms
- creation and usefulness of syntactic/semantic resources for MT
- formal properties of synchronous/transduction grammars
- learning semantic information from monolingual, parallel or comparable corpora
- unsupervised and semi-supervised word sense induction and disambiguation methods for MT
- lexical substitution, word sense induction and disambiguation, semantic role labeling, textual entailment, paraphrase and other semantic tasks for MT
- semantic features for MT models (word alignment, translation lexicons, language models, etc.)
- evaluation of syntactic/semantic components within MT (task-based evaluation)
- scalability of structured translation methods to small or large data
- applications of S/TGs to related areas including:
  - speech translation
  - formal semantics and semantic parsing
  - paraphrases and textual entailment
  - information retrieval and extraction
- syntactically- and semantically-motivated evaluation of MT
For more information: http://www.cs.ust.hk/~dekai/ssst/
Special Theme: Semantics in SMT
The need for semantic modeling in MT is becoming increasingly obvious in the MT community: even as BLEU scores steadily improve, crucial errors of meaning still hurt the quality of current SMT systems. At the same time, there is renewed interest in the semantics community in designing models that are directly relevant to NLP applications. However, semantic models designed for standalone tasks do not easily fit in current MT architectures. With this year's special theme, we seek to bridge this gap by bringing together researchers working on semantics and on translation, in order to encourage cross-pollination of ideas and to share insights into the needs of MT and what current developments in semantics have to offer.
We particularly encourage the submission of papers addressing the following issues:
- Learning and using semantic representations for MT. This is currently a very active topic in lexical semantics, and many relevant tasks were defined for the last edition of SemEval. There is work on unsupervised sense induction in both monolingual and cross-lingual settings (e.g., Apidianaki (2009), Manandhar et al. (2010)). Cross-lingual sense disambiguation (Lefever and Hoste, 2010) and lexical substitution tasks (Mihalcea et al., 2010) can be cast as SMT lexical choice (e.g., Aziz and Specia (2010)) and exploit resources similar to those used by SMT systems. However, it remains to be seen how models developed in this context scale up for use on unrestricted text and whether they are directly exploitable in end-to-end MT systems.
- Integration of semantic models in MT. What semantic representations and integration strategies are needed for specific MT problems and architectures? Deeper understanding of these issues is much needed, given the variety of promising results that have emerged over the past three years: WSD models have been successfully repurposed for SMT lexical choice (e.g., Carpuat and Wu (2007), Chan et al. (2007), Stroppa et al. (2007), Gimenez and Màrquez (2008)); bilingual SRL can now improve SMT through reordering (Wu and Fung, 2009); and various monolingual semantic models have been targeted to specific problems, such as translating unknown words and low-resource languages (e.g., Specia et al., 2008; Marton et al., 2009; Mirkin et al., 2009; Baker et al., 2010; Pal et al., 2010).
- Semantics-driven evaluation of MT. Ongoing work suggests that MT evaluation is improved by generalizing across similar word meanings (e.g., Zhou et al. (2006), Apidianaki and He (2010), Snover et al. (2009), Denkowski and Lavie (2010)), and by explicitly modeling preservation of meaning with textual entailment (Padó et al., 2009) or semantic frames (Lo and Wu, 2010a). What frameworks are best suited to measure MT quality in general, and the impact of semantic modeling in particular?
Program
Session 1
09:00 | Opening Remarks
09:15 | Automatic Projection of Semantic Structures: an Application to Pairwise Translation Ranking | Daniele Pighin and Lluís Màrquez
09:40 | Structured vs. Flat Semantic Role Representations for Machine Translation Evaluation | Chi-kiu Lo and Dekai Wu
10:05 | Semantic Mapping Using Automatic Word Alignment and Semantic Role Labeling | Shumin Wu and Martha Palmer
10:30 | Coffee Break / Poster Session
- Incorporating Source-Language Paraphrases into Phrase-Based SMT with Confusion Networks | Jie Jiang, Jinhua Du and Andy Way
- Multi-Word Unit Dependency Forest-based Translation Rule Extraction | Hwidong Na and Jong-Hyeok Lee
- An Evaluation and Possible Improvement Path for Current SMT Behavior on Ambiguous Nouns | Els Lefever and Véronique Hoste
- Improving Reordering for Statistical Machine Translation with Smoothed Priors and Syntactic Features | Bing Xiang, Niyu Ge and Abraham Ittycheriah
Session 2
11:00 | Reestimation of Reified Rules in Semiring Parsing and Biparsing | Markus Saers and Dekai Wu
11:25 | A Dependency Based Statistical Translation Model | Giuseppe Attardi, Atanas Chanev and Antonio Valerio Miceli Barone
11:50 | Improving MT Word Alignment Using Aligned Multi-Stage Parses | Adam Meyers, Michiko Kosaka, Shasha Liao and Nianwen Xue
12:15 | Lunch
Session 3
13:50 | Automatic Category Label Coarsening for Syntax-Based Machine Translation | Greg Hanneman and Alon Lavie
14:15 | Utilizing Target-Side Semantic Role Labels to Assist Hierarchical Phrase-based Machine Translation | Qin Gao and Stephan Vogel
14:40 | Combining statistical and semantic approaches to the translation of ontologies and taxonomies | John McCrae, Mauricio Espinoza, Elena Montiel-Ponsoda, Guadalupe Aguado-de-Cea and Philipp Cimiano
15:05 | A Semantic Feature for Statistical Machine Translation | Rafael E. Banchs and Marta R. Costa-jussa
15:30 | Coffee Break / Poster Session
Session 4
16:00 | A General-Purpose Rule Extractor for SCFG-Based Machine Translation | Greg Hanneman, Michelle Burroughs and Alon Lavie
16:25 | Panel Discussion
Organizers
- Dekai WU, Hong Kong University of Science and Technology (HKUST), Hong Kong
Co-chairs for special theme on Semantics in SMT
- Marianna APIDIANAKI, Alpage, INRIA and University Paris 7, France
- Marine CARPUAT, National Research Council (NRC), Canada
- Lucia SPECIA, University of Wolverhampton, UK
Program Committee
- Eneko AGIRRE, University of the Basque Country, Spain
- Colin CHERRY, National Research Council (NRC), Canada
- Marc DYMETMAN, Xerox Research Center Europe, France
- Hieu HOANG, University of Edinburgh, UK
- Philipp KOEHN, University of Edinburgh, UK
- Philippe LANGLAIS, University of Montreal, Canada
- Aurélien MAX, Université Paris Sud 11, France
- Diana McCARTHY, Lexical Computing, UK
- Sudip Kumar NASKAR, Dublin City University, Ireland
- Roberto NAVIGLI, University of Rome "La Sapienza", Italy
- Hwee Tou NG, National University of Singapore, Singapore
- Sebastian PADÓ, Universität Heidelberg, Germany
- Martha PALMER, University of Colorado, USA
- Ted PEDERSEN, University of Minnesota, USA
- Markus SAERS, Hong Kong University of Science and Technology (HKUST), Hong Kong
- Matthew SNOVER, City University of New York, USA
- Nicolas STROPPA, Google, Switzerland
- François YVON, Université Paris Sud 11, France
Important Dates
Submission deadline: 11 Apr 2011
Notification to authors: 30 Apr 2011
Camera copy deadline: 7 May 2011
Submission
Papers will be accepted on or before 11 Apr 2011 in PDF or PostScript format via the START system at https://www.softconf.com/acl2011/ssst/. Submissions should follow the ACL HLT 2011 length and formatting requirements for long papers (eight (8) pages of content plus two (2) additional pages of references), found at http://www.acl2011.org/call.shtml.
Camera Copy
Camera-ready final versions will be accepted on or before 7 May 2011 in PDF or PostScript format via the START system at https://www.softconf.com/acl2011/ssst/. Papers should follow the ACL HLT 2011 camera-ready length and formatting requirements for long papers (nine (9) pages of content plus unlimited additional pages of references), found at http://www.acl2011.org/authors.shtml.
Contact
Please send inquiries to ssst@cs.ust.hk.
Last updated: 2011.05.24