Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-6)
ACL 2012 / SIGMT / SIGLEX Workshop
12 July 2012, Jeju, Korea
*** Slides for the papers below are now available ***
*** Special theme: Semantic MT Evaluation ***
The Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-6) seeks to build on the foundations established in the first five SSST workshops, which brought together a large number of researchers working on diverse aspects of structure, semantics and representation in relation to statistical machine translation. Its program each year has comprised high-quality papers discussing current work spanning topics including: new grammatical models of translation; new learning methods for syntax- and semantics-based models; formal properties of synchronous/transduction grammars (hereafter S/TGs); discriminative training of models incorporating linguistic features; using S/TGs for semantics and generation; and syntax- and semantics-based evaluation of machine translation.
The need for structural mappings between languages is widely recognized in the fields of statistical machine translation and spoken language translation, and there is a growing consensus that these mappings are appropriately represented using a family of formalisms that includes synchronous/transduction grammars and their tree-transducer equivalents. To date, flat-structured models, such as the word-based IBM models of the early 1990s or the more recent phrase-based models, remain widely used. But tree-structured mappings arguably offer a much greater potential for learning valid generalizations about relationships between languages.
Within this area of research there is a rich diversity of approaches. There is active research ranging from formal properties of S/TGs to large-scale end-to-end systems. There are approaches that make heavy use of linguistic theory, and approaches that use little or none. There is theoretical work characterizing the expressiveness and complexity of particular formalisms, as well as empirical work assessing their modeling accuracy and descriptive adequacy across various language pairs. There is work being done to invent better translation models, and work to design better algorithms. Recent years have seen significant progress on all these fronts. In particular, systems based on these formalisms are now top contenders in MT evaluations.
At the same time, SMT has seen a movement toward semantics over the past few years. This trend has been reflected at recent SSST workshops, including last year's edition, whose special theme was semantics for SMT. The issues of deep syntax and shallow semantics are closely linked, and SSST-6 encourages submissions on semantics for MT in a number of directions, including semantic role labeling (SRL) for SMT, word sense disambiguation (WSD) for SMT and, in particular, semantics for MT evaluation. To emphasize the need to evaluate MT in a way that properly assesses the preservation of structure and semantics, SSST-6 is highlighting Semantic MT Evaluation as a special workshop theme.
We invite papers on:
- syntactically- and semantically-motivated evaluation of MT
- syntax-based / semantics-based / tree-structured SMT
- machine learning techniques for inducing structured translation models
- algorithms for training, decoding, and scoring with semantic representation structure
- empirical studies on adequacy and efficiency of formalisms
- creation and usefulness of syntactic/semantic resources for MT
- formal properties of synchronous/transduction grammars
- learning semantic information from monolingual, parallel or comparable corpora
- unsupervised and semi-supervised word sense induction and disambiguation methods for MT
- lexical substitution, word sense induction and disambiguation, semantic role labeling, textual entailment, paraphrase and other semantic tasks for MT
- semantic features for MT models (word alignment, translation lexicons, language models, etc.)
- evaluation of syntactic/semantic components within MT (task-based evaluation)
- scalability of structured translation methods to small or large data
- applications of S/TGs to related areas including:
  - speech translation
  - formal semantics and semantic parsing
  - paraphrases and textual entailment
  - information retrieval and extraction
  - syntactically- and semantically-motivated evaluation of MT
For more information: http://www.cs.ust.hk/~dekai/ssst/
Special Theme: Semantic MT Evaluation
Ongoing work suggests that MT evaluation is improved by generalizing across similar word meanings (Zhou et al., 2006; Apidianaki et al., 2009; Snover et al., 2009; Denkowski and Lavie, 2010) and by explicitly modeling the preservation of meaning with textual entailment (Padó et al., 2009) or semantic frames (Lo and Wu, 2011). However, crucial questions remain unanswered, such as which frameworks are best suited to measuring MT quality in general and what impact semantic modeling has on MT evaluation. With this year's special theme, we seek to bring together researchers working on semantics and on translation evaluation in order to encourage cross-pollination of ideas and to share insights into the needs of MT evaluation and into what current developments in semantics have to offer. We particularly encourage the submission of papers addressing the following issues related to semantics-driven evaluation of MT:
- MT evaluation metrics generalizing across similar word meanings
- MT evaluation metrics explicitly modeling preservation of meaning via textual entailment, semantic frames, etc.
- New frameworks to measure MT quality using semantic information, including machine learning approaches
- Evaluation of the impact of semantic modeling on MT evaluation
- Use of semantic information for quality/confidence estimation (MT evaluation without reference translations)
Program
08:45 | Opening Remarks |
Proceedings of SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation. Marine Carpuat, Lucia Specia and Dekai Wu (editors) | |
Session 1: Source language modeling | |
09:00 | WSD for n-best reranking and local language modeling in SMT [slides] Marianna Apidianaki, Guillaume Wisniewski, Artem Sokolov, Aurélien Max and François Yvon |
09:30 | Linguistically-Enriched Models for Bulgarian-to-English Machine Translation [slides] Rui Wang, Petya Osenova and Kiril Simov |
10:00 | Enriching Parallel Corpora for Statistical Machine Translation with Semantic Negation Rephrasing [slides] Dominikus Wetzel and Francis Bond |
10:30 | Coffee Break |
Session 2: MT output evaluation and processing | |
11:00 | Towards a Predicate-Argument Evaluation for MT [slides] Ondřej Bojar and Dekai Wu |
11:30 | Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors [slides] Rudolf Rosa, Ondřej Dušek, David Mareček and Martin Popel |
12:00 | Unsupervised vs. supervised weight estimation for semantic MT evaluation metrics [slides] Chi-kiu Lo and Dekai Wu |
12:30 | Lunch |
Session 3: Semantic dependencies | |
14:00 | Head Finalization Reordering for Chinese-to-Japanese Machine Translation [slides] Dan Han, Katsuhito Sudoh, Xianchao Wu, Kevin Duh, Hajime Tsukada and Masaaki Nagata |
14:30 | Extracting Semantic Transfer Rules from Parallel Corpora with SMT Phrase Aligners [slides] Petter Haugereid and Francis Bond |
15:00 | Towards Probabilistic Acceptors and Transducers for Feature Structures [slides] Daniel Quernheim and Kevin Knight |
15:30 | Coffee Break / Poster Session |
Using Domain-specific and Collaborative Resources for Term Translation [slides] Mihael Arcan, Christian Federmann and Paul Buitelaar | |
Improving Statistical Machine Translation through co-joining parts of verbal constructs in English-Hindi translation [no slides] Karunesh Kumar Arora and R. Mahesh K. Sinha | |
Application of Clause Alignment for Statistical Machine Translation [slides] Svetla Koeva, Svetlozara Leseva, Ivelina Stoyanova, Rositsa Dekova, Angel Genov, Borislav Rizov, Tsvetana Dimitrova, Ekaterina Tarpomanova and Hristina Kukova | |
Zero Pronoun Resolution can Improve the Quality of J-E Translation [slides] Hirotoshi Taira, Katsuhito Sudoh and Masaaki Nagata | |
Session 4 | |
16:00 | Panel Discussion |
Organizers
- Marine CARPUAT, National Research Council (NRC), Canada
- Lucia SPECIA, University of Sheffield, UK
- Dekai WU, Hong Kong University of Science and Technology (HKUST), Hong Kong
Program Committee
- Marianna APIDIANAKI, LIMSI-CNRS, France
- Wilker AZIZ, University of Wolverhampton, UK
- Srinivas BANGALORE, AT&T Research, USA
- Colin CHERRY, National Research Council (NRC), Canada
- David CHIANG, USC ISI, USA
- Mona DIAB, Columbia University, USA
- Alexander FRASER, University of Stuttgart, Germany
- Daniel GILDEA, University of Rochester, USA
- Nizar HABASH, Columbia University, USA
- Yifan HE, Dublin City University, Ireland
- Kevin KNIGHT, USC ISI, USA
- Philipp KOEHN, University of Edinburgh, UK
- Alon LAVIE, Carnegie Mellon University, USA
- Yanjun MA, Baidu, China
- Daniel MARCU, USC ISI and Language Weaver, USA
- Lluís MÀRQUEZ, Universitat Politècnica de Catalunya, Spain
- Sudip Kumar NASKAR, Dublin City University, Ireland
- Hwee Tou NG, National University of Singapore, Singapore
- Daniel PIGHIN, Universitat Politècnica de Catalunya, Spain
- Markus SAERS, Hong Kong University of Science and Technology (HKUST), Hong Kong
- Libin SHEN, IBM, USA
- Matthew SNOVER, BBN, USA
- John TINSLEY, Dublin City University, Ireland
- Stephan VOGEL, Carnegie Mellon University, USA
- Taro WATANABE, NICT, Japan
- Deyi XIONG, National University of Singapore, Singapore
- François YVON, Université Paris Sud 11, France
Important Dates
Submission deadline: 27 Apr 2012
Notification to authors: 16 May 2012
Camera copy deadline: 23 May 2012
Submission
Papers will be accepted on or before 27 Apr 2012 in PDF or Postscript formats via the START system at https://www.softconf.com/acl2012/ssst-6/. Submissions should follow the ACL 2012 length and formatting requirements for long papers of eight (8) pages of content with two (2) additional pages of references, found at http://www.acl2012.org/call/sub01.asp.
Camera Copy
Camera-ready final versions will be accepted on or before 23 May 2012 in PDF or Postscript formats via the START system at https://www.softconf.com/acl2012/ssst-6/. Papers should follow the ACL 2012 camera-ready length and formatting requirements for long papers of eight (8) pages of content with two (2) additional pages of references, found at http://www.acl2012.org/call/sub01.asp.
Contact
Please send inquiries to ssst@cs.ust.hk.
Last updated: 2012.07.12