Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8)

EMNLP 2014 / SIGMT / SIGLEX Workshop
25 Oct 2014, Doha, Qatar

*** [NEW] Slides for all papers below ***

*** Special theme: Compositional Distributional Semantics and Machine Translation ***

The Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8) seeks to bring together a large number of researchers working on diverse aspects of structure, semantics and representation in relation to statistical machine translation. Since its first edition in 2006, its program each year has comprised high-quality papers discussing current work spanning topics including: new grammatical models of translation; new learning methods for syntax- and semantics-based models; formal properties of synchronous/transduction grammars (hereafter S/TGs); discriminative training of models incorporating linguistic features; using S/TGs for semantics and generation; and syntax- and semantics-based evaluation of machine translation.

We invite two types of submissions this year:

Special Theme Extended Abstracts

This year, the semantics theme of the past three editions of SSST takes a new step: a "working workshop" that brings together researchers interested in compositional distributional semantics, distributed representations, and continuous vector space models in MT. The workshop will include tutorials bridging these communities in both directions, as well as discussions and hands-on work on relevant tasks with real data. Such models have proven beneficial for a number of NLP tasks, such as phrasal similarity, lexical entailment, modeling semantic deviance, detecting order restrictions in recursive structures, and improving NP bracketing in parsing. However, they have so far received comparatively little attention in MT.

Extended abstracts of at most two (2) pages should describe poster or hands-on presentations that will stimulate discussion on the special theme of compositional distributional semantics and machine translation. These may include position papers, recent work, pilot studies, and negative results. We encourage the presentation of relevant work that has been published or submitted elsewhere, as well as new work in progress.

Full Papers

The need for structural mappings between languages is widely recognized in the fields of statistical machine translation and spoken language translation. There is now wide consensus that these mappings are appropriately represented using a family of formalisms that includes synchronous/transduction grammars and their notational equivalents. Flat-structured models, such as the word-based IBM models of the early 1990s or the more recent phrase-based models, remain widely used to date. However, tree-structured mappings arguably offer much greater potential for learning valid generalizations about relationships between languages.

Within this area of research there is a rich diversity of approaches. There is active research ranging from formal properties of S/TGs to large-scale end-to-end systems. There are approaches that make heavy use of linguistic theory, and approaches that use little or none. There is theoretical work characterizing the expressiveness and complexity of particular formalisms, as well as empirical work assessing their modeling accuracy and descriptive adequacy across various language pairs. There is work being done to invent better translation models, and work to design better algorithms. Recent years have seen significant progress on all these fronts. In particular, systems based on these formalisms are now top contenders in MT evaluations.

At the same time, SMT has seen a movement toward semantics over the past few years. This trend has been reflected at recent SSST workshops, including the last three editions, which featured semantics for SMT as a special theme. The issues of deep syntax and shallow semantics are closely linked, and SSST-8 continues to encourage submissions on semantics for MT in a number of directions, including semantic role labeling, sense disambiguation, and compositional distributional semantics for translation and evaluation.

We invite full papers on:


Session 1: Morning Orals
09:00–09:10    Opening remarks
Dekai Wu, Marine Carpuat, Xavier Carreras, Eva Maria Vecchi
09:10–09:30    Vector Space Models for Phrase-based Machine Translation [slides]
Tamer Alkhouli¹, Andreas Guta², Hermann Ney¹
¹RWTH Aachen University, ²RWTH Aachen
09:30–09:50    Bilingual Markov Reordering Labels for Hierarchical SMT [slides]
Gideon Maillette de Buy Wenniger¹ and Khalil Sima'an²
¹Institute for Logic, Language and Computation, University of Amsterdam; ²ILLC, University of Amsterdam
09:50–10:10    Better Semantic Frame Based MT Evaluation via Inversion Transduction Grammars [slides]
Dekai Wu¹, Chi-kiu Lo¹, Meriem Beloucif¹, Markus Saers²
¹HKUST, ²Hong Kong University of Science and Technology
10:10–10:30    Rule-based Syntactic Preprocessing for Syntax-based Machine Translation [slides]
Yuto Hatakoshi, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
Nara Institute of Science and Technology
10:30–11:00    Coffee break
Invited Talk
11:00–12:00    Composed, Distributed Reflections on Semantics and Statistical Machine Translation [slides]
Timothy Baldwin
Session 2: Morning Spotlights
12:00–12:05    Applying HMEANT to English-Russian Translations
Alexander Chuchunkov, Alexander Tarelkin, Irina Galinskaya
Yandex LLC
12:05–12:10    Reducing the Impact of Data Sparsity in Statistical Machine Translation [slides]
Karan Singla¹, Kunal Sachdeva¹, Srinivas Bangalore², Dipti Misra Sharma¹, Diksha Yadav³
¹LTRC, IIIT-Hyderabad; ²AT&T Labs-Research; ³IIIT-Hyderabad
12:10–12:15    Expanding the Language model in a low-resource hybrid MT system [slides]
George Tambouratzis, Sokratis Sofianopoulos, Marina Vassiliou
ILSP/Athena R.C.
12:15–12:20    Syntax and Semantics in Quality Estimation of Machine Translation [slides]
Rasoul Kaljahi¹, Jennifer Foster¹, Johann Roturier²
¹Dublin City University, ²Symantec
12:20–12:25    Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation [slides]
Jean Pouget-Abadie¹, Dzmitry Bahdanau², Bart van Merrienboer³, Kyunghyun Cho³, Yoshua Bengio³
¹Ecole Polytechnique, ²Jacobs University Bremen, ³University of Montreal
12:25–12:30    Ternary Segmentation for Improving Search in Top-down Induction of Segmental ITGs [slides]
Markus Saers¹ and Dekai Wu²
¹Hong Kong University of Science and Technology, ²HKUST
12:30–14:00    Lunch break
Session 3: Afternoon Orals and Spotlights
14:00–14:20    A CYK+ Variant for SCFG Decoding Without a Dot Chart [slides]
Rico Sennrich
University of Edinburgh
14:20–14:40    On the Properties of Neural Machine Translation: Encoder–Decoder Approaches [slides]
Kyunghyun Cho¹, Bart van Merrienboer¹, Dzmitry Bahdanau², Yoshua Bengio¹
¹University of Montreal, ²Jacobs University Bremen
14:40–15:00    Transduction Recursive Auto-Associative Memory: Learning Bilingual Compositional Distributed Vector Representations of Inversion Transduction Grammars [slides]
Karteek Addanki and Dekai Wu
15:00–15:20    Transformation and Decomposition for Efficiently Implementing and Improving Dependency-to-String Model In Moses [slides]
Liangyou Li¹, Jun Xie², Andy Way³, Qun Liu⁴
¹CNGL Centre for Global Intelligent Content, School of Computing, Dublin City University; ²ICT, CAS; ³CNGL, Dublin City University; ⁴Dublin City University
15:20–15:25    Word's Vector Representations meet Machine Translation [slides]
Eva Martinez Garcia¹, Jörg Tiedemann², Cristina España-Bonet¹, Lluís Màrquez³
¹TALP Research Center, ²Uppsala University, ³Qatar Computing Research Institute
15:25–15:30    Context Sense Clustering for Translation [slides]
João Casteleiro, Gabriel Lopes, Joaquim Silva
Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Departamento de Informática
15:30–16:00    Coffee break
Session 4: Afternoon Spotlights
16:00–16:05    Evaluating Word Order Recursively over Permutation-Forests [slides]
Miloš Stanojević¹ and Khalil Sima'an²
¹University of Amsterdam, ILLC; ²ILLC, University of Amsterdam
16:05–16:10    Preference Grammars and Soft Syntactic Constraints for GHKM Syntax-based Statistical Machine Translation [slides]
Matthias Huck, Hieu Hoang, Philipp Koehn
University of Edinburgh
16:10–16:15    How Synchronous are Adjuncts in Translation Data? [slides]
Sophie Arnoult¹ and Khalil Sima'an²
¹Institute of Logic, Language and Computation (ILLC), University of Amsterdam (UvA); ²ILLC, University of Amsterdam
Poster Session
16:15–17:30    Poster session of all workshop papers
All workshop presenters


Important Dates

Submission deadline for papers and extended abstracts: 1 Aug 2014
Notification to authors: 26 Aug 2014
Camera-ready deadline: 15 Sep 2014


Papers will be accepted on or before 1 Aug 2014 in PDF or PostScript format via the START system at https://www.softconf.com/emnlp2014/SSST-8/. Submissions should follow the EMNLP 2014 length and formatting requirements for long papers (nine (9) pages of content, with any number of additional pages for references), available at http://emnlp2014.org/templates.html.


Please send inquiries to ssst@cs.ust.hk.

Last updated: 2014.10.25