The 3rd Workshop on Technologies for MT of Low Resource Languages (LoResMT 2020)

AACL-IJCNLP, China, December 4, 2020

https://sites.google.com/view/loresmt-2020

@ AACL-IJCNLP 2020 (http://aacl2020.org/)

NEWS


Timeline:
- July 3, 2020 – Call for papers released
- September 18, 2020 – Paper submissions due (extended)
- September 21 - October 25, 2020 – Review period
- October 27, 2020 – Notification
- November 6, 2020 – Camera-ready due
- November 15, 2020 – Video recordings due
- December 4, 2020 – LoResMT workshop

SCOPE


In the past few years, machine translation (MT) performance has improved significantly. With the development of new techniques such as multilingual translation and transfer learning, the use of MT is no longer a privilege of users of popular languages. Consequently, there has been increasing interest in the community in expanding coverage to more languages with different geographical presence, degree of diffusion and digitalization. However, the goal of increasing MT coverage for more users speaking diverse languages is limited by the fact that MT methods demand huge amounts of data to train quality systems, which poses a major obstacle to developing MT systems for low resource languages. Therefore, developing comparable MT systems with relatively small datasets is still highly desirable.
In addition, despite the fast development of MT technologies, MT systems still rely on several NLP tools to pre-process human-generated texts into the forms required as input for MT systems and to post-process the MT output into proper textual forms in the target language. This is especially true when it comes to systems involving low resource languages. These NLP tools include, but are not limited to, several kinds of word tokenizers/de-tokenizers, word segmenters, morphology analysers, etc. The performance of these tools has a great impact on the quality of the resulting translation. There is only limited discussion of these NLP tools, their methods, their role in training different MT systems, and their coverage of the many languages of the world.
The workshop provides a discussion panel for researchers working on MT systems/methods for low resource and under-represented languages in general. We would like to help review/overview the state of MT for low resource languages and define the most important directions. We also solicit papers dedicated to supplementary NLP tools that are used in any language and especially in low resource languages. Overview papers of these NLP tools are very welcome. It will be beneficial if the evaluations of these tools in research papers include their impact on the quality of MT output.

TOPICS


We solicit original research papers, review papers, and position papers on MT research for low resource languages in the workshop. Multilingual and/or cross-lingual NLP tools for low resource languages are especially welcome. Topics of the workshop include but are not limited to:
- Research and review papers of pre-processing and/or post-processing NLP tools for MT
- Position papers on the development of pre-processing and/or post-processing tools for MT
- Word tokenizers/de-tokenizers for specific languages
- Word/morpheme segmenters for specific languages
- Alignment/Re-ordering tools for specific language pairs
- Use of morphology analyzers and/or morpheme segmenters in MT
- Multilingual/cross-lingual NLP tools for MT
- Re-usability of existing NLP tools for low resource languages
- Corpora creation and curation technologies for low resource languages
- Review of available parallel corpora for low resource languages
- Research and review papers of MT methods for low resource languages
- MT systems/methods (e.g. rule-based, SMT, NMT) for low resource languages
- Pivot MT for low resource languages
- Zero-shot MT for low resource languages
- Fast building of MT systems for low resource languages
- Re-usability of existing MT systems for low resource languages
- Machine translation for language preservation

INVITED SPEAKERS

  • Grace Tang, Alp Öktem - Translators Without Borders
Gamayun: using language technology to improve humanitarian communication
Alp Öktem and Grace Tang will introduce Translators without Borders’ (TWB) Gamayun project where they aim to enable two-way communication using language technology for marginalized language speakers. They will share their experiences in applying state-of-the-art low-resource methodologies in machine translation to achieve impact in humanitarian crisis response for some of the world’s under-represented languages and dialects.
  • Bonaventure Dossou & Chris Emezue - Jacobs University Bremen & Technical University of Munich, Germany
Fon-French Neural Machine Translation
Machine Translation for low-resourced African Languages is still a wide-open challenge. African languages are very diverse and morphologically rich, which makes every building block of the NMT system, from data collection to processing and training, crucial. In our talk, we explore, with Fon as a case study, how NMT systems could be trained efficiently, reducing ambiguity and improving the model's translation.

SUBMISSION INFORMATION


There are two types of submissions in the workshop. For research, review and position papers, the length of each paper should be at least four (4) and not exceed eight (8) pages, plus unlimited pages for references. For system demonstration papers, the limit is four (4) pages. Submissions should be formatted according to the official AACL-IJCNLP 2020 style templates (LaTeX, Microsoft Word, Overleaf). Accepted papers will be published on-line in the AACL-IJCNLP 2020 proceedings and will be presented at the conference either orally or as a poster.
Submissions must be anonymised and should be made through the Softconf START conference management system at https://www.softconf.com/aacl-ijcnlp2020/LoResMT/. Scientific papers already submitted, or to be submitted, to other venues must be declared as such, and must be withdrawn from the other venues if accepted and published at LoResMT. The review will be double-blind.
We would like to encourage authors to cite papers written in ANY language that are related to the topics, as long as both original bibliographic items and their corresponding English translations are provided.
Registration will be handled by the main conference: http://aacl2020.org/registration/

IMPORTANT DATES


- July 3, 2020 – Call for papers released
- September 18, 2020 – Paper submissions due (extended)
- September 21 - October 25, 2020 – Review period
- October 27, 2020 – Notification
- November 6, 2020 – Camera-ready due
- December 4, 2020 – LoResMT workshop