The 2nd Workshop on Technologies for MT of Low Resource Languages (LoResMT 2019)

The Helix, DCU, Dublin, August 20, 2019

https://sites.google.com/view/loresmt/

@ MT Summit XVII (https://www.mtsummit2019.com/)

NEWS
- Call for Papers: https://easychair.org/cfp/LoResMT2019
Submission due on May 24, 2019: https://easychair.org/conferences/?conf=loresmt2019
- Shared Tasks on MT for Bhojpuri, Magahi, Sindhi, and Latvian (<> English)
Registration: Participants, please register by sending an email to loresmt@googlegroups.com with your team name and member information (emails and affiliations).
Timeline:
May 03, 2019: Release of training data
June 04, 2019: Release of test data
June 11, 2019: Submission of the systems
June 16, 2019: Notification of results
June 25, 2019: Submission of shared task papers
- Invited talk: "Building Cross-Lingual Knowledge Base for Low Resource Languages in China" by Prof Xiaobing Zhao et al.
- LoResMT 2019 Website on-line!
- Slides of LoResMT 2018 Workshop presentations: https://sites.google.com/view/loresmt-2018/
- Proceedings of the LoResMT 2018 Workshop: https://amtaweb.org/amta-2018-proceedings-for-the-conference-workshops-and-tutorials/
SCOPES
Machine translation (MT) technologies have improved significantly in the last two decades, with developments in phrase-based statistical MT (SMT) and, more recently, neural MT (NMT). However, most of these methods rely on the availability of large parallel data (millions to tens of millions of sentence pairs) for training, and such resources do not exist for many language pairs.
In addition, MT methods still rely on a few natural language processing (NLP) tools to pre-process human-generated texts into the forms required as input for these methods, and/or to post-process the output into proper textual forms in the target languages. In many MT systems, the performance of these tools has a great impact on the quality of the resulting translation. These NLP tools include, but are not limited to, several kinds of word tokenizers/de-tokenizers, word segmenters, morphology analyzers, etc.
The workshop solicits papers on MT systems/methods for low resource languages in general. We also solicit papers dedicated to these supplementary NLP tools that are used in any language and especially in low resource languages. We would like to have an overview of research on MT for low resource languages and these NLP tools from our community.
TOPICS
We solicit original research papers, review papers, and position papers on MT research for low resource languages in the workshop. Multilingual and/or cross-lingual NLP tools for low resource languages are especially welcome. Topics of the workshop include but are not limited to:
- Research and review papers of pre-processing and/or post-processing NLP tools for MT
- Position papers on the development of pre-processing and/or post-processing tools for MT
- Word tokenizers/de-tokenizers for specific languages
- Word/morpheme segmenters for specific languages
- Alignment/Re-ordering tools for specific language pairs
- Use of morphology analyzers and/or morpheme segmenters in MT
- Multilingual/cross-lingual NLP tools for MT
- Re-usability of existing NLP tools for low resource languages
- Corpora creation and curation technologies for low resource languages
- Review of available parallel corpora for low resource languages
- Research and review papers of MT methods for low resource languages
- MT systems/methods (e.g. rule-based, SMT, NMT) for low resource languages
- Pivot MT for low resource languages
- Zero-shot MT for low resource languages
- Fast building of MT systems for low resource languages
- Re-usability of existing MT systems for low resource languages
- Machine translation for language preservation
SUBMISSION INFORMATION
Workshop papers should adhere to the MT Summit 2019 style guide (LaTeX, OpenOffice, Word): https://www.mtsummit2019.com/submissions
There are two types of submissions in the workshop. For research, review and position papers, each paper should be at least four (4) and at most eight (8) pages long, plus unlimited pages for references. Additional pages may be allowed if they can be justified. The review will be double-blind. For non-archival system demonstration abstracts, the limit is four (4) pages. The review will be single-blind.
We would like to encourage authors to cite papers written in ANY language that are related to the topics, as long as both original bibliographic items and their corresponding English translations are provided.
IMPORTANT DATES
March 19, 2019: Call for papers
April 24, 2019: 2nd Call for papers
May 17, 2019: Abstract submission due
May 24, 2019: Submission deadline of workshop papers
June 21, 2019: Notification of acceptance
July 12, 2019: Camera-ready papers due
July 19, 2019: Workshop proceedings on-line
August 20, 2019: LoResMT workshop
ORGANIZERS (listed alphabetically)
Alina Karakanta, FBK-Fondazione Bruno Kessler
Atul Kr. Ojha, Panlingua Language Processing LLP/Jawaharlal Nehru University
Chao-Hong Liu, ADAPT Centre, Dublin City University
Jonathan Washington, Swarthmore College
Nathaniel Oco, National University (Philippines)
Pinkey Nainwani, Cognizant Technology Solutions, Bangalore
Surafel Melaku Lakew, FBK-Fondazione Bruno Kessler
Valentin Malykh, Huawei and Moscow Institute of Physics and Technology
Varvara Logacheva, Moscow Institute of Physics and Technology
Xiaobing Zhao, Minzu University of China
ACKNOWLEDGEMENTS
1. We would like to thank Tilde (https://tilde.com) for providing the Latvian<>English corpus for the shared task, which is coordinated by Varvara Logacheva.
2. The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant No. 13/RC/2106) and is co-funded under the European Regional Development Fund. The organization of this workshop has partially received funding from the European Union's Horizon 2020 Research and Innovation programme under the Marie Skłodowska-Curie Actions (Grant No. 734211).