Program
Saturday, May 06, 2023 (GMT+2)
09:00 - 09:15 Opening remarks by Workshop Chairs
09:15 - 10:05 Invited talk: Nikola Ljubešić, Jožef Stefan Institute, Ljubljana, Slovenia
Chair: Atul Kr. Ojha
Title - Crawling your way out of less-resourcedness
Abstract: In this talk, Nikola Ljubešić will argue that by far the best option for a less-resourced language with a reasonable online presence to catch up with its better-resourced peers is through online data harvesting. While the proposed direction might seem obvious, it cannot be considered a simple endeavour. He will share some early lessons learned from his two decades of fighting less-resourcedness of Croatian, the official language of the location of the workshop. For a greater part of the talk, he will focus on the recent lessons learned during the MaCoCu project (https://macocu.eu) on crawling monolingual and parallel data for less-resourced European languages, covering, inter alia, questions of crawling decisions, language and variety identification, text quality estimation, and the impact of all these decisions on language and translation modelling. He will conclude his talk with a discussion on the way forward for less-resourced languages in the light of large language models.
10:05 - 10:35 Session 1: Finding Papers
Chair: Sina Ahmadi
10:05-10:20 Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting - Zifan Jiang, Amit Moryossef, Mathias Müller and Sarah Ebling
10:20-10:35 Decipherment as Regression: Solving Historical Substitution Ciphers by Learning Symbol Recurrence Relations - Nishant Kambhatla, Logan Born and Anoop Sarkar
10:35 - 11:15 COFFEE/TEA BREAK
11:15 - 12:45 Session 2: Scientific Research Papers
Chair: Ekaterina Vylomova
11:15-11:35 Train Global, Tailor Local: Minimalist Multilingual Translation into Endangered Languages - Zhong Zhou, Jan Niehues and Alexander Waibel
11:35-11:55 Measuring the Impact of Data Augmentation Methods for Extremely Low-Resource NMT - Annie Lamar and Zeyneb N. Kaya
11:55-12:15 Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation - Alexandra Chronopoulou, Dario Stojanovski and Alexander Fraser
12:15-12:35 Multilingual Bidirectional Unsupervised Translation through Multilingual Finetuning and Back-Translation - Bryan Li, Mohammad Sadegh Rasooli, Ajay Patel and Chris Callison-Burch
12:45 - 14:15 LUNCH
14:15 - 15:00 Invited talk: Rico Sennrich, University of Zurich, Switzerland
Chair: Chao-Hong Liu
Title - Applying Lessons from Low-Resource Machine Translation to Speech and Sign Language Translation
Abstract: Large multilingual models have revolutionized natural language processing by unlocking knowledge sharing across tasks and languages. For modalities other than text, such as audio, images, or video, neural architectures commonly used for text have also proven effective, but the beneficial sharing of representations across modalities remains a challenge. In this talk, Rico Sennrich will discuss recent successes (and failures) for the multimodal tasks of speech translation and sign language translation. Both tasks being very low-resourced, what lessons from low-resource text translation can be applied to these multimodal tasks? What unique solutions are required to address the audio and video modality? To what extent is information shared across modalities in multi-task multimodal systems?
15:00 - 15:45 Session 3: Finding Papers
Chair: Nathaniel Oco
15:00-15:15 Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings? - Sonal Sannigrahi, Josef van Genabith and Cristina España-Bonet
15:15-15:30 Intent Identification and Entity Extraction for Healthcare Queries in Indic Languages - Ankan Mullick, Ishani Mondal, Sourjyadip Ray, Raghav R, G Chaitanya and Pawan Goyal
15:30-15:45 A Simplified Training Pipeline for Low-Resource and Unsupervised Machine Translation - Àlex R. Atrio, Alexis Allemann, Ljiljana Dolamic and Andrei Popescu-Belis
15:45 - 16:30 COFFEE/TEA BREAK
16:30 - 18:05 Session 4: Scientific Research Papers
Chair: Valentin Malykh
16:30-16:50 Improving Neural Machine Translation of Indigenous Languages with Multilingual Transfer Learning - Wei-Rui Chen and Muhammad Abdul-Mageed
16:50-17:10 PEACH: Pre-Training Sequence-to-Sequence Multilingual Models for Translation with Semi-Supervised Pseudo-Parallel Document Generation - Alireza Salemi, Amirhossein Abaskohi, Sara Tavakoli, Azadeh Shakery and Yadollah Yaghoobzadeh
17:10-17:30 Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation - Injy Hamed, Nizar Habash, Slim Abdennadher and Ngoc Thang Vu
17:30-17:50 Evaluating Sentence Alignment Methods in a Low-Resource Setting: An English-Yorùbá Study Case - Edoardo Signoroni and Pavel Rychlý
17:50-18:05 Findings from the Bambara - French Machine Translation Competition (BFMT 2023) - Ninoh Agostinho Da Silva, Tunde Ajayi, Alex Antonov, Panga Azazia Kamate, Moussa Coulibaly, Mason Del Rio, Yacouba Diarra, Sebastian Diarra, Chris Emezue, Joel Hamilcaro, Christopher Homan, Alexander Most, Joseph Mwatukange, Peter Ohue, Michael Pham, Abdoulaye Sako, Sokhar Samb, Yaya Sy, Tharindu Cyril Weerasooriya, Yacine Zahidi and Sarah Luger
18:05 - 18:10 Closing remarks by Workshop Chairs