Saturday, May 06, 2023 (GMT+2)

09:00 - 09:15   Opening remarks by Workshop Chairs

09:15 - 10:05   Invited talk: Nikola Ljubešić, Jožef Stefan Institute, Ljubljana, Slovenia

Chair:  Atul Kr. Ojha

Title - Crawling your way out of less-resourcedness

Abstract:  In this talk, Nikola Ljubešić will argue that by far the best option for a less-resourced language with a reasonable online presence to catch up with its better-resourced peers is online data harvesting. While the proposed direction might seem obvious, it cannot be considered a simple endeavour. He will share some early lessons learned from his two decades of fighting the less-resourcedness of Croatian, the official language of the location of the workshop. For the greater part of the talk, he will focus on the recent lessons learned during the MaCoCu project on crawling monolingual and parallel data for less-resourced European languages, covering, inter alia, questions of crawling decisions, language and variety identification, text quality estimation, and the impact of all these decisions on language and translation modelling. He will conclude his talk with a discussion on the way forward for less-resourced languages in the light of large language models.

10:05 - 10:35    Session 1: Finding Papers

Chair:  Sina Ahmadi

10:35 - 11:15    COFFEE/TEA BREAK

11:15 - 12:45    Session 2: Scientific Research Papers

Chair:  Ekaterina Vylomova

12:45 - 14:15    LUNCH

14:15 - 15:00    Invited talk: Rico Sennrich, University of Zurich, Switzerland

Chair:  Chao-Hong Liu

Title - Applying Lessons from Low-Resource Machine Translation to Speech and Sign Language Translation

Abstract:  Large multilingual models have revolutionized natural language processing by unlocking knowledge sharing across tasks and languages. For modalities other than text, such as audio, images, or video, neural architectures commonly used for text have also proven effective, but the beneficial sharing of representations across modalities remains a challenge. In this talk, Rico Sennrich will discuss recent successes (and failures) for the multimodal tasks of speech translation and sign language translation. Since both tasks are very low-resourced, what lessons from low-resource text translation can be applied to them? What unique solutions are required to address the audio and video modalities? To what extent is information shared across modalities in multi-task multimodal systems?

15:00 - 15:45    Session 3: Finding Papers

Chair:  Nathaniel Oco

15:45 - 16:30    COFFEE/TEA BREAK

16:30 - 18:05    Session 4: Scientific Research Papers

Chair:  Valentin Malykh

18:05 - 18:10    Closing remarks by Workshop Chairs