Workshop Program


9:00–10:50 Morning Session: New Datasets

Chair: Mariana Romanyshyn


9:00–9:10 Opening Remarks
9:10–9:55 Keynote Speech: Mona Diab
9:55–10:15 Silver Data for Coreference Resolution in Ukrainian: Translation, Alignment, and Projection

Pavlo Kuchmiichuk

10:15–10:35 The Parliamentary Code-Switching Corpus: Bilingualism in the Ukrainian Parliament in the 1990s-2020s

Olha Kanishcheva, Tetiana Kovalova, Maria Shvedova and Ruprecht von Waldenfels

10:35–10:50 Creating a POS Gold Standard Corpus of Modern Ukrainian

Vasyl Starko and Andriy Rysin


10:50–11:20 Morning Break

11:20–12:55 Morning Session: New Directions

Chair: Oleksii Ignatenko


11:20–12:05 Keynote Speech: Gulnara Muratova
12:05–12:20 The Evolution of Pro-Kremlin Propaganda From a Machine Learning and Linguistics Perspective

Veronika Solopova, Christoph Benzmüller and Tim Landgraf

12:20–12:40 Extension Multi30K: Multimodal Dataset for Integrated Vision and Language Research in Ukrainian

Nataliia Saichyshyna, Daniil Maksymenko, Oleksii Turuta, Andriy Yerokhin, Andrii Babii and Olena Turuta

12:40–12:55 Exploring Word Sense Distribution in Ukrainian with a Semantic Vector Space Model

Nataliia Cheilytko and Ruprecht von Waldenfels


12:55–14:25 Lunch

14:25–16:00 Afternoon Session: Shared Task

Chair: Mariana Romanyshyn


14:25–14:40 UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Oleksiy Syvokon, Olena Nahorna, Pavlo Kuchmiichuk and Nastasiia Osidach

14:40–14:55 The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian

Oleksiy Syvokon and Mariana Romanyshyn

14:55–15:15 Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction

Maksym Bondarenko, Artem Yushko, Andrii Shportko and Andrii Fedorych

15:15–15:30 A Low-Resource Approach to the Grammatical Error Correction of Ukrainian

Frank Palma Gomez, Alla Rozovskaya and Dan Roth

15:30–15:50 RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans

Bohdan Didenko and Andrii Sameliuk

15:50–16:00 Best Paper and Thank You


16:00–16:30 Afternoon Break

16:30–18:10 Afternoon Session: UberText

Chair: Oleksii Ignatenko


16:30–16:50 Introducing UberText 2.0: A Corpus of Modern Ukrainian at Scale

Dmytro Chaplynskyi

16:50–17:05 GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian

Volodymyr Kyrylov and Dmytro Chaplynskyi

17:05–17:25 Learning Word Embeddings for Ukrainian: A Comparative Study of FastText Hyperparameters

Nataliia Romanyshyn, Dmytro Chaplynskyi and Kyrylo Zakharov

17:25–17:45 Contextual Embeddings for Ukrainian: A Large Language Model Approach to Word Sense Disambiguation

Yurii Laba, Volodymyr Mudryi, Dmytro Chaplynskyi, Mariana Romanyshyn and Oles Dobosevych

17:45–18:00 Abstractive Summarization for the Ukrainian Language: Multi-Task Learning with News Dataset

Svitlana Galeshchuk

18:00–18:10 Closing Words


Keynote Speakers


Mona Diab, Lead Responsible AI Research Scientist with Meta, Professor of Computer Science at the George Washington University (on leave)


Topic: The Criticality of Low-Resource Language Research for NLP Future Sustainability


In this keynote, I will highlight the necessity of low-resource language research. I will argue for the critical need for research and productization of low-resource language technology as a way of ensuring not only diversity and inclusion but also the future sustainability of NLP at large. I will showcase some of our work on Arabic dialects, as well as work by various groups on creating resources and technologies for “digitally underprivileged” languages.


Gulnara Muratova, NGO “QIRI’M Young”, co-coordinator of the National Corpus of the Crimean Tatar Language


Topic: National Corpus of the Crimean Tatar Language


Crimean Tatar is the language of the indigenous people of Ukraine that is currently listed by UNESCO as one of the severely endangered languages. Today, an additional danger to the language is posed by the Russian Federation’s temporary occupation of the Autonomous Republic of Crimea, where most speakers of Crimean Tatar live. To preserve Crimean Tatar, create a fundamental online platform for linguistic research, and integrate it with various digital platforms, a team of enthusiasts began developing the first National Corpus of the Crimean Tatar Language. Limited or no access to crucial printed sources, four different graphic systems, and no tools for processing Crimean Tatar: these are only a few of the challenges that the team faced. Learn more at UNLP!


The project was initiated by the Ministry of Reintegration of the Temporarily Occupied Territories of Ukraine within the Strategy for the Development of the Crimean Tatar Language for 2022-2032 and is implemented with the support of the Swiss-Ukrainian EGAP Program by the Eastern Europe Foundation.