9:00-9:10 Opening remarks
9:10-10:10 Morning session: New Datasets
Chair: Mariana Romanyshyn
9:10-9:25 | A Contemporary News Corpus of Ukrainian: Compilation, Annotation, Publication
Stefan Fischer, Kateryna Haidarzhyi, Jörg Knappen, Olha Polishchuk, Yuliya Stodolinska and Elke Teich |
9:25-9:40 | Introducing the Djinni Recruitment Dataset: A Corpus of Anonymized CVs and Job Postings
Nazarii Drushchak and Mariana Romanyshyn |
9:40-9:55 | Creating Parallel Corpora for the Ukrainian Language: a German-Ukrainian Parallel Corpus
Maria Shvedova and Arsenii Lukashevskyi |
9:55-10:10 | Introducing NER-UK 2.0: A Rich Corpus of Named Entities for Ukrainian
Dmytro Chaplynskyi and Mariana Romanyshyn |
10:10-10:30 Morning session: Invited Lightning Talks
10:10-10:20 | Introducing CLARIN K-center for Ukrainian Language Research: Cooperation and Development
Olha Kanishcheva |
10:20-10:30 | PAWUK: Polish Automatic Web corpus of UKrainian
Witold Kieraś, Łukasz Kobyliński, Dorota Komosińska, Bartłomiej Nitoń, Michał Rudolf, Maria Shvedova, Aleksandra Zwierzchowska |
10:30-11:00 Morning coffee break
11:00-12:10 Morning session: New Directions
Chair: Oleksii Ignatenko
11:00-11:20 | Instant Messaging Platforms News Multi-Task Classification for Stance, Sentiment, and Discrimination Detection
Taras Ustyianovych and Denilson Barbosa |
11:20-11:35 | Setting up the Data Printer with Improved English to Ukrainian Machine Translation
Yurii Paniv, Dmytro Chaplynskyi, Nikita Trynus and Volodymyr Kyrylov |
11:35-11:55 | Automated Extraction of Hypo-Hypernym Relations for the Ukrainian WordNet
Nataliia Romanyshyn, Dmytro Chaplynskyi and Mariana Romanyshyn |
11:55-12:10 | Ukrainian Visual Word Sense Disambiguation Benchmark
Yurii Laba, Yaryna Mohytych, Ivanna Rohulia, Halyna Kyryleyza, Hanna Dydyk-Meush, Oles Dobosevych and Rostyslav Hryniv |
12:10-13:00 | Invited talk: Towards Equitable and Culturally Adapted Multilingual Dialog Systems
Ivan Vulić, University of Cambridge, UK |
13:00-14:00 Lunch break
14:00-16:00 Afternoon session: LLMs for Ukrainian
Chair: Mariana Romanyshyn
14:00-14:15 | The UNLP 2024 Shared Task on Fine-Tuning Large Language Models for Ukrainian
Oleksiy Syvokon, Mariana Romanyshyn and Roman Kyslyi |
14:15-14:35 | Fine-tuning and Retrieval Augmented Generation for Question Answering using affordable Large Language Models
Tiberiu Boros, Radu Chivereanu, Stefan Dumitrescu and Octavian Purcaru |
14:35-14:55 | From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation
Artur Kiulian, Anton Polishko, Mykola Khandoga, Oryna Chubych, Jack Connor, Raghav Ravishankar and Adarsh Shirawalmath |
14:55-15:15 | Spivavtor: An Instruction Tuned Ukrainian Text Editing Model
Aman Saini, Artem Chernodub, Vipul Raheja and Vivek Kulkarni |
15:15-15:35 | Eval-UA-tion 1.0: Benchmark for Evaluating Ukrainian (Large) Language Models
Serhii Hamotskyi, Anna-Izabella Levbarg and Christian Hänig |
15:35-15:55 | LiBERTa: Advancing Ukrainian Language Modeling through Pre-training from Scratch
Mykola Haltiuk and Aleksander Smywiński-Pohla |
16:00-16:30 Afternoon coffee break
16:30-17:00 Afternoon session: LLMs for Ukrainian
Chair: Oleksii Ignatenko
16:30-16:45 | Entity Embellishment Mitigation in LLMs Output with Noisy Dataset for Alignment
Svitlana Galeshchuk |
16:45-17:00 | Language-Specific Pruning for Efficient Reduction of Large Language Models
Maksym Shamrai |
17:00-17:50 | Invited talk: BRUK Team’s Resources for Ukrainian Corpus Creation
Vasyl Starko, Ukrainian Catholic University, Ukraine |
17:50-18:00 Closing Words
Keynote Speakers
Ivan Vulić, University of Cambridge, UK
Topic: Towards Equitable and Culturally Adapted Multilingual Dialog Systems
The ability to intelligently converse with humans has been one of the fundamental objectives in the pursuit of artificial intelligence, and dialog systems are one of the prime user-facing applications of NLP technology. Task-oriented dialog systems in particular have been designed to assist or replace human operators in focused problems and domains with well-defined goals. However, designing and bootstrapping systems that can cover multiple languages and/or domains without performance degradation, or even just collecting data for training and evaluation, is notoriously difficult. In this talk, I will first point to the main gaps and challenges of modern task-oriented dialog systems, including the sheer lack of multilingual datasets and lack of cultural adaptation, which result in culturally biased, inequitable, English-centric, and non-adaptive systems. I will then follow up by describing new data collection protocols that enable the creation of high-quality, culturally adapted multi-parallel multilingual data. The new datasets unlock the potential for unprecedented quantitative and qualitative evaluation and analyses of multilingual performance disparities. I will then delve deeper into current performance and (cultural) diversity gaps and disparities in multilingual multi-domain dialog systems, and provide a quick overview of the latest algorithmic developments aiming to reduce the detected gaps.
Vasyl Starko, Ukrainian Catholic University, Ukraine
Andriy Rysin, Independent researcher, USA
Topic: BRUK Team’s Resources for Ukrainian Corpus Creation
The talk will focus on the key resources and tools developed by the BRUK team for the automatic processing of Ukrainian texts, especially for building Ukrainian corpora. The resources include:
* BRUK (Ukrainian Brown Corpus, a projected one-million-word POS gold standard)
* VESUM (A Large Electronic Dictionary of Ukrainian, over 420,000 lemmas and counting, for POS tagging)
* USL (Ukrainian Semantic Lexicon for semantic tagging).
The tools come in the form of the NLP_UK suite for Ukrainian text tokenization, lemmatization, POS tagging, and cleaning. The application of NLP_UK to build multiple iterations of the GRAC corpus will be discussed.