Program

Workshop Program and Accepted Papers

WORKSHOP PROCEEDINGS CAN BE FOUND HERE.

All times are in Central European Time (CET).

UNLP 2024 on YouTube

9:00-9:10 Opening remarks

9:10-10:10 Morning session: New Datasets

Chair: Mariana Romanyshyn

9:10-9:25	A Contemporary News Corpus of Ukrainian: Compilation, Annotation, Publication	Stefan Fischer, Kateryna Haidarzhyi, Jörg Knappen, Olha Polishchuk, Yuliya Stodolinska and Elke Teich
9:25-9:40	Introducing the Djinni Recruitment Dataset: A Corpus of Anonymized CVs and Job Postings	Nazarii Drushchak and Mariana Romanyshyn
9:40-9:55	Creating Parallel Corpora for the Ukrainian Language: a German-Ukrainian Parallel Corpus	Maria Shvedova and Arsenii Lukashevskyi
9:55-10:10	Introducing NER-UK 2.0: A Rich Corpus of Named Entities for Ukrainian	Dmytro Chaplynskyi and Mariana Romanyshyn

10:10-10:30 Morning session: Invited Lightning Talks

10:10-10:20

Introducing CLARIN K-center for Ukrainian Language Research: Cooperation and Development

Olha Kanishcheva

10:20-10:30

PAWUK: Polish Automatic Web corpus of UKrainian

Witold Kieraś, Łukasz Kobyliński, Dorota Komosińska, Bartłomiej Nitoń, Michał Rudolf, Maria Shvedova, Aleksandra Zwierzchowska

10:30-11:00 Morning coffee break

11:00-12:10 Morning session: New Directions

Chair: Oleksii Ignatenko

11:00-11:20	Instant Messaging Platforms News Multi-Task Classification for Stance, Sentiment, and Discrimination Detection	Taras Ustyianovych and Denilson Barbosa
11:20-11:35	Setting up the Data Printer with Improved English to Ukrainian Machine Translation	Yurii Paniv, Dmytro Chaplynskyi, Nikita Trynus and Volodymyr Kyrylov
11:35-11:55	Automated Extraction of Hypo-Hypernym Relations for the Ukrainian WordNet	Nataliia Romanyshyn, Dmytro Chaplynskyi and Mariana Romanyshyn
11:55-12:10	Ukrainian Visual Word Sense Disambiguation Benchmark	Yurii Laba, Yaryna Mohytych, Ivanna Rohulia, Halyna Kyryleyza, Hanna Dydyk-Meush, Oles Dobosevych and Rostyslav Hryniv
12:10-13:00	Invited talk: Towards Equitable and Culturally Adapted Multilingual Dialog Systems	Ivan Vulić, University of Cambridge, UK

13:00-14:00 Lunch break

14:00-16:00 Afternoon session: LLMs for Ukrainian

Chair: Mariana Romanyshyn

14:00-14:15	The UNLP 2024 Shared Task on Fine-Tuning Large Language Models for Ukrainian	Oleksiy Syvokon, Mariana Romanyshyn and Roman Kyslyi
14:15-14:35	Fine-tuning and Retrieval Augmented Generation for Question Answering using affordable Large Language Models	Tiberiu Boros, Radu Chivereanu, Stefan Dumitrescu and Octavian Purcaru
14:35-14:55	From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation	Artur Kiulian, Anton Polishko, Mykola Khandoga, Oryna Chubych, Jack Connor, Raghav Ravishankar and Adarsh Shirawalmath
14:55-15:15	Spivavtor: An Instruction Tuned Ukrainian Text Editing Model	Aman Saini, Artem Chernodub, Vipul Raheja and Vivek Kulkarni
15:15-15:35	Eval-UA-tion 1.0: Benchmark for Evaluating Ukrainian (Large) Language Models	Serhii Hamotskyi, Anna-Izabella Levbarg and Christian Hänig
15:35-15:55	LiBERTa: Advancing Ukrainian Language Modeling through Pre-training from Scratch	Mykola Haltiuk and Aleksander Smywiński-Pohl

16:00-16:30 Afternoon coffee break

16:30-17:00 Afternoon session: LLMs for Ukrainian

Chair: Oleksii Ignatenko

16:30-16:45

Entity Embellishment Mitigation in LLMs Output with Noisy Dataset for Alignment

Svitlana Galeshchuk

16:45-17:00

Language-Specific Pruning for Efficient Reduction of Large Language Models

Maksym Shamrai

17:00-17:50

Invited talk: BRUK Team’s Resources for Ukrainian Corpus Creation

Vasyl Starko, Ukrainian Catholic University, Ukraine
Andriy Rysin, Independent researcher, USA

17:50-18:00 Closing Words

Keynote Speakers

Ivan Vulić, University of Cambridge, UK

Topic: Towards Equitable and Culturally Adapted Multilingual Dialog Systems

The ability to intelligently converse with humans has been one of the fundamental objectives in the pursuit of artificial intelligence, and dialog systems are one of the prime user-facing applications of NLP technology. Task-oriented dialog systems in particular have been designed to assist or replace human operators in focused problems and domains with well-defined goals. However, designing and bootstrapping such systems that are able to cover multiple languages and/or domains without performance degradation, or even just collecting data for training and evaluation, is known to be notoriously difficult. In this talk, I will first point to the main gaps and challenges of modern task-oriented dialog systems, including the sheer lack of multilingual datasets and lack of cultural adaptation, which result in culturally biased, inequitable, English-centric, and non-adaptive systems. I will then follow up by describing new data collection protocols that enable creation of high-quality culturally adapted multi-parallel multilingual data. The new datasets unlock the potential for unprecedented quantitative and qualitative evaluation and analyses of multilingual performance disparities. I will then delve deeper into current performance and (cultural) diversity gaps and disparities in multilingual multi-domain dialog systems, and provide a quick overview of the latest algorithmic developments aiming to reduce the detected gaps.

Vasyl Starko, Ukrainian Catholic University, Ukraine

Andriy Rysin, Independent researcher, USA

Topic: BRUK Team’s Resources for Ukrainian Corpus Creation

The talk will focus on the key resources and tools developed by the BRUK team for the automatic processing of Ukrainian texts, especially for building Ukrainian corpora. The resources include:
* BRUK (Ukrainian Brown Corpus, a projected one-million-word POS gold standard)
* VESUM (A Large Electronic Dictionary of Ukrainian, over 420,000 lemmas and counting, for POS tagging)
* USL (Ukrainian Semantic Lexicon for semantic tagging).
The tools come in the form of the NLP_UK suite for Ukrainian text tokenization, lemmatization, POS tagging, and cleaning. The application of NLP_UK to build multiple iterations of the GRAC corpus will be discussed.