Program

Workshop Program and Accepted Papers

Day 1

All times are in Central European Time (CET).

9:00-9:10 Opening Remarks

9:10-10:30 Morning Session: Downstream Tasks

 

Chair: Mariana Romanyshyn

9:10-9:25 Improving Named Entity Recognition for Low-Resource Languages Using Large Language Models: A Ukrainian Case Study

Vladyslav Radchenko and Nazarii Drushchak

9:25-9:45 A Framework for Large-Scale Parallel Corpus Evaluation: Ensemble Quality Estimation Models Versus Human Assessment

Dmytro Chaplynskyi and Kyrylo Zakharov

9:45-10:05 Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction

Roman Kovalchuk, Mariana Romanyshyn and Petro Ivaniuk

10:05-10:25 Improving Sentiment Analysis for Ukrainian Social Media Code-Switching Data

Yurii Shynkarov, Veronika Solopova and Vera Schmitt

 

10:30-11:00 Morning Coffee Break

11:00-12:00 Morning Session: Towards a Ukrainian LLM

 

Chair: Oleksii Ignatenko

11:00-11:20 From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages

Artur Kiulian, Anton Polishko, Mykola Khandoga, Yevhen Kostiuk, Guillermo Gabrielli, Łukasz Gągała, Fadi Zaraket, Qusai Abu Obaida, Hrishikesh Garud, Wendy Wing Yee Mak, Dmytro Chaplynskyi, Selma Belhadj Amor and Grigol Peradze

11:20-11:40 Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains

Yurii Paniv, Artur Kiulian, Dmytro Chaplynskyi, Mykola Khandoga, Anton Polishko, Tetiana Bas and Guillermo Gabrielli

11:40-12:00 On the Path to Make Ukrainian a High-Resource Language

Mykola Haltiuk and Aleksander Smywiński-Pohl

 

12:00-13:00 Keynote: Sebastian Ruder

13:00-14:15 Lunch

14:15-15:30 Afternoon Session: Linguistics and NLP

 

Chair: Mariana Romanyshyn

14:15-14:30 Developing a Universal Dependencies Treebank for Ukrainian Parliamentary Speech: Annotation Methods and Case Variation Analysis

Maria Shvedova, Arsenii Lukashevskyi and Andriy Rysin

14:30-14:50 Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation

Roman Kyslyi, Yuliia Maksymiuk and Ihor Pysmennyi

14:50-15:10 Context-Aware Lexical Stress Prediction and Phonemization for Ukrainian TTS Systems

Anastasiia Senyk, Mykhailo Lukianchuk, Valentyna Robeiko and Yurii Paniv

15:10-15:30 Precision vs. Perturbation: Robustness Analysis of Synonym Attacks in Ukrainian NLP

Volodymyr Mudryi and Oleksii Ignatenko

 

15:30-16:00 Afternoon Coffee Break

16:00-17:00 Keynote: Illia Strelnykov

17:00-17:50 Afternoon Session: Responsible AI

 

Chair: Oleksii Ignatenko

17:00-17:15 UAlign: LLM Alignment Benchmark for The Ukrainian Language

Andrian Kravchenko, Yurii Paniv and Nazarii Drushchak

17:15-17:30 GBEM-UA: Gender Bias Evaluation and Mitigation for Ukrainian Large Language Models

Mykhailo Buleshnyi, Maksym Buleshnyi, Marta Sumyk and Nazarii Drushchak

17:30-17:50 Gender Swapping as a Data Augmentation Technique: Developing Gender-Balanced Datasets for Ukrainian Language Processing

Olha Nahurna and Mariana Romanyshyn

 

17:50-18:00 Closing Words

Day 2

9:00-9:10 Intro

9:10-10:40 Morning Session: Shared Task

 

Chair: TBD

9:00-9:15 The UNLP 2025 Shared Task on Detecting Social Media Manipulation

Roman Kyslyi, Nataliia Romanyshyn and Volodymyr Sydorskyi

9:15-9:30 Detecting Manipulation in Ukrainian Telegram: A Transformer-Based Approach to Technique Classification and Span Identification

Md. Abdur Rahman and Md Ashiqur Rahman

9:30-9:45 Hidden Persuasion: Detecting Manipulative Narratives on Social Media During the 2022 Russian Invasion of Ukraine

Kateryna Akhynko, Oleksandr Kosovan and Mykola Trokhymovych

9:45-10:00 Comparing Methods for Multi-Label Classification of Manipulation Techniques in Ukrainian Telegram Content

Oleh Melnychuk

10:00-10:15 Framing the language at UNLP 2025: Fine-Tuning Gemma 3 for Manipulation Detection

Mykola Khandoga, Yevhen Kostiuk, Anton Polishko, Kostiantyn Kozlov, Yurii Filipchuk and Artur Kiulian

10:15-10:30 Transforming Causal LLM into MLM Encoder for Detecting Social Media Manipulation in Telegram

Anton Bazdyrev, Ivan Bashtovyi, Ivan Havlytskyi, Oleksandr Kharytonov and Artur Volodymyrovych Khodakovskyi

 

10:30-11:00 Morning Coffee Break

11:00-13:00 Panel Discussion: Disinformation Detection from a Business Perspective

Panelists:
Kateryna Burovova, ML Engineer at LetsData
Nataliia Romanyshyn, AI Specialist at Texty.org.ua
Yaroslav Peliushenko, Head of Analytics at Osavul
Yuliia Dukach, Head of Disinformation Investigations at OpenMinds

 

Chair: Roman Kyslyi

13:00-14:00 Lunch

Keynote Speakers

 

Illia Strelnykov, Data Scientist at YouScan, Ukraine

 

Topic: Leveraging User Feedback to Improve Your Models

While academic research provides a strong foundation for model development, the ultimate goal is to deploy these models in real-world applications, where they interact with actual users. This talk addresses the critical challenge of effectively leveraging user feedback to enhance model performance in practical scenarios. We’ll explore ways to incorporate the highly valuable — yet inherently noisy — user-provided data into model training and fine-tuning pipelines. First, we’ll cover methods for collecting user feedback and the challenges involved in processing it, including issues like bias and conflicting information. Then we will examine various solutions for tackling these challenges and how to use refined feedback for model improvement.

 

 

Sebastian Ruder, Research Scientist at Meta, Germany

 

Topic: Multilingual Modeling and Evaluation in Llama 4 and Beyond

In this talk, I will cover some of the multilingual modeling methods and evaluations we used for Llama 4. Looking ahead, I will discuss the current challenges in cross-lingual research, with a focus on Ukrainian specifically.

Vasyl Starko, Ukrainian Catholic University, Ukraine

Andriy Rysin, Independent researcher, USA

Topic: BRUK Team’s Resources for Ukrainian Corpus Creation

 

The talk will focus on the key resources and tools developed by the BRUK team for the automatic processing of Ukrainian texts, especially for building Ukrainian corpora. The resources include:
* BRUK (Ukrainian Brown Corpus, a projected one-million-word POS gold standard)
* VESUM (A Large Electronic Dictionary of Ukrainian, over 420,000 lemmas and counting, for POS tagging)
* USL (Ukrainian Semantic Lexicon for semantic tagging)

The tools come in the form of the NLP_UK suite for Ukrainian text tokenization, lemmatization, POS tagging, and cleaning. The application of NLP_UK to build multiple iterations of the GRAC corpus will be discussed.