Program

Workshop Program and Accepted Papers

WORKSHOP PROCEEDINGS CAN BE FOUND HERE.
UNLP 2025 on YouTube

Day 1: July 31

All times are in Central European Time (CET).

9:00-9:10 Opening Remarks

9:10-10:30 Morning Session: Downstream Tasks

Chair: Mariana Romanyshyn

9:10-9:25	Improving Named Entity Recognition for Low-Resource Languages Using Large Language Models: A Ukrainian Case Study	Vladyslav Radchenko and Nazarii Drushchak
9:25-9:45	A Framework for Large-Scale Parallel Corpus Evaluation: Ensemble Quality Estimation Models Versus Human Assessment	Dmytro Chaplynskyi and Kyrylo Zakharov
9:45-10:05	Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction (UNLP 2025 Best Paper)	Roman Kovalchuk, Mariana Romanyshyn and Petro Ivaniuk
10:05-10:25	Improving Sentiment Analysis for Ukrainian Social Media Code-Switching Data	Yurii Shynkarov, Veronika Solopova and Vera Schmitt

10:30-11:00 Morning Coffee Break

11:00-12:00 Morning Session: Towards a Ukrainian LLM

Chair: Oleksii Ignatenko

11:00-11:20

From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages

Artur Kiulian, Anton Polishko, Mykola Khandoga, Yevhen Kostiuk, Guillermo Gabrielli, Łukasz Gągała, Fadi Zaraket, Qusai Abu Obaida, Hrishikesh Garud, Wendy Wing Yee Mak, Dmytro Chaplynskyi, Selma Belhadj Amor and Grigol Peradze

11:20-11:40

Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains

Yurii Paniv, Artur Kiulian, Dmytro Chaplynskyi, Mykola Khandoga, Anton Polishko, Tetiana Bas and Guillermo Gabrielli

11:40-12:00

On the Path to Make Ukrainian a High-Resource Language

Mykola Haltiuk and Aleksander Smywiński-Pohl

12:00-13:00 Keynote: Sebastian Ruder. Multilinguality in Llama 4 and Beyond

13:00-14:15 Lunch

14:15-15:30 Afternoon Session: Linguistics and NLP

Chair: Mariana Romanyshyn

14:15-14:30	Developing a Universal Dependencies Treebank for Ukrainian Parliamentary Speech	Maria Shvedova, Arsenii Lukashevskyi and Andriy Rysin
14:30-14:50	Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation	Roman Kyslyi, Yuliia Maksymiuk and Ihor Pysmennyi
14:50-15:10	Context-Aware Lexical Stress Prediction and Phonemization for Ukrainian TTS Systems	Anastasiia Senyk, Mykhailo Lukianchuk, Valentyna Robeiko and Yurii Paniv
15:10-15:30	Precision vs. Perturbation: Robustness Analysis of Synonym Attacks in Ukrainian NLP	Volodymyr Mudryi and Oleksii Ignatenko

15:30-16:00 Afternoon Coffee Break

16:00-17:00 Keynote: Illia Strelnykov. Leveraging User Feedback to Improve Your Models

17:00-17:50 Afternoon Session: Responsible AI

Chair: Oleksii Ignatenko

17:00-17:15

UAlign: LLM Alignment Benchmark for the Ukrainian Language

Andrian Kravchenko, Yurii Paniv and Nazarii Drushchak

17:15-17:30

GBEM-UA: Gender Bias Evaluation and Mitigation for Ukrainian Large Language Models

Mykhailo Buleshnyi, Maksym Buleshnyi, Marta Sumyk and Nazarii Drushchak

17:30-17:50

Gender Swapping as a Data Augmentation Technique: Developing Gender-Balanced Datasets for Ukrainian Language Processing

Olha Nahurna and Mariana Romanyshyn

17:50-18:00 Closing Words

Day 2: August 1

9:00-10:30 Morning Session: Shared Task

Chair: Roman Kyslyi

9:00-9:15	The UNLP 2025 Shared Task on Detecting Social Media Manipulation	Roman Kyslyi, Nataliia Romanyshyn and Volodymyr Sydorskyi
9:15-9:30	Detecting Manipulation in Ukrainian Telegram: A Transformer-Based Approach to Technique Classification and Span Identification	Md. Abdur Rahman and Md Ashiqur Rahman
9:30-9:45	Hidden Persuasion: Detecting Manipulative Narratives on Social Media During the 2022 Russian Invasion of Ukraine	Kateryna Akhynko, Oleksandr Kosovan and Mykola Trokhymovych
9:45-10:00	Comparing Methods for Multi-Label Classification of Manipulation Techniques in Ukrainian Telegram Content	Oleh Melnychuk
10:00-10:15	Framing the language: Fine-Tuning Gemma 3 for Manipulation Detection	Mykola Khandoga, Yevhen Kostiuk, Anton Polishko, Kostiantyn Kozlov, Yurii Filipchuk and Artur Kiulian
10:15-10:30	Transforming Causal LLM into MLM Encoder for Detecting Social Media Manipulation in Telegram	Anton Bazdyrev, Ivan Bashtovyi, Ivan Havlytskyi, Oleksandr Kharytonov and Artur Khodakovskyi

10:30-11:00 Morning Coffee Break

11:00-13:00 Panel Discussion: Disinformation Detection from a Business Perspective

Panelists: Kateryna Burovova, Nataliia Romanyshyn, Yaroslav Peliushenko, Yuliia Dukach

Chair: Roman Kyslyi

Keynote Speakers

Illia Strelnykov, Data Scientist at YouScan, Ukraine

Topic: Leveraging User Feedback to Improve Your Models

While academic research provides a strong foundation for model development, the ultimate goal is to deploy these models in real-world applications, where they interact with actual users. This talk addresses the critical challenge of effectively leveraging user feedback to enhance model performance in practical scenarios. We’ll explore ways to incorporate the highly valuable — yet inherently noisy — user-provided data into model training and fine-tuning pipelines. First, we’ll cover methods for collecting user feedback and the challenges involved in processing it, including issues like bias and conflicting information. Then we will examine various solutions for tackling these challenges and how to use refined feedback for model improvement.

Sebastian Ruder, Research Scientist at Meta, Germany

Topic: Multilinguality in Llama 4 and Beyond

Abstract: Multilingual LLMs have become so powerful that they can be used in real-world conversations in a variety of applications. While this presents many opportunities, it also poses challenges associated with the complexity of natural language. In this talk, I will seek to connect academic research to real-world challenges of multilingual conversational AI. I will first provide an overview of multilinguality in Llama 4, highlighting the importance of evaluation. I will then discuss what it takes to bridge the gap between academic and real-world evaluations. Finally, I will discuss how we can develop models that are useful to speakers in their local context, across the globe and for the Ukrainian language.

Panelists

Kateryna Burovova, ML Engineer at LetsData

Kateryna specializes in AI-powered solutions for detecting and combating harmful information operations, leveraging NLP and computational social science to create threat detection pipelines that analyze content semantics, user behavior patterns, network dynamics, and other contextual signals.

Nataliia Romanyshyn, AI Specialist at Texty.org.ua

Nataliia focuses on the detection and analysis of Russian disinformation. Her expertise includes natural language processing, specifically topic modeling, named entity recognition, large language models, and multilingual NLP. She plays a key role in developing analytical frameworks that transform complex textual data into actionable insights aimed at uncovering disinformation mechanisms.

Yaroslav Peliushenko, Head of Analytics at Osavul

Yaroslav is the Head of Analysis at Osavul, a technology company developing AI-powered solutions for deep intelligence and countering information threats. At UNLP, he will share insights into how their team analyses and structures information, the frameworks they use, and the thinking behind their approach.

Yuliia Dukach, PhD, Data Journalist and Head of Disinformation Investigations at OpenMinds

Yuliia is an expert in disinformation research with over five years of experience in investigative data journalism. Yuliia applies advanced skills in Python and machine learning to analyze computational propaganda and online misinformation.

Vasyl Starko, Ukrainian Catholic University, Ukraine

Andriy Rysin, Independent researcher, USA

Topic: BRUK Team’s Resources for Ukrainian Corpus Creation

The talk will focus on the key resources and tools developed by the BRUK team for the automatic processing of Ukrainian texts, especially for building Ukrainian corpora. The resources include:
* BRUK (Ukrainian Brown Corpus, a projected one-million-word POS gold standard)
* VESUM (A Large Electronic Dictionary of Ukrainian, over 420,000 lemmas and counting, for POS tagging)
* USL (Ukrainian Semantic Lexicon for semantic tagging).
The tools come in the form of the NLP_UK suite for Ukrainian text tokenization, lemmatization, POS tagging, and cleaning. The application of NLP_UK to build multiple iterations of the GRAC corpus will be discussed.