18-05-2026 Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study. Prof. Leila Kosseim

07.04.2026

18-05-2026, 15:30h. Room 55.309

Title:
Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study

Abstract:

Despite the widespread adoption of Large Language Models (LLMs), their strongest capabilities remain largely confined to a small number of high-resource languages with abundant training data. Recently, Continual Pre-Training (CPT) has emerged as a promising method to bridge this gap by fine-tuning models for low-resource regional dialects.
In this talk, we will present our research on using CPT for dialect learning under tight data and computational budgets. We will detail our methodology for adapting three LLMs to the Québec French dialect using a very small dataset, combining compute-efficient continual pre-training with Low-Rank Adaptation (LoRA) so that fewer than 1% of the model parameters are updated.
Furthermore, we will share our benchmarking results on the COLE suite, which show clear improvements on minority dialect benchmarks with minimal regression on prestige language benchmarks. We will then present our analysis, which shows that these gains are highly contingent on corpus composition. We will conclude by discussing how combining CPT with Parameter-Efficient Fine-Tuning (PEFT) provides a sustainable path for language resource creation, and by announcing the release of the first Québec French LLMs on HuggingFace.

 

Bio:

Leila Kosseim is a Professor of Computer Science at Concordia University in Montreal, Canada, specializing in Natural Language Processing. She received her PhD from Université de Montréal in 1995, with a dissertation on Natural Language Generation. She joined Concordia University in 2001, where she co-founded the Computational Linguistics at Concordia (CLaC) Laboratory and has since supervised and graduated 10 PhD students and more than 20 Master’s students. Her research spans discourse analysis, emotion detection, subjectivity analysis, and natural language processing for low-resource languages. She also served as Vice-President (2017–2019), President (2019–2021), and Past-President (2021–2023) of the Canadian Artificial Intelligence Association.