Norm, use and interference: linguistic biases in Catalan language models

DOI:

https://doi.org/10.21814/lm.18.1.497

Keywords:

language models, biases, use, interference, Catalan

Abstract

Large language models increasingly influence written communication, which poses challenges for minority languages such as Catalan. This study quantifies the linguistic biases of six Catalan language models by analyzing their preferences for normative versus non-normative grammatical constructions, particularly in cases susceptible to interference from Spanish. Using a corpus of minimal pairs, we evaluate both monolingual and multilingual models by comparing their preference for the normative and the non-normative variant of each pair. The results show no difference between monolingual and multilingual models in their preference for normative constructions. However, potential interference from Spanish markedly reduces the preference for normative forms across all models analyzed. These findings suggest that the models' biases reflect the prevalence of non-normative usage in their training data, due to the influence of Spanish. This underscores the importance of evaluating these technologies to inform language policy and to understand their impact on language evolution.
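
As an illustration of the minimal-pair comparison mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that a model's "preference" can be summarized by a sentence-level score (for example, a log-probability) and that a pair counts as normative-preferring when the normative variant scores higher. The names `MinimalPair`, `normative_preference_rate`, `rates_by_interference`, and `toy_score` are hypothetical, the sentence strings are placeholders, and the scores are invented for demonstration only; they are not results from the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class MinimalPair:
    """One normative / non-normative sentence pair (hypothetical structure)."""
    normative: str
    non_normative: str
    interference: bool  # True if Spanish interference is possible for this pair


def normative_preference_rate(
    pairs: Iterable[MinimalPair], score: Callable[[str], float]
) -> float:
    """Fraction of pairs where the normative variant gets the higher score."""
    pair_list: List[MinimalPair] = list(pairs)
    if not pair_list:
        raise ValueError("at least one minimal pair is required")
    wins = sum(
        1 for p in pair_list if score(p.normative) > score(p.non_normative)
    )
    return wins / len(pair_list)


def rates_by_interference(
    pairs: Iterable[MinimalPair], score: Callable[[str], float]
) -> Dict[bool, float]:
    """Preference rate computed separately for pairs with and without interference."""
    groups: Dict[bool, List[MinimalPair]] = {True: [], False: []}
    for p in pairs:
        groups[p.interference].append(p)
    return {
        flag: normative_preference_rate(group, score)
        for flag, group in groups.items()
        if group
    }


# Invented scores standing in for a real model's sentence log-probabilities.
TOY_LOGPROBS = {
    "norm-1": -10.0, "alt-1": -12.0,
    "norm-2": -11.0, "alt-2": -9.5,
    "norm-3": -8.0, "alt-3": -8.5,
    "norm-4": -12.0, "alt-4": -13.0,
}


def toy_score(sentence: str) -> float:
    """Placeholder scorer: looks up an invented log-probability."""
    return TOY_LOGPROBS[sentence]


# Placeholder pairs; a real corpus would contain Catalan sentences.
PAIRS = [
    MinimalPair("norm-1", "alt-1", interference=False),
    MinimalPair("norm-2", "alt-2", interference=True),
    MinimalPair("norm-3", "alt-3", interference=False),
    MinimalPair("norm-4", "alt-4", interference=True),
]

if __name__ == "__main__":
    print(normative_preference_rate(PAIRS, toy_score))
    print(rates_by_interference(PAIRS, toy_score))
```

In practice `toy_score` would be replaced by a function that queries an actual language model; how the study scores each variant is not specified in the abstract, so that part is left as an assumption.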

Published

2026-01-30

Section

Research Articles

How to Cite

Norm, use and interference: linguistic biases in Catalan language models. (2026). Linguamática, 18(1), preprint. https://doi.org/10.21814/lm.18.1.497