Exploring the effectiveness of neural language models for identification and classification of lexical collocations

  • Radovan Milovic, Universidad de Santiago de Compostela
Keywords: lexical collocations, lexical functions, neural language models, fine-tuning

Abstract

Most research on automated collocation processing has relied on association measures. However, attention has gradually been shifting toward the effectiveness of neural language models (NLMs). In this paper, we investigate the latter by fine-tuning BERT-family models for English, Spanish, and Portuguese on lexical resources annotated with Lexical Functions (LFs). We examine the capabilities of language models for the identification and classification of lexical collocations in both monolingual and multilingual scenarios. Overall performance varied, with F1 scores ranging from 0.30 to 0.51. We conclude that the multilingual model excels in cross-lingual learning when trained on a combined training set of all three languages. Moreover, despite some variability, the results show that Lexical Functions with a larger number of instances in the training set are identified more accurately. Lastly, we conduct a qualitative analysis to investigate possible patterns of misidentification exhibited by the model.
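
To make the reported F1 figures concrete, the following is a minimal sketch of how a per-label and macro-averaged F1 score could be computed over predicted LF labels. It is an illustration only: the abstract does not state which averaging scheme was used, and the function names (`per_label_f1`, `macro_f1`), the label inventory (`Magn`, `Oper1`, `Real1`), the `O` tag for tokens outside any collocation, and the toy data are all assumptions rather than the paper's actual setup.

```python
from collections import Counter


def per_label_f1(gold, pred):
    """Return {label: F1} for every label seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-label F1 scores."""
    scores = per_label_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: LF names and the "O" (no collocation) tag are assumed.
gold = ["Magn", "Magn", "Oper1", "Real1", "O", "O"]
pred = ["Magn", "Oper1", "Oper1", "O", "O", "O"]

scores = per_label_f1(gold, pred)
for label in sorted(scores):
    print(f"{label}: {scores[label]:.2f}")
print(f"macro F1: {macro_f1(gold, pred):.2f}")
```

Under macro averaging, a label with few training instances (here the hypothetical `Real1`) weighs as much as a frequent one, so a handful of rare, poorly identified LFs can lower the overall score noticeably.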

Published
2024-06-27
How to Cite
Milovic, R. (2024). Exploring the effectiveness of neural language models for identification and classification of lexical collocations. Linguamática, 16(1), 17-28. https://doi.org/10.21814/lm.16.1.428
Section
New Perspectives