SLEEC assignment of descriptors to judgments of the Supreme Court of Justice of Portugal

Keywords: Descriptors, Legal Documents, Extreme Multi-label Classification, SLEEC

Abstract

Extreme Multi-label Classification (XML) is the task of assigning to each input the relevant subset of labels drawn from an extremely large label set. It is a fundamental problem in domains such as text categorization, recommendation systems, and image tagging. The task poses significant challenges for machine learning and information retrieval, particularly given the exponential growth of online data and the resulting need for algorithms that scale to large datasets with very many labels. Traditional classification methods are inadequate for this task because of the vast number of possible label combinations and the sparsity of label assignments. This paper reports the results of a project with the Supreme Court of Justice of Portugal ("Supremo Tribunal de Justiça Português") that addressed the problem using Sparse Local Embeddings for Extreme Multi-label Classification (SLEEC), an embedding-based approach that has shown promising results on legal datasets. Our goal was to automatically assign to each court judgment the descriptors that categorize it. This work tackled several challenges, including a large number of descriptors, an unbalanced dataset, numerous tail labels, and very long documents. Our experimental results show precision and recall values ranging between 0.57 and 0.68, indicating promising performance on this complex task.
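To make the setting concrete, the sketch below illustrates, under simplifying assumptions, the kind of prediction step SLEEC performs: a new judgment is placed in a low-dimensional embedding space, its k nearest training judgments vote for their descriptors, and the top-ranked descriptors are returned. It then scores the prediction with per-document precision and recall. Everything here is illustrative and not the authors' implementation: the 2-D embeddings and descriptor strings are hypothetical toy data, the helper names `knn_descriptors` and `precision_recall` are invented for this example, and the embedding learning itself (in SLEEC, neighbourhood-preserving embeddings of label vectors plus per-cluster regressors from document features) is omitted. Computing precision and recall over the top-ranked descriptors of a single document is also an assumption about how the reported metrics are defined.

```python
import math
from collections import Counter


def knn_descriptors(query, train_embeddings, train_labels, k=3, top=2):
    """Hypothetical helper: rank descriptors for `query` by a k-nearest-neighbour
    vote in an (assumed, already learned) embedding space."""
    # Order training documents by Euclidean distance to the query embedding.
    order = sorted(range(len(train_embeddings)),
                   key=lambda i: math.dist(query, train_embeddings[i]))
    votes = Counter()
    for i in order[:k]:
        # Each neighbour casts one vote for every descriptor it carries.
        votes.update(train_labels[i])
    # Most votes first; ties broken alphabetically for a deterministic ranking.
    ranked = sorted(votes, key=lambda d: (-votes[d], d))
    return ranked[:top]


def precision_recall(predicted, gold):
    """Per-document precision and recall of predicted vs. gold descriptors."""
    hits = len(set(predicted) & set(gold))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: 2-D embeddings of five judgments and their descriptors.
train_embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 4.9)]
train_labels = [
    {"despedimento", "contrato de trabalho"},
    {"despedimento"},
    {"contrato de trabalho", "indemnização"},
    {"seguro"},
    {"seguro", "responsabilidade civil"},
]
query = (0.05, 0.05)
gold = {"contrato de trabalho", "despedimento", "retribuição"}

ranked = knn_descriptors(query, train_embeddings, train_labels, k=3, top=2)
p, r = precision_recall(ranked, gold)
print(ranked)                  # ['contrato de trabalho', 'despedimento']
print(round(p, 2), round(r, 2))  # 1.0 0.67
```

The nearest-neighbour vote is what lets an embedding approach cope with a huge descriptor set: only descriptors that actually occur among the neighbours are ever scored, so rare (tail) descriptors can still be predicted whenever a similar judgment carries them.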

Published: 2025-06-30
How to Cite
Zanatti, M., Ribeiro, R., Pinto, H. S., & Borbinha, J. (2025). SLEEC assignment of descriptors to judgments of the Supreme Court of Justice of Portugal. Linguamática, 17(1), 35-51. https://doi.org/10.21814/lm.17.1.481
Section: Research Articles