Identificação automática de unidades de informação em testes de reconto de narrativas usando métodos de similaridade semântica: avaliação de métodos de similaridade semântica

Leandro dos Borges dos Santos; Sandra Maria Aluísio

doi:10.21814/lm.11.2.304

Automatic identification of information units in tests based on narrative retelling using semantic similarity methods: evaluating semantic similarity methods

Authors

Leandro dos Borges dos Santos, University of São Paulo
Sandra Maria Aluísio, University of São Paulo, https://orcid.org/0000-0001-5108-2630

DOI:

https://doi.org/10.21814/lm.11.2.304

Keywords:

neuropsychological tests, narrative retellings, semantic similarity methods

Abstract

Diagnoses of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) are based on an analysis of the patient's cognitive functions, obtained by administering cognitive and neuropsychological assessment batteries. Narrative retelling tests are commonly used to help identify and quantify the degree of dementia. In general, one point is awarded for each information unit recalled, and the final score is the number of units recalled. In this paper, we evaluate two clinical tasks: automatically identifying which elements of a retold narrative were recalled, and binary classification of the narrative produced by a patient, using the identified units as attributes, with the aim of automatically screening patients for cognitive impairment. We used two data sets of transcribed retellings in which the sentences were segmented and manually annotated with information units; both data sets have been made publicly available. The first comes from the Arizona Battery for Communication Disorders of Dementia (ABCD) and contains narratives of patients with MCI and of healthy controls; the second comes from the Bateria de Avaliação da Linguagem no Envelhecimento (BALE) and contains narratives of patients with AD and MCI as well as healthy controls. We evaluated two methods based on semantic similarity, referred to here as STS and Chunking, and transformed the multi-label problem of identifying elements of a retold narrative into a set of binary classification problems by finding a cutoff on the similarity value for each information unit. With this approach, we outperformed two baselines on both data sets in the SubsetAccuracy metric, which is the most punitive metric in the multi-label scenario. In the binary classification task, however, not all of the six machine learning methods evaluated outperformed the baseline methods. For ABCD, the best methods were Decision Trees and KNN; for BALE, SVM with an RBF kernel stood out.
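The per-unit cutoff idea and the SubsetAccuracy metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the cutoffs were chosen, so selecting the cutoff that maximizes F1 on training data is an assumption here, and the function names (`best_cutoff`, `predict_units`, `subset_accuracy`), the unit labels, and the similarity scores are all hypothetical.

```python
from typing import Dict, List, Set


def f1(gold: List[int], pred: List[int]) -> float:
    """F1 score for one binary problem (one information unit)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_cutoff(scores: List[float], gold: List[int]) -> float:
    """Pick, among the observed scores, the cutoff that maximizes F1.

    Assumption: F1 is the selection criterion; the abstract does not name one.
    """
    best_c, best_f = 1.0, -1.0
    for c in sorted(set(scores)):
        pred = [1 if s >= c else 0 for s in scores]
        score = f1(gold, pred)
        if score > best_f:
            best_c, best_f = c, score
    return best_c


def predict_units(sims: Dict[str, float], cutoffs: Dict[str, float]) -> Set[str]:
    """A unit counts as recalled when its similarity reaches its own cutoff."""
    return {unit for unit, s in sims.items() if s >= cutoffs[unit]}


def subset_accuracy(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of sentences whose predicted unit set matches the gold set exactly."""
    exact = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact / len(gold) if gold else 0.0


# Hypothetical similarity scores between each sentence and each information unit.
units = ["u1", "u2"]
train_sims = [
    {"u1": 0.82, "u2": 0.10},
    {"u1": 0.35, "u2": 0.71},
    {"u1": 0.77, "u2": 0.64},
    {"u1": 0.20, "u2": 0.15},
]
train_gold = [{"u1"}, {"u2"}, {"u1", "u2"}, set()]

# One binary problem per unit: learn a separate cutoff for each.
cutoffs = {
    u: best_cutoff(
        [row[u] for row in train_sims],
        [1 if u in g else 0 for g in train_gold],
    )
    for u in units
}

test_sims = [{"u1": 0.80, "u2": 0.60}, {"u1": 0.10, "u2": 0.90}]
test_gold = [{"u1", "u2"}, {"u2"}]
test_pred = [predict_units(row, cutoffs) for row in test_sims]
score = subset_accuracy(test_gold, test_pred)
print(cutoffs, test_pred, score)
```

In this toy run the first test sentence misses only one of its two units, yet it counts as entirely wrong, which is why subset accuracy is the strictest of the usual multi-label metrics.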

Downloads

PDF (Portuguese)

Published

2020-01-04

Issue

Vol. 11 No. 2

Section

Research Articles

License

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

How to Cite

Leandro dos Borges dos Santos, & Sandra Maria Aluísio. (2020). Automatic identification of information units in tests based on narrative retelling using semantic similarity methods: evaluating semantic similarity methods. Linguamática, 11(2), 47-63. https://doi.org/10.21814/lm.11.2.304
