Detecting Paraphrases for Portuguese using Word and Sentence Embeddings

  • Marlo Souza, Federal University of Bahia
  • Leandro Manuel Pereira Sanches, Federal University of Bahia
Keywords: Paraphrase Identification, Semantic Textual Similarity, Sentence Embeddings

Abstract

Paraphrase detection (or identification) is the task of determining whether two or more sentences of arbitrary length have the same meaning. Methods for solving this task have many potential applications in Natural Language Processing systems. This work investigates combining different methods of sentence representation in a vector space model of language with linear classifiers, applied to the problem of paraphrase identification for the Portuguese language. The results obtained are inferior to those reported for the related task of recognizing textual entailment in the ASSIN evaluation for Portuguese. We point out, however, that this work investigates only the application of sentence embeddings to paraphrase detection; other features usually explored in systems for this task could be readily incorporated into our method to improve performance.
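
The general idea of pairing sentence vectors with a linear classifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the setup used in the paper: the three-dimensional word vectors are invented, sentences are represented by averaging their word vectors, the only feature is the cosine similarity of the two sentence vectors, and a perceptron stands in for the linear classifier.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "gato": [1.0, 0.0, 0.0],
    "felino": [0.9, 0.1, 0.0],
    "dorme": [0.0, 1.0, 0.0],
    "descansa": [0.1, 0.9, 0.0],
    "carro": [0.0, 0.0, 1.0],
    "acelera": [0.1, 0.0, 0.9],
}
DIM = 3

# Toy labelled sentence pairs (1 = paraphrase, 0 = not a paraphrase).
TRAIN_PAIRS = [
    (["gato", "dorme"], ["felino", "descansa"]),
    (["gato", "dorme"], ["carro", "acelera"]),
    (["gato", "descansa"], ["felino", "dorme"]),
    (["carro", "acelera"], ["felino", "dorme"]),
]
TRAIN_LABELS = [1, 0, 1, 0]


def sentence_vector(tokens, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_features(s1, s2):
    """Feature vector for a sentence pair: a single cosine-similarity value."""
    return [cosine(sentence_vector(s1), sentence_vector(s2))]


def train_perceptron(pairs, labels, epochs=100):
    """Fit a linear decision rule with the classic perceptron update."""
    weights, bias = [0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for (s1, s2), label in zip(pairs, labels):
            x = pair_features(s1, s2)
            y = 1 if label else -1
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every training pair is classified correctly
            break
    return weights, bias


def is_paraphrase(s1, s2, weights, bias):
    """Predict paraphrase when the linear score is strictly positive."""
    x = pair_features(s1, s2)
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0
```

Calling `train_perceptron(TRAIN_PAIRS, TRAIN_LABELS)` returns a weight and a bias that can then be passed to `is_paraphrase`. In a realistic setting the toy dictionary would be replaced by pretrained Portuguese word or sentence embeddings, and additional features could be appended to the list returned by `pair_features`.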

Published
2019-01-24
How to Cite
Souza, M., & Sanches, L. M. P. (2019). Detecting Paraphrases for Portuguese using Word and Sentence Embeddings. Linguamática, 10(2), 31-44. https://doi.org/10.21814/lm.10.2.286
Section
POP - By Other Words