Solo Queue at ASSIN: Mix of Traditional and Emerging Approaches

  • Nathan Siegle Hartmann, Universidade de São Paulo

Abstract

In this paper we present a proposal for automatically labeling the similarity between a pair of sentences, together with the results obtained in the ASSIN 2016 sentence similarity shared task. Our proposal combines a classical bag-of-words feature, the TF-IDF model, with an emerging feature obtained from word embeddings. TF-IDF is used to relate texts that share words. Word embeddings are known to capture the syntax and semantics of a word. Following Mikolov et al. (2013), the sum of embedding vectors can model the meaning of a sentence. Using both features, we are able to capture the words shared between sentences and their semantics. We use linear regression to solve this problem, since the dataset is labeled with real numbers between 1 and 5. Our results are promising. Although using embeddings alone did not outperform our baseline system, combining them with TF-IDF achieved better results than using TF-IDF alone. Our system ranked first in the ASSIN 2016 sentence similarity shared task for Brazilian Portuguese sentences and second for European Portuguese sentences.
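The pipeline described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the author's implementation. Several details are assumptions that the abstract does not state: each feature is taken to be the cosine similarity between the two sentence vectors, the IDF uses a smoothed weighting, tokenization is a whitespace split, and all function names and the embedding lookup table are hypothetical.

```python
import math
from collections import Counter


def fit_idf(corpus):
    """Compute a smoothed IDF weight per word from a list of sentences."""
    docs = [set(s.lower().split()) for s in corpus]
    n = len(docs)
    df = Counter(w for d in docs for w in d)
    # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(sentence, idf):
    """Sparse TF-IDF vector (word -> weight); unseen words are skipped."""
    counts = Counter(sentence.lower().split())
    return {w: tf * idf[w] for w, tf in counts.items() if w in idf}


def cosine_sparse(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v[w] for w, x in u.items() if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def sentence_embedding(sentence, embeddings, dim):
    """Sum the embedding vectors of the words found in the lookup table."""
    total = [0.0] * dim
    for w in sentence.lower().split():
        for i, x in enumerate(embeddings.get(w, ())):
            total[i] += x
    return total


def cosine_dense(u, v):
    """Cosine similarity between two dense vectors stored as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pair_features(s1, s2, idf, embeddings, dim):
    """Two features per sentence pair: TF-IDF cosine and embedding cosine."""
    f_tfidf = cosine_sparse(tfidf(s1, idf), tfidf(s2, idf))
    f_emb = cosine_dense(sentence_embedding(s1, embeddings, dim),
                         sentence_embedding(s2, embeddings, dim))
    return [f_tfidf, f_emb]


def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (Gauss-Jordan).

    Returns [intercept, w_1, ..., w_k]. Assumes the features are not
    collinear, so the normal-equation matrix is invertible.
    """
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):
        # Partial pivoting for numerical stability.
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(weights, x):
    """Predicted similarity score for one feature vector."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
```

In use, `pair_features` would be computed for every training pair, `fit_linear_regression` fitted against the gold similarity scores (real numbers between 1 and 5), and `predict` applied to the features of unseen pairs. A real system would load pretrained word embeddings instead of a hand-built dictionary.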

Published
2016-12-31
How to Cite
Hartmann, N. S. (2016). Solo Queue at ASSIN: Mix of Traditional and Emerging Approaches. Linguamática, 8(2), 59-64. Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/v8n2-6
Section
Research Articles