Vector Space Semantic Models (VSMs) have gained attention over recent years in a wide variety of computational language-modelling tasks. Some of the most popular approaches to computational semantic modelling use training methods based on neural-network language modelling to obtain dense vector representations, commonly known as neural embeddings or word embeddings. These neural models have been shown to capture what Turney (2006) calls attributional similarities as well as relational similarities between words. The goal of this master's thesis is to explore the extent and the limitations of word embeddings with regard to their capacity to encode the complex coherence relations that Discourse Markers (DMs) signal along a given text. To that end, we have built different vector spaces of DMs using log-linear models (CBOW and Skip-gram). The resulting DM representations have been evaluated by means of data mining techniques such as clustering and supervised classification. The results obtained in this research show that only those DMs in which the lexical effect is stronger can be represented efficiently by word embeddings. Likewise, comparing both data mining techniques (clustering and supervised classification), we conclude that the relations among similar DMs can be induced better with supervised methods previously trained on the given data.
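As a rough illustration of the pipeline described above (not the thesis code), a minimal sketch using gensim's Word2Vec and scikit-learn's KMeans might look as follows; the corpus, the list of markers, and all hyperparameters are placeholders chosen only for the example.

```python
# Minimal sketch: train CBOW/Skip-gram embeddings on a toy tokenised corpus,
# then cluster the vectors of a few Discourse Markers (unsupervised evaluation).
# The corpus and marker list are illustrative, not taken from the thesis data.
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

sentences = [
    ["however", "the", "results", "were", "inconclusive"],
    ["therefore", "we", "repeated", "the", "experiment"],
    ["moreover", "the", "second", "run", "confirmed", "the", "trend"],
    ["nevertheless", "some", "doubts", "remained"],
]
discourse_markers = ["however", "therefore", "moreover", "nevertheless"]

# sg=1 selects the Skip-gram log-linear model; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1)

# Collect the embedding of each DM present in the vocabulary and cluster them.
dms = [dm for dm in discourse_markers if dm in model.wv]
dm_vectors = [model.wv[dm] for dm in dms]
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(dm_vectors)
print(dict(zip(dms, labels)))
```

The supervised counterpart of this evaluation would replace the clustering step with a classifier trained on labelled DM relations, which is the comparison drawn in the abstract.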