Using annotated discourse information of a RST Spanish-Chinese treebank for translation and language learning tasks

Cao, Shuyuan

Permanent link

http://hdl.handle.net/10803/664419

Description

Abstract
As one of the essential elements of Natural Language Processing (NLP), discourse has attracted much attention in recent years. Many studies explore how discourse elements affect different NLP research areas, such as parsing, sentiment analysis, and machine translation evaluation, among others. In addition, as discourse analysis has developed, treebanks annotated with discourse information for different languages have made a great contribution to advancing NLP research. Spanish and Chinese are two of the most widely spoken languages in the world, and the language pair occupies an important position in NLP studies. Therefore, this study carries out a contrastive discourse analysis of the two languages by annotating discourse similarities and differences under the theoretical framework of Rhetorical Structure Theory (RST) by Mann and Thompson (1988). The main objective of the study is to develop, based on the annotation results, a protocol that includes recommendations for Spanish-Chinese translation. In addition, in the globalized context of today's society, communication between Spanish and Chinese speakers is increasingly intensive. Another aim of our study is therefore to develop some resources for Spanish-Chinese language learning. To develop the protocol, we first build a Spanish-Chinese parallel corpus and annotate the discourse information of the entire corpus. We then evaluate the annotation results following a qualitative method to ensure their high quality. Lastly, we summarize the discourse similarities and differences to draw up the protocol. For language learning between the two languages, we make full use of the manually annotated discourse markers (DMs) to develop a question-answering module. In recent years, there have been few contrastive studies of Spanish and Chinese in discourse analysis. Therefore, this PhD study aims to partially fill a knowledge gap in the study of Spanish and Chinese.
Doctoral programme in Translation and Language Sciences
Supervisors and department
da Cunha Fanego, Iria; Iruskieta, Mikel; Universitat Pompeu Fabra. Departament de Traducció i Ciències del Llenguatge
Collections
TDX
