Graph-structured data is ubiquitous in the field of Natural Language Processing.
For instance, directed acyclic graphs are used in semantic and syntactic dependency
representations. Consequently, several NLP applications, such as sequence labeling,
neural machine translation, and relation extraction, make use of graph-structured data.
Most approaches first linearize graphs and then apply off-the-shelf algorithms, a step
that discards important information about node connectivity. The state of the art thus
falls short of offering graph-to-graph transduction models. The motivation of
this thesis is to expand the limited literature on tree-to-tree learning and to provide
an instrument capable of treebank transformations, which could enrich the corpora
available in NLP. The starting point is previous work on Gated Graph Neural Networks
[1], which we modified to output sequences per node rather than per graph. We also
modified the general architecture to use two GGNNs, one responsible for predicting
heads and the other for predicting edge types.
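To make this concrete, a minimal sketch of such a two-GGNN transducer in PyTorch might look as follows. This is an illustration under our own assumptions rather than the thesis's implementation; the class names, the dot-product head scorer, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class GGNN(nn.Module):
    """Minimal Gated Graph Neural Network layer (after Li et al., 2016):
    nodes aggregate messages from their neighbours along the adjacency
    matrix and update their states with a shared GRU cell."""

    def __init__(self, hidden_dim: int, num_steps: int):
        super().__init__()
        self.num_steps = num_steps
        self.message = nn.Linear(hidden_dim, hidden_dim)
        self.update = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, node_states, adjacency):
        # node_states: (num_nodes, hidden_dim); adjacency: (num_nodes, num_nodes)
        for _ in range(self.num_steps):
            messages = adjacency @ self.message(node_states)
            node_states = self.update(messages, node_states)
        return node_states

class TreeToTreeTransducer(nn.Module):
    """Hypothetical two-GGNN setup: one GGNN scores candidate heads
    for every node, the other predicts each node's edge type."""

    def __init__(self, hidden_dim: int, num_steps: int, num_edge_types: int):
        super().__init__()
        self.head_ggnn = GGNN(hidden_dim, num_steps)
        self.type_ggnn = GGNN(hidden_dim, num_steps)
        self.type_scorer = nn.Linear(hidden_dim, num_edge_types)

    def forward(self, node_states, adjacency):
        # Head prediction: score every (dependent, candidate-head) pair.
        h = self.head_ggnn(node_states, adjacency)
        head_scores = h @ h.t()            # (num_nodes, num_nodes)
        # Edge-type prediction: one label distribution per node.
        t = self.type_ggnn(node_states, adjacency)
        type_scores = self.type_scorer(t)  # (num_nodes, num_edge_types)
        return head_scores, type_scores
```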
For testing, we used the Stanford dependency treebank and the Matsumoto dependency
treebank. These treebanks differ substantially, especially in the granularity of
their dependency tagsets. The proposed model achieved over 95% Labeled Attachment
Score (LAS) when converting from one treebank to the other. Compared to a baseline
that ignores graph structure, it achieved an average improvement of 16.42% in LAS,
highlighting the value of incorporating graph-structured data.
We also showed that feeding the network each node's position within the sentence
yielded a 2.32% LAS improvement; including sequential data thus proved beneficial.
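One plausible way to realize this positional feature, sketched here purely as an assumption (the embedding size, projection, and `max_len` are invented for illustration), is to concatenate a learned position embedding onto each node's initial state before message passing:

```python
class PositionalNodeEncoder(nn.Module):
    """Hypothetical sketch: augment each node's initial state with a
    learned embedding of its position in the sentence."""

    def __init__(self, feature_dim: int, pos_dim: int, max_len: int = 512):
        super().__init__()
        self.pos_embedding = nn.Embedding(max_len, pos_dim)
        # Project back to the GGNN hidden size after concatenation.
        self.project = nn.Linear(feature_dim + pos_dim, feature_dim)

    def forward(self, node_features):
        # node_features: (num_nodes, feature_dim), one node per token,
        # ordered by sentence position.
        positions = torch.arange(node_features.size(0),
                                 device=node_features.device)
        augmented = torch.cat([node_features,
                               self.pos_embedding(positions)], dim=-1)
        return self.project(augmented)
```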
We concluded that GGNNs are capable of tree-to-tree transduction and that this
research is a step toward bringing attention to graph-to-graph transduction in NLP.