Exploring neural paraphrasing to improve fluency of rule-based generation

Permanent link

Description

  • Abstract

    Data-to-text generation is an important task in natural language processing. FORGe, a representative rule-based generator, performs well at mapping RDF triples to text. Because the generator relies heavily on rules, its output, although semantically accurate and faithful to the input RDF triples, can be somewhat rigid in terms of fluency. This thesis explores a way to improve the fluency of the text generated by FORGe: a neural paraphrasing method applied as a post-processing step. By setting a parameter, this method controls the tradeoff between two models, one for fluency and semantic similarity and one for lexical and/or syntactic diversity. In this way it keeps the output semantically consistent with the input while also diversifying the lexical and syntactic choices in the sentence. To verify this idea, we designed and conducted experiments, taking the performance of OSU Neural NLG, a deep-learning-based generator that also performs well on English D2T tasks, as a baseline. Since all generated texts must be evaluated, an automatic evaluation method is used to ensure a uniform criterion of semantic accuracy across the outputs. Based on its results, we also conducted manual verification to make the evaluation more reliable. Our experimental results suggest that applying this neural paraphrasing method as a post-processing stage is promising for improving the fluency of the text generated by FORGe.
  • Description

    Tutors: Leo Wanner and Simon Mille
    Master's thesis for: Master in Intelligent Interactive Systems
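The tradeoff described in the abstract, interpolating a fluency/semantic-similarity score and a lexical/syntactic diversity score via a single parameter, can be sketched as follows. This is an illustrative sketch only: the thesis does not specify its scoring code here, so the parameter name `lam` and the toy scorers are hypothetical placeholders standing in for the two neural models.

```python
def rank_paraphrases(candidates, fluency_score, diversity_score, lam=0.5):
    """Rank candidates by (1 - lam) * fluency + lam * diversity.

    lam = 0.0 keeps the output closest to the source sentence;
    lam = 1.0 favours maximal lexical/syntactic change.
    """
    scored = [
        ((1 - lam) * fluency_score(c) + lam * diversity_score(c), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]

# Toy scorers standing in for the neural models (hypothetical heuristics).
def toy_fluency(s):
    # Prefer sentences near a "comfortable" length of ~8 words.
    return 1.0 / (1 + abs(len(s.split()) - 8))

def toy_diversity(s):
    # Type/token ratio as a crude proxy for lexical variety.
    words = s.split()
    return len(set(words)) / len(words)

cands = [
    "the city is located in the north of the country",
    "the city lies in the country's north",
]
ranked = rank_paraphrases(cands, toy_fluency, toy_diversity, lam=0.3)
```

Raising `lam` would push the ranking toward candidates that diverge more from the source wording, at the possible cost of fluency or semantic fidelity.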