A deeper exploration of the standard PB-SMT approach to text simplification and its evaluation

Štajner, Sanja; Béchara, Hannah; Saggion, Horacio

dc.contributor.author Štajner, Sanja
dc.contributor.author Béchara, Hannah
dc.contributor.author Saggion, Horacio
dc.date.accessioned 2018-11-06T18:04:16Z
dc.date.available 2018-11-06T18:04:16Z
dc.date.issued 2015
dc.description Paper presented at the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, held 26-31 July 2015 in Beijing, China.
dc.description.abstract In the last few years, there has been a growing number of studies addressing the Text Simplification (TS) task as a monolingual machine translation (MT) problem which translates from ‘original’ to ‘simple’ language. Motivated by those results, we investigate the influence of quality vs quantity of the training data on the effectiveness of such an MT approach to text simplification. We conduct 40 experiments on the aligned sentences from English Wikipedia and Simple English Wikipedia, controlling for: (1) the similarity between the original and simplified sentences in the training and development datasets, and (2) the sizes of those datasets. The results suggest that in the standard PB-SMT approach to text simplification the quality of the datasets has a greater impact on the system performance. Additionally, we point out several important differences between cross-lingual MT and monolingual MT used in text simplification, and show that BLEU is not a good measure of system performance in the text simplification task.
dc.description.sponsorship The research described in this paper was partially funded by the project SKATER-UPFTALN (TIN2012-38584-C06-03), Ministerio de Economía y Competitividad, Secretaría de Estado de Investigación, Desarrollo e Innovación, Spain, and the project ABLE-TO-INCLUDE (CIP-ICTPSP-2013-7/621055). Hannah Béchara is supported by the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007-2013/ under REA grant agreement no. 31747.
dc.format.mimetype application/pdf
dc.identifier.citation Štajner S, Béchara H, Saggion H. A deeper exploration of the standard PB-SMT approach to text simplification and its evaluation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers); 2015 Jul 26-31; Beijing, China. Stroudsburg: ACL; 2015. p. 823-8.
dc.identifier.isbn 978-194164373-0
dc.identifier.uri http://hdl.handle.net/10230/35710
dc.language.iso eng
dc.publisher ACL (Association for Computational Linguistics)
dc.relation.ispartof Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers); 2015 Jul 26-31; Beijing, China. Stroudsburg: ACL; 2015. p. 823-8.
dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/TIN2012-38584-C06-03
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.title A deeper exploration of the standard PB-SMT approach to text simplification and its evaluation
dc.type info:eu-repo/semantics/conferenceObject
dc.type.version info:eu-repo/semantics/publishedVersion

Collections

Conferences (Departament de Tecnologies de la Informació i les Comunicacions)