Kuncoro, Adhiguna
Ballesteros, Miguel
Kong, Lingpeng
Dyer, Chris
Smith, Noah A.

2024-02-19
2024-02-19
2016

Kuncoro A, Ballesteros M, Kong L, Dyer C, Smith NA. Distilling an ensemble of greedy dependency parsers into one MST parser. In: Su J, Duh K, Carreras X, editors. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016 Nov 1-5; Austin, Texas. [Texas]: Association for Computational Linguistics; 2016. p. 1744-53. DOI: 10.18653/v1/d16-1180

http://hdl.handle.net/10230/59142

Paper presented at the 2016 Conference on Empirical Methods in Natural Language Processing, held November 1-5, 2016, in Austin, Texas.

We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a “distillation” of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German.

application/pdf

eng

ACL materials are Copyright © 1963–2023 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.

Bayesian statistics
Neural networks (Computer science)
Graph algorithms

Distilling an ensemble of greedy dependency parsers into one MST parser

info:eu-repo/semantics/conferenceObject

http://dx.doi.org/10.18653/v1/d16-1180

info:eu-repo/semantics/openAccess