Analysis of inter-transcriber consistency in the Cat_ToBI prosodic labeling system

A set of tools for analyzing the inconsistencies observed in a Cat_ToBI labeling experiment is presented. We formalize the metrics commonly used in inter-transcriber consistency tests and apply them systematically to assess the robustness of every symbol and the agreement between every pair of transcribers. The resulting agreement rates are comparable to those reported in previous ToBI inter-transcriber reliability tests. The inter-transcriber confusion rates are then transformed into distance matrices, so that multidimensional scaling can be used to visualize both the confusion between the different ToBI symbols and the disagreement between the raters. Potentially different labeling criteria are identified, and subsets of symbols that are candidates for merging are proposed.
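
To make the pipeline summarized above more concrete, the following minimal Python sketch computes a pairwise agreement rate between two transcribers and turns inter-transcriber confusion counts into a symmetric symbol-by-symbol distance matrix. It is an illustration under stated assumptions, not the implementation used in this study: the function names, the toy annotations, and the particular confusion-to-distance transform (one minus a Jaccard-style confusion rate) are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(labels_a, labels_b):
    """Fraction of items to which two transcribers assigned the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def confusion_distance(annotations, symbols):
    """Symbol-by-symbol distance matrix derived from confusions.

    annotations: dict mapping transcriber id -> list of labels, all lists
                 aligned on the same items.
    symbols:     ordered list of symbols defining the matrix rows/columns.

    ASSUMPTION: distance = 1 - (comparisons confusing s with t) /
    (comparisons involving s or t). The source does not specify its transform.
    """
    both = Counter()      # comparisons in which the unordered pair {s, t} occurred
    involved = Counter()  # comparisons in which a symbol occurred at least once
    for t1, t2 in combinations(sorted(annotations), 2):
        for a, b in zip(annotations[t1], annotations[t2]):
            both[frozenset((a, b))] += 1
            for s in {a, b}:
                involved[s] += 1

    matrix = []
    for s in symbols:
        row = []
        for t in symbols:
            if s == t:
                row.append(0.0)
                continue
            confused = both[frozenset((s, t))]
            union = involved[s] + involved[t] - confused
            # Symbols that never co-occur are placed at the maximum distance.
            row.append(1.0 - confused / union if union else 1.0)
        matrix.append(row)
    return matrix


# Toy data, invented for illustration only (three transcribers, four items).
toy_annotations = {
    "T1": ["L+H*", "H*", "L*", "L+H*"],
    "T2": ["L+H*", "L+H*", "L*", "H*"],
    "T3": ["H*", "H*", "L*", "L+H*"],
}
toy_symbols = ["L+H*", "H*", "L*"]

agreement_t1_t2 = pairwise_agreement(toy_annotations["T1"], toy_annotations["T2"])
toy_distances = confusion_distance(toy_annotations, toy_symbols)
```

A matrix of this kind could then be passed to any multidimensional scaling routine that accepts precomputed dissimilarities (for instance, scikit-learn's `MDS` with `dissimilarity="precomputed"`) to obtain a two-dimensional map in which frequently confused symbols lie close together. The same construction, applied to one minus the pairwise agreement rates, would give a distance matrix between transcribers.
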
The authors are indebted to other researchers of the Grup d'Estudis de Prosòdia GrEP (Departament de Traducció i Ciències del Llenguatge, Universitat Pompeu Fabra) who contributed constructively to the discussion of this work while it was underway. Particular thanks are due to the subjects who performed the annotations of Catalan utterances (J. Borràs-Comes, V. Crespo-Sendra, R. Sichel-Bazin, E. Estebas-Vilaplana and the postgraduate students CA, CR, EP and GV) for their valuable comments and information.

This research has been funded by six research grants awarded by the Spanish Ministerio de Ciencia e Innovación, namely the Glissando project FFI2008-04982-C003-02, FFI2008-04982-C003-03, FFI2011-29559-C02-01, FFI2011-29559-C02-02, FFI2009-07648/FILO and CONSOLIDER-INGENIO 2010 Programme CSD2007-00012, and by a grant awarded by the Generalitat de Catalunya to the Grup d'Estudis de Prosòdia (2009SGR-701).
