A Corpus of Spanish clinical records annotated for abbreviation identification

Citation

Aguado M, Bel N. A Corpus of Spanish clinical records annotated for abbreviation identification. Procesamiento del Lenguaje Natural. 2022;(68):99-109. DOI: 10.26342/2022-68-7

Description

Abstract
With the deployment of Electronic Health Records, much effort is being devoted to developing Natural Language Processing tools that convert the information in these clinical records into structured data that can be exploited. The main characteristic of clinical records is that they are free text. They are normally written under time pressure as memory notes and contain a high number of abbreviations, which are a problem for automatic processing. In this article we present the IULA Spanish Clinical Records Corpus, annotated for abbreviation identification.
DOI
http://dx.doi.org/10.26342/2022-68-7
Collections
Articles (Departament de Traducció i Ciències del Llenguatge)