A Semi-supervised approach for gender identification
- dc.contributor.author Soler Company, Juan
- dc.contributor.author Wanner, Leo
- dc.date.accessioned 2017-09-06T14:03:58Z
- dc.date.available 2017-09-06T14:03:58Z
- dc.date.issued 2016
- dc.description Paper presented at: LREC 2016, Tenth International Conference on Language Resources and Evaluation, held 23-28 May 2016 in Portorož, Slovenia.
- dc.description.abstract In most of the research studies on Author Profiling, large quantities of correctly labeled data are used to train the models. However, this does not reflect the reality in forensic scenarios: in practical linguistic forensic investigations, the resources that are available to profile the author of a text are usually scarce. To pay tribute to this fact, we implemented a Semi-Supervised Learning variant of the k nearest neighbors algorithm that uses small sets of labeled data and a larger amount of unlabeled data to classify the authors of texts by gender (man vs woman). We describe the enriched KNN algorithm and show that the use of unlabeled instances improves the accuracy of our gender identification model. We also present a feature set that facilitates the use of a very small number of instances, reaching accuracies higher than 70% with only 113 instances to train the model. It is also shown that the algorithm performs equally well using publicly available data.
- dc.description.sponsorship The presentation of this work was partially supported by the ICT PhD program of Universitat Pompeu Fabra through a travel grant.
- dc.format.mimetype application/pdf
- dc.identifier.citation Soler-Company J, Wanner L. A Semi-supervised approach for gender identification. In: Calzolari N, Choukri K, Declerck T, Goggi S, Grobelnik M, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S. LREC 2016, Tenth International Conference on Language Resources and Evaluation; 2016 23-28 May; Portorož, Slovenia. [Portorož]: LREC, 2016. p. 1282-7.
- dc.identifier.uri http://hdl.handle.net/10230/32753
- dc.language.iso eng
- dc.publisher LREC
- dc.relation.ispartof Calzolari N, Choukri K, Declerck T, Goggi S, Grobelnik M, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S. LREC 2016, Tenth International Conference on Language Resources and Evaluation; 2016 23-28 May; Portorož, Slovenia. [Place unknown]: LREC, 2017. p. 1282-7.
- dc.rights © The European Language Resources Association. The LREC 2016 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri https://creativecommons.org/licenses/by-nc/4.0/
- dc.subject.keyword Author profiling
- dc.subject.keyword Gender identification
- dc.subject.keyword Semi supervised learning
- dc.subject.keyword Text classification
- dc.subject.keyword Machine learning
- dc.title A Semi-supervised approach for gender identification
- dc.type info:eu-repo/semantics/conferenceObject
- dc.type.version info:eu-repo/semantics/publishedVersion