How to handle health-related small imbalanced data in machine learning?
- dc.contributor.author Rauschenberger, Maria
- dc.contributor.author Baeza Yates, Ricardo
- dc.date.accessioned 2023-03-24T07:24:44Z
- dc.date.available 2023-03-24T07:24:44Z
- dc.date.issued 2020
- dc.description.abstract When discussing interpretable machine learning results, researchers need to compare them and check for reliability, especially for health-related data. The reason is the negative impact of wrong results on a person, such as in wrong prediction of cancer, incorrect assessment of the COVID-19 pandemic situation, or missing early screening of dyslexia. Often only small data exists for these complex interdisciplinary research projects. Hence, it is essential that this type of research understands different methodologies and mindsets such as the Design Science Methodology, Human-Centered Design or Data Science approaches to ensure interpretable and reliable results. Therefore, we present various recommendations and design considerations for experiments that help to avoid over-fitting and biased interpretation of results when having small imbalanced data related to health. We also present two very different use cases: early screening of dyslexia and event prediction in multiple sclerosis.
- dc.format.mimetype application/pdf
- dc.identifier.citation Rauschenberger M, Baeza-Yates R. How to handle health-related small imbalanced data in machine learning? i-com. 2020;19(3):215-26. DOI: 10.1515/icom-2020-0018
- dc.identifier.doi http://dx.doi.org/10.1515/icom-2020-0018
- dc.identifier.issn 1618-162X
- dc.identifier.uri http://hdl.handle.net/10230/56343
- dc.language.iso eng
- dc.publisher De Gruyter
- dc.relation.ispartof i-com. 2020;19(3):215-26.
- dc.relation.isreferencedby https://github.com/Rauschii/smalldataguidelines
- dc.rights © De Gruyter. Published version available at https://www.degruyter.com/document/doi/10.1515/icom-2020-0018/html (http://dx.doi.org/10.1515/icom-2020-0018)
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.subject.keyword Machine Learning
- dc.subject.keyword Human-Centered Design
- dc.subject.keyword HCD
- dc.subject.keyword interactive systems
- dc.subject.keyword health
- dc.subject.keyword small data
- dc.subject.keyword imbalanced data
- dc.subject.keyword over-fitting
- dc.subject.keyword variances
- dc.subject.keyword interpretable results
- dc.subject.keyword guidelines
- dc.title How to handle health-related small imbalanced data in machine learning?
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion