Welcome to the UPF Digital Repository

How to handle health-related small imbalanced data in machine learning?

dc.contributor.author Rauschenberger, Maria
dc.contributor.author Baeza-Yates, Ricardo
dc.date.accessioned 2023-03-24T07:24:44Z
dc.date.available 2023-03-24T07:24:44Z
dc.date.issued 2020
dc.identifier.citation Rauschenberger M, Baeza-Yates R. How to handle health-related small imbalanced data in machine learning? i-com. 2020;19(3):215-26. DOI: 10.1515/icom-2020-0018
dc.identifier.issn 1618-162X
dc.identifier.uri http://hdl.handle.net/10230/56343
dc.description.abstract When discussing interpretable machine learning results, researchers need to compare them and check for reliability, especially for health-related data. The reason is the negative impact of wrong results on a person, such as in wrong prediction of cancer, incorrect assessment of the COVID-19 pandemic situation, or missing early screening of dyslexia. Often only small data exists for these complex interdisciplinary research projects. Hence, it is essential that this type of research understands different methodologies and mindsets such as the Design Science Methodology, Human-Centered Design or Data Science approaches to ensure interpretable and reliable results. Therefore, we present various recommendations and design considerations for experiments that help to avoid over-fitting and biased interpretation of results when having small imbalanced data related to health. We also present two very different use cases: early screening of dyslexia and event prediction in multiple sclerosis.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher De Gruyter
dc.relation.ispartof i-com. 2020;19(3):215-26.
dc.relation.isreferencedby https://github.com/Rauschii/smalldataguidelines
dc.rights © De Gruyter. Published version available at https://www.degruyter.com/document/doi/10.1515/icom-2020-0018/html (http://dx.doi.org/10.1515/icom-2020-0018)
dc.title How to handle health-related small imbalanced data in machine learning?
dc.type info:eu-repo/semantics/article
dc.identifier.doi http://dx.doi.org/10.1515/icom-2020-0018
dc.subject.keyword Machine Learning
dc.subject.keyword Human-Centered Design
dc.subject.keyword HCD
dc.subject.keyword interactive systems
dc.subject.keyword health
dc.subject.keyword small data
dc.subject.keyword imbalanced data
dc.subject.keyword over-fitting
dc.subject.keyword variances
dc.subject.keyword interpretable results
dc.subject.keyword guidelines
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/publishedVersion
