How to handle health-related small imbalanced data in machine learning?

dc.contributor.author: Rauschenberger, Maria
dc.contributor.author: Baeza Yates, Ricardo
dc.date.accessioned: 2023-03-24T07:24:44Z
dc.date.available: 2023-03-24T07:24:44Z
dc.date.issued: 2020
dc.description.abstract: When discussing interpretable machine learning results, researchers need to compare them and check for reliability, especially for health-related data. The reason is the negative impact of wrong results on a person, such as in wrong prediction of cancer, incorrect assessment of the COVID-19 pandemic situation, or missing early screening of dyslexia. Often only small data exists for these complex interdisciplinary research projects. Hence, it is essential that this type of research understands different methodologies and mindsets such as the Design Science Methodology, Human-Centered Design or Data Science approaches to ensure interpretable and reliable results. Therefore, we present various recommendations and design considerations for experiments that help to avoid over-fitting and biased interpretation of results when having small imbalanced data related to health. We also present two very different use cases: early screening of dyslexia and event prediction in multiple sclerosis.
dc.format.mimetype: application/pdf
dc.identifier.citation: Rauschenberger M, Baeza-Yates R. How to handle health-related small imbalanced data in machine learning? i-com. 2020;19(3):215-26. DOI: 10.1515/icom-2020-0018
dc.identifier.doi: http://dx.doi.org/10.1515/icom-2020-0018
dc.identifier.issn: 1618-162X
dc.identifier.uri: http://hdl.handle.net/10230/56343
dc.language.iso: eng
dc.publisher: De Gruyter
dc.relation.ispartof: i-com. 2020;19(3):215-26.
dc.relation.isreferencedby: https://github.com/Rauschii/smalldataguidelines
dc.rights: © De Gruyter Published version available at https://www.degruyter.com/document/doi/10.1515/icom-2020-0018/html http://dx.doi.org/10.1515/icom-2020-0018
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.subject.keyword: Machine Learning
dc.subject.keyword: Human-Centered Design
dc.subject.keyword: HCD
dc.subject.keyword: interactive systems
dc.subject.keyword: health
dc.subject.keyword: small data
dc.subject.keyword: imbalanced data
dc.subject.keyword: over-fitting
dc.subject.keyword: variances
dc.subject.keyword: interpretable results
dc.subject.keyword: guidelines
dc.title: How to handle health-related small imbalanced data in machine learning?
dc.type: info:eu-repo/semantics/article
dc.type.version: info:eu-repo/semantics/publishedVersion

Files

Original bundle

Name: Baeza-Yates_Ico_Hand.pdf
Size: 570.78 KB
Format: Adobe Portable Document Format