Navigating severe class imbalance in population cohort data

Title: Navigating severe class imbalance in population cohort data
Authors: Fieggen, J; Segal, B; Walker, E; Thakur, A; Butler, C; Clifton, D; Clifton, L
Publisher Information: IEEE
Publication Year: 2025
Collection: Oxford University Research Archive (ORA)
Description: Class imbalance is a major challenge in predictive modelling for rare disease outcomes, particularly in large-scale population cohorts. Traditional machine learning models often struggle with imbalanced datasets, leading to biased performance metrics and poor generalisability. This study systematically evaluates multiple approaches to mitigating class imbalance when predicting multiple myeloma from proteomic and clinical data in UK Biobank. We compare standard classification models (XGBoost and logistic regression) with synthetic resampling (SMOTE), anomaly detection techniques (isolation forests, local outlier factors, one-class SVM, and autoencoders), and a transformer-based foundation model (TabPFN), using standard classification performance metrics. Our results indicate that anomaly detection models generalise better than conventional classifiers (XGBoost and logistic regression), while SMOTE fails to improve, and may actively worsen, predictive performance. To address the precision-sensitivity trade-off, we introduce a sequential XGBoost ensemble classifier (SeqXGB) that prioritises high precision over sensitivity to minimise false positive predictions. Compared with a single XGBoost model, the SeqXGB approach reduces false positives in held-out test data (from 420 to 9), but significantly limits sensitivity (from 0.70 to 0.15). Our findings highlight that no single method is universally optimal for addressing class imbalance; rather, model selection should be guided by clinical application, balancing the risks of false positives and false negatives.
Document Type: conference object
Language: English
Relation: https://doi.org/10.1109/EMBC58623.2025.11254293
DOI: 10.1109/EMBC58623.2025.11254293
Availability: https://doi.org/10.1109/EMBC58623.2025.11254293; https://ora.ox.ac.uk/objects/uuid:ee8cb8c4-3868-4772-b83e-62af7c340f83
Rights: info:eu-repo/semantics/openAccess ; CC Attribution (CC BY)
Accession Number: edsbas.714F1126
Database: BASE
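
Note on the precision-sensitivity trade-off in the Description: the following is a minimal Python sketch, not the authors' code, showing how the reported false-positive counts and sensitivities translate into precision. The record does not state how many true cases the held-out test set contains, so `ASSUMED_CASES = 100` is a purely hypothetical placeholder, and the helper names (`counts_from_summary`, `precision`, `recall`) are illustrative. The resulting precision values depend entirely on that assumption and are not results from the paper.

```python
def counts_from_summary(reported_sensitivity, false_positives, n_cases):
    """Rebuild TP/FN counts from a reported sensitivity and an ASSUMED case count."""
    tp = round(reported_sensitivity * n_cases)  # true positives implied by sensitivity
    fn = n_cases - tp                           # cases the model misses
    return {"tp": tp, "fn": fn, "fp": false_positives}


def precision(counts):
    """Share of positive predictions that are true cases: TP / (TP + FP)."""
    denom = counts["tp"] + counts["fp"]
    return counts["tp"] / denom if denom else 0.0


def recall(counts):
    """Sensitivity: share of true cases that are detected, TP / (TP + FN)."""
    denom = counts["tp"] + counts["fn"]
    return counts["tp"] / denom if denom else 0.0


# Hypothetical number of true cases in the test set (NOT given in the record).
ASSUMED_CASES = 100

# Sensitivity and false-positive figures are the ones quoted in the Description.
single_xgb = counts_from_summary(0.70, 420, ASSUMED_CASES)
seq_xgb = counts_from_summary(0.15, 9, ASSUMED_CASES)

for name, counts in [("single XGBoost", single_xgb), ("SeqXGB", seq_xgb)]:
    print(f"{name}: precision={precision(counts):.3f}, sensitivity={recall(counts):.2f}")
```

Changing `ASSUMED_CASES` changes the absolute precision values, but under this sketch the direction of the trade-off stays the same: far fewer false positives raise precision, while the drop in detected cases lowers sensitivity.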