| Title: |
Complex models, marginal benefits--a multi-centre development and validation study of early warning scores across 2.16 million patient admissions addressing intercurrent medical interventions |
| Authors: |
Katsiferis, A; Scheidwasser, N; Nguyen, T; Lange, T; Khurana, MP; Nielsen, PB; Iversen, KK; Meyhoff, CS; Aasvang, EK; Moelgaard, J; Zucco, AG; Varga, TV; Bhatt, S |
| Source: |
Katsiferis, A, Scheidwasser, N, Nguyen, T, Lange, T, Khurana, MP, Nielsen, PB, Iversen, KK, Meyhoff, CS, Aasvang, EK, Moelgaard, J, Zucco, AG, Varga, TV & Bhatt, S 2025 'Complex models, marginal benefits--a multi-centre development and validation study of early warning scores across 2.16 million patient admissions addressing intercurrent medical interventions' medRxiv. https://doi.org/10.1101/2025.10.12.25337794 |
| Publisher Information: |
medRxiv |
| Publication Year: |
2025 |
| Collection: |
University of Copenhagen: Research / Forskning ved Københavns Universitet |
| Description: |
Background: The National Early Warning Score (NEWS) is a nationally recommended, clinically implemented system used to prevent patient deterioration. While numerous studies have compared predictive models for clinical deterioration, large-scale evaluations of their potential clinical utility are still lacking. Here, we compared the clinical net benefit of NEWS against simplified scoring rules and modern machine learning to determine whether simpler approaches are sufficient or whether complex models provide meaningful advantages in clinical practice.
Methods: We included fifteen Danish hospitals with over 2·16 million patient admissions representing 829 610 unique patients over five years (2018 to 2023). We compared NEWS against both simpler and more complex approaches for predicting 24-hour mortality: NEWS-Light (NEWS without blood pressure and temperature), DEWS (the Demographic Early Warning Score: NEWS-Light with age and sex), and a model based on eXtreme Gradient Boosting (XGB-EWS) incorporating vital signs, demographics, laboratory markers, and medical history embeddings extracted using sentence transformers. We used propensity score weighting to mitigate intervention bias and evaluated performance using Area Under the Receiver Operating Characteristic Curve (AUC), calibration, and net benefit.
Findings: XGB-EWS achieved the highest discrimination (AUC 0·932, 95% confidence interval [0·929-0·936]), followed by DEWS (0·908 [0·904-0·912]), NEWS (0·902 [0·898-0·906]), and NEWS-Light (0·879 [0·873-0·885]). Decision curve analysis showed maximum net benefit differences of 1·8 additional correct mortality identifications per 10 000 patients between XGB-EWS and NEWS, and 1·7 per 10 000 between NEWS and NEWS-Light, across the evaluated risk thresholds.
Interpretation: Machine learning approaches provided marginal clinical utility improvements over traditional scoring systems, with NEWS-Light showing small performance decrements compared to full NEWS.
The clinical significance of these differences must be ... |
| Document Type: |
report |
| File Description: |
application/pdf |
| Language: |
English |
| DOI: |
10.1101/2025.10.12.25337794 |
| Availability: |
https://researchprofiles.ku.dk/da/publications/0884886c-270b-4393-b0d9-495ac8917d80; https://doi.org/10.1101/2025.10.12.25337794; https://curis.ku.dk/ws/files/539588102/2025.10.12.25337794v1.full.pdf |
| Rights: |
info:eu-repo/semantics/restrictedAccess |
| Accession Number: |
edsbas.D42B1CAC |
| Database: |
BASE |