| Title: |
Assessing the quality of prediction models in health care using the Prediction model Risk Of Bias ASsessment Tool (PROBAST): an evaluation of its use and practical application |
| Authors: |
Kaul, Tabea; Damen, Johanna A A; Wynants, Laure; Van Calster, Ben; van Smeden, Maarten; Hooft, Lotty; Moons, Karel G M; Team 5a; Epi Methoden Team 5; Datascience; Infection & Immunity; Epidemiology & Health Economics; JC onderzoeksprogramma Methodology; Epi Methoden; Cancer |
| Publication Year: |
2025 |
| Subject Terms: |
Journal Article |
| Description: |
BACKGROUND AND OBJECTIVES: Since 2019, the Prediction model Risk Of Bias ASsessment Tool (PROBAST; www.probast.org) has supported methodological quality assessments of prediction model studies. Most prediction model studies are rated at "High" risk of bias (ROB), and researchers report low interrater reliability (IRR) when using PROBAST. We aimed to (1) assess the IRR of PROBAST ratings between assessors of the same study and understand the reasons for discrepancies, (2) determine which items contribute most to domain-level ROB ratings, and (3) explore the impact of consensus meetings. STUDY DESIGN AND SETTING: We used PROBAST assessments from a systematic review of diagnostic and prognostic COVID-19 prediction models as a case study. Assessors included international experts in prediction model studies or their reviews. We assessed IRR before consensus meetings using the prevalence-adjusted bias-adjusted kappa (PABAK), examined item-level bias ratings per domain-level ROB judgment, and evaluated the impact of consensus meetings by identifying rating changes after discussion. RESULTS: We analyzed 2167 PROBAST assessments from 27 assessor pairs covering 760 prediction models: 384 developments, 242 validations, and 134 mixed assessments (including both). The IRR using PABAK was higher for overall ROB judgments (development: 0.82 [0.76; 0.89]; validation: 0.78 [0.68; 0.88]) than for domain- and item-level judgments. Some PROBAST items frequently contributed to domain-level ROB judgments, e.g., 3.5 Outcome blinding and 4.1 Sample size. Consensus discussions mainly led to item-level rating changes and never to changes in overall ROB ratings. CONCLUSION: Within this case study, PROBAST assessments showed high IRR at the overall ROB level, with some variation at the item and domain levels. To reduce variability, PROBAST assessors should standardize item- and domain-level judgments and hold well-structured consensus meetings between assessors of the same study. 
PLAIN LANGUAGE SUMMARY: The Prediction model Risk Of Bias ASsessment Tool (PROBAST; www.probast.org) ... |
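The abstract reports IRR as PABAK but does not state how many rating categories were used for each level. As an illustration only, the sketch below assumes the standard multi-category form PABAK = (k·p_o − 1)/(k − 1), where p_o is the observed proportion of agreement between two assessors and k is the number of rating categories (for k = 2 this reduces to 2·p_o − 1). The function name `pabak` and the example ratings are hypothetical and are not data from the study.

```python
from typing import Sequence


def pabak(rater_a: Sequence[str], rater_b: Sequence[str], k: int) -> float:
    """Prevalence-adjusted bias-adjusted kappa for two raters (hypothetical helper).

    Assumes the multi-category form PABAK = (k * p_o - 1) / (k - 1),
    where p_o is the observed proportion of agreement and k is the
    number of possible rating categories (e.g. Low / High / Unclear -> k = 3).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and paired one-to-one")
    if k < 2:
        raise ValueError("k must be at least 2")
    # Count items on which both assessors gave the same rating.
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    p_o = agreements / len(rater_a)
    # Chance agreement is fixed at 1/k (uniform prevalence, no rater bias).
    return (k * p_o - 1) / (k - 1)


# Hypothetical overall ROB ratings for 10 models from one assessor pair.
assessor_1 = ["High"] * 8 + ["Low", "Unclear"]
assessor_2 = ["High"] * 7 + ["Low", "Low", "Unclear"]
print(round(pabak(assessor_1, assessor_2, k=3), 2))  # 9/10 agree -> 0.85
```

Because PABAK fixes chance agreement at 1/k, it is not depressed when one category (here "High") dominates the ratings, which is one reason it suits PROBAST data where most studies are rated at high ROB.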
| Document Type: |
article in journal/newspaper |
| File Description: |
text/plain |
| Language: |
English |
| ISSN: |
0895-4356 |
| Relation: |
https://dspace.library.uu.nl/handle/1874/461038 |
| Availability: |
https://dspace.library.uu.nl/handle/1874/461038 |
| Rights: |
info:eu-repo/semantics/OpenAccess |
| Accession Number: |
edsbas.2B432556 |
| Database: |
BASE |