| Title: |
Using Chao’s Estimator as a Stopping Criterion for Technology-Assisted Review |
| Authors: |
Bron, Michiel; van der Heijden, Peter G. M.; Feelders, Ad; Siebes, Arno; Sub Algorithmic Data Analysis; Leerstoel Heijden; Methodology and Statistics for the Behavioural and Social Sciences |
| Publication Year: |
2025 |
| Subject Terms: |
active learning; datasets; information retrieval; machine learning; population size estimation; stopping criteria; technology-assisted review; Information Systems; General Business, Management and Accounting; Computer Science Applications |
| Description: |
Technology-Assisted Review aims to reduce the human effort required for screening processes such as abstract screening for Systematic Literature Reviews. During this process, human reviewers label documents as relevant or irrelevant, while the system incrementally updates a prediction model based on the reviewers’ previous decisions. After each model update, the system proposes the documents it deems most likely to be relevant, so that relevant documents are reviewed before irrelevant ones. A stopping criterion is needed to tell users when to end the review, keeping both the number of relevant documents missed and the number of irrelevant documents read as low as possible. In this article, we propose and evaluate a new ensemble-based Active Learning strategy and a stopping criterion based on Chao’s Population Size Estimator, which estimates the prevalence of relevant documents in the dataset. Our simulation study compares this criterion with other methods presented in the literature and shows that it performs well on several datasets. |
| Document Type: |
article in journal/newspaper |
| File Description: |
application/pdf |
| Language: |
English |
| ISSN: |
1046-8188 |
| Relation: |
https://dspace.library.uu.nl/handle/1874/478889 |
| Availability: |
https://dspace.library.uu.nl/handle/1874/478889 |
| Rights: |
info:eu-repo/semantics/OpenAccess |
| Accession Number: |
edsbas.F1CBFB89 |
| Database: |
BASE |