Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools

Title:	Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools
Authors:	Reese, Justin T; Chimirri, Leonardo; Bridges, Yasemin; Danis, Daniel; Caufield, J Harry; Gargano, Michael A; Kroll, Carlo; Schmeder, Andrew; Liu, Fengchen; Wissink, Kyran; McMurry, Julie A; Graefe, Adam SL; Niyonkuru, Enock; Korn, Daniel R; Casiraghi, Elena; Valentini, Giorgio; Jacobsen, Julius OB; Haendel, Melissa; Smedley, Damian; Mungall, Christopher J; Robinson, Peter N
Source:	European Journal of Human Genetics
Publisher Information:	eScholarship, University of California
Publication Year:	2026
Collection:	University of California: eScholarship
Subject Terms:	31 Biological Sciences (for-2020); 3105 Genetics (for-2020); Rare Diseases (rcdc); 4.1 Discovery and preclinical testing of markers and technologies (hrcs-rac); 0604 Genetics (for); 1103 Clinical Sciences (for); Genetics & Heredity (science-metrix); 3202 Clinical sciences (for-2020)
Subject Geographic:	1 - 7
Description:	Large language models (LLMs) show promise in supporting differential diagnosis, but their performance is challenging to evaluate due to the unstructured nature of their responses, and their accuracy compared to existing diagnostic tools is not well characterized. To assess the current capabilities of LLMs to diagnose genetic diseases, we benchmarked these models on 5213 previously published case reports using the Phenopacket Schema, the Human Phenotype Ontology and Mondo disease ontology. Prompts generated from each phenopacket were sent to seven LLMs, including four generalist models and three LLMs specialized for medical applications. The same phenopackets were used as input to a widely used diagnostic tool, Exomiser, in phenotype-only mode. The best LLM ranked the correct diagnosis first in 23.6% of cases, whereas Exomiser did so in 35.5% of cases. While the performance of LLMs for supporting differential diagnosis has been improving, it has not reached the level of commonly used traditional bioinformatics tools. Future research is needed to determine the best approach to incorporate LLMs into diagnostic pipelines.
Document Type:	article in journal/newspaper
Language:	unknown
Relation:	qt64v8j4b5; https://escholarship.org/uc/item/64v8j4b5
DOI:	10.1038/s41431-026-02054-5
Availability:	https://escholarship.org/uc/item/64v8j4b5; https://doi.org/10.1038/s41431-026-02054-5
Rights:	CC-BY
Accession Number:	edsbas.7D337EE
Database:	BASE