Large language model (LLM)-based agentic artificial intelligence tool streamlines research processes in biomarker studies: a proof of concept

Title:	Large language model (LLM)-based agentic artificial intelligence tool streamlines research processes in biomarker studies: a proof of concept
Authors:	Ye, Y; Colombo, M; Meessen, J; Hajizadeh, N; De Vore, L; Krovvidi, S; Meresh, S; Ghaem Sigarchian, H; Jakka, S; Tyl, B
Source:	European Heart Journal - Digital Health ; volume 7, issue Supplement_1 ; ISSN 2634-3916
Publisher Information:	Oxford University Press (OUP)
Publication Year:	2026
Description:	Background/Introduction: AI tools utilizing large language models (LLMs) can significantly accelerate literature reviews by automating repetitive tasks and analyses. However, initial evaluations have been limited to title and abstract screenings. Purpose: This study evaluates the full-text screening performance of an agentic AI tool leveraging LLM technology to accurately identify relevant publications for a systematic review of circulating biomarkers in heart failure with reduced ejection fraction (HFrEF). Methods: Within the iCARE4CVD public-private partnership, we developed a knowledge model combined with an agentic AI tool that screened the full text of 5523 publications based on predefined selection criteria. The inclusion and exclusion criteria were decomposed into 136 specific tasks, each addressed by individual LLM agents using a Retrieval-Augmented Generation (RAG) approach. This process involved segmenting the full text into manageable chunks, vectorizing them, and using RAG to identify the most relevant segments for analysis by the LLM agents. Results were aggregated for automated validation of unusual responses by a critique LLM agent. This response then informed the final inclusion or exclusion decisions. We evaluated the performance of five LLMs based on privacy, openness, and effectiveness (precision and recall) to select the most accurate model. The AI tool was trained and validated against human-reviewed papers, arbitrated by a senior reviewer, with 197 papers used for training and 97 for validation (Fig 1). Performance metrics included sensitivity, specificity, false positive and negative rates, and Cohen’s κ to measure agreement between LLM and human reviewers. Results: Our findings demonstrate significant improvement in sensitivity and specificity across the training (batches 1 and 2) and validation phases. In batch 1, sensitivity was 77.8% and specificity was 62.5%. These metrics improved in batch 2 to 81% and 79%, respectively.
Subsequently, the model settings were updated to prioritize ...
Document Type:	article in journal/newspaper
Language:	English
DOI:	10.1093/ehjdh/ztaf143.008
Availability:	https://doi.org/10.1093/ehjdh/ztaf143.008; https://academic.oup.com/ehjdh/article-pdf/7/Supplement_1/ztaf143.008/66378590/ztaf143.008.pdf
Rights:	https://creativecommons.org/licenses/by-nc-nd/4.0/
Accession Number:	edsbas.4920DAA
Database:	BASE
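
The performance metrics named in the Description (sensitivity, specificity, false positive and negative rates, and Cohen's κ) can all be derived from a 2x2 table of AI decisions against human decisions. The sketch below uses the standard definitions; the function name `screening_metrics` and the counts in the usage example are hypothetical illustrations, not figures or code from the study, and it assumes both classes are non-empty and that chance agreement is below 1.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Agreement metrics for include/exclude decisions (AI vs. human reviewer).

    tp: both include, tn: both exclude,
    fp: AI includes but human excludes, fn: AI excludes but human includes.
    Assumes tp + fn > 0, tn + fp > 0, and chance agreement < 1.
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Observed agreement: share of papers where both raters decide the same.
    observed = (tp + tn) / total
    # Agreement expected by chance, from each rater's marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
        "cohen_kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration only (not study data).
    for name, value in screening_metrics(tp=40, fp=10, fn=10, tn=40).items():
        print(f"{name}: {value:.3f}")
```

Cohen's κ is reported alongside sensitivity and specificity because it corrects raw agreement for the agreement two reviewers would reach by chance given how often each one includes a paper.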