Evaluation of ChatGPT as a supplementary tool for pituitary adenomas: An observational study based on simulated consultations

Title:	Evaluation of ChatGPT as a supplementary tool for pituitary adenomas: An observational study based on simulated consultations
Authors:	Chen, Yuhui; Chen, Yuyang; Chen, Li; Feng, Tianshun; Wang, Shousen
Source:	Medicine ; volume 104, issue 46, page e45928 ; ISSN 0025-7974 1536-5964
Publisher Information:	Ovid Technologies (Wolters Kluwer Health)
Publication Year:	2025
Description:	Chat Generative Pretrained Transformer (ChatGPT), a large language model developed by OpenAI, has shown potential in healthcare communication and patient education. However, its performance in specialized medical domains, such as pituitary adenomas (PAs), remains unclear. Therefore, this study aimed to evaluate the reliability and consistency of ChatGPT in answering PA-related questions. We hypothesized that ChatGPT would demonstrate high reliability in responding to general patient-oriented queries but lower reliability for specialized clinical questions. A total of 256 PA-related questions were collected from patients and families, clinical practice guidelines, and medical question banks. Each question was input into ChatGPT (GPT-4, March 2025 version), and the generated responses were independently reviewed by 2 senior neurosurgeons. Any discrepancies in their assessments were resolved by a third neurosurgeon with over 30 years of clinical experience. Responses were categorized as completely correct, partially correct but usable, partially correct, or incorrect. Responses rated as completely correct or partially correct but usable were considered reliable. Consistency was assessed based on the stability of response quality across similar question types. Comparisons were made by question type (general vs professional) and source using univariate analysis. Among the 256 responses, 143 (55.8%) were completely correct, 68 (26.6%) were partially correct but usable, 19 (7.4%) were partially correct, and 26 (10.2%) were incorrect. Overall, 82.4% of the responses were considered reliable, and 68.4% demonstrated consistency. Reliability was significantly higher for general questions than for professional ones (95.0% vs 78.6%, OR = 5.182, 95% CI: 1.545–17.378, P = .003), and for guideline-derived questions compared to question bank-derived ones (100.0% vs 75.7%, OR = 1.321, 95% CI: 1.214–1.437, P = .017). Differences in consistency across subgroups were not statistically significant. ChatGPT exhibits high reliability ...
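The overall reliability figure in the description follows directly from the four category counts, since "reliable" is defined there as completely correct plus partially correct but usable. The sketch below only reproduces that arithmetic; the names `counts` and `reliability_rate` are illustrative assumptions and are not taken from the study.

```python
# Category counts as reported in the description (256 responses in total).
counts = {
    "completely correct": 143,
    "partially correct but usable": 68,
    "partially correct": 19,
    "incorrect": 26,
}

total = sum(counts.values())

# The description treats the first two categories as "reliable".
reliable = counts["completely correct"] + counts["partially correct but usable"]


def reliability_rate(reliable_count: int, total_count: int) -> float:
    """Return the share of reliable responses as a percentage."""
    return 100 * reliable_count / total_count


rate = reliability_rate(reliable, total)
print(f"{reliable}/{total} reliable = {rate:.1f}%")
```

With the reported counts this gives 211 of 256 responses, which matches the 82.4% stated in the description.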
Document Type:	article in journal/newspaper
Language:	English
DOI:	10.1097/md.0000000000045928
Availability:	https://doi.org/10.1097/md.0000000000045928; https://journals.lww.com/10.1097/MD.0000000000045928
Rights:	http://creativecommons.org/licenses/by/4.0/
Accession Number:	edsbas.2E42D489
Database:	BASE