| Title: |
Enhancing Patient Understanding of Perianal Fistula MRI Findings Using ChatGPT: A Randomized, Single Centre Study |
| Authors: |
Anand, Easan; Ghersin, Itai; Lingam, Gita; Devlin, Katie; Pelly, Theo; Singer, Daniel; Tomlinson, Chris; Munro, Robin EJ; Capstick, Rachel; Antoniou, Anna; Hart, Ailsa L; Tozer, Phil; Sahnan, Kapil; Lung, Phillip |
| Source: |
Diagnostics, 16(1), Article 72. (2026) |
| Publisher Information: |
MDPI AG |
| Publication Year: |
2026 |
| Collection: |
University College London: UCL Discovery |
| Subject Terms: |
artificial intelligence; Crohn’s disease; cryptoglandular fistula; large language models; magnetic resonance imaging; patient communication; perianal fistula |
| Description: |
Background/Objectives: Large Language Models (LLMs) may help translate complex Magnetic Resonance Imaging (MRI) fistula reports into accessible, patient-friendly summaries. This study evaluated the clinical utility, safety, and patient acceptability of Generative Pre-trained Transformer (GPT-4o) in generating such reports.
Methods: A three-phase study was conducted at a single centre. Phase I involved prompt engineering and pilot testing of GPT-4o outputs for feasibility. Phase II assessed 250 consecutive MRI fistula reports from September 2024 to November 2024, each reviewed by a multi-disciplinary panel to determine hallucinations and thematic content. Phase III randomised patients to review either a simple or complex fistula case, each containing an original report and an Artificial Intelligence (AI)-generated summary (order randomised, origin blinded), and rate readability, trustworthiness, usefulness and comprehension.
Results: Sixteen patients participated in Phase I pilot testing. In Phase II, hallucinations occurred in 11% of outputs, with unverified recommendations also identified. In Phase III, 61 patients (mean age 48, 41% female) evaluated paired original and AI-generated summaries. AI summaries scored significantly higher for readability, comprehension, and usefulness than original reports (all p < 0.001), with equivalent trust ratings. Mean Flesch-Kincaid scores were markedly higher for AI-generated summaries (66 vs. 26; p < 0.001). Clinicians highlighted improved anatomical structuring and accessible language, but emphasised risks of inaccuracies. A revised template incorporating Multi-Disciplinary Team (MDT)-focused action points and a lay summary section was co-developed.
Conclusions: LLMs can enhance the readability and patient understanding of complex MRI reports but remain limited by hallucinations and inconsistent terminology. Safe implementation requires structured oversight, domain-specific refinement, and clinician validation.
Future development should prioritise ... |
| Document Type: |
article in journal/newspaper |
| File Description: |
application/pdf |
| Language: |
English |
| Relation: |
https://discovery.ucl.ac.uk/id/eprint/10219498/ |
| Availability: |
https://discovery.ucl.ac.uk/id/eprint/10219498/1/diagnostics-16-00072.pdf; https://discovery.ucl.ac.uk/id/eprint/10219498/ |
| Rights: |
open |
| Accession Number: |
edsbas.8B6DEB3D |
| Database: |
BASE |