| Title: |
DOP015 Quantifying Inter-Rater Reliability of Mucosal Features in Ulcerative Colitis Endoscopy |
| Authors: |
Fernandez, P Blasco; Bossuyt, P; Daperno, M; Kopylov, U; Bouhnik, Y; Karmiris, K; Benmansour, F; Gutierrez Becker, B; Fraessle, S; Stimpel, B; Levitte, S; Gomariz, A |
| Source: |
Journal of Crohn’s and Colitis ; volume 20, issue Supplement_1 ; ISSN 1873-9946 1876-4479 |
| Publisher Information: |
Oxford University Press (OUP) |
| Publication Year: |
2026 |
| Description: |
Background: Severity assessment in ulcerative colitis (UC), central to clinical trials and practice, relies on the Mayo Clinic Endoscopic Subscore (MCES). However, MCES is a composite score, and its interpretation is subjective. This study quantitatively characterizes the inter-rater reliability (IRR) of both MCES and other granular mucosal features derived from the Ulcerative Colitis Endoscopic Index of Severity (UCEIS)[1], which may offer more prognostic value. We present results from a large-scale annotation campaign with five expert gastroenterologists. Methods: We randomly sampled 1,200 quality-controlled frames from 80 unique endoscopy videos from 40 patients in the Phase III Etrolizumab trial[2], using a sampling strategy stratified by anatomic location. Each frame was independently annotated by two experts (from a pool of five) for MCES and five other mucosal categories, with all labels detailed in Figure 1. IRR was quantified using Cohen’s kappa (κ). Results: IRR varied substantially across categories and labels (Figure 1). A key finding was the consistently high reliability for ‘normal’ or ‘absent’ labels across categories (e.g., MCES ‘0’ κ = 0.71, Bleeding ‘None’ κ = 0.70, Ulcers ‘None’ κ = 0.71), indicating that disagreement arises primarily from grading the severity of observed pathology. For MCES, intermediate labels ‘1’ (κ = 0.34) and ‘2’ (κ = 0.45) had low reliability. Similarly, Ulcers and Erosions showed low granular reliability (e.g., ‘Erosions’ κ = 0.45; ‘Superficial Ulcers’ κ = 0.28). Erythema showed the lowest overall reliability (κ = 0.38), with its ‘Mild’ label demonstrating near-random agreement (κ = 0.12). The intermediate ‘Patchy / Decreased’ vascular pattern (κ = 0.46) was also less reliable than its ‘Normal’ (κ = 0.65) and ‘Complete loss’ (κ = 0.71) counterparts. For Bleeding, the ‘Biopsy’ label was identified as a key confounder, as it was difficult to distinguish from other bleeding types in static frames. The very low κ for ‘Large’ pseudopolyps (κ = -0.01) is attributed to its low prevalence (1.2%). ... |
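The abstract states only that IRR was quantified with Cohen’s kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater’s label frequencies. The Python sketch below illustrates that formula and why a rare label can yield κ near zero despite high raw agreement. It is not the authors’ code: the function names are hypothetical, the one-vs-rest binarization used for per-label κ is an assumption the abstract does not confirm, and the 100-frame example is synthetic rather than trial data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical label; kappa is undefined
    return (p_o - p_e) / (1 - p_e)


def per_label_kappa(rater_a, rater_b, label):
    """Kappa for one label versus all others (assumed one-vs-rest binarization)."""
    return cohen_kappa([x == label for x in rater_a],
                       [x == label for x in rater_b])


# Synthetic example (NOT study data): 100 frames, each rater marks a single,
# different frame as 'Large'; every other frame is 'None' for both raters.
rater_a = ["Large"] + ["None"] * 99
rater_b = ["None"] + ["Large"] + ["None"] * 98

raw_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa_large = per_label_kappa(rater_a, rater_b, "Large")
print(f"raw agreement = {raw_agreement:.2f}, kappa = {kappa_large:.4f}")
```

In this synthetic case the raters agree on 98 of 100 frames, yet chance agreement is 0.9802, so κ is about -0.01. When a label is this rare, nearly all agreement is already expected by chance, and a handful of disagreements is enough to push κ to zero or below.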
| Document Type: |
article in journal/newspaper |
| Language: |
English |
| DOI: |
10.1093/ecco-jcc/jjaf231.052 |
| Availability: |
https://doi.org/10.1093/ecco-jcc/jjaf231.052; https://academic.oup.com/ecco-jcc/article-pdf/20/Supplement_1/jjaf231.052/66498456/jjaf231.052.pdf |
| Rights: |
https://academic.oup.com/pages/standard-publication-reuse-rights |
| Accession Number: |
edsbas.5C7E36A6 |
| Database: |
BASE |