| Title: |
Signal or noise? Evaluating commonly used attribution methods for explaining deep neural networks in electrocardiogram classification |
| Authors: |
Arends, Bauke K.O.; van Amsterdam, Wouter A.C.; van der Harst, Pim; van Smeden, Maarten; van Es, René; van de Leur, Rutger R.; Team Onderzoek; Datascience; Cancer; Onderzoek Algemene Cardiologie; Circulatory Health; Infection & Immunity; Arts Assistenten Cardiologie |
| Publication Year: |
2026 |
| Subject Terms: |
Attribution methods; Computer vision; Electrocardiogram; Explainable artificial intelligence; Heatmap; Cardiology and Cardiovascular Medicine |
| Description: |
Aims: Attribution-based explainability methods are widely used in electrocardiogram (ECG) analysis to interpret predictions from ‘black-box’ deep neural networks (DNNs). To be useful in clinical applications, attribution methods must produce explanations that are both clear and reflective of the model’s inner workings. This study evaluates 12 attribution methods in DNN-based ECG classification. Methods and results: We analysed 12 attribution methods using a dataset of 873 710 median beat ECGs spanning nine diagnostic classes. Methods were applied to convolutional neural network-based models trained for ECG classification. Performance was evaluated across four experiments: inter-method similarity, self-consistency, dependence on model weights, and ability to identify features important for model inference. All task models achieved an area under the receiver operating characteristic curve above 0.95. Attribution methods showed low correlation and high variability in inter-method comparisons. Self-consistency across random model initializations was moderate for most methods (mean correlation 0.41–0.65). Randomizing model weights led to a rapid loss of correlation, although some methods did not converge to zero. Perturbation of input data revealed differences in how well attribution methods identified features relevant to model performance. Conclusion: Attribution methods demonstrated limited reliability, instability across model variants and incomplete dependence on learned parameters, constraining their utility in high-stakes settings such as healthcare. These findings suggest that attribution techniques should be used cautiously and supported by task-specific sanity checks. Approaches grounded in rigorous validation, inherently interpretable modelling or counterfactual explanations may better support clinically meaningful insight. |
| Document Type: |
article in journal/newspaper |
| File Description: |
application/pdf |
| Language: |
English |
| ISSN: |
2634-3916 |
| Relation: |
https://dspace.library.uu.nl/handle/1874/469649 |
| Availability: |
https://dspace.library.uu.nl/handle/1874/469649 |
| Rights: |
info:eu-repo/semantics/OpenAccess |
| Accession Number: |
edsbas.D23CD75A |
| Database: |
BASE |
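
The description above reports correlations between attribution maps (for example, a mean correlation of 0.41–0.65 for self-consistency across model initializations). The record does not state which correlation measure or array layout the authors used, so the following is only a minimal sketch under stated assumptions: Pearson correlation over flattened maps, hypothetical function names, and synthetic arrays standing in for real attribution maps.

```python
import numpy as np


def attribution_correlation(a, b):
    """Pearson correlation between two attribution maps of equal shape.

    Assumption: maps are flattened before comparison; the record does not
    say how the authors compared maps.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("attribution maps must have the same shape")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # A constant map has no variance, so the correlation is undefined.
    return float(a @ b / denom) if denom > 0 else float("nan")


def mean_pairwise_correlation(maps):
    """Mean correlation over all unordered pairs of attribution maps."""
    if len(maps) < 2:
        raise ValueError("need at least two attribution maps")
    values = [
        attribution_correlation(maps[i], maps[j])
        for i in range(len(maps))
        for j in range(i + 1, len(maps))
    ]
    return float(np.mean(values))


# Synthetic stand-ins for attribution maps; the (8, 600) shape is hypothetical
# and this is not real ECG data.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 600))
maps = [base + rng.normal(scale=1.0, size=base.shape) for _ in range(3)]
score = mean_pairwise_correlation(maps)
```

Applied to maps from several attribution methods on the same input, the same helper would give an inter-method similarity score; applied to maps from one method across retrained models, it would give a self-consistency score.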