| Title: |
Simplifying causal gene identification in GWAS loci |
| Authors: |
Schipper, Marijn; Ulirsch, Jacob; Posthuma, Danielle; Ripke, Stephan; Heilbron, Karl |
| Contributors: |
Won, Sungho; Alexander von Humboldt-Stiftung; HORIZON EUROPE Framework Programme; Deutsche Forschungsgemeinschaft; Nederlandse Organisatie voor Wetenschappelijk Onderzoek; H2020 European Research Council; National Institute of Mental Health |
| Source: |
PLOS Genetics ; volume 22, issue 3, page e1012079 ; ISSN 1553-7404 |
| Publisher Information: |
Public Library of Science (PLoS) |
| Publication Year: |
2026 |
| Collection: |
PLOS Publications (via CrossRef) |
| Description: |
Genome-wide association studies (GWAS) help to identify disease-linked genetic variants, but pinpointing the most likely causal genes in GWAS loci remains challenging. Existing GWAS gene prioritization tools are powerful but often use complex black-box models trained on datasets containing biases. Here, we used a data-driven approach to construct a truth set of causal genes in 200 GWAS loci. We found that a simple logistic regression model performed as well as a more complex XGBoost model, and that many commonly used gene prioritization features could be removed without meaningfully affecting performance (e.g., expression quantitative trait locus colocalization and Mendelian randomization). We present CALDERA, a gene prioritization tool that uses a logistic regression model with just four input features. In independent benchmarking datasets of resolved GWAS loci, CALDERA achieved state-of-the-art performance in comparison with other methods (FLAMES, L2G, and cS2G). CALDERA outputs causal gene probabilities for all genes in a given GWAS locus, and we show that these probabilities are well calibrated. Applying CALDERA to 93 UK Biobank traits, we predicted 11,956 putative causal genes, potentially resolving up to 52% of loci. Overall, CALDERA provides a powerful solution for prioritizing potentially causal genes in GWAS loci that minimizes the data processing required to construct input features and generates an easily interpretable output score. |
| Document Type: |
article in journal/newspaper |
| Language: |
English |
| DOI: |
10.1371/journal.pgen.1012079 |
| Availability: |
https://doi.org/10.1371/journal.pgen.1012079; https://dx.plos.org/10.1371/journal.pgen.1012079 |
| Rights: |
http://creativecommons.org/licenses/by/4.0/ |
| Accession Number: |
edsbas.85ED57A7 |
| Database: |
BASE |