Overcoming the widespread flaws in the annotation of vertebrate selenoprotein genes in public databases

Title:	Overcoming the widespread flaws in the annotation of vertebrate selenoprotein genes in public databases
Authors:	Ticó, Max; Sullivan, Emerson; Guigó, Roderic; Mariotti, Marco
Contributors:	Friedberg, Iddo; Ministerio de Ciencia, Innovación y Universidades
Source:	PLOS Computational Biology ; volume 22, issue 1, page e1013885 ; ISSN 1553-7358
Publisher Information:	Public Library of Science (PLoS)
Publication Year:	2026
Collection:	PLOS Publications (via CrossRef)
Description:	Genome annotations provide the essential framework for genomic analyses, capturing our current knowledge of gene structure and function as inferred from computational predictions and experimental evidence. Even as automated annotation pipelines become more sophisticated, their accuracy in representing unconventional gene expression events remains largely untested. Here, we address this gap by examining the most common form of translational recoding: the insertion of selenocysteine (Sec), a non-canonical amino acid incorporated into selenoproteins, oxidoreductase enzymes with essential roles in redox homeostasis. Sec insertion occurs in response to UGA, normally interpreted as a stop codon, but recoded in selenoprotein mRNAs. Owing to the dual function of UGA, the identification of selenoprotein genes poses a challenge. We show that vertebrate selenoprotein genes are widely misannotated in major public databases. Only 11% and 5% of selenoprotein genes are well annotated in Ensembl and NCBI GenBank, respectively, due to the lack of dedicated selenoprotein annotation pipelines. In most cases (81% and 84%, respectively), overlapping flawed annotations are present that lack the Sec-encoding UGA. In contrast, NCBI RefSeq employs a dedicated selenoprotein pipeline, yet with some shortcomings: its selenoprotein annotations are correct in 77% of cases, and most errors affect families with a C-terminal Sec residue. We argue that selenoproteins must be correctly annotated in public databases and that this must occur via automated pipelines to keep pace with genome sequencing. To facilitate this task, we present a new version of Selenoprofiles, a homology-based tool for selenoprotein prediction that produces predictions with accuracy comparable to manual curation and can be easily deployed and integrated into existing annotation pipelines.
Document Type:	article in journal/newspaper
Language:	English
DOI:	10.1371/journal.pcbi.1013885
Availability:	https://doi.org/10.1371/journal.pcbi.1013885; https://dx.plos.org/10.1371/journal.pcbi.1013885
Rights:	http://creativecommons.org/licenses/by/4.0/
Accession Number:	edsbas.8B0834AC
Database:	BASE