
Application of information theoretic feature selection and machine learning methods for the development of genetic risk prediction models.

Title: Application of information theoretic feature selection and machine learning methods for the development of genetic risk prediction models.
Authors: Jalali-Najafabadi, Farideh; Stadler, Michael; Dand, Nick; Jadon, Deepak; Soomro, Mehreen; Ho, Pauline; Marzo-Ortega, Helen; Helliwell, Philip; Korendowych, Eleanor; Simpson, Michael A; Packham, Jonathan; Smith, Catherine H; Barker, Jonathan N; McHugh, Neil; Warren, Richard B; Barton, Anne; Bowes, John; BADBIR Study Group; BSTOP Study Group
Source: nlmid: 101563288 ; essn: 2045-2322
Publisher Information: Springer Nature; https://doi.org/10.1038/s41598-021-00854-x
Publication Year: 2022
Collection: Apollo - University of Cambridge Repository
Subject Terms: Adolescent; Adult; Aged, 80 and over; Algorithms; Arthritis, Psoriatic; Child, Preschool; Cross-Sectional Studies; Genetic Predisposition to Disease; Humans; Infant, Newborn; Information Theory; Middle Aged; Prognosis; Risk Factors; Supervised Machine Learning; United Kingdom; Young Adult
Description: As clinical risk prediction models increasingly use genetic data, there is a growing need for studies that use appropriate methods to select the optimum number of features from a large set of genetic variants with high redundancy between features due to linkage disequilibrium (LD). Filter feature selection methods based on information theoretic criteria are well suited to this challenge and identify a subset of the original variables that should result in more accurate prediction. However, data collected from cohort studies are often high-dimensional genetic data with potential confounders, which present challenges to feature selection and to machine learning risk prediction models. Patients with psoriasis are at high risk of developing a chronic arthritis known as psoriatic arthritis (PsA). The prevalence of PsA in this patient group can be up to 30%, and the identification of high-risk patients is an important clinical research goal that would allow early intervention and a reduction in disability. This also provides an ideal scenario for developing clinical risk prediction models and an opportunity to explore the application of information theoretic criteria. In this study, we developed feature selection and PsA risk prediction models and applied them to a cross-sectional genetic dataset of 1462 PsA cases and 1132 cutaneous-only psoriasis (PsC) cases, using 2-digit HLA alleles imputed with the SNP2HLA algorithm. We also developed a stratification method to mitigate the impact of potential confounder features and show that confounding features affect feature selection. The mitigated dataset was used to train seven supervised machine learning methods: 80% of the data was randomly selected for training with stratified nested cross-validation, and 20% was randomly selected as a holdout set for internal validation. The risk prediction models were then further validated in the UK Biobank dataset ...
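Illustrative sketch (not part of the catalog record): the description mentions filter feature selection based on information theoretic criteria and a class-stratified 80/20 holdout split. The Python below is a minimal illustration under stated assumptions, not the authors' pipeline. It ranks features by plain mutual information with the label, which is the simplest information-theoretic filter and may differ from the criterion used in the paper. The feature names (`allele_a`, `allele_b`), the toy data, and all function names are hypothetical. Nested cross-validation and the confounder stratification step are not shown.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels, k):
    """Return the names of the k features with the highest MI with the label."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def stratified_holdout(labels, holdout_frac=0.2, seed=0):
    """Split sample indices into (train, holdout), preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, holdout = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * holdout_frac)
        holdout.extend(idx[:cut])
        train.extend(idx[cut:])
    return sorted(train), sorted(holdout)


# Hypothetical toy data: 10 cases (label 1) and 10 controls (label 0).
labels = [1] * 10 + [0] * 10
columns = {
    "allele_a": list(labels),   # perfectly informative about the label
    "allele_b": [0, 1] * 10,    # independent of the label
}

top = rank_features(columns, labels, k=1)
train_idx, holdout_idx = stratified_holdout(labels, holdout_frac=0.2, seed=0)
```

Selecting features only on `train_idx` and scoring on `holdout_idx` keeps the holdout set free of selection bias, which is the purpose of an internal validation split.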
Document Type: article in journal/newspaper
File Description: application/pdf
Language: English
Relation: PMC8640070; https://www.repository.cam.ac.uk/handle/1810/333070
DOI: 10.17863/CAM.80494
Availability: https://www.repository.cam.ac.uk/handle/1810/333070; https://doi.org/10.17863/CAM.80494
Rights: Attribution 4.0 International ; https://creativecommons.org/licenses/by/4.0/
Accession Number: edsbas.C16DFA90
Database: BASE