Genome-wide selection of tag SNPs using multiple-marker correlation.
| Title: | Genome-wide selection of tag SNPs using multiple-marker correlation. |
|---|---|
| Authors: | Hao K; Algorithm and Data Analysis, Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California, USA. ke_hao@163.com |
| Source: | Bioinformatics (Oxford, England) [Bioinformatics] 2007 Dec 01; Vol. 23 (23), pp. 3178-84. Date of Electronic Publication: 2007 Nov 15. |
| Publication Type: | Journal Article |
| Language: | English |
| Journal Info: | Publisher: Oxford University Press; Country of Publication: England; NLM ID: 9808944; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1367-4811 (Electronic); Linking ISSN: 1367-4803; NLM ISO Abbreviation: Bioinformatics; Subsets: MEDLINE |
| Imprint Name(s): | Original Publication: Oxford : Oxford University Press, c1998- |
| MeSH Terms: | Algorithms*; Expressed Sequence Tags*; Chromosome Mapping/*methods; Genetic Markers/*genetics; Linkage Disequilibrium/*genetics; Polymorphism, Single Nucleotide/*genetics; Sequence Analysis, DNA/*methods; Base Sequence; Molecular Sequence Data; Statistics as Topic |
| Abstract: | Motivation: The tag SNP approach is a valuable tool in whole-genome association studies, and a variety of algorithms have been proposed to identify the optimal tag SNP set. Currently, most tag SNP selection is based on two-marker (pairwise) linkage disequilibrium (LD). Recent literature has shown that multiple-marker LD also contains useful information that can further increase the genetic coverage of the tag SNP set. Thus, tag SNP selection methods that incorporate multiple-marker LD are expected to have advantages in terms of genetic coverage and statistical power. Results: We propose a novel algorithm that selects tag SNPs iteratively. In each iteration, the SNP that captures the most neighboring SNPs (through pairwise and multiple-marker LD) is selected as a tag SNP. We optimize the algorithm and its implementation to make the approach feasible on today's typical workstations. Benchmarked on HapMap release 21, our algorithm outperforms the standard pairwise LD approach in several respects: (i) it improves genetic coverage compared with its conventional pairwise counterpart when conditioning on a fixed number of tag SNPs (e.g. by 7.2% for 200 K tag SNPs in HapMap CEU); (ii) it substantially reduces genotyping costs when conditioning on fixed genetic coverage (e.g. a 34.1% saving in HapMap CEU at 90% coverage); (iii) tag SNPs identified using multiple-marker LD have good portability across closely related ethnic groups; and (iv) they show higher statistical power in association tests than those selected using conventional methods. Availability: A software suite, multiTag, has been developed based on this algorithm. The program is freely available by written request to the author at ke_hao@merck.com |
| Substance Nomenclature: | 0 (Genetic Markers) |
| Entry Date(s): | Date Created: 20071117 Date Completed: 20071221 Latest Revision: 20091104 |
| Update Code: | 20260130 |
| DOI: | 10.1093/bioinformatics/btm496 |
| PMID: | 18006555 |
| Database: | MEDLINE |
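
The iterative procedure summarized in the abstract (at each step, pick the SNP that captures the most not-yet-captured neighbors through pairwise or multiple-marker LD) can be illustrated with a minimal Python sketch. This is not the multiTag implementation, which the record does not describe in detail. The function names, the r² threshold of 0.8, the index-based `window`, and the restriction of multiple-marker LD to two-marker haplotypes (the candidate plus one already-selected tag) are all assumptions made for illustration.

```python
from itertools import product

def r2(x, y):
    """Squared correlation between two equal-length 0/1 allele vectors."""
    n, nx, ny = len(x), sum(x), sum(y)
    nxy = sum(a & b for a, b in zip(x, y))
    denom = nx * (n - nx) * ny * (n - ny)
    if denom == 0:  # a monomorphic vector has no defined correlation
        return 0.0
    return (n * nxy - nx * ny) ** 2 / denom

def captured_by(haps, target, old_tags, cand, threshold):
    """True if `target` is captured by `cand` alone (pairwise LD) or by a
    two-marker haplotype of `cand` with one already-selected tag."""
    t = haps[target]
    if r2(haps[cand], t) >= threshold:
        return True
    for old in old_tags:
        for a, b in product((0, 1), repeat=2):
            # Indicator of carrying haplotype (a, b) at markers (cand, old).
            ind = [int(u == a and v == b) for u, v in zip(haps[cand], haps[old])]
            if r2(ind, t) >= threshold:
                return True
    return False

def greedy_tags(haps, threshold=0.8, window=None, multimarker=True):
    """Greedy tag selection (illustrative sketch, not multiTag itself).

    haps: list of SNPs, each a list of 0/1 alleles, one per haplotype.
    window: hypothetical neighborhood size in SNP-index units (None = no limit).
    """
    uncaptured = set(range(len(haps)))
    tags = []
    while uncaptured:
        best, best_set = None, set()
        for cand in range(len(haps)):
            if cand in tags:
                continue
            partners = tags if multimarker else []
            got = {s for s in uncaptured
                   if (window is None or abs(s - cand) <= window)
                   and captured_by(haps, s, partners, cand, threshold)}
            if len(got) > len(best_set):
                best, best_set = cand, got
        if best is None:  # remaining SNPs cannot be captured (e.g. monomorphic)
            break
        tags.append(best)
        uncaptured -= best_set
    return tags

# Toy data: SNP 1 duplicates SNP 0; SNP 3 is the (1, 1) haplotype of SNPs 0 and 2.
A = [0, 0, 1, 1, 0, 0, 1, 1]
B = [0, 1, 0, 1, 0, 1, 0, 1]
C = [a & b for a, b in zip(A, B)]
demo = [A, list(A), B, C]
tags_multi = greedy_tags(demo)                        # uses two-marker haplotypes
tags_pair = greedy_tags(demo, multimarker=False)      # pairwise LD only
```

On this toy data the pairwise-only run needs three tags (SNPs 0, 2 and 3), while the run that also uses two-marker haplotypes needs only two (SNPs 0 and 2), because SNP 3 is perfectly predicted by the haplotype formed by SNPs 0 and 2. This mirrors, at a very small scale, the kind of saving in tag count that the abstract reports.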