CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data

BMC Bioinformatics
BioMedCentral
Open Access
Research
CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data
Jun Wang, Maozu Guo* and Chunyu Wang
Address: Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, PR China
Email: Jun Wang wjking@hit.edu.cn; Maozu Guo* maozuguo@hit.edu.en; Chunyu Wang chunyu@hit.edu.cn
* Corresponding author
from The Seventh Asia-Pacific Bioinformatics Conference (APBC 2009), Beijing, China. 13–16 January 2009
Published: 30 January 2009
BMC Bioinformatics 2009, 10(Suppl 1):S71
doi:10.1186/1471-2105-10-S1-S71
Supplement: Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009). Editors: Michael Q Zhang, Michael S Waterman and Xuegong Zhang. Research.
This article is available from: http://www.biomedcentral.com/1471-2105/10/S1/S71
© 2009 Wang et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background: Recent studies have shown that genetic variation is the basis of genome-wide disease association research. However, because genotyping a large number of single nucleotide polymorphisms (SNPs) is costly, it is essential to choose a small subset of informative SNPs (tagSNPs) that captures most of the variation in a population and can represent the remaining SNPs. Several methods have been proposed to find a minimum set of tagSNPs, but most of them still have disadvantages such as information loss and block-partition limitations.
Results: This paper proposes a new hybrid method, CGTS, which combines ideas from clustering and graph algorithms to select tagSNPs from genotype data. The method aims to maximize the number of non-tagSNPs that can be discarded from the given set. CGTS integrates LD association and genotype diversity information in site graphs and discards redundant SNPs using an algorithm based on these graph structures. A clustering algorithm is used to reduce the running time of CGTS. The efficiency of the algorithm and the quality of its solutions are evaluated on biological data, and comparisons with three popular selection methods are reported.
Conclusion: Our theoretical analysis and experimental results show that CGTS is not only more efficient than the other methods but also achieves higher accuracy in tagSNP selection.
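To make the general idea of graph-based tagSNP selection concrete, the following is a minimal Python sketch. It is not the CGTS algorithm: the abstract does not specify how the site graphs are built or how clustering is applied, so everything here is an assumption chosen for illustration. Genotypes are assumed to be coded 0/1/2, LD is approximated by the squared Pearson correlation between two sites, two sites are linked when that value reaches a hypothetical threshold of 0.8, and tags are chosen by a plain greedy cover. The function names are illustrative only.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a site with no variation carries no LD information
    return cov * cov / (var_x * var_y)


def build_ld_graph(sites, threshold=0.8):
    """Adjacency sets: two sites are linked when their r^2 reaches the threshold.

    The 0.8 default is an assumed cut-off, not a value taken from the paper.
    """
    graph = {i: set() for i in range(len(sites))}
    for i, j in combinations(range(len(sites)), 2):
        if r_squared(sites[i], sites[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_tag_snps(graph):
    """Repeatedly pick the site that covers the most still-uncovered sites."""
    uncovered = set(graph)
    tags = []
    while uncovered:
        # Ties are broken in favour of the lowest site index.
        best = max(sorted(graph), key=lambda s: len((graph[s] | {s}) & uncovered))
        tags.append(best)
        uncovered -= graph[best] | {best}
    return tags


# Toy data: rows are SNP sites, columns are individuals (made up for illustration).
sites = [
    [0, 1, 2, 0, 1, 2],  # site 0
    [0, 1, 2, 0, 1, 2],  # site 1: identical to site 0
    [2, 1, 0, 2, 1, 0],  # site 2: mirror image of site 0
    [0, 0, 1, 2, 2, 1],  # site 3: uncorrelated with site 0
]
tags = greedy_tag_snps(build_ld_graph(sites))
print(tags)
```

On this toy input, sites 0, 1 and 2 form one fully linked group and site 3 stands alone, so one tag per group is enough. A greedy cover is only a heuristic and does not guarantee a minimum tag set.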
Background
Recent studies show that the abundance of single nucleotide polymorphisms (SNPs) and haplotypes can provide the most complete information for genome-wide association studies. Through the analysis of SNPs and haplotypes, most of the genetic variation among different people can be discovered. However, because of the very large number of SNPs, about 10 million in the human genome [13], it is costly to genotype and study all SNPs in a candidate region for a large number of individuals. SNP selection strategies have therefore been proposed to find a subset of SNPs, called tagSNPs or tagging SNPs, that represents the whole SNP set. These tagSNPs have high linkage disequilibrium (LD) values with the remaining SNPs [4], and the genetic variation information they carry is enough to support further study, such as disease association