Fast splice site detection using information content and feature reduction


BMC Bioinformatics
BioMed Central
Open Access
Research
Fast splice site detection using information content and feature reduction
AKMA Baten*1, SK Halgamuge1 and BCH Chang2
Address: 1 Biomechanical Engineering Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne, Victoria 3010, Australia and 2 Institute of Plant and Microbial Biology, Academia Sinica, Taiwan
Email: AKMA Baten* - a.baten@pgrad.unimelb.edu.au; SK Halgamuge - saman@unimelb.edu.au; BCH Chang - bchang1@gate.sinica.edu.tw
* Corresponding author
from Asia Pacific Bioinformatics Network (APBioNet) Seventh International Conference on Bioinformatics (InCoB2008), Taipei, Taiwan. 20–23 October 2008
Published: 12 December 2008
BMC Bioinformatics 2008, 9(Suppl 12):S8
doi:10.1186/1471-2105-9-S12-S8
Supplement: Seventh International Conference on Bioinformatics (InCoB2008). Edited by Shoba Ranganathan, Wen-Lian Hsu, Ueng-Cheng Yang and Tin Wee Tan. Proceedings.
This article is available from: http://www.biomedcentral.com/1471-2105/9/S12/S8
© 2008 Baten et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background: Accurate identification of splice sites in DNA sequences plays a key role in the prediction of gene structure in eukaryotes. Many computational methods have already been proposed for the detection of splice sites, and some of them have shown high prediction accuracy. However, most of these methods are limited by their long computation time when applied to whole-genome sequence data.
Results: In this paper we propose a hybrid algorithm that combines several effective and informative input features with a state-of-the-art support vector machine (SVM). To obtain the input features we employ an information content method based on Shannon's information theory, Shapiro's score scheme, and Markovian probabilities. We also use a feature elimination scheme to remove the less informative features from the input data.
Conclusion: In this study we propose a new feature-based splice site detection method that shows improved acceptor and donor splice site detection in DNA sequences when its performance is compared with various state-of-the-art and well-known methods.
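As a rough illustration of the information content feature mentioned in the Results, the sketch below computes a per-position information content profile for a set of aligned sequence windows. It assumes the common sequence-logo formulation, IC(i) = log2(4) - H(i) with a uniform background over A, C, G and T, which may differ from the exact formulation used in the paper. The function name `information_content` and the toy windows are hypothetical and serve only as an example.

```python
import math
from collections import Counter


def information_content(aligned_seqs, alphabet="ACGT"):
    """Return the information content (in bits) at each alignment position.

    Assumption: uniform background, so IC(i) = log2(|alphabet|) - H(i),
    where H(i) is the Shannon entropy of the base frequencies at position i.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must have equal length")

    max_bits = math.log2(len(alphabet))
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_seqs)
        # Only count symbols from the alphabet (ignores e.g. 'N').
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            raise ValueError(f"no valid bases at position {i}")
        entropy = 0.0
        for b in alphabet:
            if counts[b]:
                p = counts[b] / total
                entropy -= p * math.log2(p)
        profile.append(max_bits - entropy)
    return profile


# Hypothetical toy windows, not data from the paper.
windows = ["AGGTAAGT", "TGGTGAGT", "CGGTAAGA", "GGGTGAGC"]
profile = information_content(windows)
print([round(v, 3) for v in profile])
```

Positions where every window carries the same base reach the maximum of 2 bits, while positions with evenly mixed bases fall to 0 bits. A profile of this kind could then be used as one of several input features for a classifier such as an SVM.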
Background
Over the past decades, the scientific community has experienced major growth in the volume of sequence data. With the emergence of novel and efficient sequencing technology, DNA sequencing is now much faster. Sequencing of several genomes, including the human genome, has been completed successfully. This massive amount of sequence data demands sophisticated tools for its analysis.
Identifying genes accurately is one of the most important and challenging tasks in bioinformatics, and it requires the prediction of the complete gene structure. Identification of splice sites is the core component of eukaryotic gene finding algorithms. Their success depends on the precise identification of the exon-intron structure and the splice sites. Most eukaryotic protein coding genes are characterized by exons and introns. Exons are the protein coding portions of a gene, and they are separated by intervening sequences called introns. The border between an