UMARS: Un-MAppable Reads Solution

biomed - Li Sung-Chou, Chan Wen-Ching, Lai Chun-Hung, Tsai Kuo-Wang, Hsu Chun-Nan, Jou Yuh-Shan, Chen Hua-Chien, Chen Chun-Hong, Lin Wen-Chang

10 pages

English

Description

Background: Un-MAppable Reads Solution (UMARS) is a user-friendly web service for retrieving valuable information from sequence reads that cannot be mapped back to reference genomes. Recently, next-generation sequencing (NGS) technology has emerged as a powerful tool for generating high-throughput sequencing data and has been applied to many kinds of biological research. In a typical analysis, adaptor-trimmed NGS reads are first mapped back to reference sequences, including genomes or transcripts. However, a fraction of NGS reads fail to map back to the reference sequences. Such un-mappable reads are usually attributed to sequencing errors and discarded without further consideration.

Methods: To investigate the possible biological relevance and sources of un-mappable reads, we developed UMARS to scan un-mappable reads for viral genomic fragments and for exon-exon junctions of novel alternative splicing isoforms. To map the un-mappable reads, we first collected viral genomes and exon-exon junction sequences. We then constructed the UMARS pipeline as an automatic alignment interface.

Results: We demonstrate the applicability of UMARS with two alignment cases. First, UMARS detected the expected EBV genomic fragments. Second, it detected exon-exon junctions among un-mappable reads. Further experimental validation confirmed the authenticity of the UMARS pipeline results. The UMARS service is freely available to the academic community and can be accessed via http://musk.ibms.sinica.edu.tw/UMARS/.

Conclusions: In this study, we have shown that some un-mappable reads are not caused by sequencing errors. They can originate from viral infection or transcript splicing. Our UMARS pipeline provides another way to examine and recycle the un-mappable reads that are commonly discarded as garbage.

Information

Published by	biomed
Published on	01 January 2011
Number of reads	19
Language	English
File size	1 MB

Excerpt

Li et al. BMC Bioinformatics 2011, 12(Suppl 1):S9 http://www.biomedcentral.com/1471-2105/12/S1/S9

RESEARCH

Open Access

UMARS: UnMAppable Reads Solution 1,2,3†1,2,4†33 3 4,5 SungChou Li , WenChing Chan , ChunHung Lai , KuoWang Tsai , ChunNan Hsu , YuhShan Jou , 6 7 1,3* HuaChien Chen , ChunHong Chen , Wenchang Lin

From The Ninth Asia Pacific Bioinformatics Conference (APBC 2011), Inchon, Korea. 11-14 January 2011

Abstract Background:UnMAppable Reads Solution (UMARS) is a userfriendly web service focusing on retrieving valuable information from sequence reads that cannot be mapped back to reference genomes. Recently, nextgeneration sequencing (NGS) technology has emerged as a powerful tool for generating highthroughput sequencing data and has been applied to many kinds of biological research. In a typical analysis, adaptortrimmed NGS reads were first mapped back to reference sequences, including genomes or transcripts. However, a fraction of NGS reads failed to be mapped back to the reference sequences. Such unmappable reads are usually imputed to sequencing errors and discarded without further consideration. Methods:We are investigating possible biological relevance and possible sources of unmappable reads. Therefore, we developed UMARS to scan for virus genomic fragments or exonexon junctions of novel alternative splicing isoforms from unmappable reads. For mapping unmappable reads, we first collected viral genomes and sequences of exonexon junctions. Then, we constructed UMARS pipeline as an automatic alignment interface. Results:By demonstrating the results of two UMARS alignment cases, we show the applicability of UMARS. We first showed that the expected EBV genomic fragments can be detected by UMARS. Second, we also detected exonexon junctions from unmappable reads. Further experimental validation also ensured the authenticity of the UMARS pipeline. The UMARS service is freely available to the academic community and can be accessed via http://musk.ibms.sinica.edu.tw/UMARS/. Conclusions:In this study, we have shown that some unmappable reads are not caused by sequencing errors. They can originate from viral infection or transcript splicing. Our UMARS pipeline provides another way to examine and recycle the unmappable reads that are commonly discarded as garbage.

Background

Biomedical research, especially genomic research, has been greatly accelerated by advances in sequencing technologies. Recently, next-generation sequencing (NGS) technology, including the Roche 454, Illumina GA and ABI SOLiD platforms, has emerged as a powerful tool for generating high-throughput sequencing data. Systematic evaluation revealed that these three platforms could possess high sequencing sensitivity because of the large number of reads obtained [1]. Therefore, NGS technology has been applied in many studies, including transcriptome profiling [2-4], SNP identification [5,6], genome sequencing and resequencing [7,8], biomarker detection [9], and metagenomics [10,11]. NGS technology was also applied in miRNA identification and profiling studies. Morin and colleagues identified 104 novel human miRNA genes and made a list of miRNAs differentially expressed between embryo cell libraries [12]. Glazov discovered 449 new chicken miRNAs and 39 mirtrons [13]. In addition, Wheeler not only sequenced miRNAs from several metazoan genomes but also studied the evolutionary status of miRNAs [14]. In a typical analysis pipeline, the generated NGS sequence reads are first subjected to adaptor trimming and then mapped back to reference sequences, including

* Correspondence: wenlin@ibms.sinica.edu.tw
† Contributed equally
1 Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
Full list of author information is available at the end of the article

© 2011 Li et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
