Title | TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data |
---|---|
Published in | BMC Medical Genomics, September 2018 |
DOI | 10.1186/s12920-018-0402-6 |
Pubmed ID | |
Authors | Readman Chiu, Ka Ming Nip, Justin Chu, Inanc Birol |
Abstract | RNA-seq is a powerful and cost-effective technology for molecular diagnostics of cancer and other diseases, and it can reach its full potential when coupled with validated clinical-grade informatics tools. Despite recent advances in long-read sequencing, transcriptome assembly of short reads remains a useful and cost-effective methodology for unveiling transcript-level rearrangements and novel isoforms. One of the major concerns for adopting the proven de novo assembly approach for RNA-seq data in clinical settings has been the analysis turnaround time. To address this concern, we have developed a targeted approach to expedite assembly and analysis of RNA-seq data. Here we present our Targeted Assembly Pipeline (TAP), which consists of four stages: 1) alignment-free gene-level classification of RNA-seq reads using BioBloomTools, 2) de novo assembly of individual targets using Trans-ABySS, 3) alignment of assembled contigs to the reference genome and transcriptome with GMAP and BWA, and 4) structural and splicing variant detection using PAVFinder. We show that PAVFinder is a robust gene fusion detection tool when compared to established methods such as TopHat-Fusion and deFuse on simulated data of 448 events. Using the Leucegene acute myeloid leukemia (AML) RNA-seq data and a set of 580 COSMIC target genes, TAP identified a wide range of hallmark molecular anomalies, including gene fusions, tandem duplications, insertions and deletions, in agreement with published literature results. In the same dataset, TAP also captured AML-specific splicing variants such as skipped exons and novel splice sites reported in studies elsewhere. The running time of TAP on 100-150 million read pairs and a 580-gene set is 1 to 2 hours on a 48-core machine. We demonstrated that TAP is a fast and robust RNA-seq variant detection pipeline that is potentially amenable to clinical applications. TAP is available at http://www.bcgsc.ca/platform/bioinfo/software/pavfinder. |
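
The first stage named in the abstract, alignment-free gene-level classification of reads, can be illustrated with a toy sketch. This is not BioBloomTools or TAP code: plain Python sets stand in for Bloom filters, the gene names and sequences are made up, and the scoring rule (fraction of a read's k-mers found in a gene's k-mer set, with a hypothetical `min_fraction` cutoff) is an assumption chosen only to show the idea of binning reads by target gene before per-gene assembly.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_filters(targets, k):
    """Map each gene name to the k-mer set of its sequence.

    A plain set is used here as a stand-in for a Bloom filter.
    """
    return {gene: kmers(seq, k) for gene, seq in targets.items()}


def classify_read(read, filters, k, min_fraction=0.5):
    """Return the gene sharing the largest fraction of the read's k-mers,
    or None if no gene reaches min_fraction (a hypothetical cutoff)."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best_gene, best_score = None, 0.0
    for gene, gene_kmers in filters.items():
        score = len(read_kmers & gene_kmers) / len(read_kmers)
        if score > best_score:
            best_gene, best_score = gene, score
    return best_gene if best_score >= min_fraction else None


def bin_reads(reads, filters, k):
    """Group reads by assigned gene; unassigned reads are dropped."""
    bins = defaultdict(list)
    for read in reads:
        gene = classify_read(read, filters, k)
        if gene is not None:
            bins[gene].append(read)
    return dict(bins)


# Made-up target sequences and reads, for illustration only.
targets = {
    "GENE_A": "ACGTACGTTAGCCTAGGATC",
    "GENE_B": "TTGACCGGTAACCGTTAGGC",
}
filters = build_filters(targets, k=5)
reads = ["ACGTTAGCCT", "CGGTAACCGT", "GGGGGGGGGG"]
bins = bin_reads(reads, filters, k=5)
print(bins)
```

In this sketch each bin would then be handed to a separate assembly step, which is what lets a targeted approach skip assembling the whole transcriptome.
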
Twitter Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 1 | 50% |
Unknown | 1 | 50% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 2 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 30 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 8 | 27% |
Researcher | 5 | 17% |
Student > Master | 4 | 13% |
Student > Doctoral Student | 2 | 7% |
Student > Bachelor | 2 | 7% |
Other | 3 | 10% |
Unknown | 6 | 20% |
Readers by discipline | Count | As % |
---|---|---|
Biochemistry, Genetics and Molecular Biology | 10 | 33% |
Agricultural and Biological Sciences | 6 | 20% |
Medicine and Dentistry | 4 | 13% |
Unspecified | 1 | 3% |
Immunology and Microbiology | 1 | 3% |
Other | 1 | 3% |
Unknown | 7 | 23% |