Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms

Overview of attention for article published in BMC Bioinformatics, February 2015

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • Good Attention Score compared to outputs of the same age (79th percentile)
  • High Attention Score compared to outputs of the same age and source (82nd percentile)

Mentioned by

8 X users
1 Wikipedia page

Citations

8 Dimensions

Readers on

39 Mendeley
1 CiteULike
Title: Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms
Published in: BMC Bioinformatics, February 2015
DOI: 10.1186/s12859-015-0492-5
Authors: Stanley Kimbung Mbandi, Uljana Hesse, Peter van Heusden, Alan Christoffels

Abstract

De novo transcriptome assembly of short transcribed fragments (transfrags) produced from sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrag annotations and propagation of mis-assemblies. Here, we propose a structured framework that consists of a few steps in a pipeline architecture for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines 1) removal of identical subsequences, 2) error-tolerant CDS prediction, 3) identification of coding potential, and 4) complementation of BLAST with a multiple domain architecture annotation that reduces non-specific domain annotation.
We demonstrate that, independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is inferred on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers, in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags than the multiple k-mer assemblies. However, Trinity identified a number of predicted coding sequences and gene loci comparable to the Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4% and 6% are lost when orphan transfrags are excluded, and this represents only a tiny fraction of the annotation derived from functional transference by sequence similarity. The median length of bona fide transfrags ranged from 1.5 kb (Trinity) to 2 kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33% to 50%, which is also high for domain-based annotation. We showed that unselected transfrags were mostly truncated and represent sequences from intronic, untranslated (5' and 3') regions and non-coding gene loci.
IFRAT simplifies post-assembly processing, providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organisms.
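
The first step named in the abstract, removal of identical subsequences, can be illustrated with a minimal sketch. This is not the IFRAT implementation: the function name `remove_contained`, the toy sequences, and the choice to ignore reverse complements are all assumptions made for illustration only.

```python
def remove_contained(seqs):
    """Drop sequences that are exact substrings of a kept sequence.

    `seqs` maps a (hypothetical) transfrag name to its nucleotide string.
    Illustration only: reverse complements are not considered, and the
    pairwise scan is quadratic, so this is not suited to real assemblies.
    """
    # Longest first, so any container is examined before what it contains.
    # sorted() is stable, so of two identical sequences the first one wins.
    ordered = sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True)
    kept = {}
    for name, seq in ordered:
        if any(seq in longer for longer in kept.values()):
            continue  # identical to, or contained in, a kept sequence
        kept[name] = seq
    return kept


# Toy input (made-up sequences, not data from the article)
transfrags = {
    "t1": "ATGGCCAAGTTGA",
    "t2": "GCCAAG",          # contained in t1
    "t3": "ATGGCCAAGTTGA",   # exact duplicate of t1
    "t4": "TTTTCCC",
}
print(sorted(remove_contained(transfrags)))
```

In this toy case only `t1` and `t4` survive, since `t2` and `t3` carry no sequence that `t1` does not already contain.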

X Demographics

Demographic data were collected from the profiles of 8 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 39 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Spain 2 5%
Netherlands 1 3%
Argentina 1 3%
Unknown 35 90%

Demographic breakdown

Readers by professional status Count As %
Researcher 15 38%
Student > Ph. D. Student 9 23%
Student > Master 3 8%
Student > Doctoral Student 2 5%
Student > Bachelor 2 5%
Other 4 10%
Unknown 4 10%
Readers by discipline Count As %
Agricultural and Biological Sciences 14 36%
Biochemistry, Genetics and Molecular Biology 8 21%
Computer Science 4 10%
Engineering 3 8%
Immunology and Microbiology 2 5%
Other 2 5%
Unknown 6 15%
Attention Score in Context

This research output has an Altmetric Attention Score of 8. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 29 August 2022.
All research outputs: #4,503,295 of 24,350,163 outputs
Outputs from BMC Bioinformatics: #1,657 of 7,519 outputs
Outputs of similar age: #51,623 of 259,155 outputs
Outputs of similar age from BMC Bioinformatics: #24 of 136 outputs
Altmetric has tracked 24,350,163 research outputs across all sources so far. Compared to these, this one has done well and is in the 81st percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,519 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done well, scoring higher than 77% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 259,155 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 79% of its contemporaries.
We're also able to compare this research output to 136 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 82% of its contemporaries.
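
The percentile statements above can be sanity-checked against the rank figures with simple arithmetic. The sketch below assumes that "scoring higher than X%" is approximately the share of outputs ranked below this one; that formula and the helper name `percent_outranked` are assumptions, not Altmetric's documented method, and tie handling or rounding on the page may differ.

```python
def percent_outranked(rank, total):
    """Approximate share (in percent) of outputs ranked below `rank` of `total`."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * (total - rank) / total


# Rank and total for each comparison group, as listed on this page
contexts = {
    "All research outputs": (4_503_295, 24_350_163),
    "Outputs from BMC Bioinformatics": (1_657, 7_519),
    "Outputs of similar age": (51_623, 259_155),
    "Outputs of similar age from BMC Bioinformatics": (24, 136),
}

for label, (rank, total) in contexts.items():
    print(f"{label}: {percent_outranked(rank, total):.1f}%")
```

Under this assumption the four groups come out at roughly 81.5%, 78.0%, 80.1% and 82.4%, each within about one point of the 81st percentile, 77%, 79% and 82% stated on the page.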