Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data

Overview of attention for article published in BMC Genomics, March 2015

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (84th percentile)
  • High Attention Score compared to outputs of the same age and source (87th percentile)

Mentioned by

  • 16 X users
  • 1 Facebook page

Citations

  • 46 Dimensions

Readers on

  • 104 Mendeley
Title
Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data
Published in
BMC Genomics, March 2015
DOI 10.1186/s12864-015-1456-x
Pubmed ID
Authors

Richard J Orton, Caroline F Wright, Marco J Morelli, David J King, David J Paton, Donald P King, Daniel T Haydon

Abstract

RNA viruses have high mutation rates and exist within their hosts as large, complex and heterogeneous populations, comprising a spectrum of related but non-identical genome sequences. Next generation sequencing is revolutionising the study of viral populations by enabling the ultra deep sequencing of their genomes, and the subsequent identification of the full spectrum of variants within the population. Identification of low frequency variants is important for our understanding of mutational dynamics, disease progression, immune pressure, and for the detection of drug resistant or pathogenic mutations. However, the current challenge is to accurately model the errors in the sequence data and distinguish real viral variants, particularly those that exist at low frequency, from errors introduced during sequencing and sample processing, which can both be substantial. We have created a novel set of laboratory control samples that are derived from a plasmid containing a full-length viral genome with extremely limited diversity in the starting population. One sample was sequenced without PCR amplification whilst the other samples were subjected to increasing amounts of RT and PCR amplification prior to ultra-deep sequencing. This enabled the level of error introduced by the RT and PCR processes to be assessed and minimum frequency thresholds to be set for true viral variant identification. We developed a genome-scale computational model of the sample processing and NGS calling process to gain a detailed understanding of the errors at each step, which predicted that RT and PCR errors are more likely to occur at some genomic sites than others. The model can also be used to investigate whether the number of observed mutations at a given site of interest is greater than would be expected from processing errors alone in any NGS data set. 
Given basic sample processing information and the site's coverage and quality scores, the model uses the fitted RT-PCR error distributions to simulate the number of mutations that would be observed from processing errors alone. These data sets and models provide an effective means of separating true viral mutations from those erroneously introduced during sample processing and sequencing.
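
The site-level check described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' model: it assumes every read carries an error independently, with a sequencing error rate taken from a mean Phred quality score and a single hypothetical per-base RT-PCR error rate (rtpcr_error_rate), whereas the paper fits RT-PCR error distributions to control data. The function names and all parameter values are illustrative assumptions.

```python
import random


def phred_to_error_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def simulate_error_counts(coverage, mean_quality, rtpcr_error_rate,
                          n_sims=500, seed=0):
    """Simulate mismatch counts at one site from processing errors alone.

    ASSUMPTION: each read independently carries an error, either from
    sequencing (rate from the mean Phred score) or from RT-PCR (a single
    hypothetical per-base rate). The paper fits RT-PCR error distributions
    instead, so this is a deliberate simplification.
    """
    rng = random.Random(seed)
    p_seq = phred_to_error_prob(mean_quality)
    # Probability that a read shows an error from at least one source.
    p_err = 1 - (1 - p_seq) * (1 - rtpcr_error_rate)
    return [sum(rng.random() < p_err for _ in range(coverage))
            for _ in range(n_sims)]


def empirical_p_value(observed, simulated):
    """Share of simulations with at least `observed` mismatches (add-one corrected)."""
    exceed = sum(count >= observed for count in simulated)
    return (exceed + 1) / (len(simulated) + 1)


if __name__ == "__main__":
    # Hypothetical site: 2000x coverage, mean Q30, assumed RT-PCR rate 5e-4.
    sims = simulate_error_counts(coverage=2000, mean_quality=30,
                                 rtpcr_error_rate=5e-4)
    print(empirical_p_value(observed=25, simulated=sims))
```

A small empirical p-value suggests that the observed mutation count is higher than processing errors alone would produce under these assumptions. Real RT-PCR errors are correlated between reads, because an early-cycle error is copied in later cycles, so an independent-error sketch like this one will tend to understate the spread of error counts.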

X Demographics

The data were collected from the profiles of 16 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 104 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Australia 2 2%
Netherlands 1 <1%
Brazil 1 <1%
Unknown 100 96%

Demographic breakdown

Readers by professional status Count As %
Researcher 29 28%
Student > Ph. D. Student 22 21%
Student > Master 14 13%
Student > Bachelor 5 5%
Student > Doctoral Student 4 4%
Other 10 10%
Unknown 20 19%
Readers by discipline Count As %
Agricultural and Biological Sciences 32 31%
Biochemistry, Genetics and Molecular Biology 24 23%
Immunology and Microbiology 10 10%
Medicine and Dentistry 7 7%
Mathematics 2 2%
Other 6 6%
Unknown 23 22%
Attention Score in Context

This research output has an Altmetric Attention Score of 10. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 19 December 2023.
All research outputs
#3,581,772
of 25,516,314 outputs
Outputs from BMC Genomics
#1,218
of 11,274 outputs
Outputs of similar age
#44,314
of 278,680 outputs
Outputs of similar age from BMC Genomics
#35
of 275 outputs
Altmetric has tracked 25,516,314 research outputs across all sources so far. Compared to these, this one has done well and is in the 85th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 11,274 research outputs from this source. They receive a mean Attention Score of 4.8. This one has done well, scoring higher than 89% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 278,680 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 84% of its contemporaries.
We're also able to compare this research output to 275 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 87% of its contemporaries.
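
The percentiles quoted above are consistent with the ranks listed earlier. The page does not state the formula used, so the following Python sketch assumes the simplest one that reproduces all four figures: the share of outputs ranked below this one, rounded down to a whole percent. The function name percentile_from_rank is hypothetical.

```python
import math


def percentile_from_rank(rank, total):
    """Percent of outputs ranked below `rank` out of `total`, rounded down.

    ASSUMPTION: this rounding rule is inferred from the figures on this
    page, not taken from Altmetric documentation.
    """
    return math.floor(100 * (1 - rank / total))


if __name__ == "__main__":
    comparisons = {
        "All research outputs": (3_581_772, 25_516_314),
        "Outputs from BMC Genomics": (1_218, 11_274),
        "Outputs of similar age": (44_314, 278_680),
        "Outputs of similar age from BMC Genomics": (35, 275),
    }
    for label, (rank, total) in comparisons.items():
        print(label, percentile_from_rank(rank, total))
```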