Alternate-locus aware variant calling in whole genome sequencing

Overview of attention for article published in Genome Medicine, December 2016

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (88th percentile)
  • Above-average Attention Score compared to outputs of the same age and source (56th percentile)

Mentioned by

29 X users

Citations

17 Dimensions

Readers on

57 Mendeley
Published in
Genome Medicine, December 2016
DOI 10.1186/s13073-016-0383-z
Authors

Marten Jäger, Max Schubach, Tomasz Zemojtel, Knut Reinert, Deanna M. Church, Peter N. Robinson

Abstract

The last two human genome assemblies have extended the previous linear golden-path paradigm of the human genome to a graph-like model to better represent regions with a high degree of structural variability. The new model offers opportunities to improve the technical validity of variant calling in whole-genome sequencing (WGS). We developed an algorithm that analyzes the patterns of variant calls in the 178 structurally variable regions of the GRCh38 genome assembly, and infers whether a given sample is most likely to contain sequences from the primary assembly, an alternate locus, or their heterozygous combination at each of these 178 regions. We investigate 121 in-house WGS datasets that have been aligned to the GRCh37 and GRCh38 assemblies. We show that stretches of sequences that are largely but not entirely identical between the primary assembly and an alternate locus can result in multiple variant calls against regions of the primary assembly. In WGS analysis, this results in characteristic and recognizable patterns of variant calls at positions that we term alignable scaffold-discrepant positions (ASDPs). In 121 in-house genomes, on average 51.8±3.8 of the 178 regions were found to correspond best to an alternate locus rather than the primary assembly sequence, and filtering these genomes with our algorithm led to the identification of 7863 variant calls per genome that colocalized with ASDPs. Additionally, we found that 437 of 791 genome-wide association study hits located within one of the regions corresponded to ASDPs. Our algorithm uses the information contained in the 178 structurally variable regions of the GRCh38 genome assembly to avoid spurious variant calls in cases where samples contain an alternate locus rather than the corresponding segment of the primary assembly. 
These results suggest the great potential of fully incorporating the resources of graph-like genome assemblies into variant calling, but also underscore the importance of developing computational resources that will allow a full reconstruction of the genotype in personal genomes. Our algorithm is freely available at https://github.com/charite/asdpex.
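
The inference step described in the abstract can be illustrated with a minimal sketch. This is not the asdpex implementation: the function names, the genotype encoding (0, 1, 2 alternate-allele copies), and the 0.8 threshold are all hypothetical choices made for illustration. The idea is only that a region whose ASDPs are mostly covered by homozygous calls looks like an alternate locus, one mostly covered by heterozygous calls looks like a heterozygous combination, and otherwise the primary assembly fits best.

```python
from typing import Dict, Iterable

# Hypothetical genotype encoding: number of non-reference allele copies
# called at a position (1 = heterozygous, 2 = homozygous variant).
HET, HOM = 1, 2


def classify_region(asdp_positions: Iterable[int],
                    calls: Dict[int, int],
                    threshold: float = 0.8) -> str:
    """Guess which sequence a sample carries in one structurally variable region.

    asdp_positions: positions where primary assembly and alternate locus differ.
    calls: observed variant calls in the sample, position -> genotype.
    threshold: assumed minimum fraction of ASDPs that must show the pattern.
    """
    positions = list(asdp_positions)
    if not positions:
        return "primary"
    hom = sum(1 for p in positions if calls.get(p) == HOM)
    het = sum(1 for p in positions if calls.get(p) == HET)
    if hom / len(positions) >= threshold:
        return "alternate"
    if het / len(positions) >= threshold:
        return "heterozygous"
    return "primary"


def filter_calls(asdp_positions: Iterable[int],
                 calls: Dict[int, int],
                 label: str) -> Dict[int, int]:
    """Drop calls that colocalize with ASDPs when the region is not 'primary'."""
    if label == "primary":
        return dict(calls)
    asdps = set(asdp_positions)
    return {pos: gt for pos, gt in calls.items() if pos not in asdps}


# Toy data, invented for this example.
asdps = [100, 200, 300, 400, 500]
sample_calls = {100: HOM, 200: HOM, 300: HOM, 400: HOM, 500: HOM, 250: HET}
label = classify_region(asdps, sample_calls)
kept = filter_calls(asdps, sample_calls, label)
print(label, kept)
```

In this toy case every ASDP carries a homozygous call, so the region is labelled as an alternate locus and only the call at position 250, which is not an ASDP, is retained.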

X Demographics

The data shown below were collected from the profiles of 29 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 57 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Italy      1        2%
Norway     1        2%
Unknown    55       96%

Demographic breakdown

Readers by professional status    Count    As %
Researcher                        19       33%
Student > Ph. D. Student          8        14%
Student > Master                  6        11%
Student > Bachelor                4        7%
Student > Doctoral Student        3        5%
Other                             10       18%
Unknown                           7        12%

Readers by discipline                           Count    As %
Agricultural and Biological Sciences            25       44%
Biochemistry, Genetics and Molecular Biology    14       25%
Computer Science                                3        5%
Medicine and Dentistry                          2        4%
Arts and Humanities                             1        2%
Other                                           1        2%
Unknown                                         11       19%

Attention Score in Context

This research output has an Altmetric Attention Score of 14. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 02 February 2017.
All research outputs: #2,429,299 of 24,673,288 outputs
Outputs from Genome Medicine: #550 of 1,518 outputs
Outputs of similar age: #47,769 of 430,587 outputs
Outputs of similar age from Genome Medicine: #15 of 32 outputs
Altmetric has tracked 24,673,288 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 90th percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 1,518 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 27.2. This one has gotten more attention than average, scoring higher than 63% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 430,587 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 88% of its contemporaries.
We're also able to compare this research output to 32 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 56% of its contemporaries.
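
The percentile figures quoted above can be related to the rank counts with a short calculation. The page does not state how Altmetric derives its percentiles, so the convention used here (counting the output itself among those it scores at least as high as, then rounding down) is an assumption; it was chosen because it is consistent with the four rank and total pairs listed on this page. The function name is hypothetical.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Whole-number percentile for an output ranked `rank` out of `total`.

    Assumed convention: share of outputs ranked at or below this one,
    rounded down. Altmetric's actual method is not given on the page.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100 * (total - rank + 1) // total


# Rank and total pairs copied from the page.
contexts = {
    "All research outputs": (2_429_299, 24_673_288),
    "Outputs from Genome Medicine": (550, 1_518),
    "Outputs of similar age": (47_769, 430_587),
    "Outputs of similar age from Genome Medicine": (15, 32),
}

for label, (rank, total) in contexts.items():
    print(f"{label}: percentile {percentile_from_rank(rank, total)}")
```

Under this assumption the four contexts come out at 90, 63, 88, and 56, matching the percentages stated in the text.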