A variant by any name: quantifying annotation discordance across tools and clinical databases

Overview of attention for article published in Genome Medicine, January 2017

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (94th percentile)
  • Good Attention Score compared to outputs of the same age and source (68th percentile)

Mentioned by

  • 1 news outlet
  • 39 X users
  • 1 Facebook page

Citations

  • 62 Dimensions

Readers on

  • 135 Mendeley
Title
A variant by any name: quantifying annotation discordance across tools and clinical databases
Published in
Genome Medicine, January 2017
DOI 10.1186/s13073-016-0396-7
Pubmed ID
Authors

Jennifer L. Yen, Sarah Garcia, Aldrin Montana, Jason Harris, Stephen Chervitz, Massimo Morra, John West, Richard Chen, Deanna M. Church

Abstract

Clinical genomic testing is dependent on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is the cross-identification of variants called on their genomic position with resources that rely on transcript- or protein-based descriptions. We evaluated the accuracy of three tools (SnpEff, Variant Effect Predictor, and Variation Reporter) that generate transcript- and protein-based variant nomenclature from genomic coordinates according to guidelines of the Human Genome Variation Society (HGVS). Our evaluation was based on transcript-controlled comparisons to a manually curated set of 126 test variants of various types drawn from data sources, each with HGVS-compliant transcript and protein descriptors. We further evaluated the concordance between annotations generated by SnpEff and Variant Effect Predictor and those in major germline and cancer databases: ClinVar and COSMIC, respectively. We find that there is substantial discordance between the annotation tools and databases in the description of insertions and/or deletions. Using our ground-truth set of variants, constructed specifically to identify challenging events, accuracy was between 80 and 90% for coding changes and between 50 and 70% for protein changes across the 114 to 126 variants evaluated. Exact concordance for SNV syntax between ClinVar and each of Variant Effect Predictor and SnpEff was over 99.5%, but less than 90% for non-SNV variants. For COSMIC, exact concordance for coding and protein SNVs was between 65 and 88%, and less than 15% for insertions. Across the tools and datasets, there was a wide range of different but equivalent expressions describing protein variants. Our results reveal significant inconsistency in variant representation across tools and databases. While some of these syntax differences may be clear to a clinician, they can confound variant matching, an important step in variant classification. These results highlight the urgent need for the adoption of and adherence to uniform standards in variant annotation, with consistent reporting on the genomic reference, to enable accurate and efficient data-driven clinical care.
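
Much of the discordance described above is syntactic: the same protein change can be written in several equivalent ways under HGVS, for example three-letter versus one-letter residue codes, with or without the parentheses that mark a predicted consequence. The following is a minimal Python sketch of that point, not the paper's method; the variant strings, helper name, and normalization rules are illustrative assumptions.

```python
# Minimal sketch (not from the paper): why exact string comparison of HGVS
# protein descriptors under-counts matches, and how a crude normalization
# recovers them. The example strings and rules are illustrative assumptions.

# Standard three-letter to one-letter amino acid codes (stop codon as '*').
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

def normalize_protein_hgvs(desc: str) -> str:
    """Strip the 'p.' prefix and prediction parentheses, then convert
    three-letter residue codes to one-letter codes."""
    desc = desc.strip()
    if desc.startswith("p."):
        desc = desc[2:]
    desc = desc.strip("()")
    for three, one in AA3_TO_1.items():
        desc = desc.replace(three, one)
    return desc

# Two tools can emit different but equivalent descriptions of the same frameshift:
a = "p.(Lys76Asnfs*5)"   # three-letter codes, predicted-consequence parentheses
b = "p.K76Nfs*5"         # one-letter codes, no parentheses

print(a == b)                                                   # False: exact match fails
print(normalize_protein_hgvs(a) == normalize_protein_hgvs(b))   # True after normalization
```

Exact matching fails on equivalent spellings, while even a crude normalization recovers the match; real matching pipelines must additionally reconcile transcript versions and left- versus right-aligned insertions and deletions, which simple string normalization does not address.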

X Demographics

The data shown below were collected from the profiles of 39 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 135 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
United States        1    <1%
Sweden               1    <1%
Unknown            133    99%

Demographic breakdown

Readers by professional status    Count   As %
Researcher                           38    28%
Student > Master                     19    14%
Other                                15    11%
Student > Ph. D. Student             13    10%
Student > Bachelor                    9     7%
Other                                18    13%
Unknown                              23    17%

Readers by discipline                            Count   As %
Biochemistry, Genetics and Molecular Biology        50    37%
Agricultural and Biological Sciences                29    21%
Computer Science                                    15    11%
Medicine and Dentistry                               9     7%
Neuroscience                                         3     2%
Other                                                5     4%
Unknown                                             24    18%
Attention Score in Context

This research output has an Altmetric Attention Score of 33. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 07 December 2017.
All research outputs: #1,149,309 of 24,598,501 outputs
Outputs from Genome Medicine: #235 of 1,517 outputs
Outputs of similar age: #25,060 of 427,815 outputs
Outputs of similar age from Genome Medicine: #11 of 32 outputs
Altmetric has tracked 24,598,501 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 95th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 1,517 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 27.2. This one has done well, scoring higher than 84% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 427,815 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 94% of its contemporaries.
We're also able to compare this research output to 32 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 68% of its contemporaries.
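
As a quick cross-check of the percentiles quoted above against the ranks listed for each context, the sketch below recomputes them in Python. The formula (total - rank + 1) / total, floored to a whole percent, happens to reproduce this page's figures, but it is an assumed convention rather than Altmetric's published calculation.

```python
import math

# Hedged sanity check: recompute the percentiles from the ranks and totals
# listed above. The formula is an assumption about Altmetric's convention,
# not an official calculation.

contexts = {
    "All research outputs":                        (1_149_309, 24_598_501),
    "Outputs from Genome Medicine":                (235, 1_517),
    "Outputs of similar age":                      (25_060, 427_815),
    "Outputs of similar age from Genome Medicine": (11, 32),
}

for label, (rank, total) in contexts.items():
    percentile = math.floor(100 * (total - rank + 1) / total)
    print(f"{label}: #{rank:,} of {total:,} -> {percentile}th percentile")
    # -> 95, 84, 94, and 68, matching the percentiles quoted on this page
```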