
A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy

Overview of attention for article published in BMC Bioinformatics, December 2017

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • Good Attention Score compared to outputs of the same age (76th percentile)
  • Good Attention Score compared to outputs of the same age and source (77th percentile)

Mentioned by

10 X users

Citations

61 (Dimensions)

Readers on Mendeley

238
Published in
BMC Bioinformatics, December 2017
DOI 10.1186/s12859-017-2000-6
Pubmed ID
Authors

Daniel P. Wickland, Gopal Battu, Karen A. Hudson, Brian W. Diers, Matthew E. Hudson

Abstract

Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools. We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed. We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required. In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain. 
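As a rough illustration of how GBS reduces genome complexity, the following Python sketch performs an in silico digest: it finds every recognition site, keeps fragments inside a size window, and returns the bases a short-read run would sequence from each fragment end. The abstract does not name an enzyme or a size range, so the ApeKI-style site G^CWGC (W = A or T), the size limits, the toy sequence and all function names (`cut_positions`, `fragments`, `end_reads`) are assumptions for illustration; real library preparation (adapters, barcodes, overhang chemistry) is ignored.

```python
import re
from typing import List, Tuple

# ASSUMPTION: an ApeKI-style recognition site G^CWGC (W = A or T).
# The zero-width lookahead lets overlapping sites all be reported.
SITE = re.compile(r"(?=GC[AT]GC)")
CUT_OFFSET = 1  # top-strand cut falls between the leading G and the C

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cut_positions(seq: str) -> List[int]:
    """Top-strand cut coordinates for every recognition site in seq."""
    return [m.start() + CUT_OFFSET for m in SITE.finditer(seq.upper())]


def fragments(seq: str, min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) fragments bounded by cuts on both sides,
    kept only if min_len <= length <= max_len (a size-selection step)."""
    cuts = cut_positions(seq)
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if min_len <= b - a <= max_len]


def end_reads(seq: str, read_len: int, min_len: int, max_len: int) -> List[str]:
    """Simulate sequencing both ends of each retained fragment: read_len bases
    from the left end, plus the reverse complement of the last read_len bases."""
    seq = seq.upper()
    reads = []
    for a, b in fragments(seq, min_len, max_len):
        frag = seq[a:b]
        reads.append(frag[:read_len])
        reads.append(revcomp(frag[-read_len:]))
    return reads


# Toy "genome" (hypothetical) with three sites; only one fragment passes size selection.
genome = "AA" + "GCAGC" + "T" * 10 + "GCTGC" + "AAA" + "GCAGC" + "CC"
print(end_reads(genome, read_len=4, min_len=10, max_len=400))  # ['CAGC', 'CAAA']
```

Only sequence next to retained cut sites is read, so the rest of the genome is never sampled; at low coverage, even retained fragment ends may receive no reads in some samples, which contributes to the missing data the abstract mentions.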
While GB-eaSy outperformed other individual methods on the datasets analyzed, our findings suggest that a comprehensive approach integrating the results from multiple GBS bioinformatics pipelines may be the optimal strategy to obtain the largest, most highly accurate SNP yield possible from low-coverage polyploid sequence data.
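The abstract suggests combining results from several pipelines because their SNP call sets overlap little while each is largely accurate. It does not say how to do this; one simple possibility, sketched here in Python as an assumption, is to key each SNP by chromosome, position, reference allele and alternate allele, count how many pipelines report it, and measure pairwise convergence with the Jaccard index. The pipeline names, chromosome labels and calls are toy values, not data from the study, and exact-key matching assumes every pipeline reports coordinates on the same reference assembly.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple

# A SNP call keyed by (chromosome, position, ref allele, alt allele).
Snp = Tuple[str, int, str, str]


def support_counts(calls: Dict[str, Set[Snp]]) -> Counter:
    """Number of pipelines that reported each SNP."""
    counts: Counter = Counter()
    for snps in calls.values():
        counts.update(snps)
    return counts


def merged_calls(calls: Dict[str, Set[Snp]], min_support: int = 1) -> Set[Snp]:
    """Union of all call sets, keeping SNPs reported by >= min_support pipelines."""
    return {snp for snp, n in support_counts(calls).items() if n >= min_support}


def pairwise_jaccard(calls: Dict[str, Set[Snp]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index |A & B| / |A | B| for every pair of pipelines,
    used here as a simple measure of convergence between call sets."""
    out: Dict[Tuple[str, str], float] = {}
    for (name_a, a), (name_b, b) in combinations(sorted(calls.items()), 2):
        union = a | b
        out[(name_a, name_b)] = len(a & b) / len(union) if union else 1.0
    return out


# Hypothetical toy call sets from three unnamed pipelines.
calls = {
    "pipeline_A": {("Gm01", 100, "A", "G"), ("Gm01", 250, "C", "T"), ("Gm02", 40, "G", "A")},
    "pipeline_B": {("Gm01", 100, "A", "G"), ("Gm03", 75, "T", "C")},
    "pipeline_C": {("Gm01", 100, "A", "G"), ("Gm01", 250, "C", "T")},
}
print(len(merged_calls(calls)))  # 4
print(pairwise_jaccard(calls)[("pipeline_A", "pipeline_B")])  # 0.25
```

Taking the plain union (`min_support=1`) maximizes yield, while raising `min_support` trades yield for agreement between tools; which threshold is appropriate would need checking against an independent truth set, such as the whole-genome sequencing comparison the abstract describes.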

X Demographics

The data were collected from the profiles of 10 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 238 Mendeley readers of this research output.

Geographical breakdown

Country  Count  As %
Unknown    238  100%

Demographic breakdown

Readers by professional status  Count  As %
Student > Ph. D. Student           52   22%
Researcher                         40   17%
Student > Master                   35   15%
Student > Bachelor                 21    9%
Student > Doctoral Student          8    3%
Other                              20    8%
Unknown                            62   26%

Readers by discipline                         Count  As %
Agricultural and Biological Sciences            106   45%
Biochemistry, Genetics and Molecular Biology     44   18%
Environmental Science                             6    3%
Computer Science                                  3    1%
Chemistry                                         2   <1%
Other                                            10    4%
Unknown                                          67   28%
Attention Score in Context

This research output has an Altmetric Attention Score of 7. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 18 December 2019.
All research outputs: #5,188,610 of 25,552,205 outputs
Outputs from BMC Bioinformatics: #1,794 of 7,718 outputs
Outputs of similar age: #103,400 of 450,040 outputs
Outputs of similar age from BMC Bioinformatics: #33 of 141 outputs
Altmetric has tracked 25,552,205 research outputs across all sources so far. Compared with these, this one has done well and is in the 79th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,718 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done well, scoring higher than 76% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 450,040 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 76% of its contemporaries.
We're also able to compare this research output to 141 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 77% of its contemporaries.