PGen: large-scale genomic variations analysis workflow and browser in SoyKB

Overview of attention for article published in BMC Bioinformatics, October 2016


Readers on Mendeley: 54


Title	PGen: large-scale genomic variations analysis workflow and browser in SoyKB
Published in	BMC Bioinformatics, October 2016
DOI	10.1186/s12859-016-1227-y
Pubmed ID	27766951
Authors	Yang Liu, Saad M. Khan, Juexin Wang, Mats Rynge, Yuanxun Zhang, Shuai Zeng, Shiyuan Chen, Joao V. Maldonado dos Santos, Babu Valliyodan, Prasad P. Calyam, Nirav Merchant, Henry T. Nguyen, Dong Xu, Trupti Joshi
Abstract	With advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of crop germplasm, detect genome-scale genetic variations, and apply that knowledge to trait improvement. To facilitate large-scale analysis of genomic variations from NGS resequencing data, we have developed "PGen", an integrated and optimized workflow that uses the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources, and the Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), annotate SNPs, and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version on GitHub (https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 SNPs and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. The same analysis identified 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions. SNPs identified using PGen from additional soybean resequencing projects, totaling more than 500 soybean germplasm lines, have also been integrated. These SNPs are being used for trait improvement through genotype-to-phenotype prediction approaches developed in-house. To make NGS data easy to browse and access, we have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB that gives soybean researchers easy access to SNPs and downstream analysis results.
The PGen workflow has been optimized, through thorough testing and validation, for efficient analysis of soybean data. This research serves as an example of best practices for developing genomics data analysis workflows that integrate remote HPC resources and efficient data management while remaining easy to use for biologists. The PGen workflow can also be easily customized to analyze data from other species.


Mendeley readers

The data shown below were compiled from readership statistics for the 54 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	54	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	13	24%
Researcher	12	22%
Student > Master	8	15%
Student > Bachelor	3	6%
Student > Doctoral Student	3	6%
Other	6	11%
Unknown	9	17%

Readers by discipline	Count	As %
Agricultural and Biological Sciences	17	31%
Computer Science	16	30%
Biochemistry, Genetics and Molecular Biology	5	9%
Engineering	3	6%
Business, Management and Accounting	1	2%
Other	2	4%
Unknown	10	19%