Title | GenAp: a distributed SQL interface for genomic data |
---|---|
Published in | BMC Bioinformatics, February 2016 |
DOI | 10.1186/s12859-016-0904-1 |
Pubmed ID | |
Authors | Christos Kozanitis, David A. Patterson |
Abstract | The impressively low cost and improved quality of genome sequencing give researchers of genetic diseases, such as cancer, a powerful tool to better understand the underlying genetic mechanisms of those diseases and treat them with effective targeted therapies. Thus, a number of projects today sequence the DNA of large patient populations, each of which produces at least hundreds of terabytes of data. The challenge now is to provide the produced data on demand to interested parties. In this paper, we show that the response to this challenge is a modified version of Spark SQL, a distributed SQL execution engine, that efficiently handles joins that use genomic intervals as keys. With this modification, Spark SQL serves such joins more than 50× faster than its existing brute-force approach and 8× faster than similar distributed implementations. Thus, Spark SQL can replace existing practices to retrieve genomic data and, as we show, allow users to reduce the number of lines of software code that need to be developed to query such data by an order of magnitude. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 3 | 15% |
Spain | 3 | 15% |
Russia | 1 | 5% |
France | 1 | 5% |
Australia | 1 | 5% |
India | 1 | 5% |
United Kingdom | 1 | 5% |
Italy | 1 | 5% |
Unknown | 8 | 40% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 11 | 55% |
Scientists | 8 | 40% |
Practitioners (doctors, other healthcare professionals) | 1 | 5% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Spain | 1 | 2% |
United States | 1 | 2% |
Netherlands | 1 | 2% |
Ukraine | 1 | 2% |
Unknown | 49 | 92% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 14 | 26% |
Researcher | 13 | 25% |
Student > Master | 9 | 17% |
Other | 4 | 8% |
Student > Doctoral Student | 2 | 4% |
Other | 5 | 9% |
Unknown | 6 | 11% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 22 | 42% |
Agricultural and Biological Sciences | 13 | 25% |
Biochemistry, Genetics and Molecular Biology | 7 | 13% |
Immunology and Microbiology | 2 | 4% |
Engineering | 2 | 4% |
Other | 2 | 4% |
Unknown | 5 | 9% |