Title | ReCoil - an algorithm for compression of extremely large datasets of DNA data |
---|---|
Published in | Algorithms for Molecular Biology, October 2011 |
DOI | 10.1186/1748-7188-6-23 |
Pubmed ID | |
Authors | Vladimir Yanovsky |
Abstract | The growing volume of generated DNA sequencing data makes the problem of its long-term storage increasingly important. In this work we present ReCoil, an I/O-efficient external memory algorithm designed for compression of very large collections of short-read DNA data. Typically, each position of a DNA sequence is covered by multiple reads of a short-read dataset, and our algorithm makes use of the resulting redundancy to achieve a high compression rate. While compression based on encoding mismatches between the dataset and a similar reference can yield a high compression rate, a good-quality reference sequence may be unavailable. Instead, ReCoil's compression is based on encoding the differences between similar or overlapping reads. As such reads may appear at large distances from each other in the dataset, and since random access memory is a limited resource, ReCoil is designed to work efficiently in external memory, leveraging the high bandwidth of modern hard disk drives. |
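
The abstract's central idea, storing a read as its differences from a similar read rather than in full, can be illustrated with a minimal sketch. This is not ReCoil's actual encoding. The function names `encode_against` and `decode_against` are hypothetical, and the sketch assumes two reads of equal length that are already aligned. It ignores overlaps at an offset, the search for similar reads, and the external memory design that the paper is about.

```python
from typing import List, Tuple

# One difference: (position in the read, base found in the encoded read).
Diff = Tuple[int, str]


def encode_against(base: str, read: str) -> List[Diff]:
    """Record only the positions where `read` differs from `base`.

    Assumption for this sketch: both reads have the same length and are
    already aligned position by position.
    """
    if len(base) != len(read):
        raise ValueError("this sketch assumes equal-length reads")
    return [(i, r) for i, (b, r) in enumerate(zip(base, read)) if b != r]


def decode_against(base: str, diffs: List[Diff]) -> str:
    """Rebuild the encoded read by applying the differences to `base`."""
    out = list(base)
    for pos, symbol in diffs:
        out[pos] = symbol
    return "".join(out)


base_read = "ACGTACGTAC"
similar_read = "ACGTTCGTAA"

diffs = encode_against(base_read, similar_read)
# The round trip must reproduce the original read exactly.
assert decode_against(base_read, diffs) == similar_read
```

When two reads are highly similar, the list of differences is much shorter than the read itself, which is the redundancy the abstract refers to.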
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 1 | 100% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 1 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 4 | 10% |
Germany | 2 | 5% |
France | 2 | 5% |
Sweden | 1 | 2% |
Portugal | 1 | 2% |
Unknown | 32 | 76% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 18 | 43% |
Student > Ph. D. Student | 8 | 19% |
Student > Bachelor | 4 | 10% |
Student > Master | 4 | 10% |
Professor | 2 | 5% |
Other | 4 | 10% |
Unknown | 2 | 5% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 17 | 40% |
Agricultural and Biological Sciences | 16 | 38% |
Biochemistry, Genetics and Molecular Biology | 2 | 5% |
Engineering | 2 | 5% |
Medicine and Dentistry | 1 | 2% |
Other | 1 | 2% |
Unknown | 3 | 7% |