Title | Clustering of reads with alignment-free measures and quality values |
---|---|
Published in | Algorithms for Molecular Biology, January 2015 |
DOI | 10.1186/s13015-014-0029-x |
Pubmed ID | |
Authors | Matteo Comin, Andrea Leoni, Michele Schimd |
Abstract | The data volume generated by Next-Generation Sequencing (NGS) technologies is growing at a pace that is now challenging the storage and data processing capacities of modern computer systems. In this context, an important aspect is the reduction of data complexity by collapsing redundant reads into a single cluster to improve the run time, memory requirements, and quality of post-processing steps like assembly and error correction. Several alignment-free measures, based on k-mer counts, have been used to cluster reads. Quality scores produced by NGS platforms are fundamental for various analyses of NGS data, such as read mapping and error detection. Moreover, future-generation sequencing platforms will produce long reads but with a large number of erroneous bases (up to 15%). |
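
The abstract refers to alignment-free measures built on k-mer counts and to per-base quality scores. The sketch below is only an illustration of those two ingredients, not the measure defined in the paper: the function names, the choice of cosine similarity, and the idea of weighting each k-mer occurrence by the product of its bases' probabilities of being correct are all assumptions made for this example. The only standard fact used is the Phred convention, where a quality value Q corresponds to an error probability of 10^(-Q/10).

```python
from collections import Counter
from math import sqrt


def kmer_counts(read: str, k: int) -> Counter:
    """Count every overlapping k-mer in a read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def phred_to_prob_correct(q: int) -> float:
    """Phred convention: error probability is 10^(-Q/10)."""
    return 1.0 - 10 ** (-q / 10)


def weighted_kmer_counts(read: str, quals: list, k: int) -> dict:
    """Hypothetical quality-aware variant (an assumption for illustration):
    each k-mer occurrence contributes the product of the probabilities
    that its bases were called correctly, instead of contributing 1."""
    counts = {}
    for i in range(len(read) - k + 1):
        weight = 1.0
        for q in quals[i:i + k]:
            weight *= phred_to_prob_correct(q)
        kmer = read[i:i + k]
        counts[kmer] = counts.get(kmer, 0.0) + weight
    return counts


def cosine_similarity(a: dict, b: dict) -> float:
    """Alignment-free similarity between two k-mer count profiles."""
    dot = sum(value * b.get(kmer, 0) for kmer, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    r1 = kmer_counts("ACGTACG", 3)
    r2 = kmer_counts("ACGTACC", 3)
    print(cosine_similarity(r1, r2))
```

Reads whose profiles score close to 1 share most of their k-mers and would be candidates for the same cluster. In the weighted variant, low-quality bases shrink the contribution of the k-mers that contain them.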
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 2 | 100% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 1 | 50% |
Scientists | 1 | 50% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
France | 2 | 4% |
Czechia | 1 | 2% |
United States | 1 | 2% |
Unknown | 52 | 93% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 17 | 30% |
Student > Ph. D. Student | 13 | 23% |
Student > Master | 8 | 14% |
Professor > Associate Professor | 3 | 5% |
Professor | 3 | 5% |
Other | 6 | 11% |
Unknown | 6 | 11% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 23 | 41% |
Agricultural and Biological Sciences | 15 | 27% |
Biochemistry, Genetics and Molecular Biology | 5 | 9% |
Immunology and Microbiology | 1 | 2% |
Social Sciences | 1 | 2% |
Other | 2 | 4% |
Unknown | 9 | 16% |