Title | Identification and correction of systematic error in high-throughput sequence data |
---|---|
Published in | BMC Bioinformatics, November 2011 |
DOI | 10.1186/1471-2105-12-451 |
Pubmed ID | |
Authors | Frazer Meacham, Dario Boffelli, Joseph Dhahbi, David IK Martin, Meromit Singer, Lior Pachter |
Abstract | A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. |
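The abstract characterizes this error type as statistically unlikely accumulations of mismatches at specific locations. One simple way to make "statistically unlikely" concrete is a per-site binomial upper-tail test, assuming base-call errors occur independently at a uniform per-base rate. This is an illustrative sketch only, not the method described in the paper. The function names, the `(position, coverage, mismatches)` input format, the 1% error rate, and the Bonferroni correction are all hypothetical choices. Such a test also fires at genuine variants (for example heterozygous sites), so on its own it cannot separate systematic errors from true biological differences.

```python
from math import comb


def upper_tail_binomial(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Probability of seeing at least k mismatching base calls among n reads
    covering a site, if each call is independently wrong with probability p.
    """
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_error_hotspots(sites, error_rate=0.01, alpha=0.01):
    """Return (position, p_value) for sites whose mismatch counts are unlikely
    under a uniform, independent error model (hypothetical helper).

    sites: list of (position, coverage, mismatches) tuples (assumed format).
    alpha is Bonferroni-corrected over the number of sites tested.
    """
    threshold = alpha / max(len(sites), 1)
    flagged = []
    for position, coverage, mismatches in sites:
        p_value = upper_tail_binomial(mismatches, coverage, error_rate)
        if p_value < threshold:
            flagged.append((position, p_value))
    return flagged


if __name__ == "__main__":
    example_sites = [(1001, 50, 1), (1002, 60, 2), (1003, 55, 18)]
    for pos, pval in flag_error_hotspots(example_sites):
        print(pos, pval)
```

With a 1% error rate, one mismatch in 50 reads has P(X ≥ 1) = 1 - 0.99^50, about 0.39, which is unremarkable, whereas 18 mismatches in 55 reads falls far below the corrected threshold and is the only site flagged in the example.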
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 4 | 40% |
United Kingdom | 2 | 20% |
Germany | 1 | 10% |
Spain | 1 | 10% |
Unknown | 2 | 20% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 6 | 60% |
Members of the public | 4 | 40% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 19 | 4% |
United Kingdom | 10 | 2% |
Germany | 8 | 2% |
Spain | 5 | 1% |
Brazil | 5 | 1% |
France | 3 | <1% |
Australia | 2 | <1% |
Sweden | 2 | <1% |
Ghana | 1 | <1% |
Other | 14 | 3% |
Unknown | 398 | 85% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 147 | 31% |
Student > Ph. D. Student | 120 | 26% |
Student > Master | 42 | 9% |
Other | 32 | 7% |
Student > Bachelor | 23 | 5% |
Other (remaining categories) | 72 | 15% |
Unknown | 31 | 7% |

Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 248 | 53% |
Biochemistry, Genetics and Molecular Biology | 76 | 16% |
Computer Science | 42 | 9% |
Medicine and Dentistry | 13 | 3% |
Mathematics | 13 | 3% |
Other | 39 | 8% |
Unknown | 36 | 8% |