Title | Predicting cancer type from tumour DNA signatures |
---|---|
Published in | Genome Medicine, November 2017 |
DOI | 10.1186/s13073-017-0493-2 |
Pubmed ID | |
Authors | Kee Pang Soh, Ewa Szczurek, Thomas Sakoparnig, Niko Beerenwinkel |
Abstract | Establishing the cancer type and site of origin is important in determining the most appropriate course of treatment for cancer patients. Patients with cancer of unknown primary, where the site of origin cannot be established from an examination of the metastatic cancer cells, typically have poor survival. Here, we evaluate the potential and limitations of utilising gene alteration data from tumour DNA to identify cancer types. Using sequenced tumour DNA downloaded via the cBioPortal for Cancer Genomics, we collected the presence or absence of calls for gene alterations for 6640 tumour samples spanning 28 cancer types, as predictive features. We employed three machine-learning techniques, namely linear support vector machines with recursive feature selection, L1-regularised logistic regression and random forest, to select a small subset of gene alterations that are most informative for cancer-type prediction. We then evaluated the predictive performance of the models in a comparative manner. We found the linear support vector machine to be the most predictive model of cancer type from gene alterations. Using only 100 somatic point-mutated genes for prediction, we achieved an overall accuracy of 49.4±0.4 % (95 % confidence interval). We observed a marked increase in the accuracy when copy number alterations are included as predictors. With a combination of somatic point mutations and copy number alterations, a mere 50 genes are enough to yield an overall accuracy of 77.7±0.3 %. A general cancer diagnostic tool that utilises either only somatic point mutations or only copy number alterations is not sufficient for distinguishing a broad range of cancer types. The combination of both gene alteration types can dramatically improve the performance. |
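The abstract describes encoding each tumour sample by the presence or absence of gene alteration calls, with somatic point mutations and copy number alterations combined as predictors. The sketch below shows one way such a binary feature matrix could be assembled. It is an illustration only, not the authors' pipeline: the sample IDs, the toy calls, the `MUT`/`CNA` labels and the helper `build_feature_matrix` are all assumptions introduced here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy calls (not data from the study): for each sample, the set
# of genes in which an alteration of the given type was called.
point_mutations: Dict[str, Set[str]] = {
    "sample_A": {"TP53", "KRAS"},
    "sample_B": {"PIK3CA"},
    "sample_C": set(),
}
copy_number_alterations: Dict[str, Set[str]] = {
    "sample_A": {"MYC"},
    "sample_B": set(),
    "sample_C": {"TP53", "MYC"},
}


def build_feature_matrix(
    calls_by_type: Dict[str, Dict[str, Set[str]]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (sample_ids, feature_names, binary_matrix).

    Each feature is an (alteration type, gene) pair, encoded as 1 when the
    alteration was called in the sample and 0 otherwise.
    """
    sample_ids = sorted({s for calls in calls_by_type.values() for s in calls})
    # One feature per (type, gene) pair seen in at least one sample.
    features = sorted(
        (alt_type, gene)
        for alt_type, calls in calls_by_type.items()
        for gene in set().union(*calls.values())
    )
    matrix = [
        [
            int(gene in calls_by_type[alt_type].get(sample, set()))
            for alt_type, gene in features
        ]
        for sample in sample_ids
    ]
    feature_names = [f"{alt_type}:{gene}" for alt_type, gene in features]
    return sample_ids, feature_names, matrix


samples, feature_names, X = build_feature_matrix(
    {"MUT": point_mutations, "CNA": copy_number_alterations}
)
for sample, row in zip(samples, X):
    print(sample, dict(zip(feature_names, row)))
```

Keeping the alteration type in the feature name means the same gene can contribute two separate predictors, one for point mutation and one for copy number alteration. A matrix of this shape could then be passed to a feature-selecting classifier such as the ones named in the abstract.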
Twitter Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 4 | 50% |
United Kingdom | 2 | 25% |
Norway | 1 | 13% |
Russia | 1 | 13% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 4 | 50% |
Scientists | 4 | 50% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 105 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 21 | 20% |
Student > Ph. D. Student | 14 | 13% |
Student > Master | 14 | 13% |
Student > Bachelor | 11 | 10% |
Student > Doctoral Student | 5 | 5% |
Other | 19 | 18% |
Unknown | 21 | 20% |
Readers by discipline | Count | As % |
---|---|---|
Biochemistry, Genetics and Molecular Biology | 23 | 22% |
Computer Science | 17 | 16% |
Agricultural and Biological Sciences | 11 | 10% |
Medicine and Dentistry | 9 | 9% |
Unspecified | 4 | 4% |
Other | 14 | 13% |
Unknown | 27 | 26% |