Report for: Genes sharing the protein family domain decrease the performance of classification with RNA-seq genomic signatures


Title	Genes sharing the protein family domain decrease the performance of classification with RNA-seq genomic signatures
Published in	Biology Direct, February 2018
DOI	10.1186/s13062-018-0205-x
Pubmed ID	29467011
Authors	Anna Leśniewska, Joanna Zyprych-Walczak, Alicja Szabelska-Beręsewicz, Michal J. Okoniewski
Abstract	The experience with running various types of classification on the CAMDA neuroblastoma dataset has led us to the conclusion that the results are not always obvious and may differ depending on the type of analysis and the selection of genes used for classification. This paper aims to point out several factors that may influence the downstream machine learning analysis. In particular, those factors are: the type of the primary analysis, the type of the classifier, and increased correlation between genes sharing a protein domain. They influence the analysis directly, but the interplay between them may also be important. We have compiled a gene-domain database and used it to examine the differences between genes that share a domain and the rest of the genes in the datasets. The major findings are: pairs of genes that share a domain have increased Spearman's correlation coefficients of counts; genes sharing a domain are expected to have lower predictive power due to increased correlation, which in most cases shows up as a higher number of misclassified samples; classifier performance may vary depending on the method, yet in most cases using genes sharing a domain in the training set results in a higher misclassification rate; increased correlation in genes sharing a domain most often results in worse classifier performance regardless of the primary analysis tools used, even if the primary analysis alignment yield varies. The effect of sharing a domain is likely more a result of real biological co-expression than of sequence similarity and artifacts of mapping and counting. Still, this is more difficult to conclude and needs further research. The effect is interesting in itself, but we also point out some practical aspects in which it may influence RNA sequencing analysis and RNA biomarker use.
In particular, it means that a gene signature biomarker set built from RNA-sequencing results should be depleted of genes sharing common domains. This may cause it to perform better when applying classification. This article was reviewed by Dimitar Vassiliev and Susmita Datta.
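The comparison described in the abstract (Spearman's correlation of counts for gene pairs that share a domain versus pairs that do not) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the gene names, the domain labels, and the toy counts below are all hypothetical, and the sketch uses only the Python standard library.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


def split_pair_correlations(counts, domains):
    """Return (rho for pairs sharing a domain, rho for all other pairs)."""
    shared, unshared = [], []
    for g1, g2 in combinations(sorted(counts), 2):
        rho = spearman(counts[g1], counts[g2])
        if domains.get(g1, set()) & domains.get(g2, set()):
            shared.append(rho)
        else:
            unshared.append(rho)
    return shared, unshared


# Hypothetical toy data: per-sample read counts and domain annotations.
toy_counts = {
    "geneA": [10, 20, 30, 40, 50],
    "geneB": [12, 25, 31, 47, 60],
    "geneC": [50, 10, 40, 20, 30],
}
toy_domains = {
    "geneA": {"DOMAIN_1"},
    "geneB": {"DOMAIN_1"},
    "geneC": {"DOMAIN_2"},
}

shared_rho, unshared_rho = split_pair_correlations(toy_counts, toy_domains)
print("mean rho, pairs sharing a domain:", mean(shared_rho))
print("mean rho, other pairs:", mean(unshared_rho))
```

On real data the two groups would be compared as distributions over many gene pairs rather than as two means from three toy genes.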


X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.

Geographical breakdown

Country	Count	As %
Ukraine	1	50%
Unknown	1	50%

Demographic breakdown

Type	Count	As %
Scientists	2	100%

Mendeley readers

The data shown below were compiled from readership statistics for 16 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	16	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Bachelor	3	19%
Other	2	13%
Student > Ph. D. Student	2	13%
Researcher	2	13%
Professor > Associate Professor	2	13%
Other	1	6%
Unknown	4	25%

Readers by discipline	Count	As %
Medicine and Dentistry	3	19%
Biochemistry, Genetics and Molecular Biology	2	13%
Engineering	2	13%
Computer Science	1	6%
Nursing and Health Professions	1	6%
Other	2	13%
Unknown	5	31%

Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 February 2018.

All research outputs

#14,377,572

of 23,025,074 outputs

Outputs from Biology Direct

#337

of 487 outputs

Outputs of similar age

#188,169

of 331,231 outputs

Outputs of similar age from Biology Direct

of 5 outputs

Altmetric has tracked 23,025,074 research outputs across all sources so far. This one is in the 35th percentile – i.e., 35% of other outputs scored the same or lower than it.

So far Altmetric has tracked 487 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.7. This one is in the 27th percentile – i.e., 27% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 331,231 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 40th percentile – i.e., 40% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 5 others from the same source and published within six weeks on either side of this one. This one has scored higher than 2 of them.
