A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression

Overview of attention for article published in BMC Bioinformatics, June 2017

About this Attention Score

  • Average Attention Score compared to outputs of the same age
  • Average Attention Score compared to outputs of the same age and source

Mentioned by

  • X: 5 users

Citations

  • Dimensions: 3

Readers on

  • Mendeley: 37
  • CiteULike: 1
Title
A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression
Published in
BMC Bioinformatics, June 2017
DOI 10.1186/s12859-017-1727-4
Pubmed ID
Authors

Xiting Yan, Anqi Liang, Jose Gomez, Lauren Cohn, Hongyu Zhao, Geoffrey L. Chupp

Abstract

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biological samples. However, gene expression data are often noisy and the correlation between genes is often relatively high, so traditional distances such as Euclidean distance may not discriminate the biological differences between samples effectively. An alternative way to examine disease phenotypes is to use pre-defined biological pathways, which have been shown to be perturbed in different ways in different subjects who have similar clinical features. We hypothesize that differences in the expression of genes in a given pathway are more predictive of biological differences between samples than standard approaches, and that integrating them into clustering analysis will enhance the robustness and accuracy of the clustering method.

To examine this hypothesis, we developed a novel computational method that assesses the biological differences between samples from gene expression data, assuming that ontologically defined biological pathways behave similarly in biologically similar samples. Pre-defined biological pathways were downloaded, and the genes in each pathway were used to cluster samples with a Gaussian mixture model. The clustering results across pathways were then summarized to calculate the pathway-based distance score between samples. This method was applied to both simulated and real data sets and compared with the traditional Euclidean distance and with another pathway-based clustering method, Pathifier.

The results show that the pathway-based distance score performs significantly better than the Euclidean distance, especially when the heterogeneity is low and genes in the same pathways are correlated. Compared with Pathifier, our approach achieves higher accuracy and robustness for small pathways. When the pathway size is large, downsampling the pathways into smaller pathways allowed our approach to achieve comparable performance.

We have developed a novel distance score that represents the biological differences between samples using gene expression data and pre-defined biological pathway information. Applying this distance score yields more accurate, robust, and biologically meaningful clustering in both simulated and real data than traditional methods, with performance comparable to or better than that of Pathifier.
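
The step in which per-pathway clustering results are summarized into a distance between samples can be illustrated with a small sketch. The abstract does not give the formula, so the code below makes two loudly stated assumptions: each pathway has already produced one hard cluster label per sample (for example from a Gaussian mixture model fit, which is not performed here), and the distance between two samples is taken to be the fraction of pathways in which they fall into different clusters. The function name `pathway_distance` and the example labels are hypothetical, and this should be read as a stand-in rather than the authors' definition.

```python
from itertools import combinations


def pathway_distance(labels_by_pathway):
    """Fraction of pathways in which each pair of samples is split apart.

    labels_by_pathway: dict mapping a pathway name to a list of cluster
    labels, one per sample, with every list in the same sample order.
    Returns a symmetric n x n matrix (list of lists) with values in [0, 1].

    ASSUMPTION: this co-clustering fraction is an illustrative summary,
    not the scoring formula from the paper.
    """
    label_lists = list(labels_by_pathway.values())
    if not label_lists:
        raise ValueError("need at least one pathway")
    n = len(label_lists[0])
    if any(len(labels) != n for labels in label_lists):
        raise ValueError("every pathway must label the same samples")

    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Count the pathways whose clustering separates samples i and j.
        split = sum(1 for labels in label_lists if labels[i] != labels[j])
        dist[i][j] = dist[j][i] = split / len(label_lists)
    return dist


# Hypothetical labels for four samples across three made-up pathways.
example_labels = {
    "pathway_A": [0, 0, 1, 1],
    "pathway_B": [0, 0, 0, 1],
    "pathway_C": [1, 1, 0, 0],
}
example_distance = pathway_distance(example_labels)
```

Only label agreement within a pathway matters here, so the arbitrary numbering of clusters from one pathway to the next does not affect the result. The resulting matrix could be passed to any distance-based clustering routine in place of a Euclidean distance matrix.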

X Demographics

The data shown below were collected from the profiles of 5 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 37 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    37       100%

Demographic breakdown

Readers by professional status    Count    As %
Researcher                        12       32%
Student > Ph. D. Student          9        24%
Other                             2        5%
Student > Doctoral Student        2        5%
Student > Bachelor                2        5%
Other                             5        14%
Unknown                           5        14%

Readers by discipline                           Count    As %
Biochemistry, Genetics and Molecular Biology    5        14%
Medicine and Dentistry                          5        14%
Mathematics                                     4        11%
Computer Science                                3        8%
Agricultural and Biological Sciences            3        8%
Other                                           9        24%
Unknown                                         8        22%

Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 08 November 2017.

  • All research outputs: #14,068,665 of 22,982,639 outputs
  • Outputs from BMC Bioinformatics: #4,493 of 7,309 outputs
  • Outputs of similar age: #170,224 of 316,841 outputs
  • Outputs of similar age from BMC Bioinformatics: #61 of 115 outputs

Altmetric has tracked 22,982,639 research outputs across all sources so far. This one is in the 37th percentile – i.e., 37% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,309 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 35th percentile – i.e., 35% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 316,841 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 44th percentile – i.e., 44% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 115 others from the same source and published within six weeks on either side of this one. This one is in the 42nd percentile – i.e., 42% of its contemporaries scored the same or lower than it.
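
The percentile wording used above ("scored the same or lower") can be read as the share of peer outputs whose score does not exceed this one. Below is a minimal sketch of that reading with made-up peer scores. How Altmetric actually computes its percentiles, including how it handles ties, is not described on this page, so the function name `percentile_same_or_lower` and the tie handling are assumptions.

```python
def percentile_same_or_lower(score, peer_scores):
    """Percentage of peers whose score is the same as or lower than `score`.

    ASSUMPTION: ties count toward the percentile; this mirrors the page's
    wording but is not a documented Altmetric formula.
    """
    if not peer_scores:
        raise ValueError("need at least one peer score")
    at_or_below = sum(1 for s in peer_scores if s <= score)
    return 100.0 * at_or_below / len(peer_scores)


# Hypothetical peer scores, not Altmetric data.
peers = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
pct = percentile_same_or_lower(2, peers)
```

Under this reading the percentile depends on the full distribution of peer scores, so it cannot be recovered from the rank and the number of outputs alone when many outputs share the same score.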