
Interactive knowledge discovery and data mining on genomic expression data with numeric formal concept analysis

Overview of attention for article published in BMC Bioinformatics, September 2016

About this Attention Score

  • Average Attention Score compared to outputs of the same age
  • Average Attention Score compared to outputs of the same age and source

Mentioned by

5 X users

Citations

8 Dimensions

Readers on

33 Mendeley
You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output.
Title
Interactive knowledge discovery and data mining on genomic expression data with numeric formal concept analysis
Published in
BMC Bioinformatics, September 2016
DOI 10.1186/s12859-016-1234-z
Pubmed ID
Authors

Jose M González-Calabozo, Francisco J Valverde-Albacete, Carmen Peláez-Moreno

Abstract

Gene Expression Data (GED) analysis poses a great challenge to the scientific community and can be framed within the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice for this task, but its unsupervised nature makes assessing its results problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA). We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process that supports continuous human interaction with the data, with the aim of improving the step of hypothesis abduction and assessment. We focus on adapting the interpretation and visualization of EDA output to human cognition. First, we give a proper theoretical background to biclustering using Lattice Theory and provide a set of analysis tools revolving around [Formula: see text]-Formal Concept Analysis ([Formula: see text]-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression, and by applying thresholds, we obtain different sequences of hierarchical biclusterings for gene under- and over-expression. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of the persistence and robustness of biclusters with which to assess them. Second, the resulting biclusters are used to index external omics databases, for instance Gene Ontology (GO), thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example, confirming previously published results.
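The thresholding idea described above (one hierarchy of biclusters per threshold, plus a persistence measure across the sequence) can be illustrated with a minimal Python sketch. This is not the authors' FCA variant for real-valued matrices. As a simplifying assumption, it binarizes a small, made-up expression matrix at each threshold and runs plain Boolean Formal Concept Analysis, so each non-trivial formal concept (a set of genes × a set of conditions) is treated as a bicluster. The functions `binarize`, `formal_concepts`, `biclusters` and `persistence`, the toy matrix `EXPR`, and the persistence score are all hypothetical illustrations. Here the persistence score is the fraction of thresholds at which the identical bicluster reappears. None of these are the paper's definitions.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# A bicluster is modeled as a formal concept: (extent = genes, intent = conditions).
Concept = Tuple[FrozenSet[str], FrozenSet[str]]
Matrix = Dict[str, Dict[str, float]]  # gene -> condition -> expression value

# Hypothetical toy data (3 genes x 3 conditions), for illustration only.
EXPR: Matrix = {
    "g1": {"c1": 2.0, "c2": 1.8, "c3": 0.1},
    "g2": {"c1": 1.9, "c2": 2.2, "c3": 0.3},
    "g3": {"c1": 0.2, "c2": 1.5, "c3": 1.7},
}


def binarize(expr: Matrix, threshold: float, over: bool = True) -> Dict[str, FrozenSet[str]]:
    """Boolean context: a gene 'has' a condition if it is over-expressed
    (value >= threshold) or, with over=False, under-expressed (value <= threshold)."""
    def keep(v: float) -> bool:
        return v >= threshold if over else v <= threshold
    return {g: frozenset(m for m, v in row.items() if keep(v)) for g, row in expr.items()}


def formal_concepts(rel: Dict[str, FrozenSet[str]], conditions: FrozenSet[str]) -> List[Concept]:
    """Enumerate all formal concepts of a Boolean context.

    Every intent is an intersection of gene rows (the empty intersection being
    the full condition set), so intents are built by closing under intersection.
    """
    intents: Set[FrozenSet[str]] = {conditions}
    for row in rel.values():
        intents |= {row & b for b in intents}
    # The extent of an intent is the set of genes whose row contains it.
    return [(frozenset(g for g, row in rel.items() if b <= row), b) for b in intents]


def biclusters(expr: Matrix, threshold: float, over: bool = True) -> Set[Concept]:
    """Non-trivial concepts (non-empty gene set and condition set) at one threshold."""
    conditions = frozenset(m for row in expr.values() for m in row)
    rel = binarize(expr, threshold, over)
    return {c for c in formal_concepts(rel, conditions) if c[0] and c[1]}


def persistence(expr: Matrix, thresholds: Sequence[float], over: bool = True) -> Dict[Concept, float]:
    """Hypothetical persistence score: fraction of thresholds at which
    exactly the same bicluster (same genes, same conditions) reappears."""
    counts: Dict[Concept, int] = {}
    for t in thresholds:
        for c in biclusters(expr, t, over):
            counts[c] = counts.get(c, 0) + 1
    return {c: n / len(thresholds) for c, n in counts.items()}


if __name__ == "__main__":
    scores = persistence(EXPR, [1.0, 1.6])
    for (genes, conds), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(sorted(genes), sorted(conds), score)
```

In this toy run, the bicluster {g1, g2} × {c1, c2} survives both thresholds. The other biclusters appear at only one threshold, so a persistence-style score separates stable structure from threshold-specific structure. The intersection-closure enumeration is exponential in the worst case and is only meant for small examples.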
The GED analysis problem is thus transformed into the exploration of a sequence of lattices, which enables the hierarchical structure of the biclusters to be visualized with a certain degree of granularity. The ability of FCA-based biclustering methods to index external databases such as GO allows us to obtain a quality measure for the biclusters, to follow the evolution of a gene through the different biclusters in which it appears, and to look for relevant biclusters (by observing their genes and their persistence) in order to infer, for instance, hypotheses about their function.
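Scoring a bicluster against a GO term can be sketched with a one-sided hypergeometric (over-representation) test. This is a common way to attach a p-value to a gene set, but whether the paper uses exactly this test is an assumption. The sketch computes the probability of drawing at least the observed number of term-annotated genes when sampling a gene set of the bicluster's size from a given gene universe. `enrichment_p_value` and its arguments are hypothetical names, the toy universe and term are made up, and no multiple-testing correction is applied.

```python
from math import comb
from typing import Iterable


def enrichment_p_value(bicluster_genes: Iterable[str],
                       term_genes: Iterable[str],
                       universe: Iterable[str]) -> float:
    """One-sided hypergeometric p-value P(X >= k): the chance that a random gene
    set of the bicluster's size shares at least k genes with the annotation term."""
    pop = set(universe)
    bic = set(bicluster_genes) & pop   # restrict both gene sets to the universe
    term = set(term_genes) & pop
    N, K, n = len(pop), len(term), len(bic)
    k = len(bic & term)                # observed overlap
    # Sum the hypergeometric probability mass over overlaps k..min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    universe = [f"g{i}" for i in range(1, 11)]   # 10 genes (toy example)
    term = {"g1", "g2", "g3"}                     # genes annotated to a hypothetical GO term
    print(enrichment_p_value({"g1", "g2", "g3"}, term, universe))  # 1/120, about 0.0083
```

A small p-value means that the bicluster's genes share the annotation more often than chance would predict. Each GO term then gives its own enrichment "flavor" against which a bicluster can be assessed.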

X Demographics

The data shown below were collected from the profiles of 5 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 33 Mendeley readers of this research output.

Geographical breakdown

Country    Count   As %
France         1     3%
Canada         1     3%
Unknown       31    94%

Demographic breakdown

Readers by professional status   Count   As %
Researcher                          11    33%
Student > Ph. D. Student             5    15%
Student > Bachelor                   3     9%
Student > Master                     3     9%
Professor                            1     3%
Other                                1     3%
Unknown                              9    27%

Readers by discipline                          Count   As %
Computer Science                                  12    36%
Biochemistry, Genetics and Molecular Biology       4    12%
Business, Management and Accounting                2     6%
Agricultural and Biological Sciences               2     6%
Engineering                                        2     6%
Other                                              2     6%
Unknown                                            9    27%
Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 16 September 2016.
All research outputs: #13,989,437 of 22,888,307 outputs
Outputs from BMC Bioinformatics: #4,488 of 7,298 outputs
Outputs of similar age: #177,219 of 321,166 outputs
Outputs of similar age from BMC Bioinformatics: #59 of 120 outputs
Altmetric has tracked 22,888,307 research outputs across all sources so far. This one is in the 37th percentile – i.e., 37% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,298 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 35th percentile – i.e., 35% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 321,166 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 43rd percentile – i.e., 43% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 120 others from the same source and published within six weeks on either side of this one. This one is in the 45th percentile – i.e., 45% of its contemporaries scored the same or lower than it.