EMQIT: a machine learning approach for energy based PWM matrix quality improvement

Overview of attention for article published in Biology Direct, August 2017
Mentioned by: 2 tweeters

Readers on Mendeley: 18
Published in: Biology Direct, August 2017
DOI: 10.1186/s13062-017-0189-y
Pubmed ID

Karolina Smolinska, Marcin Pacholczyk


Transcription factor (TF) binding affinities to DNA play a key role in gene regulation. Understanding the specificity of TF binding to DNA matters to experimentalists and theoreticians alike. With the development of high-throughput methods such as ChIP-seq, the need for unbiased models of binding events has become apparent. We present EMQIT, a modification of the approach introduced by Alamanova et al. and later implemented as the 3DTF server.

We observed that tuning the Boltzmann factor weights, which are used to convert calculated energies into nucleotide probabilities, has a significant impact on the quality of the resulting position weight matrix (PWM). We therefore proposed using receiver operating characteristic (ROC) curves and 10-fold cross-validation to learn the best weights from experimentally verified data in the TRANSFAC database. We applied our method to data available for various TFs and, using experimental data from TRANSFAC, verified how efficiently the improved 3DTF matrices detect TF binding sites (TFBSs). The comparison showed significant similarity and comparable performance between the improved matrices and the experimental TRANSFAC matrices. The improved 3DTF matrices achieved significantly higher AUC values than the original 3DTF matrices (by at least 0.1) and, at the same time, detected notably more experimentally verified TFBSs.

The resulting improved PWMs for the analyzed factors are similar to the TRANSFAC matrices and have comparable predictive capabilities. Moreover, the improved PWMs perform better than the matrices downloaded from the 3DTF server. The presented approach is general and applicable to any energy-based matrix. EMQIT is available online (http://biosolvers.polsl.pl:3838/emqit).

This article was reviewed by Oliviero Carugo, Marek Kimmel and István Simon.
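
The abstract describes the method only at a high level. As a rough illustration of the idea, and not the EMQIT or 3DTF code, the Python sketch below converts a hypothetical per-position energy table into a position weight matrix (PWM) with a Boltzmann weight `lam`, scores candidate sites, measures ROC AUC, and picks `lam` from a grid with k-fold cross-validation (choosing `lam` on the training folds and reporting AUC on the held-out fold). Several choices here are assumptions: a single global weight rather than whatever weighting EMQIT actually tunes, a uniform background, the toy energy table and site lists, all function names, and a MATCH-style information-weighted site score. That last choice matters: with plain log-odds scoring, any positive global weight makes the score a linear function of the site's summed energy, so every `lam > 0` would rank sites identically and the ROC curve would not change.

```python
# Hypothetical sketch only: not the EMQIT or 3DTF implementation.
import math
import random

BASES = "ACGT"


def boltzmann_pwm(energies, lam):
    """Convert per-position binding energies into nucleotide probabilities.

    energies: one dict per motif position, mapping base -> energy
              (lower energy = more favourable binding).
    lam:      assumed single Boltzmann weight, playing the role of 1/kT.
    """
    pwm = []
    for column in energies:
        # Shift by the minimum energy so the largest exponent is exp(0) = 1.
        e_min = min(column.values())
        weights = {b: math.exp(-lam * (column[b] - e_min)) for b in BASES}
        z = sum(weights.values())
        pwm.append({b: w / z for b, w in weights.items()})
    return pwm


def information_content(column):
    """Information content (bits) of one PWM column against a uniform background."""
    return sum(p * math.log2(4 * p) for p in column.values() if p > 0)


def score_site(pwm, site):
    """MATCH-style score: base probabilities weighted by column information content.

    The usual (score - min) / (max - min) normalisation is omitted because it
    does not change how sites rank against each other.
    """
    return sum(information_content(col) * col[b] for col, b in zip(pwm, site))


def auc(pos_scores, neg_scores):
    """ROC AUC as the Mann-Whitney probability that a positive outranks a negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def cv_select_lambda(energies, positives, negatives, grid, k=10, seed=0):
    """k-fold CV: choose lam on the training folds, report AUC on the held-out fold."""
    if min(len(positives), len(negatives)) < k:
        raise ValueError("need at least k positive and k negative sites")
    rng = random.Random(seed)
    pos, neg = positives[:], negatives[:]
    rng.shuffle(pos)
    rng.shuffle(neg)

    def fold_auc(lam, pos_sites, neg_sites):
        pwm = boltzmann_pwm(energies, lam)
        return auc([score_site(pwm, s) for s in pos_sites],
                   [score_site(pwm, s) for s in neg_sites])

    results = []
    for fold in range(k):
        train_p = [s for i, s in enumerate(pos) if i % k != fold]
        test_p = [s for i, s in enumerate(pos) if i % k == fold]
        train_n = [s for i, s in enumerate(neg) if i % k != fold]
        test_n = [s for i, s in enumerate(neg) if i % k == fold]
        # max() returns the first grid value among ties.
        best = max(grid, key=lambda lam: fold_auc(lam, train_p, train_n))
        results.append((best, fold_auc(best, test_p, test_n)))
    return results


if __name__ == "__main__":
    # Toy energy table (hypothetical): consensus ACGT costs 0, any other base costs 1.
    toy = [{b: 0.0 if b == c else 1.0 for b in BASES} for c in "ACGT"]
    # Prints ten (1.0, 1.0) pairs: lam = 0 gives a flat PWM (AUC 0.5), lam = 1 separates perfectly.
    print(cv_select_lambda(toy, ["ACGT"] * 20, ["TGCA"] * 20, grid=[0.0, 1.0, 2.0]))
```

A real application would use energies from structural modelling, verified binding sites (e.g. from TRANSFAC) as positives, a realistic set of negative sequences, and a finer grid; the brute-force search is kept only because it is easy to follow.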


Mendeley readers

Geographical breakdown

Country  Count   As %
Unknown     18   100%

Demographic breakdown

Readers by professional status   Count   As %
Student > Ph.D. Student              4    22%
Student > Bachelor                   4    22%
Student > Master                     2    11%
Professor > Associate Professor      2    11%
Professor                            1     6%
Other                                3    17%
Unknown                              2    11%

Readers by discipline                         Count   As %
Computer Science                                  6    33%
Engineering                                       4    22%
Medicine and Dentistry                            3    17%
Agricultural and Biological Sciences              1     6%
Biochemistry, Genetics and Molecular Biology      1     6%
Other                                             0     0%
Unknown                                           3    17%

Attention Score in Context

This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 03 August 2017.
Altmetric has tracked 16,639,069 research outputs across all sources so far. This one is in the 23rd percentile – i.e., 23% of other outputs scored the same or lower than it.
So far Altmetric has tracked 597 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 6.9. This one is in the 9th percentile – i.e., 9% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 274,659 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 31st percentile – i.e., 31% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 1 other output from the same source published within six weeks on either side of this one. This one scored higher than that output.
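
Each percentile statement above uses the same definition: the share of compared outputs that scored the same as or lower than this one. A minimal Python sketch of that arithmetic follows; the function name `percentile_rank` and the sample scores are hypothetical, and Altmetric's exact rounding and tie handling are not described on this page.

```python
def percentile_rank(score: float, others: list[float]) -> float:
    """Percentage of other outputs whose score is the same as or lower than `score`."""
    if not others:
        raise ValueError("need at least one output to compare against")
    same_or_lower = sum(1 for s in others if s <= score)
    return 100.0 * same_or_lower / len(others)


# Hypothetical scores, for illustration only (not Altmetric data).
peers = [0.0, 1.0, 1.0, 3.0, 8.0]
print(percentile_rank(1.0, peers))  # 60.0
```

Because ties count as "same or lower", an output sharing its score with many others lands in a higher percentile than a strict "lower than" rule would give.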