
Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data

Overview of attention for article published in BMC Bioinformatics, May 2015

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (80th percentile)
  • High Attention Score compared to outputs of the same age and source (81st percentile)

Mentioned by

16 X users

Citations

14 Dimensions

Readers on

69 Mendeley
Title
Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data
Published in
BMC Bioinformatics, May 2015
DOI 10.1186/s12859-015-0573-5
Pubmed ID
Authors

Michal Dabrowski, Norbert Dojer, Izabella Krystkowiak, Bozena Kaminska, Bartek Wilczynski

Abstract

For many years, the binding preferences of transcription factors (TFs) have been described by so-called motifs, usually defined mathematically by position weight matrices (PWMs) or similar models, for the purpose of predicting potential binding sites. However, despite the availability of thousands of motif models in public and commercial databases, a researcher who wants to use them is left with many competing methods of identifying potential binding sites in a genome of interest, and there is little published information regarding the optimality of different choices. Thanks to the availability of a large number of different motif models, as well as a number of experimental datasets describing actual binding of TFs in hundreds of TF-ChIP-seq pairs, we set out to perform a comprehensive analysis of this matter. We focus on the task of identifying potential transcription factor binding sites in the human genome. Firstly, we provide a comprehensive comparison of the coverage and quality of models available in different databases, showing that the public databases have comparable TF coverage and better motif performance than commercial databases. Secondly, we compare different motif scanners, showing that, regardless of the database used, the tools developed by the scientific community outperform the commercial tools. Thirdly, we calculate for each motif a detection threshold optimizing the accuracy of prediction. Finally, we provide an in-depth comparison of different methods of choosing thresholds for all motifs a priori. Surprisingly, we show that selecting a common false-positive rate gives results that are the least biased by the information content of the motif and therefore most uniformly accurate. We provide a guide for researchers working with transcription factor motifs. It is supplemented with detailed results of the analysis and the benchmark datasets at http://bioputer.mimuw.edu.pl/papers/motifs/ .
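
To make the idea of "selecting a common false-positive rate" concrete, the following is a minimal sketch of how a per-motif score threshold could be derived from one shared false-positive rate. It is not the authors' pipeline or any particular scanner's method. The function names, the pseudocount of 1, the uniform 25% background, the forward-strand-only scan, and the made-up 7-bp count matrix are all assumptions introduced for illustration.

```python
import math
import random

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log-odds weight matrix.

    Assumes a uniform background frequency (an illustrative simplification).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def window_scores(pwm, sequence):
    """Score every forward-strand window of an A/C/G/T-only sequence."""
    width = len(pwm)
    return [
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    ]


def threshold_for_fpr(pwm, fpr, n_background=20000, seed=0):
    """Pick a cutoff so that at most a fraction `fpr` of simulated
    background windows score strictly above it."""
    if not 0.0 < fpr < 1.0:
        raise ValueError("fpr must be strictly between 0 and 1")
    rng = random.Random(seed)
    scores = sorted(
        sum(column[rng.choice(BASES)] for column in pwm)
        for _ in range(n_background)
    )
    # Number of background windows that are allowed to pass the cutoff.
    allowed = min(int(fpr * n_background), n_background - 1)
    return scores[n_background - allowed - 1]


def find_hits(pwm, sequence, threshold):
    """Return (position, score) for windows scoring strictly above threshold."""
    return [
        (pos, score)
        for pos, score in enumerate(window_scores(pwm, sequence))
        if score > threshold
    ]


# Made-up count matrix for a 7-bp motif (hypothetical, not from any database).
example_counts = [{b: (18 if b == c else 1) for b in BASES} for c in "TGACTCA"]
example_pwm = log_odds_pwm(example_counts)
example_threshold = threshold_for_fpr(example_pwm, fpr=0.01)
example_hits = find_hits(
    example_pwm, "ACGTACGT" + "TGACTCA" + "ACGTACGT", example_threshold
)
```

Applying the same `fpr` value to every motif yields a different numeric cutoff for each one, since each matrix has its own background score distribution. A real analysis would likely estimate the background from genomic sequence and scan both strands.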

X Demographics

Demographic data were collected from the profiles of the 16 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 69 Mendeley readers of this research output.

Geographical breakdown

Country          Count   % of readers
United States    3       4%
France           1       1%
Czechia          1       1%
Austria          1       1%
Spain            1       1%
Canada           1       1%
Unknown          61      88%

Demographic breakdown

Readers by professional status                  Count   % of readers
Researcher                                      21      30%
Student > Ph. D. Student                        14      20%
Student > Master                                13      19%
Student > Bachelor                              5       7%
Professor                                       3       4%
Other                                           8       12%
Unknown                                         5       7%

Readers by discipline                           Count   % of readers
Agricultural and Biological Sciences            27      39%
Biochemistry, Genetics and Molecular Biology    18      26%
Computer Science                                12      17%
Medicine and Dentistry                          2       3%
Engineering                                     2       3%
Other                                           3       4%
Unknown                                         5       7%
Attention Score in Context

This research output has an Altmetric Attention Score of 8. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 May 2016.
All research outputs: #4,059,319 of 22,803,211 outputs
Outputs from BMC Bioinformatics: #1,567 of 7,281 outputs
Outputs of similar age: #51,796 of 264,364 outputs
Outputs of similar age from BMC Bioinformatics: #24 of 134 outputs
Altmetric has tracked 22,803,211 research outputs across all sources so far. Compared to these, this one has done well and is in the 82nd percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,281 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one has done well, scoring higher than 78% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 264,364 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 80% of its contemporaries.
We're also able to compare this research output to 134 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 81% of its contemporaries.