Report for: Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm

Title	Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm
Published in	BMC Bioinformatics, July 2015
DOI	10.1186/s12859-015-0625-x
Pubmed ID	26160651
Authors	Theodore R. Gibbons, Stephen M. Mount, Endymion D. Cooper, Charles F. Delwiche
Abstract	Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios. The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE.
The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation, Porthos, which uses only standard libraries and can process a graph with more than 25 million edges connecting the more than 60 thousand KOG sequences in half a minute using less than half a gigabyte of memory.
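To make the edge-weighting step in the abstract concrete, here is a minimal sketch of how BLAST hits might be turned into a weighted undirected graph. It is not the authors' Porthos code. The names `nle`, `bsr`, `build_edges` and `MIN_EVALUE`, the toy hits, the choice to keep the larger of two reciprocal weights, and the BSR normalization (dividing by the query's self-hit bit score) are all assumptions made for illustration. BAL is left out because the abstract does not define the anchored length.

```python
import math

# Assumed clamp: BLAST can report an E-value of 0.0, and log10(0) is undefined.
MIN_EVALUE = 1e-200


def nle(evalue):
    """Negative common (base-10) log of the expectation value."""
    return -math.log10(max(evalue, MIN_EVALUE))


def bsr(bit_score, self_bit_score):
    """Bit score ratio. Assumption: normalize by the query's self-hit score."""
    return bit_score / self_bit_score


def build_edges(hits, metric="bit", self_scores=None):
    """Map (query, subject, bit_score, evalue) hits to {(node_a, node_b): weight}.

    Assumption: when both directions of a pair were reported, the larger
    weight is kept so that the graph is undirected.
    """
    edges = {}
    for query, subject, bit_score, evalue in hits:
        if query == subject:
            continue  # self-hits are only used for normalization, not as edges
        if metric == "bit":
            weight = bit_score
        elif metric == "nle":
            weight = nle(evalue)
        elif metric == "bsr":
            weight = bsr(bit_score, self_scores[query])
        else:
            raise ValueError(f"unknown metric: {metric}")
        key = tuple(sorted((query, subject)))
        edges[key] = max(weight, edges.get(key, weight))
    return edges


# Toy hits, invented for illustration only.
hits = [
    ("A", "A", 200.0, 1e-100),
    ("A", "B", 100.0, 1e-50),
    ("B", "A", 90.0, 1e-45),
    ("B", "B", 180.0, 1e-90),
]
self_scores = {q: score for q, s, score, _ in hits if q == s}

for metric in ("bit", "nle", "bsr"):
    print(metric, build_edges(hits, metric, self_scores))
```

Each resulting edge can be written out as a three-column line (label, label, weight), which is the label format that the MCL program accepts as input.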

X Demographics

The data shown below were collected from the profiles of 22 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	5	23%
New Zealand	2	9%
Canada	2	9%
United Kingdom	2	9%
China	1	5%
France	1	5%
Germany	1	5%
Italy	1	5%
Spain	1	5%
Other	0	0%
Unknown	6	27%

Demographic breakdown

Type	Count	As %
Scientists	16	73%
Members of the public	6	27%

Mendeley readers

The data shown below were compiled from readership statistics for 85 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
United Kingdom	2	2%
United States	2	2%
Spain	1	1%
New Zealand	1	1%
Unknown	79	93%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	21	25%
Student > Ph. D. Student	16	19%
Student > Master	13	15%
Student > Bachelor	8	9%
Student > Doctoral Student	6	7%
Other	12	14%
Unknown	9	11%

Readers by discipline	Count	As %
Agricultural and Biological Sciences	35	41%
Biochemistry, Genetics and Molecular Biology	20	24%
Computer Science	7	8%
Engineering	2	2%
Chemistry	2	2%
Other	7	8%
Unknown	12	14%

Attention Score in Context

This research output has an Altmetric Attention Score of 11. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 13 December 2017.

Comparison group	Rank	Out of
All research outputs	#3,206,536	25,401,381 outputs
Outputs from BMC Bioinformatics	#995	7,699 outputs
Outputs of similar age	#39,099	277,343 outputs
Outputs of similar age from BMC Bioinformatics	#17	112 outputs

Altmetric has tracked 25,401,381 research outputs across all sources so far. Compared to these, this one has done well and is in the 87th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.

So far Altmetric has tracked 7,699 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done well, scoring higher than 87% of its peers.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 277,343 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 85% of its contemporaries.

We're also able to compare this research output to 112 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 85% of its contemporaries.
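The percentages quoted above can be roughly sanity-checked from the ranks and totals. The sketch below assumes that "scoring higher than N% of its peers" means the share of tracked outputs ranked after this one. That is an assumption: the report does not say how Altmetric handles ties or rounding, so the results can differ from the quoted figures by about a point. The name `share_ranked_below` is hypothetical.

```python
def share_ranked_below(rank, total):
    """Percent of outputs ranked after position `rank` (rank 1 = most attention)."""
    return 100.0 * (total - rank) / total


# Ranks and totals as reported in this page.
comparisons = {
    "All research outputs": (3_206_536, 25_401_381),
    "Outputs from BMC Bioinformatics": (995, 7_699),
    "Outputs of similar age": (39_099, 277_343),
    "Outputs of similar age from BMC Bioinformatics": (17, 112),
}

for label, (rank, total) in comparisons.items():
    print(f"{label}: {share_ranked_below(rank, total):.1f}%")
```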
