↓ Skip to main content

Better ILP models for haplotype assembly

Overview of attention for article published in BMC Bioinformatics, February 2018
Altmetric Badge

Mentioned by

twitter
1 X user

Citations

dimensions_citation
2 Dimensions

Readers on

mendeley
4 Mendeley
You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output. Click here to find out more.
Title
Better ILP models for haplotype assembly
Published in
BMC Bioinformatics, February 2018
DOI 10.1186/s12859-018-2012-x
Pubmed ID
Authors

Maryam Etemadi, Mehri Bagherian, Zhi-Zhong Chen, Lusheng Wang

Abstract

The haplotype assembly problem for diploid is to find a pair of haplotypes from a given set of aligned Single Nucleotide Polymorphism (SNP) fragments (reads). It has many applications in association studies, drug design, and genetic research. Since this problem is computationally hard, both heuristic and exact algorithms have been designed for it. Although exact algorithms are much slower, they are still of great interest because they usually output significantly better solutions than heuristic algorithms in terms of popular measures such as the Minimum Error Correction (MEC) score, the number of switch errors, and the QAN50 score. Exact algorithms are also valuable because they can be used to witness how good a heuristic algorithm is. The best known exact algorithm is based on integer linear programming (ILP) and it is known that ILP can also be used to improve the output quality of every heuristic algorithm with a little decline in speed. Therefore, faster ILP models for the problem are highly demanded. As in previous studies, we consider not only the general case of the problem but also its all-heterozygous case where we assume that if a column of the input read matrix contains at least one 0 and one 1, then it corresponds to a heterozygous SNP site. For both cases, we design new ILP models for the haplotype assembly problem which aim at minimizing the MEC score. The new models are theoretically better because they contain significantly fewer constraints. More importantly, our experimental results show that for both simulated and real datasets, the new model for the all-heterozygous (respectively, general) case can usually be solved via CPLEX (an ILP solver) at least 5 times (respectively, twice) faster than the previous bests. Indeed, the running time can sometimes be 41 times better. This paper proposes a new ILP model for the haplotype assembly problem and its all-heterozygous case, respectively. Experiments with both real and simulated datasets show that the new models can be solved within much shorter time by CPLEX than the previous bests. We believe that the models can be used to improve heuristic algorithms as well.

X Demographics

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output. Click here to find out more about how the information was compiled.
Mendeley readers

Mendeley readers

The data shown below were compiled from readership statistics for 4 Mendeley readers of this research output. Click here to see the associated Mendeley record.

Geographical breakdown

Country Count As %
Unknown 4 100%

Demographic breakdown

Readers by professional status Count As %
Researcher 2 50%
Professor > Associate Professor 1 25%
Student > Ph. D. Student 1 25%
Readers by discipline Count As %
Mathematics 2 50%
Biochemistry, Genetics and Molecular Biology 1 25%
Engineering 1 25%
Attention Score in Context

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 26 February 2018.
All research outputs
#20,466,701
of 23,025,074 outputs
Outputs from BMC Bioinformatics
#6,891
of 7,316 outputs
Outputs of similar age
#292,330
of 330,824 outputs
Outputs of similar age from BMC Bioinformatics
#86
of 94 outputs
Altmetric has tracked 23,025,074 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,316 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 330,824 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 94 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.