Report for: Better ILP models for haplotype assembly

You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output. Click here to find out more.

Title	Better ILP models for haplotype assembly
Published in	BMC Bioinformatics, February 2018
DOI	10.1186/s12859-018-2012-x
Pubmed ID	29504891
Authors	Maryam Etemadi, Mehri Bagherian, Zhi-Zhong Chen, Lusheng Wang
Abstract	The haplotype assembly problem for diploid is to find a pair of haplotypes from a given set of aligned Single Nucleotide Polymorphism (SNP) fragments (reads). It has many applications in association studies, drug design, and genetic research. Since this problem is computationally hard, both heuristic and exact algorithms have been designed for it. Although exact algorithms are much slower, they are still of great interest because they usually output significantly better solutions than heuristic algorithms in terms of popular measures such as the Minimum Error Correction (MEC) score, the number of switch errors, and the QAN50 score. Exact algorithms are also valuable because they can be used to witness how good a heuristic algorithm is. The best known exact algorithm is based on integer linear programming (ILP) and it is known that ILP can also be used to improve the output quality of every heuristic algorithm with a little decline in speed. Therefore, faster ILP models for the problem are highly demanded. As in previous studies, we consider not only the general case of the problem but also its all-heterozygous case where we assume that if a column of the input read matrix contains at least one 0 and one 1, then it corresponds to a heterozygous SNP site. For both cases, we design new ILP models for the haplotype assembly problem which aim at minimizing the MEC score. The new models are theoretically better because they contain significantly fewer constraints. More importantly, our experimental results show that for both simulated and real datasets, the new model for the all-heterozygous (respectively, general) case can usually be solved via CPLEX (an ILP solver) at least 5 times (respectively, twice) faster than the previous bests. Indeed, the running time can sometimes be 41 times better. This paper proposes a new ILP model for the haplotype assembly problem and its all-heterozygous case, respectively. Experiments with both real and simulated datasets show that the new models can be solved within much shorter time by CPLEX than the previous bests. We believe that the models can be used to improve heuristic algorithms as well.

View on publisher site Alert me about new mentions

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output. Click here to find out more about how the information was compiled.

Geographical breakdown

Country	Count	As %
United Kingdom	1	100%

Demographic breakdown

Type	Count	As %
Members of the public	1	100%

Mendeley readers

The data shown below were compiled from readership statistics for 4 Mendeley readers of this research output. Click here to see the associated Mendeley record.

Geographical breakdown

Country	Count	As %
Unknown	4	100%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	2	50%
Professor > Associate Professor	1	25%
Student > Ph. D. Student	1	25%

Readers by discipline	Count	As %
Mathematics	2	50%
Biochemistry, Genetics and Molecular Biology	1	25%
Engineering	1	25%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 26 February 2018.

All research outputs

#20,466,701

of 23,025,074 outputs

Outputs from BMC Bioinformatics

#6,891

of 7,316 outputs

Outputs of similar age

#292,330

of 330,824 outputs

Outputs of similar age from BMC Bioinformatics

#86

of 94 outputs

Altmetric has tracked 23,025,074 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.

So far Altmetric has tracked 7,316 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 330,824 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 94 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.

Better ILP models for haplotype assembly

Mentioned by

Citations

Readers on

X Demographics

Geographical breakdown

Demographic breakdown

Mendeley readers

Geographical breakdown

Demographic breakdown

Attention Score in Context