↓ Skip to main content

Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming

Overview of attention for article published in BMC Bioinformatics, December 2017
Altmetric Badge

Mentioned by

twitter
1 X user

Citations

dimensions_citation
14 Dimensions

Readers on

mendeley
5 Mendeley
You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output. Click here to find out more.
Title
Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming
Published in
BMC Bioinformatics, December 2017
DOI 10.1186/s12859-017-1944-x
Pubmed ID
Authors

Vassily Lyubetsky, Roman Gershgorin, Konstantin Gorbunov

Abstract

Chromosome structure is a very limited model of the genome including the information about its chromosomes such as their linear or circular organization, the order of genes on them, and the DNA strand encoding a gene. Gene lengths, nucleotide composition, and intergenic regions are ignored. Although highly incomplete, such structure can be used in many cases, e.g., to reconstruct phylogeny and evolutionary events, to identify gene synteny, regulatory elements and promoters (considering highly conserved elements), etc. Three problems are considered; all assume unequal gene content and the presence of gene paralogs. The distance problem is to determine the minimum number of operations required to transform one chromosome structure into another and the corresponding transformation itself including the identification of paralogs in two structures. We use the DCJ model which is one of the most studied combinatorial rearrangement models. Double-, sesqui-, and single-operations as well as deletion and insertion of a chromosome region are considered in the model; the single ones comprise cut and join. In the reconstruction problem, a phylogenetic tree with chromosome structures in the leaves is given. It is necessary to assign the structures to inner nodes of the tree to minimize the sum of distances between terminal structures of each edge and to identify the mutual paralogs in a fairly large set of structures. A linear algorithm is known for the distance problem without paralogs, while the presence of paralogs makes it NP-hard. If paralogs are allowed but the insertion and deletion operations are missing (and special constraints are imposed), the reduction of the distance problem to integer linear programming is known. Apparently, the reconstruction problem is NP-hard even in the absence of paralogs. The problem of contigs is to find the optimal arrangements for each given set of contigs, which also includes the mutual identification of paralogs. We proved that these problems can be reduced to integer linear programming formulations, which allows an algorithm to redefine the problems to implement a very special case of the integer linear programming tool. The results were tested on synthetic and biological samples. Three well-known problems were reduced to a very special case of integer linear programming, which is a new method of their solutions. Integer linear programming is clearly among the main computational methods and, as generally accepted, is fast on average; in particular, computation systems specifically targeted at it are available. The challenges are to reduce the size of the corresponding integer linear programming formulations and to incorporate a more detailed biological concept in our model of the reconstruction.

X Demographics

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output. Click here to find out more about how the information was compiled.
Mendeley readers

Mendeley readers

The data shown below were compiled from readership statistics for 5 Mendeley readers of this research output. Click here to see the associated Mendeley record.

Geographical breakdown

Country Count As %
Unknown 5 100%

Demographic breakdown

Readers by professional status Count As %
Student > Master 2 40%
Unspecified 1 20%
Student > Bachelor 1 20%
Professor 1 20%
Readers by discipline Count As %
Unspecified 1 20%
Biochemistry, Genetics and Molecular Biology 1 20%
Computer Science 1 20%
Agricultural and Biological Sciences 1 20%
Unknown 1 20%
Attention Score in Context

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 08 December 2017.
All research outputs
#20,454,971
of 23,011,300 outputs
Outputs from BMC Bioinformatics
#6,890
of 7,315 outputs
Outputs of similar age
#375,085
of 439,982 outputs
Outputs of similar age from BMC Bioinformatics
#112
of 134 outputs
Altmetric has tracked 23,011,300 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,315 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 439,982 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 134 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.