Progress in protein three-dimensional structure prediction depends on the development of new and improved algorithms for measuring the quality of protein models. Perhaps the best descriptor of the quality of a protein model is the GDT function, which maps each distance cutoff θ to the number of atoms in the protein model that can be fit within distance θ of the corresponding atoms in the experimentally determined structure. It has long been known that the area under the graph of this function (GDT_A) can serve as a reliable, single numerical measure of model quality. Unfortunately, while the well-known GDT_TS metric provides a crude approximation of GDT_A, no existing algorithm is capable of computing accurate estimates of GDT_A.
We prove that GDT_A is well defined and that it can be approximated by Riemann sums, using available methods for computing accurate (near-optimal) GDT function values.
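The Riemann-sum idea can be illustrated with a minimal sketch. It is not the algorithm of this paper: it assumes that the per-atom distances are already given for one fixed superposition, whereas the GDT function takes, for each cutoff θ, the best fit over superpositions. The names `gdt_count` and `gdt_area_riemann`, and the parameters `theta_max` and `steps`, are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def gdt_count(distances: Sequence[float], theta: float) -> int:
    """Count atoms whose deviation is at most theta.

    Assumption: `distances` come from a single fixed superposition,
    so this is only a stand-in for a true (near-optimal) GDT value.
    """
    return sum(1 for d in distances if d <= theta)


def gdt_area_riemann(distances: Sequence[float], theta_max: float, steps: int) -> float:
    """Right Riemann sum of gdt_count over [0, theta_max].

    The interval is split into `steps` equal subintervals and the
    function is sampled at the right endpoint of each one.
    """
    if steps <= 0 or theta_max <= 0:
        raise ValueError("steps and theta_max must be positive")
    width = theta_max / steps
    total = sum(gdt_count(distances, (k + 1) * width) for k in range(steps))
    return total * width


if __name__ == "__main__":
    # Hypothetical per-atom deviations, in angstroms.
    example = [0.5, 1.5, 3.0]
    for n in (4, 8, 4000):
        print(n, gdt_area_riemann(example, 4.0, n))
```

Because the counting function is a nondecreasing step function of θ, the right Riemann sum overestimates the area for coarse partitions and approaches it as the partition is refined. In this fixed-superposition toy case the exact area is the sum of `theta_max - d` over atoms with `d <= theta_max`, which is 7.0 for the example values.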
In contrast to the GDT_TS metric, GDT_A is neither insensitive to large changes nor oversensitive to small changes in the model's coordinates. Moreover, the problem of computing GDT_A is tractable. More specifically, GDT_A can be computed in time that is asymptotically cubic in the size of the protein model.
This paper presents the first algorithm capable of computing near-optimal estimates of the area under the GDT function for a protein model. We believe that the techniques implemented in our algorithm will pave the way for the development of more practical and reliable procedures for estimating 3D model quality.