Abstract:
Uncovering the organization of a landscape that encapsulates all states of a dynamic system is a central task in many domains, as it promises to reveal, in an unsupervised manner, a system’s inner working. One domain where this task is crucial is in bioinformatics, where the energy landscape that organizes three-dimensional structures of a molecule by their energetics is a powerful construct. The landscape can be leveraged, among other things, to reveal macrostates where a molecule is biologically-active. This is a daunting task, as landscapes of complex actuated systems, such as molecules, are inherently high-dimensional. Nonetheless, our laboratories have made some progress via topological and statistical analysis of spatial data over the recent years. We have proposed what is essentially a dichotomy, methods that are more pertinent for visualization-driven discovery, and methods that are more pertinent for discovery of the biologically-active macrostates but not amenable to visualization. In this paper, we present a novel, hybrid method that combines strengths of these methods, allowing both visualization of the landscape and discovery of macrostates. We demonstrate what the method is capable of uncovering in comparison with existing methods over structure spaces sampled with conformational sampling algorithms. Though the direct evaluation in this paper is on protein energy landscapes, the proposed method is of broad interest in cross-cutting problems that necessitate characterization of fitness and optimization landscapes.