Statistical Shape Analysis & Modeling Group

Structure-based RNA function prediction using Elastic Shape Analysis

The atomic coordinates of an RNA backbone can be viewed as a 3D open curve. This lets us align RNA structures based on the geometry of their backbones, and we further attempt structure-based function prediction using the SCOR functional classification as reference. We use Elastic Shape Analysis of these backbone curves to classify the structures functionally. To do so, we compute pairwise distance matrices, perform leave-one-out classification, and compare the resulting accuracy with that of a benchmark method. Our results show that Elastic Shape Analysis achieves a significant classification rate.
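The evaluation step described above can be sketched in a few lines. This is only an illustration: the text does not say which classifier is used inside the leave-one-out loop, so the sketch assumes a 1-nearest-neighbor rule, and the function name `leave_one_out_accuracy`, the toy distance matrix, and the labels are all hypothetical.

```python
import numpy as np

def leave_one_out_accuracy(dist, labels):
    """Leave-one-out accuracy from a precomputed pairwise distance matrix.

    Assumption: each structure is assigned the label of its nearest
    neighbor among all *other* structures (1-NN rule).
    """
    d = np.array(dist, dtype=float)      # copy so the input is not modified
    labels = np.asarray(labels)
    np.fill_diagonal(d, np.inf)          # a structure may not vote for itself
    nearest = np.argmin(d, axis=1)       # index of the closest other structure
    predicted = labels[nearest]
    return float(np.mean(predicted == labels))

# Toy example with two well-separated functional classes (made-up numbers).
dist = np.array([
    [0.0, 0.2, 0.9, 1.0],
    [0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.3],
    [1.0, 0.9, 0.3, 0.0],
])
labels = ["A", "A", "B", "B"]
accuracy = leave_one_out_accuracy(dist, labels)
print(accuracy)
```

Because the classifier only needs the distance matrix, the same routine can be run on distances produced by any competing alignment method, which makes the accuracy comparison direct.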

We treat the RNA backbone structure as a continuous curve, that is, as a point in a non-Euclidean space (more specifically, a differentiable manifold). Shape-preserving variability (rotation, translation, scaling, and registration) is removed via group actions: each curve is identified with its orbit under these actions, and the orbits, which are equivalence classes, are the elements of the corresponding quotient space. Curves are represented by their square root velocity function (SRVF), which allows shapes to be stretched and bent when they are matched. This is a desirable property, analogous to the gaps that are often introduced when aligning nucleotide sequences. The distance we use to measure the dissimilarity of two structures is the length of the geodesic path between them in the quotient space of their shape manifold. This distance has the desired mathematical properties of positive definiteness, symmetry, and the triangle inequality, which follow from using an appropriate Riemannian metric on the tangent spaces of the manifold. Figure [1] shows an example of elastic matching between RNA structures 430D and 1JTJ taken from the NR95-SCOR data, and Figure [2] shows an example of a geodesic path from RNA structure 430D to 1JTJ.

One advantage of having a well-defined mathematical distance between shapes is that we can formally perform statistics on shape spaces. For example, we can compute sample averages and covariance matrices when necessary. With these, one may fit probability distributions and perform inferential statistics.

- Capriotti E, Marti-Renom M. (2008) RNA structure alignment by a unit-vector approach. Bioinformatics, 24; i112-i116.
- L. P. Chew, D. Huttenlocher, K. Kedem, and J. Kleinberg, Fast Detection of Common Geometric Substructure in Proteins, Journal of Computational Biology (JCB), 6:3 (1999), 313-325.
- Peter S. Klosterman, Donna K. Hendrix, Makio Tamura, Stephen R. Holbrook, and Steven E. Brenner, Three-Dimensional Motifs from the SCOR: Structural Classification of RNA Database - Extruded Strands, Base Triples, Tetraloops, and U-turn. Nucleic Acids Res. 2004, 32. 2342-2352.
- Makio Tamura, Donna K. Hendrix, Peter S. Klosterman, Nancy R. B. Schimmelman, Steven E. Brenner, and Stephen R. Holbrook, "SCOR: Structural Classification of RNA, Version 2.0", Nucleic Acids Research, 2004, 1, 32, 182-184.
- Ortiz,A.R. et al. (2002) MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci., 11, 2606-2621.
- Liu W, Srivastava A, Zhang J (2011) A Mathematical Framework for Protein Structure Comparison. PLoS Comput Biol 7(2): e1001075. doi:10.1371/journal.pcbi.1001075
- Bastien O, Maréchal E. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores. BMC Bioinformatics. 2008;9:332.