TY - JOUR

T1 - RNA secondary structures in a polymer-zeta model how foldings should be shaped for sparsification to establish a linear speedup

AU - Jin, Emma Yu

AU - Nebel, M. E.

N1 - Export Date: 22 March 2017

PY - 2016

Y1 - 2016

N2 - Various tools used to predict the secondary structure for a given RNA sequence are based on dynamic programming used to compute a conformation of minimum free energy. For structures without pseudoknots, a worst-case runtime proportional to n3, with n being the length of the sequence, results since a table of dimension n2 has to be filled in while a single entry gives rise to a linear computational effort. However, it was recently observed that reformulating the corresponding dynamic programming recursion together with the bookkeeping of potential folding alternatives (a technique called sparsification) may reduce the runtime to n2 on average, assuming that nucleotides of distance d form a hydrogen bond (i.e. are paired) with probability (Formula Presented.) for some constants b > 0, c > 1. The latter is called the polymer-zeta model and plays a crucial role in speeding up the above mentioned algorithm. In this paper we discuss the application of the polymer-zeta property for the analysis of sparsification, showing that it must be applied conditionally on first and last positions to pair. Afterwards, we will investigate the combinatorics of RNA secondary structures assuming that the corresponding conditional probabilities behave according to a polymer-zeta probability model. We show that even if some of the structural parameters exhibit an almost realistic behavior on average, the expected shape of a folding in that model must be assumed to highly differ from those observed in nature. More precisely, we prove our polymer-zeta model to be appropriate for mRNA molecules but to fail in connection with almost every other family of RNA. Those findings explain the huge speedup of the dynamic programming algorithm observed empirically by Wexler et al. when applying sparsification in connection with mRNA data. © 2015, Springer-Verlag Berlin Heidelberg.

AB - Various tools used to predict the secondary structure for a given RNA sequence are based on dynamic programming used to compute a conformation of minimum free energy. For structures without pseudoknots, a worst-case runtime proportional to n3, with n being the length of the sequence, results since a table of dimension n2 has to be filled in while a single entry gives rise to a linear computational effort. However, it was recently observed that reformulating the corresponding dynamic programming recursion together with the bookkeeping of potential folding alternatives (a technique called sparsification) may reduce the runtime to n2 on average, assuming that nucleotides of distance d form a hydrogen bond (i.e. are paired) with probability (Formula Presented.) for some constants b > 0, c > 1. The latter is called the polymer-zeta model and plays a crucial role in speeding up the above mentioned algorithm. In this paper we discuss the application of the polymer-zeta property for the analysis of sparsification, showing that it must be applied conditionally on first and last positions to pair. Afterwards, we will investigate the combinatorics of RNA secondary structures assuming that the corresponding conditional probabilities behave according to a polymer-zeta probability model. We show that even if some of the structural parameters exhibit an almost realistic behavior on average, the expected shape of a folding in that model must be assumed to highly differ from those observed in nature. More precisely, we prove our polymer-zeta model to be appropriate for mRNA molecules but to fail in connection with almost every other family of RNA. Those findings explain the huge speedup of the dynamic programming algorithm observed empirically by Wexler et al. when applying sparsification in connection with mRNA data. © 2015, Springer-Verlag Berlin Heidelberg.

U2 - 10.1007/s00285-015-0894-z

DO - 10.1007/s00285-015-0894-z

M3 - Journal article

C2 - 26001743

VL - 72

SP - 527

EP - 571

JO - Journal of Mathematical Biology

JF - Journal of Mathematical Biology

SN - 0303-6812

IS - 3

ER -