RNA Secondary Structure Prediction
Dynamic Programming Approaches
Sarah Aerni
http://www.tbi.univie.ac.at/
Outline
- RNA folding
- Dynamic programming for RNA secondary structure prediction
- Covariance model for RNA structure prediction
RNA Basics
- RNA bases: A, C, G, U
- Canonical base pairs
  - A-U: 2 hydrogen bonds
  - G-C: 3 hydrogen bonds (more stable)
  - G-U "wobble" pairing
- Each base can pair with only one other base.
Image: http:///
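The pairing rules on this slide can be written down directly. The following is a minimal Python sketch; the names `CANONICAL_PAIRS` and `can_pair` are illustrative choices and do not come from the slides.

```python
# Pairs listed on the slide: A-U, G-C, and the G-U "wobble" pair.
# Both orientations are stored so that lookups do not depend on order.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair(x: str, y: str) -> bool:
    """Return True if bases x and y can form one of the pairs above."""
    return (x.upper(), y.upper()) in CANONICAL_PAIRS
```

For example, `can_pair("G", "U")` is true (wobble pair) while `can_pair("A", "C")` is false.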
RNA Basics
- transfer RNA (tRNA)
- messenger RNA (mRNA)
- ribosomal RNA (rRNA)
- small interfering RNA (siRNA)
- micro RNA (miRNA)
- small nucleolar RNA (snoRNA)
http://www.genetics.wustl.edu/eddy/tRNAscan-SE/
RNA Secondary Structure
Structural elements labeled in the figure:
- Pseudoknot
- Stem
- Interior loop
- Single-stranded region
- Bulge loop
- Junction (multiloop)
- Hairpin loop
Image – Wuchty
Sequence Alignment as a method to determine structure
- Bases pair with each other to form the stems that determine the secondary structure
- Aligning bases according to their ability to pair with each other gives an algorithmic approach to determining the optimal structure
Base Pair Maximization – Dynamic Programming Algorithm
Simple Example: Maximizing Base Pairing
- S(i,j) is the folding of the subsequence of the RNA strand from index i to index j which results in the highest number of base pairs
- Four cases for S(i,j):
  - Base pair at i and j
  - Unmatched at i
  - Unmatched at j
  - Bifurcation
Images – Sean Eddy
Base Pair Maximization – Dynamic Programming Algorithm
Alignment Method
- Align RNA strand to itself
- Score increases for feasible base pairs
- Each score independent of overall structure
- Bifurcation adds extra dimension
Steps
- Initialize first two diagonal arrays to 0
- Fill in squares sweeping diagonally
- Bases cannot pair, similar to unmatched alignment: S(i + 1, j) or S(i, j – 1)
- Bases can pair, similar to matched alignment: S(i + 1, j – 1) + 1
- Dynamic Programming – possible paths
Images – Sean Eddy
Base Pair Maximization – Dynamic Programming Algorithm
Alignment Method
- Dynamic Programming – possible paths
- Bifurcation – add values for all k
  - Reminder: for all k, take the maximum of S(i,k) + S(k + 1, j)
  - k = 0: bifurcation maximum in this case
Images – Sean Eddy
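The four cases (i unmatched, j unmatched, i paired with j, bifurcation at k) can be combined into one table-filling routine. The following Python sketch is one possible reading of the recurrence, not the presenter's code. The function name `max_base_pairs` and the `min_loop` parameter are assumptions added for illustration; with the default `min_loop=0` the routine follows the recurrence exactly as given on the slides.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 0) -> int:
    """Maximum number of nested base pairs in seq (base pair maximization).

    min_loop is a hypothetical extra: the minimum number of bases that
    must lie between two paired positions. 0 imposes no restriction.
    """
    n = len(seq)
    if n == 0:
        return 0
    # S[i][j] holds the best score for the subsequence i..j.
    # Cells on or below the main diagonal are never filled and stay 0.
    S = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # sweep the diagonals outward
        for i in range(n - span):
            j = i + span
            # Case 1 and 2: i unmatched, or j unmatched.
            best = max(S[i + 1][j], S[i][j - 1])
            # Case 3: i pairs with j.
            if (seq[i], seq[j]) in PAIRS:
                inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            # Case 4: bifurcation, best split point k between i and j.
            for k in range(i + 1, j):
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]
```

The bifurcation loop is what makes the fill cubic in the sequence length rather than quadratic: every cell scans all split points k. The routine returns only the score; recovering the structure itself would need a traceback through `S`, which is omitted here.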
Base Pair Maximization Drawbacks
- Base pair maximization will not necessarily lead to the most stable structure
  - May create a structure with many interior loops or hairpins, which are energetically unfavorable
- Comparable to aligning sequences with scattered matches: not biologically reasonable
Energy Minimization
- Thermodynamic stability
  - Estimated using experimental techniques
  - Theory: the most stable structure is the most likely
- No pseudoknots due to algorithm limitations
- Uses the dynamic programming alignment technique
- Attempts to maximize the score taking thermodynamics into account
- MFOLD and ViennaRNA
Energy Minimization Results
Images – David Mount
- Linear RNA strand folded back on itself to create secondary structure
- Circularized representation uses this requirement
  - Arcs represent base pairing
- All loops must have at least 3 bases in them
  - Equivalent to having at least 3 bases between the two ends of every arc
  - Exception: the location where the beginning and end of the RNA come together in the circularized representation
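The loop-size rule can be checked mechanically once a structure is written down. The sketch below assumes the structure is given in dot-bracket notation (matching parentheses for paired positions, dots for unpaired ones), which the slides do not use; the helper names `pairs_from_dot_bracket` and `loops_long_enough` are hypothetical.

```python
def pairs_from_dot_bracket(structure: str) -> list:
    """Return the sorted list of (i, j) paired positions in a dot-bracket string."""
    stack, pairs = [], []
    for pos, sym in enumerate(structure):
        if sym == "(":
            stack.append(pos)
        elif sym == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


def loops_long_enough(structure: str, min_loop: int = 3) -> bool:
    """True if every arc (i, j) has at least min_loop positions between its ends."""
    return all(j - i - 1 >= min_loop
               for i, j in pairs_from_dot_bracket(structure))
```

For example, `loops_long_enough("(((...)))")` holds, while `loops_long_enough("((..))")` does not because the innermost arc spans only two bases.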
Trouble with Pseudoknots
Images – David Mount
Pseudoknots cause a breakdown in the dynamic programming algorithm. To form a pseudoknot, checks must be made to ensure that a base is not already paired. This breaks the recurrence relations, because the score for a subsequence can no longer be computed independently of the rest of the structure.
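In terms of paired positions, a pseudoknot shows up as two base pairs that cross rather than nest. A minimal Python sketch of that test follows; the function name `has_pseudoknot` and the representation of a structure as a list of index pairs are assumptions made for illustration.

```python
def has_pseudoknot(pairs) -> bool:
    """True if any two base pairs cross, i.e. i < k < j < l for pairs (i, j), (k, l)."""
    # Normalize each pair so the smaller index comes first, then sort by it.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            # (k, l) opens inside (i, j) but closes outside it: crossing arcs.
            if i < k < j < l:
                return True
    return False
```

Nested pairs such as `[(0, 8), (1, 7), (2, 6)]` pass the test, whereas `[(0, 5), (3, 8)]` is reported as a pseudoknot. The dynamic programming recurrences only ever build structures of the first kind.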
Energy Minimization Drawbacks
- Computes only one optimal structure
- Usual drawbacks of purely mathematical approaches
  - Similar difficulties in other algorithms
    - Protein structure
    - Exon finding
Alternative Algorithms – Covariation
- Incorporates similarity-based method
  - Evolution maintains sequences that are important
  - Change in sequence coincides to maintain structure through base pairs (covariance)
- Cross-species structure conservation example: tRNA
  - Expect areas of base pairing in tRNA to be covarying between various species
  - Base pairing creates the same stable tRNA structure in organisms
  - Mutation in one base makes pairing impossible and breaks down the structure
  - Covariation ensures the ability to base pair is maintained and RNA structure is conserved
- Manual and automated approaches have been used to identify covarying base pairs
- Models for structure based on results
  - Ordered Tree Model
  - Stochastic Context Free Grammar
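One common automated way to score covariation between two alignment columns is their mutual information. The slides do not say which measure was used, so the choice of measure is an assumption here, as are the function name `mutual_information` and the toy alignment, which is made up purely for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, col_a: int, col_b: int) -> float:
    """Mutual information (in bits) between two columns of an alignment.

    alignment is a list of equal-length sequences, one per species.
    """
    n = len(alignment)
    count_a = Counter(seq[col_a] for seq in alignment)
    count_b = Counter(seq[col_b] for seq in alignment)
    count_ab = Counter((seq[col_a], seq[col_b]) for seq in alignment)
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        # Compare the joint frequency with what independence would predict.
        mi += p_xy * log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 2 always form a Watson-Crick
# pair (G-C, C-G, A-U, U-A), while column 1 never changes.
toy = ["GAC", "CAG", "AAU", "UAA"]
covarying = mutual_information(toy, 0, 2)    # columns that change together
conserved = mutual_information(toy, 0, 1)    # one column is invariant
```

In the toy data, columns 0 and 2 score 2 bits, the maximum for a four-letter alphabet, while the pair involving the invariant column scores 0. A fully conserved column carries no covariation signal even if it is base paired.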
Binary Tree Representation of RNA Secondary Structure
- Representation of RNA structure using a binary tree
- Nodes represent
  - Base pair if two bases are shown
  - Loop if base and "gap" (dash) are shown
- Pseudoknots still not represented
- Tree does not permit varying sequences
  - Mismatches
  - Insertions & Deletions
Images – Eddy et al.
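The node labeling described above (two bases for a pair, a base and a dash for an unpaired position) can be illustrated with a small tree builder. This is a simplified ordered tree, in which a node may have any number of children, and not the exact binary tree of Eddy et al. The function name `build_tree`, the dictionary layout, and the dot-bracket input format are all assumptions.

```python
def build_tree(seq: str, structure: str) -> dict:
    """Build an ordered tree from a sequence and its dot-bracket structure.

    Pair nodes are labeled (base, base); unpaired nodes are labeled (base, "-").
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    root = {"label": "root", "children": []}
    open_nodes = [root]                 # stack of nodes still waiting for ")"
    for base, sym in zip(seq, structure):
        if sym == "(":
            # Open a pair node; its partner base is filled in at the ")".
            node = {"label": (base, None), "children": []}
            open_nodes[-1]["children"].append(node)
            open_nodes.append(node)
        elif sym == ")":
            if len(open_nodes) == 1:
                raise ValueError("unbalanced structure: unmatched ')'")
            node = open_nodes.pop()
            node["label"] = (node["label"][0], base)
        else:
            # Unpaired base: a leaf labeled with the base and a gap.
            open_nodes[-1]["children"].append(
                {"label": (base, "-"), "children": []})
    if len(open_nodes) != 1:
        raise ValueError("unbalanced structure: unmatched '('")
    return root
```

Calling `build_tree("GGAAACC", "((...))")` gives a root with one `("G", "C")` child, which contains a second `("G", "C")` node holding three `("A", "-")` leaves. Because every node closes before its parent does, crossing pairs cannot be expressed, which matches the point that pseudoknots are still not represented.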