Predicting RNA Structure and Function

Posted on 08-Jan-2016


Predicting RNA Structure and Function

Following the sequencing of the human genome, there is high interest in RNA.

“Just when scientists thought they had deciphered the roles played by the cell's leading actors, a familiar performer has turned up in a stunning variety of guises. RNA, long upstaged by its more glamorous sibling, DNA, is turning out to have star qualities of its own.” (Science News 12: 2002)

Ribozyme

The ribosome: the protein factory of the cell, made mainly of RNA

Non-coding DNA (98.5% of the human genome)

• Intergenic

• Repetitive elements

• Promoters

• Introns

• Untranslated regions (UTRs)

Some biological functions of ncRNA

• mRNA cellular localization

• Control of mRNA stability

• Control of splicing

• Control of translation

The function of the RNA molecule depends on its folded structure

RNA structural levels

[Figure: tRNA secondary structure and tertiary structure]

Control of iron levels by mRNA structure

[Figure: the Iron Responsive Element (IRE), a stem-loop with a conserved loop, recognized by IRP1 and IRP2; IRP1/2 bound to the IRE in the 5' UTR of ferritin (F) mRNA and in the 3' UTR of transferrin receptor (TR) mRNA]

• F: ferritin = iron storage
• TR: transferrin receptor = iron uptake
• Low iron: IRP binds the IRE. The IRE-IRP complex inhibits translation of ferritin mRNA and inhibits degradation of TR mRNA.
• High iron: IRE-IRP binding is off, so ferritin is translated and the transferrin receptor mRNA is degraded.

RNA Secondary Structure

[Figure: a hairpin drawn from 5' to 3', with paired bases forming the STEM and unpaired bases forming the LOOP]

• The RNA molecule folds on itself.
• Bases pair through hydrogen bonds: G-C, A-U, and the G-U wobble pair.
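The pairing rule above can be checked mechanically. The Python sketch below assumes structures are written in dot-bracket notation (matched parentheses for paired bases, dots for unpaired ones); that notation is used by the Vienna RNA package but is not introduced on these slides, and the helper names are hypothetical.

```python
# Pairs allowed by the slide: Watson-Crick G-C and A-U, plus the G-U wobble pair.
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """Return True if bases a and b may form a pair in an RNA stem."""
    return (a.upper(), b.upper()) in ALLOWED_PAIRS


def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Turn dot-bracket text such as '((..))' into sorted 0-based (i, j) pairs."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)  # remember the opening (5') partner
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # close the most recent open pair
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def obeys_pairing_rules(seq: str, structure: str) -> bool:
    """True if every base pair in the structure is G-C, A-U or G-U."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    return all(can_pair(seq[i], seq[j]) for i, j in pairs_from_dot_bracket(structure))


print(obeys_pairing_rules("GGGAAAUCCC", "(((....)))"))  # True
print(obeys_pairing_rules("GAGAAAUCCC", "(((....)))"))  # False: A cannot pair with C
```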

RNA secondary structure: short-range interactions

[Figure: secondary structure elements: stem, hairpin loop, bulge, internal loop, and dangling ends at the 5' and 3' termini]
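The element types in the figure can be told apart from the base pairs alone. The Python sketch below (dot-bracket input and all names are assumptions, not from the slides) looks at what each pair directly encloses: no inner pair means a hairpin loop; one inner pair with unpaired bases on one side only means a bulge, on both sides an internal loop, and on neither side a pair stacked inside a stem. Pairs enclosing two or more helices close a multi-branch loop, a case not drawn on the slide.

```python
from collections import Counter


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner (0-based) from dot-bracket text."""
    stack, partner = [], {}
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def classify_elements(structure: str) -> Counter:
    """Count stacked pairs, hairpins, bulges, internal/multi-branch loops, dangling bases."""
    partner = pair_table(structure)
    counts: Counter = Counter()
    for i, j in partner.items():
        if i > j:
            continue  # visit each pair once, from its 5' side
        inner = []  # pairs directly enclosed by (i, j)
        k = i + 1
        while k < j:
            if k in partner:  # the first paired base met here always opens a pair
                inner.append((k, partner[k]))
                k = partner[k] + 1  # skip everything nested inside it
            else:
                k += 1
        if not inner:
            counts["hairpin loop"] += 1
        elif len(inner) == 1:
            a, b = inner[0]
            left, right = a - i - 1, j - b - 1  # unpaired bases on each side
            if left == 0 and right == 0:
                counts["stacked pair"] += 1
            elif left == 0 or right == 0:
                counts["bulge"] += 1
            else:
                counts["internal loop"] += 1
        else:
            counts["multi-branch loop"] += 1
    if partner:
        # unpaired bases outside all pairs at the 5' and 3' ends
        counts["5' dangling"] = len(structure) - len(structure.lstrip("."))
        counts["3' dangling"] = len(structure) - len(structure.rstrip("."))
    return counts


print(classify_elements(".((.((..((...)).))))."))
```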

Long-range interactions of RNA secondary structure elements

• Pseudoknot
• Kissing hairpins
• Hairpin-bulge contact

These patterns are excluded from the prediction schemes because computing them is too intensive.
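The excluded patterns all involve crossing pairs: two pairs (i, j) and (k, l) with i < k cross, forming a pseudoknot, exactly when i < k < j < l. A minimal Python check over a list of base pairs; the pair lists below are hypothetical.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every two base pairs (i, j), (k, l) that cross, i.e. i < k < j < l.

    Nested pairs (i < k < l < j) and side-by-side pairs (j < k) do not cross."""
    normalized = sorted((min(p), max(p)) for p in pairs)
    return [
        ((i, j), (k, l))
        for (i, j), (k, l) in combinations(normalized, 2)
        if i < k < j < l
    ]


nested = [(0, 10), (1, 9), (3, 7)]
interleaved = [(0, 10), (1, 9), (5, 15), (6, 14)]  # two stems that interleave
print(crossing_pairs(nested))            # []
print(len(crossing_pairs(interleaved)))  # 4
```

A structure passes the "no knots" assumption only when this list is empty.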

Predicting RNA secondary structure

• Search for the structure with minimal free energy (MFE)
• Only pairs allowed by the base-pairing rules can form stems: Watson-Crick pairs A-U and G-C, and G-U wobble pairs

Simplifying assumptions for structure prediction

• RNA folds into one minimum free-energy structure.
• There are no knots (base pairs never cross).
• The energy of each base pair in a double-stranded region is calculated independently: neighbors do not influence the energy.

Solution: search for the MFE with dynamic programming (Zuker and Stiegler 1981)
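Under the simplifying assumptions above (one MFE structure, no crossing pairs, each pair scored independently), the dynamic-programming search can be sketched in Python as a Nussinov-style recursion. This is a simplification, not the Zuker-Stiegler algorithm itself, which scores loops and stacked pairs with nearest-neighbor energies; the pair energies and the minimum loop size of 3 are assumed values chosen for illustration.

```python
from functools import lru_cache

# Hypothetical per-pair energies (kcal/mol), NOT measured nearest-neighbor values.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}
MIN_LOOP = 3  # assumed minimum number of unpaired bases closed by a hairpin


def fold(seq: str) -> tuple[float, str]:
    """Return (minimum energy, dot-bracket structure) for pseudoknot-free folding."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> float:
        """Minimum energy of seq[i..j], both ends inclusive."""
        if j - i <= MIN_LOOP:
            return 0.0  # too short to close a hairpin
        result = best(i + 1, j)  # case 1: base i stays unpaired
        for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
            e = PAIR_ENERGY.get((seq[i], seq[k]))
            if e is not None:
                # the pair (i, k) splits the rest into an inside and an outside part
                result = min(result, e + best(i + 1, k - 1) + best(k + 1, j))
        return result

    def traceback(i: int, j: int, pairs: list) -> None:
        """Recover one set of pairs that achieves best(i, j)."""
        if j - i <= MIN_LOOP:
            return
        if best(i, j) == best(i + 1, j):
            traceback(i + 1, j, pairs)
            return
        for k in range(i + MIN_LOOP + 1, j + 1):
            e = PAIR_ENERGY.get((seq[i], seq[k]))
            if e is not None and best(i, j) == e + best(i + 1, k - 1) + best(k + 1, j):
                pairs.append((i, k))
                traceback(i + 1, k - 1, pairs)
                traceback(k + 1, j, pairs)
                return

    pairs: list = []
    traceback(0, n - 1, pairs)
    structure = ["."] * n
    for i, k in pairs:
        structure[i], structure[k] = "(", ")"
    return best(0, n - 1), "".join(structure)


print(fold("GGGAAAUCCC"))  # (-9.0, '(((....)))')
```

Because the pair energies are independent, the table of sub-results is filled in O(n^3) time, which is what makes the exhaustive search practical.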

Sequence-dependent free-energy values of the base pairs (nearest-neighbor model)

[Figure: two hairpins with the same loop whose stems differ in one base pair]

Negative energies are assigned to interactions between stacked base pairs. The energy of a pair is influenced by the previous base pair, not by base pairs further along the stem.


Example stacking values (kcal/mol):
• GC / AU: -2.3
• GC / GC: -2.9
• GC / CG: -3.4
• GC / UA: -2.1

These energies are estimated experimentally from small synthetic RNAs.
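Under the nearest-neighbor model, a stem's energy is the sum over each pair's stack with the previous pair. The Python sketch below uses only the four example values from the slide, so it covers just a few stems (a full nearest-neighbor table covers every combination of stacked pairs); reading each entry as (pair, next pair along the stem) is an assumption about the slide's layout.

```python
# Example stacking energies from the slide (kcal/mol). Reading each entry as
# (pair, next pair along the stem) is an assumption about the slide's layout.
STACK_ENERGY = {
    ("GC", "AU"): -2.3,
    ("GC", "GC"): -2.9,
    ("GC", "CG"): -3.4,
    ("GC", "UA"): -2.1,
}


def stem_energy(stem: list[str]) -> float:
    """Sum stacking energies over consecutive base pairs of one stem.

    Each pair contributes only through its stack with the previous pair,
    which is the nearest-neighbor assumption."""
    total = 0.0
    for prev, cur in zip(stem, stem[1:]):
        key = (prev, cur)
        if key not in STACK_ENERGY:
            raise KeyError(f"no value for {prev}/{cur} in this partial example table")
        total += STACK_ENERGY[key]
    return total


print(round(stem_energy(["GC", "GC", "CG"]), 1))  # -6.3
```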

Adding complexity to energy calculations

• Positive energies are added for destabilizing regions such as bulges, loops, etc.
• More than one structure can be predicted.

Free energy computation

[Figure: a hairpin with a 4-nt loop and a 1-nt bulge, each structural element labeled with its free-energy contribution]

• 4-nt hairpin loop: +5.9
• Hairpin mismatch: -1.1
• 1-nt bulge: +3.3
• Stacking: -2.9, -2.9, -1.8, -1.8, -0.9, -2.1
• 5' dangling end: -0.3

ΔG = -4.6 kcal/mol
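The labeled contributions can be totalled directly. A short Python check, assuming the -0.3 value belongs to the single 5' dangling end; with that reading the terms reproduce the stated ΔG of -4.6 kcal/mol.

```python
import math

# Free-energy terms (kcal/mol) from the example hairpin. Assigning -0.3 to the
# 5' dangling end is an assumption about how the figure labels line up.
TERMS = [
    ("4-nt hairpin loop", 5.9),
    ("1-nt bulge", 3.3),
    ("hairpin mismatch", -1.1),
    ("5' dangling end", -0.3),
    ("stacking", -2.9),
    ("stacking", -2.9),
    ("stacking", -1.8),
    ("stacking", -1.8),
    ("stacking", -0.9),
    ("stacking", -2.1),
]


def total_free_energy(terms):
    """Add destabilizing (+) and stabilizing (-) terms into one Delta G."""
    # fsum avoids accumulated rounding error; round to the table's precision
    return round(math.fsum(value for _, value in terms), 2)


print(total_free_energy(TERMS))  # -4.6
```

The loop and bulge penalties (+9.2 in total) are outweighed by the stacking terms, which is why the hairpin is stable overall.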

Prediction tools based on energy calculation

• Fold, Mfold: Zuker & Stiegler (1981) Nucleic Acids Res. 9:133-148; Zuker (1989) Science 244:48-52
• RNAfold, Vienna RNA secondary structure server: Hofacker (2003) Nucleic Acids Res. 31:3429-3431

Insight from multiple alignment

Information from a multiple sequence alignment (MSA) can help predict the probability that positions i and j are base-paired.

G C C U U C G G G C
G A C U U C G G U C
G G C U U C G G C C

Compensatory substitutions

[Figure: hairpin stems in which base pairs are substituted (shown as N-N') without disrupting the structure]

Compensatory substitutions are mutations that maintain the secondary structure. RNA secondary structure can be revealed by identifying compensatory mutations.


For a candidate base pair i,j in the alignment:

• Conservation: no additional information
• Consistent mutations (GC to GU): support the stem
• Inconsistent mutations: do not support the stem
• Compensatory mutations: support the stem
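These four cases can be applied column pair by column pair. A minimal Python sketch using the three aligned sequences from the alignment slide (columns numbered from 0); the rule that "both columns change" means compensatory and "only one changes" means consistent is a simplification for illustration, and real tools score covariation quantitatively.

```python
# Pairs allowed in a stem: G-C, A-U and the G-U wobble pair.
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def classify_column_pair(alignment: list[str], i: int, j: int) -> str:
    """Classify aligned columns i and j (0-based) as a candidate base pair."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    if not all(p in ALLOWED_PAIRS for p in pairs):
        return "inconsistent: does not support a stem"
    if len(set(pairs)) == 1:
        return "conserved: no additional information"
    i_varies = len({a for a, _ in pairs}) > 1
    j_varies = len({b for _, b in pairs}) > 1
    if i_varies and j_varies:
        return "compensatory: supports a stem"  # both sides changed together
    return "consistent: supports a stem"  # e.g. G-C to G-U


ALIGNMENT = ["GCCUUCGGGC", "GACUUCGGUC", "GGCUUCGGCC"]
print(classify_column_pair(ALIGNMENT, 1, 8))  # compensatory: supports a stem
print(classify_column_pair(ALIGNMENT, 0, 9))  # conserved: no additional information
```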

RNAalifold (Hofacker 2002), from the Vienna RNA package

Predicts the consensus secondary structure for a set of aligned RNA sequences, using a modified dynamic programming algorithm that adds alignment information to the standard energy model.

Improvement in prediction accuracy

Other related programs

• COVE: RNA structure analysis using the covariance model (an implementation of the stochastic context-free grammar method)
• QRNA (Rivas and Eddy 2001): searching for conserved RNA structures
• tRNAscan-SE: tRNA detection in genome sequences

Sean Eddy's lab, Washington University: http://www.genetics.wustl.edu/eddy

RNA families

• Rfam: a general non-coding RNA database (most of the data is taken from specific databases)
http://www.sanger.ac.uk/Software/Rfam/

Includes many families of non-coding RNAs and functional motifs, together with their alignments and secondary structures.

Rfam / Pfam

• Pfam uses HMMER (based on hidden Markov models)
• Rfam uses INFERNAL (based on covariance models)

Rfam (currently version 7.0)

• 503 different RNA families or functional motifs from mRNAs, UTRs, etc.
• View and download multiple sequence alignments
• Read family annotation
• Examine the species distribution of family members
• Follow links to other databases

An example of an RNA family: miR-1 microRNAs

mir-1 microRNA precursor family. This family represents the microRNA (miRNA) mir-1 family. miRNAs are transcribed as ~70 nt precursors (modelled here) and subsequently processed by the Dicer enzyme to give a ~22 nt product. The products are thought to have regulatory roles through complementarity to mRNA.

Seed alignment (based on 7 sequences)

BACK TO PROTEINS

Predicting Protein function

• Expression data

• Protein Structure

Microarray data for yeast genes

[Figure: microarray expression data (scale -2.0 to 2.0), including wild type (wt), for genes grouped by function: splicing, transcription, decay, export, other RNA processing]

Using SVMs to predict function based on expression data

Each dot represents a vector of the expression pattern taken from a microarray experiment, for example the expression pattern of every gene coding for a protein involved in splicing.

[Figure: expression vectors of splicing factors and of other genes]

How do SVMs work with expression data? In this example, blue dots can be proteins involved in splicing and red dots are all the rest.

[Figure: the SVM separates the two classes using a kernel; an unclassified gene is marked "?"]

• The SVM is trained on experimentally verified data.
• After training, the SVM can be used to predict the function of hypothetical genes based on their expression pattern.
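To make the train-then-predict steps concrete, here is a toy linear SVM in plain Python, trained with Pegasos-style stochastic subgradient descent (a swapped-in training method chosen so the sketch needs no libraries). The expression profiles, the four conditions and all function names are invented for illustration; a real analysis would use measured microarray data, a library SVM, and possibly a non-linear kernel.

```python
import random

# Hypothetical log-ratio expression profiles over four conditions (invented values).
# Label +1 = splicing factor (blue dots), -1 = any other gene (red dots).
TRAIN_X = [
    [1.8, 1.5, -1.2, -1.0],
    [1.6, 1.9, -1.0, -1.4],
    [2.0, 1.4, -1.5, -1.1],
    [-1.1, -1.3, 1.7, 1.5],
    [-1.4, -1.0, 1.8, 1.6],
    [-1.2, -1.5, 1.4, 1.9],
]
TRAIN_Y = [1, 1, 1, -1, -1, -1]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.1, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.

    Linear kernel only, and no bias term: the profiles are assumed to be
    centred log-ratios, so the separating hyperplane passes through the origin."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            inside_margin = y[i] * dot(w, X[i]) < 1
            w = [(1 - eta * lam) * wk for wk in w]  # shrink: L2 regularization
            if inside_margin:  # hinge loss active: move toward the example
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, X[i])]
    return w


def predict(w, x):
    """+1 = predicted splicing factor, -1 = predicted other gene."""
    return 1 if dot(w, x) >= 0 else -1


w = train_linear_svm(TRAIN_X, TRAIN_Y)
print(predict(w, [1.7, 1.6, -1.1, -1.2]))  # 1: profile resembles the splicing factors
print(predict(w, [-1.3, -1.2, 1.6, 1.5]))  # -1: profile resembles the other genes
```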


Structural genomics: a large-scale structure determination project designed to cover all representative protein structures

Zarembinski et al., Proc. Natl. Acad. Sci. USA 95:15189 (1998)

ATP binding domain of protein MJ0577

Predicting function from structure

As a result of the structural genomics initiative, many structures of proteins with unknown function will be solved.

Wanted! Automated methods to predict function from the protein structures produced by the structural genomics project.

Approaches for predicting function from structure

• ConSurf: mapping evolutionary conservation onto the protein structure http://consurf.tau.ac.il/
• PHPlus: identifying positive electrostatic patches on the protein structure http://pfp.technion.ac.il/
• SHARP2: identifying positive electrostatic patches on the protein structure http://www.bioinformatics.sussex.ac.uk/SHARP2
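The idea behind conservation mapping can be sketched with a per-position score computed from a multiple alignment; the scores would then be painted onto the matching residues of the structure. ConSurf itself uses a more sophisticated phylogeny-based scoring, so the Shannon-entropy score below is a simpler stand-in, and the alignment is a hypothetical protein fragment.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return float("nan")
    n = len(residues)
    # sum of p * log2(1/p); 0 for a fully conserved column
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-position entropy; low values mark conserved (likely functional) positions."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(s[k] for s in alignment)) for k in range(length)]


ALIGNMENT = ["MKHLA", "MKHIA", "MRHLG", "MKHVS"]  # hypothetical fragment
print([round(s, 2) for s in conservation_profile(ALIGNMENT)])
```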


ALL TOGETHER…
