ROC 2008 meeting, 7/27/2008
Center for Computational Intelligence, Learning, and Discovery
Bioinformatics and Computational Biology Program
A Computational Method to Identify Amino Acid Residues in RNA-protein Interactions
Michael Terribilini & Jae-Hyung Lee
Cornelia Caragea, Deepak Reyon, Ben Lewis, Jeffry Sander, Robert Jernigan, Vasant Honavar and Drena Dobbs
Bioinformatics and Computational Biology Program
Center for Computational Intelligence, Learning, and Discovery
L.H. Baker Center for Bioinformatics & Biological Statistics
BCB NSF IGERT
PROBLEM: Given the sequence of a protein (& possibly its structure), predict which amino acids participate in protein-RNA interactions
APPROACH: Generate datasets of known complexes from PDB to train & test machine learning algorithms (Naïve Bayes, SVM, etc.)
GOAL: Classify each amino acid in target protein as either interface or non-interface residue
Guiding hypothesis: Principal determinants of protein binding sites are reflected in local sequence features
Observation: Binding site residues are often clustered within primary amino acid sequence
Sequence-Based Classifier:
• RB181 non-redundant dataset: 181 protein-RNA complexes from the PDB
• Input: window of amino acid identities centered on target & contiguous in protein sequence
• Classifier: Naïve Bayes
• Leave-one-out cross validation
Example input window: QSVSTSSFRYM (target residue: Ser 28)
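The sequence-based input can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window width of 11 is read off the slide's example (QSVSTSSFRYM), while the padding symbol "-", the Laplace smoothing constant `alpha`, and the names `sequence_windows` and `WindowNaiveBayes` are assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET_SIZE = 21  # 20 amino acids plus one padding symbol (assumption)

def sequence_windows(seq, half_width=5, pad="-"):
    """Return one window per residue, centered on it and padded at the ends."""
    padded = pad * half_width + seq + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(seq))]

class WindowNaiveBayes:
    """Naive Bayes over window positions; labels are 1 (interface) or 0."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing (assumption)

    def fit(self, windows, labels):
        size = len(windows[0])
        self.class_counts = Counter(labels)
        # One amino acid counter per class and per window position.
        self.counts = {c: [Counter() for _ in range(size)] for c in (0, 1)}
        for window, label in zip(windows, labels):
            for pos, aa in enumerate(window):
                self.counts[label][pos][aa] += 1
        return self

    def log_score(self, window, c):
        total = sum(self.class_counts.values())
        score = math.log((self.class_counts[c] + self.alpha) / (total + 2 * self.alpha))
        for pos, aa in enumerate(window):
            num = self.counts[c][pos][aa] + self.alpha
            den = self.class_counts[c] + self.alpha * ALPHABET_SIZE
            score += math.log(num / den)
        return score

    def predict(self, window):
        return int(self.log_score(window, 1) > self.log_score(window, 0))

windows = sequence_windows("QSVSTSSFRYM")
print(windows[5])  # QSVSTSSFRYM, the window centered on the sixth residue
```

In a leave-one-out setting, `fit` would be called on the windows of all proteins except the held-out one, and `predict` on each window of the held-out protein.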
Structure-Based Classifier:
• Calculate distance between each pair of residues in known structure
• Input: identities of the nearest n spatial neighbors
• Classifier: Naïve Bayes
• Leave-one-out cross validation
Example input: SSFRLNKSGRT (target residue: Ser 28)
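A hedged sketch of how a spatial-neighbor input might be built. The slide does not say which atom represents a residue or how the neighbors are ordered, so this example assumes one representative coordinate per residue and orders the selected residues by sequence position; `spatial_neighbor_window` is a hypothetical helper name.

```python
import math

def spatial_neighbor_window(residues, target_index, n=10):
    """Build a string from the target residue and its n nearest spatial neighbors.

    residues: list of (one_letter_code, (x, y, z)) tuples, one point per residue
    (using a single representative point per residue is an assumption).
    """
    _, target_xyz = residues[target_index]
    distances = [
        (math.dist(target_xyz, xyz), i)
        for i, (_, xyz) in enumerate(residues)
        if i != target_index
    ]
    nearest = [i for _, i in sorted(distances)[:n]]
    # Ordering by sequence position is an assumption; the slide does not specify it.
    chosen = sorted(nearest + [target_index])
    return "".join(residues[i][0] for i in chosen)

toy = [
    ("A", (0.0, 0.0, 0.0)),
    ("C", (1.0, 0.0, 0.0)),
    ("D", (10.0, 0.0, 0.0)),
    ("E", (0.0, 1.0, 0.0)),
    ("F", (0.0, 0.0, 9.0)),
]
print(spatial_neighbor_window(toy, target_index=0, n=2))  # ACE
```

The resulting strings can be fed to the same kind of Naïve Bayes classifier as the sequence windows, with spatial neighbors taking the place of sequence neighbors.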
PSSM-Based Classifier:
• PSI-BLAST against NCBI nr database to generate PSSMs
• Input: PSSM vectors for residues contiguous in sequence
• Classifier: Support Vector Machine (SVM)
• 10-fold cross validation
Example input: window QSVSTSSFRYM around Ser 28, with each residue represented by its PSSM row of 20 scores (-3, 7, 8, …)
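The PSSM input can be illustrated by flattening the PSSM rows of a sequence window into one feature vector. This is a sketch under assumptions: the window half-width of 5 and zero-padding beyond the chain ends are not stated on the slide, and `pssm_window_features` is a hypothetical helper name.

```python
def pssm_window_features(pssm, center, half_width=5):
    """Concatenate the PSSM rows for the window around `center`.

    pssm: list of rows, one per residue, each holding 20 substitution scores.
    Positions beyond either end of the chain are zero-filled (assumption).
    """
    n_cols = len(pssm[0])
    features = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(pssm):
            features.extend(pssm[i])
        else:
            features.extend([0] * n_cols)
    return features

# Toy PSSM with three residues; every row repeats a single value for readability.
toy_pssm = [[1] * 20, [2] * 20, [3] * 20]
vector = pssm_window_features(toy_pssm, center=0, half_width=1)
print(len(vector))  # 60
```

Each such vector, paired with its interface or non-interface label, would then serve as one training example for the SVM.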
Dataset of RNA-protein Interface Residues
1. Extract all protein-RNA complexes from the PDB; select high resolution structures (< 3.5 Å resolution): 503 complexes
2. Filter using PISCES (< 30% pair-wise sequence identity): 181 chains, 48,791 residues
3. Identify interface residues using a distance cutoff of 5 Å:
   7,456 interface residues (positive examples)
   41,335 non-interface residues (negative examples)
PISCES: Wang and Dunbrack, 2003 Bioinformatics, 19:1589
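The labeling step can be sketched as below. The slide gives only the 5 Å cutoff, so this example assumes the distance is measured between any protein atom and any RNA atom and that the cutoff is inclusive; `label_interface_residues` and the toy coordinates are illustrative only.

```python
import math

def label_interface_residues(protein_residues, rna_atoms, cutoff=5.0):
    """Label a residue True if any of its atoms lies within `cutoff` of an RNA atom.

    protein_residues: dict mapping a residue id to a list of (x, y, z) atom coordinates.
    rna_atoms: list of (x, y, z) coordinates for all RNA atoms in the complex.
    """
    labels = {}
    for res_id, atoms in protein_residues.items():
        labels[res_id] = any(
            math.dist(atom, rna_atom) <= cutoff
            for atom in atoms
            for rna_atom in rna_atoms
        )
    return labels

toy_protein = {
    "SER28": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "LYS40": [(20.0, 0.0, 0.0)],
}
toy_rna = [(4.0, 0.0, 0.0)]
print(label_interface_residues(toy_protein, toy_rna))
# {'SER28': True, 'LYS40': False}
```

With the counts on the slide, positives make up 7,456 of 48,791 residues, about 15% of the dataset, so the two classes are strongly imbalanced.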
Performance in predicting interface residues, using only protein sequence as input

Complex                 | Protein-Protein                             | Protein-DNA                                  | Protein-RNA
Classifier              | 2-stage classifier (SVM + Naïve Bayes)      | Naïve Bayes                                  | Naïve Bayes
Accuracy                | 72 %                                        | 77 %                                         | 85 %
Specificity             | 58 %                                        | 37 %                                         | 51 %
Sensitivity             | 39 %                                        | 43 %                                         | 38 %
Correlation coefficient | 0.30                                        | 0.25                                         | 0.35
Reference               | Yan et al., 2004, Bioinformatics            | Yan et al., 2006, BMC Bioinformatics         | Terribilini et al., 2006, RNA
Related work            | Jones & Thornton; Ofran & Rost; many others | Jones et al.; Thornton et al.; Ahmad & Sarai | Jeong et al.; Miyano et al.; Go et al.
1 Specificity (Precision for the positive, RNA-binding class)
2 Sensitivity (Recall for the positive, RNA-binding class)
3 Area Under the Curve (AUC) from a Receiver Operating Characteristic (ROC) curve
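The per-residue measures defined above can be computed from a confusion matrix as sketched here. Note that "specificity" follows the slide's definition (precision for the positive class), not the more common true-negative rate. Treating the reported correlation coefficient as the Matthews correlation coefficient is an assumption, and the counts in the usage example are invented purely for illustration.

```python
import math

def interface_metrics(tp, fp, tn, fn):
    """Accuracy, specificity (precision), sensitivity (recall) and correlation coefficient."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity = tp / (tp + fp) if tp + fp else 0.0  # precision, as defined on the slide
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall for the RNA-binding class
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews CC (assumption)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "cc": cc,
    }

# Invented counts, for illustration only.
metrics = interface_metrics(tp=30, fp=20, tn=130, fn=20)
print(metrics["accuracy"], metrics["specificity"], metrics["sensitivity"])  # 0.8 0.6 0.6
```

Because interface residues are the minority class, accuracy alone can look high even for a weak predictor, which is why the slides report precision, recall, and the correlation coefficient alongside it.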
Predictions for Signal Recognition Particle 19kDa protein (PDB ID 1JID_A)
RNABindR: An RNA Binding Site Prediction Server
Applications
• Lentiviral Rev proteins
• Telomerase Reverse Transcriptase (TERT)
http://telomerase.asu.edu/
Rev - a potential target for novel HIV therapies
• Rev is a multifunctional regulatory protein that plays an essential role in the production of infectious virus
• A small nucleocytoplasmic shuttling protein (HIV Rev 115 aa; EIAV Rev 165 aa)
• Recognizes a specific binding site on viral RNA: the Rev Responsive Element (RRE)
• Contains specific domains that mediate nuclear localization, RNA binding and nuclear export
• Rev's critical role in lentiviral replication makes it an attractive target for antiviral (AIDS) therapy
Problem: no high resolution Rev structure! - not even for HIV Rev, despite intense effort
• Why?
– Rev aggregates at concentrations needed for NMR or X-ray crystallography
– The only high resolution information available is for short peptide fragments of HIV-1 Rev: a 22 amino acid fragment of Rev bound to a 34 nucleotide RRE RNA fragment
• What about insights from sequence comparisons?
– HIV Rev sequence has low sequence identity with proteins with known structure
– Very little sequence similarity among different Rev family members (e.g., EIAV vs HIV < 10%)
HIV-1 Rev: Predictions vs Experiments
Prediction on RNA-binding protein HIV-1 Rev
human telomerase complex
– Many other interacting proteins: e.g., PPI1, RAP1, TEP1, HSP90
Lingner (1997) Science 276: 561-567
Adapted from P. J. Mason
Human TERT: Preliminary docking of 3 modeled domains
Preliminary model (lacking TEN domain)
Kurcinski, Kolinski, Kloczkowski
Predicted vs Actual RNA-Binding Residues in Human TRBD
Predicted Actual
Current & future work
Future:
– Experimentally interrogate protein-RNA interfaces suggested by this work
– Investigate these interfaces as potential therapeutic targets
Progress towards our Goals?
√ Model TERT domains from human
√ Dock domains to generate a complete model for TERT protein
Generate a working model for TERT-TR complex
Predict TR RNA tertiary structure, then dock with protein
Underway…
Conclusions
• A combined classifier that uses the query sequence plus additional information derived from the known structure & a PSSM generated using PSI-BLAST sequence homologs (trained and tested on RB181, a dataset of diverse protein-RNA interfaces) predicts interface residues with ~ 86% overall accuracy, CC = 0.43
• Combining structure prediction with machine learning has potential to provide valuable insights into structure & function of important large RNP complexes - especially those for which high-resolution experimental structural information is not yet available
• Computational methods can provide insight into protein-RNA interfaces, even for "recalcitrant" proteins whose structures are not yet available
Acknowledgements
Dobbs Lab @ Iowa State University
http://ddobbs.public.iastate.edu/
Drena Dobbs, BCB & GDCB
– Michael Terribilini
– Jeffry Sander
– Peter Zaback
– Deepak Reyon
– Ben Lewis
Kolinski Lab @ University of Warsaw
http://biocomp.chem.uw.edu.pl/
Andrzej Kolinski, Chemistry
– Mateusz Kurcinski
@ Iowa State University
Andrzej Kloczkowski, BBMB
Robert Jernigan, BBMB
Kai-Ming Ho, Physics
Iowa State University:
Bioinformatics & Computational Biology Program (BCB)
LH Baker Center for Bioinformatics & Biological Statistics
Center for Integrated Animal Genomics (CIAG)
Center for Computational Intelligence, Learning & Discovery (CILD)
Honavar Lab @ Iowa State University
http://www.cs.iastate.edu/~honavar/aigroup.htm