
Protein – Protein Interactions

Lisa Chargualaf, Simon Kanaan, Keefe Roedersheimer

Others: Dr. Izaguirre, Dr. Chen, Dr. Wuchty, ChengBang Huang

Dec 14, 2015


What are proteins?

Basis of most living functions

Building blocks of life

– Substrates
– Products
– Enzymes

One cell contains thousands of different proteins; the human body contains 50,000 to 100,000 different proteins.

Proteins

Composed of sequences of amino acids
– Variations of 20 primary/basic amino acids

Rules governing structure:
– AAs close in the folded structure may or may not be close in primary structure
– Hydrophobic residues are generally buried in the core; hydrophilic residues are usually exposed
– Protein chains generally do not form knots
– Related proteins generally have similar structures

Similar structures can exist without having similar sequences

What is a “protein–protein” (P-P) interaction and why is it important?

Encoded by the genetic material within a cell, proteins fold and interact in intricate arrangements that provide functionality to the components of a cell, which in turn work cooperatively to form whole body systems.

Protein-protein interactions serve as the chemical basis of all living organisms.

Understanding protein interactions helps us understand the protein network.

What causes P-P interactions?

Several hypotheses have been proposed for the driving force behind proteins interacting with each other:
– Primary sequence dictates the interaction between attached functional groups.
– Protein domains drive proteins to fold and interact as they do.

What are protein domains?

Significant portions of proteins

Composed of distinct peptides

The key to intricate arrangements

Domains and Proteins

A single protein molecule can possess multiple domains, which makes it difficult to find a simple formula that dictates how protein-protein interactions occur.

Yet, certain affinities exist between certain protein domains and are frequently seen in living organisms.

This motivates our research, which seeks to explain the mechanism of protein-protein interactions by focusing on domain-domain interactions as a factor.

The model system used in this work is the yeast cell, with several of its proteins serving as the test cases. The domain data come from a protein family data bank available online.

Our “Formula” dictating which P-P interactions occur

A data bank gives a list of protein interactions.

A protein interaction, (P1, P2), is explained by a domain pair, (D1, D2), if P1 includes one domain and P2 includes the other.

Find the minimum number of domain pairs that explains the databank. Equivalent to Minimum Set Cover problem.
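The "explained by" rule above can be sketched as a small predicate. This is only an illustration: the function name `explains`, the `domains_of` mapping, and the toy proteins are assumptions, not part of the original work.

```python
def explains(interaction, domain_pair, domains_of):
    """Return True if the domain pair (d1, d2) explains the interaction (p1, p2)."""
    p1, p2 = interaction
    d1, d2 = domain_pair
    a, b = domains_of[p1], domains_of[p2]
    # Either orientation counts: d1 in p1 and d2 in p2, or the reverse.
    return (d1 in a and d2 in b) or (d2 in a and d1 in b)


# Hypothetical toy data: protein name -> set of domains it contains.
domains_of = {"P1": {"D1", "D2"}, "P2": {"D3"}}

print(explains(("P1", "P2"), ("D3", "D1"), domains_of))  # True
print(explains(("P1", "P2"), ("D2", "D2"), domains_of))  # False
```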

Minimum Set Cover Problem

Given a collection of sets, find the smallest sub-collection whose union equals the union of all the sets.

NP-hard problem (its decision version is NP-complete).

Why the Minimum Set of Domains?

Let's look at the following case:

– P1 contains domain D2
– P2 contains domains D2 and D3
– P3 contains domains D2 and D4
– P4 contains domains D2 and D5

And let's assume the protein interactions are:

– P1 - P1
– P1 - P2
– P1 - P3
– P1 - P4

P-P interactions explained by:
– (D2 - D2)
– (D2 - D3)
– (D2 - D4)
– (D2 - D5)

Or by:
– (D2 - D2)

Mapping to MSC

Let
– P1 - P1 = 0
– P1 - P2 = 1
– P1 - P3 = 2
– P1 - P4 = 3

Each pair’s interactions:
– D2-D2 = {0,1,2,3}
– D2-D3 = {1}
– D2-D4 = {2}
– D2-D5 = {3}

This maps to the integer MSC problem with a global set of {0,1,2,3} and subsets of {{0,1,2,3},{1},{2},{3}}.

The solution is D2-D2; larger problems are more difficult to solve.
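The mapping can be sketched in a few lines of Python. The helper name `build_subsets` and the data layout are assumptions made for illustration. The sketch takes P1 to contain only D2, as the earlier slide states, so the pair that explains P1 - P4 comes out as D2-D5.

```python
from itertools import product

# Domain composition and interactions from the example (P1 taken to contain only D2).
domains_of = {
    "P1": {"D2"},
    "P2": {"D2", "D3"},
    "P3": {"D2", "D4"},
    "P4": {"D2", "D5"},
}
interactions = [("P1", "P1"), ("P1", "P2"), ("P1", "P3"), ("P1", "P4")]


def build_subsets(interactions, domains_of):
    """Map each candidate domain pair to the indices of the interactions it explains."""
    subsets = {}
    for idx, (p, q) in enumerate(interactions):
        for a, b in product(domains_of[p], domains_of[q]):
            pair = tuple(sorted((a, b)))  # treat (a, b) and (b, a) as the same pair
            subsets.setdefault(pair, set()).add(idx)
    return subsets


subsets = build_subsets(interactions, domains_of)
universe = set(range(len(interactions)))

for pair, covered in sorted(subsets.items()):
    print(pair, sorted(covered))
```

The pair ("D2", "D2") maps to the whole global set {0, 1, 2, 3}, which is why it alone is a minimum cover.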

Implementation/Algorithm

The base algorithm consists of functions that record the protein structure and interaction information and store it in different data structures.

It also builds a domain-domain matrix. This matrix holds information about interacting domains. Each entry in the matrix represents the number of times domains Di and Dj were observed as the possible cause in different protein-protein interactions.

Example:
– P1: {D1, D2, D3} and P2: {D1, D5} interact.

The possible causes are (D1, D1), (D1, D5), (D2, D1), (D2, D5), (D3, D1) and (D3, D5).
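A minimal sketch of building the domain-domain matrix follows. It stores the matrix sparsely in a `Counter` keyed by unordered domain pairs, which is an assumption of this sketch; the original implementation may use a dense matrix. The function name `domain_pair_counts` is hypothetical.

```python
from collections import Counter
from itertools import product


def domain_pair_counts(interactions, domains_of):
    """Count, for each domain pair, how many interactions it could explain."""
    counts = Counter()
    for p, q in interactions:
        # Use a set so each pair is counted once per interaction.
        pairs = {tuple(sorted(ab)) for ab in product(domains_of[p], domains_of[q])}
        counts.update(pairs)
    return counts


domains_of = {"P1": {"D1", "D2", "D3"}, "P2": {"D1", "D5"}}
counts = domain_pair_counts([("P1", "P2")], domains_of)

for pair in sorted(counts):
    print(pair, counts[pair])
```

For the single interaction in the example, six domain pairs each receive a count of 1.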

Exact Problems

In the worst case there are (# of domains)^2 candidate domain pairs, each corresponding to a subset.

A large number of protein interactions, corresponding to the global set.

MSC is NP-hard; finding the exact solution by brute force requires considering all combinations of subsets.

Computationally expensive, impractical for more than ~10 domains. There are thousands in a real problem.
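To make the cost concrete, here is a brute-force sketch of the exact solution. It tries combinations of subsets in order of increasing size, so the number of combinations grows exponentially with the number of domain pairs. The function name `exact_min_cover` and the string labels are illustrative assumptions.

```python
from itertools import combinations


def exact_min_cover(universe, subsets):
    """Return a smallest list of subset names whose union covers the universe."""
    if not universe:
        return []
    names = list(subsets)
    # Try every combination of 1 subset, then 2 subsets, and so on.
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(subsets[n] for n in combo))
            if covered >= universe:
                return list(combo)
    return None  # no combination covers the universe


# Small instance with one subset that covers everything.
universe = {0, 1, 2, 3}
subsets = {"D2-D2": {0, 1, 2, 3}, "D2-D3": {1}, "D2-D4": {2}, "D2-D5": {3}}
print(exact_min_cover(universe, subsets))  # ['D2-D2']
```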

Implementation/Algorithm

The algorithm approximates the minimum set of domain pairs.

The algorithm needs to choose D-D pairs in an educated, not a randomized, fashion.

This can be done using weight functions: each domain pair is given a weight, and the pair with the largest weight is chosen.

Different Functions

Different weight functions were considered. We decided to look at two for now:

– MSC
– MSC by probability

We also looked at running MSC twice and adding pairs with a high probability of interacting.

MSC

Assumption:
– The most commonly observed interacting domain pair among the protein interactions is probably the cause of those protein interactions.

While there are P-P interactions to be explained
{
– Choose the most commonly observed interacting domain pair Di-Dj.
– Remove Di-Dj:
    Remove all P-P interactions it explains from the data being observed
    Undo those P-P interactions' effect on the matrix
}
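The loop above can be sketched in Python as follows. This is one reading of the pseudocode, not the original implementation: the helper names, the sparse `Counter` in place of a matrix, and the alphabetical tie-breaking are all assumptions.

```python
from collections import Counter
from itertools import product


def candidate_pairs(p, q, domains_of):
    """All unordered domain pairs that could explain the interaction (p, q)."""
    return {tuple(sorted(ab)) for ab in product(domains_of[p], domains_of[q])}


def greedy_msc(interactions, domains_of):
    """Greedily pick the most frequently observed domain pair until all is explained."""
    remaining = set(interactions)
    counts = Counter()
    for p, q in remaining:
        counts.update(candidate_pairs(p, q, domains_of))

    chosen = []
    while remaining and counts:
        # Highest count wins; ties are broken alphabetically (an assumption).
        best = min(counts, key=lambda pair: (-counts[pair], pair))
        chosen.append(best)
        explained = {
            pq for pq in remaining
            if best in candidate_pairs(pq[0], pq[1], domains_of)
        }
        # Undo the explained interactions' effect on the counts.
        for p, q in explained:
            counts.subtract(candidate_pairs(p, q, domains_of))
        remaining -= explained
        counts = +counts  # drop entries that fell to zero
    return chosen


domains_of = {
    "P1": {"D2"},
    "P2": {"D2", "D3"},
    "P3": {"D2", "D4"},
    "P4": {"D2", "D5"},
}
interactions = [("P1", "P1"), ("P1", "P2"), ("P1", "P3"), ("P1", "P4")]
print(greedy_msc(interactions, domains_of))  # [('D2', 'D2')]
```

Greedy selection of the largest remaining subset is the standard approximation for set cover; it does not always find the true minimum.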

MSC by Probability

Assumption:
– Incorporate the absence of P-P interactions.
– Initialize the matrix just like MSC.

Go through every element in the matrix and divide that entry by the number of proteins that contain the first domain times the number of proteins that contain the second domain.

Each element now represents the probability that domains i and j interact.

– The weight function then chooses the highest probability in the matrix, finds which protein interactions this domain pair explains, removes their influence from the data, and repeats the same steps.
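A sketch of the probability-weighted variant follows. It assumes the counts are updated after each choice while the denominators (number of proteins containing each domain) stay fixed; the slides do not say which quantities are recomputed. The names and toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def candidate_pairs(p, q, domains_of):
    """All unordered domain pairs that could explain the interaction (p, q)."""
    return {tuple(sorted(ab)) for ab in product(domains_of[p], domains_of[q])}


def greedy_by_probability(interactions, domains_of):
    """Greedy cover where the weight is count / (proteins with Di * proteins with Dj)."""
    n_with = Counter(d for doms in domains_of.values() for d in doms)
    remaining = set(interactions)
    counts = Counter()
    for p, q in remaining:
        counts.update(candidate_pairs(p, q, domains_of))

    def weight(pair):
        a, b = pair
        return counts[pair] / (n_with[a] * n_with[b])

    chosen = []
    while remaining and counts:
        # Highest weight wins; ties are broken alphabetically (an assumption).
        best = min(counts, key=lambda pair: (-weight(pair), pair))
        chosen.append(best)
        explained = {
            pq for pq in remaining
            if best in candidate_pairs(pq[0], pq[1], domains_of)
        }
        for p, q in explained:
            counts.subtract(candidate_pairs(p, q, domains_of))
        remaining -= explained
        counts = +counts  # drop entries that fell to zero
    return chosen


# Toy data: domain C is common, domains A and B are rare.
domains_of = {"P1": {"A", "C"}, "P2": {"B", "C"}, "P3": {"C"}, "P4": {"C"}}
interactions = [("P1", "P2"), ("P3", "P4")]
print(greedy_by_probability(interactions, domains_of))
# [('A', 'B'), ('C', 'C')]
```

In this toy case the pair C-C explains both interactions and would be chosen first by raw count. Dividing by the number of proteins that contain each domain penalizes it, because most pairs of C-containing proteins do not interact, so the rarer pair A-B is chosen first.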

Prediction

Input: a set of proteins with known structure.

A set of domain pairs obtained from the algorithm being observed.

Go through each interacting domain pair (Di, Dj).

Every protein containing domain Di is considered to interact with any protein containing Dj.
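The prediction step can be sketched as below. The function name `predict_interactions` is hypothetical, and the sketch assumes self-interactions such as P1 - P1 are allowed, as in the earlier example.

```python
from itertools import combinations_with_replacement


def predict_interactions(domain_pairs, domains_of):
    """Predict that two proteins interact if any chosen domain pair spans them."""
    predicted = set()
    # Unordered protein pairs, including a protein paired with itself.
    for p, q in combinations_with_replacement(sorted(domains_of), 2):
        for a, b in domain_pairs:
            forward = a in domains_of[p] and b in domains_of[q]
            backward = b in domains_of[p] and a in domains_of[q]
            if forward or backward:
                predicted.add((p, q))
                break
    return predicted


domains_of = {
    "P1": {"D2"},
    "P2": {"D2", "D3"},
    "P3": {"D2", "D4"},
    "P4": {"D2", "D5"},
}
predicted = predict_interactions([("D2", "D2")], domains_of)
print(len(predicted))  # 10
```

With the single pair D2-D2, every pair of proteins is predicted to interact (10 pairs), although only 4 interactions were observed. This shows why predictions need to be checked against observed data.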

Testing

Running the MSC approximation against the exact MSC on very small sets to see how close the approximation comes to the exact solution.

Testing

Building training data of different sizes using the Swiss Pfam A database, among others.

Running the approximation algorithms on these sets.

Running AM on the same sets.

Attempting to use sets of similar size to those used for MLE, for comparison's sake.

Testing

Compare calculated P-P interactions with observed interactions (number of matches, false positives, and false negatives).

Calculate fold, specificity, and sensitivity in order to compare to previous research.
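A minimal sketch of the comparison follows. The slides do not define specificity or fold, so this sketch reports sensitivity as matches divided by observed interactions, reports matches divided by predicted interactions as one possible reading of specificity (an assumption), and leaves fold out. The function name `evaluate` is hypothetical.

```python
def evaluate(predicted, observed):
    """Compare predicted and observed interactions, ignoring the order within a pair."""
    def normalize(pairs):
        return {tuple(sorted(pq)) for pq in pairs}

    predicted, observed = normalize(predicted), normalize(observed)
    matches = predicted & observed
    return {
        "matches": len(matches),
        "false_positives": len(predicted - observed),
        "false_negatives": len(observed - predicted),
        # Fraction of observed interactions that were predicted.
        "sensitivity": len(matches) / len(observed) if observed else 0.0,
        # Fraction of predictions that were observed (assumed reading of specificity).
        "matched_over_predicted": len(matches) / len(predicted) if predicted else 0.0,
    }


predicted = {("P1", "P2"), ("P1", "P3"), ("P2", "P3")}
observed = {("P2", "P1"), ("P1", "P4")}
result = evaluate(predicted, observed)
print(result["matches"], result["false_positives"], result["false_negatives"])  # 1 2 1
```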

Results


Future Work

Finish testing and comparing different weight functions.

Gather statistics by running the different algorithms multiple times on data sets of different sizes.

Test exact MSC against the different weight functions.