Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps
Frank DiMaio (dimaio@cs.wisc.edu), Jude Shavlik (shavlik@cs.wisc.edu), George N. Phillips, Jr.
ICML Bioinformatics Workshop, 21 August 2003

Dec 20, 2015

Page 1:

Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps

Frank DiMaio dimaio@cs.wisc.edu
Jude Shavlik shavlik@cs.wisc.edu
George N. Phillips, Jr.

ICML Bioinformatics Workshop
21 August 2003

Page 2:

Task Overview

Given

• Electron density for a region in a protein

• Protein’s topology

Find

• Positions of individual atoms in the density map

Page 3:

Pictorial Structures

A pictorial structure is a collection of image parts, together with a deformable conformation of these parts

Page 4:

Pictorial Structures

Formally, a model consists of

Set of parts V={v1, …, vn}

Configuration L=(l1, …, ln)

Edges eij ∈ E connect neighboring parts vi, vj

– Explicit dependency between li, lj

– G = (V,E) forms a Markov Random Field

Appearance parameters Ai for each part

Connection parameters Cij for each edge

[Figure: example model graph with parts v1 to v6 and edges e13, e23, e34, e35, e46]
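The formal definition above can be written down as a small data structure. The following Python sketch is illustrative only: the class name `PictorialStructure`, its field names, and the `is_tree` helper are assumptions and do not come from the slides. The edge list is the one shown in the example graph.

```python
from dataclasses import dataclass, field


@dataclass
class PictorialStructure:
    """Hypothetical container for the model (V, E, A, C) described above."""
    parts: list                                      # V = {v1, ..., vn}
    edges: list                                      # E, as (vi, vj) pairs
    appearance: dict = field(default_factory=dict)   # A_i for each part
    connection: dict = field(default_factory=dict)   # C_ij for each edge

    def is_tree(self) -> bool:
        """True if G = (V, E) is connected and has no cycles."""
        if len(self.edges) != len(self.parts) - 1:
            return False
        neighbors = {v: [] for v in self.parts}
        for a, b in self.edges:
            neighbors[a].append(b)
            neighbors[b].append(a)
        seen, stack = set(), [self.parts[0]]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(neighbors[v])
        return len(seen) == len(self.parts)


# The six-part example graph: edges e13, e23, e34, e35, e46.
example = PictorialStructure(
    parts=["v1", "v2", "v3", "v4", "v5", "v6"],
    edges=[("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v3", "v5"), ("v4", "v6")],
)
```

The tree check matters because the dynamic-programming matcher discussed later requires a tree-structured graph.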

Page 5:

Matching Algorithm Overview

Want configuration $L$ of model $\Theta$ maximizing

$$P(L \mid I, \Theta) \propto P(I \mid L, \Theta) \cdot P(L \mid \Theta)$$

$$P(I \mid L, \Theta) = \prod_i P(I \mid l_i, \Theta) = \frac{1}{Z_1} e^{-\sum_i \mathrm{match}_i(l_i)}$$

$$P(L \mid \Theta) = \prod_{(v_i, v_j) \in E} P(l_i, l_j \mid C_{ij}) = \frac{1}{Z_2} e^{-\sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)}$$

Equivalent to minimizing

$$\sum_i \mathrm{match}_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$$
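The step from the posterior to the sum of costs is a negative logarithm. As a short worked derivation (the normalizers $Z_1$ and $Z_2$ and the proportionality constant do not depend on $L$, so they drop out of the optimization):

```latex
\begin{aligned}
-\log P(L \mid I, \Theta)
  &= -\log P(I \mid L, \Theta) - \log P(L \mid \Theta) + \text{const} \\
  &= \sum_i \mathrm{match}_i(l_i)
     + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)
     + \log Z_1 + \log Z_2 + \text{const}
\end{aligned}
```

Since $-\log$ is monotonically decreasing, the configuration that maximizes the posterior is the one that minimizes the two sums.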

Page 6:

Linear-Time Matching Algorithm

A dynamic programming implementation runs in time quadratic in the number of candidate locations per part

Requires tree configuration of parts
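A minimal sketch of the quadratic-time dynamic program on a tree follows. The function name `match_tree`, the dictionary-based inputs, and the indexing of each part's candidate locations as 0..h-1 are assumptions made for illustration.

```python
def match_tree(children, root, match, dist):
    """Minimize sum_i match[i][l_i] + sum_(i,j) dist(i, j, l_i, l_j) on a tree.

    children: dict part -> list of child parts
    match:    dict part -> list of per-location match costs
    dist:     function (parent, child, l_parent, l_child) -> pairwise cost
    Returns (best total cost, dict part -> chosen location index).
    """
    best = {}     # best[v][l]: cost of the subtree rooted at v given l_v = l
    choice = {}   # choice[(v, c)][l]: best location of child c given l_v = l

    def solve(v):
        for c in children.get(v, []):
            solve(c)
        best[v] = list(match[v])
        for c in children.get(v, []):
            choice[(v, c)] = []
            for lv in range(len(match[v])):
                # Inner minimization over the child's locations: this
                # nested loop is what makes the method quadratic.
                costs = [best[c][lc] + dist(v, c, lv, lc)
                         for lc in range(len(match[c]))]
                lc_best = min(range(len(costs)), key=costs.__getitem__)
                choice[(v, c)].append(lc_best)
                best[v][lv] += costs[lc_best]

    solve(root)
    layout = {}
    root_loc = min(range(len(best[root])), key=best[root].__getitem__)

    def backtrack(v, lv):
        layout[v] = lv
        for c in children.get(v, []):
            backtrack(c, choice[(v, c)][lv])

    backtrack(root, root_loc)
    return best[root][root_loc], layout
```

Each child is solved once and its cost table is reused for every parent location, which is why the graph must be a tree: a cycle would make a part's best location depend on two parents at once.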

Felzenszwalb & Huttenlocher (2000) developed a linear-time matching algorithm

Additional constraint on part-to-part cost function dij

Basic “Trick”: Parallelize minimization computation over entire grid using a Generalized Distance Transform
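To illustrate the distance-transform trick in one dimension, suppose the pairwise cost has the form k·|p − q|. This form is an assumption made here for illustration; the slides only say that dij must satisfy an additional constraint. Under it, D(p) = min over q of f(q) + k·|p − q| can be computed for every p in two linear passes instead of a nested loop. The function name `distance_transform_l1` is hypothetical.

```python
def distance_transform_l1(f, k=1.0):
    """Return D with D[p] = min over q of f[q] + k * |p - q|, in O(n) time.

    Assumes k >= 0.
    """
    d = list(f)
    # Forward pass: best source at or to the left of p.
    for p in range(1, len(d)):
        d[p] = min(d[p], d[p - 1] + k)
    # Backward pass: best source at or to the right of p.
    for p in range(len(d) - 2, -1, -1):
        d[p] = min(d[p], d[p + 1] + k)
    return d
```

In the tree DP, f would be a child's subtree cost table, so one parent update costs time linear in the number of locations rather than quadratic.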

Page 7:

Pictorial Structures for Map Interpretation

Basic Idea: Build a pictorial structure that is able to model all configurations of a molecule

Each part in “collection of parts” corresponds to an atom

Model has low-cost conformation for low-energy states of the molecule

Page 8:

The Screw-Joint Model

Ideally, we would have

cost function = atomic energy

Problem: Impossible to represent atomic energy function using pairwise potentials while maintaining tree-structure

Solution: screw-joint model

– Ignore non-bonded interactions

– Edges correspond to covalent bonds

– Allow free rotation around bonds

Page 9:

Screw-Joint Model Details

Each part’s configuration has six parameters (x, y, z, α, β, γ), where

– (x, y, z) is the part’s position

– α is the part’s rotation (about the bond connecting vi and vj)

– (β, γ) is the part’s orientation

[Figure: parts vi and vj joined by a bond, annotated with (xi, yi, zi), (xj, yj, zj), (xij, yij, zij), (βi, γi), (βj, γj), αi and αj]

Part-to-part cost function dij based on child’s deviation from ideal

Matching cost function matchi based on 3x3x3 template match
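As an illustration of a 3x3x3 template match, the sketch below scores a grid location by the sum of squared differences between the template and the surrounding 3x3x3 block of density values. The squared-difference measure, the nested-list density grid, and the function name `template_match_cost` are assumptions: the slides do not specify the match function. Rotation of the template is omitted.

```python
def template_match_cost(density, template, x, y, z):
    """Sum of squared differences between a 3x3x3 template and the
    3x3x3 neighborhood of density[x][y][z] (lower is a better match).

    The caller must keep (x, y, z) at least one cell away from every
    face of the grid; no bounds checking is done here.
    """
    cost = 0.0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                diff = (density[x + dx][y + dy][z + dz]
                        - template[dx + 1][dy + 1][dz + 1])
                cost += diff * diff
    return cost
```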

Page 10:

Pictorial Structures for Map Interpretation

Ideally, we would …

– Build pictorial structure for the entire protein

– Run the matching algorithm to get best layout

However, this is computationally infeasible

Instead, we use a two-phase algorithm that …

a) computes best backbone trace

b) computes best sidechain conformation (current focus)

Page 11:

Sidechain Refinement

Assume we have a rough Cα trace of the protein

Next use pictorial structure matching to place sidechains

Walk along chain one residue at a time, placing individual atoms

[Figure: Cα trace through residues MET_80, ARG_81, ALA_82, PRO_83]

Page 12:

Sidechain Refinement

Given:

– residue type

– approximate Cα locations

Find: most likely location for sidechain atoms in the residue

[Figure: Example: alanine. Labeled atoms N, Cα, Cβ, C, O and neighboring-residue atoms C-1, Cα-1, O-1, Cα+1, N+1; panel labeled “Matching algorithm”]

Page 13:

Learning Model Parameters

[Figure: alanine in canonical orientation (atoms N, Cα, Cβ, C, O, C-1, N+1), with the resulting averaged 3D template and averaged bond geometry. Bond geometry values shown: r = 1.53, θ = 0.0°, φ = -19.3°; r = 1.51, θ = 118.4°, φ = -19.7°]
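The averaged bond geometry can be pictured as the mean of per-example (r, θ, φ) values. The sketch below converts a bond vector to spherical coordinates and averages over training examples. The angle convention (θ as azimuth in the x-y plane, φ as elevation from that plane), the plain arithmetic mean of angles, and the function names are all assumptions; the slides do not define them.

```python
import math


def to_spherical(dx, dy, dz):
    """Convert a bond vector to (r, theta, phi), angles in degrees.

    Assumed convention: theta is the azimuth in the x-y plane and phi is
    the elevation above that plane.
    """
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.degrees(math.atan2(dy, dx))
    phi = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return r, theta, phi


def average_geometry(bond_vectors):
    """Arithmetic mean of (r, theta, phi) over example bond vectors.

    A plain mean is only reasonable when the angles are tightly
    clustered away from the +/-180 degree wrap-around.
    """
    triples = [to_spherical(*v) for v in bond_vectors]
    n = len(triples)
    return tuple(sum(t[i] for t in triples) / n for i in range(3))
```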

Page 14:

Soft Maximums

Sometimes we may get an optimal match like the one to the right

When this occurs, explore the space of non-optimal solutions via soft maximums in DP

Basic Idea: Take a path with probability inversely proportional to its cost

[Figure: ACTUAL vs. PREDICTED 1]

Page 15:

Soft Maximums

Figure to the right shows soft maximums

Red molecule eventually found

Annealing increases “softness” until legal structure found

Legal structure may not be “right”
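One way to realize a soft choice inside the DP is to sample among alternatives with weights that fall as cost rises, controlled by a temperature. The sketch below uses Boltzmann weights exp(-cost / T). That weighting, the doubling schedule, and the names `soft_choice`, `anneal`, `build` and `is_legal` are assumptions for illustration. The slides state only that lower-cost paths are more probable and that softness is increased until a legal structure is found.

```python
import math
import random


def soft_choice(costs, temperature, rng=random):
    """Pick an index with probability proportional to exp(-cost / T).

    As T -> 0 this approaches the hard minimum; larger T is 'softer'.
    """
    lowest = min(costs)
    weights = [math.exp(-(c - lowest) / temperature) for c in costs]
    return rng.choices(range(len(costs)), weights=weights, k=1)[0]


def anneal(build, is_legal, t_start=0.1, factor=2.0, max_rounds=20):
    """Raise the temperature until build(T) yields a legal structure.

    build(T) is a caller-supplied function that runs the matching with
    soft choices at temperature T; is_legal checks the result (for
    example, for atom collisions). Returns None if no round succeeds.
    """
    temperature = t_start
    for _ in range(max_rounds):
        candidate = build(temperature)
        if is_legal(candidate):
            return candidate
        temperature *= factor
    return None
```

As the slide notes, the first legal structure returned this way is not guaranteed to be the correct one.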

[Figure: ACTUAL vs. PREDICTED 1 and PREDICTED 2]

Page 16:

Results

Only sidechain refinement implemented & tested

Experimental Methodology

– Assume Cα’s known to within 2 Å

– Trained on 1.7 Å resolution protein, tested on 1.9 Å resolution protein

– Templates built for ALA, VAL, TYR, LYS

Model Parameters

– Grid spacing of 0.5 Å within a sphere of diameter 10 Å

– Rotational discretization: 12 rotational steps, 84 orientations
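For a sense of the search space these parameters imply, the sketch below counts the points of a 0.5 Å lattice inside a sphere of diameter 10 Å and multiplies by the number of rotational states. Treating the 12 rotational steps and 84 orientations as independent (12 × 84 poses per position) and centering the sphere on a lattice point are assumptions.

```python
def count_grid_points(diameter=10.0, spacing=0.5):
    """Count lattice points with the given spacing inside a sphere
    centered on a lattice point (radius rounded to whole lattice steps)."""
    steps = int(round((diameter / 2) / spacing))   # lattice steps per radius
    count = 0
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                if i * i + j * j + k * k <= steps * steps:
                    count += 1
    return count


positions = count_grid_points()
rotations = 12 * 84                      # rotational steps x orientations
states_per_part = positions * rotations  # candidate poses for one part
```

With several thousand positions and about a thousand rotational states per position, a DP that is quadratic in the number of states per part would be very expensive, which motivates the linear-time matcher.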

Page 17:

Sidechain Placement

Compared predicted vs. actual location for 599 atoms on the test-set protein

29.9% atoms within 0.5Å

72.3% atoms within 1.0Å

93.0% atoms within 2.0Å

Recall 0.5Å grid spacing

[Figure: fraction of atoms (y-axis, 0 to 1) versus accuracy in angstroms (x-axis, 0 to 8)]
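The percentages above are points on a cumulative accuracy curve. A minimal sketch of how such a curve can be computed from predicted and actual coordinates follows. The function names and the toy coordinates are hypothetical and are not the test-set data.

```python
import math


def placement_errors(predicted, actual):
    """Euclidean distance (in angstroms) between matched atom positions."""
    return [math.dist(p, a) for p, a in zip(predicted, actual)]


def fraction_within(errors, threshold):
    """Fraction of atoms whose placement error is at most threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


# Toy data only, to show the call pattern.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
actual = [(0.0, 0.0, 0.4), (1.0, 0.9, 0.0), (0.0, 0.0, 0.0)]
errors = placement_errors(predicted, actual)
curve = {t: fraction_within(errors, t) for t in (0.5, 1.0, 2.0)}
```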

Page 18:

Predictive Accuracy Task

We used DP matching score as a predictor of amino acid type

Tested 49 ALA, LYS, TYR, VAL residues

Highest scoring normalized template determined type

61.2% accuracy (majority classification = 33%)

[Figure: 4×4 confusion matrix, actual vs. predicted, over ala, lys, tyr, val. Cell values as extracted: 0 2 1 7 | 1 7 6 0 | 9 2 3 2 | 0 8 1 0]
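The prediction rule can be sketched as follows: each residue is matched against every template and the best normalized score wins. Here scores are treated as costs (lower is better) and normalized by the number of atoms in the template. Both choices, along with the function names and the toy numbers, are assumptions, since the slides do not say how scores were normalized.

```python
def predict_type(costs, n_atoms):
    """Return the residue type whose per-atom matching cost is lowest.

    costs:   dict type -> DP matching cost for one residue
    n_atoms: dict type -> number of atoms in that type's template
    """
    return min(costs, key=lambda t: costs[t] / n_atoms[t])


def accuracy(predicted, actual):
    """Fraction of residues whose predicted type equals the actual type."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy numbers only: hypothetical costs and template sizes.
n_atoms = {"ALA": 5, "VAL": 7, "TYR": 12, "LYS": 9}
costs = {"ALA": 10.0, "VAL": 10.5, "TYR": 30.0, "LYS": 27.0}
guess = predict_type(costs, n_atoms)
```

For scale, the reported 61.2% accuracy on 49 residues corresponds to 30 correct predictions.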

Page 19:

The Good…

PREDICTED vs. ACTUAL

[Figure panels: LYSINE, VALINE, TYROSINE]

Page 20:

… and the Bad

PREDICTED vs. ACTUAL

[Figure panels: LYSINE, ALANINE, TYROSINE, VALINE]

Page 21:

Future Work

Implement & integrate backbone tracing algorithm, to create complete two-tiered solution

Better strategies to handle illegal molecule configurations

– perturbation of branches involved in collisions

– more accurate representation of atomic energy function, e.g. torsion angle

Better match function … make use of previous work?

More tests (larger training set, higher resolution)

Page 22:

Acknowledgements

NLM grant 1T15 LM007359-01

NLM grant 1R01 LM07050-01

NIH grant P50 GM64598.