An algorithm to guide selection of specific biomolecules to be studied by wet-lab experiments Jessica Wehner and Madhavi Ganapathiraju Department of Biomedical Informatics University of Pittsburgh School of Medicine Pittsburgh PA USA Presented by Thahir P. Mohamed Advancing Practice, Instruction & Innovation through Informatics October 19-23, 2008

Jan 16, 2016


Page 1

An algorithm to guide selection of specific biomolecules to be studied by wet-lab experiments

Jessica Wehner and Madhavi Ganapathiraju

Department of Biomedical Informatics
University of Pittsburgh School of Medicine

Pittsburgh, PA, USA

Presented by Thahir P. Mohamed

Advancing Practice, Instruction & Innovation through Informatics
October 19-23, 2008

Page 2

Protein Structure

Primary Structure: Chain of amino acids

Secondary Structure: Sub-structures such as helices and strands

Tertiary Structure: Three-dimensional structure of the protein at atomic resolution

Knowledge of protein structure is essential for successful drug design

Page 3

Challenges in Protein Structure Prediction

• X-ray crystallography and NMR spectroscopy are wet-lab methods for determining structure

– Very expensive

– Very time consuming

• Computational techniques are applied to predict protein structure

Page 4

Computational Protein Structure Prediction

• Machine Learning techniques applied to predict structure

• Experimentally determined structures are used to learn to predict new structures

• When there is not enough data to learn from:

– Active learning is applied to select the next protein to be studied experimentally

Page 5

Active Learning

Unlabeled Proteins

Possible Labels:

Page 6

Cluster Unlabeled Proteins

Clustered Proteins

Possible Labels:

Active Learning

Page 7

Cluster Unlabeled Proteins

Selection Algorithm

Clustered Proteins

Possible Labels:

Active Learning

Page 8

Cluster Unlabeled Proteins

Selection Algorithm

Clustered Proteins

Possible Labels:

Active Learning

Page 9

Active Learning

Cluster Unlabeled Proteins

Selection Algorithm

Labeled Proteins

Prediction

Possible Labels:

Active learning guides the selection of data points for which labels are requested
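The query loop behind this slide can be sketched generically as follows. This is a minimal illustration, not the authors' implementation: the names `active_learning_loop`, `select`, `oracle`, and `budget` are hypothetical, the oracle stands in for a wet-lab experiment, and `random_select` is only the random baseline against which a guided strategy would be compared.

```python
import random
from typing import Callable, Dict, List, Sequence


def active_learning_loop(
    pool: Sequence[object],
    select: Callable[[Sequence[object], List[int], Dict[int, str]], int],
    oracle: Callable[[object], str],
    budget: int,
) -> Dict[int, str]:
    """Ask the oracle for at most `budget` labels, one item at a time.

    `select` is the (hypothetical) selection strategy: given the pool, the
    indices still unlabeled, and the labels gathered so far, it returns the
    index of the next item to send to the oracle.
    """
    labeled: Dict[int, str] = {}
    for _ in range(budget):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break  # nothing left to query
        idx = select(pool, unlabeled, labeled)
        labeled[idx] = oracle(pool[idx])  # stands in for a wet-lab experiment
    return labeled


def random_select(pool, unlabeled, labeled, rng=random.Random(0)):
    """Baseline strategy: pick any unlabeled item uniformly at random."""
    return rng.choice(unlabeled)


if __name__ == "__main__":
    toy_pool = ["AAGAATVLLVIV", "RGAPGAQ", "ALHWR"]  # toy sequences
    toy_oracle = lambda seq: "TM" if "LL" in seq else "NTM"  # toy rule only
    print(active_learning_loop(toy_pool, random_select, toy_oracle, budget=2))
```

A guided strategy plugs in by replacing `random_select` with a function of the same signature.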

Page 10

Membrane Protein Structure Prediction

Membrane protein importance and challenges

Membrane proteins:

• 30% of genes

• Cell regulation and signaling pathways

• 60% of drug targets

Yet:

• Difficult to study experimentally

• 1% of known protein structures

Active learning can be used as a tool to address the limited number of known membrane protein (MP) structures despite the large number of known MP sequences

Page 11

‘Features’ Representation

Data reduction is performed by SVD, resulting in a final 4 features per window.

Position: 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7

Residue:  A L H W R A A G A A T V L L V I V E R G A P G A Q L I

Topology: - - - - - M M M M M M M M M M M M - - - - - - - - - -

Charge:   - - p - p - - - - - - - - - - - - n p - - - - - - - -

E-Prop:   D d . . A D D . D D a d d d d d d D A . D D . D a d d

Properties:

• Charge

• Size

• Polarity

• Aromaticity

• Electronic Properties
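One way to picture the window features is sketched below. The slide states only that SVD reduces each window to 4 features, so everything else here is an assumption: the two-property encoding (`CHARGE`, `AROMATIC`) uses placeholder values, the window width of 5 is arbitrary, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical, simplified property codes. The slide lists five properties;
# only two are encoded here, with placeholder numeric values.
CHARGE = {"H": 1, "R": 1, "K": 1, "D": -1, "E": -1}
AROMATIC = set("FWY")


def encode(seq):
    """Return an (L, 2) array with one row of property codes per residue."""
    return np.array(
        [[CHARGE.get(r, 0), 1 if r in AROMATIC else 0] for r in seq],
        dtype=float,
    )


def window_features(seq, width=5, n_features=4):
    """Slide a window over the sequence and reduce each window with SVD."""
    enc = encode(seq)
    # One flattened row of property codes per window position.
    windows = np.array(
        [enc[i:i + width].ravel() for i in range(len(seq) - width + 1)]
    )
    centered = windows - windows.mean(axis=0)
    # Rows of vt are the right singular vectors (principal directions).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Project every window onto the leading n_features directions.
    return centered @ vt[:n_features].T


if __name__ == "__main__":
    feats = window_features("ALHWRAAGAATVLLVIVERGAPGAQLI")
    print(feats.shape)
```

For the 27-residue example on the slide and a width of 5, this yields 23 windows with 4 features each.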

Page 12

Clustering the Data

[Figure: plot of the data with axes Dim 1, Dim 2, Dim 3]

Neural Network Self-Organizing Map (SOM)

• Finds centroids of clusters in the data
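A minimal self-organizing map can be sketched in NumPy as follows. This is a generic textbook-style SOM, not the network used in the study: the grid size, learning-rate and neighborhood schedules, and the names `train_som` and `assign_nodes` are all assumptions made for illustration.

```python
import numpy as np


def train_som(data, grid=(3, 3), epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; returns one weight vector (centroid) per node."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each node, used by the neighborhood function.
    coords = np.array(
        [(r, c) for r in range(rows) for c in range(cols)], dtype=float
    )
    # Initialise node weights from randomly chosen data points.
    picks = rng.choice(len(data), size=rows * cols, replace=True)
    weights = data[picks].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighborhood
            # Best-matching unit: the node whose weights are closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbors toward x.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_nodes(data, weights):
    """Return, for each data point, the index of its nearest node."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

The rows of the returned weight matrix play the role of the cluster centroids, and `assign_nodes` gives the node (cluster) membership of each window.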

Page 13

Design 1: Density-based Selection

• Find the densest cluster

– Choose N points closest to its centroid

– Find labels for these points (TM or NTM, i.e. transmembrane or non-transmembrane)

– Find the majority label, say L

– Assign L to all points in the cluster

• Repeat for the next densest cluster

Clusters with no known structures are marked for study by experiments
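The steps above can be sketched as follows. Several details are assumptions rather than statements from the slides: density is approximated by the number of points assigned to a cluster, the hypothetical `oracle` returns `None` when no structure is known for a point, and the function and parameter names are illustrative.

```python
from collections import Counter

import numpy as np


def density_based_labeling(data, nodes, centroids, oracle, n_query=10):
    """Label clusters from a few queried points near each centroid.

    data:      (n_points, n_features) array
    nodes:     (n_points,) array of cluster indices
    centroids: (n_clusters, n_features) array
    oracle:    callable index -> "TM", "NTM", or None if no label is known
    """
    predicted = {}   # point index -> label propagated from its cluster
    queried = []     # point indices sent to the oracle
    to_study = []    # clusters with no known labels (candidates for wet lab)
    sizes = Counter(nodes.tolist())
    # Visit clusters from most to least populated (density proxy, assumed).
    for node, _ in sizes.most_common():
        members = np.flatnonzero(nodes == node)
        dist = np.linalg.norm(data[members] - centroids[node], axis=1)
        chosen = members[np.argsort(dist)[:n_query]]
        queried.extend(int(i) for i in chosen)
        known = [lab for lab in (oracle(int(i)) for i in chosen)
                 if lab is not None]
        if not known:
            to_study.append(node)
            continue
        majority = Counter(known).most_common(1)[0][0]
        for i in members:
            predicted[int(i)] = majority
    return predicted, queried, to_study
```

Returning `to_study` separately keeps the clusters that need experiments distinct from those that could be labeled by propagation.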

Page 14

Design 1 Results

• Increase the number of data points for which we ask for structure

• Compare how accuracy varies between guided selection (via active learning) and random selection

[Figure: percent (0 to 90) versus number of labels per node (1 to 40). Series: density-based precision, density-based F-score, random-based precision, random-based F-score]

A total of only 10 labels per node ~ 1% of the data
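The metrics named in the plot legend can be computed as in the sketch below. The slides do not say which class is treated as positive or which F-score variant was used; this example assumes TM as the positive class and the balanced F1 score, and the function name is hypothetical.

```python
def precision_recall_fscore(true_labels, predicted_labels, positive="TM"):
    """Return (precision, recall, F1) for the assumed positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


if __name__ == "__main__":
    truth = ["TM", "TM", "NTM", "NTM"]      # toy data, not study results
    guess = ["TM", "NTM", "TM", "NTM"]
    print(precision_recall_fscore(truth, guess))
```

Running the same function on the predictions from guided and from random selection gives the two curves being compared.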

Page 15

Design 2: Protein-based Selection

• Pick a random protein

• Find labels for all windows in this protein

• For each node containing labels, find the mode L of all labels it contains

• Assign L to remaining data in node

• Repeat and update for each new protein, until half of the proteins have been selected
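A minimal sketch of this procedure is given below. It assumes each window is tagged with the protein it came from and with its SOM node; the names `window_protein`, `window_node`, `oracle`, and `fraction` are hypothetical. For brevity the node modes are computed once after all selected proteins are labeled, which gives the same final assignment as updating after every protein.

```python
import random
from collections import Counter, defaultdict


def protein_based_labeling(window_protein, window_node, oracle,
                           fraction=0.5, seed=0):
    """Label whole proteins at random, then propagate node-wise modes.

    window_protein[i]: protein id of window i
    window_node[i]:    node (cluster) index of window i
    oracle:            callable window index -> label ("TM" or "NTM")
    """
    rng = random.Random(seed)
    proteins = sorted(set(window_protein))
    rng.shuffle(proteins)                        # random selection order
    n_select = int(len(proteins) * fraction)     # "until half" by default

    known = {}                       # window index -> label from the oracle
    node_labels = defaultdict(list)  # node -> labels observed in that node
    for protein in proteins[:n_select]:
        for i, p in enumerate(window_protein):
            if p == protein:
                known[i] = oracle(i)
                node_labels[window_node[i]].append(known[i])

    # Mode of the labels seen in each node that contains at least one label.
    node_mode = {n: Counter(labs).most_common(1)[0][0]
                 for n, labs in node_labels.items()}
    # Assign the node's mode to the remaining (unlabeled) windows in it.
    predicted = {i: node_mode[n] for i, n in enumerate(window_node)
                 if i not in known and n in node_mode}
    return known, predicted
```

Windows that fall in nodes with no labeled members are left out of `predicted`, since there is no mode to assign to them.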

Page 16

Protein-based Results

The procedure was repeated for different permutations of the protein selection order, and several metrics were observed.

[Figure: results plot; vertical axis: Percent]

Page 17

Conclusions

• We developed a framework that allows us to select a few proteins or fragments of proteins which, when annotated with experimental methods, may be used to label remaining protein sequences.

• We have shown that it is possible to achieve higher accuracy values with guided selection of data compared to random selection of data.

Page 18

Acknowledgements

Madhavi Ganapathiraju

Jessica Wehner

JW funded through NIH-NSF Bioengineering & Bioinformatics Summer Institute

Thank you!

Visit us at www.dbmi.pitt.edu/madhavi

Department of Biomedical Informatics, University of Pittsburgh

[Photo: Cathedral of Learning, University of Pittsburgh]