Machine Learning Methods of Protein Secondary Structure Prediction Presented by Chao Wang

Jan 18, 2018

Machine Learning Methods of Protein Secondary Structure Prediction

Presented by Chao Wang

• What is secondary structure?

• How to evaluate secondary structure prediction?

• How does secondary structure prediction affect the accuracy of tertiary structure prediction?

• Our perspective: "elite"

What is secondary structure?

• Hydrogen bond: a non-covalent bond

A hydrogen bond is identified if E in the following equation is less than -0.5 kcal/mol
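
The equation itself did not survive in this transcript. A minimal sketch, assuming the slide refers to the electrostatic hydrogen-bond energy used by DSSP (Kabsch and Sander), which matches the -0.5 kcal/mol cutoff quoted above. The function names and the coordinate format (3-tuples in angstroms) are illustrative choices, not taken from the slides.

```python
import math

# Assumed DSSP constants: partial charges of 0.42e (C=O) and 0.20e (N-H),
# and a dimensional factor of 332 so that E is in kcal/mol for distances in angstroms.
Q1_Q2 = 0.42 * 0.20
F = 332.0
THRESHOLD = -0.5  # kcal/mol, the cutoff quoted on the slide


def hbond_energy(o, c, n, h):
    """Electrostatic energy between a C=O group (atoms c, o) and an N-H group (atoms n, h)."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1_Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(o, c, n, h):
    """True if the energy is below the -0.5 kcal/mol cutoff."""
    return hbond_energy(o, c, n, h) < THRESHOLD
```

With an idealized linear C=O···H-N arrangement (O···H distance of 1.9 Å) the energy comes out well below the cutoff, while moving the N-H group about 10 Å away brings it close to zero.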

8-state annotation by DSSP
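
The slide's table of states is not in the transcript. As a rough sketch, the eight DSSP codes can be listed and collapsed to three states as below. The 8-to-3 reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption here; other reductions are also used in the literature. The names `EIGHT_TO_THREE` and `reduce_to_three_states` are illustrative.

```python
# DSSP 8-state codes; "C" stands in for DSSP's blank (no assignment).
DSSP_8_STATES = {
    "H": "alpha-helix",
    "G": "3-10 helix",
    "I": "pi-helix",
    "E": "extended strand",
    "B": "isolated beta-bridge",
    "T": "turn",
    "S": "bend",
    "C": "coil / other",
}

# Assumed reduction: one common convention, not necessarily the one used in the talk.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C",
}


def reduce_to_three_states(ss8):
    """Map an 8-state string to 3 states; unknown symbols (blank, '-') become coil."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in ss8)
```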

Prediction

• Early methods of secondary-structure prediction were restricted to predicting the three predominant states: helix, sheet, or random coil. These methods were based on the helix- or sheet-forming propensities of individual amino acids, sometimes coupled with rules for estimating the free energy of forming secondary structure elements. Such methods were typically ~60% accurate in predicting which of the three states (helix/sheet/coil) a residue adopts.
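
The accuracy quoted here is a per-residue figure: the fraction of residues whose predicted state matches the observed one, usually called Q3 for three states. A minimal sketch, assuming both predictions and observations are given as strings over H/E/C; the function name is illustrative.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted 3-state label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Six of the eight positions agree in this toy example.
print(q3_accuracy("HHHCCEEE", "HHHHCEEC"))
```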

A significant increase in accuracy (to nearly 80%) was made by exploiting multiple sequence alignments; knowing the full distribution of amino acids that occur at a position (and in its vicinity, typically ~7 residues on either side) throughout evolution provides a much better picture of the structural tendencies near that position. For illustration, a given protein might have a glycine at a given position, which by itself might suggest a random coil there. However, a multiple sequence alignment might reveal that helix-favoring amino acids occur at that position (and nearby positions) in 95% of homologous proteins spanning nearly a billion years of evolution. Moreover, by examining the average hydrophobicity at that and nearby positions, the same alignment might also suggest a pattern of residue solvent accessibility consistent with an α-helix. Taken together, these factors would suggest that the glycine of the original protein adopts α-helical structure rather than random coil. Several types of methods are used to combine all the available data into a 3-state prediction, including neural networks, hidden Markov models and support vector machines. Modern prediction methods also provide a confidence score for their predictions at every position.
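
To make the idea concrete, the sketch below builds a per-column amino-acid frequency profile from an alignment and then gathers the profile rows in a window of 7 residues on either side of a position, which is the kind of input a neural network or SVM would receive. It is a simplified illustration under stated assumptions: raw frequencies rather than a PSSM, gaps ignored, zero padding at the termini, and all function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment):
    """Per-column amino-acid frequencies; gaps and unknown symbols are ignored."""
    length = len(alignment[0])
    profile = []
    for pos in range(length):
        counts = dict.fromkeys(AMINO_ACIDS, 0)
        total = 0
        for seq in alignment:
            aa = seq[pos]
            if aa in counts:
                counts[aa] += 1
                total += 1
        profile.append([counts[a] / total if total else 0.0 for a in AMINO_ACIDS])
    return profile


def window_features(profile, center, half_width=7):
    """Concatenate profile rows from center - half_width to center + half_width.

    Positions beyond either terminus are padded with all-zero rows.
    """
    zero = [0.0] * len(AMINO_ACIDS)
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        row = profile[pos] if 0 <= pos < len(profile) else zero
        features.extend(row)
    return features
```

With the default window, each residue is described by 15 × 20 = 300 numbers, so a glycine in the query can still receive a helix prediction if the surrounding columns are dominated by helix-favoring residues.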

Outline

• CNF model by Jinbo

• Multi-step learning model by Yaoqi

• Iterative deep learning model by Yaoqi

• Our perspective: Elite

– A new experiment to detect how elite affects secondary structure prediction.

• Methods

– How to model the probability

– Feature selection

• Results

– vs. other methods

– Improvement

Protein 8-class secondary structure prediction using conditional neural fields

Zhiyong Wang, Feng Zhao, Jian Peng, and Jinbo Xu

Proteomics. 2011

Model

Training & Prediction

Features

Training/testing set

Results

• Outperforms SSpro8 on each state

• Effect of the regularization factor: accuracy is largely insensitive to it; the optimum is reached when the factor is set to 9.

• Effect of Neff: for SS prediction, using evolutionary information from as many homologs as possible may not be the best strategy. Instead, when many sequence homologs are available, a subset of them should be used to build the sequence profile.
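
The slide does not define Neff. A minimal sketch, assuming the usual reading of Neff as the effective number of non-redundant homologs, computed here as the exponential of the Shannon entropy averaged over profile columns (so it ranges from 1 to 20). The exact formula used in the paper may differ; the function name and input format (one row of 20 amino-acid frequencies per column) are illustrative.

```python
import math


def neff(profile):
    """exp(mean column entropy) for a profile given as rows of amino-acid frequencies."""
    entropies = []
    for freqs in profile:
        # Shannon entropy in nats; zero-frequency entries contribute nothing.
        entropies.append(-sum(p * math.log(p) for p in freqs if p > 0))
    return math.exp(sum(entropies) / len(entropies))
```

A fully conserved column gives a value of 1, while a column with all 20 amino acids equally frequent gives a value of 20; a profile built from a reduced subset of homologs can be checked against a target value in between.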

J Comput Chem. 2012
