Identification of amino acid residues in protein-protein interaction interfaces using machine learning and a comparative analysis of the generalized sequence- and structure-based features employed
Angshuman Bagchi, Ph.D
Assistant Professor of Biochemistry
Department of Biochemistry and Biophysics, University of Kalyani
Formerly postdoctoral fellow at Buck Institute, Stanford University, California, USA, and Purdue University, Indianapolis, USA
• A support vector machine (SVM) is a supervised learning method (more precisely, a family of related methods) that analyzes data and recognizes patterns. SVMs are used for classification and regression analysis.
• Given a set of training examples, each labeled as belonging to one of two categories, the SVM training algorithm builds a model that assigns new examples to one category or the other.
• An SVM model represents the examples as points in space, mapped so that the examples of the two categories are separated by a clear gap that is as wide as possible.
• New examples are then mapped into the same space and assigned to a category according to which side of the gap they fall on.
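The "widest gap" idea can be sketched as a minimal linear soft-margin SVM trained by full-batch subgradient descent on the regularized hinge loss. This is an illustration only: the slides do not name the SVM software, kernel, or parameters used, so the function names, learning rate `lr`, regularization strength `lam`, and the two-feature toy data below are all hypothetical. In practice a dedicated SVM package would normally be used.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(
    X: List[List[float]], y: List[int],
    lam: float = 0.01, lr: float = 0.1, epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b)))
    by full-batch subgradient descent. Labels must be +1 or -1.
    Hypothetical sketch, not the predictor described in the slides."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer (widens the margin)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Points inside the margin or on the wrong side contribute to the hinge subgradient.
            if yi * (dot(w, xi) + b) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Assign +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical toy data: +1 = "interface", -1 = "non-interface".
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [2.5, 2.5]), predict(w, b, [-2.5, -2.5]))
```

The regularizer term shrinks `w` (which widens the geometric margin 2/||w||), while the hinge term pushes back only for points that sit inside the margin, which is the trade-off the bullets above describe.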
• Surface residue: an amino acid whose accessible surface area (ASA) exceeds 15% of its total surface area
• Interface residue: a surface residue with at least one heavy atom within 5 Å of any heavy atom of its interacting partner
• Dataset: 274 high-resolution X-ray crystal structures of hetero-complexes, containing 10,597 interface residues (positives) and 27,333 non-interface surface residues (negatives) (Chung et al., Proteins, 2006)
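The two residue definitions above can be applied mechanically once each residue's relative ASA (its ASA as a fraction of its total area) and its heavy-atom coordinates are available. The sketch below assumes a hypothetical `Residue` record with those two fields; parsing PDB or DSSP files is omitted, and the residue names and coordinates are made up. Only the 15% and 5 Å cutoffs come from the slides.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

SURFACE_RSA_CUTOFF = 0.15      # relative ASA > 15% => surface residue
CONTACT_DISTANCE_CUTOFF = 5.0  # heavy-atom distance cutoff in angstroms


@dataclass
class Residue:
    """Hypothetical per-residue record; field names are illustrative."""
    name: str
    relative_asa: float  # ASA / total surface area, between 0 and 1
    heavy_atoms: List[Coord] = field(default_factory=list)  # non-hydrogen atoms only


def is_surface(res: Residue) -> bool:
    return res.relative_asa > SURFACE_RSA_CUTOFF


def is_interface(res: Residue, partner_atoms: List[Coord]) -> bool:
    """Surface residue with at least one heavy atom within 5 A of any partner heavy atom."""
    if not is_surface(res):
        return False
    return any(
        math.dist(a, p) <= CONTACT_DISTANCE_CUTOFF
        for a in res.heavy_atoms
        for p in partner_atoms
    )


def label_residues(chain: List[Residue], partner_atoms: List[Coord]) -> Dict[str, int]:
    """+1 for interface residues, -1 for non-interface surface residues.
    Buried residues are not surface residues, so they are left out entirely."""
    labels: Dict[str, int] = {}
    for res in chain:
        if is_surface(res):
            labels[res.name] = 1 if is_interface(res, partner_atoms) else -1
    return labels


partner = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
chain = [
    Residue("ARG45", 0.40, [(4.0, 1.0, 0.0)]),   # exposed, about 2.7 A from the partner
    Residue("LEU12", 0.30, [(20.0, 0.0, 0.0)]),  # exposed, far from the partner
    Residue("VAL7", 0.05, [(2.0, 0.0, 0.0)]),    # buried (5% relative ASA)
]
print(label_residues(chain, partner))  # {'ARG45': 1, 'LEU12': -1}
```

Note that a buried residue close to the partner (VAL7 here) is excluded rather than labeled positive, because the interface definition only applies to surface residues.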
Features
• Sequence-based: derived from sequence conservation profiles computed with PSI-BLAST
• Structure-based (secondary structure, charge, solvent accessibility, B-factor, etc.): obtained using S-BLEST (Mooney et al., Proteins, 2005), DSSP (Kabsch & Sander, Biopolymers, 1983) and the PDB files
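The slides do not specify how the PSI-BLAST profile and the structural descriptors are encoded as SVM inputs. One common encoding, assumed here purely for illustration, concatenates the position-specific scoring matrix (PSSM) rows in a sliding window centred on the residue (zero-padded past the chain ends) and builds a separate structural vector from one-hot secondary structure plus numeric charge, relative ASA and B-factor values. The window half-width, helper names, feature order and toy values are all hypothetical.

```python
from typing import List

SS_CLASSES = ("H", "E", "C")  # helix, strand, coil (simplified 3-state secondary structure)


def pssm_window(pssm: List[List[float]], i: int, half_width: int = 3) -> List[float]:
    """Hypothetical sequence encoding: concatenate PSSM rows i-half_width .. i+half_width
    (20 scores per row), using zero rows where the window runs past either chain end."""
    n_cols = len(pssm[0])
    features: List[float] = []
    for k in range(i - half_width, i + half_width + 1):
        row = pssm[k] if 0 <= k < len(pssm) else [0.0] * n_cols
        features.extend(row)
    return features


def structure_features(ss: str, charge: float, rsa: float, bfactor: float) -> List[float]:
    """Hypothetical structural encoding: one-hot secondary structure,
    then charge, relative ASA and (normalized) B-factor."""
    one_hot = [1.0 if ss == c else 0.0 for c in SS_CLASSES]
    return one_hot + [charge, rsa, bfactor]


# Toy 5-residue profile with 20 columns per row (values are made up).
pssm = [[float(r + 1)] * 20 for r in range(5)]
seq_vec = pssm_window(pssm, i=0, half_width=3)
struct_vec = structure_features("H", charge=1.0, rsa=0.42, bfactor=0.8)
print(len(seq_vec), len(struct_vec))  # 140 6
```

Keeping the two vectors separate, rather than concatenating them, makes it straightforward to train one classifier on sequence-based features and another on structure-based features and compare them.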
Development of PPI predictor
The dataset was divided into the following two categories, each with an equal number of PPI (positive) and non-PPI (negative) examples. These balanced datasets were used for training.
• Structure-based dataset: interface residues as positives and non-interface surface residues as negatives
• Sequence-based dataset: interface residues as positives and non-interface surface residues as negatives
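The slides state that training used equal numbers of positive and negative examples, but not how the 27,333 negatives were reduced to match the 10,597 positives. Random undersampling of the larger class is one straightforward way to do this; the sketch below assumes that approach, and the function name, seed and placeholder IDs are hypothetical.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def undersample(positives: List[T], negatives: List[T], seed: int = 0) -> List[Tuple[T, int]]:
    """Hypothetical balancing step: randomly keep min(#pos, #neg) examples from each
    class and return a shuffled list of (example, label) pairs with labels +1 / -1."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    kept_pos = rng.sample(positives, n)
    kept_neg = rng.sample(negatives, n)
    data = [(x, 1) for x in kept_pos] + [(x, -1) for x in kept_neg]
    rng.shuffle(data)
    return data


# Placeholder IDs standing in for the interface and non-interface surface residues.
positives = [f"pos{i}" for i in range(10597)]
negatives = [f"neg{i}" for i in range(27333)]
balanced = undersample(positives, negatives)
n_pos = sum(1 for _, label in balanced if label == 1)
print(len(balanced), n_pos)  # 21194 10597
```

Undersampling discards about 61% of the negatives; a fixed seed keeps the selection reproducible, and repeating the split with different seeds is one way to check that results do not depend on which negatives were kept.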
Case Study
Top-scoring amino acid residues from the crystal structure of the antibody N10-staphylococcal nuclease complex (PDB ID: 1NSN). The backbone of antibody N10 is shown in black, and staphylococcal nuclease is shown as a cyan surface. The top-scoring residues are highlighted.
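How the highlighted residues in the 1NSN case study were scored is not detailed in the slides. Assuming, for illustration only, that each surface residue receives an SVM decision value (w·x + b) and that "top-scoring" means the k largest values, the selection reduces to a partial sort. The residue labels, scores and k below are made up and are not results from this work.

```python
import heapq
from typing import Dict, List


def top_scoring(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k residue labels with the highest decision values, best first."""
    return [res for res, _ in heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])]


# Hypothetical decision values (w.x + b) for a few surface residues.
scores = {"res_A": 1.8, "res_B": -0.7, "res_C": 0.9, "res_D": 2.3, "res_E": 0.1}
print(top_scoring(scores, k=3))  # ['res_D', 'res_A', 'res_C']
```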