Scott Hollingsworth (Department of Biochemistry & Biophysics, Oregon State University)
Mentor: Dr. P. Andrew Karplus (Department of Biochemistry & Biophysics, OSU)
In Collaboration With:
Dr. Weng-Keen Wong (Department of Computer Science, OSU)
Dr. Donald Berkholz (Department of Biochemistry and Molecular Biology, Mayo Clinic)
Dr. Dale Tronrud (Department of Biochemistry & Biophysics, OSU)
Each protein has an individual structure
Structure flows from function
Understand structure, understand function
Ptr Tox A
Phi & Psi (φ, ψ)
Phi and psi describe the conformation of the planar peptide (amino acid) relative to its neighboring peptides
One amino acid – two angles
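To make the two angles concrete: φ and ψ are dihedral (torsion) angles, φ over the backbone atoms C(i-1), N(i), Cα(i), C(i) and ψ over N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral from four 3D points. The function name `dihedral` and the helper names are chosen here for illustration and do not come from the slides or from the PGD.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (illustrative helper).

    For phi pass C(i-1), N(i), CA(i), C(i);
    for psi pass N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)            # from the central bond's first atom back to p0
    b1 = _sub(p2, p1)            # the central bond
    b2 = _sub(p3, p2)            # from the central bond's second atom on to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = _scale(b1, 1.0 / norm)  # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Calling the function twice per residue, once with each set of four atoms, gives the "one amino acid, two angles" pair that a Ramachandran plot displays.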
Ramachandran Plot (figure from Voet, Voet & Pratt, Biochemistry, upcoming 4th Edition; axes: φ, ψ)
Use of Protein Geometry Database (PGD) to identify linear group existence (e.g. α-helix, β-sheet, π-helix…)
Simple repeating structures
Methods: manual searches
Hollingsworth et al. 2009. “On the occurrence of linear groups in proteins.” Protein Sci. 18:1321-25
α-Helix
3₁₀ Helix
Linear groups are only part of the picture
Not all common protein motifs are repeating structures
Many have changing conformations
Goal of this research: Identify all common motifs in proteins
Too complex for manual searches
Enter machine learning
Form of artificial intelligence
Can identify clusters within a dataset
Cluster – significant grouping of data points
Visual example…
Topographical map of Oregon
Data value: Elevation
Highest points (Individual peaks)
Mt. Hood (11,239 Feet)
Mt. Jefferson (10,497 Feet)
Three Sisters (10,358-10,047 Feet)
Mountain ranges (Broad patterns)
Map labels: Cascades, Coast Range, Siskiyous (Klamath), Blue Mts, Wallowas, Steens, Strawberries, Ochoco, Mahogany Mts, Jackass Mts, Hart Mtn, Tualatin Hills, Trout Creek Mts, Paulina Mts
Similar approach with our data
2-Dimensional Example (φ, ψ plot; labeled peaks: α-helix, β, PII, αL)
Complications…
Our Data:
4-dimensional dataset
4D to 2D distance conversions
What has and hasn’t been observed?
No definitive source
Abundance / Peak Heights
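One complication with any distance measured on φ/ψ data is that angles wrap around: 179° and -179° are only 2° apart. The sketch below shows a plain Euclidean distance over four wrapped angle differences (φi, ψi, φi+1, ψi+1). This is an assumption made for illustration; the slides do not state which distance measure or which "4D to 2D" conversion was actually used, and the names `angle_diff` and `distance_4d` are hypothetical.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, in the range (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def distance_4d(p, q):
    """Euclidean distance between two (phi_i, psi_i, phi_i+1, psi_i+1) tuples,
    using wrapped differences on every axis (illustrative assumption)."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(p, q)))

# Two conformations that straddle the +/-180 degree boundary on the first axis.
near = distance_4d((179.0, 0.0, 0.0, 0.0), (-179.0, 0.0, 0.0, 0.0))
```

Without the wrap, the same two points would appear 358° apart and would never be grouped into one cluster.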
Machine learning programs can identify both previously documented and unknown common motifs and their abundances
1) Create and prep datasets at two resolution cutoffs: 1.2 Å or better, and 1.75 Å or better
2) Run cuevas
3) Analyze identified clusters
Automated process using Python to remove bias
4) Analyze context of motifs
2D-visual example of cuevas clustering
Goal: Definitive list of the most common protein motifs
In order of abundance
“Everest” Method
Locate “highest” peak first
▪ Bad pun: “Mt. Alpha-rest”
Locate second highest peak
Locate third…
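The peak-by-peak idea above can be sketched as a greedy search: count how many observations fall within a fixed radius of each point, take the most crowded point as the first peak, set its neighbors aside, and repeat. This is only an illustration of the idea, not the clustering program named in the method steps. The function names, the 10° default radius (borrowed loosely from the "Circle r=10" wording of the results table) and the toy data are all assumptions.

```python
import math

def wrapped_distance(p, q):
    """Euclidean distance between angle tuples (degrees), wrapping at +/-180."""
    total = 0.0
    for a, b in zip(p, q):
        d = abs(a - b) % 360.0
        d = min(d, 360.0 - d)
        total += d * d
    return math.sqrt(total)

def everest_peaks(points, radius=10.0, max_peaks=3):
    """Greedy peak search (illustrative): returns [(peak_point, count), ...]
    ordered from most to least populated."""
    remaining = list(points)
    peaks = []
    while remaining and len(peaks) < max_peaks:
        best, best_count = None, 0
        for p in remaining:
            # Height of p = number of remaining points within the radius.
            count = sum(1 for q in remaining if wrapped_distance(p, q) <= radius)
            if count > best_count:
                best, best_count = p, count
        peaks.append((best, best_count))
        # Set aside everything on this peak before looking for the next one.
        remaining = [q for q in remaining if wrapped_distance(best, q) > radius]
    return peaks

# Toy 2D (phi, psi) data: five points near one region, three near another, one stray.
toy = [(-60, -45), (-61, -44), (-59, -46), (-62, -45), (-60, -43),
       (-120, 130), (-121, 131), (-119, 129),
       (60, 45)]
found = everest_peaks(toy, radius=10.0, max_peaks=2)
```

The same functions accept 4-tuples unchanged, since the distance simply sums over however many angles each point carries. The pairwise counting is quadratic in the number of points, so a real dataset would need a spatial index or binning.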
Identifying motifs
Search for peaks while looking for ranges
Results: Definitive list of common protein motifs in order of abundance
The list…
Table columns: φi | ψi | φi+1 | ψi+1 | Points Per Residue (Circle r=10, Degree²) for i and i+1 | Cluster Size | Motif Name | New Motif