Modelling proteins and proteomes using Linux clusters Ram Samudrala University of Washington


Jan 14, 2016


Hilary Fleming

Modelling proteins and proteomes using Linux clusters

Ram Samudrala

University of Washington


Examples of biological problems

Protein structure prediction/docking simulations - need to run different trajectories that sometimes talk with each other

Molecular dynamics simulations - need more cohesive parallelisation

Polarisable force fields - need true parallelisation

Bioinformatics searches/exploration - trivially parallelisable
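
As a rough illustration of trajectories that "sometimes talk with each other", the sketch below lets two trajectories swap their current state over a socket and keep the lower-energy one. It is a minimal sketch, not the group's implementation: the message format (newline-terminated JSON), the helper names `send_state`, `recv_state` and `exchange_best`, and the energy values are all assumptions made for the example. A connected socket pair inside one process stands in for two cluster nodes.

```python
import json
import socket

def send_state(sock, state):
    """Send one trajectory state as a newline-terminated JSON message."""
    sock.sendall(json.dumps(state).encode("utf-8") + b"\n")

def recv_state(sock):
    """Read until a newline and decode one JSON message.

    Assumes at most one message is in flight per direction.
    """
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before a full message arrived")
        buf.extend(chunk)
    return json.loads(buf.decode("utf-8"))

def exchange_best(sock, mine):
    """Swap states with a peer and keep whichever has the lower energy."""
    send_state(sock, mine)
    theirs = recv_state(sock)
    return min(mine, theirs, key=lambda s: s["energy"])

# Two connected sockets in one process stand in for two cluster nodes.
a, b = socket.socketpair()
traj_a = {"id": "A", "energy": -12.5}   # hypothetical values
traj_b = {"id": "B", "energy": -15.0}
send_state(b, traj_b)                   # B's message waits in the socket buffer
best_for_a = exchange_best(a, traj_a)   # A sends its state, then reads B's
echo_at_b = recv_state(b)               # B receives what A sent
a.close()
b.close()
```

On a real cluster each trajectory would presumably run in its own process on its own CPU and connect over TCP; the exchange logic would stay the same.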


Computational issues

Need efficient methods to start/stop jobs

Need load-balancing queuing system

Need fast communications at times

Need stability (months/years uptimes)

Need low maintenance/management overhead

Need low installation overhead

Needs to be cheap!


Hardware and operating system

256 AMD and Intel CPUs (1-2.5 GHz)

0.5-1 GB RAM, 100-200 GB HD, dual-processor motherboards

100 Mbps Ethernet connectivity for 64-processor sets

White boxes are good but use up space – 1U racks ideal

Minimal Linux installation – create clone “CD” – copy on all machines


Our solution

No single solution – user implements their own

Completely decentralised

Analyse problem and determine parallelisable parts

Implementation specific to problem

Use local scratch space for computation

Redundant storage of data for faster access

Limit problem space to specific problems
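
One way to picture the decentralised approach with local scratch space is sketched below: each node works out its own share of the job list from its rank alone (no master process), computes in a node-local directory, and only then copies the finished file to shared storage. This is a sketch under assumptions, not the setup actually used: `my_share`, `run_node`, the `partN.txt` naming and the toy workload are hypothetical, and temporary directories stand in for the local disk and the shared filesystem.

```python
import os
import shutil
import tempfile

def my_share(items, rank, n_nodes):
    """Return the items this node owns: every n_nodes-th item starting at rank."""
    return items[rank::n_nodes]

def run_node(items, rank, n_nodes, shared_dir, work):
    """Process this node's share in local scratch, then copy the result out."""
    scratch = tempfile.mkdtemp(prefix=f"node{rank}_")  # stands in for a local disk
    try:
        local_path = os.path.join(scratch, "out.txt")
        with open(local_path, "w") as fh:
            for item in my_share(items, rank, n_nodes):
                fh.write(f"{item}\t{work(item)}\n")
        # Copy under a temporary name, then rename, so that readers of the
        # shared directory do not pick up a half-written file.
        final_path = os.path.join(shared_dir, f"part{rank}.txt")
        shutil.copy(local_path, final_path + ".tmp")
        os.replace(final_path + ".tmp", final_path)
        return final_path
    finally:
        shutil.rmtree(scratch, ignore_errors=True)

# Demo: three "nodes" run one after another in a single process.
shared = tempfile.mkdtemp(prefix="shared_")
sequences = ["EFDV", "ILKAAG", "ANK", "VAVIKAV", "RG"]  # toy inputs
paths = [run_node(sequences, r, 3, shared, len) for r in range(3)]
```

The rename step relies on the shared directory being a single filesystem; how strictly a given network filesystem honours that is worth checking before depending on it.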


Problem specific implementation

MCSA/GA: socket-based communication of trajectories; multiple trajectories on different CPUs

Docking: sample different ligands/regions of the protein on different CPUs

MD: Pairwise force-fields are additive

PFF: ?

Bioinformatics: trivial parallelisation; communication by disk
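
The MD item above rests on a simple property: if the total energy is a sum over atom pairs, the pair list can be dealt out to different CPUs and the partial sums added at the end. The sketch below shows that property with a toy inverse-distance pair potential. The potential, the coordinates and the function names are assumptions chosen for illustration, not a real force field, and the "workers" are just list entries evaluated in one process.

```python
import itertools
import math

def pair_energy(p, q):
    """Toy pair potential (assumed for illustration): inverse distance."""
    return 1.0 / math.dist(p, q)

def partial_energy(coords, pairs):
    """Sum the pair potential over one worker's subset of atom pairs."""
    return sum(pair_energy(coords[i], coords[j]) for i, j in pairs)

def split_pairs(n_atoms, n_workers):
    """Deal all unique atom pairs out to workers in round-robin order."""
    pairs = list(itertools.combinations(range(n_atoms), 2))
    return [pairs[w::n_workers] for w in range(n_workers)]

coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0)]
chunks = split_pairs(len(coords), n_workers=3)
partials = [partial_energy(coords, chunk) for chunk in chunks]  # one per CPU
total = sum(partials)

# Reference: the same sum computed over all pairs in one go.
all_pairs = list(itertools.combinations(range(len(coords)), 2))
serial = partial_energy(coords, all_pairs)
```

The two totals agree up to floating-point rounding. A polarisable force field does not decompose this way because each term depends on the state of many atoms at once, which is consistent with the open question mark against PFF.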


Semi-exhaustive segment-based folding

EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

generate: fragments from database; 14-state φ,ψ model

… …

minimise: Monte Carlo with simulated annealing; conformational space annealing, GA

… …

filter: all-atom pairwise interactions, bad contacts; compactness, secondary structure
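
The minimisation stage named above can be sketched in a few lines. The example below is a generic Monte Carlo search with simulated annealing and the Metropolis acceptance rule, run on a hypothetical one-dimensional energy landscape; it is not the folding code itself. The cooling schedule, step size, parameter names and `toy_energy` are assumptions made for the sketch.

```python
import math
import random

def anneal(energy, x0, steps=5000, t_start=2.0, t_end=0.01, step=0.5, seed=0):
    """Minimise energy(x) by Metropolis Monte Carlo with geometric cooling."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(steps):
        # Temperature falls geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def toy_energy(x):
    """Hypothetical rugged 1-D landscape, not a protein energy function."""
    return 0.1 * x * x + math.sin(3.0 * x)

best_x, best_e = anneal(toy_energy, x0=4.0)
```

Running the same function with different `seed` values gives independent trajectories, which is the kind of work that can be placed on different CPUs.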


Ab initio prediction at CASP

T170/sfrp3 – 4.8 Å for all 69 aa


Comparative modelling at CASP

T182 – 1.0 Å (249 aa; 41% id)


Prediction of SARS CoV proteinase inhibitors

Ekachai Jenwitheesuk


Bioverse – S. typhimurium protein-protein interaction network

Jason McDermott


Bioverse – H. sapiens protein-protein interaction network

Jason McDermott


Future directions

Network connection with multiple Ethernet cards based on traffic analysis

Gigabit Ethernet (switches are still expensive)

Better network filesystems