Queensland Parallel Supercomputing Foundation 1. Professor Mark Ragan (Institute for Molecular Bioscience) 2. Dr Thomas Huber (Department of Mathematics)


Computational Biology and Bioinformatics Environment

ComBinE

National Facility Projects


Comparison of protein families among completely sequenced microbial genomes

The scientific problem:

• Handcrafted analyses suggest that gene transfer in nature may occur not only from parents to offspring (“vertical”), but also from one lineage to another (“lateral” or “horizontal”)

• From microbial genomics we have complete inventories of genes & proteins in ~80 genomes

• Comparative analysis should identify all cases of vertical and lateral gene transfer


The approach:

• Find all interestingly large protein families in all microbial genomes
• Generate structure-sensitive multiple alignments
• Infer phylogenetic trees with appropriate statistics
• Compare trees, look for topological incongruence

Computational requirement for 80 genomes:

• 10^12 BLAST comparisons
• 5000 T-Coffee alignments
• 5000 Bayesian inference trees
• 10^7 topological comparisons
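As a rough illustration of the tree-comparison step: comparing every pair among 5000 trees gives about 1.25 × 10^7 comparisons, which matches the order of magnitude quoted for the topological comparisons. The sketch below counts those pairs and measures incongruence between two small trees with a Robinson-Foulds-style split distance. The slides do not say which incongruence measure was used, so the split distance, the nested-tuple tree encoding, and the names `splits` and `rf_distance` are assumptions made for illustration.

```python
from math import comb


def splits(tree, taxa):
    """Non-trivial bipartitions of an unrooted tree given as nested tuples of leaf names."""
    taxa = frozenset(taxa)
    ref = min(taxa)  # reference taxon used to pick one canonical side of each split
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        # keep only splits with at least two taxa on each side
        if 1 < len(below) < len(taxa) - 1:
            found.add(below if ref not in below else taxa - below)
        return below

    leaves(tree)
    return found


def rf_distance(tree_a, tree_b, taxa):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a, taxa) ^ splits(tree_b, taxa))


# All-pairs comparison count for 5000 trees (assumes every pair is compared).
print(comb(5000, 2))  # 12497500

taxa = {"A", "B", "C", "D", "E"}
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(tree_1, tree_2, taxa))  # 2
```

A distance of zero means the two topologies agree on every split; larger values flag candidate cases of incongruence that would then need statistical assessment.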


Computations on APAC National Facility

Motif-based multiple alignment
• 30-50 sequences = 2-5 hours per run
• Will need ~5000 runs @ 4-60 seqs

Bayesian inference
• Parameterisation of (MC)^3 search
• NF used for trials of up to 10^6 Markov chain generations (~200 hours / run)
• 1.5-2.0 GB RAM per run

Usage of NF:
• Code not yet parallelised
• With each run costing a few tens of hours and thousands of analyses needed, it’s more efficient to use many processors simultaneously
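The argument for running many serial jobs at once can be put in numbers. The sketch below estimates wall-clock time when independent runs are farmed out in batches; the 5000 runs and the 2-5 hours per run come from the figures quoted for the alignment step, while the processor count of 64 and the function name `wall_clock_hours` are hypothetical.

```python
from math import ceil


def wall_clock_hours(n_runs, hours_per_run, n_processors):
    """Wall-clock time when independent serial runs are processed in batches."""
    batches = ceil(n_runs / n_processors)
    return batches * hours_per_run


N_RUNS = 5000  # alignment runs quoted in the slides

for hours in (2, 5):        # per-run cost range quoted in the slides
    for procs in (1, 64):   # 64 is a hypothetical processor count
        total = wall_clock_hours(N_RUNS, hours, procs)
        print(f"{hours} h/run on {procs} processor(s): {total} h")
```

Because the runs do not communicate, the speed-up is close to the number of processors, which is why unparallelised code can still make good use of a large machine.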


Parameterisation of Metropolis-coupled Markov chain Monte Carlo optimisation through protein tree space

[Figure: two plots of ln-likelihood against number of Markov chain generations. First plot: ln-likelihood from -13000 to -10000 over 0 to 500000 generations. Second plot: ln-likelihood from -14000 to -10000 over 0 to 600000 generations.]

Log-likelihood as a function of number of Markov chain generations

Approach to stationarity under the Jones et al. (1992) and general time-reversible models of protein sequence change

Bayesian inference (MrBayes 2.0) applied to a 34-sequence Elongation Factor 1 dataset. Eight simultaneous Markov chains, discrete approximation of the gamma distribution (α = 0.29), chain temperature 0.1000
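To make the (MC)^3 idea concrete, here is a minimal sketch of Metropolis-coupled MCMC on a toy problem. It is not the MrBayes implementation: the one-dimensional bimodal `log_target` stands in for a tree log-likelihood, and the function names and step size are hypothetical. The eight chains and the temperature of 0.1 follow the caption above; the incremental heating formula 1 / (1 + T·i) is assumed here to be the meaning of that temperature parameter.

```python
import math
import random


def log_target(x):
    """Toy bimodal log-density (modes at -3 and +3); a stand-in for a tree log-likelihood."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m)) + math.log(0.5)


def accept(log_ratio, rng):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mc3(n_chains=8, temp=0.1, n_generations=20000, step=1.0, seed=1):
    rng = random.Random(seed)
    # Chain 0 is the cold chain (heat 1); higher-index chains see a flatter surface.
    heats = [1.0 / (1.0 + temp * i) for i in range(n_chains)]
    states = [-3.0] * n_chains
    logps = [log_target(x) for x in states]
    cold_samples = []
    for _ in range(n_generations):
        # Within-chain Metropolis update against the heated target p(x)^heat.
        for i in range(n_chains):
            proposal = states[i] + rng.gauss(0.0, step)
            logp_new = log_target(proposal)
            if accept(heats[i] * (logp_new - logps[i]), rng):
                states[i], logps[i] = proposal, logp_new
        # Propose swapping the states of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        if accept((heats[i] - heats[j]) * (logps[j] - logps[i]), rng):
            states[i], states[j] = states[j], states[i]
            logps[i], logps[j] = logps[j], logps[i]
        cold_samples.append(states[0])  # only the cold chain is sampled
    return heats, cold_samples


heats, samples = mc3()
print([round(h, 3) for h in heats])
print(sum(1 for x in samples if x > 0) / len(samples))  # fraction of samples near the +3 mode
```

Plotting `log_target` of the cold-chain samples against generation number would give a trace of the same kind as the log-likelihood plots described above, which is how the approach to stationarity is usually judged.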


With thanks to collaborators

Mark Borodovsky, Georgia Tech

Robert Charlebois, NGI Inc. (Ottawa)

Tim Harlow, University of Queensland

Jeffrey Lawrence, University of Pittsburgh

Thomas Rand, St Mary’s University


Protein Structure Prediction

Two lineages

• The bioinformatics approach
  – Compare sequence to other sequences: huge datasets (0.5 × 10^6 sequences)
  – Match sequence with known structure (low-resolution force field development)
• The biophysics approach
  – Simulations that mimic natural behaviour

Hardware requirements:

• CPU: minutes/seq; Mem: 1 GB
• CPU: hours/seq; Mem: 100s MB
• CPU: 100s hours; Mem: 10s MB

Parallelism:

• Trivial parallel
• Hard parallel: high bandwidth + low latency requirement


Force splitting and multiple time step integration (Ian Lenane)

MD simulation: propagating molecular models in time

Start with old system state → add information on energy and force → apply numerical integrator → new system state

Mechanical description: Newton’s laws of motion

Time step required: 10^-15 s
Time scale wanted: > 10^-3 s

System is split into different domains:
• Fast-varying forces (cheap to calculate) are integrated more frequently
• Slowly varying forces (expensive to calculate) are integrated less frequently

+ More efficient integration
+ Easy to expand to parallel simulations
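The force-splitting idea can be sketched with a RESPA-style impulse integrator, used here as an assumed stand-in since the slides do not name the scheme. The slow force is applied as half-kicks around a loop of small velocity-Verlet steps driven by the fast force. The two-spring toy system, the constants, and the names `respa_step` and `energy` are hypothetical.

```python
K_FAST, K_SLOW, MASS = 100.0, 1.0, 1.0  # toy stiff and soft springs (assumed values)


def fast_force(x):
    return -K_FAST * x  # cheap, rapidly varying


def slow_force(x):
    return -K_SLOW * x  # stands in for an expensive, slowly varying force


def energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * (K_FAST + K_SLOW) * x * x


def respa_step(x, v, dt_outer, n_inner):
    """One outer step: slow force at dt_outer, fast force at dt_outer / n_inner."""
    dt_inner = dt_outer / n_inner
    v += 0.5 * dt_outer * slow_force(x) / MASS      # opening half-kick, slow force
    for _ in range(n_inner):                        # velocity Verlet on the fast force
        v += 0.5 * dt_inner * fast_force(x) / MASS
        x += dt_inner * v
        v += 0.5 * dt_inner * fast_force(x) / MASS
    v += 0.5 * dt_outer * slow_force(x) / MASS      # closing half-kick, slow force
    return x, v


x, v = 1.0, 0.0
e_start = energy(x, v)
for _ in range(1000):
    x, v = respa_step(x, v, dt_outer=0.05, n_inner=10)
print(abs(energy(x, v) - e_start) / e_start)  # relative energy drift
```

In this simple form the slow force is evaluated twice per outer step; caching the closing evaluation for the next opening half-kick brings that down to once, which is where the saving comes from when the slow force dominates the cost.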


Path simulations (Ben Gladwin)

What if start point (x1, y1) and end point (x2, y2) are given?

• Proteins: unfolded → folded
• Molecular machines: 1 cycle
• Shortest path calculations
  – Floyd, Dijkstra
• Hamilton’s principle of least action

$$S = \arg\min \left\{ \int dt \, \left( 0.5\, m v(t)^2 - U(q(t)) \right) \right\}$$

+ Computationally very attractive
• Extremely long time steps
• Very well suited for parallel architectures

(Floyd algorithm parallelized, but performance problems with > 4 PE on -GS NUMA architecture)
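For reference, a minimal serial sketch of the Floyd (Floyd-Warshall) all-pairs shortest path algorithm mentioned above. The four-node weight matrix is made-up example data, and this is the textbook algorithm rather than the parallel code the slide refers to.

```python
import math

INF = math.inf


def floyd_warshall(weights):
    """All-pairs shortest path lengths from a square matrix of edge weights."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):                  # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical symmetric graph; INF marks a missing edge.
weights = [
    [0, 1, 4, INF],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [INF, 6, 3, 0],
]
dist = floyd_warshall(weights)
print(dist[0][3])  # 6, via nodes 1 and 2
```

For a fixed `k` the updates over `i` and `j` are independent, so rows can be distributed across processors, but every processor needs row `k` at each of the `n` stages. That repeated exchange is one plausible source of the scaling problems noted on the slide.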


National Facility supercomputer use

• 2001 CPU quota: 2 * 5250 + 8000 service units
  – Total use: 12000 units (3000 units in parallel)
• 2002 CPU quota: 4 * 6000 service units
  – First quarter: 2000 units
  – Second quarter: 85 units
• Collaborators
  – Dr A. Torda (ANU): low-resolution force fields / protein structure prediction
  – Prof. D. Hume, A/Prof. B. Kobe and Dr J. Martin (UQ): structural genomics project
  – Prof. K. Burrage, I. Lenane and B. Gladwin (UQ): numerical integration and path simulations
• Special thanks
  – Mrs J. Jenkinson and Dr D. Singleton (NF/ANUSF)
