Massively Parallel Solutions for Molecular Sequence Analysis

Bertil Schmidt, School of Computer Engineering, Nanyang Technological University, Singapore

Heiko Schröder, School of Computer Science and Information Technology, RMIT University, Melbourne, Australia

Manfred Schimmler, Institut für Datentechnik und Kommunikationsnetze, TU Braunschweig, Germany

Contents

Motivation
Smith-Waterman Algorithm
Parallelization on the Hybrid Architecture
Parallelization on the Fuzion 150
Performance Evaluation
Conclusion and Future Work

Motivation

Genetic sequence databases are growing exponentially.

The growth rate will continue, since multiple concurrent genome projects have begun, with more to come.

Discovered sequences are analyzed by comparison with databases.

The complexity of sequence comparison is proportional to the product of query size and database size.

Analysis is too slow on sequential computers. Two possible approaches:

Heuristics, e.g. BLAST, FastA: the more efficient the heuristic, the worse the quality of the results.

Parallel processing: get high-quality results in reasonable time.

Full Genome Comparison

Related organisms, but Tuberculosis causes a disease: find the common and the different parts.

3918 protein sequences (1,329,298 amino acids) compared with 4289 protein sequences (1,359,008 amino acids).

16 × 10^6 pairwise sequence comparisons.

Many genome-genome comparisons will be required in the near future.
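The size of this workload can be checked with a few lines of arithmetic. The sketch below only uses the sequence and amino acid counts from the slide; the variable names are illustrative. Because every sequence of one genome is compared with every sequence of the other, the total number of DP matrix cells equals the product of the two amino acid totals.

```python
# Counts taken from the slide: number of protein sequences and total amino acids.
seqs_a, aas_a = 3918, 1_329_298
seqs_b, aas_b = 4289, 1_359_008

# Every sequence of one genome is compared with every sequence of the other.
pairs = seqs_a * seqs_b

# Each comparison fills len(x) * len(y) cells, so the grand total is the
# product of the two amino acid totals.
cells = aas_a * aas_b

print(f"pairwise comparisons: {pairs:,}")
print(f"DP matrix cells:      {cells:.3e}")
```

The pair count comes out slightly above 16 million, and the cell count is on the order of 10^12, which is why a sequential run is slow.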

Protein Sequence Alignment

BLAST, FastA, Smith-Waterman

GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII
|||::::| : |::| ||:::||||:|:|||::    ::| |::::
GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV

Search speed: BLAST (faster), FastA, Smith-Waterman (slower)

Data quality: BLAST (lower), FastA, Smith-Waterman (higher)

Smith-Waterman Algorithm

Optimal local alignment of two sequences.

Performs an exhaustive search for the optimal local alignment.

Complexity O(nm) for sequence lengths n and m.

Based on the 'dynamic programming' (DP) algorithm:
Fill the DP matrix using a substitution (mutation) matrix.
Find the maximal value (score) in the matrix.
Trace back from the score until a 0 value is reached.

Smith-Waterman Algorithm

Aligning S1 and S2 of lengths n and m using the recurrences:

\[
H(i,j) = \max\{\,0,\ E(i,j),\ F(i,j),\ H(i-1,j-1) + Sbt(S1_i, S2_j)\,\}, \quad 1 \le i \le n,\ 1 \le j \le m
\]

\[
E(i,j) = \max\{\,H(i,j-1) - \alpha,\ E(i,j-1) - \beta\,\}
\]

\[
F(i,j) = \max\{\,H(i-1,j) - \alpha,\ F(i-1,j) - \beta\,\}
\]

\[
H(i,0) = E(i,0) = 0, \qquad H(0,j) = F(0,j) = 0
\]

Calculate three possible ways to extend the alignment:
by one amino acid (AA) in each sequence;
by one AA in the first sequence, aligned with a gap in the second;
by one AA in the second sequence, aligned with a gap in the first.
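The three matrices H, E and F can be filled with a straightforward double loop. The following is a minimal sequential sketch, not the authors' implementation: the function name `sw_affine_score`, the parameter names `alpha` and `beta` (penalty for the first gap position and for each further gap position) and the callable `sbt` for the substitution score are assumptions made for illustration.

```python
def sw_affine_score(s1, s2, sbt, alpha, beta):
    """Best local alignment score using the H/E/F recurrences."""
    n, m = len(s1), len(s2)
    # Row 0 and column 0 stay at 0 (boundary conditions).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[0] * (m + 1) for _ in range(n + 1)]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend with a gap: either open a new one or continue an old one.
            E[i][j] = max(H[i][j - 1] - alpha, E[i][j - 1] - beta)
            F[i][j] = max(H[i - 1][j] - alpha, F[i - 1][j] - beta)
            # Extend by one residue in each sequence, or restart at 0.
            diag = H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1])
            H[i][j] = max(0, E[i][j], F[i][j], diag)
            best = max(best, H[i][j])
    return best


def simple_sbt(x, y):
    """Toy substitution score: +2 for a match, -1 otherwise."""
    return 2 if x == y else -1


print(sw_affine_score("ATCTCGTATGATG", "GTCTATCAC", simple_sbt, 1, 1))
```

A real protein search would replace `simple_sbt` with a lookup into a substitution matrix; only the score is returned here because a database scan needs the score first and the traceback only for the best hits.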

Smith-Waterman Algorithm

Align S1 = ATCTCGTATGATG and S2 = GTCTATCAC with

\[
Sbt(x,y) = \begin{cases} 2 & \text{if } x = y \\ -1 & \text{else} \end{cases}, \qquad \alpha = 1,\ \beta = 1
\]

With these parameters the recurrence simplifies to

\[
H(i,j) = \max\{\,0,\ H(i-1,j) - 1,\ H(i,j-1) - 1,\ H(i-1,j-1) + Sbt(S1_i, S2_j)\,\}
\]

[Figure: filled DP matrix with S1 as columns and S2 as rows. The traceback path starts at the maximal score and runs through the values 10, 8, 9, 7, 5, 3, 4, 2, 0; the aligned regions are highlighted in both sequences.]
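The worked example can be reproduced with a short script. This is a sketch under the example's assumptions (match score 2, mismatch score -1, gap penalty 1); the function name and the tie-breaking order in the traceback (diagonal first) are choices made here, not something the slides specify.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=1):
    """Local alignment with a linear gap penalty; returns (score, row1, row2)."""
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j] - gap, H[i][j - 1] - gap,
                          H[i - 1][j - 1] + sub)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the maximal score until a 0 is reached.
    row1, row2 = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:      # one residue in each sequence
            row1.append(s1[i - 1]); row2.append(s2[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - gap:        # residue of s1 against a gap
            row1.append(s1[i - 1]); row2.append("-"); i -= 1
        else:                                     # residue of s2 against a gap
            row1.append("-"); row2.append(s2[j - 1]); j -= 1
    return best, "".join(reversed(row1)), "".join(reversed(row2))


score, a1, a2 = smith_waterman("ATCTCGTATGATG", "GTCTATCAC")
print(score)
print(a1)
print(a2)
```

The score along the recovered path drops from 10 to 0 in the same sequence of values as the highlighted path in the slide's matrix.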

Parallel Architectures for Bioinformatics

Embedded massively parallel accelerators:

Systola 1024: PC add-on board with 1024 processors (ISATEC, Germany)

Fuzion 150: 1536 processors on a single chip (Clearspeed Technology, UK)

Parallel Architectures for Bioinformatics

Hybrid Computer

[Figure: 16 Systola 1024 boards connected by a high-speed Myrinet switch]

Supercomputer performance at low cost.

Combines the SIMD and MIMD paradigms within one parallel architecture.

Previous Applications

Scientific Computing
Volume Visualization
Automatic Visual Quality Control
Cryptography
Computer Tomography
Video Compression
Range of Transforms (Fourier, Wavelet, Hough, Radon)
Computer Graphics

Architecture of Systola 1024

[Figure: block diagram showing the host computer bus, controller, program memory, interface processors, RAM NORTH, RAM WEST and the ISA]

Instruction Systolic Array (ISA): 32 × 32 mesh of processing elements with wavefront instruction execution.

Instruction Systolic Array

[Figure: instruction systolic array with row selectors and column selectors; instructions (+, *, -) move through the array as a wavefront]

Wavefront instruction execution.

Fast accumulation operations (e.g. row sum, broadcast, ringshift).

Parallelization of Smith-Waterman

Matrix cells along a single diagonal are computed in parallel.

The comparison is performed in A+B−1 steps on A PEs.

[Figure: DP matrix of the example alignment with one diagonal highlighted; sequence A is mapped onto the PEs P1, P2, ..., P13 and sequence B is shifted through them]
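The diagonal-by-diagonal schedule can be simulated sequentially to see where the step count comes from. The sketch below assumes the example's scoring (match 2, mismatch -1, gap 1); the function name is illustrative, and the inner loop stands in for what the PEs would do simultaneously in one step.

```python
def sw_wavefront(a, b, match=2, mismatch=-1, gap=1):
    """Fill the DP matrix one anti-diagonal at a time; returns (score, steps)."""
    A, B = len(a), len(b)
    H = [[0] * (B + 1) for _ in range(A + 1)]
    best, steps = 0, 0
    # All cells with i + j == d depend only on diagonals d-1 and d-2,
    # so they are independent of each other.
    for d in range(2, A + B + 1):
        # Index i plays the role of the PE holding residue a[i-1].
        for i in range(max(1, d - B), min(A, d - 1) + 1):
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j] - gap, H[i][j - 1] - gap,
                          H[i - 1][j - 1] + sub)
            best = max(best, H[i][j])
        steps += 1  # one parallel step per anti-diagonal
    return best, steps


print(sw_wavefront("ATCTCGTATGATG", "GTCTATCAC"))
```

For the example sequences of lengths 13 and 9 the loop visits 21 anti-diagonals, one per parallel step, instead of the 117 cell updates a sequential machine performs one by one.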

Mapping onto Systola 1024

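[Figure: the query sequence a0 ... a1023 is stored in the 32 × 32 array, 32 residues per row (a0 ... a31, a32 ... a63, ..., a992 ... a1023); the subject sequences b0 b1 ... bk, c0 c1 ... are shifted through the array one after another]

aa: query sequence (length equal to 1024)

bb: subject sequence

Subject sequences can be pipelined with only 1 step delay.

k steps for a subject sequence of length k.

Efficient routing on the ISA: row ringshift and broadcast.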

Performance Evaluation

Scan times in seconds for TrEMBL 14 (351’834 protein sequences) for various query sequence lengths:

Query sequence length          256    512   1024   2048   4096
Systola 1024 (s)               294    577   1137   2241   4611
  speedup to PIII 850            5      6      6      6      6
Cluster of 16 Systolas (s)      20     38     73    142    290
  speedup to PIII 850           81     86     91     94     94

Parallel implementation scales linearly with sequence length and number of PCs.

Computing time dominates data transfer time.
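The linear-scaling claim can be checked directly against the reported numbers. The snippet below reads the scan times for the single board as 294, 577, 1137, 2241, 4611 s and for the cluster as 20, 38, 73, 142, 290 s; the helper names are illustrative, and the efficiency figure is simply the single-board time divided by 16 times the cluster time.

```python
query_lengths = [256, 512, 1024, 2048, 4096]
single_board = [294, 577, 1137, 2241, 4611]   # Systola 1024, seconds
cluster_16 = [20, 38, 73, 142, 290]           # cluster of 16 boards, seconds


def doubling_ratios(times):
    """Time ratio between consecutive query lengths (each length doubles)."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


def cluster_efficiency(single, cluster, boards=16):
    """Fraction of the ideal 16-fold gain the cluster achieves."""
    return [s / (c * boards) for s, c in zip(single, cluster)]


print([round(r, 2) for r in doubling_ratios(single_board)])
print([round(r, 2) for r in doubling_ratios(cluster_16)])
print([round(e, 2) for e in cluster_efficiency(single_board, cluster_16)])
```

Ratios close to 2 for every doubling of the query length indicate linear scaling in sequence length, and efficiencies above 0.9 indicate near-linear scaling in the number of boards. The cluster times are rounded to whole seconds, so the efficiency values are approximate.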

Fuzion 150 Architecture

0.25-µm, single-chip SIMD architecture
1536 PEs @ 200 MHz
300 GOPS
600 GB/s on-chip, 6.4 GB/s off-chip bandwidth
Multithreading (control units interact via semaphores)
Developed by Clearspeed Technology (UK) for graphics and network processing

[Figure: block diagram with the linear SIMD array (1536 PEs, each with 2 Kbytes DRAM), FUZION bus, 32-bit EPU (ARC), video I/O, display, instruction fetch, SIMD controller, local memory (1, 2 or 4 channels, 6.4 GB/s), host, AGP and Rambus]

Fuzion 150 Architecture

[Figure: six blocks, Block 0 to Block 5, with 256 PEs each (PE(0,0) ... PE(0,255) up to PE(5,0) ... PE(5,255)), attached to the Fuzion bus and local memory. Each PE has an 8-bit ALU, a 32-byte register file and 2 KByte of PE memory (DRAM), with connections to its left PE, its right PE, the instruction stream and the block I/O channel.]

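Mapping onto the Fuzion 150

[Figure: the query sequence a0 ... a1535 is distributed over Block 0 to Block 5, 256 residues per block (a0 ... a255, a256 ... a511, ..., a1280 ... a1535); the subject sequences b0 b1 ... bk, c0 c1 ... are shifted through the array one after another]

aa: query sequence (length equal to 1536)

bb: subject sequence

No fast global communication.

2-step local communication.

Subject sequence can be pipelined with only step delay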

Mapping onto the Fuzion 150

Reduce communication time: assign 16 AAs to each PE, so query lengths up to 24576 AAs can be processed within a single pass.

Partitioning for query lengths < 24576: each subarray of corresponding size computes the alignment of the same query sequence with different subject sequences.
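The partitioning arithmetic can be sketched as follows. The constants come from the slides (1536 PEs, 16 AAs per PE); the function name and the rule that a subarray uses the smallest number of PEs that holds the query are assumptions for illustration, since the slides do not spell out how subarray sizes are chosen.

```python
import math

NUM_PES = 1536      # PEs on the Fuzion 150
AAS_PER_PE = 16     # query residues assigned to each PE


def partition(query_len):
    """Return (PEs per subarray, number of subarrays) for one query length."""
    capacity = NUM_PES * AAS_PER_PE
    if not 1 <= query_len <= capacity:
        raise ValueError(f"query length must be between 1 and {capacity}")
    # Assumption: a subarray is just large enough to hold the whole query.
    pes_per_subarray = math.ceil(query_len / AAS_PER_PE)
    # Each subarray aligns the same query against a different subject sequence.
    subarrays = NUM_PES // pes_per_subarray
    return pes_per_subarray, subarrays


for length in (256, 4096, 24576):
    print(length, partition(length))
```

Under this assumption a query of length 256 occupies 16 PEs, so 96 subject sequences can be processed at the same time, while a query of length 24576 uses the whole array.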

Performance Evaluation

Scan times in seconds for TrEMBL 14 (351’834 protein sequences) for various query sequence lengths:

Query sequence length          256    512   1024   2048   4096
Fuzion 150 (s)                  12     22     42     82    162
  speedup to PIII 850          136    151    157    163    165

Parallel implementation scales linearly with sequence length.

Computing time dominates data transfer time.

Performance Evaluation

Normalized time comparison for a 10 Mbase search on different parallel architectures with different query lengths.

[Figure: bar chart of the time in seconds (axis ticks 1, 10, 100) for SAMBA, Fuzion 150, Kestrel and the 16K-PE MasPar at query lengths 512, 1024 and 2048]

4× faster than the 16K-PE MasPar
6× faster than Kestrel
5× faster than SAMBA (special-purpose 3-board architecture)

Performance Evaluation for Full Genome Comparison

Scan times for pairwise protein sequence comparison of Mycobacterium tuberculosis and Escherichia coli:

                            Scan time   Speedup to PIII 850
Cluster of Systola 1024     17 min      79
Fuzion 150                  11 min      133

The comparison has to be performed for several parameters (substitution matrices, gap penalties).

Mycobacterium smegmatis will be published later this year.

Results of the comparison will be interpreted with the Centre for Molecular Cell Biology, NUS, Singapore.

Conclusions and Future Work

Demonstrated how fine-grained parallel architectures can be applied efficiently to comparative genomics.

Significant runtime savings for genome comparisons and database searching.

More discovery is possible at a good price-performance ratio.

Other computational biology applications of interest to us: ClustalW, HMM, pattern matching algorithms such as inverted repeats, short tandem repeats, etc.

Availability of accelerators as a special resource in a Grid environment.

Contents

Protein Structure
Protein Structure Prediction
Approach based on Local Protein Structure
Refinements
Conclusions and Future Work

Protein Structure

Proteins are large molecules composed of smaller molecules called amino acids

There are 20 kinds of amino acids found in natural proteins

All share a common structure

[Figure: common structure of an amino acid: alpha carbon (with attached hydrogen), amine group, carboxyl group and side chain R]

Protein Structure

From Primary to Tertiary Structure

A protein’s 3D shape is determined by its primary amino acid sequence (Anfinsen, 1963)

Predicting tertiary structure from the amino acid sequence is an unsolved problem:
It is difficult to model the energies that stabilize a protein molecule.
The conformational search space is enormous.

Prediction Methods

Given an amino acid sequence:
search a set of known folds by aligning the sequence with a representative template of each fold;
predict the fold that gets the best-scoring alignment.

[Figure: the target amino acid sequence YLAADTYK is aligned against the templates in the fold library]

Template amino acid sequence   FISSETCN   MEPSSYV   TGLIRKN
Target/template score          7          21        2

Prediction Methods

This method is very effective when target and template have >30% sequence identity

Approximately 1/3 of protein sequences can be assigned folds and modeled this way

Our aim is to help determine tertiary structures in cases where no matching sequence can be found.

Local structure and prediction

What is local structure?
It describes the environment of an amino acid: an amino acid’s relationship to its neighbors.

We use this information to predict structure from the primary sequence.

Dihedral Angles

The 6 atoms in each peptide unit lie in the same plane; the units are free to rotate relative to each other.

The structure of a protein is almost totally determined if all angles φ and ψ are known.
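A dihedral angle is defined by four consecutive atoms, so it can be computed from their coordinates alone. The sketch below uses the standard atan2 formulation; the function name and helper names are illustrative. For φ the four atoms are conventionally C of the previous residue, N, Cα and C; for ψ they are N, Cα, C and N of the next residue.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees around the bond p1-p2, in (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Planar zig-zag (trans) arrangement of four points.
print(dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

Applying this function to every residue of a set of known structures yields the angle samples from which per-pair histograms can be built.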

Idea of our Approach

Stiff vs. free: local predictability.

Database of sub-chain structures.

Reduction of the number of degrees of freedom by 10, which reduces the computation time significantly in combination with a global optimization algorithm (e.g. GA or SA).

[Figure: protein chain with side chains and backbone; the backbone atoms (C, N) and the dihedral angles are labeled]

Classification of Dihedral Angles

Selected PDB structures → dihedral angle extraction → histogram for each amino acid pair → classification as stiff, multiple or flexible

[Figure: frequency histograms of the dihedral angle for the amino acid pairs ALA-ALA, LEU-ARG and GLY-ILE, illustrating the classes stiff, multiple and flexible]

Stiff angles: determine the mean value.

Multiple angles: determine a sequence of mean values, one for each peak, in decreasing order of these peaks.

Flexible angles: determine the mean value and mark as flexible.
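One possible way to turn a list of observed angles into one of the three classes is sketched below. The slides do not give the classification criteria, so everything specific here is an assumption: the function name, the bin width, the minimum share of samples that counts as a peak, and the spread below which a single peak is called stiff. Wraparound at ±180° is ignored for simplicity.

```python
from statistics import mean, pstdev


def classify_angles(angles, bin_width=20.0, peak_fraction=0.2, stiff_spread=15.0):
    """Return (label, mean values) for the angles of one amino acid pair."""
    # Histogram: group the samples by bin index.
    bins = {}
    for a in angles:
        bins.setdefault(int(a // bin_width), []).append(a)
    # Merge neighbouring occupied bins into clusters.
    clusters, last = [], None
    for b in sorted(bins):
        if last is not None and b == last + 1:
            clusters[-1].extend(bins[b])
        else:
            clusters.append(list(bins[b]))
        last = b
    # A cluster counts as a peak if it holds enough of the samples.
    peaks = [c for c in clusters if len(c) >= peak_fraction * len(angles)]
    peaks.sort(key=len, reverse=True)           # decreasing order of the peaks
    if len(peaks) >= 2:
        return "multiple", [mean(c) for c in peaks]
    if len(peaks) == 1 and pstdev(peaks[0]) <= stiff_spread:
        return "stiff", [mean(peaks[0])]
    return "flexible", [mean(angles)]


print(classify_angles([-65, -63, -60, -62, -64]))
print(classify_angles([-60] * 6 + [140] * 4))
print(classify_angles(list(range(-180, 180, 10))))
```

The returned list holds a single mean for stiff and flexible angles and one mean per peak, largest peak first, for multiple angles, matching the three rules above.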

Prediction based on Classification

Given a sequence of amino acids, find the subsequences in which all angles are of type stiff.

Predict the structure of these subsequences, using the mean values of the corresponding histograms.
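Finding the stiff subsequences amounts to a single scan over neighbouring residue pairs. In the sketch below, `pair_class` is a hypothetical lookup table from an amino acid pair to its class; it would be filled from the histograms, and the one-letter codes and labels in the usage example are made up for illustration.

```python
def stiff_segments(sequence, pair_class):
    """Maximal subsequences in which every neighbouring pair is 'stiff'."""
    segments, start = [], 0
    for i in range(len(sequence) - 1):
        pair = (sequence[i], sequence[i + 1])
        if pair_class.get(pair) != "stiff":
            # The run ends at residue i; keep it if it contains at least one pair.
            if i > start:
                segments.append(sequence[start:i + 1])
            start = i + 1
    if len(sequence) - 1 > start:
        segments.append(sequence[start:])
    return segments


# Made-up classification table for a toy sequence.
toy_classes = {("A", "A"): "stiff", ("A", "L"): "stiff",
               ("L", "R"): "multiple", ("R", "G"): "stiff"}
print(stiff_segments("AALRG", toy_classes))
```

Pairs missing from the table are treated as not stiff, so an unseen pair always ends the current segment.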

Prediction based on Classification

Part of a protein predicted with this method (backbone of a helix, original structure on the left, predicted structure on the right)

Successfully predicted certain stiff structures of subsequences up to the length of 15

Refinement of the method

For multiple angles: consider sequences of length 3 or 4:

extract sequences (C,A,B,D) and determine the histogram of the angles φ and ψ related to the peptide chain between A and B;

if the histogram for the amino acid pair (A,B) is multiple, check whether the angle for (C,A,B,D) is stiff;

with longer subsequences the number of occurrences of these sequences drops dramatically.

Refinement of the method

For multiple angles: if an amino acid sequence has only a small number of multiple edges, it is possible to try all combinations of possible peaks.

Many combinations lead to collisions in part of the protein, and thus can be eliminated.
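Trying all peak combinations is a Cartesian product over the candidate angles of each edge, followed by a filter. The sketch below leaves the collision test as a caller-supplied predicate, because the slides do not describe how collisions are detected; the function name and the toy predicate in the usage example are purely illustrative.

```python
from itertools import product


def enumerate_conformations(angle_options, collides):
    """All combinations of candidate angles that pass the collision test.

    angle_options: one list of candidate mean angles per edge
                   (a single entry for stiff edges, several for multiple ones).
    collides:      hypothetical predicate taking one combination (a tuple).
    """
    return [combo for combo in product(*angle_options) if not collides(combo)]


# Toy input: one stiff edge followed by two edges with two peaks each.
options = [[-60], [-60, 140], [-120, 60]]


def toy_collision(combo):
    """Made-up rule standing in for a real geometric collision check."""
    return combo[1] == 140 and combo[2] == 60


print(enumerate_conformations(options, toy_collision))
```

The number of combinations grows as the product of the peak counts, which is why this only works when the sequence has few multiple edges.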

Conclusion and Future Work

Presented a method to predict stiff structures of subsequences up to a certain length.

Presented a refinement of the method to handle multiple angles.

How to handle flexible angles?

Using the local prediction as an input for a global optimization method, e.g. based on Simulated Annealing.
