Massively Parallel Solutions for Molecular Sequence Analysis
Bertil Schmidt, School of Computer Engineering, Nanyang Technological University, Singapore
Heiko Schröder, School of Computer Science and Information Technology, RMIT University, Melbourne, Australia
Manfred Schimmler, Institut für Datentechnik und Kommunikationsnetze, TU Braunschweig, Germany
Contents
  Motivation
  Smith-Waterman Algorithm
  Parallelization on the Hybrid Architecture
  Parallelization on the Fuzion 150
  Performance Evaluation
  Conclusion and Future Work
Motivation
Genetic sequence databases are growing exponentially
The growth rate will continue, since multiple concurrent genome projects have begun, with more to come
Discovered sequences are analyzed by comparison with databases
Complexity of sequence comparison is proportional to the product of query size and database size
Analysis is too slow on sequential computers
Two possible approaches:
  Heuristics, e.g. BLAST, FastA; but the more efficient the heuristic, the worse the quality of the results
  Parallel processing: get high-quality results in reasonable time
Full Genome Comparison
Related organisms, but Tuberculosis causes a disease: find common and different parts
16 × 10^6 pairwise sequence comparisons
Many genome-genome comparisons will be required in the near future
Smith-Waterman Algorithm
Optimal local alignment of two sequences
Performs an exhaustive search for the optimal local alignment
Complexity O(nm) for sequence lengths n and m
Based on the 'dynamic programming' (DP) algorithm:
  Fill the DP matrix using a substitution (mutation) matrix
  Find the maximal value (score) in the matrix
  Trace back from the score until a 0 value is reached
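Smith-Waterman Algorithm
Aligning S1 and S2 of length n and m using the recurrences:
$$H(i,j) = \max\{\,0,\; E(i,j),\; F(i,j),\; H(i-1,j-1) + \mathrm{Sbt}(S1_i, S2_j)\,\}, \quad 1 \le i \le n,\; 1 \le j \le m$$
$$E(i,j) = \max\{\,H(i,j-1) - \alpha,\; E(i,j-1) - \beta\,\}$$
$$F(i,j) = \max\{\,H(i-1,j) - \alpha,\; F(i-1,j) - \beta\,\}$$
$$H(i,0) = E(i,0) = 0, \qquad H(0,j) = F(0,j) = 0$$
Sbt is the substitution matrix; α and β denote the gap opening and gap extension penalties (symbols assumed here).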
Calculate three possible ways to extend the alignment:
  by one amino acid (AA) in each sequence
  by one AA in the first sequence, aligned with a gap in the second
  by one AA in the second sequence, aligned with a gap in the first
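The three extension cases can be tried out with a minimal, score-only sketch of the dynamic programming fill. It is an illustration under stated assumptions, not the implementation used on the parallel hardware: the names `smith_waterman_score` and `simple_sbt`, the match/mismatch values and the penalty parameters `alpha` (gap opening) and `beta` (gap extension) are all hypothetical.

```python
def smith_waterman_score(s1, s2, sbt, alpha, beta):
    """Return the best local alignment score of s1 and s2.

    sbt(a, b) is a substitution score function; alpha and beta are the
    (assumed) gap opening and gap extension penalties, given as positive
    numbers.
    """
    n, m = len(s1), len(s2)
    # Row 0 and column 0 of H, E and F are initialised to 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[0] * (m + 1) for _ in range(n + 1)]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend with a gap: the move comes from the left (E) or from above (F).
            E[i][j] = max(H[i][j - 1] - alpha, E[i][j - 1] - beta)
            F[i][j] = max(H[i - 1][j] - alpha, F[i - 1][j] - beta)
            # Extend by one character in each sequence, or restart at 0.
            H[i][j] = max(0, E[i][j], F[i][j],
                          H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]))
            best = max(best, H[i][j])
    return best


def simple_sbt(a, b):
    # Hypothetical scoring: +2 for a match, -1 for a mismatch.
    return 2 if a == b else -1


print(smith_waterman_score("ACGT", "ACGT", simple_sbt, alpha=2, beta=1))
```

The sketch keeps all three matrices in memory for readability; only the previous row is needed if just the score is required.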
Matrix cells along a single diagonal are computed in parallel
Comparison is performed in A+B-1 steps on A PEs
[Figure: example DP matrix for the sequences GTCTATCAC and ATCTCGTATGATG (labeled A and B), computed diagonal by diagonal on a linear array of PEs P1, P2, ..., P13]
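The diagonal-parallel scheme can be illustrated with a short sketch that groups the DP cells by the step in which they become computable. It assumes only the data dependencies of the recurrences (each cell needs its left, upper and upper-left neighbours); the function name `antidiagonal_schedule` is hypothetical, and the lengths 13 and 9 are those of the two example sequences.

```python
def antidiagonal_schedule(a_len, b_len):
    """Group DP cells (i, j) by the step in which they can be computed.

    Cell (i, j), 1-based, depends only on (i-1, j), (i, j-1) and
    (i-1, j-1), so all cells with the same i + j are independent of
    each other and can be computed in the same step.
    """
    steps = []
    for d in range(2, a_len + b_len + 1):   # d = i + j
        lo = max(1, d - b_len)              # smallest valid row index
        hi = min(a_len, d - 1)              # largest valid row index
        steps.append([(i, d - i) for i in range(lo, hi + 1)])
    return steps


schedule = antidiagonal_schedule(13, 9)
print(len(schedule))                   # number of parallel steps
print(max(len(s) for s in schedule))   # most cells active in one step
```

For sequence lengths 13 and 9 the schedule has 13 + 9 - 1 = 21 steps, and no step needs more than one cell per PE.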
Mapping onto Systola 1024
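[Figure: the query sequence a0 ... a1023 is stored in the PE array, 32 characters per row (a0 ... a31, a32 ... a63, ..., a992 ... a1023); subject sequences b0 b1 ... bk, c0 c1 ... are streamed through the array]
a: query sequence (length equal to 1024)
b: subject sequence
Subject sequences can be pipelined with only 1 step delay
k steps for a subject sequence of length k
Efficient routing on the ISA: Row Ringshift and Broadcast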
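The effect of pipelining subject sequences can be estimated with a deliberately simplified model. It treats the array as a linear chain of PEs in which one subject character enters per step and moves one PE per step; this is an assumption for illustration, not a description of the actual ISA program, and `pipeline_steps` is a hypothetical name.

```python
def pipeline_steps(subject_lengths, num_pes, delay=1):
    """Steps until the last subject character has reached the last PE.

    Simplified model: one character enters the array per step, moves one
    PE per step, and `delay` idle steps separate consecutive subjects.
    """
    if not subject_lengths:
        return 0
    # Steps needed to feed every character, including the idle gaps.
    feed = sum(subject_lengths) + delay * (len(subject_lengths) - 1)
    # Extra steps for the final character to cross the remaining PEs.
    drain = num_pes - 1
    return feed + drain


print(pipeline_steps([9], 13))          # single subject: 9 + 13 - 1
print(pipeline_steps([300, 450], 1024))
```

In this model the cost of crossing the array is paid once per stream rather than once per subject sequence, which is why long streams of subjects keep the PEs busy.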
Performance Evaluation
Scan times in seconds for TrEMBL 14 (351’834 Protein Sequences) for various query sequence lengths
Query sequence length                          256    512    1024   2048   4096
Systola 1024: scan time (s)                    294    577    1137   2241   4611
Systola 1024: speedup to PIII 850              5      6      6      6      6
Cluster of 16 Systolas: scan time (s)          20     38     73     142    290
Cluster of 16 Systolas: speedup to PIII 850    81     86     91     94     94
Parallel implementation scales linearly with sequence length and number of PCs
Computing time dominates data transfer time
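The scaling claims can be checked with a few lines of arithmetic. The sketch below reads each entry of the table as a scan time in seconds followed by a speedup factor (an interpretation of the flattened table, e.g. "2945" as 294 s with speedup 5); the helper names are hypothetical.

```python
lengths = [256, 512, 1024, 2048, 4096]       # query lengths
systola_s = [294, 577, 1137, 2241, 4611]     # one Systola 1024, seconds
cluster_s = [20, 38, 73, 142, 290]           # cluster of 16 Systolas, seconds


def doubling_ratios(times):
    """Time ratio between each query length and the previous (half) length."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


def cluster_gain(single, cluster):
    """How many times faster the cluster is than one board, per query length."""
    return [s / c for s, c in zip(single, cluster)]


print([round(r, 2) for r in doubling_ratios(systola_s)])
print([round(g, 1) for g in cluster_gain(systola_s, cluster_s)])
```

Under this reading, doubling the query length roughly doubles the scan time, and 16 boards come close to a 16-fold gain over a single board.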
Fuzion 150 Architecture
0.25-µm, single-chip, SIMD architecture
1536 PEs @ 200 MHz
300 GOPS
600 GB/s on-chip, 6.4 GB/s off-chip bandwidth
Multithreading (control units interact via semaphores)
Developed by Clearspeed Technology (UK) for graphics and network processing
[Figure: Fuzion 150 block diagram with the labels: linear SIMD array (1536 PEs, each with 2 Kbytes DRAM), FUZION bus, 32-bit EPU (ARC), video I/O, display, instruction fetch, SIMD controller, local memory (1, 2 or 4 channels, 6.4 GB/s), host, AGP, Rambus]
Fuzion 150 Architecture
[Figure: the PEs are organized in six blocks, Block 0 to Block 5, each holding PE(b,0) ... PE(b,255) and attached to the Fuzion bus and local memory. Each PE contains an ALU (8 bits), a register file (32 bytes) and PE memory (2 KByte DRAM), and is connected to its left PE, its right PE, the instruction stream and the block I/O channel]
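Mapping onto the Fuzion 150
[Figure: the query sequence a0 ... a1535 is distributed over the six blocks, 256 characters per block (Block 0: a0 ... a255, Block 1: a256 ... a511, ..., Block 5: a1280 ... a1535); subject sequences b0 b1 ... bk, c0 c1 ... are streamed through the array]
a: query sequence (length equal to 1536)
b: subject sequence
No fast global communication
2-step local communication
Subject sequence can be pipelined with only step delay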
Mapping onto the Fuzion 150
Reduce communication time: assign 16 AAs to each PE
  query lengths up to 24576 AAs can be processed within a single pass
Partitioning for query lengths <24576: each subarray of corresponding size computes the alignment of the same query sequence with different subject sequences
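The partitioning arithmetic can be made concrete with a small sketch. The PE count and the 16 AAs per PE are taken from the slides; the rule that a subarray is simply the smallest run of PEs that holds the query is an assumption, and `partition` is a hypothetical helper, so the real implementation may size subarrays differently.

```python
import math

NUM_PES = 1536      # PEs on the Fuzion 150 (from the slides)
AAS_PER_PE = 16     # amino acids assigned to each PE (from the slides)


def partition(query_len):
    """Return (PEs per subarray, number of subarrays) for one query.

    Assumes a subarray is the smallest run of PEs that holds the whole
    query; every subarray then aligns the same query against a
    different subject sequence.
    """
    if not 1 <= query_len <= NUM_PES * AAS_PER_PE:
        raise ValueError("query does not fit in a single pass")
    pes = math.ceil(query_len / AAS_PER_PE)
    return pes, NUM_PES // pes


for length in (256, 4096, 24576):
    print(length, partition(length))
```

Under this assumption a query of 4096 AAs occupies 256 PEs, so six subject sequences can be processed side by side, while a query of 24576 AAs uses the whole array.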
Performance Evaluation
Scan times in seconds for TrEMBL 14 (351’834 Protein Sequences) for various query sequence lengths
Query sequence length                256    512    1024   2048   4096
Fuzion 150: scan time (s)            12     22     42     82     162
Fuzion 150: speedup to PIII 850      136    151    157    163    165
Parallel implementation scales linearly with sequence length
Computing time dominates data transfer time
Performance Evaluation
Normalized time comparison for a 10 Mbase search on different parallel architectures with different query lengths
[Figure: bar chart of search time in seconds (logarithmic axis, 1 to 100) for SAMBA, Fuzion 150, Kestrel and the 16K-PE MasPar at query lengths 512, 1024 and 2048]
4 times faster than the 16K-PE MasPar
6 times faster than Kestrel
5 times faster than SAMBA (special-purpose 3-board architecture)
Performance Evaluation for Full Genome Comparison
Scan times for pairwise protein sequence comparison of Mycobacterium tuberculosis and Escherichia coli
                           Scan time    Speedup to PIII 850
Cluster of Systola 1024    17 min       79
Fuzion 150                 11 min       133
Comparison has to be performed for several parameters (substitution matrices, gap penalties)
Mycobacterium smegmatis will be published later this year
Results of the comparison will be interpreted with the Centre for Molecular Cell Biology, NUS, Singapore
Conclusions and Future Work
Demonstrated how fine-grained parallel architectures can be applied efficiently for Comparative Genomics
Significant runtime savings for genome comparisons and database searching
More discovery is possible at a good price-performance ratio
Other Computational Biology applications of interest to us:
  ClustalW
  HMM
  pattern matching algorithms, such as inverted repeats, short tandem repeats, etc.
Availability of accelerators as a special resource in a Grid environment
Contents
  Protein Structure
  Protein Structure Prediction
  Approach based on Local Protein Structure
  Refinements
  Conclusions and Future Work
Protein Structure
Proteins are large molecules composed of smaller molecules called amino acids
There are 20 kinds of amino acids found in natural proteins
All share a common structure
[Figure: common structure of an amino acid, with the labels: alpha carbon (with attached hydrogen), amine group, carboxyl group, R side chain]
Protein Structure
From Primary to Tertiary Structure
A protein’s 3D shape is determined by its primary amino acid sequence (Anfinsen, 1963)
Predicting tertiary structure from amino acid sequence is an unsolved problem:
  Difficult to model the energies that stabilize a protein molecule
  Conformational search space is enormous
Prediction Methods
Given an amino acid sequence:
  search a set of known folds by aligning the sequence and a template fold representative
  predict the fold that gets the best scoring