Virdi Sabegh Singh (Advisor Dr. Robert A. Walker) Computer Science Department Kent State University Solving the Longest Common Subsequence (LCS) problem using the Associative ASC Processors with Reconfigurable 2D Mesh
Jan 03, 2016
Presentation Outline
- String matching and its variations
- Motivation of LCS
- Role of LCS in Molecular Biology
- Overview of LCS
- Discussion on Folklore algorithm
- Parallel Algorithms for LCS
- Discussion on ASC processor
- Brief introduction on Coterie Network
- Reconfigurable Network in the ASC Processor
- Modifying the Network for LCS Algorithm
- Longest Common Subsequence on Reconfigurable 2D Mesh: Exact match
- Longest Common Subsequence on Reconfigurable 2D Mesh: Approximate match
- Summary and Future work
String Matching
- Fundamental operation in computing
- Comparison of characters, words, etc. to determine their similarity
- Of particular interest in bioinformatics, especially for searching genetic databases
- Strings are enormous, so efficient string processing is a requirement
String Matching Variations
- Is exact match the only solution?
- What if the pattern does not occur in the text?
- Find the longest subsequence that occurs both in the pattern and in the text.
- The Longest Common Subsequence, Longest Common Substring, Sequence Alignment, and Edit Distance problems are all variations of the string matching (SM) problem
Sequence alignment
- Procedure for comparing two or more sequences
- Searches for a series of individual characters or character patterns that occur in the same order in the sequences
LCS
- Finds a common string for both sequences, preserving symbol order
Sequence alignment vs. LCS
GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII
GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV
|||::::| : |::| ||:::||||:|:|||:: ::| |::::
Motivation of LCS
- Molecular Biology
- File comparison
- Screen redisplay
- Cheater finder
- Plagiarism detection
- Codes and Error Control
- Spell checking
- Human speech
- Gas Chromatography
- Bird song analysis
- Data compression
- Speech recognition
Role of LCS in Molecular biology
- DNA sequences (genes) are represented by four letters, ACGT, corresponding to the four submolecules forming DNA
- When biologists find a new sequence, they typically want to know which other sequences it is most similar to
- One way of computing how similar (homologous) two sequences are is to find the length of their longest common subsequence
Role of LCS in Molecular biology
- This is a simplification, since in the biological situation one would typically take into account not only the length of the LCS but also, for example, how gaps occur when the LCS is embedded in the two original sequences
- An obvious measure of the closeness of two strings is the maximum number of identical symbols (preserving symbol order)
- This, by definition, is the length of the longest common subsequence of the strings
Longest Common Subsequences
- Formally, we compare two strings, X[1..m] and Y[1..n], which are elements of the set Σ*; here Σ denotes the input alphabet containing σ symbols
- The LCS of strings X and Y, lcs(X,Y), is a common subsequence of maximal length
- Special case of the edit distance problem
  - The distance between X and Y is defined as the minimal number of elementary operations needed to transform the source string X into the target string Y
  - In practical applications, operations are restricted to insertions, deletions and substitutions
  - For each operation, an application-dependent cost is assigned
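The link between LCS and edit distance can be made concrete under one extra assumption that the slide does not state: only insertions and deletions are allowed, each at unit cost (no substitutions). In that restricted case the distance equals m + n - 2 * |lcs(X,Y)|. The function name `indel_distance` is illustrative only.

```python
def indel_distance(x: str, y: str) -> int:
    """Minimal number of unit-cost insertions/deletions turning x into y.

    Assumption: substitutions are not allowed, so a mismatch costs one
    deletion or one insertion.
    """
    m, n = len(x), len(y)
    prev = list(range(n + 1))          # distance from the empty prefix of x
    for i in range(1, m + 1):
        curr = [i] + [0] * n           # i deletions reach the empty prefix of y
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                curr[j] = prev[j - 1]  # matching symbols cost nothing
            else:
                curr[j] = 1 + min(prev[j], curr[j - 1])
        prev = curr
    return prev[n]


# With |lcs| = 5 for these strings, the identity gives 6 + 11 - 2 * 5 = 7.
distance = indel_distance("ACCAGG", "AGACTGAGGTA")
```

Every symbol outside the common subsequence must be either deleted from X or inserted from Y, which is where the m + n - 2 * |lcs| count comes from.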
Longest Common Subsequences
- lcs(X,Y) is typically solved with the dynamic programming technique by filling an m x n table
- Table elements act as vertices in a graph, and the simple dependencies between the table values define the edges
- The task is to find the longest path between the vertices in the upper-left and lower-right corners of the table
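A minimal sketch of the table-filling approach described above, using the textbook recurrence (the slides do not give code, so the function names `lcs_table` and `lcs` are illustrative). The table has an extra row and column of zeros for the empty prefixes.

```python
def lcs_table(x: str, y: str) -> list[list[int]]:
    """Fill the (m+1) x (n+1) dynamic-programming table of LCS lengths."""
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # A match extends the LCS of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def lcs(x: str, y: str) -> str:
    """Recover one longest common subsequence by walking the table backwards."""
    table = lcs_table(x, y)
    i, j = len(x), len(y)
    out = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

The backward walk corresponds to following the longest path from the lower-right corner to the upper-left corner of the table.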
Folklore Algorithm
- Foundation of most LCS algorithms
- Given two strings, find the LCS common to both strings.
- Example:
  String 1: AGACTGAGGTA
  String 2: ACTGAG
- List of possible alignments:
  AGACTGAGGTA
  --ACTGAG---
  --ACTGA-G--
  A--CTGA-G--
  A--CTGAG---
- The time complexity of this algorithm is clearly O(nm)
Folklore Algorithm
- The complexity does not depend on the sequences u and v themselves but only on their lengths
- By carefully choosing the order in which the d(i,j) values are computed, the algorithm can be executed in O(n+m) space
- The bottleneck in efficiently parallelizing the LCS problem is calculating the values of the diagonal elements, as shown
- As seen, the value at (i,j) depends on the previous element (i-1,j-1) when a match is found
- There may be more than one LCS for the same problem
- To find the best LCS, we associate some parameter with each candidate
- The Smith-Waterman algorithm uses the same concept as the Folklore algorithm, but gives us the optimal result (LCS)
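The space remark on this slide can be illustrated with a row-by-row evaluation order. This is one possible ordering, not necessarily the one the author had in mind: it keeps only two rows over the shorter string, which stays within the O(n+m) bound, and it returns the length only, not the subsequence itself.

```python
def lcs_length(u: str, v: str) -> int:
    """Length of the LCS of u and v using two rows of the table."""
    if len(v) > len(u):
        u, v = v, u                    # keep the rows as short as possible
    prev = [0] * (len(v) + 1)
    for a in u:
        curr = [0]
        for j, b in enumerate(v, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)      # diagonal dependency
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Each new row needs only the row directly above it, so older rows can be discarded as soon as they have been used.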
Folklore Algorithm
[Figure: dynamic-programming table for the example, with AGACTGAGGTA across the columns and ACTGAG down the rows; the first row and first column are initialized to 0]
Parallel Counterpart
- The serial LCS algorithm runs in O(nm) time, where n is the length of the text string and m is the length of the pattern string
- Efficient parallel algorithms exist for this computationally intensive task
- Some algorithms run in
  - O(max{n,m}) time using O(min{n,m}) processors
  - O(log n) time using O(mn/log n) processors
- There are constant-time algorithms for the LCS problem that use the DP approach under some assumptions
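One common way to reach a bound of the first kind is anti-diagonal (wavefront) evaluation: all cells with the same i + j depend only on earlier anti-diagonals, so they could be computed at the same time. The slide does not say which algorithm it refers to, so the sketch below is an assumption; it simulates the wavefront sequentially and reports how many parallel steps and how many processors such a schedule would need.

```python
def lcs_wavefront(x: str, y: str) -> tuple[int, int, int]:
    """Return (LCS length, number of wavefront steps, widest wavefront)."""
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    steps = 0
    widest = 0
    for d in range(2, m + n + 1):              # d = i + j is constant per wave
        lo, hi = max(1, d - n), min(m, d - 1)
        cells = [(i, d - i) for i in range(lo, hi + 1)]
        for i, j in cells:                     # independent of one another
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
        steps += 1
        widest = max(widest, len(cells))
    return table[m][n], steps, widest


length, steps, widest = lcs_wavefront("ACTGAG", "AGACTGAGGTA")
```

For non-empty strings the schedule takes m + n - 1 steps, and no wavefront holds more than min{m,n} cells.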
Computation Model
- Various network models have been used to solve the LCS problem
- PRAM model, Suffix Tree, 2D-Mesh Network, Mesh with Reconfigurable Buses, Mesh with Multiple Buses, etc.
- Algorithms that run in constant time assume that most operations are done in constant time
- In the parallel version, one of the important tasks is to distribute data efficiently and easily
The ASC Processor
- A scalable design implemented on a million-gate Altera FPGA
- SIMD-like architecture
- Searches data by content instead of address
- 8-bit Instruction Stream (IS) control unit with 8-bit instruction and data addresses and 32-bit instructions
The ASC Architecture
[Figure: block diagram of the ASC architecture: Control Unit, Instruction Bus, Data Bus, Common Registers, Responder Resolution Unit, and a PE Array in which each PE has its own memory and supporting circuitry and is attached to the PE Network]
The ASC Architecture
- Each PE listens to the IS through the broadcast and reduction network
- PEs can communicate amongst themselves using the PE Network
- A PE may either execute or ignore the microcode instruction broadcast by the IS, under the control of the Mask Stack
The ASC Features
- Associative Search
  - Each PE can search its local memory for a key under the control of the IS
- Responder Resolution
  - A special circuit signals if ‘at least one’ record was found
- Masked Operation
  - Local Mask Stacks can turn the execution of instructions from the IS on or off
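The three features above can be mimicked in software. The sketch below is only an analogy, not the ASC instruction set: the `PE` class, its fields, and both function names are hypothetical, a single `active` flag stands in for the top of the mask stack, and the loops run sequentially where the hardware would act on all PEs at once.

```python
from dataclasses import dataclass


@dataclass
class PE:
    """One processing element holding a single local record (hypothetical)."""
    record: dict
    active: bool = True       # stands in for the top of the local mask stack
    responder: bool = False   # set by the most recent associative search


def associative_search(pes, field, key):
    """Broadcast (field, key); every active PE compares its own record.

    Returns the responder-resolution signal: True if at least one PE matched.
    """
    for pe in pes:
        pe.responder = pe.active and pe.record.get(field) == key
    return any(pe.responder for pe in pes)


def masked_update(pes, field, value):
    """Apply a broadcast write only in the PEs that responded."""
    for pe in pes:
        if pe.responder:
            pe.record[field] = value


pes = [PE({"base": b, "tag": 0}) for b in "AGACTGA"]
found = associative_search(pes, "base", "A")
masked_update(pes, "tag", 1)
tags = [pe.record["tag"] for pe in pes]
```

Here the search marks the PEs holding "A" as responders, and the masked write then touches only those PEs.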
Communication between PEs
- In a 2D mesh network, communication between the PEs themselves takes place in two different ways
  - By using the nearest-neighbor mesh interconnection network
  - By using a powerful variation on the nearest-neighbor mesh called the “Coterie network”, developed in response to the requirement for nonlocal communication
- Processors in a group share common properties and purpose; we call the group a coterie, hence the name coterie network
Coteries [Weems & Herbordt]
- “A small often selected group of persons who associate with one another frequently”
- Features:
  - Related to other reconfigurable broadcast networks
  - Describable using hypergraphs
  - Dynamic in nature
- Advantages:
  - Propagation of information quickly over long distances at electrical speed
  - Support of one-to-many communication within a coterie, and reconfigurability of the coterie
Coterie Network
- Provides a method of performing operations on regions of an image in parallel
- Used extensively for matrix arithmetic, FFT, convex hull computation, simulating a pyramid processor, general permutation routing and parallel prefix
- Note that the coterie network is separate from the nearest-neighbor mesh, which we refer to as the SEWN network
- The coterie network results in a new mode of parallelism that falls between SIMD and MIMD
PEs form Coteries
[Figure: 5 x 5 coterie network with switches shown in “arbitrary” settings. Shaded areas denote coteries (sets of PEs sharing the same circuit)]
Coterie’s Physical Structure
- In the physical implementation, each PE controls a set of switches
- Four of these switches control access in the different directions (N, S, E, W)
- Two switches, H and V, are used to emulate horizontal and vertical buses
- The two switches NE and NW are used to create eight-way connected regions
Coteries Structure
[Figure: switch layout of a single PE, showing the N, S, E and W ports and the NW, NE, WS, ES, H and V switches]
Coterie Network
- The isolated groups of processors, called coteries, have access only to the multicast within their own coterie
- When the switches are set, connected processors form a coterie
- The coterie network switches are set by loading the corresponding bits of the mesh control register in each PE
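How switch settings partition the mesh into coteries can be sketched with a union-find pass. The model below is a simplification made for illustration and rests on assumptions not stated in the slides: each PE is described by a string of its closed switches, only the N, S, E and W switches are modelled (H, V, NE and NW are ignored), and two neighbors are connected only when both facing switches are closed.

```python
def coteries(switches):
    """Group PEs into coteries from per-PE strings of closed switches.

    switches[r][c] is a string such as "ES" listing the closed switches
    of the PE in row r, column c (hypothetical encoding).
    """
    rows, cols = len(switches), len(switches[0])
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]      # path halving
            p = parent[p]
        return p

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and "E" in switches[r][c] and "W" in switches[r][c + 1]:
                union((r, c), (r, c + 1))
            if r + 1 < rows and "S" in switches[r][c] and "N" in switches[r + 1][c]:
                union((r, c), (r + 1, c))

    groups = {}
    for p in parent:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# A 3 x 2 mesh: the left column forms one coterie, the right column is split.
mesh = [["S", "S"],
        ["NS", "N"],
        ["N", ""]]
groups = coteries(mesh)
```

Closing the N and S switches down a whole column, as in the left column here, is the kind of setting a column broadcast would use.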
Basic Coterie structure algorithms
- The complexity is assumed to be O(1) unless otherwise stated
- Transfer of data between two adjacent coteries
- Symmetry breaking between a pair of nodes in a coterie
- Two nodes within a coterie exchange information
Reconfigurable Network in the ASC Processor
- Scalable design with reconfigurable network
- Can be used as a dedicated ASIC or co-processor
- Implemented on an Altera APEX20KC1000, with a single CPU, 50 pipelined PEs, and a linear PE interconnection network
- The key to reconfigurability is the Data Switch inside each PE
[Figure: Data Switch with N, S, E and W ports]
Reconfigurable Network in the ASC Processor
- Linear network: a PE communicates in both directions
- 2D reconfigurable network: a PE communicates with all of its neighbors (N-E-S-W)
- The data switch has a bypass mode that lets PE communication skip non-responders, so as to support associative computing
Modifying the Network for LCS Algorithm
- The Coterie Network is a powerful network, but we don’t need its full feature set for the LCS algorithm
- Augmented ASC with a new 2D mesh with row and column broadcast buses
  - Modified the linear network into a 2D mesh
  - Added features inspired by the Coterie network
  - A PE can now communicate with any of its four neighbors
  - Bypass mode augmented to support H and V bypass as well
LCS Algorithm on Reconfigurable 2D Mesh
- We assume that initially all the internal switches of the PEs are open
- Each PE has a Match Register “M” and a Length Register “L”, both initially holding the value 0
- Let the text string T = T(1)T(2)…T(n) be fed into the first row of the Reconfigurable 2D Mesh
- PE(0,j) stores T(j), where 1 <= j <= n, as shown
- This step takes unit time
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: the text string A G A C T G A C T G A loaded into the first row of the mesh]
LCS Algorithm on Reconfigurable 2D Mesh
- Broadcast each character of the text string along its column, using the column broadcast bus
- In the case of the Coterie network:
  - Form coteries along the columns
  - Perform the multicast operation in all coteries
- This step takes unit time
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: the text string A G A C T G A C T G A copied into every row of the mesh after the column broadcast]
LCS Algorithm on Reconfigurable 2D Mesh
- Let the pattern string P = P(1)P(2)…P(m) be fed into the first column of the Reconfigurable 2D Mesh
- PE(i,0) stores P(i), where 1 <= i <= m, as shown
- This step takes unit time
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: the pattern string A C T G A C loaded into the first column of the mesh]
PEs form Coteries
- Broadcast each character of the pattern string along its row, using the row broadcast bus
- In the case of the Coterie network:
  - Form coteries along the rows
  - Perform the multicast operation in all coteries
- This step takes unit time
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: the pattern string A C T G A C copied into every column of the mesh after the row broadcast]
LCS Algorithm on Reconfigurable 2D Mesh
- After this step, each PE with index [i,j] holds P[i] and T[j]
- Each PE now compares the two characters held in its internal registers
- It sets its Match register M to 1 if they are equal, and to 0 otherwise
- This step takes unit time
- The next figure shows the values after this operation
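The effect of the two broadcasts followed by the local comparison can be written as a nested comprehension. This is a sequential stand-in for what the mesh does in one step; `match_matrix` is an illustrative name, and rows and columns are 0-indexed here.

```python
def match_matrix(pattern: str, text: str) -> list[list[int]]:
    """M[i][j] is 1 when pattern[i] == text[j], else 0.

    Each entry corresponds to the Match register of one PE after it has
    received its row's pattern character and its column's text character.
    """
    return [[1 if p == t else 0 for t in text] for p in pattern]


M = match_matrix("ACTGAC", "AGACTGACTGA")
first_row = M[0]   # matches of pattern character "A" against the whole text
```

Every entry depends only on the two characters held locally, which is why the mesh can fill all of them at the same time.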
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: Match register values (0 or 1) in each PE for text AGACTGACTGA (columns) and pattern ACTGAC (rows)]
Parallel VLDC SM Algorithm on MCCRB Network
- A parallel SM algorithm with VLDC, proposed by K.L. Chung in 1995
- Uses the Mesh-Connected Computer with Reconfigurable Buses (MCCRB) system
- Runs in O(1) time
- For a pattern of size m and a text of size n, uses O(nm) PEs
LCS Algorithm on Reconfigurable 2D Mesh
- Now, except for the PEs with index [0,j], where 0 <= j <= n, every PE with the value 0 in its Match register M closes its N-E switch
- PEs with the value 1 in their Match register M close their W-S switch, as shown
- Both steps take unit time
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: the same Match register values for text AGACTGACTGA and pattern ACTGAC, with the N-E or W-S switch of each PE set accordingly]
LCS Algorithm on Reconfigurable 2D Mesh
Sequential Version:
- Each PE at the beginning (bottom) of an LCS sends a token to its West neighbor
- A PE receiving a token adds 1 to the token if its Match Register “M” contains 1, stores the result in its Length Register “L”, and passes the token on if its W-S bypass switch is set
- Perform the MAX operation on the entire network
- The PE with the largest value in its Length register “L” is the start of the LCS
- The complexity is the length of the LCS found
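A software reading of the sequential version, offered as an interpretation rather than the implemented microcode: a token climbs each diagonal run of matching PEs from its bottom end, so the Length value at a PE ends up as the number of consecutive matches from that PE down to the end of its run. Indices are 0-based and the function names are illustrative.

```python
def run_lengths(pattern: str, text: str) -> list[list[int]]:
    """L[i][j] = length of the diagonal run of matches starting at (i, j)."""
    m, n = len(pattern), len(text)
    L = [[0] * (n + 1) for _ in range(m + 1)]     # extra border of zeros
    for i in range(m - 1, -1, -1):                # bottom row first
        for j in range(n - 1, -1, -1):
            if pattern[i] == text[j]:
                L[i][j] = L[i + 1][j + 1] + 1     # token value plus one
    return [row[:n] for row in L[:m]]


def longest_run(pattern: str, text: str) -> tuple[int, int, int]:
    """MAX over the mesh: (length, row, column) of the longest run's start."""
    best = (0, -1, -1)
    for i, row in enumerate(run_lengths(pattern, text)):
        for j, value in enumerate(row):
            if value > best[0]:
                best = (value, i, j)
    return best


top_row = run_lengths("ACTGAC", "AGACTGACTGA")[0]
start = longest_run("ACTGAC", "AGACTGACTGA")
```

The number of token hops on a diagonal equals the length of its run, which matches the stated complexity.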
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: Length register values after the sequential token pass for text AGACTGACTGA (columns) and pattern ACTGAC (rows); the largest value shown is 6]
LCS Algorithm on Reconfigurable 2D Mesh
Parallel Version:
- Each PE at the beginning (bottom) sends its [row, column] ID to its West neighbor
- A PE receiving an ID passes it on or, if it is the end of an LCS, subtracts its own ID from the received ID
- Store the value in the Length Register “L”
- Perform the MAX operation on the network
- The PE with the largest value in its Length Register “L” is the start of the LCS
- Complexity: constant time
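The ID-subtraction idea can be sketched as follows. Two points are assumptions on my part: the `while` loop stands in for the closed switches that carry the bottom PE's ID along the whole diagonal in one step, and the `+ 1` that turns a row-index difference into a count is my reading of how the subtraction yields a length. Indices are 0-based and `diagonal_runs` is an illustrative name.

```python
def diagonal_runs(pattern: str, text: str):
    """List (start, end, length) for every maximal diagonal run of matches."""
    m, n = len(pattern), len(text)

    def match(i, j):
        return 0 <= i < m and 0 <= j < n and pattern[i] == text[j]

    runs = []
    for i in range(m):
        for j in range(n):
            if match(i, j) and not match(i - 1, j - 1):   # top end of a run
                ei, ej = i, j
                while match(ei + 1, ej + 1):              # follow the closed bus
                    ei, ej = ei + 1, ej + 1
                # Length from the two IDs alone: row difference plus one.
                runs.append(((i, j), (ei, ej), ei - i + 1))
    return runs


runs = diagonal_runs("ACTGAC", "AGACTGACTGA")
best = max(runs, key=lambda run: run[2])
```

No counting along the path is needed: the length follows from the two endpoint IDs, which is what makes a constant-time step plausible on the hardware.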
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: [row, column] IDs held by the PEs for text AGACTGACTGA (columns) and pattern ACTGAC (rows)]
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: resulting Length register values (1, 6, 5 and 3) shown on the mesh for text AGACTGACTGA (columns) and pattern ACTGAC (rows)]
LCS Algorithm on Reconfigurable 2D Mesh
- Exact match implemented on an Altera APEX1000KC FPGA
- Sufficient to hold the 6 x 11 array of PEs used in the example
- Ran at a clock speed of 37 MHz with this number of PEs
- Larger networks can be easily supported, due to ASC scalability
LCS Algorithm on Reconfigurable 2D Mesh
- The algorithm described above solves the LCS problem for exact match
- It doesn’t address approximate match
- The next example demonstrates this problem, for the strings:
  Text: AGACTGAGGTA
  Pattern: ACCAGG
  LCS: ACAGG
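The gap between the two cases can be checked numerically. The sketch below computes, in one pass, the longest contiguous diagonal run of matches (what the exact-match procedure measures, on my reading of the earlier slides) and the true LCS length from the standard recurrence. The function name is illustrative.

```python
def run_vs_lcs(pattern: str, text: str) -> tuple[int, int]:
    """Return (longest contiguous diagonal run, LCS length)."""
    m, n = len(pattern), len(text)
    run = [[0] * (n + 1) for _ in range(m + 1)]
    sub = [[0] * (n + 1) for _ in range(m + 1)]
    best_run = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if pattern[i - 1] == text[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1     # run continues
                sub[i][j] = sub[i - 1][j - 1] + 1
                best_run = max(best_run, run[i][j])
            else:
                sub[i][j] = max(sub[i - 1][j], sub[i][j - 1])  # gaps allowed
    return best_run, sub[m][n]


exact_case = run_vs_lcs("ACTGAC", "AGACTGACTGA")
gapped_case = run_vs_lcs("ACCAGG", "AGACTGAGGTA")
```

For the earlier example both numbers agree, while for this slide's strings the longest unbroken run is shorter than the LCS, so tokens must be able to cross gaps.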
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: Match register values for text AGACTGAGGTA (columns) and pattern ACCAGG (rows)]
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: mesh values for text AGACTGAGGTA (columns), with rows labelled G, A, C, A, G, G]
LCS Algorithm on Reconfigurable 2D Mesh
- Inject a token from the bottom row
- When the token reaches a gap, it enters the south port of some PE whose W-S switch is not set and stops at that PE
- Close the W-S bypass switch of that PE, and set vertical (N-S) bypass in all PEs above the PE identified in the previous step
LCS Algorithm on Reconfigurable 2D Mesh
- Inject a token from the top row
- When the token reaches a gap, it enters the west port of some PE whose W-S switch is not set and stops at that PE
- Close the W-S bypass switch of that PE, and set horizontal (W-E) bypass in all PEs to the right of the PE identified in the previous step
- Bypass the W-S switch of all those PEs where the H and V switches cross over
LCS Algorithm on Reconfigurable 2D Mesh
- Inject a token from the bottom row
- A PE receiving a token adds 1 to it if its Match Register “M” contains 1 and passes it on if its W-S bypass switch is set; if the PE is the end of an LCS, it stores the token in its Length Register “L”
- The PE with the largest value in its “L” register is the start of the LCS
- Increment “L” by 1 if the “M” register has the value 1
LCS Algorithm on Reconfigurable 2D Mesh
- When an H or V switch is set, the token bypasses this switch and the “L” value remains unchanged
- We bypass only those tokens whose value in the “M” Match register is maximum and whose value in the “L” Length register is minimum
- If both tokens have the same “M” value, block the token with the larger “L” value
- If both the “L” and “M” values are the same, select either one of them
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: Match register values for text AGACTGAGGTA (columns) and pattern ACCAGG (rows), repeated]
LCS Algorithm on Reconfigurable 2D Mesh
[Figure: mesh values for text AGACTGAGGTA (columns), with rows labelled G, A, C, A, G, G, repeated]
Summary and Future work
Summary:
- In this presentation, we have described a new parallel algorithm on specialized hardware
- Inspired by certain features of the Coterie Network
- Modified the ASC processor to add a reconfigurable 2D mesh
- Exact match implemented on an Altera FPGA
- Constant-time algorithm for exact match
- The running time of the approximate algorithm depends on the diameter of the network
Summary and Future work
Future Work:
- Optimize the algorithm for approximate match
- Incorporate additional parameters to find the best LCS, instead of the longest one
- Incorporate different weighting schemes
- Conserve memory by using an encoding scheme
  - Use two bits to represent the four bases of DNA
  - Using this idea, we save 75% of space/memory (compared with one 8-bit character per base)
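The two-bit encoding proposed above can be sketched as follows. The particular code assignment (A=0, C=1, G=2, T=3) and the function names are assumptions made for illustration, and the saving is measured against storing one byte per base.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # hypothetical 2-bit assignment
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for k, base in enumerate(seq):
        out[k // 4] |= CODE[base] << (2 * (k % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed data."""
    return "".join(BASES[(data[k // 4] >> (2 * (k % 4))) & 3]
                   for k in range(length))


text = "AGACTGAGGTA"
packed = pack(text)
restored = unpack(packed, len(text))
```

The sequence length has to be stored separately, because the last byte may be only partly used.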
Acknowledgements
- Professor Walker
- Committee members, for their time
- ASC/MASC Group, for their useful comments
- Professor Helen Piontkivska from the Biology Department
- Professors Charles Weems and Martin Herbordt
- Hong Wang, for implementing the exact match algorithm on FPGA
THANK YOU
Questions….