A high-performance image processing pipeline for Polony DNA re-sequencing ECE1747 Project Report December 21, 2011 Author: Francesco Iorio
Abstract
DNA sequencing and re-sequencing are two fundamental tools in biological research,
and are applied in numerous areas, such as genome mapping and the study of genetic
diseases related to DNA mutation.
Polony DNA re-sequencing [POR06] is a modern high-throughput technique that has
been implemented in numerous systems. The baseline software for this project is the
image processing pipeline originally written for the Polonator G.007 machine [POL],
which gathers relatively low-resolution images and extracts the relevant data to
perform DNA re-sequencing operations.
Due to the ongoing quest towards the cost reduction of re-sequencing large amounts of
DNA material, for example the human genome, a new high-throughput parallel image
processing pipeline has been designed to support a different selection of algorithms and
to exploit SMP parallel computing systems, using a combination of pipeline, data-parallel
and task-based parallelism patterns.
The new software outperforms the original implementation in terms of total running time
on a reference SMP system, while operating on 5.5x higher resolution images, as a result
of both pipelining of the different processing stages on different processors and assigning
multiple processors to the most computationally intensive stages, while overall load
balancing amongst the stages is managed by a task-based task-stealing scheduler.
Polony DNA re-sequencing Page 3/22 2011/12/21
Table of Contents
Abstract
Table of Contents
Table of Tables
Table of Figures
1 Introduction
2 Background
2.1 Polony DNA re-sequencing
2.2 Polonator G.007 and its image processing pipeline
2.2.1 Pre-cycle: image segmentation stage
2.2.2 Registration stage
2.2.3 Extraction stage
2.2.4 Base calling stage
2.3 New Polonator “H” system
2.4 Shortcomings of the old software on the new system
3 A new image processing pipeline
3.1 Pipeline stages
3.1.1 Ingestion stage
3.1.2 Phase Detection stage
3.1.3 Registration stage
3.1.4 Extraction stage
3.1.5 Base calling stage
4 Experimental setup
4.1 Synthetic image generator
4.2 Test image processing system
5 Results
5.1 Test 1: old vs. new pipeline
5.2 Test 2: new pipeline using high-resolution images and fine grid pitch
6 Conclusions and future work
References
Table of Tables
Table 1 Polonator G.007 system details
Table 2 Polonator H system details
Table 3 Experimental Platform
Table 4 Test 1 image generator configuration parameters
Table 5 Test 2 image generator configuration parameters
Table of Figures
Figure 1 – Polonator G.007: beads uniformly distributed over the substrate
Figure 2 – Polonator G.007 image processing pipeline
Figure 3 – Polonator “H”: beads distributed over the substrate in grid layout
Figure 4 – Parallel image processing pipeline
Figure 5 – Partial Projection Profile technique for phase detection
Figure 6 – 2D DFT power spectrum cross-correlation registration
1 Introduction
DNA is one of the fundamental building blocks of life and its accurate analysis
constitutes a major technical and scientific challenge.
Modern high-throughput re-sequencing techniques provide fast, accurate detection and
low cost, although the ever-increasing demand in the area of DNA research requires
constantly reducing the overall turnaround time of the sequencing results and the cost per
sequenced base.
Polony DNA re-sequencing [POR06] is one of the modern techniques designed both to
drastically reduce the overall cost of the re-sequencing process and to significantly
decrease the time to result. As such it offers a very competitive price/performance point
of $0.11/Kbase, while being capable of analyzing around 400,000 bases per hour.
Scanning very large DNA sequences such as the entire human genome, which consists of
approximately 6 billion base pairs, still requires several days of processing, therefore
limiting the amount of research that can be performed in a given time.
2 Background
DNA strands are composed of two long polymers connected in a double helix spatial
configuration. The two polymers are composed of sequences of smaller molecules,
namely nucleotides (Adenine, Cytosine, Guanine, Thymine).
In order to study DNA-based living organisms and their interaction, knowledge of the
exact sequence of nucleotides that compose DNA strands is of extreme importance.
One of the largest and most expensive projects in DNA research has been the sequencing
of the entire human genome [HGP], which is the complete DNA material contained in
human genes; the project was completed in 2003 at a combined cost of over $3 billion.
Re-sequencing is a variant of the sequencing process that does not attempt to sequence
unknown DNA strands, but instead measures the difference between a previously known
strand and other unknown strands that are supposed to be very similar.
The purpose of re-sequencing is mostly to study DNA alterations and mutations, and to
measure the differences between DNA of different life forms and species.
In recent years a number of techniques have been developed to reduce the cost of
sequencing, with the goal of increasing the amount of research conducted on DNA
mutation and the effects of alterations in genes on the development of cellular lifecycles.
2.1 Polony DNA re-sequencing
In order to reduce the time and cost involved in sequencing individual bases in large
DNA strands, the Polony re-sequencing process revolves around a technique of large-
scale multiplexing of the base-detection operations.
The process consists of several interleaved chemical reactions and image processing
stages: the original large DNA strand to be re-sequenced is initially split into thousands
of small DNA strips, each consisting of 26 bases, to form a library. The strips are then
attached to a substrate and sequenced in parallel using a set of image processing
techniques that detect bases in multiple individual strips at once.
The complete setup procedure involves the following stages:
1. Split DNA strand into 26-base long DNA strips (templates).
2. Attach a specially synthesized reference (primer) strand to all templates.
3. Amplification process (strand replication) to increase the density of the templates.
4. Emulsion process to separate different templates and attach templates to magnetic
objects (beads), in order to generate objects large enough to be visible using a
regular fluorescence microscope, each having thousands of identical templates
attached to it.
5. A single layer of beads is spread uniformly over a panel (substrate), placed in a
container where all chemical processes take place (flow cell). This results in a
random distribution of beads over the substrate, as displayed in Figure 1.
Figure 1 – Polonator G.007: beads uniformly distributed over the substrate
After the initial setup a pre-cycle is performed to detect object positions on the substrate.
In order to capture an image of the total substrate area at a resolution high enough to
allow distinguishing individual objects, the area is subdivided into a number of imaging
locations. For each imaging location the following operations are performed:
1. The microscope is moved to the imaging location.
2. A white-light image is taken and this allows the detection of all the objects at
once.
3. Image segmentation is performed on the image to detect the 2D spatial
coordinates of individual objects.
4. All object coordinates are stored to disk.
Following the setup procedure, 26 cycles of chemical reactions and image processing are
needed to extract the required information for all 26 bases at each object location. For
each of the 26 cycles the following operations are performed:
1. A set of four kinds of pre-prepared polymers are flushed onto the substrate, each
kind containing a fluorophore segment which is visible when illuminated with a
light source of the appropriate color; the polymers are synthesized so they
chemically bond with templates that contain a specific nucleotide in the position
currently being sequenced.
2. For each imaging location the following operations are performed:
a. The microscope is moved to the imaging location used in the pre-cycle
four separate times, one pass per color.
b. In each pass a light of a different color is shone on the substrate and an
image is taken.
c. Image registration is performed to align the original object coordinates
detected in the pre-cycle with the new image, compensating for the mechanical
offset introduced by the robotic arm visiting the same location multiple times.
d. Extraction of the object values is performed using the original
coordinates detected in the pre-cycle and the offset detected in the
registration phase.
After the 26 cycles are performed, information about all the templates is processed to
generate base sequences for each template using the color intensities extracted at each
location to assign individual bases (nucleotides) to individual locations in each template
(base calling).
The resulting 26-base sequences are then individually aligned to the original reference
DNA strand using error metrics when perfect alignments are not found, resulting in a full
sequence of the new strand together with a complete list of differences between the new
strand and the reference strand.
2.2 Polonator G.007 and its image processing pipeline
The first implementation of the Polony DNA re-sequencing process in a commercially
available system is the Polonator G.007 system [POL], which is the result of the
collaboration between the Harvard Wyss Institute and Dover Systems; Table 1 details the
machine’s full specifications.
Table 1 Polonator G.007 system details
Parameter                        Value
Imaging resolution               1000x1000 pixels
Imaging interval                 120 ms
Total imaging area               2000 mm²
Imaging throughput               22 MB/s
Sequencing throughput            ~0.4 Gbases/hr
Flow cell sequencing time        ~45 hours
Human genome sequencing time     ~710 hours
Figure 2 – Polonator G.007 image processing pipeline
Figure 2 shows the overall Polonator G.007 image processing software pipeline structure.
In the pipeline each stage is executed sequentially and serializes all its state to disk prior
to moving to the subsequent stage.
All the imaging stages move the microscope over a number of imaging locations in order
to have images that cover the full substrate. The full set of images taken at all imaging
locations represents a full scan of the flow cell.
2.2.1 Pre-cycle: image segmentation stage
As previously discussed the pre-cycle performs a full scan of the flow cell, taking a
white-light image at each imaging location.
Objects are randomly placed on the substrate; therefore, in order to detect individual
object positions, image segmentation is performed on the white-light images and the
coordinates of each object centroid are stored to disk.
The segmentation algorithm uses a threshold value to distinguish pixels that constitute the
image background from the objects themselves; the threshold value is computed per
image by adding the standard deviation of all pixel values in the image to the mean of all
pixel values.
Pixels above the threshold value represent objects, while the other pixels represent the
background. Connected component labeling is performed using the pixels above the
threshold and centroids are computed for all the detected objects.
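The segmentation steps above can be sketched as follows. This is a minimal illustration, not the original Polonator code: the function name `segment`, the use of the population standard deviation, 4-connectivity, and unweighted centroids are all assumptions made for the example.

```python
from statistics import mean, pstdev

def segment(image):
    """Return (row, col) centroids of connected bright regions in a 2D list."""
    pixels = [p for row in image for p in row]
    # Per-image threshold: mean plus standard deviation of all pixel values.
    threshold = mean(pixels) + pstdev(pixels)
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or image[y][x] <= threshold:
                continue
            # Flood fill one 4-connected component of above-threshold pixels.
            stack, members = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                members.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Unweighted centroid of the component (assumption).
            centroids.append((mean(m[0] for m in members),
                              mean(m[1] for m in members)))
    return centroids
```

The flood fill visits pixels in a data-dependent order, which hints at why connected component labeling is awkward to split across processors.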
In order to facilitate the subsequent registration stage, coordinates for a subset of the
detected objects are also stored to disk as a registration array.
2.2.2 Registration stage
Due to the very high level of image magnification and mechanical tolerances, two images
taken at the same imaging location at two different times can exhibit a small offset,
which experiments have quantified as ±20 pixels in both X and Y directions.
Image registration is therefore required to return a 2D vector representing the offset
between an image taken at a specific imaging location in a cycle and the reference, white-
light image taken at the same imaging location.
Registration is performed individually on all four color images taken at each imaging
location, as they are all taken at different times.
The registration process involves reading pixel values in a window centered at the
object coordinates specified in the registration array, and finding the offset at which the
sum of the intensity values is highest. The highest value represents the point of maximum
correlation.
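A minimal sketch of this offset search is shown below, assuming the registration array is a list of integer (row, column) coordinates and the window is the ±20 pixel range quoted above. The function name and argument layout are hypothetical and do not come from the original software.

```python
def register_by_intensity(image, reg_points, max_shift=20):
    """Return the integer (dy, dx) that maximizes summed intensity at the shifted points."""
    h, w = len(image), len(image[0])
    best_score, best_shift = float("-inf"), (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0
            for y, x in reg_points:
                yy, xx = y + dy, x + dx
                # Points shifted outside the image contribute nothing.
                if 0 <= yy < h and 0 <= xx < w:
                    score += image[yy][xx]
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Because the search only visits integer shifts, the result is limited to whole-pixel accuracy.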
2.2.3 Extraction stage
The offset resulting from the registration stage is added to the 2D coordinates of all
objects detected at each imaging location to extract the individual object values.
Individual intensity values are extracted from each of the four color images and stored to
disk for processing by the base calling stage.
2.2.4 Base calling stage
Base calling reads the four intensity values for all detected objects from disk and assigns
to each object a nucleotide base depending on the highest intensity value out of the four.
The exact details of this process are presented in section 3.1.5, as it was largely
unchanged between the old and the new software.
As base calling is not very computationally intensive relative to the image
processing stages, none of the experiments for the old software pipeline include it in the
timing results.
2.3 New Polonator “H” system
In order to increase the overall image processing throughput, with the ultimate goal of
increasing the sequencing rate (and simultaneously reducing the cost), a new system was
designed: the Polonator “H”.
The new system was designed to use one or more cameras, up to four, to increase the
level of multiplexing by sampling multiple imaging locations at a time. Furthermore, high
resolution cameras and fast frame grabbers are used, which decrease the interval between
images to 60ms.
Another improvement over the old system is in the design of the flow cell substrate:
while in the old system beads were uniformly spread, therefore assuming random 2D
coordinates, the new system uses a different chemical process that produces DNA
nanoballs and attaches them to an etched-silicon grid.
This results in the objects assuming positions on a uniform grid of predetermined size and
pitch, which is an important property that can be exploited by the image processing
software.
Figure 3 displays the regular grid layout exhibited by the images that the new system generates.
Figure 3 – Polonator “H”: beads distributed over the substrate in grid layout
The full system specification is detailed in Table 2.
Table 2 Polonator H system details
Parameter                            Value
Imaging resolution                   2560x2160 pixels
Imaging interval                     60 ms
Total imaging area                   5000 mm²
Imaging throughput                   ~184 MB/s per camera
Peak sequencing throughput           1 camera: ~8.7 Gbases/hr; 4 cameras: ~34.9 Gbases/hr
Peak flow cell sequencing time       1 camera: ~14 hours; 4 cameras: ~3.5 hours
Peak human genome sequencing time    1 camera: ~34.2 hours; 4 cameras: ~8.5 hours
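The per-camera imaging throughput in Table 2 can be checked with a few lines of arithmetic. The sketch assumes 2 bytes per pixel, which is consistent with the 16-bit intensity values mentioned in the extraction stage but is not stated in the table itself.

```python
width, height = 2560, 2160
bytes_per_pixel = 2          # assumption: 16-bit samples
interval_s = 0.060           # imaging interval from Table 2

frame_bytes = width * height * bytes_per_pixel       # bytes per image
throughput_mb_s = frame_bytes / interval_s / 1e6      # about 184 MB/s per camera

# Pixel count relative to the 1000x1000 images of the G.007 (about 5.5x).
pixel_ratio = (width * height) / (1000 * 1000)
```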
2.4 Shortcomings of the old software on the new system
As the new system was being designed it was immediately apparent that the original
software was not suitable, as it has a number of shortcomings which prevent it from
working properly under the new operating conditions:
- While full image segmentation works well for object detection, the new
  grid-shaped layout allows for more efficient processing due to its predetermined
  geometry.
- Due to the very fine grid pitch used by the new machine (800 nm), the microscope
  zoom level (20x) and pixel size (6.5 µm), the objects are very small, of
  the order of ~1-2 pixels in diameter, and the distance between consecutive object
  centers on the grid is ~2.5 pixels. Adjacent objects can therefore partially overlap,
  which causes anomalies in the segmentation procedure used.
- The registration phase uses a set of known object coordinates and samples an
  image window around each presumed object location, using the cumulative peak
  intensity level from the four color images to determine the most likely offset;
  while this technique works well with randomly distributed objects, it can fail in
  the presence of a highly filled regular grid, as it can find multiple equally valid
  registrations, each spaced by the grid pitch.
- The registration phase finds integer offsets, which are not sufficient in the presence
  of a fine grid.
- The connected component labeling used by the segmentation is known
  to be hard to parallelize, and does not scale well on multiple processors.
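The ~2.5 pixel object spacing quoted in the list above follows directly from the optical parameters; the short calculation below reproduces it. Variable names are illustrative only.

```python
grid_pitch_nm = 800.0        # grid pitch of the new machine
sensor_pixel_nm = 6500.0     # 6.5 um camera pixel
magnification = 20.0         # microscope zoom level

# Size of one camera pixel projected onto the substrate.
pixel_on_substrate_nm = sensor_pixel_nm / magnification   # 325 nm

# Distance between neighbouring grid nodes, measured in pixels.
pitch_px = grid_pitch_nm / pixel_on_substrate_nm           # about 2.46 pixels
```

With neighbouring centers less than 2.5 pixels apart and objects 1-2 pixels wide, thresholded blobs from adjacent grid cells can touch.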
After analyzing the aforementioned shortcomings the decision was made to rewrite the
software pipeline completely, to use more appropriate algorithms and to exploit parallel
computing to increase the overall throughput.
3 A new image processing pipeline
In order to sustain the high data throughput the new machine generates while performing
the imaging cycles, the new image processing software pipeline was designed to exploit
multiple CPU cores, commonly available in modern microprocessors.
As explained above, the new system attaches objects to a regular grid, which can simplify
detection using techniques previously used in microarray analysis; the new pipeline
aggressively exploits the inherent knowledge of the grid geometry to accelerate its stages.
In his report Peter Bajcsy [BAJ06] examines several methods commonly applied in
detection and extraction of object data from DNA microarrays, which are structures that
produce images very similar to the new Polonator system, and we thus designed the new
software pipeline to reuse part of the already developed knowledge and apply it to our
scenario.
Figure 4 shows the pipeline design and illustrates the included processing stages.
Figure 4 – Parallel image processing pipeline
All the pipeline stages are concurrently active and use lightweight events to
communicate data items between them; the only exceptions are Phase Detection and
Registration, which are active in different imaging cycles.
The pipeline uses task-based parallelism throughout, and employs Intel’s Threading
Building Blocks [INT] as the main task-stealing scheduler to automatically balance the
load over all the available CPU threads at runtime. The scheduler maintains separate
task queues per thread, therefore minimizing the overhead of context switching in the
presence of dynamic assignment of load to threads.
Data is input sequentially to the ingestion stage by one or more cameras (up to four), and
is subsequently forwarded through the pipeline for processing, generating as output a
string of bases per cycle, corresponding to one base sequenced per object per imaging
location.
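The overall flow of data through concurrently active stages can be illustrated with a small producer-consumer sketch. This is not the Threading Building Blocks implementation used by the project: it uses one Python thread per stage connected by queues, performs no task stealing, and the function name `run_pipeline` is hypothetical.

```python
import queue
import threading

def run_pipeline(items, stages):
    """Push items through `stages` (one-argument functions), one thread per stage."""
    done = object()  # sentinel that marks the end of the stream
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn, inbox, outbox):
        while True:
            item = inbox.get()
            if item is done:
                outbox.put(done)   # propagate shutdown to the next stage
                return
            outbox.put(fn(item))

    threads = [threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(done)

    results = []
    while True:
        out = queues[-1].get()
        if out is done:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(image_sets, [detect_or_register, extract, call_bases])` would keep all three (hypothetical) stage functions busy on different image sets at the same time, while preserving the input order.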
3.1 Pipeline stages
3.1.1 Ingestion stage
The ingestion stage temporarily caches images coming from one or more cameras to form
image sets composed of the four color images that represent a unique imaging location.
The frame grabber transfers images to main memory using DMA into a pre-allocated
circular buffer and generates an event when an image is ready, essentially creating a
producer-consumer queue operating on the circular buffer.
Once all the four images for a specific imaging location are ready the ingestion stage
creates a small memory structure to contain all the information pertinent to that imaging
location (image set) and forwards it to the next stage in the pipeline.
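The grouping of incoming frames into image sets might look like the sketch below. The frame layout `(location, color, image)` and the function name are assumptions made for illustration; the circular buffer and DMA transfer are outside its scope.

```python
def group_image_sets(frames, colors_per_set=4):
    """Yield (location, {color: image}) once every color for a location has arrived."""
    pending = {}  # location -> images received so far
    for location, color, image in frames:
        images = pending.setdefault(location, {})
        images[color] = image
        if len(images) == colors_per_set:
            # The set is complete: hand it to the next stage and free the cache slot.
            yield location, pending.pop(location)
```

Keeping the cache keyed by imaging location means frames from several cameras, or frames arriving out of order, still end up in the right set.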
The decision to have a dedicated stage for image ingestion facilitates future extensions of
the pipeline, or the reordering of the imaging sequence by encapsulating the caching and
reordering to feed the other pipeline stages with a consistent stream of image sets.
3.1.2 Phase Detection stage
The phase detection stage is dedicated to positioning a 2D frame of reference for the
objects grid, with the goal of having the ability to calculate all remaining objects
coordinates starting from the reference frame and using the grid pitch information that is
known in advance.
This dramatically reduces the processing complexity that regular image segmentation
presents.
As previously described by Bajcsy [BAJ06], Deepa [DE09] and Siswantoro [SIS10],
horizontal and vertical projection profiles are useful in determining the grid layout in
images containing microarray data.
The technique used for this project is a refinement that uses specific knowledge about the
images produced by the Polonator machine. More specifically, the key difference between
traditional DNA microarrays and our machine is that images taken from DNA
microarrays generally contain the object grid in one or more sections of the image, and
the grid very rarely spans the entire image, whereas in our scenario the grid completely
fills the image, as the substrate we are imaging is much larger than the microscope
imaging area due to the required resolution and zoom level.
Using this knowledge, combined with experimentally derived knowledge of the average and
maximum fill rate in our grids (80% and 95% respectively), we can avoid
sampling the full image to generate the horizontal and vertical projection profiles,
sampling instead a small vertical slice and a small horizontal slice, their size determined
statistically to ensure a 99% fill rate.
After the projection profiles have been obtained we do not directly detect intensity peaks
on the profiles like the regular projection profile technique, but instead we perform
intensity binning using the grid pitch as the binning frequency, to obtain a compact
profile that exhibits a peak corresponding to the desired phase value.
We named this technique “Partial Projection Profile”; Figure 5 depicts the process.
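A minimal sketch of the idea along one axis is given below, under stated assumptions: the profile is folded modulo the grid pitch into a fixed number of bins, and the phase is reported as the lower edge of the strongest bin. The function names, the number of bins, and the way the slice is chosen are illustrative and are not taken from the project code.

```python
def column_profile(image, row_start, row_count):
    """Sum pixel columns over a thin horizontal slice (the partial projection)."""
    rows = image[row_start:row_start + row_count]
    return [sum(col) for col in zip(*rows)]

def detect_phase(profile, pitch, bins_per_pitch=8):
    """Fold a 1D profile modulo the grid pitch and return the peak phase in pixels."""
    bins = [0.0] * bins_per_pitch
    for x, value in enumerate(profile):
        frac = (x % pitch) / pitch                        # position within one grid period
        bins[int(frac * bins_per_pitch) % bins_per_pitch] += value
    peak = max(range(bins_per_pitch), key=bins.__getitem__)
    return peak * pitch / bins_per_pitch                  # lower edge of the peak bin
```

The same two functions applied to a transposed image give the phase along the other axis. The pitch may be non-integer, in which case the achievable resolution is `pitch / bins_per_pitch` pixels.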
Figure 5 – Partial Projection Profile technique for phase detection
3.1.3 Registration stage
The purpose of the registration stage is to align images taken at the same imaging
location in different cycles. The reason this is necessary is the combination of
temperature variations and mechanical drift in the robotic arm movement, which causes
images taken at the same imaging location after the mechanical arm moved away from
the location and then returned to it to have a small offset.
Image registration is not a technique commonly used in DNA microarray processing, as
normally each acquired image represents a different experiment and therefore requires
separate grid placement and value extraction. In our case we not only have prior
information on the grid phase and pitch, but we also need to preserve the spatial
characteristics of the grid by matching all object coordinates at every imaging location,
so that intensity values for an individual object in different cycles can be concatenated to
form the sequence of bases present in that object.
General image registration is a widely researched problem, especially in medical
imaging. Wisetphanichkij and Dejhan [WIS05] describe a robust image registration
procedure that includes coarse and fine affine transformation registration, while
NessAiver and Biswas [NES00] present a DFT-based registration of rotation and
translation.
Full image DFT cross-correlation would allow us to have precise registration between the
images acquired in the first cycle and the images acquired in the subsequent cycles, but
the high resolution of the images would make its computation very slow and beyond our
time constraints.
Using random regions inside the image is not guaranteed to succeed because of the very
regular grid layout of our images: in the worst case a region could be completely
empty (no object present in the grid region) or completely populated (all cells in the grid
region contain an object). In the first case registration is obviously impossible, while in
the second case every offset that is a multiple of the grid pitch would be an equally
valid registration.
We therefore adopted a combination of machine-specific heuristics and a DFT-based
power spectrum cross-correlation registration for finding sub-pixel registration offsets
faster than using full image DFT cross correlation.
The process chooses a number of regions inside each image captured at all imaging
locations in the first cycle (templates) based on the presence of objects in these regions
forming discernible patterns, and uses the obtained region coordinates to perform cross
correlation to find the relative offsets in all the images captured in the subsequent cycles.
The technique is implemented by splitting the registration operations for each imaging
location between two separate stages.
The phase detection stage performs the pattern search by detecting the absence of objects on
the grid, then stores the coordinates of the selected regions and computes the 2D forward DFT
of those regions.
The registration stage uses the information generated in the phase detection stage to
extract regions at the same coordinates. It then computes the 2D forward DFT of the regions,
computes the power spectrum cross-correlation in the frequency space, and performs
an inverse 2D DFT to generate a correlation intensity image. The correlation intensity
image is searched for its peak value, which marks the highest correlation level; the
position of the peak is used as the registration offset.
The inverse 2D DFT is performed on a 4x larger domain, in order to have 0.25 pixel
accuracy of the registration offsets. Figure 6 shows the results of the inverse 2D DFT.
Processing of the forward and inverse 2D DFT is performed in parallel on the different
regions in each image by creating individual tasks for each DFT operation and assigning
it to the Intel Threading Building Blocks task-stealing scheduler, which arbitrates
between inter-stage pipeline parallelism and intra-stage task parallelism.
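The registration steps described above can be sketched for a single square region as follows. This is an illustration only: it uses a naive DFT instead of an optimized FFT library, plain (unnormalized) cross-correlation via the cross-power spectrum, and zero padding of the spectrum to obtain the 4x larger inverse domain. All function names are hypothetical, and the code runs serially rather than as scheduler tasks.

```python
import cmath

def dft1(x, sign):
    """Unnormalized 1D DFT; sign=-1 is the forward transform, sign=+1 the inverse."""
    n = len(x)
    w = [cmath.exp(sign * 2j * cmath.pi * i / n) for i in range(n)]
    return [sum(x[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]

def dft2(a, sign):
    """2D DFT computed as 1D DFTs over rows, then over columns."""
    rows = [dft1(r, sign) for r in a]
    cols = [dft1(c, sign) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def pad_spectrum(spec, up):
    """Zero-pad an n x n spectrum to (n*up) x (n*up), keeping negative frequencies at the end."""
    n = len(spec)
    m = n * up
    out = [[0j] * m for _ in range(m)]
    for ky in range(n):
        y = ky if ky < n // 2 else ky + m - n
        for kx in range(n):
            x = kx if kx < n // 2 else kx + m - n
            out[y][x] = spec[ky][kx]
    return out

def register(template, image, up=4):
    """Return the (dy, dx) offset of `image` relative to `template`, in steps of 1/up pixel."""
    ft, fi = dft2(template, -1), dft2(image, -1)
    n = len(template)
    # Cross-power spectrum: image spectrum times conjugate of template spectrum.
    cross = [[fi[y][x] * ft[y][x].conjugate() for x in range(n)] for y in range(n)]
    # Inverse DFT on the enlarged domain gives an upsampled correlation image.
    corr = dft2(pad_spectrum(cross, up), +1)
    m = n * up
    _, py, px = max((corr[y][x].real, y, x) for y in range(m) for x in range(m))

    def signed(p):
        # Indices in the upper half of the domain represent negative offsets.
        return (p - m if p >= m // 2 else p) / up

    return signed(py), signed(px)
```

Since the template spectrum depends only on the first-cycle image, it can be computed once and reused for every subsequent cycle, which matches the split of work between the two stages.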
Figure 6 – 2D DFT power spectrum cross-correlation registration
3.1.4 Extraction stage
The extraction stage uses the previously calculated phase (from the phase detection stage)
and offset (from the registration stage) values, which precisely position the grid frame over
the image, and then iteratively calculates the center coordinates of all objects and
samples the four color images at those coordinates.
For each object position the output is therefore a set of four 16-bit values, one per color.
Normalization of the intensity values is performed here, as well as compensation for the
different response rates of the different fluorescent polymers.
The process is very memory intensive, as it involves reading values from the large images
at dynamically computed coordinates, causing a large number of cache misses.
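A simplified sketch of the coordinate generation and sampling is shown below. It assumes nearest-pixel sampling, the same pitch on both axes, and `(row, column)` pairs for phase and offset; normalization and fluorophore compensation are omitted. Function names are hypothetical.

```python
import math

def grid_axis(start, pitch, size):
    """Pixel indices nearest to start + k*pitch (k = 0, 1, ...) that fall inside [0, size)."""
    coords, k = [], 0
    while True:
        p = math.floor(start + k * pitch + 0.5)   # round to the nearest pixel
        if p >= size:
            return coords
        if p >= 0:
            coords.append(p)
        k += 1

def extract(images, phase, offset, pitch):
    """Return one tuple of intensities (one per color image) for every grid node."""
    h, w = len(images[0]), len(images[0][0])
    ys = grid_axis(phase[0] + offset[0], pitch, h)
    xs = grid_axis(phase[1] + offset[1], pitch, w)
    return [tuple(img[y][x] for img in images) for y in ys for x in xs]
```

Each grid node touches all four images at the same coordinate, so the access pattern jumps between four large buffers for every object.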
3.1.5 Base calling stage
Base calling involves determining the base present at each object location in a specific
imaging cycle.
The four color intensities extracted in the previous stage contain information about the
base present at each object location.
The sets of intensity values for each imaging location are treated as an array of four-
dimensional vectors, which are processed using the following operations:
- Principal component analysis is performed on each vector to determine an initial
  estimation of the most likely base assignment each vector represents, and vectors
  are marked as representing the corresponding base.
- Clustering of the vectors is then performed using the initial assignments, to form
  four 4-dimensional clusters, one for each base type.
- All vectors are then re-assigned to base types according to their Euclidean
  distance from the cluster centroids; the base type selected is that of the cluster
  with the smallest distance from the vector.
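The clustering and re-assignment steps can be sketched as below. Note the substitution: the initial estimate here simply takes the brightest channel instead of performing principal component analysis, so this is a simplified stand-in rather than the method used by the pipeline. The channel-to-base order in `BASES` is also an assumption.

```python
import math

BASES = "ACGT"  # assumption: channel index 0..3 maps to A, C, G, T

def call_bases(vectors):
    """Assign a base to each 4-D intensity vector by nearest cluster centroid."""
    # Initial estimate: brightest channel (stand-in for the PCA step).
    initial = [max(range(4), key=v.__getitem__) for v in vectors]
    # One centroid per base type, built from the initial assignments.
    centroids = []
    for b in range(4):
        members = [v for v, label in zip(vectors, initial) if label == b]
        if members:
            centroids.append([sum(c) / len(members) for c in zip(*members)])
        else:
            centroids.append(None)  # no vector was initially assigned to this base
    # Re-assign every vector to the closest centroid (Euclidean distance).
    calls = []
    for v in vectors:
        best = min((b for b in range(4) if centroids[b] is not None),
                   key=lambda b: math.dist(v, centroids[b]))
        calls.append(BASES[best])
    return "".join(calls)
```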
Assignments are then converted into 4-bit values; while 2-bit values would normally be
sufficient to encode four values, additional information is required to mark invalid
(empty) object locations, therefore a 4-bit value has been selected as a good
compromise between value packing and potential future extensions.
The total output generated by the base calling stage is then a file containing 26
consecutive arrays of bases, one per cycle, with each array containing one base value per
object per imaging location.
Using the high-resolution images generated by the new system, the output is
approximately 1023 x 863 4-bit values (~441 KB) per imaging location per cycle, for a total
of ~11.7 GB of data for a complete flow cell.
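The 4-bit encoding and the per-location size can be illustrated with a short packing routine. The code values (0 to 3 for the four bases, `0xF` for an empty location) and the high-nibble-first order are assumptions; the report does not specify them.

```python
EMPTY = 0xF  # assumption: nibble value marking an invalid (empty) object location

def pack_nibbles(codes):
    """Pack a sequence of 4-bit codes into bytes, two codes per byte, high nibble first."""
    packed = bytearray((len(codes) + 1) // 2)
    for i, code in enumerate(codes):
        shift = 4 if i % 2 == 0 else 0
        packed[i // 2] |= (code & 0xF) << shift
    return bytes(packed)

# Packed size for one imaging location in one cycle on the new system.
objects_per_location = 1023 * 863
bytes_per_location = (objects_per_location + 1) // 2   # 441,425 bytes, about 441 KB
```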
4 Experimental setup
The Polonator “H” system hardware has been designed, but has not been built at the time
of writing; therefore all the experiments were performed using a combination of a
synthetic image generator and a hardware platform that represents as faithfully as possible
the operating conditions of the new machine, as reported in Table 3.
In order to perform fair performance comparisons, experiments involving the old
software system have been designed to operate under conditions as similar as possible to
the new pipeline, and used the same hardware and software environment used to develop
and test the new system.
Table 3 Experimental Platform
Component    Details
Model        HP Z800 workstation
Processor    2 x Intel Xeon E5630 2.53 GHz
Memory       12 GB
OS           Windows 7 Enterprise 64-bit
Compiler     Intel C++ Compiler v12.1
4.1 Synthetic image generator
As noted earlier the physical system was not available to conduct tests, therefore in order
to create repeatable performance and validity experiments a synthetic image generator
was designed and developed.
The image generator takes as input several parameters and creates sets of images
reproducing the standard operating conditions of both the old and the new machines.
Input parameters to the image generator are the following:
Image dimensions in pixels
Grid pitch in nm
Objects size in nm
Pixel size in nm
Phase in pixels
Offset in pixels
Rotation in degrees
Fill rate as percentage
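As an illustration of how these parameters might drive image synthesis, the following is a minimal Python sketch and not the generator used for the experiments. It assumes that phase and offset are both translations of the grid origin in pixels, that objects are rendered as Gaussian spots whose full width at half maximum equals the object size, and that the fill rate is the probability that a grid site is occupied. All names are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GeneratorParams:
    width_px: int
    height_px: int
    grid_pitch_nm: float
    object_size_nm: float
    pixel_size_nm: float
    phase_px: float       # assumed: sub-pixel shift of the grid origin
    offset_px: float      # assumed: additional translation of the grid
    rotation_deg: float
    fill_rate: float      # fraction of occupied grid sites, 0.0 to 1.0


def object_centres(p, seed=0):
    """Return the pixel coordinates of the occupied grid sites inside the image."""
    rng = random.Random(seed)  # seeded so that experiments are repeatable
    pitch_px = p.grid_pitch_nm / p.pixel_size_nm
    theta = math.radians(p.rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    shift = p.phase_px + p.offset_px
    n = int(math.hypot(p.width_px, p.height_px) / pitch_px) + 1
    centres = []
    for j in range(-n, n + 1):
        for i in range(-n, n + 1):
            gx, gy = i * pitch_px, j * pitch_px
            x = gx * cos_t - gy * sin_t + shift
            y = gx * sin_t + gy * cos_t + shift
            inside = 0.0 <= x < p.width_px and 0.0 <= y < p.height_px
            if inside and rng.random() < p.fill_rate:
                centres.append((x, y))
    return centres


def render(p, centres):
    """Render each object as a Gaussian spot (an assumed spot model)."""
    sigma = (p.object_size_nm / p.pixel_size_nm) / 2.355  # FWHM to sigma
    radius = max(1, math.ceil(3.0 * sigma))
    image = [[0.0] * p.width_px for _ in range(p.height_px)]
    for cx, cy in centres:
        x0, y0 = int(cx), int(cy)
        for y in range(max(0, y0 - radius), min(p.height_px, y0 + radius + 1)):
            for x in range(max(0, x0 - radius), min(p.width_px, x0 + radius + 1)):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                image[y][x] += math.exp(-d2 / (2.0 * sigma * sigma))
    return image


# Small 64x64 example using the optical parameters of the old machine.
params = GeneratorParams(64, 64, 1500.0, 550.0, 320.0, 1.5, 0.9, 0.001, 0.95)
image = render(params, object_centres(params, seed=1))
```

Seeding the random number generator is what makes the image library repeatable between runs, which is the property the experiments rely on.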
4.2 Test image processing system
A full set of unit tests was designed to verify the validity of the image processing stages
before performance tests were performed. All the results reported below include data from
the verification tests as well as the performance tests.
All the tests involve pre-loading the set of images into main memory and then measuring
the peak image processing throughput; this choice was made to replicate the behavior of a
frame grabber that can DMA images directly into main memory.
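The measurement approach described above can be sketched as follows. This is a minimal illustration in Python, not the test harness used for the experiments; `measure_throughput` and the `process` callback are hypothetical names, and the images are assumed to be byte buffers that are already resident in memory before timing starts.

```python
import time


def measure_throughput(images, process):
    """Time `process` over pre-loaded images and return the rate in MiB/s.

    Loading happens before this function is called, so disk I/O is excluded
    from the measurement, mimicking a frame grabber that writes into memory.
    """
    total_bytes = sum(len(img) for img in images)
    start = time.perf_counter()
    for img in images:
        process(img)
    elapsed = time.perf_counter() - start
    return total_bytes / (1024 * 1024) / elapsed


# Usage: four zero-filled buffers stand in for pre-loaded images, and a
# checksum stands in for the image processing stages.
preloaded = [bytes(200_000) for _ in range(4)]
rate_mib_s = measure_throughput(preloaded, lambda img: sum(img))
```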
5 Results
Two different sets of tests were performed. The first measures the performance
differential between the old and the new software using an image library generated by the
synthetic generator with parameters representing the old system operating conditions. The
second measures the performance of the new pipeline using an image library generated
by the synthetic generator with parameters representing the new system operating
conditions.
5.1 Test 1: old vs. new pipeline
Test configuration:
256 imaging locations
3 Sequencing cycles
No sequence alignment
Image generator parameters in Table 4
Table 4 Test 1 image generator configuration parameters
Image dimensions   1000x1000 pixels
Grid pitch         1500nm
Objects size       550nm
Pixel size         320nm
Phase (avg.)       1.5 pixels
Offset (avg.)      0.9 pixels
Rotation           0.001 degrees
Fill rate          95%
Old software stages performance (without base calling stage):
Pre-cycle - image segmentation stage: 176ms/image set
Cycles 1-26 - image registration stage: 16ms/image
Cycles 1-26 - objects extraction stage: 9ms/image
Total average: 39.66ms/image
New pipeline stages performance (1 thread – with base calling stage):
Cycle 1 - image phase detection stage: 1.1ms/image set
Cycles 2-26 - image registration: 18.9ms/image set
Cycles 1-26 - objects extraction stage: 1.6ms/image set
Total average: 3.64ms/image
New pipeline stages performance (4 threads – with base calling stage):
Cycle 1 - image phase detection stage: 1.0ms/image set
Cycles 2-26 - image registration: 5.3ms/image set
Cycles 1-26 - objects extraction stage: 1.6ms/image set
Total average: 1.36ms/image
Total running time/throughput:
Old software: 142.2 sec. => 41.2 MB/s
New pipeline (1 thread): 13.03 sec. => 450 MB/s
New pipeline (4 threads): 4.53 sec. => 1292 MB/s
The results show that the new pipeline is substantially faster than the old software.
Most of the advantage is already visible in the single-threaded implementation of the
new pipeline (13.03 sec. against 142.2 sec.), as it benefits greatly from the new
algorithms, which use inherent knowledge of the distribution of object locations in the
images.
The parallel implementation further increases the performance differential by both
reducing the latency of the most computationally intensive stages and operating on
multiple stages simultaneously, thereby maximizing the utilization of the system’s
resources.
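The "Total average" figures listed for Test 1 can be reproduced within rounding by the weighting sketched below. The weighting is an inference from the reported numbers rather than a formula taken from the text: it assumes 4 images per image set, 3 sequencing cycles, and that each stage is charged only for the cycles in which it runs. Names are hypothetical.

```python
IMAGES_PER_SET = 4   # assumption inferred from the reported averages
CYCLES = 3           # sequencing cycles in the Test 1 configuration


def average_ms_per_image(stages, cycles=CYCLES):
    """Average cost per image.

    `stages` is a list of (ms_per_image_set, number_of_cycles_stage_runs_in).
    """
    per_set = sum(ms * active / cycles for ms, active in stages)
    return per_set / IMAGES_PER_SET


# New pipeline: phase detection runs in cycle 1 only, registration in the
# remaining 2 cycles, extraction in all 3 cycles.
new_1_thread = average_ms_per_image([(1.1, 1), (18.9, 2), (1.6, 3)])
new_4_threads = average_ms_per_image([(1.0, 1), (5.3, 2), (1.6, 3)])

# Old software: segmentation is paid once per location and amortised over
# 3 cycles x 4 images; registration and extraction are quoted per image.
old_software = 176.0 / (CYCLES * IMAGES_PER_SET) + 16.0 + 9.0
```

Under these assumptions the three values come out at roughly 3.64, 1.37 and 39.67 ms/image, which agree with the reported 3.64, 1.36 and 39.66 to within rounding.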
5.2 Test 2: new pipeline using high-resolution images and fine grid pitch
Test configuration:
64 imaging locations
3 Sequencing cycles
Base calling included
No sequence alignment
Image generator parameters in Table 5
Table 5 Test 2 image generator configuration parameters
Image dimensions   2560x2160 pixels
Grid pitch         800nm
Objects size       400nm
Pixel size         320nm
Phase (avg.)       0.4 pixels
Offset (avg.)      0.2 pixels
Rotation           0.001 degrees
Fill rate          95%
Stages performance (1 thread):
Cycle 1 - image phase detection stage: 1.4ms/image set
Cycles 2-26 - image registration: 17.9ms/image set
Cycles 1-26 - objects extraction stage: 30.3ms/image set
Total average: 10.675ms/image
Stages performance (4 threads):
Cycle 1 - image phase detection stage: 1.3ms/image set
Cycles 2-26 - image registration: 6.0ms/image set
Cycles 1-26 - objects extraction stage: 28.92ms/image set
Total average: 8.33ms/image
Total running time/throughput:
1 thread: 25.6 sec. => 315 MB/s
4 threads: 10.9 sec. => 741 MB/s
The results show that, with high-resolution images, the extraction stage dominates the
cost of the stages (28.92ms/image set with 4 threads, against 6.0ms/image set for
registration), so the multithreaded stages perform only slightly better on average than
their single-threaded equivalents (8.33ms/image against 10.675ms/image). Pipeline
parallelism partially compensates for this by aggressively overlapping the stages,
thereby masking the additional latency the extraction stage requires.
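The throughput figures reported for both tests can be related to the running times with the arithmetic sketched below. The sketch assumes 4 images per image set, 2 bytes per pixel, and that "MB" denotes 2^20 bytes; these are assumptions chosen because they reproduce the reported figures to within about 1%, not values stated in this section.

```python
MIB = 1024 * 1024


def throughput_mib_s(locations, cycles, width, height, seconds,
                     images_per_set=4, bytes_per_pixel=2):
    """Input data volume divided by total running time, in MiB/s.

    images_per_set and bytes_per_pixel are assumed values (see the text above).
    """
    total_bytes = (locations * cycles * images_per_set
                   * width * height * bytes_per_pixel)
    return total_bytes / MIB / seconds


# Test 1: 256 locations, 3 cycles, 1000x1000 images.
test1_old = throughput_mib_s(256, 3, 1000, 1000, 142.2)
test1_new_1 = throughput_mib_s(256, 3, 1000, 1000, 13.03)
test1_new_4 = throughput_mib_s(256, 3, 1000, 1000, 4.53)

# Test 2: 64 locations, 3 cycles, 2560x2160 images.
test2_new_1 = throughput_mib_s(64, 3, 2560, 2160, 25.6)
test2_new_4 = throughput_mib_s(64, 3, 2560, 2160, 10.9)
```

The small residual differences from the reported values are consistent with the running times having been rounded before publication.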
6 Conclusions and future work
The new pipeline is ~10x faster than the old software due to a choice of different
algorithms that use system-specific knowledge and lower overall disk I/O.
The new pipeline also scales well on a modern commodity multicore system, running up to
~20x faster than the original software.
Furthermore, the new pipeline addresses validity issues present in the old software when
applied to the new hardware system, which would otherwise produce erroneous results.
While the current peak performance level supports 4 high-resolution cameras,
performance is barely sufficient to handle the desired throughput and therefore additional
improvements are required to guarantee sustained levels of performance for extensive
continuous usage.
In particular more work is required to improve caching effects, especially in the object
extraction stage, which is the critical bottleneck when processing high-resolution images.
Applying parallel execution and memory access tiling to the extraction stage should give
good results provided the system’s memory bandwidth limitations are not exceeded.
Grid rotation is not currently detected automatically, and it needs to be input as a global
parameter; a method based on the Radon Transform can be applied to perform an initial
calibration and then used throughout one full experiment, as the grid rotation value is
dependent only on the camera CCD sensor alignment relative to the flow cell, and is
therefore not expected to vary within the same experiment.
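One possible shape for such a calibration step is sketched below in Python. It is a simplified projection-based search in the spirit of a Radon transform, not a full Radon transform and not a method implemented in the pipeline: it operates on object centre coordinates rather than pixel intensities, assumes the grid pitch is known, and searches a small range of candidate angles. All names and the search range are hypothetical.

```python
import cmath
import math


def grid_points(n, pitch, rotation_deg):
    """Centres of an n x n square grid rotated by rotation_deg (test data)."""
    t = math.radians(rotation_deg)
    c, s = math.cos(t), math.sin(t)
    return [(i * pitch * c - j * pitch * s, i * pitch * s + j * pitch * c)
            for i in range(n) for j in range(n)]


def alignment_score(points, pitch, angle_deg):
    """Strength of the grid-frequency component of the projection at angle_deg.

    Each point is projected onto the axis at angle_deg; when that axis is
    aligned with the grid, all projections are a whole number of pitches
    apart and the complex terms add up coherently.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    total = sum(cmath.exp(2j * math.pi * (x * c + y * s) / pitch)
                for x, y in points)
    return abs(total)


def estimate_rotation(points, pitch, max_deg=5.0, step_deg=0.1):
    """Return the candidate angle in [-max_deg, max_deg] with the best score."""
    steps = round(max_deg / step_deg)
    candidates = [k * step_deg for k in range(-steps, steps + 1)]
    return max(candidates, key=lambda a: alignment_score(points, pitch, a))


# Usage: recover the rotation of a synthetic 20 x 20 grid.
pitch_px = 1500.0 / 320.0
estimated_deg = estimate_rotation(grid_points(20, pitch_px, 1.3), pitch_px)
```

Because the rotation is expected to stay constant within an experiment, a search of this kind would only need to run once during calibration, so its cost would not affect the steady-state throughput of the pipeline.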
The most computationally intensive algorithms within the stages (DFT/iDFT/Extraction)
can be moved to a GPU; while this would increase the overall cost of the system, it would
provide additional performance, which can be translated into more accurate processing.
References
[BAJ04] Peter Bajcsy. GRIDLINE: automatic grid alignment in DNA microarray
scans. In IEEE Transactions on Image Processing, Vol. 13, Issue 1, Page
15, January 2004.
[BAJ06] Peter Bajcsy. An Overview of DNA Microarray Grid Alignment and
Foreground Separation Approaches. In EURASIP Journal on Applied
Signal Processing, Pages 1–13, 2006.
[DE09] Deepa J, Tessamma T. Automatic Gridding of DNA Microarray Images
using Optimum Subimage. In International Journal of Recent Trends in
Engineering, Vol. 1, No. 4, May 2009.
[HGP] Human Genome Project.
http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
[INT] Intel Corporation. Intel Threading Building Blocks Tutorial.
http://threadingbuildingblocks.org/uploads/81/91/Latest%20Open%20Source%20Documentation/Tutorial.pdf. 2003.
[KAT02] M. Katzer, F. Kummert and G. Sagerer. Robust Automatic Microarray
Image Analysis. In Proceedings of the International Conference on
Bioinformatics: North-South Networking, Bangkok, 2002.
[LAR08] Monica G. Larese, Juan C. Gomez. Automatic Spot Addressing in cDNA
Microarray Images. In Journal of Computer Science and Technology
(JCS&T), Vol. 8 No. 2, July 2008.
[NES00] Moriel S. NessAiver, Subhasish Biswas. Image Registration Using a
Discrete Fourier Transform Implementation Of the Decoupled Automated
Rotation and Translation Algorithm (DFT-DART). In Proceedings of
ISMRM, Denver, CO, 2000, 586.
[MO00] S. K. Moore. Understanding The Human Genome. In IEEE Spectrum,
Pages 33-42, November 2000.
[POL] Polonator G.007 system and software. http://www.polonator.org/
[POR06] Gregory J. Porreca, Jay Shendure, George M. Church, Polony DNA
Sequencing. In Current Protocols in Molecular Biology. Unit Number:
UNIT 7.8. Harvard Medical School, Boston, Massachusetts. 2006.
[SIS10] Joko Siswantoro, Automatic Gridding for DNA Microarray Image Using
Image Projection Profile. In Proceedings of the 6th IMT-GT Conference
on Mathematics, Statistics and its Applications (ICMSA2010). Universiti
Tunku Abdul Rahman, Kuala Lumpur, Malaysia. 2010.
[WIS05] Sompong Wisetphanichkij, Kobchai Dejhan. Fast Fourier Transform
Technique and Affine Transform Estimation-Based High Precision Image
Registration Method. In GESTS Int’l Trans. Computer Science and Engr.
Journal, Vol.20, No.1, 2005.
[WA05] Yu Wang, Frank Y. Shih, Marc Q. Ma. Precise Gridding of Microarray
Images by Detecting and Correcting Rotations in Subarrays. In
Proceedings of the 8th Joint Conference on Information Sciences, 2005.
[YA00] Y.H. Yang, M. J. Buckley, S. Dudoit and T.P.Speed. Comparison of
Methods for Image Analysis on cDNA Microarray Data. Technical Report
#584, Department of Statistics, University of California at Berkeley,
November 2000.