Parallelizing Defect Detection and Categorization Using FREERIDE. Leonid Glimcher, Computer Science and Engineering, IPDPS'05.

Post on 17-Jan-2016


Leonid Glimcher

Computer Science and Engineering

IPDPS'05: Parallelizing Defect Detection and Categorization Using FREERIDE

Scaling and Parallelizing a Scientific Feature Detection and Categorization Application Using a Cluster Middleware

L. Glimcher, G. Agrawal, S. Mehta, R. Jin, R. Machiraju

The Ohio State University

Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.

Motivation for FREERIDE

• Problem:
  – Simulation data from engineering and scientific applications is growing larger,
  – Analysis models are more complex,
  – Drawing knowledge becomes increasingly more complicated.
• Solution:
  – Parallel datamining, but …
• Catch: application development effort.

FREERIDE

Key observation: most algorithms follow a canonical loop.

Middleware API:

• Subset of data to be processed,

• Reduction object,

• Local and global reduction operations,

• Iterator.

Supports:

• Disk resident datasets

• Shared & distributed Memory

While ( ) {
    forall (data instances d) {
        I = process(d)
        R(I) = R(I) op d
    }
    …
}
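The canonical loop and the API pieces listed above can be illustrated with a small sketch. This is not the FREERIDE API: the function names `process`, `local_reduction`, and `global_reduction`, the choice of `+` for `op`, and the mod-3 bucketing are all assumptions made for illustration, and the "nodes" are simulated by plain lists.

```python
from collections import defaultdict


def process(d):
    """Map a data instance to the reduction-object element it updates.

    Hypothetical rule: bucket integers by their remainder mod 3.
    """
    return d % 3


def local_reduction(chunk):
    """Run the canonical loop body over one node's subset of the data."""
    R = defaultdict(int)     # reduction object, private to this node
    for d in chunk:          # forall (data instances d)
        I = process(d)       # I = process(d)
        R[I] += d            # R(I) = R(I) op d, with op assumed to be +
    return R


def global_reduction(local_objects):
    """Combine the per-node reduction objects into one result."""
    combined = defaultdict(int)
    for R in local_objects:
        for I, value in R.items():
            combined[I] += value
    return dict(combined)


# Each inner list stands in for the data subset assigned to one node.
chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = global_reduction(local_reduction(c) for c in chunks)
print(result)
```

Because `op` is associative and commutative here, the order in which chunks are processed or combined does not change the result, which is what lets the local reductions run independently.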

Previously on FREERIDE

• FREERIDE has been used for:
  – Apriori and FP-tree frequent item set mining,
  – KNN classification and decision tree construction,
  – K-means and EM clustering,
  – Vortex Detection (IPDPS 2004).

• Will it work for a scientific mining task with a more complex processing structure?

Overview of Sequential Algorithm

• Goal: to understand the properties of the materials
  – How do defects affect the materials?
• Data generated by Molecular Dynamics Simulation
  – Simulator by Physics Department (OSU)
• Main Tasks
  – Phase 1: Defect Detection
  – Phase 2: Defect Categorization

Example – Different shades represent different detected defects

Mapping detection/categorization to FREERIDE

FREERIDE processing stage → algorithm phase:
• Local Processing → Node Processing
• Global Combination → Post Processing

Algorithm steps by phase:
• Detection phase: Rule Discovery, Defect Segmentation
• Categorization phase: Moment Pruning, LCS Matching

Key Parallelization Issues

• Challenges in detection phase stem from partitioning data into chunks:
  – detection on chunk boundaries,
  – joining multi-chunk defects.

• Categorization phase:

1. Load balancing is necessary for scalability.

2. Updating catalog with new classes needs to be efficient.
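The "joining multi-chunk defects" issue can be sketched with a union-find over per-chunk defect fragments. This is only an illustration under stated assumptions, not the authors' method: the fragment format `(chunk_id, set of (x, y) cells)`, the 2D grid, and the "grid neighbor" adjacency test are all hypothetical.

```python
from itertools import combinations


def find(parent, i):
    """Return the representative of i's set, compressing the path."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def touches(cells_a, cells_b):
    """True if any cell of one fragment is a grid neighbor of the other."""
    return any(
        abs(ax - bx) + abs(ay - by) == 1
        for ax, ay in cells_a
        for bx, by in cells_b
    )


def join_fragments(fragments):
    """Merge per-chunk defect fragments that touch into whole defects.

    fragments: list of (chunk_id, set of (x, y) cells), a hypothetical format.
    """
    parent = list(range(len(fragments)))
    for i, j in combinations(range(len(fragments)), 2):
        (chunk_i, cells_i), (chunk_j, cells_j) = fragments[i], fragments[j]
        # Fragments within one chunk were already separated by local detection.
        if chunk_i != chunk_j and touches(cells_i, cells_j):
            parent[find(parent, i)] = find(parent, j)
    defects = {}
    for i, (_, cells) in enumerate(fragments):
        defects.setdefault(find(parent, i), set()).update(cells)
    return list(defects.values())


fragments = [
    (0, {(3, 0), (3, 1)}),   # reaches the boundary between chunk 0 and chunk 1
    (1, {(4, 1), (5, 1)}),   # continues the same defect inside chunk 1
    (1, {(7, 5)}),           # separate defect, entirely inside chunk 1
]
defects = join_fragments(fragments)
print(defects)
```

The pairwise comparison is quadratic in the number of fragments; a real implementation would presumably compare only fragments that reach a chunk boundary.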

Detection Challenges

Intuitive (un-balanced) Categorization

[Figure: categorization diagram; labels P, N, M]

Increasing the number of nodes will increase the “sequential” fraction.
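One way to see why a growing sequential fraction hurts scalability is Amdahl's law, speedup = 1 / (s + (1 - s) / n) for sequential fraction s on n nodes. The slide does not cite Amdahl's law, and the fractions below are hypothetical values chosen only to show the trend.

```python
def amdahl_speedup(sequential_fraction, nodes):
    """Speedup predicted by Amdahl's law for a given sequential fraction."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / nodes)


# Hypothetical fractions: the sequential share is assumed to grow with
# the node count, as the slide suggests for the unbalanced scheme.
for nodes, fraction in [(2, 0.05), (4, 0.10), (8, 0.20)]:
    print(nodes, round(amdahl_speedup(fraction, nodes), 2))
```

Under these assumed fractions, each doubling of the node count buys less than the previous one.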

Load Balanced Categorization

Approach has been tested with a variable number of multi-node defects.
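The transcript does not preserve how the load-balanced scheme distributes work, so the following is only a generic sketch of the idea: a greedy heuristic that hands each defect, largest first, to the currently lightest node. The function name `balance`, the defect ids, and the use of defect size as a cost estimate are assumptions.

```python
import heapq


def balance(defect_sizes, num_nodes):
    """Assign defects to nodes, largest first, always to the lightest node."""
    heap = [(0, node) for node in range(num_nodes)]   # (load, node id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    by_size = sorted(defect_sizes.items(), key=lambda item: item[1], reverse=True)
    for defect_id, size in by_size:
        load, node = heapq.heappop(heap)              # lightest node so far
        assignment[node].append(defect_id)
        heapq.heappush(heap, (load + size, node))
    return assignment


# Hypothetical defect sizes, standing in for categorization cost.
defect_sizes = {"d0": 90, "d1": 40, "d2": 35, "d3": 30, "d4": 5}
assignment = balance(defect_sizes, num_nodes=2)
loads = {n: sum(defect_sizes[d] for d in ds) for n, ds in assignment.items()}
print(assignment, loads)
```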

Intuitive (sequential) Catalog Updates

“Catalog completeness” has a direct effect on scalability.

Parallel Catalog Updates

Tested with different levels of “catalog completeness”.

Experimental Results: Demonstrating Scalability

• Experimental results for up to 8 processing nodes.

• Experimental Platform:
  – Cluster (1-8 nodes) of 700 MHz Pentium machines
  – Connected through Myrinet LANai 7.0
  – 1 GB memory on each node
  – Datasets ranging in size from 133 MB to 1.8 GB

[Figure: Breakdown of total execution time (1.8 GB). Execution Time (sec), 0 to 20000, vs. Processing Nodes (1, 2, 4, 8); series: Categorization, Detection.]

More Scalability Experiments

[Figure: Execution Time (sec), 0 to 4000, vs. Processing Nodes (1, 2, 4, 8); series: 0/3 in db, 1/3 in db, 2/3 in db, 3/3 in db.]

• 480 MB Dataset, 1-8 nodes

• Catalog completeness varies, but speedups remain near linear.

• More scalability experiments in paper.

Experimental Results: Evaluating Load Balancing

[Figures: two charts of Execution Time (sec) vs. Processing Nodes (1, 2, 4, 8); series: Parallel detection, Un-optimized categorization, Optimized categorization; panel titles: “480 MB, 2/3 in db” and “480 MB, 0/3 in db”.]

Optimized scales better!

Experimental Results: Parallel Matching Approach

Default implementation performs sequential categorization of the non-matching defects.

Optimized implementation:

1. parallel local catalog update,

2. merging of local catalogs on Master node,

3. finalizing local catalogs in parallel.
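The three steps of the optimized implementation can be sketched as follows. This is a sequential simulation under stated assumptions, not the authors' code: the function names are hypothetical, the string "signatures" stand in for defect shapes, and plain equality replaces the actual shape-matching test.

```python
def local_catalog_update(defects):
    """Step 1 (per node): group this node's non-matching defects into new classes."""
    local_catalog = []
    for signature in defects:
        if signature not in local_catalog:
            local_catalog.append(signature)
    return local_catalog


def merge_catalogs(catalog, local_catalogs):
    """Step 2 (master): merge local catalogs, giving each new class one global id."""
    merged = dict(catalog)
    for local_catalog in local_catalogs:
        for signature in local_catalog:
            if signature not in merged:
                merged[signature] = len(merged)
    return merged


def finalize(defects, merged):
    """Step 3 (per node): relabel local defects with the global class ids."""
    return [merged[signature] for signature in defects]


catalog = {"ring": 0}                        # classes already in the catalog
node_defects = [["chain", "pit", "chain"],   # non-matching defects on node 0
                ["pit", "cluster"]]          # non-matching defects on node 1

# Steps 1 and 3 are independent per node; the loops here only simulate that.
local_catalogs = [local_catalog_update(d) for d in node_defects]
merged = merge_catalogs(catalog, local_catalogs)
labels = [finalize(d, merged) for d in node_defects]
print(merged, labels)
```

In this sketch only the merge in step 2 runs on a single node, and it touches one entry per locally discovered class rather than one per defect.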

[Figures: two charts of Execution Time (sec), 0 to 2000, vs. Processing Nodes (1, 2, 4, 8); series in each: 0/3 in db, 1/3 in db, 2/3 in db, 3/3 in db.]

Conclusions

FREERIDE can be used to parallelize scientific mining algorithms with a more complex processing structure.

Scalability can be achieved with less programming effort than “hand-coding” a parallel application.

Parallel applications created using FREERIDE allow working efficiently with disk-resident datasets.

Our approaches to load balancing and to parallel categorization of non-matching defects perform better than naïve approaches to solving the posed problem.

Questions?
