Parallelizing Defect Detection and Categorization Using FREERIDE. Leonid Glimcher, Computer Science and Engineering, IPDPS’05. Scaling and Parallelizing a Scientific Feature Detection and Categorization Application Using a Cluster Middleware. L. Glimcher, G. Agrawal, S. Mehta, R. Jin, R. Machiraju, The Ohio State University.

Jan 17, 2016

Page 1: Scaling and Parallelizing a Scientific Feature Detection and Categorization Application Using a Cluster Middleware

L. Glimcher, G. Agrawal, S. Mehta, R. Jin, R. Machiraju

The Ohio State University

Leonid Glimcher, Computer Science and Engineering. IPDPS’05: Parallelizing Defect Detection and Categorization Using FREERIDE.

Page 2: Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.

Page 3: Motivation for FREERIDE

• Problem:
  – Simulation data from engineering and scientific applications is growing larger,
  – Analysis models are more complex,
  – Drawing knowledge becomes increasingly more complicated.
• Solution:
  – Parallel datamining, but …
• Catch: application development effort.

Page 4: FREERIDE

Key observation: most algorithms follow a canonical loop.

Middleware API:

• Subset of data to be processed,
• Reduction object,
• Local and global reduction operations,
• Iterator.

Supports:

• Disk-resident datasets
• Shared and distributed memory

While( ) {
    forall( data instances d) {
        I = process(d)
        R(I) = R(I) op d
    }
    …….
}
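The canonical loop above can be sketched in Python. This is a minimal illustration, not FREERIDE's actual API: the function names `local_reduction` and `global_reduction`, the `combine` argument, and the parity workload are all assumptions made for the example. Each "node" runs the inner loop over its own subset of the data, and the per-node reduction objects are then merged.

```python
from collections import defaultdict
from operator import add


def local_reduction(chunk, process, op, identity):
    """Run the canonical loop over one node's subset of the data."""
    R = defaultdict(lambda: identity)   # reduction object
    for d in chunk:                     # forall(data instances d)
        i = process(d)                  # I = process(d)
        R[i] = op(R[i], d)              # R(I) = R(I) op d
    return dict(R)


def global_reduction(partials, combine, identity):
    """Merge the per-node reduction objects into a single result."""
    merged = defaultdict(lambda: identity)
    for partial in partials:
        for i, value in partial.items():
            merged[i] = combine(merged[i], value)
    return dict(merged)


# Hypothetical workload: sum the integers 0..9 by parity on two "nodes".
chunks = [range(0, 5), range(5, 10)]
partials = [local_reduction(c, lambda d: d % 2, add, 0) for c in chunks]
result = global_reduction(partials, add, 0)
print(result)  # {0: 20, 1: 25}
```

The merge step only gives the same answer as a sequential run when the combining operation is associative and commutative, which holds for the addition used here.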

Page 5: Previously on FREERIDE

• FREERIDE has been used for:
  – Apriori and FP-tree frequent item set mining,
  – KNN classification and decision tree construction,
  – K-means and EM clustering,
  – Vortex Detection (IPDPS 2004).
• Will it work for a scientific mining task with a more complex processing structure?

Page 6: Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.

Page 7: Overview of Sequential Algorithm

• Goal: to understand the properties of the materials
  – How do defects affect the materials?
• Data generated by Molecular Dynamics Simulation
  – Simulator by Physics Department (OSU)
• Main Tasks
  – Phase 1 – Defect Detection
  – Phase 2 – Defect Categorization

Page 8: Example

[Figure: different shades represent different detected defects.]

Page 9: Mapping detection/categorization to FREERIDE

FREERIDE processing stages: Local Processing, Node Processing, Global Combination, Post Processing.

Algorithm phases:

• Detection phase: Rule Discovery, Defect Segmentation.
• Categorization phase: Moment Pruning, LCS Matching.

Page 10: Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of sequential defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.

Page 11: Key Parallelization Issues

• Challenges in detection phase stem from partitioning data into chunks:
  – detection on chunk boundaries,
  – joining multi-chunk defects.
• Categorization phase:
  1. Load balancing is necessary for scalability.
  2. Updating catalog with new classes needs to be efficient.
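Joining multi-chunk defects can be illustrated with a union-find pass over defect fragments. The slides do not say how the join is implemented, so this is only a sketch under stated assumptions: each chunk labels its fragments as a hypothetical `(chunk_id, local_defect_id)` pair, and `boundary_links` lists pairs of fragments found to touch across a chunk boundary.

```python
def find(parent, x):
    """Return the representative of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def join_fragments(fragments, boundary_links):
    """Group defect fragments that touch across chunk boundaries.

    fragments: iterable of (chunk_id, local_defect_id) labels (assumed format).
    boundary_links: pairs of labels adjacent across a chunk boundary.
    """
    parent = {f: f for f in fragments}
    for a, b in boundary_links:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra             # union the two fragment groups
    groups = {}
    for f in parent:
        groups.setdefault(find(parent, f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical input: five fragments, one defect spanning chunks 0, 1 and 2.
fragments = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
links = [((0, 1), (1, 0)), ((1, 0), (2, 0))]
defects = join_fragments(fragments, links)
print(defects)
```

Only the boundary links need to be exchanged between nodes in this scheme, which keeps the joining step small relative to the per-chunk detection work.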

Page 12: Detection Challenges

Page 13: Intuitive (un-balanced) Categorization

[Diagram with labels P, N, M.]

Increasing the number of nodes will increase the “sequential” fraction.
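One way to read this remark is through the usual serial-versus-parallel split of run time. The sketch below is an illustration only: the timings are hypothetical, not taken from the experiments, and it assumes the serial work stays fixed while the parallel work divides evenly across nodes. Even under that assumption, the share of time spent in the serial part grows with the node count.

```python
def sequential_fraction(serial_time, parallel_time, nodes):
    """Share of total run time spent in the part that does not parallelize."""
    total = serial_time + parallel_time / nodes
    return serial_time / total


# Hypothetical timings: 100 s of serial work, 900 s of parallelizable work.
fractions = {p: round(sequential_fraction(100.0, 900.0, p), 3)
             for p in (1, 2, 4, 8)}
print(fractions)
```

If the serial work itself grows with the node count, as it might when more defects span several nodes, the fraction would rise faster than this fixed-work model suggests.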

Page 14: Load Balanced Categorization

The approach has been tested with a variable number of multi-node defects.
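The slides do not describe the balancing scheme itself. As a stand-in, the sketch below uses a generic greedy heuristic (largest item first, assigned to the least-loaded node), which is a common way to even out work and is not claimed to be the authors' method. The defect names and sizes are hypothetical, and size is assumed to be a proxy for categorization cost.

```python
import heapq


def balance(defect_sizes, nodes):
    """Greedy largest-first assignment of defects to the least-loaded node."""
    heap = [(0, n) for n in range(nodes)]       # (current load, node id)
    heapq.heapify(heap)
    assignment = {n: [] for n in range(nodes)}
    # Sort by size descending, then by name for a deterministic order.
    ordered = sorted(defect_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for defect, size in ordered:
        load, n = heapq.heappop(heap)           # least-loaded node so far
        assignment[n].append(defect)
        heapq.heappush(heap, (load + size, n))
    loads = {n: sum(defect_sizes[d] for d in ds)
             for n, ds in assignment.items()}
    return assignment, loads


# Hypothetical defect sizes (for example, atom counts).
sizes = {"d0": 50, "d1": 30, "d2": 20, "d3": 20, "d4": 10}
assignment, loads = balance(sizes, 2)
print(assignment, loads)
```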

Page 15: Intuitive (sequential) Catalog Updates

“Catalog completeness” has a direct effect on scalability.

Page 16: Parallel Catalog Updates

Tested with different levels of “catalog completeness”.

Page 17: Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of sequential defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.

Page 18: Experimental Results: Demonstrating Scalability

• Experimental results for up to 8 processing nodes.
• Experimental Platform:
  – Cluster (1-8) of 700 MHz Pentium machines
  – Connected through Myrinet LANai 7.0
  – 1 GB memory each node
  – Datasets ranging in size from 133 MB to 1.8 GB

[Figure: Breakdown of Total Execution time (1.8 GB). Execution Time (sec), 0 to 20000, vs. Processing Nodes (1, 2, 4, 8); series: Categorization, Detection.]

Page 19: More Scalability Experiments

[Figure: Execution Time (sec), 0 to 4000, vs. Processing Nodes (1, 2, 4, 8); series: 0/3 in db, 1/3 in db, 2/3 in db, 3/3 in db.]

• 480 MB Dataset, 1-8 nodes
• Catalog completeness varies, but speedups remain near linear.
• More scalability experiments in paper.
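For reference, "near linear" speedup means the ratio of one-node time to p-node time stays close to p. The helper below shows the arithmetic; the timings are hypothetical placeholders chosen for the example and are not the measurements plotted on this slide.

```python
def speedup(times):
    """Speedup relative to the one-node run: T(1) / T(p)."""
    base = times[1]
    return {p: base / t for p, t in times.items()}


def efficiency(times):
    """Speedup divided by node count; 1.0 means perfectly linear scaling."""
    return {p: s / p for p, s in speedup(times).items()}


# Hypothetical execution times in seconds, keyed by node count.
times = {1: 80.0, 2: 42.0, 4: 22.0, 8: 12.0}
for p in times:
    print(p, round(speedup(times)[p], 2), round(efficiency(times)[p], 2))
```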

Page 20: Experimental Results: Evaluating Load Balancing

[Figure, two panels: Execution Time (sec) vs. Processing Nodes (1, 2, 4, 8); series in each panel: Parallel detection, Un-optimized categorization, Optimized categorization. Panel titles: 480 MB, 2/3 in db; 480 MB, 0/3 in db.]

Optimized scales better!

Page 21: Experimental Results: Parallel Matching Approach

Default implementation performs sequential categorization of the non-matching defects.

Optimized implementation:

1. parallel local catalog update,
2. merging of local catalogs on Master node,
3. finalizing local catalogs in parallel.
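The three steps of the optimized implementation can be sketched as follows. This is an illustration under assumptions, not the authors' code: each defect is reduced to a hypothetical hashable class signature (a string here), the catalog maps signatures to class ids, and the per-node steps are simulated with list comprehensions where a real run would execute them on separate nodes.

```python
def local_update(defects, catalog):
    """Step 1 (per node): collect classes of defects missing from the catalog."""
    return sorted({d for d in defects if d not in catalog})


def merge_on_master(catalog, local_catalogs):
    """Step 2 (master): merge local catalogs, dropping duplicate classes."""
    merged = dict(catalog)
    for local in local_catalogs:
        for signature in local:
            if signature not in merged:
                merged[signature] = len(merged)   # next free class id
    return merged


def finalize(defects, merged):
    """Step 3 (per node): relabel local defects against the merged catalog."""
    return [merged[d] for d in defects]


catalog = {"A": 0}                                  # classes known up front
node_defects = [["A", "B", "B"], ["C", "B", "A"]]   # one list per node
local_catalogs = [local_update(d, catalog) for d in node_defects]
merged = merge_on_master(catalog, local_catalogs)
labels = [finalize(d, merged) for d in node_defects]
print(merged, labels)
```

Only step 2 runs on a single node in this sketch, and its input is the set of new classes rather than every non-matching defect.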

[Figure, two panels: Execution Time (sec), 0 to 2000, vs. Processing Nodes (1, 2, 4, 8); series in each panel: 0/3 in db, 1/3 in db, 2/3 in db, 3/3 in db.]

Page 22: Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of sequential defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.

Page 23: Conclusions

FREERIDE can be used to parallelize scientific mining algorithms with a more complex processing structure.

Scalability can be achieved with less programming effort than if a parallel application were “hand-coded”.

Parallel applications created using FREERIDE allow working efficiently with disk-resident datasets.

Our approaches to load balancing and to parallel categorization of non-matching defects perform better than naïve approaches to solving the posed problem.

Page 24: Questions?