Performance Tuning on Multicore Systems for Feature Matching within Image Collections Xiaoxin Tang*, Steven Mills, David Eyers, Zhiyi Huang, Kai-Cheung Leung and Minyi Guo* Department of Computer Science University of Otago, New Zealand * Department of Computer Science Shanghai Jiao Tong University, China

Feb 23, 2016



Page 1: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Xiaoxin Tang*, Steven Mills, David Eyers, Zhiyi Huang, Kai-Cheung Leung and Minyi Guo*

Department of Computer Science, University of Otago, New Zealand

* Department of Computer Science, Shanghai Jiao Tong University, China

Page 2: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Contents

• Motivation
• Our work
• Evaluation
• Conclusion

Page 4: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Similarity Search

• Definition:
– Preprocess a database of N objects so that, given a query object, one can effectively determine its nearest neighbors in the database.
• Applications:
– Pattern recognition, chemical similarity analysis, statistical classification, etc.

Page 5: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

The problem – KNN Search

• K Nearest Neighbor Search:
– Feature: an array of D elements
• $f = [e_1, e_2, \ldots, e_D]$
– Feature Space: a set of features
• $F_s = \{f_1, f_2, \ldots, f_N\}$
– Feature Similarity: Euclidean distance
• $d(f_i, f_j) = \sqrt{\sum_{m=1}^{D} (f_i^m - f_j^m)^2}$, where $f_i^m$ is the m-th element of $f_i$
– Search: given a query feature $f_q$, find the k features in $F_s$ that have the shortest distances to $f_q$.
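The search defined above can be sketched as a brute-force (linear) scan. This is a minimal illustration, not the authors' implementation: the names `Feature`, `euclidean`, and `knn_linear` are hypothetical, and features are assumed to be stored as `std::vector<float>` of equal length D.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Assumption: a feature is an array of D float elements.
using Feature = std::vector<float>;

// Euclidean distance between two features of equal length D.
float euclidean(const Feature& a, const Feature& b) {
    float sum = 0.0f;
    for (std::size_t m = 0; m < a.size(); ++m) {
        const float diff = a[m] - b[m];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Return (distance, index) pairs for the k features in `space`
// that are closest to `query`, nearest first.
std::vector<std::pair<float, std::size_t>>
knn_linear(const std::vector<Feature>& space, const Feature& query, std::size_t k) {
    std::vector<std::pair<float, std::size_t>> dist;
    dist.reserve(space.size());
    for (std::size_t i = 0; i < space.size(); ++i)
        dist.emplace_back(euclidean(space[i], query), i);

    // Only the first k entries need to be sorted.
    k = std::min(k, dist.size());
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    dist.resize(k);
    return dist;
}
```

Each query visits all N features, so the cost per query grows linearly with the size of the feature space; tree-based and hashing methods trade accuracy for fewer visits.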

Page 6: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Our Case Study

• Feature Matching: a fundamental problem in many computer vision tasks
– Use the SIFT algorithm to generate features for each image;
– Use a k-Nearest Neighbors (k-NN) algorithm to find similar features between images.

Page 7: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Challenges

• Very time-consuming:
– Datasets are becoming larger:
• hundreds or thousands of images;
– Image resolution is increasing:
• 2300×1500 pixels, or higher.
• New platforms: HPC is moving into the multi-/many-core age:
– AMD 16-core and 64-core machines.

Page 8: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Motivation

• Performance evaluation:
– Find the common problems that may limit the performance of feature matching on multi-/many-core platforms.
• Performance tuning:
– Find general methods to solve the identified problems.

Page 9: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Contents

• Motivation
• Our work
• Evaluation
• Conclusion

Page 10: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Data Distribution

[Figure: number of images (left axis, 0 to 30) and total number of features (right axis, 0 to 700000) per feature size range (x-axis ticks: 10000, 20000, 30000, 40000). Bar labels: images 26, 26, 26, 3; features 181124, 420008, 660949, 146180.]

Page 11: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Data Size

[Figure: data size, kd-tree size, and total size in MB (y-axis, 0 to 45) per image id (x-axis, 0 to 80).]

Page 12: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Problems

• Unbalanced workload:
– Levels of parallelism;
– Scheduling policy.
• Poor last-level cache utilization:
– Memory architecture.

Page 13: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Levels of parallelism

[Diagram: levels of parallelism over reference images, query images, and features. Labels: Level_1, Level_2, Level_3, Level_4, and the combined Level_1&2. Search algorithms listed: Linear, KD-tree, Kmeans, LSH, others.]

Page 14: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Scheduling policy

• OpenMP scheduling policy:
– Static: the scheduler assigns an equal number of tasks to each thread (not used);
– Dynamic: when a thread finishes its current task, it takes new tasks from the global task queue;
– Guided: the chunk size is adjusted dynamically as tasks are requested from the task queue.
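The dynamic and guided policies above map onto the `schedule` clause of an OpenMP loop. The sketch below is illustrative only: `match_image` is a hypothetical stand-in for matching one image, with a cost that grows with its feature count, and the function names are not from the slides. Without `-fopenmp` the pragmas are ignored and the loops run serially with the same result.

```cpp
#include <vector>

// Hypothetical per-image task: the work grows with the number of
// features, which is what makes the workload unbalanced.
long match_image(int num_features) {
    long work = 0;
    for (int i = 0; i < num_features; ++i) work += i % 7;
    return work;
}

// Dynamic scheduling: an idle thread takes the next `chunk` images
// from the shared queue of loop iterations.
long match_all_dynamic(const std::vector<int>& features_per_image, int chunk) {
    long total = 0;
    const int n = static_cast<int>(features_per_image.size());
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+ : total)
    for (int i = 0; i < n; ++i)
        total += match_image(features_per_image[i]);
    return total;
}

// Guided scheduling: chunks start large and shrink as the remaining
// iterations run out.
long match_all_guided(const std::vector<int>& features_per_image) {
    long total = 0;
    const int n = static_cast<int>(features_per_image.size());
    #pragma omp parallel for schedule(guided) reduction(+ : total)
    for (int i = 0; i < n; ++i)
        total += match_image(features_per_image[i]);
    return total;
}
```

A small chunk size balances uneven tasks better at the price of more frequent trips to the task queue; a larger chunk size does the opposite.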

Page 15: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Memory architecture

• More cores share the memory and the last-level cache:
– Memory bandwidth:
• AMD 16-core: 12.8 GB/s
• AMD 64-core: 25.6 GB/s
– Last-level cache:
• AMD 16-core: 6 MB
• AMD 64-core: 16 MB
• Large images may not fit in the cache and then cause many memory accesses, so performance hits the memory wall.
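A rough working-set estimate shows why large images can overflow the last-level cache. The sketch assumes SIFT descriptors as OpenCV produces them (128 `float` elements per feature) and ignores keypoint metadata and index structures; the helper names and the 20,000-feature example are hypothetical.

```cpp
#include <cstddef>

// Assumption: one SIFT descriptor is 128 float elements.
const std::size_t kDescriptorDim = 128;

// Bytes needed to hold the descriptors of one image.
std::size_t descriptor_bytes(std::size_t num_features) {
    return num_features * kDescriptorDim * sizeof(float);
}

// True if the descriptors alone fit in a cache of the given size.
bool fits_in_cache(std::size_t num_features, std::size_t cache_bytes) {
    return descriptor_bytes(num_features) <= cache_bytes;
}
```

With 4-byte floats, an image with 20,000 features needs about 10 MB for its descriptors. That exceeds the 6 MB last-level cache of the 16-core machine and takes up most of the 16 MB cache of the 64-core machine, before any other thread's data is counted.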

Page 16: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Divide-and-Merge

• We propose Divide-and-Merge:
– The whole feature space is split into several smaller sub-spaces;
– Each sub-space is searched independently;
– The results are merged.
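The three steps above can be sketched on top of a brute-force search. This is a minimal illustration under stated assumptions, not the authors' code: the names are hypothetical, sub-spaces are assumed to be contiguous index ranges of equal size, and all queries are run against one sub-space before moving to the next so that the sub-space can stay in cache.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Feature = std::vector<float>;
// (squared distance, global feature index)
using Match = std::pair<float, std::size_t>;

float squared_distance(const Feature& a, const Feature& b) {
    float sum = 0.0f;
    for (std::size_t m = 0; m < a.size(); ++m) {
        const float diff = a[m] - b[m];
        sum += diff * diff;
    }
    return sum;
}

// Keep only the k best matches, nearest first.
void keep_best(std::vector<Match>& matches, std::size_t k) {
    const std::size_t keep = std::min(k, matches.size());
    std::partial_sort(matches.begin(), matches.begin() + keep, matches.end());
    matches.resize(keep);
}

// Brute-force k-NN restricted to the sub-space space[begin, end).
std::vector<Match> search_subspace(const std::vector<Feature>& space,
                                   std::size_t begin, std::size_t end,
                                   const Feature& query, std::size_t k) {
    std::vector<Match> found;
    for (std::size_t i = begin; i < end; ++i)
        found.emplace_back(squared_distance(space[i], query), i);
    keep_best(found, k);
    return found;
}

// Divide the space into num_parts ranges, search each range for every
// query, then merge the per-range candidates into the final k per query.
std::vector<std::vector<Match>>
knn_divide_and_merge(const std::vector<Feature>& space,
                     const std::vector<Feature>& queries,
                     std::size_t k, std::size_t num_parts) {
    std::vector<std::vector<Match>> result(queries.size());
    if (num_parts == 0) num_parts = 1;
    const std::size_t part_size = (space.size() + num_parts - 1) / num_parts;

    for (std::size_t begin = 0; begin < space.size(); begin += part_size) {
        const std::size_t end = std::min(begin + part_size, space.size());
        for (std::size_t q = 0; q < queries.size(); ++q) {
            std::vector<Match> local = search_subspace(space, begin, end, queries[q], k);
            result[q].insert(result[q].end(), local.begin(), local.end());
        }
    }
    for (std::vector<Match>& candidates : result)
        keep_best(candidates, k);  // merge step
    return result;
}
```

For an exact search the merged result matches a search over the whole space, because every true neighbor is among the k best of its own sub-space. Squared distances are used because they preserve the ordering and avoid the square root.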

Page 17: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Divide-and-Merge

Page 18: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Time complexity

• Accurate algorithms:
– Brute force:
– Apply DM:
• Approximate algorithms:
– Randomized KD-Tree:
– Apply DM:

Page 19: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Contents

• Motivation
• Our work
• Evaluation
• Conclusion

Page 20: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Hardware and Software configuration

• AMD 16-core (AMD16)
– CPU: AMD Opteron Processor 8380, 4 cores × 4 @ 2.5 GHz
– Cache: L1: 128 KB, L2: 512 KB, L3: 6144 KB
– Memory: 16 GiB, DDR2 800 MHz, 12.8 GB/s
– OS: Ubuntu 12.04.1
– Compiler: g++-4.4
• AMD 64-core (AMD64)
– CPU: AMD Opteron Processor 6276, 8 cores × 8 @ 2.3 GHz
– Cache: L1: 48 KB, L2: 1000 KB, L3: 16384 KB
– Memory: 64 GiB, DDR3 1333 MHz, 21.32 GB/s
– OS: Ubuntu 12.04.1
– Compiler: g++-4.4

Environment: OpenCV + OpenMP, one of the setups most frequently used by computer vision researchers to exploit parallel platforms.

Page 21: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Levels of parallelism

[Figure: scalability (y-axis, 0 to 12) over x-axis values 1 to 16. Series: Level_1, Level_2, Level_3, Level_1&2.]

Page 22: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Scheduling policy(on level_1&2)

[Figure: scalability (y-axis, 0 to 12) over x-axis values 1 to 16. Series: d1, d2, d4, guided.]

Page 23: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Scheduling policy(on level_3)

[Figure: scalability (y-axis, 0 to 14) over x-axis values 1 to 16. Series: d1, d2, d4, guided.]

Page 24: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Memory architecture

1. Original Execution

2. Apply Divide-and-Merge

Page 25: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Evaluation on Manawatu Dataset

[Figure, two panels, each over x-axis values 1 to 64 with series Level_3, Level_3_DM, Level_1&2, Level_1&2_DM. Panel "Scalability": y-axis 0 to 50. Panel "Speedup": y-axis 0 to 25.]

Page 26: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Evaluation on Manawatu Dataset

[Figure, two panels, each over x-axis values 1 to 64 with series Level_3, Level_3_DM, Level_1&2, Level_1&2_DM. Panel "Scalability": y-axis 0 to 50. Panel "Speedup": y-axis 0 to 14.]

Page 27: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Contents

• Motivation
• Our work
• Evaluation
• Conclusion

Page 28: Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Conclusion

• We have shown that performance tuning is demanding on modern multicore systems.
• We have comprehensively evaluated the impact of three factors (levels of parallelism, scheduling policy, and memory architecture) on large-scale image feature matching.
• We have proposed a Divide-and-Merge algorithm that can greatly improve the speedup and scalability of feature matching algorithms on multicore machines.