Efficient Object Detection on GPUs using MB-LBP features and Random Forests

Shalini Gupta, Nvidia
GTC 2013 presentation: on-demand.gputechconf.com/gtc/2013/presentations/S3297-Efficient...

Apr 06, 2018

Problem overview

§ Accurate and real-time object (face) detection on the GPU

Applications

§ Smart photography

§ Human-computer interaction

Windowed approach

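[Figure: a 20 x 24 window slides over every position (x, y) of each level of an image pyramid; an object/non-object pattern classifier scores each window to produce the final detections.]

Most popular algorithms:

•  Viola and Jones, 2004.

•  Zhang et al., 2007.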
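The windowed approach can be sketched as two nested enumerations: pyramid levels, then window positions inside each level. This is a minimal illustration only. The 20 x 24 window comes from the slides, while the scale factor of 1.25, the step of one pixel, and the function names are assumptions made for the example.

```python
def pyramid_sizes(width, height, win_w=20, win_h=24, scale=1.25):
    """Yield (level_width, level_height) until the window no longer fits.

    The scale factor is an assumed value, not taken from the slides.
    """
    w, h = float(width), float(height)
    while int(w) >= win_w and int(h) >= win_h:
        yield int(w), int(h)
        w, h = w / scale, h / scale


def window_positions(level_w, level_h, win_w=20, win_h=24, step=1):
    """Yield the top-left corner (x, y) of every window inside one level."""
    for y in range(0, level_h - win_h + 1, step):
        for x in range(0, level_w - win_w + 1, step):
            yield x, y


def count_windows(width, height, scale=1.25, step=1):
    """Total number of windows the classifier must score for one image."""
    return sum(
        sum(1 for _ in window_positions(w, h, step=step))
        for w, h in pyramid_sizes(width, height, scale=scale)
    )
```

Calling `count_windows(640, 480)` gives a feel for how many classifier evaluations one frame needs, which is why the per-window cost dominates.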

Existing solution – Features

§ Multi-block Local Binary Pattern (MB-LBP) features

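[Figure: an MB-LBP feature inside the 20 x 24 window. A 3 x 3 grid of blocks B0 to B8, each w x h pixels, has its top-left corner at (x0, y0). The average of each block is thresholded to give one bit, and the bits form the MB-LBP code, here 01011010.]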
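A minimal sketch of how one MB-LBP code can be computed from an integral image follows. It uses the usual MB-LBP definition, in which each of the eight outer blocks is compared with the center block B4. Because all nine blocks have the same size, comparing block sums is equivalent to comparing block averages. The clockwise bit order and the `>=` comparison are assumptions; the slide does not specify them.

```python
def integral_image(img):
    """Return an (H+1) x (W+1) table with ii[y][x] = sum of img rows < y, cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def block_sum(ii, x, y, w, h):
    """Sum of the w x h block with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


# Assumed bit order: the eight outer blocks, clockwise from the top-left block.
NEIGHBOURS = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]


def mb_lbp(ii, x0, y0, w, h):
    """8-bit MB-LBP code of the 3 x 3 block grid whose top-left corner is (x0, y0)."""
    center = block_sum(ii, x0 + w, y0 + h, w, h)
    code = 0
    for bx, by in NEIGHBOURS:
        s = block_sum(ii, x0 + bx * w, y0 + by * h, w, h)
        code = (code << 1) | (1 if s >= center else 0)
    return code
```

The integral image makes every block sum cost four lookups regardless of block size, so all features cost the same to evaluate.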

Existing solution – Classifier

§ Adaptive boosting cascaded classifier

[Figure: each 20 x 24 sub-window passes through Stage 1, Stage 2, Stage 3 and more stages (~15-20). A sub-window moves on to the next stage on T and joins the rejected sub-windows on F.]

Only 3x speedup on GPU
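The cascade's early rejection can be sketched as a loop that stops at the first failing stage. The stage functions here are hypothetical placeholders, not the boosted classifiers from the talk. The point of the sketch is that different sub-windows do different amounts of work, which is a plausible (assumed) reason why neighbouring GPU threads fall out of step.

```python
def run_cascade(stages, window):
    """Return (accepted, stages_evaluated) for one sub-window.

    `stages` is a list of hypothetical functions window -> bool.
    Evaluation stops at the first stage that rejects the window.
    """
    for n, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, n
    return True, len(stages)
```

With three toy stages such as `[lambda v: v > 1, lambda v: v > 2, lambda v: v > 3]`, a window scored 0 leaves after one stage while a window scored 5 runs all three.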

Proposed solution

§ MB-LBP features + Random Forest Classifier

[Figure: six decision trees, each with nodes numbered 1 to 7.]

Independent decision trees that vote. Analogous to a committee of decision makers that don’t talk to each other.
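The voting step can be sketched as an average over tree outputs. Each tree is modelled as a hypothetical callable that returns 1 (object) or 0 (non-object); the 0.5 decision threshold is an assumption for the example.

```python
def forest_vote(trees, sample):
    """Fraction of trees that vote 'object' for this sample."""
    votes = [tree(sample) for tree in trees]
    return sum(votes) / len(votes)


def classify(trees, sample, threshold=0.5):
    """Accept the sample when enough trees agree (threshold is assumed)."""
    return forest_vote(trees, sample) >= threshold
```

Because no tree depends on another tree's result, every tree is evaluated for every sample, so the work per sample is fixed.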

Why random forests?

§ Well suited for GPUs

—  Massively parallel

—  Same amount of computation for each pixel

§ Previous work

—  Face detection with Haar features (Belle 2008)

—  Face recognition (Belle 2008, Ghosal 2009)

—  Expression recognition (Fanelli et al., 2012)

§ Fast training

§ Possible to add recognition on top of detection

§ Online learning

Random forest training

§ Train multiple independent decision trees

§ Each tree is trained on a random subset of the data selected via bagging

§ A randomly picked subset of features determines each split

[Figure: bagging. Training samples P1 to P6 and F1 to F7 are selected with repetition to build the training set for a tree.]

[Figure: split selection among candidate features 1 to 6. Each feature represents a possible split. The tree randomly picks features 1, 5 and 6; feature 1 is better than 5 and 6, so it is chosen for the split.]
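The two sources of randomness, bagging and random feature subsets, can be sketched as follows. To keep the example short, features are treated as binary and splits are ranked by weighted Gini impurity. Both choices are assumptions; the slides do not state the split criterion, and real MB-LBP codes take 256 values.

```python
import random


def bootstrap_sample(samples, rng):
    """Bagging: draw len(samples) items with repetition."""
    return [rng.choice(samples) for _ in samples]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_impurity(samples, feature):
    """Weighted impurity after splitting (features, label) pairs on one binary feature."""
    left = [label for feats, label in samples if not feats[feature]]
    right = [label for feats, label in samples if feats[feature]]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(samples)


def choose_split(samples, n_features, n_candidates, rng):
    """Pick a random subset of features and keep the one with the purest split."""
    candidates = rng.sample(range(n_features), n_candidates)
    return min(candidates, key=lambda f: split_impurity(samples, f))
```

A tree would call `bootstrap_sample` once for its training set and `choose_split` at every node, with `rng = random.Random(seed)` so that runs are repeatable.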

Training data

§ Positive cases (~47K faces): 20x24 rotated and mirrored near-frontal upright faces

§ Negative cases (~50K non-faces): randomly selected from 10K images

Feature Selection

§ All 5796 MB-LBP features

—  Slow training

—  Lower accuracy

§ Feature selection based on repeatability

—  Reject features selected fewer than 6 times in ~1K trees

—  2135 features selected

—  Improved accuracy

[Figure: 1000 decision trees (tree 1, tree 2, ... tree 1000) trained on MB-LBP features, each feature being a 3 x 3 grid of w x h blocks B0 to B8 at (x0, y0) in the 20 x 24 window.]
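The figure of 5796 features can be reproduced by enumerating every block size and position whose 3 x 3 grid fits inside the 20 x 24 window. The sketch below does that and then applies the repeatability filter. The enumeration rule is an assumption that happens to match the slide's count, and the input format for the filter (one list of chosen features per tree) is hypothetical.

```python
from collections import Counter


def enumerate_mb_lbp(win_w=20, win_h=24):
    """List every (x0, y0, w, h) whose 3w x 3h grid fits in the window."""
    feats = []
    for w in range(1, win_w // 3 + 1):
        for h in range(1, win_h // 3 + 1):
            for y0 in range(win_h - 3 * h + 1):
                for x0 in range(win_w - 3 * w + 1):
                    feats.append((x0, y0, w, h))
    return feats


def select_repeatable(features_used_per_tree, min_count=6):
    """Keep features chosen at least `min_count` times across all trees."""
    counts = Counter()
    for used in features_used_per_tree:
        counts.update(used)
    return {f for f, c in counts.items() if c >= min_count}
```

The count factors as 63 horizontal placements (block widths 1 to 6) times 92 vertical placements (block heights 1 to 8), which gives 5796.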

Bootstrapping

[Figure: bootstrapping loop. Train on the positive and negative cases, find false positives, append them to the negative cases, and train again.]

Up to five stages of bootstrapping improved accuracy.
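The bootstrapping loop can be sketched with any training routine plugged in. Here `train` is a hypothetical function that returns a classifier, `negative_pool` stands for extra non-face samples to mine, and `train_threshold` is a toy one-dimensional stand-in so the example runs on its own. None of these names come from the talk.

```python
def bootstrap_training(train, positives, negatives, negative_pool, stages=5):
    """Retrain up to `stages` times, each time adding the current false positives.

    train(positives, negatives) must return a callable sample -> bool.
    """
    negatives = list(negatives)
    classifier = train(positives, negatives)
    for _ in range(stages):
        false_positives = [s for s in negative_pool
                           if classifier(s) and s not in negatives]
        if not false_positives:
            break  # nothing left to learn from this pool
        negatives.extend(false_positives)
        classifier = train(positives, negatives)
    return classifier, negatives


def train_threshold(positives, negatives):
    """Toy trainer: threshold halfway between the classes (illustration only)."""
    t = (min(positives) + max(negatives)) / 2
    return lambda s: s > t
```

Each round makes the negative set harder, so the retrained classifier is pushed to reject the cases it previously got wrong.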

Classifier Parameters

§ Ordered decisions

§  Increasing number of features randomly selected for a split

§ 32 total trees

§ Tree depth of 5

GPU (CUDA) Detector

[Figure: detector pipeline.

CPU: convert to gray, resize, integral image

GPU: MB-LBP features, RF classifier

CPU: non-maxima suppression

Annotation: >95% of computation]

CUDA Kernel

[Figure: a thread block of 8 x 32 threads processes 256 pixels (1 pixel/thread). Shared memory holds a region labeled 32 and 52. The decision trees (nodes 1 to 7 shown) are kept in cache. Bank conflicts are marked.]

•  Trees stored in BFS order as fixed height full binary trees

•  No execution branching while computing trees
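Storing a full binary tree in BFS order lets the traversal replace "if left, else right" with index arithmetic: the children of node i sit at 2i + 1 and 2i + 2. The sketch below is written in Python for readability rather than CUDA, and the array names and the `>` threshold test are assumptions. Every sample performs exactly `depth` steps, whatever path it takes.

```python
def eval_tree(feature_idx, thresholds, leaf_values, features, depth):
    """Evaluate one fixed-height full binary tree stored in BFS order.

    feature_idx[i], thresholds[i]: feature and threshold tested at internal node i.
    leaf_values: the 2**depth leaf outputs, left to right.
    """
    node = 0
    for _ in range(depth):
        go_right = features[feature_idx[node]] > thresholds[node]
        node = 2 * node + 1 + int(go_right)  # arithmetic instead of a branch
    return leaf_values[node - (2 ** depth - 1)]
```

With the talk's tree depth of 5, each tree would have 31 internal nodes and 32 leaves under this layout.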

Optimizations

§ For large images, skip every other pixel – 30% faster

§ Reduce bank conflicts by increasing the bank size and using more registers

§ 16-bit integral image instead of 32-bit

§ Process borders and small images on the CPU

§ Overlap memory copies with kernel execution
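The slide does not say how a 16-bit integral image stays exact. One possibility, shown here as an assumption and not as the talk's method, is wraparound arithmetic: if the table is stored modulo 2^16, a block sum computed modulo 2^16 is still correct whenever the true sum is below 65536. For 8-bit pixels and a largest MB-LBP block of 6 x 8 pixels, the true sum is at most 6 * 8 * 255 = 12240.

```python
MASK = 0xFFFF  # keep the low 16 bits, as an unsigned 16-bit store would


def integral_image_u16(img):
    """Integral image whose entries wrap around modulo 2**16."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum = (row_sum + img[y][x]) & MASK
            ii[y + 1][x + 1] = (ii[y][x + 1] + row_sum) & MASK
    return ii


def block_sum_u16(ii, x, y, w, h):
    """Block sum modulo 2**16; exact when the true sum is below 65536."""
    return (ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]) & MASK
```

Halving the entry size halves the memory traffic for the integral image, which is the likely motivation for the optimization.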

Non-maxima suppression

Final confidence = avg(confidence) + (no. of windows)/50. This improves accuracy.
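The confidence formula can be written as a one-line function over the windows that were merged into one detection. How windows are grouped is not described on the slide, so the sketch takes an already formed group as input; the function name is hypothetical.

```python
def final_confidence(confidences, divisor=50):
    """avg(confidence) + (number of windows) / 50 for one group of merged windows."""
    n = len(confidences)
    return sum(confidences) / n + n / divisor
```

For example, a detection supported by 25 windows gains a bonus of 0.5 over its average confidence, while an isolated window gains only 0.02, so well-supported detections rank higher.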

Accuracy

Measured on the FDDB dataset – 2845 images containing 5171 faces

[Figure: hard cases]

Performance (GK107 vs. core i7 – 3.0 GHz)

Image size 640 x 480

                        MB-LBP +        MB-LBP +            Haar + Cascaded AdaBoost
                        Random Forest   Cascaded AdaBoost   (Viola and Jones)
CPU (i7) single core    471             117                 200
GPU (GK107)             22              42                  100
Speed up                21.4            2.7                 2

Image size 1280 x 960

                        MB-LBP +        MB-LBP +            Haar + Cascaded AdaBoost
                        Random Forest   Cascaded AdaBoost   (Viola and Jones)
CPU (i7) single core    1752            526                 1250
GPU (GK107)             95              175                 425
Speed up                18.4            3                   3
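The speed-up row is the CPU figure divided by the GPU figure. The sketch below recomputes it from the numbers in the table, assuming the CPU and GPU rows are run times (the slide does not state the unit). The dictionary layout and function names are made up for the example.

```python
# (CPU, GPU) pairs copied from the performance slide; units are not stated there.
TIMINGS = {
    "640 x 480": {
        "MB-LBP + Random Forest": (471, 22),
        "MB-LBP + Cascaded AdaBoost": (117, 42),
        "Haar + Cascaded AdaBoost": (200, 100),
    },
    "1280 x 960": {
        "MB-LBP + Random Forest": (1752, 95),
        "MB-LBP + Cascaded AdaBoost": (526, 175),
        "Haar + Cascaded AdaBoost": (1250, 425),
    },
}


def speedup(cpu, gpu):
    """Ratio of CPU figure to GPU figure."""
    return cpu / gpu


def speedup_table(timings):
    """Speed-up per image size and method, rounded to one decimal."""
    return {
        size: {name: round(speedup(c, g), 1) for name, (c, g) in rows.items()}
        for size, rows in timings.items()
    }
```

The recomputed ratios match the slide for the random forest columns; a couple of the AdaBoost entries differ from the slide in the last digit, presumably because of rounding.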

GPU utilization (GK107)

•  95% global efficiency, 5% overhead of loads from shared memory

•  99.6% occupancy

•  IPC ~3

•  Further speedup needs algorithmic changes

Conclusion

§ MB-LBP features + random forest classifiers for object detection

§ Feature selection technique

§ Optimized GPU (CUDA) detector implementation

§ Highly portable to GPUs (20x speedup)

Questions?