IMAGE SEGMENTATION USING HARDWARE FOREST CLASSIFIERS
Neil Pittman & Alessandro Forin, Microsoft Research
Antonio Criminisi & Jamie Shotton, Microsoft Research Cambridge
Atabak Mahram, Boston University
Kinect Pipeline
Depth Image → Background Removal (BGR) → Body Part Classification → Centroid Calculation → Model Fitting → User Skeletons
Partitioning – Xbox & K4W
[Figure: the pipeline stages (Depth, BGR, Body Parts, Centroids, Model Fit, Skeleton) mapped onto Sensor (Kinect), SW Host (PC/Xbox/SoC), HW Accelerator (FPGA/GPU), and Application (PC/Xbox)]
Partitioning – What we Want
[Figure: the pipeline stages (Depth, BGR, Body Parts, Centroids, Model Fit, Skeleton) mapped onto Sensor (Kinect), SW Host (PC/Xbox/SoC), HW Accelerator (FPGA/GPU), and Application (PC/Xbox)]
What we Want – ‘noBGR’
Directly connect the sensor to the compute unit.
Offload the computation to the device.
Lower power for embedded and mobile platforms.
Higher frame rates for next-generation apps.
We want hardware Background Removal (BGR)!!!
Background Removal (BGR)
Grows moving/active pixels into islands using a connected-components algorithm.
Uses history and complex rules to merge and split islands into player masks.
Relies on highly sequential per-pixel comparisons with neighboring pixels, which are undesirable for a hardware implementation.
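The island-growing step rests on connected-components labeling. Below is a generic textbook version, sketched as a breadth-first flood fill with 4-connectivity; it omits the history and merge/split rules, `label_components` is a hypothetical name, and it is not the Kinect BGR code. Each pixel's label depends on already-labeled neighbors reached through a data-dependent queue, which is the sequential access pattern that makes this step a poor fit for hardware.

```python
from collections import deque

def label_components(mask):
    """Label 4-connected islands of truthy pixels in a 2D mask.

    Returns (labels, count): labels[r][c] is 0 for background pixels and
    1..count for the island that a foreground pixel belongs to.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue                      # background, or already in an island
            count += 1                        # seed a new island here
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:                      # grow the island breadth-first
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

For example, a mask with one blob in the top-left corner and another along the right edge yields `count == 2`.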
Background Removal (BGR)
BGR represents a large computational workload.
Yet it only needs to classify pixels into one of two classes: player or not player.
We have hardware that can classify pixels into 31 classes: FPGA Forest Fire.
CPU | BGR (fps)
Intel Atom | 14.3
Arm Cortex | 7.0
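To put the table in perspective, the frame rates can be converted into time per frame and compared with the time available per frame. This is a small arithmetic sketch; it assumes a 30 fps depth stream (the Kinect depth rate) as the budget, and `bgr_cost` is a hypothetical helper.

```python
STREAM_FPS = 30.0  # assumption: budget set by a 30 fps depth stream

def bgr_cost(bgr_fps, stream_fps=STREAM_FPS):
    """Return (ms per frame spent on BGR alone, that time as a multiple of the frame budget)."""
    ms_per_frame = 1000.0 / bgr_fps
    budget_ms = 1000.0 / stream_fps
    return ms_per_frame, ms_per_frame / budget_ms

for cpu, fps in {"Intel Atom": 14.3, "Arm Cortex": 7.0}.items():
    ms, ratio = bgr_cost(fps)
    print(f"{cpu}: {ms:.1f} ms per frame, {ratio:.1f}x a 30 fps budget")
```

On these figures, BGR alone takes about 70 ms per frame on the Atom and about 143 ms on the Arm core, roughly 2.1x and 4.3x a 33 ms frame budget, before any body part classification runs.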
Forest Fire
A random-tree-based classification algorithm.
Starting at the root, a pixel traverses each tree down to a leaf.
The decision to go to the left or right child is based on an evaluation function.
Each leaf contains probabilities for each class.
The results of all trees are aggregated into the final result.
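The traversal and aggregation steps can be sketched in software as follows. The slides do not specify the evaluation function; this sketch assumes a depth-difference feature between two probe pixels whose offsets are divided by the depth at the current pixel, in the style of Shotton et al.'s body part classifier. `Node`, `probe`, `evaluate`, `classify`, and the `FAR` constant are hypothetical names, and aggregation is a plain average of the leaf distributions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FAR = 1e3  # hypothetical depth returned for probes that leave the image or hit invalid pixels

@dataclass
class Node:
    # Split nodes use offsets u, v and threshold t; leaf nodes carry class probabilities.
    u: Tuple[float, float] = (0.0, 0.0)
    v: Tuple[float, float] = (0.0, 0.0)
    t: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    probs: Optional[List[float]] = None

def probe(img, y, x):
    """Depth at (y, x), or FAR if outside the image or zero (no reading)."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]) and img[y][x] > 0:
        return img[y][x]
    return FAR

def evaluate(img, y, x, node):
    """Assumed evaluation function: depth difference of two probes whose pixel
    offsets are divided by the depth at (y, x), which must be nonzero."""
    d = img[y][x]
    a = probe(img, y + round(node.u[0] / d), x + round(node.u[1] / d))
    b = probe(img, y + round(node.v[0] / d), x + round(node.v[1] / d))
    return a - b

def classify(img, y, x, forest):
    """Walk each tree from root to leaf, then average the leaf distributions."""
    total = None
    for root in forest:
        node = root
        while node.probs is None:  # split node: choose a child
            node = node.left if evaluate(img, y, x, node) < node.t else node.right
        total = node.probs[:] if total is None else [s + p for s, p in zip(total, node.probs)]
    return [s / len(forest) for s in total]
```

Dividing the offsets by depth makes a probe land on roughly the same body location whether the person is near or far; in hardware, each pixel's walk is independent of every other pixel's, which is what makes the forest easy to parallelize.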
‘noBGR’ Hypothesis:
BGR is simply classifying pixels as player or background.
If Forest Fire can be trained to classify human body parts, it can be trained to separate the player from the background.
We have an efficient hardware implementation of Forest Fire for the FPGA.
Therefore, we can implement BGR functionality using FPGA Forest Fire.
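One way to see the hypothesis is that a player/background decision can be read directly from the same per-pixel class distribution a body part forest produces. A minimal sketch, assuming a hypothetical class layout in which index 0 is background, index 1 is floor, and every other index is a body part; the 0.5 threshold and the helper names are also assumptions.

```python
# Hypothetical class layout: index 0 = background, 1 = floor, 2.. = body parts.
NON_PLAYER = {0, 1}

def player_probability(class_probs, non_player=NON_PLAYER):
    """P(player) = total probability mass on the body-part classes."""
    return sum(p for i, p in enumerate(class_probs) if i not in non_player)

def player_mask(prob_image, threshold=0.5, non_player=NON_PLAYER):
    """Binary player/background mask from a per-pixel class-probability image."""
    return [[player_probability(probs, non_player) >= threshold for probs in row]
            for row in prob_image]
```

A dedicated BGR forest would instead be trained with only the player and non-player classes, but the output has the same form: a per-pixel probability that is thresholded into a mask.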
Two Experiments – Baseline
[Block diagram: Connected Components (BGR), Upsample & Tag, RANSAC Floor, Decimate, Subsample, Forest Fire (Body Parts), K-means Centroids, ModelFit]
Two Experiments – One Stage
[Block diagram: Forest Fire (BGR & Body Parts), K-means Floor, Subsample, K-means Centroids, Pre-ModelFit, ModelFit]
Two Experiments – Two Stage
[Block diagram: Forest Fire (BGR), Upsample/Decimate, K-means Floor, Subsample, Forest Fire (Body Parts), K-means Centroids, Pre-ModelFit, ModelFit]
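The data flow of the two-stage variant can be sketched in software: a first classifier tags every pixel as player or not, and the body part classifier runs only on player pixels. This is a sketch under assumptions; `two_stage`, `stage1`, `stage2`, and `player_label` are hypothetical names, the classifiers are passed in as plain callables, and in the FPGA design the two stages are separate Forest Fire cores rather than Python functions.

```python
from typing import Callable, List, Optional

Image = List[List[float]]
Classifier = Callable[[Image, int, int], int]  # (depth image, y, x) -> class label

def two_stage(depth: Image, stage1: Classifier, stage2: Classifier,
              player_label: int) -> List[List[Optional[int]]]:
    """Tag every pixel with stage 1, then run stage 2 only where stage 1 found a player.

    Non-player pixels are returned as None, so stage 2's work scales with the
    number of player pixels rather than with the full image size.
    """
    out = []
    for y, row in enumerate(depth):
        out_row = []
        for x in range(len(row)):
            if stage1(depth, y, x) == player_label:
                out_row.append(stage2(depth, y, x))  # body part label
            else:
                out_row.append(None)                 # background or floor
        out.append(out_row)
    return out
```

The design choice is the same one the experiment tests: a small first-stage model filters the image so that the larger body part model sees fewer pixels.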
Floor Calculation
Forest Fire classifies the floor pixels of the input image.
The floor plane is calculated from the centroids of these pixels.
The plane is represented by its normal vector: the plane on which the player is standing.
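A minimal sketch of computing a floor plane from centroids of floor pixels, under several assumptions not stated on the slides: the floor pixels are already back-projected to 3D points, k-means uses k = 3 so that the three centroids define a plane, initialization is farthest-point (chosen for determinism), and the camera frame has +y pointing up. All function names are hypothetical; this is not the hardware implementation.

```python
import math

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    # Farthest-point initialization (an assumption, chosen for determinism).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: dist2(p, centers[j]))].append(p)
        # Move each center to its cluster mean (keep it if the cluster is empty).
        centers = [tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def floor_plane(floor_points):
    """Plane n·p + d = 0 through the three k-means centroids of the floor points."""
    c0, c1, c2 = kmeans(floor_points, 3)
    n = cross(sub(c1, c0), sub(c2, c0))
    length = math.sqrt(sum(x * x for x in n))
    if length == 0:
        raise ValueError("centroids are collinear; no unique plane")
    n = tuple(x / length for x in n)
    if n[1] < 0:                      # assume +y is up; point the normal upward
        n = tuple(-x for x in n)
    d = -sum(a * b for a, b in zip(n, c0))
    return n, d
```

Averaging many floor pixels into a few centroids smooths out per-pixel depth noise before the plane is fitted.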
Floor Calculation
Replaces the function of the SW BGR.
Finds the floor plane for the Model Fitting stage.
Implemented in hardware using both RANSAC and k-means algorithms.
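For comparison with the k-means approach, a textbook RANSAC plane fit can be sketched as follows: repeatedly fit a plane through three random points and keep the plane with the most points within a distance tolerance. This floating-point sketch is an assumption for illustration; the iteration count, the tolerance, and the function names are hypothetical, and the integer RANSAC variant in the table would use fixed-point arithmetic instead.

```python
import math
import random

def plane_from_points(p0, p1, p2):
    """Unit normal n and offset d with n·p + d = 0, or None if the points are collinear."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(x * x for x in n))
    if length == 0:
        return None
    n = [x / length for x in n]
    return n, -sum(a * b for a, b in zip(n, p0))

def ransac_plane(points, iters=200, tol=0.02, seed=0):
    """Return (best plane, inlier count) over `iters` random three-point samples."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(iters):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate sample
        n, d = model
        inliers = sum(1 for p in points if abs(sum(a * b for a, b in zip(n, p)) + d) <= tol)
        if inliers > best_inliers:
            best, best_inliers = model, inliers
    return best, best_inliers
```

Unlike the k-means approach, RANSAC explicitly rejects outliers, but it must evaluate every point against every candidate plane, so its cost grows with the iteration count.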
Algorithm | Floors Detected | Percentage of Total | Inclination Error (deg) | Azimuth Error (deg)
Float RANSAC | 2,366 | 80.9 | 20.5 | 9.9
Integer RANSAC | 2,483 | 84.9 | 2.5 | 6.3
k-Means | 2,912 | 99.6 | 11.0 | 11.8
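The slides do not define the two error columns. One plausible reading, sketched here as an assumption, splits each floor normal into an inclination (angle from the vertical +y axis) and an azimuth (heading of the normal's horizontal x-z projection), and reports absolute differences between the estimated and ground-truth normals, with the azimuth difference wrapped into [0, 180] degrees. Function names are hypothetical.

```python
import math

def inclination_deg(n):
    """Angle between the normal and the assumed vertical (+y) axis."""
    length = math.sqrt(sum(x * x for x in n))
    return math.degrees(math.acos(max(-1.0, min(1.0, n[1] / length))))

def azimuth_deg(n):
    """Heading of the normal's horizontal (x, z) projection."""
    return math.degrees(math.atan2(n[0], n[2]))

def angle_errors(estimated, truth):
    """Absolute inclination and azimuth differences in degrees."""
    inc = abs(inclination_deg(estimated) - inclination_deg(truth))
    az = abs(azimuth_deg(estimated) - azimuth_deg(truth)) % 360.0
    return inc, min(az, 360.0 - az)  # wrap the azimuth difference into [0, 180]
```

Under this reading, the azimuth of a nearly vertical normal is poorly conditioned, since its horizontal projection is tiny, so small tilts can produce large azimuth differences.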
Player Tagging
[Figure panels: Software BGR, Hardware BGR]
The BGR software partitions the foreground mask into player masks.
The BGR hardware outputs a single foreground mask. All foreground pixels and their resulting centroids are labeled ‘player 1’.
Player Tagging
Model Fitting requires that body parts be assigned to individual players.
Pre-ModelFit partitions the centroids by player using a heuristic.
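The slides do not describe the Pre-ModelFit heuristic. Purely to show the shape of the problem (one set of centroids all labeled ‘player 1’ goes in, one list per player comes out), here is a hypothetical gap-based rule: sort centroids by horizontal position and start a new player wherever two neighbors are farther apart than a threshold. The rule, the `gap` value, and `partition_by_gap` are all assumptions, not the authors' heuristic.

```python
def partition_by_gap(centroids, gap=0.5):
    """Hypothetical heuristic: sort (x, y, z) centroids by x and split into
    players wherever consecutive centroids are more than `gap` apart in x."""
    players = []
    for c in sorted(centroids, key=lambda p: p[0]):
        if players and c[0] - players[-1][-1][0] <= gap:
            players[-1].append(c)   # close enough: same player
        else:
            players.append([c])     # gap found: start a new player
    return players
```

A rule this simple fails when players overlap horizontally, which is one reason the choice of heuristic matters for the seated and multi-player test suites.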
Results
[Figure panels: Input Depth, SW BGR, Forest Fire BGR, Forest Fire Floor, Skeleton]
Results
[Figure panels: Baseline, Hardware (Two Stage), Ground Truth]
Results
[Chart: percentage difference from ground truth (y axis, 0 to 90) for Suite 94*, Suite 97**, and Suite 111***; series: Baseline, One Stage, Two Stage (Depth 16), Two Stage (Depth 10)]
*Standing, with varied scenes; **standing, with similar scenes; ***seated, with furniture.
Power Comparison
Platform w/ Kinect | Power (W)
PC | 162.6
Xbox | 92.1
Xilinx ML605 | 25.7
Digilent ZedBoard | 9.0
The Xbox powers the Kinect sensor via USB; with the other platforms, the sensor uses an external power supply. The Kinect sensor alone draws 3.5 W, and this has been added to those platforms’ system power in the table above.
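A quick way to read the power table is to express each platform as a multiple of the lowest-power one and to check how much of that total is the sensor. This arithmetic sketch uses only the figures stated above; the helper name is hypothetical.

```python
KINECT_W = 3.5  # sensor-only draw stated above

# Whole-system power with the Kinect attached (W), from the table above.
power_w = {
    "PC": 162.6,
    "Xbox": 92.1,
    "Xilinx ML605": 25.7,
    "Digilent ZedBoard": 9.0,
}

def relative_to(baseline, table):
    """Each platform's power as a multiple of `baseline`'s power."""
    return {name: watts / table[baseline] for name, watts in table.items()}

ratios = relative_to("Digilent ZedBoard", power_w)
sensor_share = KINECT_W / power_w["Digilent ZedBoard"]  # fraction of the ZedBoard total

for name, r in ratios.items():
    print(f"{name}: {power_w[name]:.1f} W, {r:.1f}x the ZedBoard")
print(f"Kinect share of the ZedBoard total: {sensor_share:.0%}")
```

On these numbers the PC draws about 18.1x, the Xbox about 10.2x, and the ML605 about 2.9x the ZedBoard total, and the sensor accounts for roughly 39% of the ZedBoard figure.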
FPGA Utilization – One Stage
Module | LUTs | % | FFs | % | BRAMs | %
Full System | 37470 | 24.9 | 31796 | 10.6 | 27 | 6.49
Forest Fire Core | 30129 | 20.0 | 23198 | 7.69 | 5 | 1.2
Sorting FIFO | 425 | 0.28 | 428 | 0.14 | 2 | 0.48
DDR3 Controller | 5488 | 3.64 | 7496 | 2.49 | 0 | 0.0
PC Interface | 1852 | 1.23 | 1101 | 0.37 | 22 | 5.29
Input Buffer | 76 | 0.05 | 42 | 0.01 | 11 | 2.64
Output Buffer | 0 | 0.0 | 0 | 0.0 | 8 | 1.92
Using Xilinx Virtex-6 LX240T (xc6vlx240t-1ff1156)
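The percentage columns are resource counts divided by the device's capacity. The sketch below takes the XC6VLX240T capacities from the Virtex-6 family data (37,680 slices, each with four LUTs and eight flip-flops, plus 416 36-Kb block RAMs) and recomputes the Full System rows of both utilization tables; `utilization` is a hypothetical helper.

```python
# XC6VLX240T capacities: 37,680 slices x 4 LUTs, 37,680 slices x 8 flip-flops,
# and 416 36-Kb block RAMs.
CAPACITY = {"LUT": 37_680 * 4, "FF": 37_680 * 8, "BRAM": 416}

def utilization(used):
    """Percentage of each device resource consumed by a design."""
    return {res: 100.0 * count / CAPACITY[res] for res, count in used.items()}

one_stage = utilization({"LUT": 37470, "FF": 31796, "BRAM": 27})
two_stage = utilization({"LUT": 67236, "FF": 55405, "BRAM": 32})
for name, pct in (("One Stage", one_stage), ("Two Stage", two_stage)):
    print(name, {res: round(p, 2) for res, p in pct.items()})
```

These reproduce the tables' LUT and BRAM percentages; the one-stage FF figure computes to about 10.55%.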
FPGA Utilization – Two Stage
Module | LUTs | % | FFs | % | BRAMs | %
Full System | 67236 | 44.6 | 55405 | 18.4 | 32 | 7.69
Forest Fire Two Instances | 60611 | 40.2 | 46614 | 15.5 | 10 | 2.4
Forest Fire Core0 | 29888 | 19.8 | 23275 | 7.72 | 5 | 1.2
Forest Fire Core1 | 29773 | 19.8 | 23337 | 7.74 | 5 | 1.2
DDR3 Controller | 4746 | 3.15 | 7686 | 2.55 | 0 | 0.0
PC Interface | 1876 | 1.24 | 1101 | 0.37 | 22 | 5.29
Input Buffer | 76 | 0.05 | 42 | 0.01 | 11 | 2.64
Output Buffer | 0 | 0.0 | 0 | 0.0 | 8 | 1.92
Using Xilinx Virtex-6 LX240T (xc6vlx240t-1ff1156)
Thank You. Questions?
Neil Pittman, [email protected]
Alessandro Forin, [email protected]
Demo Night
Please come see our Hand Tracking Demo.
BACKUP
One Stage
Database: 32 classes, 3 trees, 20 levels.
Input: raw depth image.
Output: pixels tagged with body parts and floor.
Average performance: ≈ 56 fps.
Two Stage, Core0
Database: 3 classes, 3 trees, 16 levels.
Input: raw depth image.
Output: pixels tagged with Player, Non-Player, and Floor.
Average performance: ≈ 200+ fps if subsampled.
Two Stage, Core1
Database: 31 classes, 3 trees, 20 levels.
Input: depth image filtered by player tags.
Output: pixels tagged with body parts.
Average performance: ≈ 200+ fps.
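Tree depth largely determines how large a forest database can get. An upper-bound sketch, assuming each tree is a complete binary tree with the stated number of levels, so a tree has at most 2**levels - 1 nodes (trained trees are usually sparser, so actual databases are smaller); `max_nodes` is a hypothetical helper.

```python
def max_nodes(trees: int, levels: int) -> int:
    """Upper bound on node count for `trees` complete binary trees of `levels` levels."""
    return trees * (2 ** levels - 1)

databases = {
    "One Stage": (3, 20),
    "Two Stage, Core0": (3, 16),
    "Two Stage, Core1": (3, 20),
}
for name, (trees, levels) in databases.items():
    print(f"{name}: at most {max_nodes(trees, levels):,} nodes")
```

Each level removed halves the bound, so the 16-level Core0 database is at most about 1/16 the size of a 20-level one.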