Real-time Signal Processing on Embedded Systems Advanced Cutting-edge Research Seminar I&III

Real-time Signal Processing on Embedded Systems

Feb 23, 2016


neka

Real-time Signal Processing on Embedded Systems. Advanced Cutting-edge Research Seminar I&III. Practical Applications: Pedestrian Detection (FPGA-based system), Pedestrian Tracking (GPU-based system). Hardware Architecture for High-Accuracy Real-Time Pedestrian Detection with CoHOG Features.
Transcript
Page 1: Real-time Signal Processing on Embedded Systems

Real-time Signal Processing on Embedded Systems

Advanced Cutting-edge Research Seminar I&III

Page 2: Real-time Signal Processing on Embedded Systems

Practical Applications
  Pedestrian Detection: FPGA-based system
  Pedestrian Tracking: GPU-based system

Page 3: Real-time Signal Processing on Embedded Systems

Hardware Architecture for High-Accuracy Real-Time Pedestrian Detection with CoHOG Features

Page 4: Real-time Signal Processing on Embedded Systems

Outline
  Introduction
  Pedestrian detection using CoHOG features
  Proposed hardware architecture
    Parallel execution
    Merging histogram calculation and SVM prediction
  FPGA implementation
  Conclusion

Page 6: Real-time Signal Processing on Embedded Systems

Pedestrian detection on automotive systems

Challenges:
  Various appearances of pedestrians (clothing shape and color, pose, etc.)
    Template-based or simple gradient-based methods do not achieve high-accuracy recognition.
  Viewpoint movement (all objects in an image are moving)
    Background subtraction or frame subtraction cannot be used.

A robust recognition method suitable for pedestrians is required.

Page 7: Real-time Signal Processing on Embedded Systems

Pedestrian detection algorithms

Recent trend: combination of gradients and histograms
  Gradient: robust to illumination and color changes
  Histogram: robust to deformation

Examples
  Histograms of oriented gradients (HOG)
  Co-occurrence histograms of oriented gradients (CoHOG)*
    HOG-based method using pairs of oriented gradients
    One of today's best algorithms for pedestrian detection
    However, real-time execution is difficult to achieve with a software implementation (e.g., a few seconds are required to process a 320x240 image).

Specialized hardware is required for real-time processing.

* T. Watanabe, S. Ito, and K. Yokoi, "Co-occurrence histograms of oriented gradients for pedestrian detection," PSIVT2009

Page 8: Real-time Signal Processing on Embedded Systems

Outline
  Introduction
  Pedestrian detection using CoHOG features
  Proposed hardware architecture
    Parallel execution
    Merging histogram calculation and SVM prediction
  FPGA implementation
  Conclusion

Page 10: Real-time Signal Processing on Embedded Systems

Pedestrian detection using CoHOG

  1. Calculate gradient orientations.
  2. Pick up pairwise pixels.
  3. Divide the gradient orientation image into small regions (blocks).
  4. Calculate co-occurrence histograms of oriented gradients.
  5. Repeat for various positions of pixel pairs (called offsets; 31 offset variations, e.g., Offset 1, Offset 2).
  6. The histograms form the CoHOG feature vector.
  7. The feature vector is classified by an SVM.
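The co-occurrence counting step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `cooccurrence_histogram`, the tiny example block, and the choice to skip pairs whose partner pixel falls outside the block are all assumptions made for brevity. The 8 orientation bins follow the 8x8 matrix size stated in the slides.

```python
def cooccurrence_histogram(orient, offset, n_bins=8):
    """Count orientation pairs (i, j) in one block for one offset.

    orient: 2-D list of quantized gradient orientations (0 .. n_bins-1).
    offset: (dx, dy) displacement from a pixel to its paired pixel.
    Assumption: pairs whose partner lies outside the block are skipped.
    """
    dx, dy = offset
    height, width = len(orient), len(orient[0])
    hist = [[0] * n_bins for _ in range(n_bins)]
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                hist[orient[y][x]][orient[y2][x2]] += 1
    return hist


# Hypothetical 2x3 block of quantized orientations.
block = [
    [0, 1, 1],
    [2, 1, 0],
]
hist = cooccurrence_histogram(block, offset=(1, 0))
pair_count = sum(sum(row) for row in hist)
```

Repeating this call for every block and every offset, and concatenating the resulting 8x8 matrices, would give a feature vector of the kind described above.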

Page 11: Real-time Signal Processing on Embedded Systems

Detection procedure

Sliding window approach
  Feature vectors are extracted in scan-line order.
  The image size or window size is scaled to detect pedestrians at other scales.
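A scan-line sliding window can be sketched as a generator of window positions. The function name `sliding_windows`, the window size, and the step are hypothetical values chosen for illustration; the slides do not state them.

```python
def sliding_windows(img_w, img_h, win_w, win_h, step):
    """Yield (left, top) window positions in scan-line order."""
    for top in range(0, img_h - win_h + 1, step):
        for left in range(0, img_w - win_w + 1, step):
            yield left, top


# Hypothetical sizes: an 8x6 image scanned with a 4x4 window, step 2.
positions = list(sliding_windows(8, 6, 4, 4, 2))
```

To detect pedestrians at another scale, the same scan would be repeated on a resized image or with a resized window.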

Page 12: Real-time Signal Processing on Embedded Systems

Outline
  Introduction
  Pedestrian detection using CoHOG features
  Proposed hardware architecture
    Parallel execution
    Merging histogram calculation and SVM prediction
  FPGA implementation
  Conclusion

Page 13: Real-time Signal Processing on Embedded Systems

Parallel execution of CoHOG feature calculation

A large number of co-occurrence histograms must be calculated. → All histograms can be calculated in parallel.

Available parallelism
  Offsets: 31 parallel threads (offset variations: 31)
  Blocks: 6 parallel threads horizontally, 12 parallel threads vertically (block number: 6x12=72)
  → Large parallelism

We execute 31 offsets x 6 horizontal block threads = 186 threads in parallel.

Processing performance is drastically improved!

Page 14: Real-time Signal Processing on Embedded Systems

Merging histogram calculation and SVM prediction

The dimension of the CoHOG feature vector is very high:
  64 bins per histogram (matrix size: 8x8=64) x 31 offset variations x 72 blocks (6x12=72) = about 140k dimensions
  A large memory is required to store the feature vector.
  Many multiplications must be executed during SVM prediction, $f(x) = \mathrm{sign}(w \cdot x + b)$.

Our proposal: execute histogram calculation and SVM prediction simultaneously.
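The dimension count and the degree of parallelism quoted in the slides can be checked with a short calculation. The variable names are illustrative only.

```python
# Figures taken from the slides.
BINS_PER_HISTOGRAM = 8 * 8      # 8x8 co-occurrence matrix
OFFSETS = 31                    # offset variations
BLOCKS_HORIZONTAL = 6
BLOCKS_VERTICAL = 12

blocks = BLOCKS_HORIZONTAL * BLOCKS_VERTICAL
feature_dimensions = BINS_PER_HISTOGRAM * OFFSETS * blocks
parallel_threads = OFFSETS * BLOCKS_HORIZONTAL

print(feature_dimensions)  # 142848, i.e. about 140k dimensions
print(parallel_threads)    # 186
```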

Page 15: Real-time Signal Processing on Embedded Systems

Merging histogram calculation and SVM prediction

Straightforward approach
  1. Scan the image and add +1 to the corresponding bin. → The histogram is generated.
  2. Multiply each bin by its weighting vector value $w_{i,j}$ and sum the products. → The inner product is calculated for SVM prediction.

Histogram calculation:
$$x_{i,j} = \sum_{\text{image}} \begin{cases} 1, & \text{if orientations are } (i,j) \\ 0, & \text{otherwise} \end{cases}$$

SVM prediction:
$$w \cdot x = \sum_{i} \sum_{j} w_{i,j}\, x_{i,j}$$

Page 16: Real-time Signal Processing on Embedded Systems

Merging histogram calculation and SVM prediction

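Proposed method
  Scan the image and directly accumulate the weighting vector values $w_{i,j}$.

Histogram calculation:
$$x_{i,j} = \sum_{\text{image}} \begin{cases} 1, & \text{if orientations are } (i,j) \\ 0, & \text{otherwise} \end{cases}$$

SVM prediction:
$$w \cdot x = \sum_{i} \sum_{j} w_{i,j}\, x_{i,j} = \sum_{i} \sum_{j} w_{i,j} \sum_{\text{image}} \begin{cases} 1, & \text{if orientations are } (i,j) \\ 0, & \text{otherwise} \end{cases} = \sum_{\text{image}} \sum_{i} \sum_{j} \begin{cases} w_{i,j}, & \text{if orientations are } (i,j) \\ 0, & \text{otherwise} \end{cases}$$

Circuit size can be drastically reduced!
  A large memory to store histograms and many multipliers for SVM prediction are unnecessary.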
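The equivalence of the two approaches can be illustrated with a small Python sketch. It is a software analogy under stated assumptions, not the hardware design: the function names, the integer weight table, and the example orientation pairs are hypothetical, and the bias term b is omitted.

```python
def svm_score_straightforward(pairs, weights, n_bins=8):
    """Build the histogram first, then take the inner product with the weights."""
    hist = [[0] * n_bins for _ in range(n_bins)]
    for i, j in pairs:
        hist[i][j] += 1                      # +1 to the corresponding bin
    return sum(weights[i][j] * hist[i][j]    # one multiplication per bin
               for i in range(n_bins) for j in range(n_bins))


def svm_score_merged(pairs, weights):
    """Accumulate the weight of each observed pair directly."""
    acc = 0
    for i, j in pairs:
        acc += weights[i][j]                 # no histogram memory, no multiplier
    return acc


# Hypothetical integer weights and orientation pairs seen while scanning.
weights = [[i * 8 + j - 30 for j in range(8)] for i in range(8)]
pairs = [(0, 1), (1, 1), (2, 1), (1, 0), (1, 1)]

score_a = svm_score_straightforward(pairs, weights)
score_b = svm_score_merged(pairs, weights)
```

Both functions return the same score; the merged version needs only a weight lookup and an accumulator per pair, which mirrors the weighting vector ROMs and accumulator named in the slides.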

Page 17: Real-time Signal Processing on Embedded Systems

Proposed architecture

[Block diagram: proposed architecture]
  Input image
  Line buffers
  Gradient orientation image generator: Sobel filter (horizontal), Sobel filter (vertical), orientation classifier
  Frame buffer (WxH)
  Controller
  Sub-window data
  Combined module for histogram calculation and SVM prediction (31 offsets x 6 blocks): shift registers, weighting vector ROMs, accumulator
  Results

Page 18: Real-time Signal Processing on Embedded Systems

Proposed architecture

Parallel execution
  31 offsets x 6 blocks = 186 parallel threads

Merging histogram calculation and SVM prediction
  No histogram memory or multipliers
  Only weighting vector ROMs and an accumulator

An efficient hardware architecture is successfully designed by using the proposed methods.

Page 19: Real-time Signal Processing on Embedded Systems

Outline
  Introduction
  Pedestrian detection using CoHOG features
  Proposed hardware architecture
    Parallel execution
    Merging histogram calculation and SVM prediction
  FPGA implementation
  Conclusion

Page 20: Real-time Signal Processing on Embedded Systems

FPGA implementation

Implementation result
Target FPGA: Xilinx Virtex-5 XC5VLX330T-2

Resource                     Used      Available   Utilization
Number of Slice Registers    5,980     207,360     2%
Number of Slice LUTs         28,495    207,360     13%
Number of occupied Slices    8,580     51,840      16%
Number of BlockRAM           61        324         18%
Total Memory used (KB)       2,196     11,664      18%
Number of DSP48Es            2         192         1%

Max delay: 5.997 ns (max frequency: 167 MHz)

Our system can process 139,166 sub-windows / second.
Intel Core i7 3.2 GHz: about 1,100 sub-windows / second
→ More than 100 times faster!

Capable of real-time processing of a 320x240 video sequence at 38 fps.
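The reported figures can be cross-checked with simple arithmetic. The sub-windows-per-frame value below is derived from the stated throughput and frame rate; it is not given in the slides and should be read as an estimate.

```python
# Figures taken from the slides.
max_delay_ns = 5.997
fpga_windows_per_s = 139_166
cpu_windows_per_s = 1_100
frames_per_s = 38

max_frequency_mhz = 1000.0 / max_delay_ns              # about 166.75 MHz
speedup = fpga_windows_per_s / cpu_windows_per_s       # about 126.5x
windows_per_frame = fpga_windows_per_s / frames_per_s  # about 3,662 (derived)

print(round(max_frequency_mhz, 2), round(speedup, 1), round(windows_per_frame))
```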

Page 21: Real-time Signal Processing on Embedded Systems

Pedestrian detection system

FPGA board
  Receives input images from the host PC and returns pedestrian detection results
  Xilinx Virtex-5 FPGA LX330T
  PCI Express endpoint
  DDR2 memory

Host PC
  Transfers images captured by a camera and displays detection results
  CPU: Intel Core i7 3.2 GHz
  Camera: USB webcam (640x480 resolution)

[Figure: FPGA board and host PC connected via PCI Express; detection result]

Page 22: Real-time Signal Processing on Embedded Systems

Outline
  Introduction
  Pedestrian detection using CoHOG features
  Proposed hardware architecture
    Parallel execution
    Merging histogram calculation and SVM prediction
  FPGA implementation
  Conclusion

Page 23: Real-time Signal Processing on Embedded Systems

Conclusion

A high-performance and efficient hardware architecture for CoHOG-based pedestrian detection is proposed. It:
  Effectively exploits parallelism in the CoHOG algorithm → 186-way parallel processing is realized
  Drastically reduces circuit area (memory and multipliers) by executing histogram calculation and SVM prediction simultaneously
  Achieves more than 100 times faster processing with the FPGA implementation than with a CPU → capable of real-time processing of a 320x240 video sequence at 38 fps

Page 24: Real-time Signal Processing on Embedded Systems

Parallel Implementation of Pedestrian Tracking Using Multiple Cues on GPGPU

Page 25: Real-time Signal Processing on Embedded Systems

Outline
  Introduction
  Pedestrian Tracking using Multiple Cues
  Parallel Implementation on NVIDIA GPU
  Conclusion

Page 27: Real-time Signal Processing on Embedded Systems

Introduction

Pedestrian recognition: combination of 2 steps
  Detection: scan the entire image
  Tracking: track the pedestrians over the frames

[Figure: input image, detection, and tracking]

Page 28: Real-time Signal Processing on Embedded Systems

Introduction

Pedestrian tracking
  Particle filter
  HSV color histogram (K. Okuma et al., ECCV2004)
    The HSV histogram is calculated within the rectangle.
    Simple background: tracking succeeds
    Complex background: tracking fails

Page 29: Real-time Signal Processing on Embedded Systems

Introduction

Color information
  [Figure: HSV histograms of two regions: a red shirt on gray ground and a red car on gray ground]

Shape information

→ Combining both color and shape information

Page 30: Real-time Signal Processing on Embedded Systems

Introduction

The contributions of this paper
  New pedestrian tracking algorithm based on particle filters, using both color and shape information
  Parallel implementation on GPGPU for real-time processing

Page 31: Real-time Signal Processing on Embedded Systems

Outline
  Introduction
  Pedestrian Tracking using Multiple Cues
  Parallel Implementation on NVIDIA GPU
  Conclusion

Page 32: Real-time Signal Processing on Embedded Systems

Particle Filter (pedestrian tracking)

Current frame (time t-1): particles
  1. Prediction: scatter particles.
  2. Measurement: measure the pedestrian likelihood.
  3. Re-sampling (time t): eliminate low-likelihood particles and replicate high-likelihood particles.
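One prediction, measurement, and re-sampling cycle can be sketched as follows. This is a simplified 1-D illustration under stated assumptions: the state is a single position, prediction adds Gaussian noise, and the likelihood is a hypothetical Gaussian around a known target rather than the pedestrian likelihood used in the slides. The function name `particle_filter_step` and all parameter values are assumptions.

```python
import math
import random


def particle_filter_step(particles, likelihood, noise_std, rng):
    """Run one prediction / measurement / re-sampling cycle on 1-D particles."""
    # Prediction: scatter particles with random motion noise.
    predicted = [p + rng.gauss(0.0, noise_std) for p in particles]
    # Measurement: evaluate the likelihood of each particle.
    weights = [likelihood(p) for p in predicted]
    # Re-sampling: draw with replacement in proportion to the likelihood, so
    # low-likelihood particles tend to vanish and high-likelihood ones replicate.
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)
target = 5.0  # hypothetical true position


def toy_likelihood(p):
    # Assumption: Gaussian likelihood around the target; always positive.
    return math.exp(-0.5 * (p - target) ** 2)


particles = [rng.uniform(0.0, 10.0) for _ in range(200)]
for _ in range(20):
    particles = particle_filter_step(particles, toy_likelihood, 0.3, rng)
estimate = sum(particles) / len(particles)
```

After a few cycles the particles concentrate around the position where the likelihood is highest, and their mean serves as the tracked estimate.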

Page 33: Real-time Signal Processing on Embedded Systems

Particle Filter (pedestrian tracking)

Current frame (particles) → Prediction → Measurement → Re-sampling

To define the pedestrian likelihood, we use:
  Shape information: HOG feature
  Color information: HSV histogram

Page 34: Real-time Signal Processing on Embedded Systems

Histograms of Oriented Gradients (HOG)

Represent object shape information
  1. Calculate the gradient orientation.
  2. Aggregate the gradient orientations of each block.
  3. Map the vector onto the feature space.
  The discriminant border is learned beforehand by SVM.

[Figure: HOG feature space with pedestrian and non-pedestrian regions separated by the discriminant border]

Page 35: Real-time Signal Processing on Embedded Systems

HSV Histogram

Represent object color information
  1. Convert an input image into an HSV image.
  2. Calculate an HSV histogram.
  3. Calculate the Bhattacharyya distance to the reference HSV histogram.

HSV color space: Hue, Saturation, Value

[Figure: input image, HSV histogram, and HSV feature space with the Bhattacharyya distance to the reference HSV histogram]
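The histogram comparison can be sketched with the Bhattacharyya coefficient. The slides do not say which form of the distance is used; this sketch assumes the common form d = sqrt(1 - BC) on normalized histograms, and the function names and example bin counts are hypothetical.

```python
import math


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def bhattacharyya_distance(p, q):
    """Distance between two normalized histograms, assuming d = sqrt(1 - BC)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))              # clamp tiny negative rounding


# Hypothetical 3-bin histograms.
reference = normalize([2, 2, 4])
same = normalize([1, 1, 2])
disjoint_a = normalize([1, 0, 0])
disjoint_b = normalize([0, 0, 1])

d_same = bhattacharyya_distance(reference, same)
d_disjoint = bhattacharyya_distance(disjoint_a, disjoint_b)
```

A distance near 0 means the candidate region's color distribution matches the reference; a distance of 1 means the two histograms share no bins.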

Page 36: Real-time Signal Processing on Embedded Systems

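Pedestrian tracking using multiple cues

Pedestrian likelihood (evaluated in the measurement step):
$$c \cdot f(\mathrm{HOG}) + (1 - c) \cdot g(\mathrm{HSV})$$
  $c$: weighting coefficient in [0, 1]
  The HSV term corresponds to the existing algorithm.

[Figure: prediction and measurement; HOG feature space (pedestrian / non-pedestrian) and HSV feature space (reference HSV hist.)]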
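The weighted combination of the two cues can be written as a one-line function. The name `pedestrian_likelihood` and the example scores are hypothetical; how the HOG and HSV scores themselves are scaled is not specified in the slides.

```python
def pedestrian_likelihood(hog_score, hsv_score, c):
    """Blend shape (HOG) and color (HSV) scores with a weight c in [0, 1]."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return c * hog_score + (1.0 - c) * hsv_score


# Hypothetical scores for one particle.
blended = pedestrian_likelihood(0.8, 0.4, c=0.5)
hog_only = pedestrian_likelihood(0.8, 0.4, c=1.0)
hsv_only = pedestrian_likelihood(0.8, 0.4, c=0.0)
```

Setting c to 1 or 0 reduces the measure to the HOG-only or HSV-only case.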

Page 37: Real-time Signal Processing on Embedded Systems

Tracking results
  HOG+HSV (our proposed algorithm)
  HSV only (K. Okuma et al., ECCV2004)
  HOG only

Page 38: Real-time Signal Processing on Embedded Systems

Outline
  Introduction
  Pedestrian Tracking using Multiple Cues
  Parallel Implementation on NVIDIA GPU
  Conclusion

Page 39: Real-time Signal Processing on Embedded Systems

NVIDIA GPU architecture
  Streaming multiprocessors (SM)
    32-bit scalar processors (SP)
    Shared memory
    Read-only cache
  Device memory

In the case of Tesla C1060:
  4 GB device memory
  30 streaming multiprocessors (240 SPs in total)
  1.3 GHz processor clock

[Figure: several SMs, each with 8 SPs, shared memory, and a cache, connected to device memory]

Page 40: Real-time Signal Processing on Embedded Systems

Implementation strategy

Run the measurement process on the GPU.
  It accounts for almost 99% of the computation time.

[Figure: current frame, prediction, measurement, and re-sampling, with the measurement step mapped onto the GPU (SMs and device memory)]

Page 41: Real-time Signal Processing on Embedded Systems

Implementation strategy

Allocate each particle to an SM.
  Each particle is processed independently.

Page 42: Real-time Signal Processing on Embedded Systems

Implementation strategy

Exploit pixel-level parallelism on SPs.
  Synchronization among SPs is fast.

Page 43: Real-time Signal Processing on Embedded Systems

HSV likelihood calculation
  1. Allocate each particle calculation to an SM.
  2. Calculate the HSV histogram on SPs per line.
  3. Sum all the histograms.
  4. Calculate the Bhattacharyya distance to the reference HSV histogram.
  5. Transfer the results to the CPU memory.
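The per-line histogram followed by a summation can be emulated sequentially in Python. This only illustrates the decomposition; on the GPU each line would be handled by a separate SP, which this sketch does not model. The function names and the example data are hypothetical, and pixels are assumed to be already quantized to histogram bin indices.

```python
def line_histograms(bin_image, n_bins):
    """Build one partial histogram per image line (one line per SP on the GPU)."""
    partials = []
    for line in bin_image:
        hist = [0] * n_bins
        for b in line:
            hist[b] += 1
        partials.append(hist)
    return partials


def sum_histograms(partials):
    """Reduce the per-line partial histograms into one histogram."""
    return [sum(column) for column in zip(*partials)]


# Hypothetical 2x3 region whose pixels are already mapped to 3 bins.
region = [
    [0, 1, 1],
    [2, 1, 0],
]
partials = line_histograms(region, n_bins=3)
total = sum_histograms(partials)
```

Keeping a private histogram per line avoids write conflicts between workers; the conflicts are resolved once in the final summation.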

Page 44: Real-time Signal Processing on Embedded Systems

HOG likelihood calculation
  1. Allocate each particle calculation to an SM.
  2. Calculate the gradient and angle on SPs.
  3. Calculate the HOG histogram on SPs per some pixels.
  4. Sum the histograms.
  5. Calculate the distance to the discriminant border.
  6. Transfer the results to the CPU memory.

[Figure: HOG feature space with pedestrian and non-pedestrian regions separated by the discriminant border]

Page 45: Real-time Signal Processing on Embedded Systems

Processing time

GPU: NVIDIA Tesla C1060
  Number of multiprocessors: 30
  Total number of scalar processors: 240
Compared with an Intel Core i7 965 @ 3.2 GHz

[Bar chart: processing time per frame [ms], Core i7 vs. Tesla C1060; axis from 0 to 140]

Result: 13.9 times faster, 113.6 fps

Page 46: Real-time Signal Processing on Embedded Systems

Outline
  Introduction
  Pedestrian Tracking using Multiple Cues
  Parallel Implementation on NVIDIA GPU
  Conclusion

Page 47: Real-time Signal Processing on Embedded Systems

Conclusion
  A pedestrian tracking algorithm using HSV and HOG features is proposed.
  Real-time processing can be achieved by the parallel implementation on an NVIDIA GPU.

Page 48: Real-time Signal Processing on Embedded Systems

Report subject (not mandatory)

What do you think about the advance of signal processing on embedded systems in the future?
  Please submit the report by email to [email protected].
  Please write your student ID and name.
  Deadline: Feb 3rd 17:00

Page 49: Real-time Signal Processing on Embedded Systems

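Report assignment (not mandatory)

Freely discuss the future of signal processing on embedded systems (applications, things you would like to do, or anything else is OK).
  Submit to: [email protected]
  Clearly state your ID and name in the body of the email.
  Deadline: 2/3 17:00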