Fisheye Lens Distortion Correction on Multicore and Hardware Accelerator Platforms

Konstantis Daloukas (1), Christos D. Antonopoulos (1), Nikolaos Bellas (1), Sek M. Chai (2)

(1) Department of Computer and Communications Engineering, University of Thessaly, Volos, Greece
(2) Motorola Inc., Schaumburg, IL, USA


April 20, 2010, IPDPS 2010

Introduction

Wide-angle lenses (a.k.a. fisheye lenses) are traditionally used to enlarge the field of view in photography

A. Conventional rectilinear lens

B. Full-frame fisheye lens: 98 degrees horizontal by 147 degrees vertical

C. Full circular fisheye lens: 180 degrees horizontal and vertical


Introduction

• Main Applications

– Meteorology

– Astronomy

– Robot Navigation

– Video Surveillance

– Video Conferencing

– Digital Cameras

• The incoming rays are mapped onto a spherical surface

• Such mapping introduces barrel distortion


Motivation

• Explore the mapping of the algorithm’s inherent parallelism on three contemporary platforms:

– x86 Chip Multiprocessor (Core 2 Quad)

– Cell B.E. processor

– Virtex-4 FPGA

• Present a detailed characterization of the performance using both high- and low-level metrics


Outline

• Introduction

• Wide-angle Lenses Distortion Correction Algorithm

• Description of Target Platforms

• Algorithm Optimizations

• Performance Evaluation

• Conclusions


Wide-angle Lenses Distortion Correction

Transformation of the distorted wide-angle images back to the central perspective space.


Projection Model of Wide-angle Lenses

[Figure: wide-angle projection compared with central perspective projection]


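Algorithmic Flow (A)

• Inverse Mapping: maps each image point (i, j) to the corresponding point (x, y) in the wide-angle space

$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
=
\begin{bmatrix}
r_{11} & r_{12} & r_{13} \\
r_{21} & r_{22} & r_{23} \\
r_{31} & r_{32} & r_{33}
\end{bmatrix}
\begin{bmatrix} i \\ j \\ 1 \end{bmatrix}
$$

$$
x = \frac{R}{d_x} \cdot \frac{\operatorname{atan}\!\left( \sqrt{X_c^2 + Y_c^2} \,/\, Z_c \right)}{\sqrt{1 + Y_c^2 / X_c^2}} + h_x
$$

$$
y = \frac{R}{d_y} \cdot \frac{\operatorname{atan}\!\left( \sqrt{X_c^2 + Y_c^2} \,/\, Z_c \right)}{\sqrt{1 + X_c^2 / Y_c^2}} + h_y
$$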
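The inverse mapping step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The slides do not define R, d_x, d_y, h_x and h_y, so the sketch assumes R is a radial scale factor, d_x and d_y are pixel dimensions, and h_x and h_y are the image-center offsets. The function name `inverse_map` is hypothetical. The code uses X_c / sqrt(X_c^2 + Y_c^2), which equals 1 / sqrt(1 + Y_c^2 / X_c^2) for positive X_c and avoids a division by zero, and `atan2`, which matches atan of the ratio for positive Z_c.

```python
import math

def inverse_map(i, j, r, R, dx, dy, hx, hy):
    """Map output pixel (i, j) to fractional wide-angle coordinates (x, y).

    r is a 3x3 matrix (list of rows); all other parameters are scalars whose
    meaning is an assumption of this sketch, not stated in the slides.
    """
    # Transform the homogeneous pixel position (i, j, 1) with the 3x3 matrix.
    Xc = r[0][0] * i + r[0][1] * j + r[0][2]
    Yc = r[1][0] * i + r[1][1] * j + r[1][2]
    Zc = r[2][0] * i + r[2][1] * j + r[2][2]

    rho = math.hypot(Xc, Yc)          # sqrt(Xc^2 + Yc^2)
    if rho == 0.0:
        return hx, hy                 # point on the optical axis

    theta = math.atan2(rho, Zc)       # angle between the ray and the axis
    x = (R / dx) * theta * (Xc / rho) + hx
    y = (R / dy) * theta * (Yc / rho) + hy
    return x, y
```

The three special functions in this step (arctangent, square root, reciprocal) are the ones that make inverse mapping expensive per pixel.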


Algorithmic Flow (A)

• Need to approximate the value of fractional positions in the fisheye space

• Complex memory access pattern


Algorithmic Flow (B)

• Bicubic Interpolation: uses a 4x4 window of pixels to approximate intermediate points


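Algorithmic Flow (B)

• Bicubic interpolation is broken into horizontal and vertical 1D interpolation

• C_i are the pixel values

Horizontal interpolation (offset s):

$$
g(x) = C_1 U_1(s) + C_2 U_2(s) + C_3 U_3(s) + C_4 U_4(s)
$$

$$
\begin{aligned}
U_1(s) &= (-s^3 + 2s^2 - s)/2 \\
U_2(s) &= (3s^3 - 5s^2 + 2)/2 \\
U_3(s) &= (-3s^3 + 4s^2 + s)/2 \\
U_4(s) &= (s^3 - s^2)/2
\end{aligned}
$$

Vertical interpolation (offset t):

$$
G(x, y) = g_1(x) V_1(t) + g_2(x) V_2(t) + g_3(x) V_3(t) + g_4(x) V_4(t)
$$

$$
\begin{aligned}
V_1(t) &= (-t^3 + 2t^2 - t)/2 \\
V_2(t) &= (3t^3 - 5t^2 + 2)/2 \\
V_3(t) &= (-3t^3 + 4t^2 + t)/2 \\
V_4(t) &= (t^3 - t^2)/2
\end{aligned}
$$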
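The separable scheme can be sketched as follows. This is an illustrative sketch rather than the code used in the study. It assumes s and t are the horizontal and vertical fractional offsets in [0, 1) and that the 4x4 window is given as four rows of four pixel values. The function names `cubic_weights` and `bicubic` are hypothetical.

```python
def cubic_weights(s):
    """Return the four 1D weights (U1..U4) for a fractional offset s."""
    s2 = s * s
    s3 = s2 * s
    return (
        (-s3 + 2 * s2 - s) / 2,
        (3 * s3 - 5 * s2 + 2) / 2,
        (-3 * s3 + 4 * s2 + s) / 2,
        (s3 - s2) / 2,
    )

def bicubic(window, s, t):
    """Interpolate inside a 4x4 window (list of 4 rows of 4 pixel values)."""
    u = cubic_weights(s)   # horizontal weights
    v = cubic_weights(t)   # vertical weights, same polynomials
    # Horizontal pass: one 1D interpolation per row of the window.
    g = [sum(c * w for c, w in zip(row, u)) for row in window]
    # Vertical pass: combine the four row results.
    return sum(gk * w for gk, w in zip(g, v))
```

Splitting the 2D interpolation into two 1D passes needs 4 x 4 + 4 = 20 multiply-adds per output value and only two weight vectors, instead of sixteen 2D weights.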


Complete Algorithm

For each pixel (i, j) in the central perspective space {
    Apply inverse mapping to find fractional coordinates (x, y) in the wide-angle space
    Use bicubic interpolation to approximate the pixel value at (x, y)
}
Apply a 2D low pass filter and downscale output image to VGA resolution (640x480)
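The overall control flow can be sketched in Python as below. It is a simplified sketch under stated assumptions: the mapping and interpolation steps are passed in as callbacks (`inverse_map` and `interpolate` are hypothetical parameter names), i is treated as the horizontal index and j as the vertical one, and the low pass filter plus downscale is replaced by a plain box average, which is a stand-in and not necessarily the filter used by the authors.

```python
def correct_frame(src, out_w, out_h, inverse_map, interpolate):
    """Build the corrected image by pulling every output pixel from src."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for j in range(out_h):            # vertical index (assumption)
        for i in range(out_w):        # horizontal index (assumption)
            x, y = inverse_map(i, j)  # fractional wide-angle coordinates
            out[j][i] = interpolate(src, x, y)
    return out

def box_downscale(img, factor):
    """Average factor x factor blocks: a crude low pass filter plus decimation."""
    h = len(img) // factor
    w = len(img[0]) // factor
    area = factor * factor
    return [
        [
            sum(img[r * factor + a][c * factor + b]
                for a in range(factor) for b in range(factor)) / area
            for c in range(w)
        ]
        for r in range(h)
    ]

def nearest(img, x, y):
    """Stand-in interpolator for the demo: nearest-neighbour sampling."""
    return img[int(y)][int(x)]

# Demo on a 4x4 image with an identity mapping.
src = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
full = correct_frame(src, 4, 4, lambda i, j: (i, j), nearest)
small = box_downscale(full, 2)
```

Because every output pixel is computed independently, the outer loops can be split across threads, SPEs or hardware pipeline stages without synchronization inside a frame.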


Intel Core 2 Quad

• A mainstream homogeneous multicore system

• 2.5 GHz operating frequency

• 1.3 GHz FSB

• Organized as two independent dual core processor blocks

• 3MB L2 cache for each block

• 64KB L1 cache for each processor

• Supports the SSE 4.1 vector instruction set


Cell Broadband Engine

• A heterogeneous multicore processor

• Integrates a 2-way SMT PPC and 8 SPEs

• 3.2 GHz operating frequency

• Each SPE contains:

– A 128-bit wide SIMD execution engine

– 256KB private Local Store

• On-chip network (EIB) with 307.2 GBps peak performance

• Peak Performance:

– 204.8 GFlops for single-precision

– 14.63 GFlops for double-precision
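The two peak figures follow from the SPE count, clock rate and SIMD width listed above. The short calculation below reproduces them. It assumes each SIMD lane completes one fused multiply-add (2 flops) per issued instruction, and that double-precision SIMD instructions issue only once every 7 cycles on this generation of the Cell. Neither detail is stated in the slides.

```python
SPES = 8            # SPEs on the chip (from the slides)
CLOCK_GHZ = 3.2     # operating frequency (from the slides)
SIMD_BITS = 128     # SIMD width per SPE (from the slides)

def peak_gflops(bits_per_value, issue_interval_cycles=1):
    """Peak GFlops across all SPEs for a given floating-point width."""
    lanes = SIMD_BITS // bits_per_value
    flops_per_instr = lanes * 2   # assumption: fused multiply-add, 2 flops per lane
    return SPES * CLOCK_GHZ * flops_per_instr / issue_interval_cycles

single = peak_gflops(32)      # 4 lanes, one instruction per cycle
double = peak_gflops(64, 7)   # 2 lanes, assumed one instruction every 7 cycles
```

Here `single` evaluates to about 204.8 and `double` to about 14.63, matching the slide.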


Virtex-4 LX80 FPGA

• Arrays of uncommitted logic blocks

• Flexibility in tailoring the architecture to match the application

• High power efficiency

• Virtex-4 LX80:

– 80,640 logic cells

– 62.5 MHz operating frequency

• Main drawbacks:

– Programmed primarily with HDLs

– Low clock frequency

• Correction module generated using the Proteus architectural synthesis tool


Proteus

• Produces hardware accelerators that follow the streaming paradigm

– Produces several load/store units as well as the datapath

• The application is expressed using an assembly-like streaming DFG

• Source code is modulo-scheduled with II = 2

• Generates 100K lines of synthesizable Verilog from 800 lines of code


High-Level Optimizations

• Block Tiling

– Partition the output image in blocks and correct a block of pixels at a time

– Alleviates the problem of prefetching

– Facilitates efficient data partitioning (x86 and Cell) and task-level pipelining (FPGA)

[Figure: task-level pipeline. Inverse Mapping (atan(x), √x, 1/x) produces coordinates; DMA fetches the input pixels (Y pixels and Cb/Cr pixels); Bicubic Interpolation; vertical low pass filter (Y, CbCr); horizontal low pass filter (Y, CbCr); output pixels]
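Block tiling can be sketched as a generator of output blocks plus a simple distribution step. This is a hedged illustration: the 64x32 tile size, the 640x480 image size and the round-robin assignment are arbitrary choices for the example, and the names `tiles` and `partition` are hypothetical. The slides do not give the block dimensions or the scheduling policy used on each platform.

```python
def tiles(width, height, tile_w, tile_h):
    """Yield (x0, y0, x1, y1) output blocks covering the image; edges are clipped."""
    for y0 in range(0, height, tile_h):
        for x0 in range(0, width, tile_w):
            yield x0, y0, min(x0 + tile_w, width), min(y0 + tile_h, height)

def partition(blocks, workers):
    """Deal blocks round-robin to per-worker queues (threads or SPEs)."""
    queues = [[] for _ in range(workers)]
    for n, block in enumerate(blocks):
        queues[n % workers].append(block)
    return queues

# Example sizes only: a 640x480 output split into 64x32 blocks for 8 workers.
blocks = list(tiles(640, 480, 64, 32))
queues = partition(blocks, 8)
```

Working on one block at a time bounds the region of the input image that a worker touches, so the required pixels can be fetched in bulk (for example by DMA into a Local Store) before the block is processed.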


Low-Level Optimizations

• x86 and Cell:

– SIMD Optimization

– Explicit loop unrolling

– Eliminate pipeline stalls from data dependencies

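[Figure: 4x4 transposition across SIMD registers r1 to r4. Before: r1 = (1, 2, 3, 4), r2 = (5, 6, 7, 8), r3 = (9, 10, 11, 12), r4 = (13, 14, 15, 16). After: r1 = (1, 5, 9, 13), r2 = (2, 6, 10, 14), r3 = (3, 7, 11, 15), r4 = (4, 8, 12, 16)]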


Low-Level Optimizations

• x86 and Cell:

– Inverse-mapping amortization

• Cell-specific:

– Manual instruction scheduling

• FPGA

– Modulo scheduling with II = 2

– 400 sDFG operations in all pipeline stages
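An initiation interval (II) of 2 means the pipeline accepts a new loop iteration every 2 clock cycles, regardless of how many stages the 400 operations span. The calculation below turns that into an upper bound on throughput at the 62.5 MHz clock quoted for the Virtex-4 LX80. It is a back-of-the-envelope sketch: it ignores stalls and pipeline fill time, and the number of iterations per frame is left as a parameter because the slides do not state it.

```python
CLOCK_HZ = 62.5e6   # FPGA operating frequency (from the slides)
II = 2              # initiation interval in cycles (from the slides)

# One new loop iteration enters the pipeline every II cycles.
iterations_per_sec = CLOCK_HZ / II

def max_fps(iterations_per_frame):
    """Upper bound on frame rate if the pipeline never stalls."""
    return iterations_per_sec / iterations_per_frame
```

With these figures `iterations_per_sec` is 31.25 million, so halving II would double the bound without changing the clock.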


Performance and Scalability Analysis

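[Figure: Processing speed (frames/sec, axis 0 to 40) for each platform configuration. Series: HL optimizations, HL+LL optimizations, Inverse Mapping Amortization. Bar labels, in the order they appear on the chart:]

Platform       Configuration    Speed
Cell           Only PPE         0.55 fps
Cell           1 SPE            3.83 fps
Cell           2 SPE            7.86 fps
Cell           4 SPE            14.95 fps
Cell           8 SPE            29.94 fps
Core 2 Quad    1T               3.70 fps
Core 2 Quad    2T               8.01 fps
Core 2 Quad    4T               15.82 fps
FPGA           Virtex-4 LX80    22.28 fps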
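Speedup and parallel efficiency can be derived from the frame rates printed on the chart. The sketch below assumes the nine labels pair with the nine configurations in the order they appear (Only PPE, 1 to 8 SPEs, 1 to 4 threads, Virtex-4 LX80). That pairing is inferred from the extracted text, not confirmed by the slides. The function name `scaling` is hypothetical.

```python
# Frame rates keyed by worker count; the pairing with configurations is assumed.
cell_fps = {1: 3.83, 2: 7.86, 4: 14.95, 8: 29.94}   # SPEs -> fps
x86_fps = {1: 3.70, 2: 8.01, 4: 15.82}              # threads -> fps

def scaling(fps_by_workers):
    """Return {workers: (speedup, efficiency)} relative to the 1-worker run."""
    base = fps_by_workers[1]
    return {
        n: (fps / base, fps / base / n)
        for n, fps in fps_by_workers.items()
    }

cell_scaling = scaling(cell_fps)
x86_scaling = scaling(x86_fps)
```

An efficiency close to 1 indicates near-linear scaling with the number of SPEs or threads.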


Performance and Scalability Analysis

[Figure: Module runtime breakdown (0% to 100%) into Inverse Mapping, Bicubic Interpolation and Low Pass Filter. Cell: Only PPE; HL, HL+LL and IMA, each with 1, 2, 4 and 8 SPEs. Core 2 Quad: HL and HL+LL, each with 1, 2 and 4 threads. FPGA: Virtex-4 LX80]


Memory Performance

[Figure: Average off-chip bandwidth (MBytes/sec, axis 0 to 400) with 1, 2, 4 and 8 threads. HL optimizations: Cell, Core2 Quad. HL + LL optimizations: Cell, Core2 Quad, Virtex-4 LX80. IMA: Cell]


Stall Cycles

[Figure: Stall cycles in billion cycles (cumulative), axis 0 to 2.5. Core2 Quad: Total, Branch Misses, Resource Related (LD/ST). Cell: Total, Branch Misses, LS Busy. Series: HL optimizations, HL + LL optimizations, IMA]


Development Cost

• A significant factor that must be considered

– One aspect in the comparison of programming models in the three platforms

– Use Lines-of-Code (LOC) as the primary metric

• Initial single-threaded version: 800 lines

• Fully-optimized version for x86: extra 500 LOC

• Fully-optimized version for Cell: extra 1500 LOC

• FPGA Implementation: 800 assembly-like LOC

– Requires multiple time-consuming synthesis and Place & Route iterations


Conclusions

• Presented the implementation of a real-time image warping algorithm

– Analyzed and characterized the performance on all underlying architectures

– Applied a series of optimizations and identified their effect

• Commercially available general purpose multi-cores are not capable of handling real-time distortion correction

• Exotic architectures such as Cell or FPGAs offer the necessary computational power

– Significantly higher development cost

– Advanced tools, development models and support environments can alleviate this effort


Acknowledgements

• We would like to thank Barcelona Supercomputing Center for providing us with access to their IBM QS20 blade

• This project is partially supported by the EC Marie Curie International Reintegration Grant (IRG) 223819