HIPI: Computer Vision at Large Scale
Chris Sweeny, Liu Liu

Transcript
Page 1: HIPI

HIPI: Computer Vision at Large Scale

Chris Sweeny, Liu Liu

Page 2: HIPI

Intro to MapReduce
SIMD at Scale
Mapper / Reducer

Page 3: HIPI

MapReduce, Main Takeaway
Data Centric, Data Centric, Data Centric!
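The mapper/reducer pattern behind this takeaway can be sketched in a few lines. This is a toy in-process simulation in Python, not Hadoop's Java API; the example job (counting images per camera model) and all names in it are illustrative:

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Tiny in-process MapReduce: map each record to (key, value)
    pairs, shuffle by key, then reduce each key's value list."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    return {key: reducer(key, values) for key, values in shuffled.items()}

# Toy job: count images per camera model (a stand-in for a real image job).
images = [
    {"camera": "Canon PowerShot S500", "width": 2592},
    {"camera": "Nikon D70", "width": 3008},
    {"camera": "Canon PowerShot S500", "width": 2592},
]

def mapper(image):
    yield image["camera"], 1

def reducer(key, values):
    return sum(values)

counts = run_mapreduce(images, mapper, reducer)
print(counts)  # {'Canon PowerShot S500': 2, 'Nikon D70': 1}
```

The data-centric point is that computation is expressed entirely as per-record map and per-key reduce steps, so the framework is free to move the code to wherever the data lives.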

Page 4: HIPI

Hadoop, a Java Implementation
An implementation of MapReduce that originated at Yahoo!. The cluster we worked on has 625.5 nodes, with a map task capacity of 2502 and a reduce task capacity of 834.

Page 5: HIPI

Computer Vision at Scale
The "computational vision": the sheer size of the datasets:

PCA of Natural Images (1992): 15 images, 4096 patches

High-perf Face Detection (2007): 75,000 samples

IM2GPS (2008): 6,472,304 images

Page 6: HIPI

HIPI Workflow

Page 7: HIPI

HIPI Image Bundle Setup
Moral of the story: many small files kill performance in a distributed file system.
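One fix is to pack many small image files into a single large bundle file, so the distributed file system handles one sequential file instead of millions of tiny ones. HIPI's actual image bundle is a Java class with its own on-disk format; the Python sketch below only illustrates the length-prefixed-bundle idea, and the write_bundle/read_bundle helpers are hypothetical names:

```python
import io
import struct

def write_bundle(blobs):
    """Pack many small image blobs into one length-prefixed stream,
    so the file system stores one large file instead of many tiny ones."""
    buf = io.BytesIO()
    for blob in blobs:
        buf.write(struct.pack(">I", len(blob)))  # 4-byte big-endian length header
        buf.write(blob)
    return buf.getvalue()

def read_bundle(data):
    """Iterate the blobs back out of a bundle by walking the length prefixes."""
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        yield data[offset:offset + length]
        offset += length

images = [b"\xff\xd8jpeg-one", b"\xff\xd8jpeg-two", b"\xff\xd8jpeg-three"]
bundle = write_bundle(images)
assert list(read_bundle(bundle)) == images
```

Because the bundle is one contiguous file, a MapReduce split can scan it sequentially, avoiding the per-file open/seek overhead that dominates when inputs are thousands of small images.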

Page 8: HIPI

Redo PCA of Natural Images at Scale
The first 15 principal components from 15 images (Hancock, 1992):
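Scaling this PCA up means never gathering all patches on one machine: each mapper emits partial sums for its own images, and the reducer combines them into the covariance matrix that PCA needs. The numpy sketch below shows that data-centric decomposition; the function names and the chunking are illustrative assumptions, not HIPI's API:

```python
import numpy as np

def map_partial(patches):
    """Mapper side: partial statistics for one image's patches —
    count, per-dimension sum, and sum of outer products."""
    X = np.asarray(patches, dtype=float)
    return X.shape[0], X.sum(axis=0), X.T @ X

def reduce_covariance(partials):
    """Reducer side: combine partial sums from all mappers into the
    global covariance matrix without ever seeing the raw patches."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    return ss / n - np.outer(mean, mean)

rng = np.random.default_rng(0)
patches = rng.normal(size=(1000, 4))      # 1000 patches, 4 dims (toy sizes)
chunks = np.array_split(patches, 10)      # pretend each chunk is one mapper's input
cov = reduce_covariance([map_partial(c) for c in chunks])
direct = np.cov(patches, rowvar=False, bias=True)
assert np.allclose(cov, direct)
```

The principal components are then the eigenvectors of this matrix, computed once on the small d-by-d result rather than on the full patch set.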

Page 9: HIPI

Redo PCA of Natural Images at Scale
Comparison:

Hancock, 1992

HIPI, 100

HIPI, 1,000

HIPI, 10,000

HIPI, 100,000

Page 10: HIPI

Optimize HIPI Performance
Culling: because decompression is costly, decompress only on demand. A boolean cull(ImageHeader header) method enables conditional decompression.

Page 11: HIPI

Culling, to Inspect Specific Camera Effects
Canon PowerShot S500, at 2592x1944
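Per the previous slide, HIPI exposes a boolean cull(ImageHeader header) hook in Java; returning true skips an image before any decompression, using only cheap header metadata. Below is a Python sketch of such a predicate for keeping only Canon PowerShot S500 shots at 2592x1944 — the dict-based header fields are an assumption for illustration, not HIPI's real ImageHeader API:

```python
def cull(header):
    """Return True to skip this image (mirroring the sense of HIPI's
    cull hook). Only header metadata is consulted, so culled images
    are never decompressed. Field names here are illustrative."""
    return not (header.get("model") == "Canon PowerShot S500"
                and header.get("width") == 2592
                and header.get("height") == 1944)

headers = [
    {"model": "Canon PowerShot S500", "width": 2592, "height": 1944},
    {"model": "Canon PowerShot S500", "width": 1600, "height": 1200},
    {"model": "Nikon D70", "width": 3008, "height": 2000},
]
kept = [h for h in headers if not cull(h)]
print(len(kept))  # 1
```

Because the predicate runs before decoding, a job that targets one camera model pays decompression cost only for the images it actually processes.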

Page 12: HIPI

HIPI, Glance at Performance Figures
An empty job (only decompressing and looping over images), minimum of 5 runs, in seconds, lower is better:

[Chart: runtime vs. image count (10 up to 1,000,000); series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle; y-axis 100-500 s]

Page 13: HIPI

HIPI, Glance at Performance Figures
Im2gray job (converting images to grayscale), minimum of 5 runs, in seconds, lower is better:

[Chart: runtime vs. image count (10 up to 1,000,000); series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle; y-axis 100-500 s]

Page 14: HIPI

HIPI, Glance at Performance Figures
Covariance job (compute covariance matrix of patches, 100 patches per image), minimum of 1~3 runs*, in seconds, lower is better:

[Chart: runtime vs. image count (10 up to 1,000,000); series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle; y-axis 1000-8000 s]

Page 15: HIPI

HIPI, Glance at Performance Figures
Culling job (decompressing all images vs. decompressing only the images we care about), minimum of 1~3 runs, in seconds, lower is better:

[Chart: runtime vs. image count (10 up to 1,000,000); series: Without Culling, With Culling; y-axis 100-700 s]

Page 16: HIPI

Conclusion
Everything at large scale gets better. HIPI provides an image-centric interface that performs on par with or better than the leading alternatives.

The cull method provides significant improvement and convenience.

HIPI offers noticeable improvements!

Page 17: HIPI

Future Work
Release HIPI as an open-source project.

Work on deeper integration with Hadoop.

Make the HIPI workload more configurable.

Make the workload more balanced.