HIPI: Computer Vision at Large Scale
Chris Sweeny, Liu Liu

Transcript
Page 1: HIPI

HIPI: Computer Vision at Large Scale

Chris Sweeny, Liu Liu

Page 2: HIPI

Intro to MapReduce
SIMD at Scale
Mapper / Reducer

Page 3: HIPI

MapReduce, Main Takeaway
Data Centric, Data Centric, Data Centric!
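The mapper/reducer pattern behind this takeaway can be sketched in a few lines. This is a toy in-process simulation in Python, not Hadoop's Java API; the example job (counting images per camera model) and all names in it are illustrative:

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Tiny in-process MapReduce: map each record to (key, value)
    pairs, shuffle by key, then reduce each key's value list."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    return {key: reducer(key, values) for key, values in shuffled.items()}

# Toy job: count images per camera model (a stand-in for a real image job).
images = [
    {"camera": "Canon PowerShot S500", "width": 2592},
    {"camera": "Nikon D70", "width": 3008},
    {"camera": "Canon PowerShot S500", "width": 2592},
]

def mapper(image):
    yield image["camera"], 1

def reducer(key, values):
    return sum(values)

counts = run_mapreduce(images, mapper, reducer)
print(counts)  # {'Canon PowerShot S500': 2, 'Nikon D70': 1}
```

The data-centric point is that computation is expressed entirely as per-record map and per-key reduce steps, so the framework is free to move the code to wherever the data lives.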

Page 4: HIPI

Hadoop, a Java Implementation
An implementation of MapReduce that originated at Yahoo!. The cluster we worked on has 625.5 nodes, with a map task capacity of 2502 and a reduce task capacity of 834.

Page 5: HIPI

Computer Vision at Scale
The "computational vision": the sheer size of the datasets:

PCA of Natural Images (1992): 15 images, 4096 patches

High-perf Face Detection (2007): 75,000 samples

IM2GPS (2008): 6,472,304 images

Page 6: HIPI

HIPI Workflow

Page 7: HIPI

HIPI Image Bundle Setup
Moral of the story: many small files kill performance in a distributed file system.
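One fix is to pack many small image files into a single large bundle file, so the distributed file system handles one sequential file instead of millions of tiny ones. HIPI's actual image bundle is a Java class with its own on-disk format; the Python sketch below only illustrates the length-prefixed-bundle idea, and the write_bundle/read_bundle helpers are hypothetical names:

```python
import io
import struct

def write_bundle(blobs):
    """Pack many small image blobs into one length-prefixed stream,
    so the file system stores one large file instead of many tiny ones."""
    buf = io.BytesIO()
    for blob in blobs:
        buf.write(struct.pack(">I", len(blob)))  # 4-byte big-endian length header
        buf.write(blob)
    return buf.getvalue()

def read_bundle(data):
    """Iterate the blobs back out of a bundle by walking the length prefixes."""
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        yield data[offset:offset + length]
        offset += length

images = [b"\xff\xd8jpeg-one", b"\xff\xd8jpeg-two", b"\xff\xd8jpeg-three"]
bundle = write_bundle(images)
assert list(read_bundle(bundle)) == images
```

Because the bundle is one contiguous file, a MapReduce split can scan it sequentially, avoiding the per-file open/seek overhead that dominates when inputs are thousands of small images.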

Page 8: HIPI

Redo PCA of Natural Images at Scale
The first 15 principal components from 15 images (Hancock, 1992):
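Scaling this PCA up means never gathering all patches on one machine: each mapper emits partial sums for its own images, and the reducer combines them into the covariance matrix that PCA needs. The numpy sketch below shows that data-centric decomposition; the function names and the chunking are illustrative assumptions, not HIPI's API:

```python
import numpy as np

def map_partial(patches):
    """Mapper side: partial statistics for one image's patches —
    count, per-dimension sum, and sum of outer products."""
    X = np.asarray(patches, dtype=float)
    return X.shape[0], X.sum(axis=0), X.T @ X

def reduce_covariance(partials):
    """Reducer side: combine partial sums from all mappers into the
    global covariance matrix without ever seeing the raw patches."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    return ss / n - np.outer(mean, mean)

rng = np.random.default_rng(0)
patches = rng.normal(size=(1000, 4))      # 1000 patches, 4 dims (toy sizes)
chunks = np.array_split(patches, 10)      # pretend each chunk is one mapper's input
cov = reduce_covariance([map_partial(c) for c in chunks])
direct = np.cov(patches, rowvar=False, bias=True)
assert np.allclose(cov, direct)
```

The principal components are then the eigenvectors of this matrix, computed once on the small d-by-d result rather than on the full patch set.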

Page 9: HIPI

Redo PCA of Natural Images at Scale
Comparison:

Hancock, 1992

HIPI, 100

HIPI, 1,000

HIPI, 10,000

HIPI, 100,000

Page 10: HIPI

Optimize HIPI Performance
Culling: because decompression is costly, decompress only on demand. A boolean cull(ImageHeader header) method enables conditional decompression.

Page 11: HIPI

Culling, to Inspect Specific Camera Effects
Canon PowerShot S500, at 2592x1944
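Per the previous slide, HIPI exposes a boolean cull(ImageHeader header) hook in Java; returning true skips an image before any decompression, using only cheap header metadata. Below is a Python sketch of such a predicate for keeping only Canon PowerShot S500 shots at 2592x1944 — the dict-based header fields are an assumption for illustration, not HIPI's real ImageHeader API:

```python
def cull(header):
    """Return True to skip this image (mirroring the sense of HIPI's
    cull hook). Only header metadata is consulted, so culled images
    are never decompressed. Field names here are illustrative."""
    return not (header.get("model") == "Canon PowerShot S500"
                and header.get("width") == 2592
                and header.get("height") == 1944)

headers = [
    {"model": "Canon PowerShot S500", "width": 2592, "height": 1944},
    {"model": "Canon PowerShot S500", "width": 1600, "height": 1200},
    {"model": "Nikon D70", "width": 3008, "height": 2000},
]
kept = [h for h in headers if not cull(h)]
print(len(kept))  # 1
```

Because the predicate runs before decoding, a job that targets one camera model pays decompression cost only for the images it actually processes.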

Page 12: HIPI

HIPI, Glance at Performance Figures
An empty job (only decompressing and looping over images), minimum of 5 runs, in seconds, lower is better:

[Chart: runtime vs. image count (10 up to 1,000,000); series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle; y-axis 100-500 s]

Page 13: HIPI

HIPI, Glance at Performance Figures
Im2gray job (converting images to grayscale), minimum of 5 runs, in seconds, lower is better:

[Chart: runtime vs. image count (10 up to 1,000,000); series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle; y-axis 100-500 s]

Page 14: HIPI

HIPI, Glance at Performance Figures
Covariance job (compute covariance matrix of patches, 100 patches per image), minimum of 1~3 runs*, in seconds, lower is better:

[Chart: runtime vs. image count (10 up to 1,000,000); series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle; y-axis 1000-8000 s]

Page 15: HIPI

HIPI, Glance at Performance Figures
Culling job (decompressing all images vs. decompressing only the images we care about), minimum of 1~3 runs, in seconds, lower is better:

[Chart: runtime vs. image count (10 up to 1,000,000); series: Without Culling, With Culling; y-axis 100-700 s]

Page 16: HIPI

Conclusion
Everything at large scale gets better. HIPI provides an image-centric interface that performs on par with or better than the leading alternatives.

The cull method provides significant improvement and convenience.

HIPI offers noticeable improvements!

Page 17: HIPI

Future Work
Release HIPI as an open-source project.

Work on deeper integration with Hadoop.

Make the HIPI workload more configurable.

Make the workload more balanced.