Mar 27, 2015
Computer Vision Group, UC Berkeley
How should we combine high level and low level knowledge?
Jitendra Malik UC Berkeley
Recognition using regions is joint work with Chunhui Gu, Joseph Lim & Pablo Arbelaez (CVPR 2009)
The central problems of vision
Grouping /Segmentation
3D structure/Figure-Ground
Object and Scene Recognition
Detection and Segmentation: Giraffes
[Figure: two examples, each pairing the original image with its segmentation]
Detection and Segmentation: Mugs
[Figure: two examples, each pairing the original image with its segmentation]
Outline
• Current paradigm: Multiscale scanning
• Our approach
– Bottom up region segmentation
– Hough transform style voting (learned weights)
– Top down segmentation
• Results on ETHZ, Caltech 101, MSRC
Detection: Is this an X?
Ask this question repeatedly, varying position, scale, category…
Paradigm introduced by Rowley, Baluja & Kanade 96 for face detection
Viola & Jones 01, Dalal & Triggs 05, Felzenszwalb, McAllester & Ramanan 08
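The scanning loop described on this slide can be sketched as follows. This is a minimal illustration and not the implementation of any of the cited detectors: the function names, the square windows, the base window size, the scale set, and the stride are all assumptions, and `is_x` stands in for whatever per-window classifier is used.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Window = Tuple[int, int, int]  # (x, y, size) of a square window; a simplifying assumption


def scan_windows(img_w: int, img_h: int, base: int = 64,
                 scales: Sequence[float] = (1.0, 2.0),
                 stride_frac: float = 0.5) -> Iterator[Window]:
    """Enumerate square windows over every position and scale."""
    for s in scales:
        size = int(round(base * s))
        if size > img_w or size > img_h:
            continue  # window does not fit in the image at this scale
        stride = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size)


def detect(img_w: int, img_h: int, categories: Sequence[str],
           is_x: Callable[[str, Window], bool], **kw) -> List[Tuple[str, Window]]:
    """Ask 'is this an X?' for every (window, category) pair."""
    return [(c, win)
            for win in scan_windows(img_w, img_h, **kw)
            for c in categories
            if is_x(c, win)]
```

The nested loops make the cost structure explicit: the classifier runs once per position, per scale, per category.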
Problems with the multi-scale scanning paradigm
• Computational complexity
– 10^6 windows, 10 scales, 10^4 categories
• Not natural for irregularly shaped objects
• Segmentation is delinked
• Context is delinked
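As a rough worked example of the complexity bullet: if the three quoted figures are read as independent factors that multiply (an assumption; the slide does not say whether the window count is per scale), the number of classifier evaluations per image comes out as below. The variable names are illustrative only.

```python
# Figures quoted on the slide; treating them as multiplicative is an assumption.
windows_per_scale = 10**6
scales = 10
categories = 10**4

# One classifier evaluation per (window, scale, category) triple.
evaluations = windows_per_scale * scales * categories
print(f"{evaluations:.0e} classifier evaluations per image")  # 1e+11
```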
Our Approach
• Perceptual Organization provides the right primitives for visual recognition.
• After more than a decade of work, we finally have high-quality, generic detectors for contours and regions. We now need to work with only ~100 elements, each with its local scale estimate.
• In this talk, we demonstrate recognition using regions. Detection and segmentation happen in the same framework.
• There will always be some errors in the bottom-up grouping process; the recognition machinery needs to be robust to them.
Contour Detection (CVPR 2008)
Region Detection (CVPR 2009)
Region detector wins on any measure!
Region Benchmarks on BSDS: Probabilistic Rand Index, Variation of Information
Region Benchmarks on MSRC/PASCAL08
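For readers unfamiliar with the second BSDS measure, here is a minimal sketch of Variation of Information between two label maps, using the standard definition VI = H(A|B) + H(B|A), where lower means closer agreement. It is not the benchmark's own evaluation code; the function name and the use of NumPy and natural logarithms are choices made here.

```python
import numpy as np


def variation_of_information(seg_a, seg_b) -> float:
    """VI between two segmentations given as integer label arrays of equal shape."""
    a = np.asarray(seg_a).ravel()
    b = np.asarray(seg_b).ravel()
    if a.shape != b.shape:
        raise ValueError("segmentations must have the same number of pixels")

    # Map arbitrary labels to 0..k-1 so they can index a contingency table.
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)

    # Joint distribution over (label in A, label in B) pairs.
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= joint.sum()

    def entropy(p: np.ndarray) -> float:
        p = p[p > 0]  # skip empty cells to avoid log(0)
        return float(-(p * np.log(p)).sum())

    h_a = entropy(joint.sum(axis=1))
    h_b = entropy(joint.sum(axis=0))
    h_ab = entropy(joint.ravel())
    # H(A|B) + H(B|A) = 2 H(A,B) - H(A) - H(B)
    return 2.0 * h_ab - h_a - h_b
```

Identical segmentations give a value of zero (up to floating-point error), and the value grows as the two partitions disagree.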
Parallelizing Image Segmentation
Catanzaro et al, UC Berkeley, ICCV 09
• GTX 280 is an Nvidia graphics processor, a massively parallel general-purpose computing platform
– 30 cores, 8-wide SIMD = 240-way parallelism
– 140 GB/s memory bandwidth (modern CPUs have ~10-20 GB/s)
– Special memory subsystems for graphics processing
• Sequential Implementation: 5 minutes per image
• Parallel, Optimized Implementation: 2 seconds
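The figures on this slide can be checked with a few lines of arithmetic. The numbers are those quoted above; the variable names are illustrative, and the speedup is simply the ratio of the two reported timings, which may reflect algorithmic optimization as well as parallel hardware.

```python
# Hardware parallelism: cores times SIMD lanes per core.
cores = 30
simd_width = 8
parallelism = cores * simd_width  # 240-way

# Reported running times per image, in seconds.
sequential_seconds = 5 * 60
parallel_seconds = 2
speedup = sequential_seconds / parallel_seconds  # 150x

# Memory bandwidth advantage over the quoted CPU range of 10-20 GB/s.
gpu_bandwidth = 140
cpu_bandwidth_low, cpu_bandwidth_high = 10, 20
bandwidth_ratio = (gpu_bandwidth / cpu_bandwidth_high,
                   gpu_bandwidth / cpu_bandwidth_low)  # 7x to 14x

print(parallelism, speedup, bandwidth_ratio)
```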
Why Use Regions?
• Local estimate of scale; no search necessary
• Shape, color and texture in the same framework
• Hierarchy of regions (“partonomy”) represents scenes, objects, parts. Makes use of context natural.
• Do not suffer from background clutter
• Reduce candidate windows on detection task
– 1000 to 10000 times fewer windows on the ETHZ dataset
• Need to be robust to segmentation errors
Object Representation using Regions
Bag of Regions
Region Segmentation
Region Representation
Region-based Hough Voting
• Recover transformation T(x,y,sx,sy) from matched regions
• Transform exemplar bounding box to query
[Figure: exemplar and query images, related by T(x,y,sx,sy)]
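The two steps on this slide can be sketched as below. The slide does not say how T is estimated, so this sketch assumes it is recovered from the bounding boxes of a matched exemplar/query region pair, as an axis-aligned scale (sx, sy) plus a translation (x, y). The `Box` and `Transform` types and both function names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


class Transform(NamedTuple):
    x: float   # translation in x
    y: float   # translation in y
    sx: float  # scale in x
    sy: float  # scale in y


def recover_transform(exemplar_region: Box, query_region: Box) -> Transform:
    """Estimate T(x, y, sx, sy) mapping the exemplar region's box onto the query region's box."""
    sx = query_region.w / exemplar_region.w
    sy = query_region.h / exemplar_region.h
    # Translation chosen so the scaled exemplar corner lands on the query corner.
    tx = query_region.x - sx * exemplar_region.x
    ty = query_region.y - sy * exemplar_region.y
    return Transform(tx, ty, sx, sy)


def apply_transform(t: Transform, box: Box) -> Box:
    """Map a box from exemplar coordinates into query coordinates."""
    return Box(t.sx * box.x + t.x, t.sy * box.y + t.y, t.sx * box.w, t.sy * box.h)


# One matched region pair casts one vote for the object's box in the query.
t = recover_transform(Box(10, 10, 20, 40), Box(100, 50, 40, 80))
vote = apply_transform(t, Box(0, 0, 50, 60))  # exemplar object bounding box
```

Each matched region pair yields one such predicted box; accumulating these predictions, with weights, is what makes the scheme Hough-style voting.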
Region-based Voting
Exemplar 1
Query