The Visual Recognition Machine Jitendra Malik University of California at Berkeley

Dec 19, 2015

Transcript
Page 1:

The Visual Recognition Machine

Jitendra Malik

University of California at Berkeley

Page 2:

From images to objects

Labeled sets: tiger, grass, etc.

Page 3:

Recognition

• Possible for both instances and object classes (Mona Lisa vs. faces, or Beetle vs. cars)

• Tolerant to changes in pose and illumination

Page 4:

Three stages

• Segmentation: Images → Regions

• Association: Regions → Super-regions

• Matching: Super-regions → Prototype views

Page 5:
Page 6:

Three stages

• Segmentation: Images → Regions

• Association: Regions → Super-regions

• Matching: Super-regions → Prototype views

Page 7:

Boundaries of image regions defined by a number of attributes

– Brightness/color

– Texture

– Motion

– Stereoscopic depth

– Familiar configuration

Page 8:

Image Segmentation as Graph Partitioning

Build a weighted graph G = (V, E) from the image:

V: image pixels

E: connections between pairs of nearby pixels

$W_{ij}$: probability that i and j belong to the same region

Partition the graph so that similarity within each group is large and similarity between groups is small: Normalized Cuts [Shi & Malik 97]
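The graph construction above can be sketched in a few lines of Python. This is only an illustration: the slide does not say how $W_{ij}$ is computed, so the Gaussian falloff in brightness difference and pixel distance, the function name `build_affinity_graph`, and the parameters `radius`, `sigma_i` and `sigma_x` are all assumptions.

```python
import math

def build_affinity_graph(image, radius=1, sigma_i=0.1, sigma_x=1.0):
    """Return {(i, j): weight} for pairs of nearby pixels.

    `image` is a 2-D list of gray values in [0, 1]. The weight formula is an
    assumed stand-in for "probability that i and j belong to the same region".
    """
    rows, cols = len(image), len(image[0])
    weights = {}
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c  # node index of pixel (r, c)
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0):
                        continue  # no self-edges
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue  # neighbor falls outside the image
                    j = rr * cols + cc
                    d_int = image[r][c] - image[rr][cc]
                    d_pos2 = dr * dr + dc * dc
                    weights[(i, j)] = (math.exp(-d_int ** 2 / sigma_i ** 2)
                                       * math.exp(-d_pos2 / sigma_x ** 2))
    return weights

# Tiny 2x3 image: a dark block on the left, a bright column on the right.
image = [
    [0.1, 0.1, 0.9],
    [0.1, 0.1, 0.9],
]
W = build_affinity_graph(image)
```

Only pixels within `radius` of each other are connected, so the number of edges grows linearly with the number of pixels.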

Page 9:

Some Terminology for Graph Partitioning

• How do we bipartition a graph:

$$\mathrm{cut}(A, B) = \sum_{u \in A,\; v \in B} W(u, v), \quad \text{with } A \cap B = \emptyset$$

$$\mathrm{assoc}(A, A') = \sum_{u \in A,\; v \in A'} W(u, v), \quad A \text{ and } A' \text{ not necessarily disjoint}$$
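As a small worked example of the two definitions, the following Python sketch evaluates cut and assoc on a hand-made 4-node graph. The weight matrix is hypothetical and chosen only to make the sums easy to check by hand.

```python
def cut(W, A, B):
    """Total weight of edges between disjoint node sets A and B."""
    assert not set(A) & set(B), "cut is defined for disjoint sets"
    return sum(W[u][v] for u in A for v in B)

def assoc(W, A, A_prime):
    """Total weight from nodes in A to nodes in A_prime (sets may overlap)."""
    return sum(W[u][v] for u in A for v in A_prime)

# Hypothetical symmetric weights: nodes 0-1 and 2-3 are strongly linked.
W = [
    [0, 3, 1, 0],
    [3, 0, 0, 1],
    [1, 0, 0, 3],
    [0, 1, 3, 0],
]
V = [0, 1, 2, 3]
A, B = [0, 1], [2, 3]

cut_AB = cut(W, A, B)      # edges 0-2 and 1-3
assoc_AV = assoc(W, A, V)  # sum of the degrees of nodes 0 and 1
assoc_AA = assoc(W, A, A)  # weight inside A, each edge counted in both directions
```

Since B is the complement of A here, assoc(A, V) splits into the weight inside A plus the weight crossing to B.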

Page 10:

Normalized Cut, a measure of dissimilarity

• Minimum cut is not appropriate since it favors cutting small pieces.

• Normalized Cut, Ncut:

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$$
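To illustrate why minimum cut favors small pieces, the Python sketch below enumerates every bipartition of a hypothetical 5-node graph (two tight pairs plus one weakly attached node) and compares the split chosen by minimum cut with the split chosen by Ncut. Brute-force enumeration is used only because the graph is tiny; it is not how the problem is solved in practice.

```python
def cut(W, A, B):
    return sum(W[u][v] for u in A for v in B)

def assoc(W, A, V):
    return sum(W[u][v] for u in A for v in V)

def ncut(W, A, B):
    V = range(len(W))
    c = cut(W, A, B)
    return c / assoc(W, A, V) + c / assoc(W, B, V)

def bipartitions(n):
    """Yield each split (A, B) of nodes 0..n-1 once; node n-1 always lands in B."""
    for mask in range(1, 2 ** (n - 1)):
        A = [v for v in range(n) if (mask >> v) & 1]
        B = [v for v in range(n) if not (mask >> v) & 1]
        yield A, B

# Hypothetical weights: pairs (0,1) and (2,3) are tight, node 4 hangs off node 3.
W = [
    [0, 10, 0, 0, 0],
    [10, 0, 2, 0, 0],
    [0, 2, 0, 10, 0],
    [0, 0, 10, 0, 1],
    [0, 0, 0, 1, 0],
]

min_cut_split = min(bipartitions(5), key=lambda p: cut(W, *p))
min_ncut_split = min(bipartitions(5), key=lambda p: ncut(W, *p))
```

On this graph, minimum cut isolates the single node 4, while Ncut separates {0, 1} from {2, 3, 4}.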

Page 11:

Solving the Normalized Cut problem

• Finding the exact discrete solution to Ncut is NP-complete, even on a regular grid [Papadimitriou’97]

• Drawing on spectral graph theory, a good approximation can be obtained by solving a generalized eigenvalue problem.

Page 12:

Normalized Cut as a Generalized Eigenvalue Problem

Let $x_i = 1$ if node i is in A and $x_i = -1$ otherwise, and let D be the diagonal matrix with $D(i,i) = \sum_j W(i,j)$.

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$$

$$= \frac{1}{4}\left[\frac{(1 + x)^T (D - W)(1 + x)}{k\, 1^T D 1} + \frac{(1 - x)^T (D - W)(1 - x)}{(1 - k)\, 1^T D 1}\right], \quad k = \frac{\sum_{x_i > 0} D(i,i)}{\sum_i D(i,i)}$$

$$= \dots$$

• After simplification, we get

$$\mathrm{Ncut}(A, B) = \frac{y^T (D - W)\, y}{y^T D\, y}, \quad \text{with } y_i \in \{1, -b\},\; y^T D 1 = 0.$$
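The simplified form can be checked numerically. The Python sketch below computes Ncut directly from the cut and assoc sums and again as the quotient in y, for a hypothetical 4-node graph. It assumes the construction used in Shi and Malik's paper, b = k / (1 - k) and y = (1 + x) - b(1 - x); this gives y values of 2 and -2b, which is the {1, -b} pattern up to a scale factor that cancels in the quotient. The function names are illustrative.

```python
def degrees(W):
    return [sum(row) for row in W]

def ncut_direct(W, x):
    """Ncut from its definition; x[i] = +1 for nodes in A, -1 for nodes in B."""
    d, n = degrees(W), len(W)
    cut = sum(W[i][j] for i in range(n) for j in range(n) if x[i] > 0 > x[j])
    assoc_a = sum(d[i] for i in range(n) if x[i] > 0)
    assoc_b = sum(d[i] for i in range(n) if x[i] < 0)
    return cut / assoc_a + cut / assoc_b

def relaxed_vector(W, x):
    """Build y = (1 + x) - b (1 - x) with b = k / (1 - k) (assumed construction)."""
    d = degrees(W)
    k = sum(di for di, xi in zip(d, x) if xi > 0) / sum(d)
    b = k / (1 - k)
    return [(1 + xi) - b * (1 - xi) for xi in x]

def rayleigh_quotient(W, y):
    """y^T (D - W) y / (y^T D y)."""
    d, n = degrees(W), len(W)
    y_dy = sum(d[i] * y[i] ** 2 for i in range(n))
    y_wy = sum(W[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    return (y_dy - y_wy) / y_dy

# Hypothetical symmetric weights and the partition A = {0, 1}, B = {2, 3}.
W = [
    [0, 4, 1, 0],
    [4, 0, 0, 2],
    [1, 0, 0, 3],
    [0, 2, 3, 0],
]
x = [1, 1, -1, -1]
y = relaxed_vector(W, x)

direct = ncut_direct(W, x)
quotient = rayleigh_quotient(W, y)
constraint = sum(di * yi for di, yi in zip(degrees(W), y))  # y^T D 1
```

Both routes give the same value, and `constraint` is zero up to rounding, matching the side condition on y.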

Page 13:

Computational Aspects

• Solving for the generalized eigensystem:

$$(D - W)\, x = \lambda D\, x$$

$$D^{-1/2} (D - W) D^{-1/2}\, z = \lambda z, \quad \text{where } z = D^{1/2} x$$

• (D-W) is of size N×N, but it is sparse with O(N) nonzero entries, where N is the number of pixels.

• Solved using the Lanczos algorithm.
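The change of variables z = D^{1/2} x turns the generalized problem into an ordinary symmetric one. The Python sketch below finds the second smallest eigenpair on a hypothetical 4-node graph. Note the substitution: to stay short and dependency-free it uses power iteration with deflation on dense lists, not the Lanczos algorithm named on the slide, and it would not scale to image-sized sparse matrices.

```python
import math

def second_generalized_eigenpair(W, iters=200):
    """Approximate the second smallest solution of (D - W) x = lambda D x."""
    n = len(W)
    d = [sum(row) for row in W]
    s = [math.sqrt(di) for di in d]
    # M = D^{-1/2} W D^{-1/2}; the normalized system matrix is I - M.
    M = [[W[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
    # Known eigenvector for lambda = 0: z0 = D^{1/2} 1, normalized.
    norm0 = math.sqrt(sum(d))
    z0 = [si / norm0 for si in s]
    z = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        # Multiply by (I + M) / 2, whose eigenvalues lie in [0, 1].
        z = [0.5 * (z[i] + sum(M[i][j] * z[j] for j in range(n)))
             for i in range(n)]
        # Deflate: remove the component along z0, then renormalize.
        proj = sum(a * b for a, b in zip(z, z0))
        z = [zi - proj * z0i for zi, z0i in zip(z, z0)]
        norm = math.sqrt(sum(zi * zi for zi in z))
        z = [zi / norm for zi in z]
    Mz = [sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
    lam = 1.0 - sum(zi * mzi for zi, mzi in zip(z, Mz))
    x = [zi / si for zi, si in zip(z, s)]  # back to x = D^{-1/2} z
    return lam, x

# Hypothetical graph: tight pairs (0,1) and (2,3) joined by weak edges.
W = [
    [0, 5, 1, 0],
    [5, 0, 0, 1],
    [1, 0, 0, 5],
    [0, 1, 5, 0],
]
lam, x = second_generalized_eigenpair(W)
segment = [xi > 0 for xi in x]  # threshold the eigenvector at zero
```

Thresholding `x` at zero puts nodes 0 and 1 on one side and nodes 2 and 3 on the other.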

Page 14:

Three stages

• Segmentation: Images → Regions

• Association: Regions → Super-regions

• Matching: Super-regions → Prototype views

Page 15:

Association

• Number of super-regions of size k in an image with n regions is approximately (4**k)*n/k

• For typical images, this ranges between 1000 and 10000

• Plausibility ordering could reduce the effective number substantially

• Computing time for this stage is negligible
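To get a feel for the estimate (4**k)*n/k, the short Python sketch below evaluates it for a few super-region sizes. The value n = 100 regions is a hypothetical choice for illustration; the slide does not state which n and k it has in mind for typical images.

```python
def approx_super_region_count(n_regions, k):
    """Approximate number of super-regions of size k: (4**k) * n / k."""
    return (4 ** k) * n_regions / k

# Hypothetical image with 100 regions.
counts = {k: approx_super_region_count(100, k) for k in (2, 3, 4)}
```

The count grows quickly with k, which is why a plausibility ordering that prunes candidates early matters.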

Page 16:

Three stages

• Segmentation: Images → Regions

• Association: Regions → Super-regions

• Matching: Super-regions → Prototype views

Page 17:

Matching

• Objects are represented by a set of prototypical views (~10 per object)

• For each super-region S, calculate the probability that it is an instance of view V

• Determine the most probable labeling of the image into objects
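A deliberately simplified Python sketch of the last two steps follows. It assumes the probabilities P(S is an instance of V) are already available as a table, and it labels each super-region independently with its best view. A real labeling would also have to resolve overlapping super-regions, which this sketch ignores. The probabilities, the view names and the `threshold` parameter are all hypothetical.

```python
def most_probable_views(prob, threshold=0.5):
    """Pick the best view for each super-region; drop weak matches."""
    labels = {}
    for region, view_probs in prob.items():
        view, p = max(view_probs.items(), key=lambda item: item[1])
        if p >= threshold:
            labels[region] = view
    return labels

# Hypothetical probabilities that super-region S is an instance of view V.
prob = {
    "S1": {"tiger/side": 0.80, "grass": 0.10},
    "S2": {"tiger/side": 0.05, "grass": 0.70},
    "S3": {"tiger/side": 0.20, "grass": 0.30},
}
labels = most_probable_views(prob)
```

Super-region S3 stays unlabeled because no view explains it well enough.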

Page 18:
Page 19:

Matching super-regions to views

• Based on color, texture and shape similarity

• Color, texture matching is relatively well understood and fast

• Shape matching is difficult because the algorithm should tolerate pose, illumination and intra-category variation

• GOAL: small misclassification error with few views.

Page 20:

Core idea

• Find corresponding points on the two shapes and use those to deform the prototype into alignment

• Allowing this flexibility reduces the number of prototype views needed

Page 21:
Page 22:
Page 23:

MNIST Handwritten Digits

Page 24:

Digit Prototypes

Page 25:

Matching with original and deformed prototypes

Figure columns: Prototype, Test, Error

Page 26:

Deforming prototypes using thin plate splines
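A minimal Python sketch of a thin plate spline warp is given below, assuming point correspondences between prototype and test shape are already known. How the correspondences are found is not covered here, and the point coordinates are hypothetical. One spline is fitted per output coordinate using the kernel U(r) = r^2 log r, and the linear system is solved with a small Gaussian elimination routine so the example has no dependencies.

```python
import math

def tps_kernel(r2):
    """U(r) = r^2 log r, written in terms of r^2; U(0) = 0."""
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_tps(src, values):
    """Fit f with f(src[i]) = values[i]; returns n kernel weights + 3 affine terms."""
    n = len(src)
    A = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            A[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        A[i][n:] = [1.0, xi, yi]
        A[n][i], A[n + 1][i], A[n + 2][i] = 1.0, xi, yi
    return solve(A, list(values) + [0.0, 0.0, 0.0])

def eval_tps(params, src, x, y):
    n = len(src)
    a0, ax, ay = params[n:]
    bend = sum(w * tps_kernel((x - px) ** 2 + (y - py) ** 2)
               for w, (px, py) in zip(params, src))
    return a0 + ax * x + ay * y + bend

# Hypothetical correspondences: prototype points and where they land on the test shape.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
dst = [(0.0, 0.0), (1.0, 0.1), (0.0, 1.0), (1.1, 1.0), (0.6, 0.4)]
fx = fit_tps(src, [p[0] for p in dst])
fy = fit_tps(src, [p[1] for p in dst])
warped = [(eval_tps(fx, src, x, y), eval_tps(fy, src, x, y)) for x, y in src]
```

The fitted warp maps each prototype point onto its corresponding test point and interpolates smoothly in between, so it can be applied to every pixel or contour point of the prototype.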

Page 27:

Only 25 deformable templates are needed (instead of 60K) to get 5% error

Page 28:

COIL Object Database

Page 29:
Page 30:

Computing cost on a Pentium PC

• Segmentation: 5 minutes/image

• Matching: 0.5 sec/match

Page 31:

Cost on 10**4 node machine

• Segmentation: 0.03 sec/image, which is about 30 Hz (video rate)

• Matching: 20K matches/sec at full resolution (100 points/shape)
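These figures follow from the single-PC costs if the work divides evenly across the nodes. The Python sketch below redoes the arithmetic under that assumption of ideal linear speedup, which is an idealization and not something the slides justify. The constant names are illustrative.

```python
NODES = 10 ** 4
SEGMENTATION_SEC_PER_IMAGE_PC = 5 * 60  # 5 minutes per image on one PC
MATCH_SEC_PER_MATCH_PC = 0.5            # 0.5 sec per match on one PC

# Assumed ideal linear speedup: every node does an equal share of the work.
seg_sec_per_image = SEGMENTATION_SEC_PER_IMAGE_PC / NODES
seg_rate_hz = 1 / seg_sec_per_image
matches_per_sec = NODES / MATCH_SEC_PER_MATCH_PC
```

The segmentation rate comes out slightly above 30 Hz, consistent with the rounded video-rate figure.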

Page 32:

How many prototype views can one match at 1 Hz?

• 1K candidate super-regions

• Consider only 1% of matches at full resolution (10% pass color/texture filter, 10% of those pass low resolution shape filter)

• If half the time is spent in pruning and half in full resolution matching, 1000 prototype views can be matched at 1 Hz.
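The estimate of 1000 views can be reproduced step by step. The Python sketch below takes the 20K full-resolution matches/sec from the preceding slide together with the filter fractions above. Exact fractions are used so the result is not blurred by floating-point rounding, and the variable names are illustrative.

```python
from fractions import Fraction

MATCHES_PER_SEC = 20_000           # full-resolution matches per second
CANDIDATE_SUPER_REGIONS = 1_000
pass_color_texture = Fraction(1, 10)
pass_low_res_shape = Fraction(1, 10)
full_res_time_share = Fraction(1, 2)  # other half of each second goes to pruning

# Fraction of (super-region, view) pairs that reach full-resolution matching.
full_res_fraction = pass_color_texture * pass_low_res_shape

# Full-resolution matches needed per prototype view, per frame.
matches_per_view = CANDIDATE_SUPER_REGIONS * full_res_fraction

# Full-resolution matches available in one second at 1 Hz.
budget = MATCHES_PER_SEC * full_res_time_share

views_at_1hz = budget / matches_per_view
```

Each view costs 10 full-resolution matches per frame, and 10,000 are available per second, which gives 1000 views.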

Page 33:

What can one do with matching 1000 views a second?

• Worst case: 100 object categories (1000 views at ~10 views per object)

• Best case depends on how well one can exploit context, hierarchy and hashing.

• Cf. humans can recognize 10-100K objects

Page 34:

Memory requirements

• 10K object categories * 10 views/category * 100 * 100 pixels/view * 1 byte/pixel gives us 1 Gigabyte.
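The product can be checked directly. The Python sketch below multiplies the factors and reads 1 Gigabyte as 10**9 bytes, which is an assumption about the unit.

```python
OBJECT_CATEGORIES = 10_000
VIEWS_PER_CATEGORY = 10
PIXELS_PER_VIEW = 100 * 100
BYTES_PER_PIXEL = 1

total_bytes = (OBJECT_CATEGORIES * VIEWS_PER_CATEGORY
               * PIXELS_PER_VIEW * BYTES_PER_PIXEL)
total_gigabytes = total_bytes / 10 ** 9  # assuming 1 GB = 10**9 bytes
```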

Page 35:

Concluding remarks

• Speech in 1985 was in the same state as vision in 2000. The adoption of Hidden Markov Models led to a decade of research which refined the paradigm for continuous speech recognition.

• The proposed three-stage framework for recognition (segmentation, association and matching) could provide the same focus and coherence to vision research, leading to general purpose object recognition in 10 years.
