Recognition Scene understanding / visual object categorization Pose clustering

Chapter 14 “Recognition”

Recognition

•Scene understanding / visual object categorization

•Pose clustering

•Object recognition by local features

•Image categorization

•Bag-of-features models

•Large-scale image search


Scene understanding (1)

Scene categorization

•outdoor/indoor

•city/forest/factory/etc.

Object detection

•find pedestrians

[Figure: example image with labeled regions: mountain, building, banner, market, people, street lamp, building]

Visual object categorization (1)

Recognition is all about modeling variability:

•camera position

•illumination

•shape variations

•within-class variations

Within-class variations (why are they chairs?)

Pose clustering (1)

Working in transformation space, main ideas:

•Generate many hypotheses of the transformation between image and model, each built from a tuple of image and model features

•Correct transformation hypotheses appear many times

Main steps:

•Quantize the space of possible transformations

•For each tuple of image and model features, solve for the optimal transformation that aligns the matched features

•Record a “vote” in the corresponding transformation space bin

•Find the “peak” in transformation space
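The voting scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited paper: it assumes a 2D rigid transformation (rotation plus translation), uses 2-tuples of points as features, and tries all pairings by brute force. The function name `pose_clustering`, the bin sizes, and the length-based pruning are choices made for this example.

```python
import math
from collections import Counter
from itertools import permutations


def pose_clustering(image_pts, model_pts, angle_step_deg=10.0,
                    trans_step=1.0, length_tol=1e-6):
    """Vote for 2D rigid transformations (theta, tx, ty) in a quantized space.

    Every ordered pair of model points is matched against every ordered pair
    of image points. Returns ((theta_bin, tx_bin, ty_bin), votes) for the
    strongest bin, or None if no hypothesis was generated.
    """
    n_angle_bins = round(360.0 / angle_step_deg)
    angle_step = 2.0 * math.pi / n_angle_bins
    votes = Counter()
    for m1, m2 in permutations(model_pts, 2):
        mdx, mdy = m2[0] - m1[0], m2[1] - m1[1]
        for p1, p2 in permutations(image_pts, 2):
            pdx, pdy = p2[0] - p1[0], p2[1] - p1[1]
            # A rigid transformation preserves lengths: skip impossible tuples.
            if abs(math.hypot(pdx, pdy) - math.hypot(mdx, mdy)) > length_tol:
                continue
            # Rotation that aligns the model pair with the image pair.
            theta = (math.atan2(pdy, pdx) - math.atan2(mdy, mdx)) % (2.0 * math.pi)
            c, s = math.cos(theta), math.sin(theta)
            # Translation that maps the rotated m1 onto p1.
            tx = p1[0] - (c * m1[0] - s * m1[1])
            ty = p1[1] - (s * m1[0] + c * m1[1])
            # Quantize to the nearest bin center and record one vote.
            key = (round(theta / angle_step) % n_angle_bins,
                   round(tx / trans_step),
                   round(ty / trans_step))
            votes[key] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


# Model rotated by 90 degrees and shifted by (5, 1), plus two clutter points.
model = [(0, 0), (2, 0), (0, 1), (3, 2)]
image = [(5, 1), (5, 3), (4, 1), (3, 4), (9, 9), (-4, 7)]
best_bin, n_votes = pose_clustering(image, model)
print(best_bin, n_votes)
```

All correct pairings fall into the same bin, while wrong pairings scatter over the transformation space. The brute-force enumeration grows with the square of both point counts, which is what randomized variants aim to avoid.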

Pose clustering (2)

Example: Rotation only. A pair of one scene segment and one model segment suffices to generate a transformation hypothesis.
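For the rotation-only case, the transformation space is one-dimensional, so the accumulator is a plain array over angles. The sketch below assumes directed segments given as (start, end) point pairs, so that each scene/model pair yields exactly one angle; `estimate_rotation` and the 10-degree bin width are illustrative choices.

```python
import math


def segment_orientation(segment):
    """Orientation in radians of a directed segment ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = segment
    return math.atan2(y1 - y0, x1 - x0)


def estimate_rotation(scene_segments, model_segments, n_bins=36):
    """Vote for the rotation angle that maps model segments onto scene segments.

    Returns (peak_angle_in_degrees, votes), where votes is the accumulator.
    """
    step = 2.0 * math.pi / n_bins
    votes = [0] * n_bins
    for scene in scene_segments:
        for model in model_segments:
            # One scene/model pair is enough for one rotation hypothesis.
            theta = (segment_orientation(scene)
                     - segment_orientation(model)) % (2.0 * math.pi)
            votes[round(theta / step) % n_bins] += 1
    peak = max(range(n_bins), key=votes.__getitem__)
    return peak * 360.0 / n_bins, votes
```

With undirected segments, each pair would produce two hypotheses (theta and theta + 180 degrees), and both would need a vote.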

Pose clustering (3)

2-tuples of image and model corner points are used to generate hypotheses. (a) The corners found in an image. (b) The four best hypotheses found with the edges drawn in. The nose of the plane and the head of the person do not appear because they were not in the models.

C.F. Olson: Efficient pose clustering using a randomized algorithm. IJCV, 23(2): 131-147, 1997.

Object recognition by local features (1)

D.G. Lowe: Distinctive image features from scale-invariant keypoints. IJCV, 60(2): 91-110, 2004

The SIFT features of training images are extracted and stored.

For a query image:

•Extract SIFT features

•Efficient nearest neighbor indexing

•Pose clustering by Hough transform

•For clusters with >2 keypoints (object hypotheses): determine the optimal affine transformation parameters by the least squares method; geometry verification
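The least-squares step for one cluster of matched keypoints can be sketched as follows. The example assumes the affine model u = m1·x + m2·y + tx, v = m3·x + m4·y + ty and solves the normal equations with a small Gaussian elimination. The names `fit_affine` and `reprojection_errors` are made up for this sketch, and the residual check is only one possible form of geometric verification.

```python
import math


def _solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (collinear points?)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (a[i][3] - sum(a[i][k] * x[k] for k in range(i + 1, 3))) / a[i][i]
    return x


def fit_affine(model_pts, image_pts):
    """Least-squares affine transformation mapping model_pts onto image_pts.

    Returns ((m1, m2, tx), (m3, m4, ty)). Needs at least 3 correspondences.
    """
    if len(model_pts) != len(image_pts) or len(model_pts) < 3:
        raise ValueError("need at least 3 point correspondences")
    rows = [(x, y, 1.0) for x, y in model_pts]
    # Normal equations: (A^T A) p = A^T b, shared by the u and v coordinates.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atu = [sum(r[i] * p[0] for r, p in zip(rows, image_pts)) for i in range(3)]
    atv = [sum(r[i] * p[1] for r, p in zip(rows, image_pts)) for i in range(3)]
    return tuple(_solve3(ata, atu)), tuple(_solve3(ata, atv))


def reprojection_errors(params, model_pts, image_pts):
    """Distance between each transformed model point and its image match."""
    (m1, m2, tx), (m3, m4, ty) = params
    return [math.hypot(m1 * x + m2 * y + tx - u, m3 * x + m4 * y + ty - v)
            for (x, y), (u, v) in zip(model_pts, image_pts)]
```

Correspondences with a large reprojection error can be discarded and the fit repeated on the remaining ones; the error threshold is application dependent.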

[Figure: input image and stored image; cluster of 3 corresponding feature pairs]

Image categorization (1)

Training: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier

Testing: Test Image → Image Features → Trained Classifier → Prediction (e.g. “Outdoor”)

Global histogram (distribution):

color, texture, motion, …

histogram matching distance
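A global color histogram and two common ways to compare histograms can be sketched as below. The slides do not say which matching distance is meant, so the L1 distance and histogram intersection here are stand-ins. The function names and the choice of 4 bins per channel are assumptions, and pixels are assumed to be 8-bit (r, g, b) tuples.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of 8-bit pixels given as (r, g, b)."""
    if not pixels:
        raise ValueError("need at least one pixel")
    b = bins_per_channel
    hist = [0.0] * (b ** 3)
    for red, green, blue in pixels:
        # Map each channel value 0..255 to a bin 0..b-1.
        index = (red * b // 256) * b * b + (green * b // 256) * b + (blue * b // 256)
        hist[index] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def l1_distance(h1, h2):
    """Sum of absolute bin differences: 0 for identical, 2 for disjoint histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms (1 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_histogram(query_hist, database):
    """Return database keys sorted from best to worst match by L1 distance."""
    return sorted(database, key=lambda name: l1_distance(query_hist, database[name]))
```

Because the histogram discards all spatial layout, two images with the same color distribution but different content get a perfect match, which is the main limitation of this kind of global descriptor.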

Cars found by color histogram matching

See the chapter “Content-Based Search in Image Databases” (“Inhaltsbasierte Suche in Bilddatenbanken”)

Bag-of-features models (1)

Object → Bag of ‘words’

Origin 1: Text recognition by orderless document representation (frequencies of words from a dictionary), Salton & McGill (1983)

US Presidential Speeches Tag Cloud: http://chir.ag/phernalia/preztags/

Origin 2: Texture recognition

Texture is characterized by the repetition of basic elements or textons

For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters

Universal texton dictionary

histogram

G. Csurka et al.: Visual categorization with bags of keypoints. Proc. of ECCV, 2004.

We need to build a “visual” dictionary!

Main step 1: Extract features (e.g. SIFT)

Main step 2: Learn visual vocabulary (e.g. using clustering)

clustering

visual vocabulary (cluster centers)
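Learning the vocabulary by clustering can be sketched with plain k-means (Lloyd's algorithm). This is a toy version under stated assumptions: descriptors are short lists of floats rather than 128-dimensional SIFT vectors, and `learn_vocabulary` together with its seeding strategy is chosen for the example only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the cluster center closest to the given descriptor."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def learn_vocabulary(descriptors, k, max_iter=100, seed=0):
    """Cluster descriptors with k-means; the k centers are the visual words."""
    rng = random.Random(seed)
    # Initialize the centers with k randomly chosen descriptors.
    centers = [list(p) for p in rng.sample(list(descriptors), k)]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [nearest_center(p, centers) for p in descriptors]
        if new_assignment == assignment:
            break  # converged: no descriptor changed its cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(descriptors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[c] = [sum(p[d] for p in members) / len(members)
                              for d in range(dim)]
    return centers
```

The number of clusters k is the dictionary size. In practice the descriptors from many training images are pooled before clustering.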

Appearance codebook (Source: B. Leibe)

Example: learned codebook

Main step 3: Quantize features using “visual vocabulary”

Main step 4: Represent images by frequencies of “visual words” (i.e., bags of features)

Example: representation based on learned codebook
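Main steps 3 and 4 can be sketched together: each descriptor is assigned to its nearest visual word, and the image is summarized by the word frequencies. The function names are illustrative, and the vocabulary is assumed to be a list of cluster centers with the same dimension as the descriptors.

```python
def quantize(descriptor, vocabulary):
    """Index of the visual word (cluster center) nearest to the descriptor."""
    def squared_distance(word):
        return sum((x - y) ** 2 for x, y in zip(descriptor, word))
    return min(range(len(vocabulary)), key=lambda i: squared_distance(vocabulary[i]))


def bag_of_features(descriptors, vocabulary, normalize=True):
    """Histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        hist[quantize(descriptor, vocabulary)] += 1.0
    total = sum(hist)
    if normalize and total > 0:
        # Frequencies instead of counts, so the number of features per image
        # does not dominate the comparison.
        hist = [count / total for count in hist]
    return hist


vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (0.5, 0.0), (9.0, 9.0), (1.0, 9.0)]
print(bag_of_features(descriptors, vocabulary))  # [0.5, 0.25, 0.25]
```

The resulting vector has one entry per visual word and ignores where in the image each feature was found.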

Main step 5: Apply any classifier to the histogram feature vectors
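Since any classifier can be used, the sketch below picks one of the simplest: a 1-nearest-neighbor classifier with the chi-square distance between histograms. Both the classifier and the distance are stand-ins chosen for brevity, not the method of the cited work, and the class labels in the usage example are hypothetical.

```python
def chi_square_distance(h1, h2):
    """Chi-square distance between two histograms (0 for identical inputs)."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)


def nearest_neighbor_classify(query_hist, training_set):
    """Label of the training histogram closest to the query.

    training_set is a list of (histogram, label) pairs.
    """
    if not training_set:
        raise ValueError("training set is empty")
    _, best_label = min(
        training_set,
        key=lambda item: chi_square_distance(query_hist, item[0]))
    return best_label


# Hypothetical bag-of-features histograms over a 3-word vocabulary.
training = [([0.7, 0.2, 0.1], "face"), ([0.1, 0.2, 0.7], "airplane")]
print(nearest_neighbor_classify([0.6, 0.3, 0.1], training))  # face
```

A discriminative classifier trained on the same histogram vectors would slot in at the same place; only the function that maps a histogram to a label changes.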

Example:

Caltech6 dataset

Dictionary quality and size are very important parameters!

Caltech6 dataset

J.C. Niebles, H. Wang, L. Fei-Fei: Unsupervised learning of human action categories using spatial-temporal words. IJCV, 79(3): 299-318, 2008.

Action recognition

Large-scale image search (1)

Query Results from 5k Flickr images

J. Philbin, et al.: Object retrieval with large vocabularies and fast spatial matching. Proc. of CVPR, 2007

Mobile tourist guide:

•self-localization

•object/building recognition

•photo/video augmentation

Aachen Cathedral

[Quack, Leibe, Van Gool, CIVR’08]

Application: Image auto-annotation

Left: Wikipedia image. Right: closest match from Flickr.

Moulin Rouge

Tour Montparnasse

Colosseum

Viktualienmarkt Maypole

Old Town Square (Prague)

Sources

K. Grauman, B. Leibe: Visual Object Recognition. Morgan & Claypool Publishers, 2011

R. Szeliski: Computer Vision: Algorithms and Applications. Springer, 2010. (Chapter 14 “Recognition”)

Course materials from others (G. Bebis, J. Hays, S. Lazebnik, …)
