Object Category Detection Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg AIMS-CDT Computer Vision Hilary 2020

Object Category Detectionaz/lectures/aims-cv/detection-part1.pdf•Object detection is inherently asymmetric: much more “non-object” than “object” data •Classifier needs

Jul 21, 2020

Page 1

Object Category Detection

Andrew Zisserman

Visual Geometry Group

University of Oxford

http://www.robots.ox.ac.uk/~vgg

AIMS-CDT Computer Vision

Hilary 2020

Page 2

What we would like to be able to do…

• Visual scene understanding

• What is in the image and where

Dog 1: Terrier

Motorbike: Suzuki GSX 750

Ground: Gravel

Plant

Wall

Gate

Dog 2: Sitting on Motorbike

Person: John Smith, holding Dog 2

• Object categories, identities, properties, activities, relations, …

Page 3

Things vs. Stuff

Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, with no specific or distinctive spatial extent or shape.

Thing (n): An object with a specific size and shape.

Ted Adelson, Forsyth et al. 1996.

Slide: Geremy Heitz

Page 4

Recognition Tasks

• Image Classification

– Does the image contain an aeroplane?

• Object Class Detection/Localization

– Where are the aeroplanes (if any)?

• Object Class Segmentation

– Which pixels are part of an aeroplane (if any)?

Page 5

Challenges: Background Clutter

Page 6

Challenges: Occlusion and truncation

Page 7

Challenges: Intra-class variation

Page 8

Why detection?

• Spatial relationships for image understanding and retrieval

• Visual question answering

• Object grasping/tracking

“a cat riding a skateboard”

Page 9

Why detection?

“Detect to Track and Track to Detect”, Feichtenhofer, Pinz, Zisserman, ICCV 2017

• Tracking by detection

Page 10

Motivation/Applications

www.mobileye.com

Collision prevention

Organizing image collections

Slide: Ross Girshick

Page 11

Outline

Part I: Principles of sliding window detectors

• Train a sliding window detector

• Speeding up inference

Part II: Deep Networks for object category detection

• Two-stage and one-stage networks

• State of the art

Page 12

Problem of background clutter

• Use a sub-window

– At correct position, no clutter is present

– Slide window to detect object

– Change size of window to search over scale

Page 13

Detection by Classification

• Basic component: binary classifier

(Figure: a car/non-car classifier answering "Yes, a car" for one window and "No, not a car" for another.)

Page 14

Detection by Classification

• Detect objects in clutter by search

Car/non-car classifier

• Sliding window: exhaustive search over position and scale

Page 15

Detection by Classification

• Detect objects in clutter by search

Car/non-car classifier

• Sliding window: exhaustive search over position and scale

Page 16

Detection by Classification

• Detect objects in clutter by search

Car/non-car classifier

• Sliding window: exhaustive search over position and scale

(can use the same-size window over a spatial pyramid of images)
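A minimal sketch of the exhaustive search over position and scale, assuming a grey-scale NumPy image, a fixed 128 × 64 (height × width) window, a stride of 8 pixels and a pyramid that shrinks the image by a factor of 1.2 per level. These settings and the function names are illustrative assumptions, and nearest-neighbour subsampling stands in for proper smoothing and interpolation. Boxes are mapped back to original-image coordinates so that detections from every level can be compared.

```python
import numpy as np


def image_pyramid(image, scale_factor=1.2, min_size=(128, 64)):
    """Yield (scale, resized image) pairs, shrinking by scale_factor per level.

    Nearest-neighbour subsampling keeps the sketch dependency-free; a real
    detector would smooth before resampling.
    """
    h, w = image.shape[:2]
    scale = 1.0
    while h / scale >= min_size[0] and w / scale >= min_size[1]:
        rows = (np.arange(int(h / scale)) * scale).astype(int)
        cols = (np.arange(int(w / scale)) * scale).astype(int)
        yield scale, image[rows][:, cols]
        scale *= scale_factor


def sliding_windows(image, window=(128, 64), stride=8, scale_factor=1.2):
    """Yield (box, crop) for every window position at every pyramid level.

    box = (x0, y0, x1, y1) in original-image coordinates; the crop always has
    the fixed window size, so a single classifier handles all scales.
    """
    win_h, win_w = window
    for scale, level in image_pyramid(image, scale_factor, window):
        level_h, level_w = level.shape[:2]
        for y in range(0, level_h - win_h + 1, stride):
            for x in range(0, level_w - win_w + 1, stride):
                crop = level[y:y + win_h, x:x + win_w]
                box = (x * scale, y * scale, (x + win_w) * scale, (y + win_h) * scale)
                yield box, crop
```

Each crop would then be passed to the car/non-car classifier. For a 240 × 320 image these settings give four pyramid levels, so the same window covers objects from 128 up to roughly 220 pixels tall.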

Page 17

Window (Image) Classification

• Features hand-crafted (for now)

• Classifier learnt from data

(Figure: Feature extraction → Classifier → Car/Non-car, with the classifier trained on Training Data.)

Page 18

Problems with sliding windows …

• aspect ratio

• granularity (finite grid)

• partial occlusion/truncation

• multiple responses

Page 19

Dalal & Triggs CVPR 2005 Pedestrian detection

• Objective: detect (localize) standing humans in an image

• Sliding window classifier

• Train a binary SVM classifier to determine whether a window contains a standing person or not

• Histogram of Oriented Gradients (HOG) feature

• Although originally introduced for pedestrians, HOG + SVM has been used very successfully for many object categories

Page 20

Window (Image) Classification

(Figure: Image window → Feature extraction (HOG) → Classifier (SVM) → Pedestrian/Non-pedestrian.)

Page 21

Feature: Histogram of Oriented Gradients (HOG)

• Tile 64 x 128 pixel window into 8 x 8 pixel cells

• Each cell represented by histogram over 8 orientation bins (i.e. angles in range 0-180 degrees)

(Figure: image, dominant direction and HOG, with a histogram of frequency against orientation.)
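A simplified sketch of this first HOG stage, assuming a grey-scale window stored as a 128 × 64 (rows × columns) NumPy array. It uses central-difference gradients and hard-assigns each pixel's gradient magnitude to one of 8 unsigned orientation bins; the bin interpolation and block normalisation of the full descriptor are deliberately left out, and the function name is hypothetical.

```python
import numpy as np


def cell_orientation_histograms(window, cell=8, n_bins=8):
    """Per-cell histograms of unsigned gradient orientation (0-180 degrees).

    Returns an array of shape (rows // cell, cols // cell, n_bins) where each
    entry is the summed gradient magnitude falling into that orientation bin.
    """
    img = window.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]          # horizontal central difference
    gy[1:-1, :] = img[2:, :] - img[:-2, :]          # vertical central difference
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # fold opposite directions together
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)

    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, n_bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            # Magnitude-weighted vote of every pixel in the cell.
            hist[cy, cx] = np.bincount(bins[rows, cols].ravel(),
                                       weights=magnitude[rows, cols].ravel(),
                                       minlength=n_bins)
    return hist
```

For a 64 × 128 pedestrian window the result has shape (16, 8, 8): 16 × 8 cells, each with 8 orientation bins. A vertical step edge, for instance, puts all of its gradient energy into the 0-degree bin.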

Page 22

Histogram of Oriented Gradients (HOG) continued

• Adds a second level of overlapping spatial bins, re-normalizing orientation histograms over a larger spatial area

• Feature vector dimension (approx) = 16 x 8 (for tiling) x 8 (orientations) x 4 (for blocks) = 4096
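The count can be checked term by term, assuming blocks of 2 × 2 cells moved one cell at a time, so that an interior cell contributes to 4 blocks. The first line is the slide's approximation; the second counts blocks exactly under that assumption. Border cells belong to fewer blocks, which is why the slide's figure is only approximate.

```latex
\begin{aligned}
\text{approx:}\quad & 16 \times 8 \,(\text{cells}) \times 8 \,(\text{orientations}) \times 4 \,(\text{blocks}) = 4096 \\
\text{exact:}\quad & (16-1) \times (8-1) \,(\text{blocks}) \times 4 \,(\text{cells per block}) \times 8 \,(\text{orientations}) = 3360
\end{aligned}
```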

Page 23

HOG Descriptor similarity to CNN layers

(Figure: HOG pipeline Image pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature vector, aligned with the CNN layers Conv1 → Sum pooling → Layer norm.)

Page 24

Window (Image) Classification

• HOG Features

• Linear SVM classifier

(Figure: Feature extraction → Classifier → Pedestrian/Non-pedestrian, with the classifier trained on Training Data.)
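A minimal sketch of training the linear SVM on pre-computed feature vectors (for example HOG descriptors), assuming labels in {-1, +1}. It minimises the regularised hinge loss by stochastic subgradient descent with a Pegasos-style step size; this solver is chosen here for brevity and is not necessarily the one used in the lecture. The bias is treated as an extra, regularised weight, and all names and defaults are illustrative.

```python
import numpy as np


def train_linear_svm(X, y, lam=1e-2, epochs=20, seed=0):
    """Minimise lam/2 * ||w||^2 + mean(hinge loss) by stochastic subgradient descent.

    X: (n, d) feature vectors; y: (n,) labels in {-1, +1}.
    Returns (w, b). The bias is appended as a constant-1 feature, so it is
    regularised too (a simplification).
    """
    rng = np.random.default_rng(seed)
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                    # Pegasos-style decreasing step size
            violates = y[i] * (Xa[i] @ w) < 1        # hinge loss active for this example
            w *= 1.0 - eta * lam                     # step on the regulariser
            if violates:
                w += eta * y[i] * Xa[i]              # step on the hinge loss
    return w[:-1], w[-1]


def svm_score(X, w, b):
    """Linear SVM score w.x + b; a window is classified positive when it is > 0."""
    return X @ w + b
```

A window is labelled pedestrian when `svm_score` is positive. Because the classifier is linear, scoring a window costs a single dot product with its feature vector, which is what makes exhaustive sliding-window search affordable.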

Page 25

Page 26

Tiling defines (records) the spatial correspondence

Page 27

Dalal and Triggs, CVPR 2005

Page 28

(Figure: the learned model, and the average over the positive training data.)

Page 29

Slide from Deva Ramanan

Page 30

What is represented by HOG

Inverting and Visualizing Features for Object Detection

Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba, http://web.mit.edu/vondrick/ihog/index.html

(Figure: HOG inverse vs. original image.)

Page 31

What is represented by HOG

(Figure: HOG inverse vs. original image.)

Page 32

Training a sliding window detector

• Object detection is inherently asymmetric: much more “non-object” than “object” data

• Classifier needs to have a very low false positive rate

• Non-object category is very complex – need lots of data

Page 33

Bootstrapping

1. Pick negative training set at random

2. Train classifier

3. Run on training data

4. Add false positives to training set

5. Repeat from 2

• Collect a finite but diverse set of non-object windows

• Force classifier to concentrate on hard negative examples

• For some classifiers, this is guaranteed to be equivalent to training on the entire data set
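A minimal sketch of the bootstrapping loop on pre-computed feature vectors. To stay short it uses a least-squares linear classifier (`fit_linear`, a hypothetical stand-in) in place of the SVM; the equivalence to training on the entire data set relies on properties of classifiers such as the SVM and does not carry over to this stand-in. The initial sample size, number of rounds and per-round cap on new negatives are illustrative assumptions.

```python
import numpy as np


def fit_linear(X, y):
    """Hypothetical stand-in for the SVM: least-squares fit to labels in {-1, +1}."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return coef[:-1], coef[-1]


def bootstrap_train(pos, neg_pool, n_init=100, max_rounds=5, max_new=200, seed=0):
    """Hard-negative mining following steps 1-5 of the bootstrapping recipe.

    pos: (P, d) positive windows; neg_pool: (N, d) every available negative
    window (in a detector, all windows of the training images that miss the object).
    Returns (w, b, number of negatives used).
    """
    rng = np.random.default_rng(seed)
    # 1. pick the initial negative training set at random
    neg_idx = set(rng.choice(len(neg_pool), size=n_init, replace=False).tolist())
    for _ in range(max_rounds):                      # 5. repeat from step 2
        neg = neg_pool[sorted(neg_idx)]
        X = np.vstack([pos, neg])
        y = np.hstack([np.ones(len(pos)), -np.ones(len(neg))])
        w, b = fit_linear(X, y)                      # 2. train the classifier
        scores = neg_pool @ w + b                    # 3. run it on the training data
        hard = [int(i) for i in np.argsort(-scores)  # highest-scoring first
                if scores[i] > 0 and int(i) not in neg_idx][:max_new]
        if not hard:                                 # no new false positives: stop
            break
        neg_idx.update(hard)                         # 4. add false positives
    return w, b, len(neg_idx)
```

Only negatives that the current classifier gets wrong are added, so the training set stays finite while concentrating on the hard examples near the decision boundary.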

Page 34

Example: train an upper body detector

– Training data – used for training and validation sets

• 33 Hollywood2 training movies

• 1122 frames with upper bodies marked

– First stage training (bootstrapping)

• 1607 upper body annotations jittered to 32k positive samples

• 55k negatives sampled from the same set of frames

– Second stage training (retraining)

• 150k hard negatives found in the training data
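The jittering step can be sketched as below, assuming boxes given as (x0, y0, x1, y1). The shift and scale ranges and the function name are illustrative assumptions, not the values used for the Hollywood2 detector. As a rough check of the numbers above, about 20 jittered copies per annotation take 1607 annotations to roughly 32k samples (1607 × 20 = 32,140).

```python
import numpy as np


def jitter_box(box, n, max_shift=0.05, max_scale=0.05, rng=None):
    """Return an (n, 4) array of randomly perturbed copies of box = (x0, y0, x1, y1).

    The centre moves by up to max_shift of the box width/height and the size
    changes by up to max_scale; one scale factor per copy keeps the aspect ratio.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    cx = (x0 + x1) / 2 + rng.uniform(-max_shift, max_shift, n) * w
    cy = (y0 + y1) / 2 + rng.uniform(-max_shift, max_shift, n) * h
    s = 1.0 + rng.uniform(-max_scale, max_scale, n)
    half_w, half_h = w * s / 2, h * s / 2
    return np.stack([cx - half_w, cy - half_h, cx + half_w, cy + half_h], axis=1)
```

Each jittered box would then be cropped and resized to the common window size before feature extraction, so the classifier sees small misalignments of the kind the sliding-window grid produces.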

Page 35

Training data – positive annotations

Page 36

Positive windows

Note: common size and alignment

Page 37

Jittered positives

Page 38

Jittered positives

Page 39

Random negatives

Page 40

Random negatives

Page 41

Window (Image) first stage classification

(Figure: HOG feature extraction → Linear SVM classifier, trained on jittered positives and random negatives.)

Page 42

First stage performance on validation set

Page 43

Reminder: Precision – Recall curve

(Figure: precision-recall curve, precision against recall, both from 0 to 1; inset: the retrieved set and the positives within the whole dataset.)

• Precision: fraction of the retrieved set that are positives

• Recall: fraction of all positives that are in the retrieved set

(Moving along the curve corresponds to decreasing classifier score.)
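A short sketch of how the curve is traced: sort detections by decreasing classifier score and compute precision and recall at every cut-off. Each detection is assumed to be already labelled as a true or false positive. `average_precision` (a hypothetical name) summarises the curve by its area in the simplest, non-interpolated way; benchmarks such as PASCAL VOC use interpolated variants.

```python
import numpy as np


def precision_recall(scores, labels, n_positives=None):
    """Precision and recall at every cut-off of the ranking by decreasing score.

    labels: 1 if the detection is a true positive, 0 otherwise.
    n_positives: number of positives in the whole dataset; defaults to
    labels.sum(), which is only correct if every positive was retrieved.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(labels)[order]
    tp = np.cumsum(hits)                      # true positives in the top-k
    retrieved = np.arange(1, len(hits) + 1)   # size of the retrieved set
    total = hits.sum() if n_positives is None else n_positives
    return tp / retrieved, tp / total


def average_precision(precision, recall):
    """Non-interpolated AP: precision summed over each increase in recall."""
    recall_prev = np.concatenate([[0.0], recall[:-1]])
    return float(np.sum((recall - recall_prev) * precision))
```

For scores [0.9, 0.8, 0.7, 0.6] with labels [1, 0, 1, 0], precision is [1, 0.5, 2/3, 0.5], recall is [0.5, 0.5, 1, 1] and the AP is 0.5 + 0.5 × 2/3 ≈ 0.83.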

Page 44

Detection Evaluation: Intersection over Union

(Figure: ground truth box Bgt and predicted box Bp overlapping.)

• Area-based measure: Intersection over Union (IoU) = Area(GT ∩ Pred) / Area(GT ∪ Pred)

• Detection correct if IoU > threshold = 50%

• Performance measure: Average Precision
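The overlap test can be written directly from the IoU formula, assuming boxes given as continuous (x0, y0, x1, y1) coordinates with x1 > x0 and y1 > y0 (some benchmarks instead count inclusive pixel indices, which adds 1 to each width and height). The function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)   # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)             # union = sum of areas - intersection


def is_correct_detection(pred, gt, threshold=0.5):
    """A prediction counts as correct when its IoU with the ground truth exceeds the threshold."""
    return iou(pred, gt) > threshold
```

For example, two 2 × 2 boxes offset by one unit in each direction overlap in a 1 × 1 square, giving IoU = 1 / (4 + 4 - 1) = 1/7, well below the 50% threshold.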

Page 45

Window (Image) first stage classification

(Figure: HOG feature extraction → Linear SVM classifier, trained on jittered positives and random negatives.)

• find high-scoring false positive detections

• these are the hard negatives for the next round of training

• cost = # training images x inference on each image

Page 46

Hard negatives

Page 47

Hard negatives

Page 48

First stage performance on validation set

Page 49

Performance after retraining

Page 50

Effects of retraining

Page 51

Side by side

(Left: before retraining; right: after retraining.)

Page 52

Side by side

(Left: before retraining; right: after retraining.)

Page 53

Side by side

(Left: before retraining; right: after retraining.)

Page 54

Tracked upper body detections

Page 55

Accelerating Sliding Window Search

• Sliding window search is slow because so many windows are needed, e.g. x × y × scale ≈ 100,000 for a 320 × 240 image

• Most windows are clearly not the object class of interest

• Can we speed up the search?

Example: face detection

Page 56

Cascaded Classification

• Build a sequence of classifiers with increasing complexity

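(Figure: Window → Classifier 1 → possibly a face → Classifier 2 → possibly a face → … → Classifier N → Face; each classifier can instead reject the window as Non-face. Along the sequence the classifiers become more complex and slower, with a lower false positive rate.)

• Reject easy non-objects using simpler and faster classifiers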
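The early-rejection logic of a cascade can be sketched as follows. Each stage is assumed to be a pair (score function, rejection threshold), ordered from cheapest to most expensive. The two toy stages in the usage part (mean intensity, then standard deviation) are hypothetical placeholders for real classifiers such as boosted feature sets or SVMs.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

# A stage is (score function, rejection threshold); stages are ordered cheap -> expensive.
Stage = Tuple[Callable[[np.ndarray], float], float]


def cascade_classify(window: np.ndarray, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number of stages evaluated).

    The window is rejected as non-object at the first stage whose score falls
    below its threshold, so the later, slower stages only see windows that
    survived every cheaper stage.
    """
    for k, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, k
    return True, len(stages)


# Usage with two toy stages (hypothetical stand-ins for real classifiers).
toy_stages = [
    (lambda w: float(w.mean()), 0.1),   # cheap: reject nearly black windows
    (lambda w: float(w.std()), 0.1),    # costlier: reject textureless windows
]
```

Applied to every sliding window, the returned stage count shows where computation is spent: most windows should exit at stage 1, which is where the speed-up comes from.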

Page 57

Cascaded Classification

• Slow, expensive classifiers only applied to a few windows ⇒ significant speed-up

• Controlling classifier complexity/speed:

– Number of support vectors [Romdhani et al, 2001]

– Number of features [Viola & Jones, 2001]

– Type of SVM kernel [Vedaldi et al, 2009]

– Number of parts [Felzenszwalb et al, 2011]

Page 58

Detection Proposals

• Propose image regions that contain objects (rather than stuff)

• Proposals can be boxes or segmented regions, and are class-agnostic

• Aim to cover all the objects in the image with a small number of proposals, e.g. 100-1000 per image

• “Objectness” Alexe et al, PAMI 2012

Page 59

Detection Proposals – example method 1

Selective Search for Object Recognition

J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, International Journal of Computer Vision 2013

• Uses hierarchical segmentation based on colour uniformity and image edges

• Produces about 2000 regions per image, with a > 95% probability of hitting any relevant object in the image

Page 60

Detection Proposals – example method 2

Edge Boxes: Locating Object Proposals from Edges

Larry Zitnick & Piotr Dollár, ECCV 2014

Page 61

Detection Proposals – example method 3

Further reading:

What makes for effective detection proposals?

J. Hosang, R. Benenson, P. Dollár, and B. Schiele, PAMI 2015.

Learning to Propose Objects

Philipp Krähenbühl and Vladlen Koltun, CVPR 2015

Page 62

Summary

• Detection by sliding window classification

• Multiple scales (and aspect ratios) to detect objects of different sizes

• Importance of hard negative mining (due to the class imbalance)

• Speed up training and inference by selecting only a subset of windows