Internet Video Search
Transcript
Cees Snoek & Arnold Smeulders, University of Amsterdam
29-11-2011
Internet Video Search
Internet Video Search
Cees G.M. Snoek & Arnold W.M. Smeulders
Intelligent Systems Lab Amsterdam, University of Amsterdam, The Netherlands
A brief history of television
• From broadcasting to narrowcasting
• …to thincasting
~1955  ~1985  ~2005  ~2008  ~2010
…as of May 2011
The international business case
• Everybody with a message uses video for delivery
• Growing unmanageable amounts of video

Crowd-given search
What others say is in the video.

Raw-driven search
www.science.uva.nl/research/isla
MultimediaN project
1. Short course outline
.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Telling stories
.5 Video browsing
Problem 1: Variation in appearance
So many images of one thing, due to minor differences in: illumination, background, occlusion, viewpoint, …
• This is the sensory gap
[Figure: example images (suit, basketball, tree) as the sensor sees them: raw bit strings]
Problem 2: What defines things?
[Figure: humans recognize concepts in multimedia archives – table, US flag, aircraft, dog, tennis, mountain, fire, building, machine – where the machine sees only bit strings]
• This is the semantic gap
Problem 3: Many things in the world
• This is the model gap
Problem 4: What story does a video tell?
• This is the narrative gap
Problem 5: Use is open-ended
• This is the interface gap
[Figure: interface dimensions – scope, screen, keywords]
Conclusion on problems
• Video search is a diverse and challenge-rich research topic
– Sensory gap
– Semantic gap
– Model gap
– Narrative gap
– Interface gap
Today’s promise
• You will be acquainted with the theory and practice of the semantic video search paradigm.
• You will be able to recall the five major scientific problems in video retrieval and explain and value the present-day solutions.
1. Short course outline
.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Telling stories
.5 Video browsing
There are a million appearances to one concept
A million appearances
Where are the patterns (of the same shoe)?
Somewhere the variance must be removed.

Invariance: the need for ~
The illumination and the viewing direction are removed as soon as the image has entered.
Common transformations
Illumination transformations: Contrast; Intensity and Shadow; Color
Viewpoint: Rotation and Lateral; Distance; Viewing angle; Projection
Cover: Specular or matte
Occlusion & Clutter, Wear & Tear, Aging, Night & Day, and so on into increasingly complex transformations.
More than one transformation
Features of selected points may be good enough to describe an object, iff the selection & the feature set are both invariant for scene-accidental conditions.
Gevers TIP 2000
Design of invariants: Orbits
For a property variant under W, observations of a constant property are spread over the orbit. The purpose of an invariant is to capture all of the orbit into one value.
Example: invariance
projection
Slide credit: Theo Gevers
(R,G,B)-space → (c1,c2,c3)-space

c1(R,G,B) = arctan( R / max{G,B} )
c2(R,G,B) = arctan( G / max{R,B} )
c3(R,G,B) = arctan( B / max{R,G} )
Color invariance
Gevers, PRL 1999; Geusebroek, PAMI 2001

Model | shadows | shading | highlights | ill. intensity | ill. color
  E   |    -    |    -    |     -      |       -        |     -
  W   |    -    |    +    |     -      |       +        |     -
  C   |    +    |    +    |     -      |       +        |     -
  M   |    +    |    +    |     -      |       +        |     +
  N   |    +    |    +    |     -      |       +        |     +
  L   |    +    |    +    |     +      |       +        |     -
  H   |    +    |    +    |     +      |       +        |     -
Local shape motivation
Perceptual importance, concise data, robust to occlusion & clutter.
Tuytelaars FTCGV 2008
Meet the Gaussians
Taylor expansion at x.
For a discretely sampled signal use the Gaussians:
– Robust additive differentials
– Dimensions separable
– No maxima introduced
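The separable Gaussian (derivative) filtering above can be sketched in numpy; an illustrative version (not the lecturers' code), computing a receptive field of chosen derivative order by two 1D convolutions:

```python
import numpy as np

def gaussian_kernel(sigma, order=0):
    """1D sampled Gaussian (order 0) or its first derivative (order 1);
    separability lets 2D filtering run as two 1D passes."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return g if order == 0 else -x / sigma**2 * g

def receptive_field(image, sigma, order_y=0, order_x=0):
    """Image derivatives at scale sigma = separable Gaussian(-derivative)
    convolutions over rows, then columns."""
    kx = gaussian_kernel(sigma, order_x)
    ky = gaussian_kernel(sigma, order_y)
    rows = np.apply_along_axis(np.convolve, 1, image, kx, mode='same')
    return np.apply_along_axis(np.convolve, 0, rows, ky, mode='same')

img = np.zeros((32, 32)); img[:, 16:] = 1.0    # vertical step edge
fx = receptive_field(img, sigma=2.0, order_x=1)  # first order over x
print(abs(fx[16, 16]) > abs(fx[16, 0]))  # True: response peaks at the edge
```

Higher orders follow the same pattern, which is why the receptive-field family up to second order is cheap to measure.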
Meet the Gaussians
The basic video observables are local receptive fields f(x): the receptive fields up to second order, for grey value as well as opponent color sets.
Taxonomy of image structure
Slide credit: Theo Gevers
[Figure: local structures – T-junction, highlight, corner, junction]
From receptive fields to meaning
Lee et al, Comm. ACM 2011
With examples
Meet Gabor
The 2D Gabor function is:

h(x,y) = 1/(2πσ²) · exp( −(x² + y²) / (2σ²) ) · exp( j2π(ux + vy) )

Tuning parameters: u, v, σ + usual invariants by combination.
Manjunath and Ma on Gabor for texture as seen in F-space.
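Sampling this Gabor function is a few lines of numpy; an illustrative sketch (parameter values are assumptions, not the lecturers' settings): a Gaussian envelope modulated by a complex sinusoid at spatial frequency (u, v).

```python
import numpy as np

def gabor(size, u, v, sigma):
    """Sample the 2D Gabor function h(x,y) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    carrier = np.exp(2j * np.pi * (u * x + v * y))   # complex sinusoid at (u, v)
    return envelope * carrier

h = gabor(size=31, u=0.1, v=0.0, sigma=4.0)
# Convolving an image with h (e.g. scipy.signal.convolve2d) responds
# strongly to texture oscillating at spatial frequency (u, v).
print(h.shape, np.iscomplexobj(h))  # (31, 31) True
```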
Local receptive fields F(x)
The receptive fields for (u, v) measured locally, for grey value as well as opponent color sets.
Hoang 2003
Gabor filters: texture
Hoang, ECCV 2002
Original image K-means clustering Segmentation
Gabor filters: texture
Local receptive field in f(x,t)
Gaussian equivalent over x and t: zero order, first order over t.
Burghouts TIP 2006
Receptive fields: overview
All observables up to first order color, second order spatial scales, eight frequency bands & first order in time.
Good observables > easy algorithms
Periodicity:
Detect periodic motion by one steered filter:
Deadly simple algorithm…
Burghouts TIP 2006
Meet the Loweans
So far we paid respect to the spatial order.
Now we will weakly follow the spatial order and form histograms on all directions we encounter locally, … better known as (the second part of) SIFT.
Lowe IJCV 2004
Meet the Loweans
4 x 4 gradient window after thresholding
Histogram of 4 x 4 samples per window in 8 directions
Gaussian weighting around center (σ is 1/2 of σ keypoint)
4 x 4 x 8 = 128 dimensional feature vector
Image: Jonas Hurrelmann
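The 4 x 4 x 8 layout above can be sketched as a toy descriptor; this illustrative numpy version keeps only the histogram structure and, unlike real SIFT, omits the Gaussian weighting and the trilinear interpolation:

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy SIFT-style layout: split a 16x16 gradient patch into a 4x4 grid
    of cells, histogram gradient orientation into 8 bins per cell (weighted
    by magnitude), concatenate: 4 x 4 x 8 = 128 dimensions."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)                     # in (-pi, pi]
    bins = ((orientation + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    descriptor = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            cell = np.s_[cy * 4:(cy + 1) * 4, cx * 4:(cx + 1) * 4]
            for b in range(8):
                descriptor[cy, cx, b] = magnitude[cell][bins[cell] == b].sum()
    vec = descriptor.ravel()
    return vec / (np.linalg.norm(vec) + 1e-10)  # normalize for illumination

patch = np.arange(256, dtype=float).reshape(16, 16)  # synthetic ramp patch
print(sift_like_descriptor(patch).shape)  # (128,)
```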
SIFT detection
Slide credit: Jepson 2005
Enriching SIFT (in a nutshell)
• Affine SIFT – choose prominent direction in SIFT (Mikolajczyk, IJCV 2005)
• PCA-SIFT – robust and compact representation (Ke, CVPR 2004)
• ColorSIFT – add several invariant color descriptors (Van de Sande, PAMI 2010)
• TimeSIFT – anyone?
Illumination invariance
Invariance properties of the descriptors used (Van de Sande, PAMI 2010):

Descriptor   | light intensity change | light intensity shift | light intensity change and shift | light color change | light color change and shift
SIFT         |           +            |           +           |                +                 |         +          |              +
OpponentSIFT |          +/-           |           +           |               +/-                |        +/-         |             +/-
C-SIFT       |           +            |           +           |                +                 |        +/-         |             +/-
rgSIFT       |           +            |           +           |                +                 |        +/-         |             +/-
RGB-SIFT     |           +            |           +           |                +                 |         +          |              +
Where to sample in the video?
• A video shot is a set of frames representing a continuous camera action in time and space
– Analysis typically on a single key frame per shot
[Figure: shot → key frame]
Where to sample in the frame?
Tuytelaars, FTCGV 2008
Interest point examples
Original image Harris Laplacian Color salient points
Mikolajczyk, CVPR 2006van de Weijer, PAMI 2006
Dense sampling example
What is the object in the middle?
No segmentation …
No pixel values of the object …

Fast dense descriptors
Uijlings et al, CIVR 2009
Image Patch
Reuse sub-regions: 16x speed-up
Conclusion on measuring features
• Invariance is crucial when designing features
– More invariance means less stable…
– …but more robustness to sensory gap
• Effective features strike a balance between invariance and discriminatory power
– And for video search, efficiency is helpful also…
And there is always more …
For example:
Local Invariant Feature Detectors: A Survey
Tinne Tuytelaars & Krystian Mikolajczyk
FTCGV 3:3 (177–280)
1. Short course outline
.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Telling stories
.5 Video browsing
The semantic gap
Quote:
The semantic gap is the lack of coincidence between the information that one can extract from the sensory data and the interpretation that the same data has for a user in a given situation.
Arnold Smeulders, PAMI, 2000
The science of labeling
• To understand anything in science, things need a name that is universally recognized
• Worldwide endeavor in naming visual information
– living organisms, chemical elements, human genome, text ‘categories’
Naming visual information
• Concept detection – does the image contain an airplane? (focus of today’s lecture)
• Object localization – where is the airplane (if any)?
• Object segmentation – which pixels are part of an airplane (if any)?
Slide credit: Andrew Zisserman
How difficult is the problem?
• Human vision consumes 50% brain power…
Van Essen, Science 1992
Semantic concept detection
• The patient approach
– Building detectors one-at-a-time
A face detector for frontal faces
A simple face detector
One PhD per detector requires too many students…

So how about these?
… and the thousands of others …
Basic concept detection
[Diagram: Training – labeled examples (aircraft, outdoor) → feature extraction → supervised learner. Testing – video → feature measurement → classification]
Output: “It is an aircraft, probability 0.7. It is outdoor, probability 0.95.”
Demo: concept detection
Visualization by Jasper Schulte
Linear classifiers - margin
Slide credit: Cordelia Schmid

Nonlinear SVM
Slide credit: Cordelia Schmid
Breakthrough: nonlinear SVMs
Vapnik, 1995

Nonlinear SVMs
Slide credit: Cordelia Schmid

Kernels for concept detection
Slide credit: Cordelia Schmid

Comparing kernels
Zhang, IJCV ‘07
Causes of poor generalization
• Over-fitting
– Separate your data
• Curse of dimensionality
– Information fusion helps
Feature fusion
[Diagram: shot-segmented video → synchronization → normalization → transformation → concatenation → feature vector → concept confidence]
Feature fusion
+ Only one learning phase
- Combination often ad hoc
- One feature may dominate
- Curse of dimensionality
Avoiding dimensionality curse
• Codebook aka bag-of-words model
– Create a codeword vocabulary
– Discretize image with codewords
– Count codewords
Leung and Malik, IJCV, 2001; Sivic and Zisserman, ICCV, 2003; and many others…
[Figure: histogram of codeword counts]
Emphasizing spatial configurations
• Codebook ignores geometric correspondence
• Solution: spatial pyramid
– aggregate statistics of local features over fixed subregions
– 1x1 entire image
– 2x2 image quarters
– 1x3 horizontal bars
• For video:
Grauman, ICCV 2005; Lazebnik, CVPR 2006; Marszalek, VOC 2007
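The 1x1 + 2x2 + 1x3 aggregation above can be sketched directly; an illustrative numpy version (region conventions and names are assumptions, not the authors' implementation) that concatenates one codeword histogram per subregion:

```python
import numpy as np

def spatial_pyramid(points, assignments, width, height, k):
    """Concatenate codeword histograms over fixed subregions:
    whole image (1x1), quarters (2x2), horizontal bars (1x3) -> 8 histograms."""
    def hist_in(x0, y0, x1, y1):
        inside = (points[:, 0] >= x0) & (points[:, 0] < x1) & \
                 (points[:, 1] >= y0) & (points[:, 1] < y1)
        return np.bincount(assignments[inside], minlength=k)
    parts = [hist_in(0, 0, width, height)]                       # 1x1
    for i in range(2):                                           # 2x2 quarters
        for j in range(2):
            parts.append(hist_in(i * width / 2, j * height / 2,
                                 (i + 1) * width / 2, (j + 1) * height / 2))
    for j in range(3):                                           # 1x3 bars
        parts.append(hist_in(0, j * height / 3, width, (j + 1) * height / 3))
    return np.concatenate(parts)   # length 8 * k

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, size=(200, 2)) * [640, 480]  # sampled point locations
assign = rng.integers(0, 10, size=200)               # their codeword indices
v = spatial_pyramid(pts, assign, 640, 480, k=10)
print(v.shape)  # (80,)
```

With a 4,000-word codebook this is exactly the 4K + 16K + 12K growth discussed later in the lecture.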
Codebook model
• Codebook consists of codewords
– k-means clustering of descriptors
– Commonly 4,000 codewords per codebook
[Diagram: Dense+OpponentSIFT → cluster → assign → feature vector (length 4,000)]
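A minimal illustrative sketch of this pipeline (toy k-means plus hard-assignment counting; real systems cluster SIFT descriptors into ~4,000 codewords, here shrunk to random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(descriptors, k, iters=20):
    """Toy k-means: build the codeword vocabulary from training descriptors."""
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def bag_of_words(descriptors, centers):
    """Hard-assign each local descriptor to its nearest codeword and count."""
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return hist / hist.sum()   # fixed-length image representation

train = rng.normal(size=(500, 8))    # stand-ins for SIFT descriptors (dim 8)
codebook = kmeans(train, k=10)
image_descriptors = rng.normal(size=(60, 8))
h = bag_of_words(image_descriptors, codebook)
print(h.shape, round(h.sum(), 6))  # (10,) 1.0
```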
Codebook assignment
van Gemert, PAMI 2010
Hard assignment vs. soft assignment (● codeword)
– Soft assignment: assign to multiple clusters, weighted by distance to center
– Single sigma (distance weighting) for all codebook elements
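The soft variant replaces the argmin with Gaussian-weighted votes; an illustrative numpy sketch with one shared sigma, as on the slide (names are not from the original implementation):

```python
import numpy as np

def soft_assign(descriptors, centers, sigma=1.0):
    """Kernel-codebook soft assignment: every descriptor votes for all
    codewords, Gaussian-weighted by its distance to each center,
    with one shared sigma for the whole codebook."""
    d2 = ((descriptors[:, None] - centers[None]) ** 2).sum(axis=2)
    votes = np.exp(-d2 / (2 * sigma ** 2))
    votes /= votes.sum(axis=1, keepdims=True)  # each descriptor spends one vote
    hist = votes.sum(axis=0)
    return hist / hist.sum()

rng = np.random.default_rng(4)
descriptors = rng.normal(size=(50, 8))
centers = rng.normal(size=(10, 8))
h = soft_assign(descriptors, centers)
print(h.shape)  # (10,)
```

Unlike hard assignment, every codeword receives some mass, which smooths out quantization errors near cluster boundaries.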
Extending soft assignment
• Fisher Vector
– Train a Gaussian Mixture Model, where each codebook element has its own sigma – one per dimension
– Do not store the assignment, but differences in all descriptor dimensions
• Greatly increases complexity
– feature vector is #codewords x #descriptor dimensions
Perronnin ECCV 2010
Extending soft assignment
• Super Vector Coding
– also counts the dimension-wise difference of a SIFT descriptor to a visual word
Zhou ECCV 2010
• Key insight: these methods propose many new components and algorithms, but difference coding is their main contribution
Difference coding
• Fisher vector (Perronnin ECCV 2010)
• Super vector coding (Zhou ECCV 2010)
• VLAD (Jegou CVPR 2010)
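Of the three, VLAD is the simplest to sketch; an illustrative numpy version of difference coding (not the authors' implementation): per codeword, accumulate the residuals of its assigned descriptors, giving a vector of length #codewords x #descriptor dimensions.

```python
import numpy as np

def vlad(descriptors, centers):
    """VLAD: per codeword, sum the differences between its assigned
    descriptors and the codeword center; concatenate and L2-normalize."""
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    nearest = d.argmin(axis=1)
    k, dim = centers.shape
    v = np.zeros((k, dim))
    for j in range(k):
        if (nearest == j).any():
            v[j] = (descriptors[nearest == j] - centers[j]).sum(axis=0)
    v = v.ravel()                          # length k * dim
    return v / (np.linalg.norm(v) + 1e-10)

rng = np.random.default_rng(2)
desc = rng.normal(size=(100, 8))   # local descriptors (toy dimension 8)
cent = rng.normal(size=(16, 8))    # 16 codewords
print(vlad(desc, cent).shape)  # (128,)
```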
Fast quantization
• Random forests
– Randomized process makes it very fast to build
– Tree structure allows fast vector quantization
– Logarithmic rather than linear projection time
Moosman, PAMI 2008; Uijlings, CIVR 2009
• Real-time BoW (!)
– When used with fast dense sampling
– SURF 2x2 descriptor instead of 4x4
– RBF kernel
Codebooks grow big…
• Researchers concatenate multiple codebooks
– Spatial pyramid adds more dimensions
o 1x1 = 4K
o 2x2 = 16K
o 1x3 = 12K
– Feature vector length easily >100K…
SVM pre-computed kernel trick
• Use distance between feature vectors:

K(f1, f2) = e^( −γ · dist(f1, f2) )

• Increase efficiency significantly
– Pre-compute the SVM kernel matrix
– Long vectors possible as we only need 2 in memory
– Parameter optimization re-uses pre-computed matrix
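A minimal sketch of the trick (illustrative; the L1 distance and the gamma value are assumptions, not the lecturers' choices): fill the kernel matrix row by row so only the current pair of long vectors is touched at a time, then hand the matrix to any kernel SVM.

```python
import numpy as np

def precomputed_kernel(features, gamma=0.01):
    """Pre-compute the full SVM kernel matrix K(f1, f2) = exp(-gamma * dist(f1, f2))
    once; parameter search over the SVM's C then re-uses it."""
    n = len(features)
    K = np.empty((n, n))
    for i in range(n):  # row by row: memory stays O(one feature vector)
        dist = np.abs(features[i] - features).sum(axis=1)  # e.g. L1 distance
        K[i] = np.exp(-gamma * dist)
    return K

rng = np.random.default_rng(3)
feats = rng.random((5, 1000))   # long concatenated codebook vectors
K = precomputed_kernel(feats)
# A kernel SVM (e.g. sklearn's SVC(kernel='precomputed')) can then train on K.
print(K.shape, np.allclose(K, K.T), np.allclose(np.diag(K), 1.0))  # (5, 5) True True
```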
GPU-empowered pre-computed kernel
Van de Sande, TMM 2011
[Chart: kernel matrix computation time (s) vs. total feature vector length (4,000 to 128,000), for 1x/4x/25x/49x CPU Opteron 250 (2.4 GHz), 1x/4x CPU Core i7 920 (2.66 GHz), and 1x GPU GeForce GTX260 (27 cores)]
GPU: 37x speed-up over a single CPU, 10x over a quad core, 2x over a 49-CPU cluster
Efficient classification
Maji et al., CVPR 2008
For the intersection kernel, h_i is piecewise linear and quite smooth (blue plot); it can be approximated with fewer uniformly spaced segments (red plot). Saves time & space!
HIK vs χ²: HIK 75 times faster, negligible loss in average precision
Moving object appearance
(= emphasis added)
[Plot: concept probability (0 to 1) over the frames of a shot, between shot boundaries, with the key frame marked; Max and Avg aggregations over the shot indicated]
Feature fusion
Van de Sande, PAMI 2010
[Diagram: image → point sampling strategy (Harris-Laplace salient points, dense sampling) → color feature extraction → codebook model → bag-of-features (relative frequency per codebook element); the spatial pyramid yields multiple bags-of-features]
+ Codebook reduces dimensionality
- Combination still ad hoc
- One codebook may dominate
Classifier fusion
+ Focus on feature strength
+ Fusion in semantic space
- Expensive learning effort
- Loss of feature correlation
Unsupervised fusion of classifiers
Snoek, TRECVID 2006; Wang, ACM MIR 2007
[Diagram: global, regional, and keypoint image feature extraction → support vector machine, logistic regression, Fisher linear discriminant → geometric mean]
+ Aggregation functions reduce learning effort
+ Efficient use of training examples
- Linear function unlikely to be optimal
Fusing concepts
• Exploitation of concept co-occurrence
– Concepts do not occur in a vacuum (e.g. aircraft and sky)
Naphade, Trans. MM 2001
[Diagram: linked concepts 1, 2, 3]
How to fuse concepts?
• Learning models
• Include ontologies
Learning models - explicitly
• Using graphical models
– Computationally complex
– Limited scalability
Qi, TOMCCAP 2009
Learning models - implicitly
• Using support vector machine, or data mining
– Assumes classifier learns relations
– Suffers from error propagation
Weng, ACM MM 2008
Including knowledge
• Can ontologies help?
– Symbolic ontologies vs uncertain detectors
Wu, ICME 2004
Concept detection pipeline
IBM 2003
[Diagram: pipeline stages annotated as feature fusion, classifier fusion, and concept fusion]
Video diver
Wang, ACM MIR 2007
[Diagram: content features extraction and layout features extraction → supervised learner → semantic features combination → supervised learner; the stages correspond to feature fusion, classifier fusion, and concept fusion]
Semantic Pathfinder
Snoek, PAMI 2006
[Diagram: content analysis step (visual and textual features extraction → multimodal features combination → supervised learner), style analysis step (layout, capture, and context features extraction → supervised learner), context analysis step (semantic features combination → supervised learner); select best of 3 paths after validation. Example concepts: animal, sports, vehicle, entertainment, monologue, flag, fire, weather news, Hu Jintao]
State-of-the-Art
Snoek et al, TRECVID 2008-2011; Van Gemert et al, PAMI 2010; Van de Sande et al, PAMI 2010
[Diagram: state-of-the-art pipeline built on feature fusion and classifier fusion]
Software available for download at http://colordescriptors.com
Demo: MediaMill video search engine
http://www.mediamill.nl
Conclusion on: Detecting Semantic Concepts in Video
• We started with invariance and manual labor
• We generalized with machine learning
– …but needed several abstractions to do it appropriately
• For the moment, no one-size-fits-all solution
– Learn optimal machinery per concept
1. Short course outline
.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Telling stories
.5 Video browsing
Problem 3: Many things in the world
• This is the model gap
Trial 1: counting dictionary words
Biederman, Psychological Review 1987
Slide credit: Li Fei-Fei
Trial 2: reverse-engineering
• Estimation by Hauptmann et al.: 5000
– Using manually labeled queries and concepts
– But speculative, and questionable assumptions
Hauptmann, PIEEE 2008
[Plot: ‘Google performance’, Oracle Combination + Noise, ‘Realistic’ Combination]
How to obtain labeled examples?
• Massive amounts of
– …but only human experts provide good quality examples

Experts start with concept definition
• MM078 - Police/Security Personnel
– Shots depicting law enforcement or private security agency personnel.
Expert annotation tools
Volkmer, ACM MM 2005
• Balance between:
– Spatiotemporal level of annotation detail
– Number of concepts
– Number of positive and negative examples
LSCOM (Large Scale Concept Ontology for Multimedia)
• Provides manual annotations for 449 concepts
– In international broadcast TV news
• Connection to Cyc ontology
• LSCOM-Lite
– 39 semantic concepts
Naphade, IEEE MM 2006
http://www.lscom.org/
Verified positive examples
• ImageNet (15M images)
– 22,000 categories
– > 100 examples
Deng et al, CVPR 2009
• SUN (130K images)
– 397 scene categories
– > 100 examples
Xiao et al, CVPR 2010
Random negatives are not necessarily informative
Xirong Li et al, ICMR 2011
[Figure: decision boundary separating positives from negatives]
Candidate strategies: Active Learning? AdaBoost? Social Negative Bootstrapping?
Sampling informative negatives
• Iteratively selecting the most misclassified negatives as the informative ones
[Diagram: prediction by an airplane classifier → selection of the most misclassified negatives → virtually labeled negatives]
Social negative bootstrapping
[Figure: socially tagged negative examples for ‘airplane’, e.g. aviation, station, tennis, outreach, people, sketch, vagrant, lotus, beachlife, aircraft, cow, plane, sister, street, tokyo, lithuania, aeroplane]
Tradeoff between effectiveness and efficiency to find the most informative negatives
Concept detection (on VOC08-val)
• Social negative bootstrapping is much better than the baselines
Informative negatives of ‘tv’
Tag cloud of informative negatives
Bridging the model gap
• Requirements
– Generic concept detection method
– Massive amounts of labeled examples
– Evaluation method
– Fair amount of computation
Model gap best treated by TRECVID
• Situation in 2000
– Various concept definitions
– Specific and small data sets
– Hard to compare methodologies
• Since 2001 worldwide evaluation by NIST
– TRECVID benchmark
NIST TRECVID benchmark
• Promote progress in video retrieval research
– Provide common dataset
– Challenging tasks
– Independent evaluation protocol
– Forum for researchers to compare results
http://trecvid.nist.gov/
Video data sets
• US TV news (`03/`04)
• International TV news (`05/`06)
• Dutch TV infotainment (`07/`08/`09)
• Internet Archive web videos (TRECVID 2010 and 2011)
Expert annotation efforts
[Chart: number of annotated concepts per TRECVID edition, growing from 17, 32, and 39 to 101 (MediaMill - UvA), 374 (LSCOM), 500, and beyond (others)]
Measuring performance
• Precision: fraction of retrieved items that are relevant, |relevant ∩ retrieved| / |retrieved|
• Recall: fraction of relevant items that are retrieved, |relevant ∩ retrieved| / |relevant|
• Precision and recall have an inverse relationship
[Figure: sets of relevant and retrieved items, and a ranked result list]
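As a tiny worked example of the two measures (the item ids below are made up):

```python
# Toy sets of item ids (hypothetical).
relevant = {1, 3, 4, 7}
retrieved = {1, 2, 3, 5}

hits = relevant & retrieved                 # relevant retrieved items: {1, 3}
precision = len(hits) / len(retrieved)      # 2 / 4 = 0.5
recall = len(hits) / len(relevant)          # 2 / 4 = 0.5
```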
Evaluation measure
• Average Precision
– Combines precision and recall
– Averages precision after each relevant shot
– Top of ranked list most important
AP = (1/1 + 2/3 + 3/4 + …) / number of relevant documents
[Figure: a ranked result list with relevant shots at ranks 1, 3, and 4]
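The AP formula on the slide can be computed directly from the ranked list; here is a minimal sketch, with the relevance judgments as toy input:

```python
def average_precision(ranking, num_relevant):
    """Average the precision@k at every rank k holding a relevant item."""
    hits, total = 0, 0.0
    for k, is_relevant in enumerate(ranking, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / num_relevant

# Relevant shots at ranks 1, 3, and 4, as in the slide's example:
ap = average_precision([True, False, True, True, False], 3)
# ap == (1/1 + 2/3 + 3/4) / 3
```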
De facto evaluation standard
Concept examples
Aircraft, Beach, Mountain, People marching, Police/Security, Flower
Note the variety in visual appearance
TRECVID concept detection task results (2003-2011)
• Hard to compare
– Different video data
– Different concepts
• Clear top performers
– Median skews to left
– Learning effect
– Plenty of variation
UvA-MediaMill@TRECVID
Snoek et al, TRECVID 04-11
• >1000 other systems
MediaMill Challenge
• The Challenge provides
– Manually annotated lexicon of 101 semantic concepts
– Pre-computed low-level features
– Trained classifier models
– 5 experiments
– Implementation + results
• The Challenge allows to
– Gain insight in intermediate video analysis steps
– Foster repeatability of experiments
– Optimize video analysis systems on a component level
– Compare and improve
• The Challenge lowers the threshold for novice researchers
http://www.mediamill.nl/challenge/
Columbia374 + VIREO374
• Baseline detectors for 374 concepts
http://www.ee.columbia.edu/ln/dvmm/columbia374/
http://www.cs.cityu.edu.hk/~yjiang/vireo374/
http://www.ee.columbia.edu/ln/dvmm/CU-VIREO374/
Community myths or facts?
• Chua et al., ACM Multimedia 2007
– Video search is practically solved and progress has only been incremental
• Yang and Hauptmann, ACM CIVR 2008
– Current solutions are weak and generalize poorly
We have done an experiment
• Two video search engines from 2006 and 2009
– MediaMill Challenge 2006 system
– MediaMill TRECVID 2009 system
• How well do they detect 36 LSCOM concepts?
Four video data set mixtures
• Training: broadcast news (TRECVID 2005) and documentary video (TRECVID 2007)
• Testing: documentary video and broadcast news
– Yielding within-domain and cross-domain mixtures
Performance doubled in just 3 years
• 36 concept detectors
– Even when using training data of different origin
– Vocabulary still limited
Snoek & Smeulders, IEEE Computer 2010
500 detectors, a closer look
The number of labeled image examples used at training time seems decisive in concept detector accuracy.
Demo time: 500 detectors, a closer look
Learning social tag relevance by neighbor voting
• Exploit consistency in tagging behavior of different users for visually similar images
Xirong Li, TMM 2009
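A minimal sketch of the neighbor-voting idea, assuming a toy in-memory collection (the features and tags below are invented): a tag on an image gains relevance for every visually similar image carrying the same tag, corrected by the tag's prior frequency.

```python
import numpy as np

# Toy collection: one feature vector and a tag set per image (invented).
features = np.array([[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
                     [0.90, 0.10], [0.80, 0.20]])
tags = [{"airplane", "sky"}, {"airplane"}, {"airplane", "travel"},
        {"cat"}, {"cat", "pet"}]

def tag_relevance(query_idx, tag, k=3):
    """Count votes from the k visual nearest neighbors (the method also
    restricts neighbors to images of other users, omitted here), minus
    the number of votes expected from the tag's prior frequency alone."""
    dists = np.linalg.norm(features - features[query_idx], axis=1)
    neighbors = np.argsort(dists)[1:k + 1]          # skip the image itself
    votes = sum(tag in tags[i] for i in neighbors)
    prior = k * sum(tag in t for t in tags) / len(tags)
    return votes - prior

# 'airplane' on image 0 is backed by its visual neighbors; 'cat' is not.
```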
Image retrieval experiments
• User-tagged image database
– 3.5 million labeled Flickr images
• Visual feature
– A global color-texture feature
• Evaluation set
– 20 concepts
• Evaluation criteria
– Average precision
Image retrieval experiments
• A standard tag-based retrieval framework
– Ranking function: OKAPI-BM25
• Comparison
– Baseline: retrieval using original tags
– Neighbor: retrieval using learned tag relevance as tag frequency
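A minimal sketch of Okapi BM25 ranking over tag lists; the documents, query, and parameter values are toy assumptions. In the "Neighbor" run, the learned tag relevance would replace the raw tag count `tf`.

```python
import math

# Toy "documents": the tag list of each image (repeats act as tf).
docs = [["airplane", "sky", "airplane"], ["cat", "pet"], ["airplane", "travel"]]
k1, b = 1.2, 0.75
avgdl = sum(len(d) for d in docs) / len(docs)

def bm25(query, doc):
    score = 0.0
    for t in query:
        tf = doc.count(t)                      # or the learned tag relevance
        n = sum(t in d for d in docs)          # document frequency
        idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)
        norm = 1 - b + b * len(doc) / avgdl    # length normalization
        score += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return score

ranked = sorted(range(len(docs)), key=lambda i: bm25(["airplane"], docs[i]),
                reverse=True)
```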
Results
24% relative improvement
Updated tag relevance
• Objective tags are identified and reinforced
Based on 3.5 Million images downloaded from Flickr
Tag relevance fusion
Xirong Li et al, CIVR 2010
• Note: fully unsupervised, adds 10% in performance
Failure case: airplane
• Too much consensus on the wrong label…
Results suggest…
• Relevance of a tag can be predicted based on the ‘wisdom’ of crowds
– Even with a light-weight visual feature
– And a small database of 3.5M images
Conclusion on lexicon learning
• Concept detection requires
– Invariant features
– Many, many annotations
– Measuring performance
– Lots of computation
• Concept detection suffers most from
– Weakly labeled visual data
– Transfer across domains
Special issue announcement
Fall 2012
1. Short course outline
.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Telling stories
.5 Video browsing
Storytelling
Storytelling is the conveying of events in words, images and sounds, often by improvisation or embellishment. Stories or narratives have been shared in every culture as a means of entertainment, education, cultural preservation and in order to instill moral values.
(Quote from Wikipedia)
An event in prehistoric times
• What story did they want to tell?
Bhimbetka rock shelters, India
Arbitrary events in modern times
• …but what defines an event in video?
Attempting a board trick, flash mob gathering, changing a vehicle tire
Definition 1
• Complex activity occurring at a specific place and time;
• Involves people interacting with other people and/or objects;
• Consists of a number of human actions, processes, and activities that are loosely or tightly organized and that have significant temporal and semantic relationships to the overarching activity;
• Is directly observable.
Definition 2
• An event happens over time
– Track semantics over time, model dynamic interactions, and identify goal-directed movement.
• An event happens in space
– Identify the most semantic regions in an image to express spatial relations and interactions.
• An event has a recount
– Must be able to provide an understandable recounting of what visual information is decisive for relevance.
An event is a bag-of-features
• Fusing multiple audio-visual features is effective
– No notion whether the event is really found
[Diagram: video, feature vector, event confidence]
Jiang et al, TRECVID 2010
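The fusion step can be sketched as a weighted late fusion of per-feature confidences; the feature names, scores, and weights below are all hypothetical:

```python
# Hypothetical per-feature event confidences for one video.
feature_confidences = {
    "visual-sift": 0.7,
    "color-histogram": 0.4,
    "audio-mfcc": 0.6,
}
# Hypothetical fusion weights (would normally be learned); they sum to 1.
weights = {"visual-sift": 0.5, "color-histogram": 0.2, "audio-mfcc": 0.3}

# One fused event confidence per video: effective, but it carries no
# explanation of *why* the event was found (the slide's caveat).
fused = sum(weights[f] * c for f, c in feature_confidences.items())
```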
An event is a bag-of-concepts
• Combine available detector scores in a single vector
– No notion what concept (ordering) is important for events
[Diagram: video, concept detectors 1 to N, event confidence]
Gkalelis et al, CBMI 2011; Merler et al, TMM, in press
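A minimal sketch of the bag-of-concepts representation, with invented concept names, detector scores, and event weights: the per-concept detector confidences are stacked into one vector and scored by an event model.

```python
import numpy as np

# Invented concept vocabulary and detector outputs for one video.
concepts = ["horse", "chair", "crowd", "street"]
detector_scores = {"horse": 0.8, "chair": 0.1, "crowd": 0.3, "street": 0.5}

def concept_vector(scores):
    """Stack per-concept detector confidences into a single vector."""
    return np.array([scores[c] for c in concepts])

# Toy linear event model, e.g. for a "horse riding" event (weights are
# assumed here, not learned); which concepts matter, and in what order
# they occur, stays hidden in the weights.
weights = np.array([0.9, -0.2, 0.4, 0.1])
confidence = float(weights @ concept_vector(detector_scores))
```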
Horse, horse, horse? Chair, chair, chair?
An event is concepts over time
Ebadollahi, ICME 2006
• Exploit concept evidence over time
– Proof of concept, unclear how to determine automatically
The event retrieval paradox
[Chart: event detection accuracy versus event recounting ability. Bag-of-features detects well but recounts badly; bag-of-concepts and concepts-over-time trade one for the other; a method that is good at both (???) is still missing]
Benchmarking the paradox
• NIST Multimedia event detection task
• NIST Multimedia event recounting task
Multimedia Event Detection task
• Given an event specified by an event kit (query), which consists of:
– definition,
– event explication,
– evidential description, and
– illustrative examples,
• Search video for events.
Example Event Kit
Event Name: Working on a woodworking project (mnemonic)
Definition: One or more people fashion an object out of wood. (textual definition)
Event Explication (expresses event domain-specific knowledge needed to understand the event definition): Woodworking is a popular hobby that involves crafting an object out of wood. Typical woodworking projects may range from creating large pieces of furniture to small decorative items or toys. The process for making objects out of wood can include cutting wood into smaller pieces…. (continues)
Evidential Description (textual listing of attributes that are often associated with the event):
– scene: often indoors in a workshop, garage, artificial lighting; occasionally outdoors
– objects/people: woodworking tools (automatic or non-automatic saws, sander, knife), paint, stains, sawhorses, toolbox, safety goggles
– activities: cutting and shaping wood, attaching pieces of wood together, smoothing/sanding wood
– audio: power tool sounds; hand tool sounds (hammer, saw, etc.); narration of process
Exemplars: HVC334271.mp4, HVC393428.mp4, HVC875424.mp4, etc. (specific clips from the “Event Kits” data set that are known to contain the event being defined)
Target User: An Internet information analyst or experienced Internet searcher with event-specialized knowledge.
Multimedia Event Recounting task
• An event detector will rapidly and automatically produce a textual English-language recounting of each event occurrence it finds in a video collection,
– describing the particular scene, actors, objects, and activities involved.
• Task starts in 2012
Video collections
• Pilot 2010: development collection (1,723 clips) and test collection (1,742 clips), 100 hours in total; clip content annotation for both sets
• Development (DEV) 2011: 44K clips (~1,400 hours), split into (1) a transparent (DEV‐T) and (2) an opaque (DEV‐O) subset, merged into a single training collection for 2012‐2015; clip content annotation for the transparent subset for MED ‘11, and for the opaque subset after MED ‘11
• Progress: test collection for 2012‐2015, 120K clips (4,000 hrs); no clip content annotation
• Novel 1: test collection for 2014, 120K clips (4,000 hrs); no clip content annotation
• Novel 2: test collection for 2015, 120K clips (4,000 hrs); no clip content annotation
The TRECVID MED ‘11 events
• Training events
– Process‐observed: attempting a board trick, feeding an animal, landing a fish, working on a woodworking project
– Life events: wedding ceremony
• Testing events
– Process‐observed: changing a vehicle tire, getting a vehicle unstuck, grooming an animal, making a sandwich, parkour, repairing an appliance, working on a sewing project
– Life events: birthday party, flash mob gathering, parade
Conclusion on telling stories
• Not much to conclude yet
– Due to benchmarks, much will happen in the coming years
• Field is currently obsessed with bag-of-features
– Does not solve the event retrieval paradox
– Can multimedia help?
– Can concepts help?
• The topic is wide open
– Perfect for (many) PhDs
1. Short course outline
.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Telling stories
.5 Video browsing
Problem 5: Use is open-ended
• This is the interface gap
[Diagram: the open-ended scope of use versus the limits of screen and keywords]
Video search 1.0
Note the influence of textual metadata, such as the video title, on the search results.
Query selection

‘Classic’ Informedia system
• First multimodal video search engine
Carnegie Mellon University
Físchlár
• Optimized for use by “real” users
Dublin City University
IBM iMARS
• A web-based system
IBM Research
http://mp7.watson.ibm.com/
MediaMagic
• Focus on the story level
FxPal
VisionGo
• Extremely fast and efficient
NUS & ICT-CAS
CrossBrowsing through results
[Interface: browse results along a rank dimension and a time dimension; sphere variant]
Snoek, TMM 2007
Extreme video retrieval
• Observation
– Correct results are retrieved, but not optimally ranked
– If the user has time to scan results exhaustively, retrieval is a matter of watching, selecting, and sorting quickly (= very demanding!)
Carnegie Mellon University
ForkBrowser
de Rooij, CIVR 2008
Crowdsourcing via timeline
• Poster by Bauke Freiburg
– Tuesday, 12:30 PM - 2:00 PM
Snoek et al. ACM MM 2010
Multimedia event browsers
• ACM Multimedia 2012?
The future of video retrieval?
Jonathan Wang, Carnegie Mellon University
Interface gap best addressed by
• TRECVID interactive search task
– Interactively solve 20+ search topics (10/15 minutes)
– Return 1,000 ranked shot-based results per topic
– Evaluate using Average Precision
• VideOlympics showcase
Video browsing at TRECVID (2003-2009)
• Wide performance variation
– # concept detectors
– Search interface
– Expert vs novice user
• Most pronounced
– 2003: Informedia classic
– 2005: Large semantic lexicon
– 2008: Online learning
UvA-MediaMill@TRECVID
CrossBrowser
Snoek et al. TRECVID 04-09
• 200+ other interactive systems (traditional systems)
Criticism
• Retrieval performance cannot be the only evaluation criterion
– Quality of detectors counts
– Experience of searcher counts
– Visualization of interface counts
– Ease of use counts
– …
Video browsing at VideOlympics
• Promote multiple facets of video search
– Real-time interactive video search ‘competition’
– Simultaneous exposure of multiple video search engines
– Highlight possibilities and limitations of state-of-the-art
Participants
Video trailer
http://www.VideOlympics.org
Conclusion on video browsing
• Interaction by browsing is indispensable for any practical video search engine
• The system should support the user by learning and by intuitive (mobile) visualizations
Internet Video Search
Conclusion on: measuring features, concept detection, lexicon learning, telling stories, video browsing
And there is always more …
• Content Based Multimedia Retrieval: Lessons Learned from Two Decades of Research
– Shih-Fu Chang, Columbia University
• SIGMM Technical Achievement Award
– Tomorrow: 10:15 AM - 12:30 PM
And there is always more …
• Recommended special issues
– IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11), November 2008
– Proceedings of the IEEE, 96(4), April 2008
– IEEE Transactions on Multimedia, 9(5), August 2007
• 300 references on video search
– Snoek and Worring, Concept-Based Video Retrieval, Foundations and Trends in Information Retrieval, Vol. 2, No. 4, pp. 215-322, 2009.
General references I
Color Invariance. Jan-Mark Geusebroek, R. van den Boomgaard, Arnold W. M. Smeulders, H. Geerts. IEEE Trans. Pattern Analysis and Machine Intelligence, Volume 23 (12), page 1338-1350, 2001.
Distinctive Image Features from Scale-Invariant Keypoints. D. G. Lowe. Int'l Journal of Computer Vision, vol. 60, pp. 91-110, 2004.
Large-Scale Concept Ontology for Multimedia. M. R. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. S. Kennedy, A. G. Hauptmann, and J. Curtis. IEEE MultiMedia, vol. 13, pp. 86-91, 2006.
Efficient Visual Search for Objects in Videos. J. Sivic and A. Zisserman. Proceedings of the IEEE, vol. 96, pp. 548-566, 2008.
High Level Feature Detection from Video in TRECVid: A 5-year Retrospective of Achievements. A. F. Smeaton, P. Over, and W. Kraaij, In Multimedia Content Analysis, Theory and Applications, (A. Divakaran, ed.), Springer, 2008.
Visually Searching the Web for Content. J. R. Smith and S.-F. Chang. IEEE MultiMedia, vol. 4, pp. 12-20, 1997.
Content Based Image Retrieval at the End of the Early Years. Arnold W. M. Smeulders, Marcel Worring, S. Santini, A. Gupta, R. Jain. IEEE Trans. Pattern Analysis and Machine Intelligence, Volume 22 (12), page 1349-1380, 2000.
General references II
The Challenge Problem for Automated Detection of 101 Semantic Concepts in Multimedia. Cees G. M. Snoek, Marcel Worring, Jan C. van Gemert, Jan-Mark Geusebroek, Arnold W. M. Smeulders. ACM Multimedia, page 421-430, 2006.
The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing. Cees G. M. Snoek, Marcel Worring, Jan-Mark Geusebroek, Dennis C. Koelma, Frank J. Seinstra, Arnold W. M. Smeulders. IEEE Trans. Pattern Analysis and Machine Intelligence, Volume 28 (10), page 1678-1689, 2006.
A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval. Cees G. M. Snoek, Marcel Worring, Dennis C. Koelma, Arnold W. M. Smeulders. IEEE Trans. Multimedia, Volume 9 (2), page 280-292, 2007.
Adding Semantics to Detectors for Video Retrieval. Cees G. M. Snoek, Bouke Huurnink, Laura Hollink, Maarten de Rijke, Guus Schreiber, Marcel Worring. IEEE Trans. Multimedia, Volume 9 (5), page 975-986, 2007.
The MediaMill TRECVID 2004-2011 Semantic Video Search Engine. Cees G. M. Snoek et al. Proceedings of the TRECVID Workshop, 2004-2011.
Visual-Concept Search Solved?. Cees G.M. Snoek and Arnold W.M. Smeulders. IEEE Computer, Volume 43 (6), page 76-78, 2010.
General references III
Concept-Based Video Retrieval. Cees G. M. Snoek, Marcel Worring. Foundations and Trends in Information Retrieval, Vol. 4 (2), page 215-322, 2009.
Local Invariant Feature Detectors: A Survey. T. Tuytelaars and K. Mikolajczyk. Foundations and Trends in Computer Graphics and Vision, vol. 3, pp. 177-280, 2008.
Evaluating Color Descriptors for Object and Scene Recognition. Koen E. A. van de Sande, Theo Gevers, Cees G. M. Snoek. IEEE Trans. Pattern Analysis and Machine Intelligence (in press), 2010.
http://www.science.uva.nl/research/publications/
Visual Word Ambiguity. Jan C. van Gemert, Cor J. Veenman, Arnold W. M. Smeulders, Jan-Mark Geusebroek. IEEE Trans. Pattern Analysis and Machine Intelligence (in press), 2009.
Real-Time Bag of Words, Approximately. Jasper R. R. Uijlings, Arnold W. M. Smeulders, R. J. H. Scha. ACM Int'l Conference on Image and Video Retrieval, 2009.
Lessons Learned from Building a Terabyte Digital Video Library. H. D. Wactlar, M. G. Christel, Y. Gong, and A. G. Hauptmann. IEEE Computer, vol. 32, pp. 66-73, 1999.
Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study. J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Int'l Journal of Computer Vision, vol. 73, pp. 213-238, 2007.
Contact info
• Cees Snoek: http://www.CeesSnoek.info
• Arnold Smeulders: http://staff.science.uva.nl/~smeulder