
ATTRIBUTE-AUGMENTED SEMANTIC HIERARCHY

Towards Bridging Semantic Gap and Intention Gap in Image Retrieval

Hanwang Zhang1, Zheng-Jun Zha2, Yang Yang1, Shuicheng Yan1, Yue Gao1, Tat-Seng Chua1

1: National University of Singapore

2: Institute of Intelligent Machines, Chinese Academy of Sciences

What happened?

[Figure: a user sends a query to a search engine over large-scale, unstructured data.]

What happened?

[Figure: the same user, query, search engine, and data, annotated with the semantic gap and the intention gap.]

Bridging Semantic Gap

[Figure: semantic concepts organized in a semantic hierarchy (semantic / ontological) link low-level visual features to high-level semantics.]

Semantic Gap Bridged? No!

Bridging Intention Gap

[Figure: user feedback on low-level visual features is used to capture the user intention.]

Intention Gap Bridged? No!

Challenges

[Figure: low-level features, semantics, and search intent.]

Solution: Attributes

[Figure: attributes placed between low-level features, semantics, and search intent.]

Attributes

Component: snout, ear, etc.
Appearance: furry, brown, etc.
Discriminability: cat or dog?, etc.

☞ Hierarchical Semantics
☞ Hierarchical Semantic Similarity

[Figure: example hierarchy with Root → Animal, Vehicle; Animal → Cat, Dog; Dog → Corgi, Pug.]

Solution: Attribute-augmented Semantic Hierarchy (A2SH)

☞ Semantic hierarchy
☞ Pool of attributes
☞ Concept classifiers
☞ Attribute classifiers

A general framework for content-based image retrieval (CBIR)

[Figure: the semantic path Root → Animal → Dog → Pug, with a pool of attributes such as leg, furry, brown, wheel, shiny, glass, wet, metal, and head.]

A Prototype of A2SH

Built on ILSVRC 2012 (ImageNet)
Concepts: 1,322 (958 leaves)
Depth: 3 to 11
Images: 1.23 million (50% training, 50% testing)

☞ 95,800 images are manually labeled with 33 attributes
☞ 2 to 26 attributes are automatically discovered for each concept node
☞ 15 to 58 attributes per concept

[Figure: example attributes such as Tail and Leg.]

Why A2SH?

☞ Attributes bridge the semantic gap: they have smaller variance than concepts and are descriptive and transferable

[Figure: concepts vs. attributes, with example attributes glass, wing, and wheel.]

Why A2SH?

Which “Wing”?

☞ A2SH defines attributes precisely, which makes them more informative

Why A2SH?

☞ A2SH bridges the intention gap

Intention is captured as attributes through attribute feedback and image feedback.

[Figure: attribute feedback (e.g., Leg, Skin, Tail) and image feedback.]

☞ Feedback is automatically digested into multiple levels of the hierarchy

Demo on A2SH

How A2SH: System Overview


How A2SH: Off-line


Concept Classifiers

The classifier for concept c predicts whether an image belongs to c.

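Concept Classifiers

Hierarchical one vs. all: the classifier for concept c predicts whether an image belongs to c.

[Figure: positive (+) and negative (−) training samples for concept c within the hierarchy.]

☞ Exploits the hierarchical relation
☞ Alleviates error propagation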
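A minimal sketch of how training sets for a hierarchical one-vs.-all concept classifier could be assembled. It assumes that the positives for concept c are the images in c's subtree and the negatives are the images under c's siblings; the slide's diagram suggests this, but the exact sampling scheme is an assumption. The toy hierarchy, the image IDs, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy (parent -> children), not the prototype's data.
CHILDREN: Dict[str, List[str]] = {
    "Root": ["Animal", "Vehicle"],
    "Animal": ["Cat", "Dog"],
    "Dog": ["Corgi", "Pug"],
}
# Only leaves hold images, as in the prototype's data set.
LEAF_IMAGES: Dict[str, List[str]] = {
    "Cat": ["cat1", "cat2"],
    "Corgi": ["corgi1"],
    "Pug": ["pug1", "pug2"],
    "Vehicle": ["car1"],
}

def parent_of(concept: str) -> str:
    for parent, kids in CHILDREN.items():
        if concept in kids:
            return parent
    raise KeyError(f"{concept} has no parent")

def subtree_images(concept: str) -> List[str]:
    """A concept's images are merged bottom-up from its descendant leaves."""
    images = list(LEAF_IMAGES.get(concept, []))
    for child in CHILDREN.get(concept, []):
        images.extend(subtree_images(child))
    return images

def one_vs_all_sets(concept: str) -> Tuple[List[str], List[str]]:
    """Positives: images under `concept`; negatives: images under its siblings (assumed)."""
    positives = subtree_images(concept)
    siblings = [s for s in CHILDREN[parent_of(concept)] if s != concept]
    negatives = [img for s in siblings for img in subtree_images(s)]
    return positives, negatives

print(one_vs_all_sets("Dog"))  # (['corgi1', 'pug1', 'pug2'], ['cat1', 'cat2'])
```

Restricting negatives to siblings keeps each classifier focused on the decision it actually makes during a top-down descent (which child to follow), which is one way to read "exploit hierarchical relation".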

Attribute Classifiers

The classifier for attribute a of concept c predicts the presence of a.

☞ Nameable attributes: nameable by humans; learned by hierarchical supervised learning
☞ Unnameable attributes: not nameable by humans; learned by hierarchical unsupervised learning
☞ Together they offer a comprehensive description of the multiple facets of a concept

[Figure: example attributes Ear, Snout, Eye, and Furry.]

Unnameable Attribute Classifiers

☞ Nameable attributes are not discriminative enough.
☞ New attributes are discovered for concepts that share many nameable attributes.
☞ 2 to 26 new attributes are discovered for each concept.

D. Parikh, K. Grauman. "Interactively Building a Discriminative Vocabulary of Nameable Attributes", CVPR 2011.

What do we have now?

☞ Concept classifiers → semantic path prediction
☞ Attribute classifiers → image representation along the semantic path

[Figure: hierarchical semantic representation.]
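A minimal sketch of semantic path prediction followed by a representation along that path. It assumes a greedy top-down descent (at each node, follow the child whose concept classifier is most confident); the deck does not say how the path is decoded. The hierarchy, the scores, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy hierarchy (parent -> children); not the prototype's 1,322 concepts.
CHILDREN: Dict[str, List[str]] = {
    "Root": ["Animal", "Vehicle"],
    "Animal": ["Cat", "Dog"],
    "Dog": ["Corgi", "Pug"],
}

def predict_path(score: Callable[[str], float], root: str = "Root") -> List[Tuple[str, float]]:
    """Greedy descent: from the root, repeatedly follow the highest-scoring child
    until a leaf is reached; returns (concept, confidence) pairs."""
    path: List[Tuple[str, float]] = []
    node = root
    while node in CHILDREN:
        best = max(CHILDREN[node], key=score)
        path.append((best, score(best)))
        node = best
    return path

def represent(path: List[Tuple[str, float]],
              attribute_scores: Dict[str, List[float]]) -> List[Tuple[str, float, List[float]]]:
    """Attach the attribute-classifier outputs of each concept on the path."""
    return [(c, conf, attribute_scores.get(c, [])) for c, conf in path]

# Made-up concept-classifier confidences for one image.
scores = {"Animal": 0.9, "Vehicle": 0.1, "Cat": 0.3, "Dog": 0.7, "Corgi": 0.2, "Pug": 0.8}
print(predict_path(lambda c: scores[c]))  # [('Animal', 0.9), ('Dog', 0.7), ('Pug', 0.8)]
```

A greedy descent evaluates only the children of visited nodes, so its cost grows with depth times branching factor rather than with the total number of concepts.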

Hierarchical Semantic Similarity

Images are represented by attributes in the context of concepts.

[Figure: hierarchical semantic similarity.]
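The deck does not give the similarity formula, so the following is only one plausible instantiation under stated assumptions. Each image is a mapping from the concepts on its semantic path to (concept confidence, attribute vector). Similarity sums the cosine similarity of attribute vectors over shared concepts, weighted by the product of the two confidences. The weighting and all names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: concept -> (concept confidence, attribute vector).
Rep = Dict[str, Tuple[float, List[float]]]

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hierarchical_similarity(x: Rep, y: Rep) -> float:
    """Sum attribute similarity over concepts on both semantic paths,
    weighted by the product of the concept confidences (an assumption)."""
    total = 0.0
    for concept in x.keys() & y.keys():
        conf_x, attrs_x = x[concept]
        conf_y, attrs_y = y[concept]
        total += conf_x * conf_y * cosine(attrs_x, attrs_y)
    return total
```

Because attributes are compared only within concepts both images share, a "wing" attribute on an airplane path never matches a "wing" on a bird path, which mirrors the slide's point that attributes are represented in the context of concepts.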

Local Semantic Metric

Images of the same concept should be close; images of different concepts should be far apart.
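A minimal sketch of what "same concept close, different concepts far" can mean for a learned metric. It assumes a Mahalanobis distance d_M(x, y) = (x − y)^T M (x − y) with M = L^T L (so M stays positive semidefinite) and a large-margin triplet hinge loss. This is a standard metric-learning objective swapped in for illustration; the deck does not state the authors' objective, and all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def mahalanobis_sq(L: Matrix, x: Vector, y: Vector) -> float:
    """d_M(x, y) = (x - y)^T M (x - y) with M = L^T L, i.e. ||L (x - y)||^2."""
    diff = [a - b for a, b in zip(x, y)]
    proj = [sum(l_ij * d_j for l_ij, d_j in zip(row, diff)) for row in L]
    return sum(p * p for p in proj)

def triplet_hinge(L: Matrix, anchor: Vector, same: Vector, other: Vector,
                  margin: float = 1.0) -> float:
    """Zero once the same-concept image is closer than the other-concept image
    by at least `margin`; positive otherwise (assumed objective)."""
    return max(0.0, margin + mahalanobis_sq(L, anchor, same)
               - mahalanobis_sq(L, anchor, other))
```

Minimizing this loss over many triplets pulls same-concept images together and pushes different-concept images apart; a "local" metric would fit one such L per concept node.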

What do we have now?

☞ Concept classifiers → semantic path prediction
☞ Attribute classifiers → image representation along the semantic path
☞ Hierarchical semantic similarity function → semantic similarity between images

[Figure: hierarchical semantic representation.]

How A2SH: On-line


Automatic Retrieval

Candidate images are retrieved by semantic indexing and ranked by hierarchical semantic similarity.

[Figure: semantic index over a concept c and its children child(c); the image set I_c supplies the candidate images.]

Low complexity! Efficient!
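A minimal sketch of semantic indexing, assuming an inverted index from each concept c to the set I_c of gallery images whose predicted semantic path passes through c. Only the candidates indexed under the query's deepest predicted concept are scored. The back-off to a shallower concept when I_c is empty is an added assumption, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

def build_index(paths: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Inverted index: concept -> gallery images whose predicted path contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for image_id, path in paths.items():
        for concept in path:
            index[concept].add(image_id)
    return index

def retrieve(query_path: List[str], index: Dict[str, Set[str]],
             similarity: Callable[[str], float], top_k: int = 10) -> List[str]:
    """Rank only the candidates I_c under the deepest concept c of the query's path,
    backing off toward the root if that concept has no indexed images (assumed)."""
    candidates: Set[str] = set()
    for concept in reversed(query_path):
        candidates = index.get(concept, set())
        if candidates:
            break
    return sorted(candidates, key=similarity, reverse=True)[:top_k]
```

The cost of a query then scales with |I_c| rather than with the gallery size, which is the source of the "low complexity" claim under this reading.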

Evaluation

☞ A2SH: our method
☞ hBilinear: retrieves images by a bilinear semantic metric (Deng et al., CVPR 2011)
☞ hPath: length (confidence) of the common semantic path of an image and the query
☞ hVisual: hPath + visual similarity
☞ fSemantic: flat semantic feature similarity
☞ fVisual: visual feature similarity

Training: 50%, gallery: 50% (95,800 queries)
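A minimal sketch of the hPath baseline score as described above: the length of the common semantic path of an image and the query, or optionally a confidence-weighted length. Weighting each shared node by the product of the two confidences is an assumption, since the slide says only "length (confidence)"; the function names are hypothetical.

```python
from typing import List, Tuple

def common_path(p: List[str], q: List[str]) -> List[str]:
    """Longest common prefix of two root-to-node semantic paths."""
    shared: List[str] = []
    for a, b in zip(p, q):
        if a != b:
            break
        shared.append(a)
    return shared

def h_path(p: List[Tuple[str, float]], q: List[Tuple[str, float]],
           weighted: bool = False) -> float:
    """Length of the common path, or its confidence-weighted length (assumed weighting)."""
    shared = common_path([c for c, _ in p], [c for c, _ in q])
    if not weighted:
        return float(len(shared))
    return sum(p[i][1] * q[i][1] for i in range(len(shared)))
```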

Evaluation: Automatic Retrieval

Method      fVisual       fSemantic     hVisual       hBilinear     A2SH
Time (ms)   1.18 × 10^4   3.62 × 10^3   7.42 × 10^2   4.47 × 10^2   70.6

Effective! Efficient!
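A quick arithmetic check of the relative speed implied by the timing table, using only the times reported there.

```python
# Times (ms) as reported in the timing table.
times_ms = {
    "fVisual": 1.18e4,
    "fSemantic": 3.62e3,
    "hVisual": 7.42e2,
    "hBilinear": 4.47e2,
    "A2SH": 70.6,
}
# Ratio of each baseline's time to A2SH's time.
speedup = {m: t / times_ms["A2SH"] for m, t in times_ms.items() if m != "A2SH"}
for method, s in sorted(speedup.items(), key=lambda kv: -kv[1]):
    print(f"A2SH is {s:.1f}x faster than {method}")
# A2SH is 167.1x faster than fVisual
# A2SH is 51.3x faster than fSemantic
# A2SH is 10.5x faster than hVisual
# A2SH is 6.3x faster than hBilinear
```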

Case Study: Automatic Retrieval

[Figure: top results of fVisual, hBilinear, and A2SH for a query; results are marked as matched or semantically similar.]

Interactive Retrieval

☞ Image-level feedback

[Figure: a query and image-level feedback.]

Interactive Retrieval

☞ Attribute-level feedback

[Figure: a query with feedback on attributes such as Leg and Cloth.]

Zhang et al., "Attribute Feedback", MM 2012

Evaluation: Interactive Retrieval

Method    A2SH   HF     QPM    SVM
MAP@20    0.25   0.22   0.21   0.21

(Fixed interaction time of 2 minutes)
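The table reports MAP@20 without defining it. Below is a sketch of one common definition, in which AP@k averages the precision at each relevant position within the top k and is normalized by the number of relevant items retrieved there. Other normalizations exist (e.g., dividing by min(R, k), where R is the total number of relevant items), and which one the authors used is not stated.

```python
from typing import List, Sequence

def average_precision_at_k(relevance: Sequence[int], k: int = 20) -> float:
    """AP@k for one ranked list of 0/1 relevance labels,
    normalized by the number of relevant items in the top k (assumed variant)."""
    hits = 0
    precision_sum = 0.0
    for i, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / i  # precision at rank i
    return precision_sum / hits if hits else 0.0

def map_at_k(runs: List[Sequence[int]], k: int = 20) -> float:
    """Mean of AP@k over all queries."""
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs) if runs else 0.0
```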

Case Study: Interactive Retrieval

[Figure: initial and interactive results of QPM, HF, and A2SH; results are marked as matched or semantically similar.]

Summary

A2SH: Attribute-augmented Semantic Hierarchy
☞ A semantic hierarchy (SH) augmented with attributes
☞ Bridges the semantic gap and the intention gap
☞ A general framework for CBIR
☞ Effectiveness verified on 1.23 M images

Q&A

Nameable Attribute Classifiers

[Figure: confusion matrices of the base and selected nameable attribute classifiers.]

Data Set

☞ Only leaves have images; each concept's images are merged bottom-up from its descendants.
☞ Images are split 50% / 50% into training and testing (gallery).
☞ 100 random testing images per leaf are used as queries.
☞ 100 random training images per leaf are annotated with attributes.
☞ Features: color, texture, edge, and multi-scale dense SIFT; LLC with max-pooling over a 2-level spatial pyramid; a 35,903-d feature vector in total.
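A minimal sketch of the pooling step only: max-pooling code vectors over a 2-level spatial pyramid and concatenating the cells. It assumes the levels are a 1×1 grid and a 2×2 grid (5 cells, so 5·K dimensions for K-dimensional codes) and that empty cells contribute zeros. The LLC coding step itself and the exact 35,903-d layout are not reproduced, and the descriptor format is hypothetical.

```python
from typing import List, Tuple

# Hypothetical descriptor: (x, y, code) with x, y normalized to [0, 1)
# and `code` its K-dimensional (e.g., LLC) code vector.
Descriptor = Tuple[float, float, List[float]]

def spm_max_pool(descs: List[Descriptor], K: int, levels: int = 2) -> List[float]:
    """Max-pool codes within every cell of a spatial pyramid and concatenate.

    Level l uses a 2^l x 2^l grid, so levels=2 gives 1 + 4 = 5 cells."""
    feature: List[float] = []
    for level in range(levels):
        g = 2 ** level
        cells = [[float("-inf")] * K for _ in range(g * g)]
        for x, y, code in descs:
            row, col = min(int(y * g), g - 1), min(int(x * g), g - 1)
            cell = cells[row * g + col]
            for k in range(K):
                cell[k] = max(cell[k], code[k])
        for cell in cells:
            # Cells without any descriptor contribute zeros.
            feature.extend(0.0 if v == float("-inf") else v for v in cell)
    return feature
```

Max-pooling keeps the strongest response of each codeword per region, and the pyramid adds coarse spatial layout on top of the whole-image pooled vector.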

Concept Classifiers

[Figure: concept classifier results; values shown: 0.93, 0.92, 0.77.]

Nameable Attribute Classifiers

Unnameable Attribute Classifiers

Local Metric Learning

Automatic Retrieval

Interactive Retrieval
