A SYNTAX FOR IMAGE UNDERSTANDING
Narendra Ahuja
University of Illinois at Urbana-Champaign
May 21, 2009
Work done with Sinisa Todorovic, Mark Tabb, Himanshu Arora, Varsha Hedau, Bernard Ghanem, Tim Cheng
The Question
What is a good low-level image representation
to enable
Object Recognition,
Reasoning,
Synthesis, ... ?
What is an Object?
Object = Layout of parts with some Intrinsic Properties
e.g., Wall = Layout of Doors, Windows …
Each Part is itself a (simpler) Object
⇒ Object = Hierarchy
e.g., Building → Wall → Windows ...
Object Complexity = Complexity of parts/hierarchy and layout
e.g., Building comprised of bricks
What is Not an Object
A crowded city street
A serene landscape
Allowed but not for today, for focus on more immediate issues
The Scene Object
Scene = Layout of Objects
= Hierarchical Layout
From Scene to Image
Imaging Preserves Localization
Image = Hierarchical Layout of Regions
Image vs. Objects
Image
Subimages of Parts
Smallest Subimages
(= Smallest parts)
Image
Simpler Objects
Primitive Objects
Recognition and Segmentation
Do not have access to windows with only the object of
interest
For model acquisition as well as subsequent recognition
Need to consider Simultaneous Segmentation and
Modeling/Recognition
Combinatorial Problem
Where is Which Object?
Too many possible subimages
To be matched with object models
Circular problem
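To give a feel for "too many possible subimages", the sketch below counts only the axis-aligned rectangular windows of an image. The function name `count_windows` and the 640×480 image size are illustrative assumptions, not from the talk; arbitrarily shaped subimages would be far more numerous still.

```python
def count_windows(height, width):
    """Number of axis-aligned rectangular windows in a height x width image.

    A window is fixed by choosing a top/bottom row pair and a left/right
    column pair, so the count is [h(h+1)/2] * [w(w+1)/2].
    """
    return (height * (height + 1) // 2) * (width * (width + 1) // 2)


def count_windows_brute_force(height, width):
    """Same count by explicit enumeration; only practical for tiny images."""
    return sum(
        1
        for top in range(height)
        for bottom in range(top, height)
        for left in range(width)
        for right in range(left, width)
    )


if __name__ == "__main__":
    # Hypothetical image size, chosen only for illustration.
    print(count_windows(480, 640))
```

Every one of these windows would, in principle, have to be matched against every object model, which is the combinatorial problem the slide refers to.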
Reduce combinatorial complexity,
by reducing object/image size
Parts are Simpler to Represent/Model
Smaller images/objects are
likely to be easier to handle
i.e.
Number of matching Object Models is
likely to be Smaller
Primitive Objects have
the Smallest Number of Candidate Models
Object Representation is Recursive
Object
=
Arrangement of Parts
Characterized by three types of Properties
Photometric Geometric Topological
Each Part is sufficiently simple, or is itself an Object
Breaking the Loop
Identify Candidate Subimages
From
A Hierarchical Partitioning of an Image
i.e.,
A Multiscale, Low-Level Image Segmentation
Segments = Objects of different complexity
Why Segments as Candidate Objects
Photometric Segments are useful estimates of objects
Because
Object Boundary
Almost Always = Photometric Boundary
Although Photometric Boundary
May or May not = Object Boundary
Because
Independent Objects
Independent shape, orientation, reflectance
Segment/Object Contour
The Argument of Dimensionality
Segment dimensionality = 2D
= Our object dimensionality
Segment information capacity matched with object
vs.
Lower dimensional representations
e.g.,
Point features
Edge fragments
Although 3D still missing
Extensibility
Due to more complete correspondence to parts
Segments
• Simplify analysis/reduce dependence on tools
• Offer greater promise for moving beyond
the basic tasks of today
e.g., to more complex objects,
more abstract objects,
context sensitivity...
Representation Issues vs. Analysis Details
Will focus more on the representation issues
and skip
Detailed tools to carry out the various tasks
e.g. tools for: Probabilistic analysis
Structural analysis
Image Representation
Image → Multiscale Segmentation → Homogeneous regions at ALL contrasts and sizes
Extract Hierarchical Layout of Regions
Region = Largest Homogeneous Set of Contiguous Pixels
Ahuja PAMI96, Tabb&Ahuja TIP97, Arora&Ahuja ICPR 06
Example Segmentations for Several Contrasts
in Photometric Hierarchy
Image Representation = Segmentation Tree
Multiscale Segmentation → Segmentation Tree (of embedded regions)
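A toy illustration of how regions found at several contrasts nest into a tree. This is not the multiscale segmentation algorithm of the cited papers: it swaps in a much simpler stand-in (4-connected components under a contrast tolerance), and the names `label_regions`, `segmentation_tree`, and the sample image are assumptions made for the sketch.

```python
def label_regions(img, tol):
    """Label 4-connected regions in which adjacent pixels differ by at most tol."""
    h, w = len(img), len(img[0])
    parent = list(range(h * w))  # union-find forest over pixel indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):  # right and down neighbors
                if ny < h and nx < w and abs(img[y][x] - img[ny][nx]) <= tol:
                    parent[find(ny * w + nx)] = find(y * w + x)

    ids = {}  # union-find root -> compact label, in raster order
    return [[ids.setdefault(find(y * w + x), len(ids)) for x in range(w)]
            for y in range(h)]


def segmentation_tree(img, tolerances):
    """Return label maps ordered coarse to fine, plus child -> parent links.

    links[k][fine_label] is the label of the containing region one level up.
    Regions nest because every merge allowed at a small tolerance is also
    allowed at a larger one.
    """
    levels = [label_regions(img, t) for t in sorted(tolerances, reverse=True)]
    links = []
    for coarse, fine in zip(levels, levels[1:]):
        link = {}
        for row_c, row_f in zip(coarse, fine):
            for c, f in zip(row_c, row_f):
                link.setdefault(f, c)
        links.append(link)
    return levels, links


# Hypothetical 3x4 intensity image used only for illustration.
IMG = [
    [10, 10, 50, 52],
    [10, 12, 50, 90],
    [10, 10, 50, 90],
]

if __name__ == "__main__":
    levels, links = segmentation_tree(IMG, [0, 5, 40])
    for link in links:
        print(link)
```

The root of the tree is the whole image (largest tolerance); each lower level splits regions into their more homogeneous embedded subregions.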
Image Objects and Image Segmentation Tree
• Images ∋ Multiple Independent Objects
• Image Tree ∋ Multiple Independent Subtrees
• Each Object = One or More Subtrees
• Object Modeling = Capturing Object Subtrees
• Photometric: Intensity contrast and variance
• Geometric:
• Area, variance of children areas
• 1st central moment, eccentricity
• Squared perimeter over area
• Topological:
• Angle between child and parent’s principal axes
• Displacement of child centroids
• Context vector: spatial distribution of sibling regions
• Todorovic&Ahuja PAMI07, IJCV07
Examples of Properties
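A minimal sketch of a few of the geometric properties listed above (area, centroid, eccentricity, squared perimeter over area, principal-axis angle), computed for a region given as a set of pixel coordinates. The exact definitions used in the cited papers are not given in the slides, so the moment-based eccentricity and the edge-count perimeter here are assumptions, as is the function name `region_properties`.

```python
import math


def region_properties(pixels):
    """Compute simple shape descriptors for a region.

    pixels: non-empty set of (row, col) tuples belonging to the region.
    """
    area = len(pixels)
    cy = sum(r for r, _ in pixels) / area
    cx = sum(c for _, c in pixels) / area

    # Second central moments (normalized by area).
    myy = sum((r - cy) ** 2 for r, _ in pixels) / area
    mxx = sum((c - cx) ** 2 for _, c in pixels) / area
    mxy = sum((r - cy) * (c - cx) for r, c in pixels) / area

    # Eigenvalues of the 2x2 moment matrix give the principal-axis spreads.
    half_trace = (mxx + myy) / 2
    disc = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam_max, lam_min = half_trace + disc, half_trace - disc
    ecc = math.sqrt(max(0.0, 1 - lam_min / lam_max)) if lam_max > 0 else 0.0

    # Perimeter as the number of pixel edges facing outside the region.
    perimeter = sum(
        (r + dr, c + dc) not in pixels
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )

    return {
        "area": area,
        "centroid": (cy, cx),
        "eccentricity": ecc,
        "compactness": perimeter ** 2 / area,  # squared perimeter over area
        "axis_angle": 0.5 * math.atan2(2 * mxy, mxx - myy),  # radians
    }


if __name__ == "__main__":
    rect = {(r, c) for r in range(2) for c in range(3)}  # 2x3 rectangle
    print(region_properties(rect))
```

Topological properties such as the angle between a child's and its parent's principal axes could then be taken as differences of `axis_angle` values, and centroid displacements as differences of `centroid` values.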
Modeling and Recognition = Subtree Matching
Discovery = Matching across image sets (frequency)
Modeling = Finding canonical tree of an object
category
as pdf’s of properties and structure
Recognition = Probabilistic matching
All unsupervised
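To make "recognition = subtree matching" concrete, here is a deliberately simplified sketch: a greedy recursive similarity between two trees, used to find the image subtree that best matches a model tree. The talk's actual matching is probabilistic and is not specified in the slides; the `Node` class, the exponential node similarity, and the greedy child assignment are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    props: tuple                      # region property vector (hypothetical)
    children: list = field(default_factory=list)


def node_sim(a, b):
    """Similarity in (0, 1]; equals 1 when the property vectors coincide."""
    return math.exp(-math.dist(a.props, b.props))


def tree_sim(a, b):
    """Greedy similarity: match each child of a to its best unused child of b."""
    score = node_sim(a, b)
    unused = list(b.children)
    for child_a in a.children:
        if not unused:
            break
        best = max(unused, key=lambda child_b: tree_sim(child_a, child_b))
        score += tree_sim(child_a, best)
        unused.remove(best)
    return score


def best_subtree(model, image_root):
    """Return (score, node) for the image subtree most similar to the model."""
    best = (tree_sim(model, image_root), image_root)
    for child in image_root.children:
        candidate = best_subtree(model, child)
        if candidate[0] > best[0]:
            best = candidate
    return best


if __name__ == "__main__":
    model = Node((1.0,), [Node((0.2,)), Node((0.8,))])
    image = Node((5.0,), [
        Node((3.0,)),
        Node((1.0,), [Node((0.8,)), Node((0.2,))]),  # the object, parts reordered
    ])
    score, node = best_subtree(model, image)
    print(score, node.props)
```

Because the matched subtree also identifies which image regions belong to the object, matching yields segmentation and recognition together.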
Model from Multiple Instances of Objects in a Category
Aligning and
Registering
Category
Occurrences
Sets of Matching
Nodes
Object Category Model = Stochastic Tree Structure
region properties
number of children
Object part (hidden)
Exponential Gaussian
Markovian chain structure + parameters
Each Node and Branch Probabilistically Determined
Model = Grammar
Object Subtree Model
=
Tree of Probability Density Functions
=
Stochastic Grammar
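One way to read "tree of probability density functions" is sketched below: each model node carries a density over a region property and a distribution over its number of children, so the model can both generate object instances and score observed ones. The specific choices here (a scalar Gaussian property, a categorical child count, the `ModelNode` class) are assumptions for illustration and are not taken from the cited work.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class ModelNode:
    mean: float                 # Gaussian mean of a scalar region property
    std: float                  # Gaussian standard deviation
    child_count_probs: dict     # {number of children: probability}
    children: list = field(default_factory=list)


def sample(node, rng):
    """Draw one instance (value, [child instances]) from the stochastic tree."""
    value = rng.gauss(node.mean, node.std)
    counts, probs = zip(*node.child_count_probs.items())
    n = rng.choices(counts, weights=probs)[0]
    return (value, [sample(child, rng) for child in node.children[:n]])


def log_likelihood(node, instance):
    """Log-probability of an instance: property density plus structure terms."""
    value, kids = instance
    z = (value - node.mean) / node.std
    ll = -0.5 * z * z - math.log(node.std * math.sqrt(2 * math.pi))
    p = node.child_count_probs.get(len(kids), 0.0)
    if p == 0.0:
        return -math.inf        # structure never produced by this model
    ll += math.log(p)
    for child, kid in zip(node.children, kids):
        ll += log_likelihood(child, kid)
    return ll


if __name__ == "__main__":
    leaf = ModelNode(1.0, 0.5, {0: 1.0})
    root = ModelNode(10.0, 2.0, {1: 0.3, 2: 0.7}, [leaf, leaf])
    instance = sample(root, random.Random(0))
    print(instance, log_likelihood(root, instance))
```

Sampling corresponds to using the grammar generatively (synthesis), while `log_likelihood` corresponds to scoring a candidate parse (recognition).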
From Model to Simultaneous Recognition and Segmentation
Inference = Matching image tree against the learned tree model
Results: Weizmann Horses
[Figure: training images and category model]
Results: Weizmann Horses
• Object segmentation is good on contours that are:
• Jagged
• Blurred
• Part of complex patterns
• Low-contrast regions merge with background
Recall and Precision
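For reference, a small sketch of how recall and precision can be computed for a segmentation result. The slides do not say whether the reported figures are pixel-level or object-level; the pixel-level definition and the function name below are assumptions.

```python
def precision_recall(predicted, truth):
    """Pixel-level precision and recall.

    predicted, truth: sets of (row, col) pixels labeled as object.
    precision = TP / (TP + FP), recall = TP / (TP + FN).
    """
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical masks for illustration only.
    predicted = {(0, 0), (0, 1), (1, 1)}
    truth = {(0, 1), (1, 1), (2, 1), (2, 2)}
    print(precision_recall(predicted, truth))
```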
Real World
> 30,000 categories
Too Many Categories
• 30,000 independent models are not a good idea
Because the world is not full of unrelated things
a. Parts are shared among objects
b. In different configurations in different objects
c. So category representations are interrelated
d. This is directly reflected in the apparent organization of Human Knowledge/Semantics
Any similar 2D objects?
Arbitrary Images
Category = Set of Similar 2D Objects
Categories Found
Scaling Up Category Representation
• Categories = Configurations of Shared Subcategories
• Subcategories are simpler and smaller
• Robust detection
• Sharing = Sublinear complexity, Minimal computation
[Figure label: unshared object parts]
Multi-Category Representation = Taxonomy
• Interleaved Trees of Probability Density Functions of Tree Structures and Tree Node Properties
UIUC Hoofed Animals Dataset: Contains Six Animals
Simultaneous Recognition and Segmentation
Results: Animals
Simultaneous Detection, Recognition, Segmentation
Taxonomy Structure
Vs
Observed Category Statistics
Not All Subcategories are Equally Informative
• So far
• P(Detection) = P(Match Quality)
• = Decision Making Based on Likelihood
• Uniform Priors on
• P(Subcat)
• P(Cat | Subcat)
But Discovered Unshared Subcategories Provide More Evidence
If legs, then many possibilities
If antlers, then very likely deer
If lake, then very unlikely desert
Unshared Categories Uniqueness
Need Bayesian Detection
• During Training on Representative Datasets
• Estimate P(Cat)
• Estimate P(Cat | Subcat)
• Todorovic&Ahuja CVPR08
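A minimal sketch of the estimation step, assuming the probabilities are estimated as relative frequencies over a labeled training set. The training data, category names, and the function `estimate_tables` are hypothetical; the cited paper's estimator may differ.

```python
from collections import Counter


def estimate_tables(training):
    """Estimate P(Cat) and P(Cat | Subcat) by counting.

    training: list of (category, set of detected subcategories) pairs.
    Returns (p_cat, p_cat_given_sub), where p_cat_given_sub is keyed by
    (category, subcategory).
    """
    n = len(training)
    cat_count = Counter(cat for cat, _ in training)
    sub_count = Counter(s for _, subs in training for s in subs)
    joint = Counter((cat, s) for cat, subs in training for s in subs)

    p_cat = {c: k / n for c, k in cat_count.items()}
    p_cat_given_sub = {(c, s): k / sub_count[s] for (c, s), k in joint.items()}
    return p_cat, p_cat_given_sub


# Hypothetical training set: legs are shared, antlers are not.
TRAINING = [
    ("deer", {"legs", "antlers"}),
    ("horse", {"legs"}),
    ("cow", {"legs"}),
    ("deer", {"legs", "antlers"}),
]

if __name__ == "__main__":
    p_cat, p_cat_given_sub = estimate_tables(TRAINING)
    print(p_cat_given_sub[("deer", "legs")], p_cat_given_sub[("deer", "antlers")])
```

With these toy counts, detecting legs leaves several categories plausible, while detecting antlers points to deer alone, which is the asymmetry the uniform-prior scheme ignores.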
Results: Caltech-101 and Caltech-256
Caltech-101
Caltech-256
Bringing In Layout
So Far Pure Hierarchy
Image = Segmentation TREE of Regions
Object = Subtree (Actually {Subtrees})
= Recursive Embedding of Regions
Taxonomy = Interleaved STs
All Characterized by Probabilities
Problems with Pure Hierarchy
No Explicit Layout Information
Object Model = No Neighbor Relationships Among Parts
Undesirable Consequence:
Recognition Insensitive to Spatial Scrambling of Parts !
Solution = Connected Segmentation Tree (CST)
Add Links between Neighbor Nodes
Implementation = Links between Siblings
Result: Connected Segmentation Tree (CST)
= Hierarchy of Neighbor Graphs
Ahuja&Todorovic, CVPR’08
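The sketch below illustrates why sibling links matter: two layouts with the same parts and the same part areas are indistinguishable by containment alone, but differ in their neighbor graphs. The label maps, part names, and functions are assumptions for illustration; the binary adjacency used here is the simplest possible link strength, not necessarily the one in the cited paper.

```python
from collections import Counter


def child_areas(label_map):
    """Containment-only summary: which sibling regions exist and their areas."""
    return Counter(label for row in label_map for label in row)


def sibling_links(label_map):
    """Neighbor graph among sibling regions: pairs that share a pixel edge."""
    h, w = len(label_map), len(label_map[0])
    edges = set()
    for y in range(h):
        for x in range(w):
            a = label_map[y][x]
            for ny, nx in ((y, x + 1), (y + 1, x)):  # right and down neighbors
                if ny < h and nx < w and label_map[ny][nx] != a:
                    edges.add(tuple(sorted((a, label_map[ny][nx]))))
    return edges


# Hypothetical sibling regions inside one parent region.
ORIGINAL = [["head", "head", "torso", "torso", "tail", "tail"]]
SCRAMBLED = [["torso", "torso", "head", "head", "tail", "tail"]]

if __name__ == "__main__":
    print(child_areas(ORIGINAL) == child_areas(SCRAMBLED))
    print(sibling_links(ORIGINAL), sibling_links(SCRAMBLED))
```

Applying `sibling_links` at every level of the segmentation tree gives a hierarchy of neighbor graphs, which is the structure the CST adds on top of pure containment.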
CST Based Taxonomy
Each Category = CST Subtree
(Actually {SubCSTs})
Taxonomy = Interleaved CSTs
= Interleaved Hierarchies of Neighbor Graphs
Training Images → Discovered CST Category Model
Results: Weizmann Horses
ST vs. CST
[Plot: degree of occlusion artificially made in the image; curves for binary vs. real-valued strength of neighbor relationships]
ST vs. CST
Input Images | Segmentation Tree | CST
UIUC Hoofed Animals
LabelMe
CSTs outperform STs
Especially for partial occlusion, or
When only region layout is used without containment
Vs. Language
Embedding = Hierarchy (and Lego-like compatibility)
Neighbors = Juxtaposition
Occlusion = Only Subtrees of Object tree visible
Inter-object Interaction/combinatorics friendlier
Ordering/Multiple Counting addressed by structure
Instability of Segmentation Addressable
• Splitting and merging of adjacent regions
• Partial Matching
Hedau&Ahuja CVPR08, Cheng and Ahuja
Syntax should Feed Multiple Semantics
A Representation Should Work
for Multiple Applications
Modeling 2.1D Texture
• Physical texels are characterized by
• Texel thickness << Texel distance
• Inter-texel occlusion
• Only a part of a texel may be visible
• Visible texel parts = Samples of
different, unknown texel parts
Learning Texel Model
2.1D texture → identified subimages → registration → union + PDF
Ahuja and Todorovic, ICCV07
Texel Extraction Results
Another Example – Texture Segmentation
Texture Segmentation
Another Example: Texel Distribution
How are texels distributed across texture
Ghanem and Ahuja, Submitted
Summary
• Syntax = Connected Segmentation Tree
• Semantics = Recognition, Synthesis, ...
• Model = Stochastic Grammar
• Inference = Grammar Based Parsing/Recognition (Not
covered)
• Tools = Structural and Statistical Analysis (Not covered)