Discrete Inference & Learning in Artificial Vision
M. Nikos Paragios & M. Pawan Kumar
Lecture 1: Introduction to artificial vision with discrete graphical models
Sep 13, 2015
Human Vision
The sensor (the iris acts as the diaphragm of a camera; the cornea and the lens are both lens-like elements; the retina, where the image is recorded, plays the role of the CCD sensor)
The processor (information is transferred through the optic nerve to the striate cortex, the part of the brain where massive processing is performed towards complete, real-time visual scene understanding; vision involves almost 50% of the human brain)
Artificial Vision
The input (static, video, depth, monochromatic, color, high-dynamic-range, etc. sensors)
The processor (powerful computers exploiting the input, prior knowledge and models)
The process (expressing task-specific visual understanding problems as mathematical inference problems and solving them approximately through computer simulations)
Why is artificial vision so complex?
The input
A large variety of sensors; images/signals of varying quality
The processor
Even the most powerful individual processor does not match a tiny portion of the human brain's processing capacity
The mathematical inference
We end up solving problems that are ill-defined, ill-posed and non-convex, involving non-linear objective functions with numerous local minima
This is what you see
This is what your computer sees
Artificial Vision
10/01/2014 5
Artificial Vision Paradigm
Model
Optimization Procedure
Model/Data Association
Observations
Computer Vision Paradigm: Left Ventricle Segmentation (risk of heart attack)
Clinical Problem
Medical Image Modalities
Mathematical Model (Parameters)
Model-to-Data Association
Optimization
Main Challenges
Curse of Dimensionality: find a compromise between the expressive power of the model and its complexity [finding the right model]
Curse of Non-linearity: the relation between the model parameters and the observations is highly non-linear [finding the right relation between the measurements and the parameters to be estimated]
Curse of Non-Convexity: the designed objective function lives in a high-dimensional non-convex space [finding the right objective function and being able to optimize it]
Curse of Non-Modularity: a solution is hardly portable to another application setting or another problem [not repeating the process from scratch when moving from one visual task to another]
Relationship with other fields
@ Wikipedia
Safety & Security
Intrusion detection
Traffic management / control
Activity Analysis through motion/trajectories/tracking
Facial/Iris/Fingerprint Recognition
Parking/driving assistance / Pedestrian detection
Lane detection/ automatic cruise control
Computer Aided Diagnosis & Vision
Image Reconstruction
Organ Segmentation & Matching
Multimodal Image Fusion/Registration
Image-based biomarkers
Computer Assisted Surgery Navigation
Population modeling, computational anatomy
Functional understanding of the human brain
Media, Consumer Products
Computer Games/Interaction
Special Visual Effects
Large Scale 3D Modeling
High Resolution/Dynamic Range
Human Computer Interaction
Augmented Reality & new media
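Discrete Artificial Vision
Given: objects (parameters) from a graph $G = (V, E)$, whose edges $E$ define the neighborhood system, and a discrete label set $L$
Assign labels $x_p \in L$ to the objects $p \in V$ that minimize the energy:
$E(x) = \sum_{p \in V} \theta_p(x_p) + \sum_{(p,q) \in E} \theta_{pq}(x_p, x_q)$
where $\theta_p$ is the unary potential (one per object) and $\theta_{pq}$ the pairwise potential (one per edge)
MRF optimization is ubiquitous in vision (and beyond)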
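To make the energy concrete, here is a minimal Python sketch, assuming a 4-connected pixel grid and user-supplied potentials; the names `energy`, `icm`, `unary` and `potts` are hypothetical. It evaluates $E(x)$ and minimizes it with iterated conditional modes (ICM), a simple greedy optimizer chosen only for brevity: it is not necessarily the optimizer used in this course and it only reaches a local minimum.

```python
def grid_edges(h, w):
    """Yield each edge (p, q) of a 4-connected h x w grid exactly once."""
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                yield (i, j), (i + 1, j)
            if j + 1 < w:
                yield (i, j), (i, j + 1)


def energy(x, unary, pairwise):
    """E(x) = sum_p theta_p(x_p) + sum_{(p,q) in E} theta_pq(x_p, x_q)."""
    h, w = len(x), len(x[0])
    e = sum(unary(i, j, x[i][j]) for i in range(h) for j in range(w))
    e += sum(pairwise(x[a][b], x[c][d]) for (a, b), (c, d) in grid_edges(h, w))
    return e


def icm(x, labels, unary, pairwise, sweeps=10):
    """Iterated conditional modes: re-label one node at a time with its
    neighbors fixed, until no label changes (a local minimum of E)."""
    h, w = len(x), len(x[0])
    x = [row[:] for row in x]
    for _ in range(sweeps):
        changed = False
        for i in range(h):
            for j in range(w):
                nbrs = [(i + di, j + dj) for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= i + di < h and 0 <= j + dj < w]

                def local_cost(l):
                    return unary(i, j, l) + sum(pairwise(l, x[a][b]) for a, b in nbrs)

                best = min(labels, key=local_cost)
                if best != x[i][j]:
                    x[i][j] = best
                    changed = True
        if not changed:
            break
    return x


# Usage: binary denoising with a Potts pairwise term (hypothetical costs).
observed = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]


def unary(i, j, l):
    return 0.0 if l == observed[i][j] else 1.0


def potts(a, b):
    return 0.0 if a == b else 0.5


restored = icm(observed, [0, 1], unary, potts)
```

The isolated "1" pixel is relabeled because keeping it costs four Potts penalties (2.0) while flipping it costs one unary penalty (1.0).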
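Optimization of high-order models
Given: parameters from a hypergraph $G = (V, C)$, whose hyperedges (cliques) $c \in C$ may contain more than two nodes
High-order energy minimization problem:
$E(x) = \sum_{p \in V} \theta_p(x_p) + \sum_{c \in C} \theta_c(x_c)$
where $\theta_p$ is the unary potential (one per node), $\theta_c$ the high-order potential (one per clique) and $x_c$ the labels of the nodes in $c$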
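The only change from the pairwise case is that each potential now reads the labels of a whole clique. A minimal Python sketch, assuming labels stored in a dict and cliques given as node tuples; `truncated_disagreement` is a hypothetical clique potential chosen for illustration, in the spirit of robust higher-order models, not a potential prescribed by the lecture.

```python
def high_order_energy(x, unary, cliques, clique_potential):
    """E(x) = sum_p theta_p(x_p) + sum_{c in C} theta_c(x_c).

    x: dict node -> label; cliques: list of node tuples (the hyperedges)."""
    e = sum(unary(p, l) for p, l in x.items())
    e += sum(clique_potential(tuple(x[p] for p in c)) for c in cliques)
    return e


def truncated_disagreement(labels, gamma=1.0, truncation=2):
    """Illustrative clique potential: gamma times the number of nodes that
    disagree with the clique's majority label, truncated at `truncation`."""
    majority = max(set(labels), key=labels.count)
    disagree = sum(1 for l in labels if l != majority)
    return gamma * min(disagree, truncation)


# Usage: 4 nodes, two overlapping third-order cliques, zero unary costs.
x = {0: 1, 1: 1, 2: 0, 3: 1}
cliques = [(0, 1, 2), (1, 2, 3)]
E = high_order_energy(x, lambda p, l: 0.0, cliques, truncated_disagreement)
```

Node 2 disagrees with the majority in both cliques, so each clique contributes 1.0 and $E = 2.0$.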
Low Level Vision
Blind Image Deconvolution / Image Inpainting
Variables: Pixels
Labels: Intensity Values
Graph Connectivity: Pair-wise
Blind Image Deconvolution
Blurred image generation process
blur kernel = camera motion
Blind Image Deconvolution
$I = k * x + \text{noise}$, where $I$ is the observed blurred image, $x$ the latent sharp image, $k$ the blur kernel and $*$ denotes convolution
Goal: given just $I$, compute both $x$ and $k$
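The forward model is easy to simulate. A minimal Python sketch, assuming a 1D signal, zero padding at the borders and i.i.d. Gaussian noise (real images are 2D with a 2D kernel); `convolve_same` and `blur` are hypothetical helper names.

```python
import random


def convolve_same(x, k):
    """1D convolution (k * x), same length as x, zero padding, kernel centered."""
    r = len(k) // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for t, kt in enumerate(k):
            j = i + r - t  # x is indexed backwards: true convolution, not correlation
            if 0 <= j < len(x):
                s += kt * x[j]
        out.append(s)
    return out


def blur(x, k, noise_sigma=0.0, seed=0):
    """Observed signal I = k * x + noise, with i.i.d. Gaussian noise."""
    rng = random.Random(seed)
    I = convolve_same(x, k)
    if noise_sigma > 0:
        I = [v + rng.gauss(0.0, noise_sigma) for v in I]
    return I


sharp = [0, 0, 0, 4, 4, 4]      # latent sharp signal x (a step edge)
kernel = [0.25, 0.5, 0.25]      # blur kernel k, sums to 1
blurred = blur(sharp, kernel)   # noise-free for reproducibility
```

The sharp step becomes the ramp `[0.0, 0.0, 1.0, 3.0, 4.0, 3.0]` (the last value drops because of zero padding). Blind deconvolution must undo this without knowing `kernel`.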
High-level idea: how to reduce ill-posedness?
$\tilde{x}$ = quantized version of image $x$ with just 15 colors (piecewise constant)
Yet both $x$ and $\tilde{x}$ produce almost the same blurry image
IDEA: compute $\tilde{x}$, which has a much simpler structure
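Why a quantized image blurs almost identically can be checked numerically. A minimal sketch, assuming a 1D ramp and uniform quantization to 3 levels (the slide uses 15 colors, and the lecture's quantization method may differ); `quantize` is a hypothetical helper.

```python
def convolve_same(x, k):
    """1D convolution, same length as x, zero padding, kernel centered."""
    r = len(k) // 2
    return [sum(kt * x[i + r - t] for t, kt in enumerate(k) if 0 <= i + r - t < len(x))
            for i in range(len(x))]


def quantize(x, levels):
    """Uniform quantization of x to `levels` equally spaced values
    (piecewise-constant output)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return list(x)
    step = (hi - lo) / (levels - 1)
    return [lo + round((v - lo) / step) * step for v in x]


x = [i / 10 for i in range(11)]      # a smooth ramp from 0 to 1
x_q = quantize(x, 3)                 # only 3 distinct values
k = [0.25, 0.5, 0.25]
gap_sharp = max(abs(a - b) for a, b in zip(x, x_q))
gap_blurred = max(abs(a - b) for a, b in zip(convolve_same(x, k), convolve_same(x_q, k)))
```

Because the kernel is non-negative and sums to one, the blurred difference can never exceed the pointwise quantization error, so `gap_blurred <= gap_sharp`: the data term barely distinguishes $x$ from its simpler quantized version.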
MRF-based Blind Image deconvolution
26
MRF-based Blind Image deconvolution
27
MRF-based Blind Image deconvolution
28
MRF-based Blind Image deconvolution
29
MRF-based Blind Image deconvolution
30
MRF-based Blind Image deconvolution
31
MRF-based Blind Image deconvolution
32
MRF-based Blind Image deconvolution
The Image Completion Problem
Based only on the observed part of an incomplete image, fill in its missing part in a visually plausible way
We want to be able to handle complex natural images with (possibly) large missing regions automatically (i.e. without user intervention)
Many applications: photo editing, film post-production, object removal, text removal, image repairing etc.
The Image Completion Problem
We would also like our method to be able to handle the related problem of texture synthesis
In texture synthesis, we are given as input a small texture and we want to generate a larger texture of arbitrary size (specified by the user)
Exemplar-based approaches
Key idea: fill the missing region by copying exemplars, i.e. pixels (or patches), from the observed part of the image
Disadvantages:
Successful only if the missing region consists of a single texture, e.g. texture synthesis
Greedy approach: the image is filled one patch at a time
Image Completion as a Discrete Global Optimization Problem
Labels $L$ = all $w \times h$ patches from the source region $S$
MRF nodes = all lattice points whose neighborhood intersects the target region $T$
Unary potential = how well source patch $x_p$ agrees with the source region around $p$
Pairwise potential = how well source patches $x_p$, $x_q$ agree on their overlapping region
Fig. Source region S, target region T and sample labels
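Both potentials are patch-agreement scores. A minimal Python sketch, assuming sums of squared differences (SSD), patches anchored at their top-left corner rather than centered on $p$, and missing pixels stored as `None`; the function names are hypothetical.

```python
def unary_cost(image, p, patch):
    """How well `patch` (top-left corner at p) agrees with the observed pixels
    it covers; missing pixels are None and are ignored (SSD)."""
    cost = 0.0
    for di, row in enumerate(patch):
        for dj, v in enumerate(row):
            i, j = p[0] + di, p[1] + dj
            if 0 <= i < len(image) and 0 <= j < len(image[0]) and image[i][j] is not None:
                cost += (v - image[i][j]) ** 2
    return cost


def pairwise_cost(p, patch_p, q, patch_q):
    """How well two patches placed at p and q agree on their overlap (SSD)."""
    placed = {(p[0] + di, p[1] + dj): v
              for di, row in enumerate(patch_p) for dj, v in enumerate(row)}
    cost = 0.0
    for di, row in enumerate(patch_q):
        for dj, v in enumerate(row):
            key = (q[0] + di, q[1] + dj)
            if key in placed:
                cost += (v - placed[key]) ** 2
    return cost


# Usage: the right column of a 2 x 3 image is missing (target region).
image = [[1, 2, None],
         [1, 2, None]]
```

A patch `[[1, 2], [1, 2]]` at `(0, 0)` has zero unary cost, and a neighboring patch at `(0, 1)` is cheap only if its left column repeats the values `2, 2` on the overlap.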
Mid-Level Vision
Object Segmentation / Optical Flow Estimation / Deformable Fusion / Graph Matching
Variables: control points
Labels: 2D/3D
Graph Connectivity: Pair-wise / higher order
Pose-Invariant Segmentation of the Heart
Challenges
- Human variability
- Complex background
- Low contrast
- Noise
Goal
- Automatic
- Robust
- Pose-invariant!
Fig. Manual segmentation on 3D CT images
Fig. Human variability
B. Xiang et al., 3D Cardiac Segmentation with Pose-invariant Higher-order MRFs, ISBI 2012
Shape representation
Point distribution model: $X = \{x_1, \ldots, x_n\}$
Triangulated mesh: $T = \{(x_i, x_j, x_k)\}$ (third-order cliques)
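Statistical shape prior
Local constraints: $P(x_i, x_j, x_k) = P(\alpha_{ijk}, \beta_{ijk})$, where $\alpha_{ijk}, \beta_{ijk}$ are two pose-invariant measures of the triangle $(x_i, x_j, x_k)$
Global shape: $P(X) = \frac{1}{Z} \prod_{c \in C} P_c(\alpha_c, \beta_c)$, with $C$ the set of triangles (third-order cliques)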
Pose-invariant (i.e. invariant to translation, rotation and scale)!
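The slides do not spell out which pose-invariant quantities the triangle potentials use. Assuming, hypothetically, that they are two interior angles (the third follows since the angles sum to $\pi$), this Python sketch computes them for 3D points and checks that a rotation, uniform scaling and translation leave them unchanged; the probability model on top of them is not implemented.

```python
import math


def angle_at(p, q, r):
    """Interior angle at vertex p of triangle (p, q, r), points in 3D."""
    v = [b - a for a, b in zip(p, q)]
    w = [b - a for a, b in zip(p, r)]
    cos = sum(a * b for a, b in zip(v, w)) / (math.hypot(*v) * math.hypot(*w))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against round-off


def pose_invariant_descriptor(xi, xj, xk):
    """Two interior angles of the triangle; the third is pi minus their sum.
    Angles do not change under translation, rotation or uniform scaling."""
    return angle_at(xi, xj, xk), angle_at(xj, xk, xi)


def similarity(p, s=2.0, t=(5.0, -1.0, 3.0)):
    """Example pose change: rotate 90 degrees about z, scale by s, translate by t."""
    x, y, z = p
    return (s * -y + t[0], s * x + t[1], s * z + t[2])


tri = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 1.0)]
before = pose_invariant_descriptor(*tri)
after = pose_invariant_descriptor(*[similarity(p) for p in tri])
```

Edge lengths would change by the factor `s`, which is why angles (rather than lengths) are a natural choice for a scale-invariant clique potential.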
Qualitative Results
Fig. Segmentation results of 3D CT volumes
Accurate boundaries even on low-contrast images
Dense Image Registration using MRFs
Basic Idea of Intensity-based Registration
Image registration as an optimization problem
Target and source Image:
Transformation:
Image metric:
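The structure "transformation + image metric + optimization" can be illustrated on the smallest possible case. A minimal sketch, assuming 1D signals, a transformation restricted to an integer translation $T(x) = x + s$ and a mean-squared-difference metric, minimized by exhaustive search; dense MRF registration uses far richer transformations, and the function names are hypothetical.

```python
def ssd_metric(target, source, shift):
    """Mean squared difference between target(x) and source(T(x)), with the
    transformation T(x) = x + shift, over the overlapping samples."""
    pairs = [(target[x], source[x + shift])
             for x in range(len(target)) if 0 <= x + shift < len(source)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def register_translation(target, source, max_shift):
    """Registration as optimization: pick the transformation (an integer
    shift in [-max_shift, max_shift]) that minimizes the image metric."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: ssd_metric(target, source, s))


source = [0, 0, 1, 5, 1, 0, 0, 0]
target = [1, 5, 1, 0, 0, 0, 0, 0]   # target(x) = source(x + 2)
best_shift = register_translation(target, source, max_shift=3)
```

The metric is averaged over the overlap so that large shifts, which leave few overlapping samples, are not artificially favored.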
Dimensionality Reduction
Linear combination of control points
e.g. Free-Form Deformations (Sederberg et al. 1986; Rueckert et al. 1999)
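The dimensionality reduction comes from interpolating a dense displacement from a few control points. A minimal 1D sketch, assuming linear B-spline ("hat") basis functions for brevity; the cited Free-Form Deformations typically use cubic B-splines, and the names below are hypothetical.

```python
def hat(t):
    """Linear B-spline basis function (support [-1, 1])."""
    return max(0.0, 1.0 - abs(t))


def ffd_displacement(x, control_disp, spacing):
    """D(x) = sum_k eta(x / spacing - k) * d_k: a dense displacement obtained
    as a linear combination of the control-point displacements d_k."""
    return sum(hat(x / spacing - k) * d for k, d in enumerate(control_disp))


def warp(x, control_disp, spacing):
    """Transformation T(x) = x + D(x)."""
    return x + ffd_displacement(x, control_disp, spacing)


control_disp = [0.0, 2.0, 4.0]   # 3 parameters, control points at x = 0, 10, 20
dense = [ffd_displacement(x, control_disp, 10) for x in range(21)]  # 21 pixels
```

Three parameters define the displacement of all 21 pixels; for instance, pixel 15, halfway between the control points at 10 and 20, moves by $0.5 \cdot 2 + 0.5 \cdot 4 = 3$.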
(Weighted) Block Matching
Redefinition of data term w.r.t. control lattice
Pixel-wise image metrics weighted by normalized basis functions
image points closer to a control point gain more influence on its matching energy
Statistical image metrics (e.g. mutual information, cross correlation)
evaluation of image metric in local patches centered at the control points
block size depends on control lattice resolution
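The weighting can be sketched as follows, assuming 1D signals, linear hat basis functions, a squared-difference pixel metric and an identity current transformation, so that a candidate label simply shifts the pixels around one control point; this is an illustration of the idea, not the lecture's exact data term.

```python
def hat(t):
    return max(0.0, 1.0 - abs(t))


def block_matching_cost(target, source, k, d, spacing, n_ctrl):
    """Data cost of displacing control point k by d: pixel-wise squared
    differences weighted by the normalized basis function of control point k,
    so pixels close to k dominate its matching energy."""
    cost = 0.0
    for x in range(len(target)):
        weights = [hat(x / spacing - l) for l in range(n_ctrl)]
        total = sum(weights)
        if weights[k] == 0.0 or total == 0.0:
            continue  # pixel outside the support of control point k
        y = x + d
        if 0 <= y < len(source):
            cost += weights[k] / total * (target[x] - source[y]) ** 2
    return cost


target = [0, 0, 0, 0, 9, 0, 0, 0, 0]
source = [0, 0, 0, 0, 0, 9, 0, 0, 0]   # feature moved one pixel to the right
# control points at x = 0, 4, 8 (spacing 4); candidate labels d in [-2, 2]
best_d = min(range(-2, 3), key=lambda d: block_matching_cost(target, source, 1, d, 4, 3))
```

Each candidate label of control point 1 gets one cost value, which is exactly the unary table needed by the discrete labeling problem.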
Discrete Labeling Problem
Markov Random Field formulation with pairwise interactions
Unary potentials (matching):
Pairwise potentials (smoothness):
Fig. Control lattice with nodes p, q, r, s, t, u, v, w, x and edges between neighboring nodes
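On a general lattice, exact minimization is hard and dedicated MRF solvers are used. As a hedged illustration only, the sketch below restricts the control points to a 1D chain, where min-sum dynamic programming finds the exact optimum; labels are candidate displacements, the unary tables stand in for the matching term, and the smoothness term $|d_p - d_q|$ is an assumed choice.

```python
def chain_mrf_min(unary, pairwise):
    """Exact min-sum dynamic programming on a chain of nodes.

    unary: list (one per node) of dicts label -> matching cost
    pairwise(a, b): smoothness cost between labels of consecutive nodes
    Returns (optimal labeling, minimal energy)."""
    cost = dict(unary[0])   # best energy of the prefix ending in each label
    back = []               # back-pointers, one dict per later node
    for i in range(1, len(unary)):
        new_cost, ptr = {}, {}
        for b in unary[i]:
            a = min(cost, key=lambda prev: cost[prev] + pairwise(prev, b))
            new_cost[b] = cost[a] + pairwise(a, b) + unary[i][b]
            ptr[b] = a
        cost = new_cost
        back.append(ptr)
    last = min(cost, key=cost.get)
    labeling = [last]
    for ptr in reversed(back):
        labeling.append(ptr[labeling[-1]])
    return labeling[::-1], cost[last]


def smoothness(a, b):
    """Pairwise potential: penalize differing displacements of neighbors."""
    return abs(a - b)


# Three control points, candidate displacements {-1, 0, 1} (hypothetical costs).
unary = [{-1: 5, 0: 1, 1: 5},
         {-1: 5, 0: 4, 1: 0},
         {-1: 5, 0: 1, 1: 5}]
labels, E = chain_mrf_min(unary, smoothness)
```

Here the middle control point follows its strong matching evidence (displacement 1) because the smoothness penalty of 2 is cheaper than the extra matching cost of 4.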
Some Results
Method: Affine 12-DOF | Rueckert | Ours
Average Surface Distance: 1.66 mm | 1.14 mm | 1.00 mm
Average Surface Distance: 1.92 mm | 1.31 mm | 1.06 mm
Experimental Validation Data Set [CMA GMH Harvard]
Experimental Validation Qualitative Results
HIGHER-ORDER NON-RIGID 3D SURFACE MATCHING
High-order Graph Matching
Graph 1 Graph 2
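High-order matching scores triplets of correspondences rather than single nodes or pairs. A minimal Python sketch, assuming 2D node positions, a third-order potential that compares the interior angles of corresponding triangles (so it tolerates rotation, scaling and translation), and exhaustive search over assignments, which is feasible only for tiny graphs; this is an illustration, not the method used in the lecture.

```python
import math
from itertools import combinations, permutations


def angle_at(p, q, r):
    """Interior angle at vertex p of the 2D triangle (p, q, r)."""
    v = (q[0] - p[0], q[1] - p[1])
    w = (r[0] - p[0], r[1] - p[1])
    cos = (v[0] * w[0] + v[1] * w[1]) / (math.hypot(*v) * math.hypot(*w))
    return math.acos(max(-1.0, min(1.0, cos)))


def triangle_angles(a, b, c):
    return angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b)


def third_order_cost(g1, g2, match):
    """Sum over all node triplets of graph 1 of the angle discrepancy with the
    matched triplet in graph 2 (one potential per triangle/hyperedge)."""
    cost = 0.0
    for i, j, k in combinations(range(len(g1)), 3):
        t1 = triangle_angles(g1[i], g1[j], g1[k])
        t2 = triangle_angles(g2[match[i]], g2[match[j]], g2[match[k]])
        cost += sum(abs(u - v) for u, v in zip(t1, t2))
    return cost


def match_graphs(g1, g2):
    """Exhaustive search over one-to-one assignments (tiny graphs only)."""
    return min(permutations(range(len(g2)), len(g1)),
               key=lambda m: third_order_cost(g1, g2, m))


graph1 = [(0, 0), (4, 0), (1, 3), (5, 4)]


def pose_change(p):
    """Rotate by 90 degrees, scale by 2 and translate (a similarity)."""
    return (-2 * p[1] + 1, 2 * p[0] - 3)


# Graph 2: transformed copy of graph 1 with its nodes listed in another order.
graph2 = [pose_change(graph1[i]) for i in (2, 0, 3, 1)]
assignment = match_graphs(graph1, graph2)
```

`assignment[i]` is the node of Graph 2 matched to node `i` of Graph 1; a pairwise distance-based score would be thrown off by the factor-2 scaling, whereas triangle angles are unaffected.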
Experimental Results
High-level Vision
Viewpoint-invariant 2.5D-3D / Large-scale Parsing with shape grammars
Variables: control points
Labels: 2D/3D displacements
Graph Connectivity: Pair-wise / higher order
Goals
Segmentation, tracking, depth ordering (2.5D)
2D image plane 3D Scene
Occlusion Relationship
An illustrative example:
Joint 2.5D Layered Modeling
Object-level representation Pixel-level representation
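2D parametric shape model
Relative depth: an object i can occlude an object j only if its depth is smaller than the depth of j; the background has the largest depth (here: object 1 has depth 0, object 2 depth 1, object 0 depth 2)
Pixel level: label = index of the associated layer; depth = depth of the associated layer (e.g. a pixel of object 1 has label 1 and depth 0)
Fig. Object 0 (Background), Object 1, Object 2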
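The link from the object-level to the pixel-level representation is essentially a depth test. A minimal Python sketch, assuming binary object masks, object 0 as a background covering every pixel, and the example depths (object 1: 0, object 2: 1, background: 2); `compose_layers` is a hypothetical name.

```python
def compose_layers(masks, depths):
    """Pixel-level (label, depth) maps from object-level masks and depths.

    masks[o][i][j] is truthy where object o covers pixel (i, j); depths[o] is
    the relative depth of object o. Each pixel takes the covering object with
    the smallest depth (an object can only occlude deeper objects).
    Assumes object 0 is the background and covers every pixel."""
    h, w = len(masks[0]), len(masks[0][0])
    labels = [[0] * w for _ in range(h)]
    depth_map = [[depths[0]] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            covering = [o for o in range(len(masks)) if masks[o][i][j]]
            front = min(covering, key=lambda o: depths[o])
            labels[i][j], depth_map[i][j] = front, depths[front]
    return labels, depth_map


depths = [2, 0, 1]                  # background, object 1, object 2
masks = [[[1, 1, 1, 1]],            # object 0 (background)
         [[0, 1, 1, 0]],            # object 1
         [[0, 0, 1, 1]]]            # object 2
labels, depth_map = compose_layers(masks, depths)
```

Where objects 1 and 2 overlap (the third pixel), object 1 wins because it is closer, giving labels `[[0, 1, 1, 2]]`.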
What do we infer?
Joint 2.5D Layered Modeling
Object-level representation: Object 0 (Background): depth = const; Objects 1 and 2: shape parameters = ?, depth = ?
Pixel-level representation: label = ?, depth = ?
What is the connection between
the two representations?
Markov Random Field Formulation (1): an example for two tracked objects
Object-level variables: (shape parameters, depth)
Pixel-level variables: (label, depth)
Experimental Results (1)
Video from: Huang & Essa. Tracking Multiple Objects through Occlusions. CVPR05
Goals
2D image plane 3D Scene
Shape matching, statistical shape modeling, knowledge-based 3D segmentation
Segmentation, tracking, depth ordering (2.5D)
3D landmark model inference from 2D images (2D-3D)
Illustration of projection prior: the probability of the 2D configuration given the 3D configuration takes an exponential form
Error function defined on a quadruplet of points
Qualitative Results for 3D Model Inference
Setting
Dataset: 101 samples from BU-4DFE [Yin et al. 2008]
Leave-one-out cross validation
Blue: ground truth; Red: result
Experimental Results
Image-based Modeling of Architecture using Shape Grammars
Procedural Modeling of Facades
Start from an axiom (Image)
Sequentially apply replacement rules
The derivation tree keeps track of the building structure
To be optimized: topology & geometry
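The derivation process can be sketched with split rules on rectangles. A minimal Python sketch, assuming two hypothetical replacement rules (a facade splits into three floors, a floor into wall/window/wall); in this toy version the topology (which rules, how many floors) and the geometry (split ratios) are fixed, whereas they are exactly what would be optimized against the image.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    rect: tuple                      # (x, y, width, height)
    children: list = field(default_factory=list)


def split(rect, ratios, axis):
    """Split rect along axis ('x' or 'y') into parts proportional to ratios."""
    x, y, w, h = rect
    total = sum(ratios)
    parts, offset = [], 0.0
    for r in ratios:
        if axis == "y":
            size = h * r / total
            parts.append((x, y + offset, w, size))
        else:
            size = w * r / total
            parts.append((x + offset, y, size, h))
        offset += size
    return parts


# Hypothetical replacement rules: symbol -> (split axis, [(child symbol, ratio)])
RULES = {
    "Facade": ("y", [("Floor", 1), ("Floor", 1), ("Floor", 1)]),
    "Floor": ("x", [("Wall", 1), ("Window", 2), ("Wall", 1)]),
}


def derive(node):
    """Sequentially apply replacement rules until only terminals remain; the
    returned tree is the derivation tree of the building."""
    if node.symbol in RULES:
        axis, rhs = RULES[node.symbol]
        rects = split(node.rect, [r for _, r in rhs], axis)
        node.children = [derive(Node(s, rc)) for (s, _), rc in zip(rhs, rects)]
    return node


def leaves(node):
    """Terminal symbols (the final facade elements), floor by floor."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


tree = derive(Node("Facade", (0.0, 0.0, 8.0, 9.0)))   # axiom: the image rectangle
elements = leaves(tree)
```

The tree keeps the building structure explicit: `tree.children` are the floors, and each window knows which floor it belongs to.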
What about real buildings?
A small district
Segmentation energy
Single pixel $x$: $c_x(c) = -\log p(c \mid f_x)$
Single region $R$: $c_R(c) = \sum_{x \in R} c_x(c) = -\sum_{x \in R} \log p(c \mid f_x)$
Segmentation: $E = \sum_i c_{R_i}(c_i) = -\sum_i \sum_{x \in R_i} \log p(c_i \mid f_x)$
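The three levels of the energy compose directly. A minimal Python sketch, assuming per-pixel class probabilities $p(c \mid f_x)$ are already available (for instance from a pixel-wise classifier); the dictionaries and class names below are hypothetical.

```python
import math


def pixel_cost(probs, x, c):
    """c_x(c) = -log p(c | f_x): cost of giving pixel x the class c."""
    return -math.log(probs[x][c])


def region_cost(probs, region, c):
    """c_R(c) = sum over pixels x in R of c_x(c)."""
    return sum(pixel_cost(probs, x, c) for x in region)


def segmentation_energy(probs, regions, classes):
    """E = sum_i c_{R_i}(c_i) for regions R_i labeled with classes c_i."""
    return sum(region_cost(probs, R, c) for R, c in zip(regions, classes))


# Hypothetical per-pixel class probabilities p(c | f_x) for a 2-pixel image.
probs = {0: {"wall": 0.5, "window": 0.5},
         1: {"wall": 0.25, "window": 0.75}}
E_split = segmentation_energy(probs, [[0], [1]], ["wall", "window"])
E_merged = segmentation_energy(probs, [[0, 1]], ["wall"])
```

Giving pixel 1 its own "window" region lowers the energy, since $-\log 0.75 < -\log 0.25$; a grammar-based parser scores candidate facade layouts with this kind of region sum.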
Qualitative Results
Architecturally consistent; robust to illumination conditions, hard cast shadows and reflections
Multi-view parsing using a genetic algorithm
Fig. Single rectified image vs. calibrated image sequence
Conclusions
Discrete graphical models are a promising answer to artificial vision
Curse of Dimensionality: prior knowledge, either through anatomy or machine learning techniques, towards dimensionality reduction
Curse of Non-linearity: model decomposition / data association allows the parameters to be estimated directly from the images
Curse of Non-Convexity: regularization terms / dropping constraints can improve the optimality properties of the obtained solution
Curse of Non-Modularity: model/data association/inference decomposition and the use of gradient-free methods