Lecture 01 Artificial Vision Lecture1

1

Discrete Inference & Learning

in Artificial Vision

M. Nikos Paragios & M. Pawan Kumar

Lecture 1: Introduction to artificial vision with discrete graphical models

2

Human Vision

The sensor (the iris plays the role of the diaphragm in a camera; the cornea and the lens are both lens-like objects; the retina, where the image is recorded, plays the role of the CCD sensor)

The processor (information is transferred through the optic nerve to the striate cortex, where massive processing is performed towards complete, real-time visual scene understanding; this involves almost 50% of the human brain)

3

Artificial Vision

The input (static, video, depth, monochromatic, color, high-dynamic-range sensors, etc.)

The processor (powerful computers exploring the input, prior knowledge and models)

The process (expressing task-specific visual understanding problems as mathematical inference problems and solving them approximately through computer simulations)

4

Why is artificial vision so complex?

The input

Large variety of sensors; images/signals of varying quality

The processor

Even the most powerful individual processor does not match the processing capacity of a tiny portion of the human brain

The mathematical inference

We end up solving problems that are ill-defined, ill-posed and non-convex, involving non-linear objective functions with numerous local minima

This is what you see

This is what your computer sees

5

Artificial Vision

10/01/2014 5

6

Artificial Vision Paradigm

Model

Optimization Procedure

Model/Data Association

Observations

7

Left Ventricle Segmentation (risk of heart attack)

(Figure: the computer vision paradigm applied to a clinical problem, linking medical image modalities, a mathematical model and its parameters, model-to-data association and optimization.)

8

Main Challenges

Curse of Dimensionality: find a compromise between the expressive power of the model and its complexity [finding the right model]

Curse of Non-linearity: the relation between the model parameters and the observations is highly non-linear [finding the right relation between measurements and parameters to be estimated]

Curse of Non-Convexity: the designed objective function lives in a high-dimensional non-convex space [finding the right objective function and being able to optimize it]

Curse of Non-Modularity: a solution is hardly portable to another application setting or another problem [do not repeat the process from scratch when moving from one visual task to another]

9

Relationship with other fields

(Figure source: Wikipedia)

10

Artificial Vision

11

Artificial Vision

12

Safety & Security

Intrusion detection

Traffic management / control

Activity Analysis through motion/trajectories/tracking

Facial/Iris/Fingerprint Recognition

Parking/driving assistance / Pedestrian detection

Lane detection / automatic cruise control

13

Computer Aided Diagnosis & Vision

Image Reconstruction

Organ Segmentation & Matching

Multimodal Image Fusion/Registration

Image-based biomarkers

Computer Assisted Surgery Navigation

Population modeling, computational anatomy

Functional understanding of the human brain

14

Media, Consumer Products

Computer Games/Interaction

Special Visual Effects

Large-Scale 3D Modeling

High Resolution/Dynamic Range

Human Computer Interaction

Augmented Reality & new media

15

Discrete Artificial Vision

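Given: parameters (objects) $V$ from a graph $G = (V, E)$, a neighborhood system $E$ and a discrete label set $L$

Assign labels $x_p \in L$ (to objects $p \in V$) that minimize the energy:

$$E(x) = \sum_{p \in V} \theta_p(x_p) + \sum_{(p,q) \in E} \theta_{pq}(x_p, x_q)$$

with a unary potential $\theta_p$ on each object and a pairwise potential $\theta_{pq}$ on each edge

MRF optimization is ubiquitous in vision (and beyond)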
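To make the energy concrete, here is a minimal Python sketch (not the lecture's solver) that evaluates it (unary terms on objects plus pairwise terms on edges) for a given labeling and finds the minimizer of a toy three-node chain by exhaustive search. The node names, potential values and the Potts-style pairwise term are made-up illustrations. Exhaustive search costs $|L|^{|V|}$ energy evaluations, which is why practical MRF optimization relies on dedicated algorithms such as graph cuts or message passing.

```python
import itertools
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Node = Hashable
Labeling = Dict[Node, int]
Unary = Dict[Node, Dict[int, float]]


def mrf_energy(labeling: Labeling, unary: Unary,
               pairwise: Callable[[int, int], float],
               edges: Sequence[Tuple[Node, Node]]) -> float:
    """Unary terms over objects plus pairwise terms over edges."""
    energy = sum(unary[p][labeling[p]] for p in unary)
    energy += sum(pairwise(labeling[p], labeling[q]) for p, q in edges)
    return energy


def brute_force_minimize(nodes: List[Node], labels: List[int], unary: Unary,
                         pairwise: Callable[[int, int], float],
                         edges: Sequence[Tuple[Node, Node]]) -> Tuple[Labeling, float]:
    """Try all |L|^|V| labelings; only feasible for toy problems."""
    best: Labeling = {}
    best_energy = float("inf")
    for assignment in itertools.product(labels, repeat=len(nodes)):
        labeling = dict(zip(nodes, assignment))
        energy = mrf_energy(labeling, unary, pairwise, edges)
        if energy < best_energy:
            best, best_energy = labeling, energy
    return best, best_energy


def potts(lp: int, lq: int) -> float:
    """Hypothetical pairwise smoothness: constant penalty when neighbours disagree."""
    return 0.0 if lp == lq else 0.6


# Toy chain a - b - c with binary labels.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
unary = {"a": {0: 0.0, 1: 2.0}, "b": {0: 0.9, 1: 1.0}, "c": {0: 2.0, 1: 0.0}}
labeling, energy = brute_force_minimize(nodes, [0, 1], unary, potts, edges)
print(labeling, energy)
```

On this toy chain the minimizer is a = 0, b = 0, c = 1 with energy 1.5: paying the 0.6 smoothness penalty once is cheaper than forcing either end of the chain onto its expensive label.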

16

Optimization of high-order models

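Hypergraph $G = (V, C)$: parameters $V$ and hyperedges/cliques $C$

High-order energy minimization problem:

$$E(x) = \sum_{p \in V} \theta_p(x_p) + \sum_{c \in C} \theta_c(x_c)$$

with one unary potential $\theta_p$ per node and one high-order potential $\theta_c$ per clique, where $x_c$ denotes the labels of the nodes in clique $c$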
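The only change in the high-order case is that a potential may depend on the labels of any number of nodes at once. A small sketch under assumed toy values: the clique potential below (zero if all labels in the clique agree, a constant penalty otherwise) is a hypothetical choice for illustration, and the minimum is again found by brute force.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

# A clique is a tuple of node names plus a potential on the tuple of their labels.
Clique = Tuple[Tuple[str, ...], Callable[[Tuple[int, ...]], float]]


def high_order_energy(labeling: Dict[str, int],
                      unary: Dict[str, Dict[int, float]],
                      cliques: Sequence[Clique]) -> float:
    """Sum of unary terms plus one high-order term per clique, theta_c(x_c)."""
    energy = sum(unary[p][labeling[p]] for p in unary)
    for members, potential in cliques:
        energy += potential(tuple(labeling[p] for p in members))
    return energy


def agreement_clique(labels: Tuple[int, ...], penalty: float = 2.0) -> float:
    """Hypothetical clique potential: free if all labels agree, constant penalty otherwise."""
    return 0.0 if len(set(labels)) == 1 else penalty


nodes = ["a", "b", "c"]
unary = {"a": {0: 0.0, 1: 1.0}, "b": {0: 0.5, 1: 0.0}, "c": {0: 1.0, 1: 0.0}}
cliques: List[Clique] = [(("a", "b", "c"), agreement_clique)]  # one third-order clique

best = min((dict(zip(nodes, xs)) for xs in itertools.product([0, 1], repeat=3)),
           key=lambda lab: high_order_energy(lab, unary, cliques))
print(best, high_order_energy(best, unary, cliques))
```

Here the single third-order term couples all three nodes, so the best labeling (all ones, energy 1.0) cannot be read off any single edge.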

17

Low Level Vision

Blind Image Deconvolution / Image Inpainting

Variables: Pixels

Labels: Intensity Values

Graph Connectivity: Pair-wise

18

Blind Image Deconvolution

19

Blurred image generation process (figure: blurred image = sharp image convolved with a blur kernel)

blur kernel = camera motion

23

Blind Image Deconvolution

$$I = k \otimes x + n$$

where $I$ is the observed blurred image, $x$ the latent sharp image, $k$ the blur kernel and $n$ the noise.

Goal: given just $I$, compute both $x$ and $k$
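The forward (generation) process, in which the blurred image is the blur kernel convolved with the sharp image plus noise, can be simulated directly. The sketch below assumes grayscale images stored as nested Python lists, "same"-size convolution with zero padding at the borders and i.i.d. Gaussian noise; the function names are illustrative, not from the lecture. Blind deconvolution has to invert this process without knowing the kernel.

```python
import random
from typing import List

Image = List[List[float]]


def convolve2d(x: Image, k: Image) -> Image:
    """'Same'-size 2D convolution k (*) x with zero padding outside the image."""
    height, width = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    cy, cx = kh // 2, kw // 2  # kernel centre
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    # true convolution: the kernel is flipped relative to the image
                    ii, jj = i - (u - cy), j - (v - cx)
                    if 0 <= ii < height and 0 <= jj < width:
                        acc += k[u][v] * x[ii][jj]
            out[i][j] = acc
    return out


def blur(x: Image, k: Image, noise_std: float = 0.0, seed: int = 0) -> Image:
    """Forward model I = k (*) x + n with i.i.d. Gaussian noise n."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_std) for v in row] for row in convolve2d(x, k)]


# A single bright point blurred by a horizontal 3-pixel motion kernel.
x = [[0.0] * 5 for _ in range(5)]
x[2][2] = 1.0
k = [[1 / 3, 1 / 3, 1 / 3]]
observed = blur(x, k)
print([round(v, 3) for v in observed[2]])  # the point is smeared along the row
```

The kernel of a blurred point source is directly visible in the output, which illustrates the slide's remark that the blur kernel encodes the camera motion.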

24

High-level idea: how to reduce ill-posedness?

$\tilde{x}$ = quantized version of image $x$ with just 15 colors (piecewise constant)

Yet both $x$ and $\tilde{x}$ produce almost the same blurry image

IDEA: compute $\tilde{x}$, which has a much simpler structure
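A quick numerical check of this idea, in one dimension and under stated assumptions: uniform quantization to 15 gray levels stands in for the 15-color quantization of the slide, and the blur kernel is non-negative and sums to one. Because such a kernel averages its input, the difference between the two blurred signals can never exceed the quantization error itself.

```python
import math
from typing import List


def quantize(x: List[float], levels: int = 15) -> List[float]:
    """Snap each value in [0, 1] to the nearest of `levels` evenly spaced gray levels."""
    step = 1.0 / (levels - 1)
    return [round(v / step) * step for v in x]


def blur1d(x: List[float], k: List[float]) -> List[float]:
    """'Same'-size 1D convolution with zero padding (kernel centred)."""
    c, n = len(k) // 2, len(x)
    return [sum(k[u] * x[i - (u - c)] for u in range(len(k)) if 0 <= i - (u - c) < n)
            for i in range(n)]


x = [(math.sin(i / 3.0) + 1.0) / 2.0 for i in range(60)]  # smooth "sharp" signal
x_tilde = quantize(x)                                      # piecewise constant
k = [0.1, 0.2, 0.4, 0.2, 0.1]                              # non-negative, sums to 1

quant_err = max(abs(a - b) for a, b in zip(x, x_tilde))
blur_err = max(abs(a - b) for a, b in zip(blur1d(x, k), blur1d(x_tilde, k)))
print(len(set(x_tilde)), quant_err, blur_err)
```

The quantization error is at most half a gray-level step (1/28 here), and the blurred signals differ by no more than that.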

25

MRF-based Blind Image deconvolution

33

The Image Completion Problem

Based only on the observed part of an incomplete image, fill its missing part in a visually plausible way

We want to be able to handle complex natural images with (possibly) large missing regions, automatically (i.e. without user intervention)

Many applications: photo editing, film post-production, object removal, text removal, image repairing etc.

34

The Image Completion Problem

We would also like our method to be able to handle the related problem of texture synthesis

In texture synthesis, we are given as input a small texture and we want to generate a larger texture of arbitrary size (specified by the user)

35

Exemplar-based approaches

Key idea: fill the missing region by copying exemplars, i.e. pixels (or patches), from the observed image part

Disadvantages:

Successful only if the missing region consists of a single texture (e.g. texture synthesis)

Greedy approach: the image is filled one patch at a time

36

Image Completion as a Discrete Global Optimization Problem

Labels $L$ = all $w \times h$ patches from the source region $S$

MRF nodes = all lattice points whose neighborhood intersects the target region $T$

Unary potential $V_p(x_p)$ = how well source patch $x_p$ agrees with the source region around $p$

Pairwise potential $V_{pq}(x_p, x_q)$ = how well source patches $x_p$, $x_q$ agree on their overlapping region

(Figure: source region $S$, target region $T$ and sample labels.)
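Both potentials can be written as sums of squared differences (SSD). This is a sketch under assumptions not stated on the slide: grayscale images as nested lists, SSD as the agreement measure, each node placed at the center of its patch, and hypothetical helper names. Pixels of the target region $T$ are excluded from the unary term because their values are unknown.

```python
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]


def extract_patch(img: Image, top: int, left: int, h: int, w: int) -> Image:
    """h x w patch whose top-left corner is (top, left): a candidate label from S."""
    return [row[left:left + w] for row in img[top:top + h]]


def unary_potential(img: Image, known: Mask, p: Tuple[int, int], patch: Image) -> float:
    """SSD between a source patch centred at node p and the observed pixels around p.
    Pixels in the target region (known == False) or outside the image are ignored."""
    h, w = len(patch), len(patch[0])
    top, left = p[0] - h // 2, p[1] - w // 2
    cost = 0.0
    for i in range(h):
        for j in range(w):
            y, x = top + i, left + j
            if 0 <= y < len(img) and 0 <= x < len(img[0]) and known[y][x]:
                cost += (patch[i][j] - img[y][x]) ** 2
    return cost


def pairwise_potential(patch_p: Image, patch_q: Image, dy: int, dx: int) -> float:
    """SSD of two patches on their overlap when node q sits at p + (dy, dx)."""
    h, w = len(patch_p), len(patch_p[0])
    cost = 0.0
    for i in range(h):
        for j in range(w):
            qi, qj = i - dy, j - dx  # the same image pixel, in patch_q coordinates
            if 0 <= qi < h and 0 <= qj < w:
                cost += (patch_p[i][j] - patch_q[qi][qj]) ** 2
    return cost


img = [[float(10 * r + c) for c in range(8)] for r in range(5)]
known = [[True] * 8 for _ in range(5)]
known[2][4] = False                  # one missing pixel (target region T)
a = extract_patch(img, 0, 0, 3, 3)
b = extract_patch(img, 0, 2, 3, 3)   # consistent neighbour two columns to the right
c = extract_patch(img, 0, 3, 3, 3)   # inconsistent neighbour
print(pairwise_potential(a, b, 0, 2), pairwise_potential(a, c, 0, 2))
```

Patches copied from neighboring positions of the source agree perfectly on their overlap (cost 0), whereas a misplaced patch pays a positive cost.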

42

Mid-Level Vision

Object Segmentation / Optical Flow Estimation / Deformable Fusion / Graph Matching

Variables: control points

Labels: 2d/3d displacements

Graph Connectivity: Pair-wise / higher order

43

Pose-Invariant Segmentation of the Heart

Challenges
- Human variability
- Complex background
- Low contrast
- Noise

Goal
- Automatic
- Robust
- Pose-invariant!

Fig. Manual segmentation on 3D CT images

Fig. Human variability

B. Xiang et al., 3D Cardiac Segmentation with Pose-invariant Higher-order MRFs, ISBI 2012

44

Shape representation

Point distribution model: $X = \{x_1, \ldots, x_n\}$

Triangulated mesh: third-order cliques $T = \{(x_i, x_j, x_k)\}$

45

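Statistical shape prior

Local constraints on each triangle $(x_i, x_j, x_k)$ are combined into a global shape prior:

$$P(X) = \frac{1}{Z} \prod_{c \in C} P_c(x_i, x_j, x_k)$$

where each clique $c \in C$ is a triangle $(x_i, x_j, x_k)$ and $Z$ is the normalization constant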

46

Statistical shape prior

Local constraints → global shape: $P(X) = \frac{1}{Z} \prod_{c \in C} P_c(x_i, x_j, x_k)$, with one triangle $(x_i, x_j, x_k)$ per clique $c$

Pose-invariant (i.e. invariant to translation, rotation and scale)!
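The slide does not specify which pose-invariant quantities the clique potentials of the shape prior are defined on. As one hypothetical choice, two interior angles of each triangle are unchanged by translation, rotation and uniform scaling, and a Gaussian on them gives a clique cost. The sketch checks that invariance numerically; the Gaussian model, its parameters and all function names are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def angle_at(a: Point, b: Point, c: Point) -> float:
    """Interior angle of triangle (a, b, c) at vertex a, in radians."""
    u, v = _sub(b, a), _sub(c, a)
    cos = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against round-off


def triangle_features(xi: Point, xj: Point, xk: Point) -> Tuple[float, float]:
    """Two interior angles: unchanged by translation, rotation and uniform scaling."""
    return angle_at(xi, xj, xk), angle_at(xj, xk, xi)


def clique_cost(xi: Point, xj: Point, xk: Point,
                mean: Sequence[float], std: Sequence[float]) -> float:
    """-log of an unnormalised Gaussian P_c on the two angles (hypothetical model)."""
    feats = triangle_features(xi, xj, xk)
    return sum(0.5 * ((f - m) / s) ** 2 for f, m, s in zip(feats, mean, std))


def similarity(p: Point, scale: float, theta: float, t: Point) -> Point:
    """Scale, rotate about the z-axis by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (scale * (c * x - s * y) + t[0], scale * (s * x + c * y) + t[1], scale * z + t[2])


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
moved = tuple(similarity(p, 2.5, math.radians(30), (3.0, -1.0, 2.0)) for p in tri)
print(triangle_features(*tri))    # right angle at the first vertex, 45 degrees at the second
print(triangle_features(*moved))  # the same angles, up to round-off
```

Since the features do not change under a similarity transform, neither does the clique cost, which is what makes such a prior usable without first estimating the pose.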

47

Qualitative Results

Fig. Segmentation results of 3D CT volumes

Accurate boundaries even in low-contrast images

48

Dense Image Registration using MRFs

49

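Basic Idea of Intensity-based Registration

Image registration as an optimization problem

Target and source image: $g, f : \Omega \to \mathbb{R}$

Transformation: $T(x) = x + D(x)$, with displacement field $D$

Image metric: $\rho$, giving the registration problem

$$\hat{T} = \arg\min_{T} \int_{\Omega} \rho\big(g(x), f(T(x))\big)\, dx$$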
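The optimization view can be illustrated with the simplest possible transformation. The sketch assumes 1D signals, a pure integer translation $T(x) = x + s$ instead of a dense deformation, and the mean squared difference over the overlapping domain as the image metric; it simply tries every candidate shift and keeps the best one.

```python
from typing import List


def mean_sq_diff(target: List[float], source: List[float], shift: int) -> float:
    """Image metric: mean of (g(x) - f(x + shift))^2 over the overlapping domain."""
    pairs = [(target[x], source[x + shift])
             for x in range(len(target)) if 0 <= x + shift < len(source)]
    return sum((g - f) ** 2 for g, f in pairs) / len(pairs)


def register_translation(target: List[float], source: List[float], max_shift: int) -> int:
    """Exhaustive search over integer translations T(x) = x + s with |s| <= max_shift."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: mean_sq_diff(target, source, s))


source = [float((7 * i) % 11) for i in range(40)]  # f: a textured 1D "image"
target = source[3:]                                 # g(x) = f(x + 3)
print(register_translation(target, source, max_shift=5))
```

The metric is averaged rather than summed so that shifts with a smaller overlap are not artificially favored. A dense, non-rigid transformation has far too many parameters for exhaustive search, which motivates the dimensionality reduction and discrete formulation that follow.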

50

Dimensionality Reduction

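Linear combination of control points: the dense displacement field is interpolated from the control-point displacements $d_p$, $D(x) = \sum_{p} \eta(\|x - p\|)\, d_p$, with basis function $\eta$

e.g. Free-Form Deformations (Sederberg et al. 1986; Rueckert et al. 1999)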
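A minimal 1D sketch of this parameterization. It assumes a linear (hat) basis function for brevity, whereas free-form deformations typically use cubic B-splines. The displacement at any point is a weighted sum of the displacements of the neighboring control points, so a handful of control-point displacements determines the whole dense field.

```python
from typing import List


def hat(t: float) -> float:
    """Linear B-spline (hat) basis: 1 at t = 0, falling to 0 at |t| = 1."""
    return max(0.0, 1.0 - abs(t))


def displacement(x: float, control_points: List[float],
                 control_disp: List[float], spacing: float) -> float:
    """D(x) = sum_p eta(|x - p| / spacing) * d_p: a linear combination of control displacements."""
    return sum(hat((x - p) / spacing) * d for p, d in zip(control_points, control_disp))


grid = [0.0, 10.0, 20.0]  # control lattice with spacing 10
d = [0.0, 4.0, -2.0]      # displacements attached to the control points
dense = [displacement(x, grid, d, 10.0) for x in range(21)]
print(dense[10], dense[5], dense[15])
```

Three numbers describe 21 dense displacements here; in 2D or 3D the reduction from pixels to control points is much larger.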

51

(Weighted) Block Matching

Redefinition of the data term w.r.t. the control lattice

- Pixel-wise image metrics weighted by normalized basis functions: image points closer to a control point gain more influence on its matching energy

- Statistical image metrics (e.g. mutual information, cross correlation): evaluation of the image metric in local patches centered at the control points; the block size depends on the control lattice resolution
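The weighted data term for pixel-wise metrics can be sketched as follows, assuming 1D signals, a hat basis, integer displacements and squared differences as the pixel-wise metric (all illustrative choices). Each pixel contributes to a control point's matching cost in proportion to that control point's normalized basis weight, so pixels outside its support do not count at all.

```python
from typing import List


def hat(t: float) -> float:
    """Linear (hat) basis function with support |t| < 1."""
    return max(0.0, 1.0 - abs(t))


def weighted_matching_cost(k: int, d: int, target: List[float], source: List[float],
                           control_points: List[float], spacing: float) -> float:
    """Data term of control point k displaced by d: pixel-wise squared differences
    weighted by k's basis function, normalised over all control points."""
    cost = 0.0
    for x in range(len(target)):
        weights = [hat((x - p) / spacing) for p in control_points]
        if weights[k] == 0.0 or not 0 <= x + d < len(source):
            continue  # pixel outside k's support, or warped outside the source image
        cost += (weights[k] / sum(weights)) * (target[x] - source[x + d]) ** 2
    return cost


source = [float((7 * i) % 11) for i in range(40)]
target = source[2:]                 # g(x) = f(x + 2)
grid = [0.0, 10.0, 20.0, 30.0]      # control points, spacing 10
costs = {d: weighted_matching_cost(1, d, target, source, grid, 10.0) for d in range(-3, 4)}
print(min(costs, key=costs.get))    # best local displacement for control point 1
```

Evaluating this cost for every candidate displacement of every control point yields exactly the table of unary potentials needed by a discrete labeling formulation.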

52

Discrete Labeling Problem

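Markov Random Field formulation with pairwise interactions:

$$E(l) = \sum_{p} V_p(l_p) + \sum_{(p,q) \in E} V_{pq}(l_p, l_q)$$

where label $l_p$ selects a displacement for control point $p$

Unary potentials (matching): $V_p(l_p)$, the matching cost of control point $p$ under its displacement

Pairwise potentials (smoothness): $V_{pq}(l_p, l_q)$, penalizing different displacements of neighboring control points

(Figure: 3×3 control lattice; nodes p, q, r, s, t, u, v, w, x; edges between neighboring nodes.)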
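On the 2D lattice of the slide this MRF requires a general discrete optimizer; restricted to a 1D chain of control points, the same kind of energy can be minimized exactly by dynamic programming (the Viterbi recursion), which makes for a compact sketch. Everything here is an assumption for illustration: integer displacement labels, a block SSD as the unary (matching) term and $\lambda |d_p - d_q|$ between consecutive control points as the pairwise (smoothness) term.

```python
from typing import List


def block_ssd(f: List[float], g: List[float], p: int, d: int, half: int) -> float:
    """Unary (matching) cost: SSD between g(x) and f(x + d) for x in a block around p."""
    cost = 0.0
    for x in range(p - half, p + half + 1):
        if 0 <= x < len(g) and 0 <= x + d < len(f):
            cost += (g[x] - f[x + d]) ** 2
    return cost


def register_chain(f: List[float], g: List[float], points: List[int],
                   labels: List[int], half: int, lam: float) -> List[int]:
    """Exact minimiser of sum_p unary + sum over consecutive (p, q) of lam * |d_p - d_q|,
    computed by dynamic programming along the chain."""
    m = len(labels)
    unary = [[block_ssd(f, g, p, d, half) for d in labels] for p in points]
    cost = unary[0][:]  # best energy of the chain prefix ending in each label
    back = []           # back[i - 1][b]: best label index of node i - 1 given label b at node i
    for i in range(1, len(points)):
        new_cost, arg = [], []
        for b, db in enumerate(labels):
            a = min(range(m), key=lambda j: cost[j] + lam * abs(labels[j] - db))
            new_cost.append(cost[a] + lam * abs(labels[a] - db) + unary[i][b])
            arg.append(a)
        cost = new_cost
        back.append(arg)
    k = min(range(m), key=lambda j: cost[j])
    path = [k]
    for arg in reversed(back):  # backtrack from the last control point
        k = arg[k]
        path.append(k)
    return [labels[j] for j in reversed(path)]


f = [float((7 * i) % 11) for i in range(40)]  # source signal
g = [f[x + 2] for x in range(38)]             # target: g(x) = f(x + 2)
points = [3, 9, 15, 21, 27, 33]               # control points
print(register_chain(f, g, points, labels=list(range(-3, 4)), half=3, lam=1.0))
```

For a signal shifted by two samples, every control point receives displacement 2, the only labeling with zero energy.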

53

Some Results

Metric                     Affine 12-DOF   Rueckert   Ours
Average Surface Distance   1.66 mm         1.14 mm    1.00 mm
Average Surface Distance   1.92 mm         1.31 mm    1.06 mm

54

Experimental Validation Data Set [CMA GMH Harvard]

55

Experimental Validation Qualitative Results

56

Experimental Validation Qualitative Results

57

HIGHER-ORDER NON-RIGID 3D SURFACE MATCHING

58

High-order Graph Matching

(Figure: Graph 1 and Graph 2, with points a, b and c highlighted.)

62

Experimental Results

63


64

High-level Vision

Viewpoint-invariant 2.5D/3D inference / Large-scale parsing with shape grammars

Variables: control points

Labels: 2d/3d displacements

Graph Connectivity: Pair-wise / higher order

65

Goals

Segmentation, tracking, depth ordering (2.5D)

(Figure: occlusion relationships, from the 2D image plane to the 3D scene.)

66

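An illustrative example:

Joint 2.5D Layered Modeling

Object-level representation: each object $i$ has a 2D parametric shape model $\Theta_i$ and a relative depth $d_i$; in the example, $d_1 = 0$, $d_2 = 1$, $d_0 = 2$

Relative depth: an object $i$ can occlude object $j$ only if $d_i < d_j$; the background has the biggest depth

Pixel-level representation: each pixel $x$ has a label $l_x$ (the index of the associated layer) and a depth $z_x$ (the depth of the associated layer), e.g. $l_x = 1$, $z_x = 0$ for a pixel of object 1

(Figure: Object 0 (background), Object 1 and Object 2.)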
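One simple way to relate the two representations, consistent with the occlusion rule above, is a z-buffer: each pixel takes the label and depth of the shallowest object that covers it. This is a sketch of that rule only, not of the lecture's joint MRF inference; object shapes are given here as hypothetical pixel sets rather than parametric models.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def compose_layers(shapes: Dict[int, Set[Pixel]], depths: Dict[int, float],
                   background: int, height: int, width: int
                   ) -> Tuple[List[List[int]], List[List[float]]]:
    """Pixel-level (label, depth) maps from object-level shapes and relative depths.
    Each pixel shows the object with the smallest depth among those covering it;
    the background covers every pixel and has the largest depth."""
    label = [[background] * width for _ in range(height)]
    depth = [[depths[background]] * width for _ in range(height)]
    for obj, pixels in shapes.items():
        for r, c in pixels:
            if depths[obj] < depth[r][c]:  # obj occludes whatever was visible here
                label[r][c], depth[r][c] = obj, depths[obj]
    return label, depth


depths = {1: 0, 2: 1, 0: 2}  # d_1 = 0, d_2 = 1, d_0 = 2 (background)
shapes = {1: {(0, 0), (0, 1)}, 2: {(0, 1), (1, 1)}}
label, depth = compose_layers(shapes, depths, background=0, height=2, width=2)
print(label)  # object 1 wins the overlapping pixel (0, 1)
print(depth)
```

Because the comparison is strict and uses depth only, the result does not depend on the order in which objects are visited.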

67

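What do we infer?

Joint 2.5D Layered Modeling

Object-level representation (shape $\Theta_i$, depth $d_i$): Object 0 (background): $\Theta_0$ = const, $d_0 = ?$; Object 2: $\Theta_2 = ?$, $d_2 = ?$; Object 1: $\Theta_1 = ?$, $d_1 = ?$

Pixel-level representation (label $l_x$, depth $z_x$): $l_x = ?$, $z_x = ?$

What is the connection between the two representations?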

68

Markov Random Field Formulation (1): an example for two tracked objects

Object nodes: $Y_i = (\Theta_i, d_i)$ (shape parameters, depth)

Pixel nodes: $X_x = (l_x, z_x)$ (label, depth)

71

Experimental Results (1)

Video from: Huang & Essa. Tracking Multiple Objects through Occlusions. CVPR05

72

Goals

(Figure: from the 2D image plane to the 3D scene.)

- Segmentation, tracking, depth ordering (2.5D)
- 3D landmark model inference from 2D images (2D-3D)
- Shape matching, statistical shape modeling, knowledge-based 3D segmentation

73

Illustration of projection prior: $P\big(X^{(2)} \mid X^{(3)}\big) \propto \exp(-E)$, relating the 2D landmarks $X^{(2)}$ to the 3D model $X^{(3)}$

Error function $E$ defined on quadruplets of points

74

Qualitative Results for 3D Model Inference

Setting

Dataset: 101 samples from BU-4DFE [Yin et al. 2008]

Leave-one-out cross validation

Blue: ground truth; Red: result


75

Image-based Modeling of Architecture using Shape Grammars

76

Procedural Modeling of Facades

Start from an axiom (the image)

Sequentially apply replacement rules

The derivation tree keeps track of the building structure
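A derivation can be sketched as recursive rule application on rectangles. The grammar below is hypothetical (a facade split into three floors, each floor into walls and windows with fixed relative widths), as are all names; in the lecture's setting the topology and geometry of the derivation are what must be inferred from the image, whereas here they are fixed by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

# Hypothetical split rules: symbol -> (split axis, [(child symbol, relative size), ...]).
# Symbols without a rule are terminals.
RULES: Dict[str, Tuple[str, List[Tuple[str, float]]]] = {
    "Facade": ("y", [("Floor", 1.0)] * 3),
    "Floor": ("x", [("Wall", 1.0), ("Window", 2.0), ("Wall", 1.0),
                    ("Window", 2.0), ("Wall", 1.0)]),
}


@dataclass
class Node:
    """A node of the derivation tree: a grammar symbol attached to an image region."""
    symbol: str
    box: Box
    children: List["Node"] = field(default_factory=list)


def split(box: Box, axis: str, sizes: List[float]) -> List[Box]:
    """Cut a box along one axis into parts proportional to `sizes`."""
    x, y, w, h = box
    total, offset, parts = sum(sizes), 0.0, []
    for s in sizes:
        if axis == "y":
            length = h * s / total
            parts.append((x, y + offset, w, length))
        else:
            length = w * s / total
            parts.append((x + offset, y, length, h))
        offset += length
    return parts


def derive(symbol: str, box: Box) -> Node:
    """Apply replacement rules recursively, recording the derivation tree."""
    node = Node(symbol, box)
    if symbol in RULES:
        axis, rhs = RULES[symbol]
        for (child, _), child_box in zip(rhs, split(box, axis, [s for _, s in rhs])):
            node.children.append(derive(child, child_box))
    return node


def terminals(node: Node) -> List[Node]:
    """Leaves of the derivation tree, i.e. the final facade elements."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in terminals(child)]


tree = derive("Facade", (0.0, 0.0, 14.0, 9.0))  # axiom: the whole image region
windows = [leaf for leaf in terminals(tree) if leaf.symbol == "Window"]
print(len(windows), windows[0].box)
```

The tree records which rule produced each element, so structural queries (e.g. all windows of a given floor) are simple tree traversals.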

80

Procedural Modeling of Facades

Start from an axiom (the image)

To be optimized: topology & geometry

81

What about real buildings?

82

A small district

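84

Segmentation energy

Single pixel $x$: $E_c(x) = -\log p(c \mid f_x)$

Single region $R$: $E_c(R) = \sum_{x \in R} E_c(x) = -\sum_{x \in R} \log p(c \mid f_x)$

where $p(c \mid f_x)$ is the probability of class $c$ given the image features $f_x$ at pixel $x$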

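85

Segmentation energy

Single pixel $x$: $E_c(x) = -\log p(c \mid f_x)$

Single region $R$: $E_c(R) = -\sum_{x \in R} \log p(c \mid f_x)$

Segmentation into regions $R_i$ with classes $c_i$ ($f_x$: image features at pixel $x$):

$$E = \sum_i E_{c_i}(R_i) = -\sum_i \sum_{x \in R_i} \log p(c_i \mid f_x)$$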
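The energy can be evaluated directly once per-pixel class probabilities $p(c \mid f_x)$ are available. The probabilities below are made-up values standing in for the output of a pixel classifier, and the class names are hypothetical; a lower energy means the region labels agree better with the pixel evidence.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Probs = Dict[Pixel, Dict[str, float]]


def pixel_energy(probs: Probs, x: Pixel, c: str) -> float:
    """Single pixel x: -log p(c | f_x)."""
    return -math.log(probs[x][c])


def region_energy(probs: Probs, region: List[Pixel], c: str) -> float:
    """Single region R with class c: sum of pixel energies over x in R."""
    return sum(pixel_energy(probs, x, c) for x in region)


def segmentation_energy(probs: Probs, segmentation: List[Tuple[List[Pixel], str]]) -> float:
    """Segmentation {(R_i, c_i)}: sum of region energies."""
    return sum(region_energy(probs, region, c) for region, c in segmentation)


# Made-up classifier outputs p(c | f_x) for three pixels.
probs: Probs = {
    (0, 0): {"wall": 0.5, "window": 0.5},
    (0, 1): {"wall": 0.25, "window": 0.75},
    (1, 0): {"wall": 0.2, "window": 0.8},
}
seg_a = [([(0, 0), (0, 1)], "wall"), ([(1, 0)], "window")]
seg_b = [([(0, 0)], "wall"), ([(0, 1), (1, 0)], "window")]
e_a = segmentation_energy(probs, seg_a)
e_b = segmentation_energy(probs, seg_b)
print(e_a, e_b)  # seg_b has the lower energy: pixel (0, 1) looks more like a window
```

Summing negative log-probabilities is equivalent to multiplying the probabilities, so minimizing this energy selects the most probable region labeling under an independent-pixel assumption.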

86

Qualitative Results

Architecturally consistent; robust to illumination conditions, hard cast shadows and reflections

87

Qualitative Results

88

Multi-view parsing using a genetic algorithm

(Figure: single rectified image vs. calibrated image sequence.)

89

Qualitative Results

92

Conclusions

Discrete graphical models are a promising answer to artificial vision

Curse of Dimensionality: prior knowledge, either through anatomy or through machine learning techniques, towards dimensionality reduction

Curse of Non-linearity: model decomposition and data association allow direct estimation of the parameters from the images

Curse of Non-Convexity: regularization terms and dropping constraints can improve the optimality properties of the obtained solution

Curse of Non-Modularity: decomposition into model, data association and inference, and the use of gradient-free methods
