Discrete Inference & Learning in Artificial Vision. M. Nikos Paragios & M. Pawan Kumar. Lecture 1: Introduction to artificial vision with discrete graphical models

Sep 13, 2015

Transcript
  • 1

    Discrete Inference & Learning in Artificial Vision

    M. Nikos Paragios & M. Pawan Kumar

    Lecture 1 Introduction to artificial vision with discrete graphical models

  • 2

    Human Vision

    The sensor: the iris plays the role of the diaphragm in a camera, the cornea and the lens are both lens-like objects, and the retina is where the image is recorded (the counterpart of a CCD sensor)

    The processor: information is transferred through the optic nerve to the striate cortex, the part of the brain where massive processing is performed towards complete real-time visual scene understanding (almost 50% of the human brain)

  • 3

    Artificial Vision

    The input (static, video, depth, monochromatic, color, high dynamic range, etc. sensors)

    The processor (powerful computers exploiting input, prior knowledge and models)

    The process (expressing task-specific visual understanding tasks as mathematical inference problems and solving them approximately through computer simulations)

  • 4

    Why is artificial vision so complex?

    The input

    Large variety of sensors; images/signals of varying quality

    The processor

    Even the most powerful individual processor does not match a tiny portion of the human brain's processing capacity

    The mathematical inference

    We end up solving problems that are ill-defined, ill-posed and non-convex, involving non-linear objective functions with numerous local minima

    This is what you see

    This is what your computer sees

  • 5

    Artificial Vision

    10/01/2014

  • 6

    Artificial Vision Paradigm

    Model

    Optimization Procedure

    Model/Data Association

    Observations

  • 7

    Computer Vision Paradigm

    Clinical Problem: Left Ventricle Segmentation (risk of heart attack)

    Medical Image Modalities

    Mathematical Model (Parameters)

    Model-to-Data Association

    Optimization

  • 8

    Main Challenges

    Curse of Dimensionality: find a compromise between the expressive power of the model and its complexity [finding the right model]

    Curse of Non-linearity: the association between the model parameters and the observations is highly non-linear [finding the right relation between the measurements and the parameters to be estimated]

    Curse of Non-Convexity: the designed objective function lives in a high-dimensional non-convex space [finding the right objective function and being able to optimize it]

    Curse of Non-Modularity: a solution is hardly portable to another application setting or another problem [avoid repeating the process from scratch when moving from one visual task to another]

  • 9

    Relationship with other fields

    @ Wikipedia

  • 10

    Artificial Vision

  • 11

    Artificial Vision

  • 12

    Safety & Security

    Intrusion detection

    Traffic management / control

    Activity analysis through motion/trajectories/tracking

    Facial/Iris/Fingerprint Recognition

    Parking/driving assistance / Pedestrian detection

    Lane detection / automatic cruise control

  • 13

    Computer Aided Diagnosis & Vision

    Image Reconstruction

    Organ Segmentation & Matching

    Multimodal Image Fusion/Registration

    Image-based biomarkers

    Computer Assisted Surgery Navigation

    Population modeling, computational anatomy

    Functional understanding of the human brain

  • 14

    Media, Consumer Products

    Computer Games/Interaction

    Special Visual Effects

    Large Scale 3d Modeling

    High Resolution/Dynamic Range

    Human Computer Interaction

    Augmented Reality & new media

  • 15

    Discrete Artificial Vision

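    Given: Parameters (objects) $\mathcal{V}$ from a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$

    A neighborhood system $\mathcal{E}$ and a discrete label set $\mathcal{L}$

    Assign labels (to objects) that minimize the energy:

    $$\min_{x} \; \sum_{p \in \mathcal{V}} V_p(x_p) + \sum_{(p,q) \in \mathcal{E}} V_{pq}(x_p, x_q)$$

    The first sum runs over the objects (unary potentials $V_p$), the second over the edges (pairwise potentials $V_{pq}$)

    MRF optimization ubiquitous in vision (and beyond)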
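
    As a minimal sketch of the labeling problem on this slide, the snippet below evaluates a pairwise energy and minimizes it by exhaustive search on a toy chain of three objects. The cost values, the Potts-style smoothness term and all function names are illustrative assumptions, not taken from the lecture; exhaustive search is only feasible for tiny problems.

```python
from itertools import product

def mrf_energy(labels, unary, pairwise, edges):
    """Sum of unary terms over objects plus pairwise terms over edges."""
    energy = sum(unary[p][labels[p]] for p in range(len(labels)))
    energy += sum(pairwise(labels[p], labels[q]) for p, q in edges)
    return energy

def brute_force_minimum(unary, pairwise, edges, num_labels):
    """Exhaustive search over all labelings; feasible only for toy problems."""
    return min(product(range(num_labels), repeat=len(unary)),
               key=lambda x: mrf_energy(x, unary, pairwise, edges))

def potts(a, b, weight=0.5):
    """Hypothetical smoothness term: constant penalty when neighbors disagree."""
    return 0.0 if a == b else weight

unary = [[0.0, 2.0], [1.0, 0.9], [2.0, 0.0]]   # unary[p][label], made-up costs
edges = [(0, 1), (1, 2)]                        # a 3-object chain
best = brute_force_minimum(unary, potts, edges, num_labels=2)
best_energy = mrf_energy(best, unary, potts, edges)
```

    The number of labelings grows as (number of labels)^(number of objects), which is why dedicated MRF optimization methods are needed in practice.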

  • 16

    Optimization of high-order models

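    Hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{C})$

    Parameters (nodes) $\mathcal{V}$

    Hyperedges/cliques $\mathcal{C}$

    High-order energy minimization problem:

    $$\min_{x} \; \sum_{p \in \mathcal{V}} V_p(x_p) + \sum_{c \in \mathcal{C}} V_c(x_c)$$

    The first sum runs over the parameters (one unary potential $V_p$ per node), the second over the hyperedges (one high-order potential $V_c$ per clique); $x_c$ collects the labels of the nodes in clique $c$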
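
    A hedged sketch of how a high-order energy could be evaluated: each clique potential receives the labels of all nodes in its hyperedge. The "all nodes agree" potential below is a hypothetical example chosen for brevity, not the potential used in the lecture.

```python
def high_order_energy(labels, unary, cliques, clique_potential):
    """Unary terms (one per node) plus one high-order term per clique."""
    energy = sum(unary[p][labels[p]] for p in range(len(labels)))
    energy += sum(clique_potential(tuple(labels[p] for p in clique))
                  for clique in cliques)
    return energy

def agreement(clique_labels, penalty=1.0):
    """Hypothetical clique term: free if all nodes agree, constant penalty otherwise."""
    return 0.0 if len(set(clique_labels)) == 1 else penalty

unary = [[0.0, 0.0] for _ in range(4)]   # 4 nodes, 2 labels, no unary preference
cliques = [(0, 1, 2), (1, 2, 3)]         # two third-order hyperedges
energy_split = high_order_energy([0, 0, 1, 1], unary, cliques, agreement)
energy_mostly_flat = high_order_energy([0, 0, 0, 1], unary, cliques, agreement)
```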

  • 17

    Low Level Vision

    Blind Image Deconvolution / Image Inpainting

    Variables: Pixels

    Labels: Intensity Values

    Graph Connectivity: Pair-wise

  • 18

    Blind Image Deconvolution

  • 19

    Blurred image generation process

  • 20

    Blurred image generation process

  • 21

    Blurred image generation process

  • 22

    Blurred image generation process

    blur kernel = camera motion

  • 23

    Blind Image deconvolution

    $I = x \otimes k + n$

    $I$: observed blurred image

    $x$: latent sharp image

    $k$: blur kernel

    $n$: noise

    Goal: given just $I$, compute both $x$ and $k$
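
    The generation process (blurred image = sharp image convolved with a blur kernel, plus noise) can be sketched on a 1D signal. The box kernel, the zero padding at the borders and the Gaussian noise model are assumptions made for this illustration only.

```python
import random

def convolve_same(x, k):
    """Same-size 1D convolution with zero padding; k must have odd length."""
    r = len(k) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, kj in enumerate(k):
            idx = i + r - j              # convolution flips the kernel
            if 0 <= idx < len(x):
                acc += kj * x[idx]
        out.append(acc)
    return out

def blur(x, k, noise_std=0.0, seed=0):
    """Forward model: convolve the sharp signal with k, then add Gaussian noise."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_std) for v in convolve_same(x, k)]

x = [0, 0, 0, 1, 1, 1, 0, 0, 0]          # sharp signal with two step edges
k = [1 / 3, 1 / 3, 1 / 3]                # box kernel standing in for camera motion
blurred = blur(x, k)                     # noise-free here; set noise_std > 0 to add noise
```

    Blind deconvolution has to invert this forward model without knowing `k`, which is what makes the problem ill-posed.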

  • 24

    High-level idea: how to reduce ill-posedness?

    Consider a quantized version of image x with just 15 colors (piecewise constant)

    Yet both x and its quantized version produce almost the same blurry image

    IDEA: compute the quantized version, which has a much simpler structure
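
    A toy 1D check of this observation, under assumed settings (a synthetic signal, a box blur, 15 gray levels): because blurring averages neighboring samples, the gap between the two blurred signals can be no larger than the gap between the sharp signal and its quantized version, and is typically smaller.

```python
import math

def box_blur(signal, radius):
    """Average each sample with its neighbors (window clipped at the borders)."""
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def quantize(signal, levels):
    """Snap values in [0, 1] to `levels` evenly spaced intensities."""
    step = 1.0 / (levels - 1)
    return [round(v / step) * step for v in signal]

# Synthetic "sharp" signal with values inside [0, 1]
x = [0.5 + 0.45 * math.sin(0.3 * i) + 0.04 * math.sin(2.5 * i) for i in range(200)]
x_quantized = quantize(x, levels=15)

sharp_gap = max(abs(a - b) for a, b in zip(x, x_quantized))
blurred_gap = max(abs(a - b)
                  for a, b in zip(box_blur(x, 4), box_blur(x_quantized, 4)))
```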

  • 25

    MRF-based Blind Image deconvolution

  • 26

    MRF-based Blind Image deconvolution

  • 27

    MRF-based Blind Image deconvolution

  • 28

    MRF-based Blind Image deconvolution

  • 29

    MRF-based Blind Image deconvolution

  • 30

    MRF-based Blind Image deconvolution

  • 31

    MRF-based Blind Image deconvolution

  • 32

    MRF-based Blind Image deconvolution

  • 33

    The Image Completion Problem

    Based only on the observed part of an incomplete image, fill its missing part in a visually plausible way

    We want to be able to handle: complex natural images with (possibly) large missing regions in an automatic way (i.e. without user intervention)

    Many applications: photo editing, film post-production, object removal, text removal, image repairing etc.

  • 34

    The Image Completion Problem

    We would also like our method to be able to handle the related problem of texture synthesis

    In texture synthesis, we are given as input a small texture and we want to generate a larger texture of arbitrary size (specified by the user)

  • 35

    Exemplar-based approaches

    Key idea: fill the missing region by copying exemplars, i.e. pixels (or patches), from the observed image part

    Disadvantages:

    Successful only if the missing region consists of a single texture, e.g. texture synthesis

    Greedy approach: the image is filled one patch at a time

  • 36

    Image Completion as a Discrete Global Optimization Problem

    Labels L = all w×h patches from source region S

    MRF nodes = all lattice points whose neighborhood intersects target region T

    Unary potential $V_p(x_p)$ = how well source patch $x_p$ agrees with the source region around p

    Pairwise potential $V_{pq}(x_p, x_q)$ = how well source patches $x_p$, $x_q$ agree on their overlapping region

    Figure labels: source region S, target region T, sample labels
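
    The two potentials described on this slide can be sketched with sums of squared differences (SSD). Using SSD as the agreement measure, identifying patches and nodes by their top-left corner, and the striped toy image are all assumptions of this sketch.

```python
def unary_potential(image, known, p, src, w, h):
    """SSD between the source patch at `src` and the observed pixels of the window at `p`."""
    (py, px), (sy, sx) = p, src
    cost = 0.0
    for dy in range(h):
        for dx in range(w):
            if known[py + dy][px + dx]:          # only observed pixels contribute
                diff = image[py + dy][px + dx] - image[sy + dy][sx + dx]
                cost += diff ** 2
    return cost

def pairwise_potential(image, p, q, src_p, src_q, w, h):
    """SSD between two source patches on the overlap of their windows at p and q."""
    (py, px), (qy, qx) = p, q
    cost = 0.0
    for y in range(max(py, qy), min(py, qy) + h):
        for x in range(max(px, qx), min(px, qx) + w):
            a = image[src_p[0] + y - py][src_p[1] + x - px]
            b = image[src_q[0] + y - qy][src_q[1] + x - qx]
            cost += (a - b) ** 2
    return cost

# Toy image: vertical stripes observed in columns 0..3, columns 4..7 missing
image = [[x % 2 if x < 4 else 0 for x in range(8)] for _ in range(4)]
known = [[x < 4 for x in range(8)] for _ in range(4)]

good_unary = unary_potential(image, known, (0, 3), (0, 1), 2, 2)
bad_unary = unary_potential(image, known, (0, 3), (0, 0), 2, 2)
good_pair = pairwise_potential(image, (0, 3), (0, 4), (0, 1), (0, 0), 2, 2)
bad_pair = pairwise_potential(image, (0, 3), (0, 4), (0, 1), (0, 1), 2, 2)
```

    Patches that continue the stripe pattern get zero cost in both terms, while patches that break it are penalized.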

  • 37

  • 38

  • 39

  • 40

  • 41

  • 42

    Mid-Level Vision

    Object Segmentation / Optical Flow Estimation / Deformable Fusion / Graph Matching

    Variables: control points

    Labels: 2d/3d

    Graph Connectivity: Pair-wise / higher order

  • 43

    Pose Invariant Segmentation of the Heart

    Challenges

    - Human variability

    - Complex background

    - Low contrast

    - Noise

    Goal

    - Automatic

    - Robust

    - Pose-invariant !

    Fig. Manual segmentation on 3D CT images

    Fig. Human variability

    B. Xiang et al., 3D Cardiac Segmentation with Pose-invariant Higher-order MRFs, ISBI 2012

  • 44

    Shape representation

    Point distribution model

    $X = \{x_1, \ldots, x_n\}$

    Y X

    $T = \{(x_i, x_j, x_k)\}$

    S T

    Third-order cliques

    Triangulated mesh

  • 45

    Statistical shape prior

    Local constraints

    Global shape

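    Clique nodes: $x_i$, $x_j$, $x_k$

    $P(X) = \frac{1}{Z} \prod_{c \in C} P_c(\cdot, \cdot)$

    Each third-order clique term is computed from $(x_i, x_j, x_k)$ [its two arguments are illegible in the transcript]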

  • 46

    Statistical shape prior

    Local constraints

    Global shape

    Pose-invariant (i.e. translation, rotation, scale) !

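    Clique nodes: $x_i$, $x_j$, $x_k$

    $P(X) = \frac{1}{Z} \prod_{c \in C} P_c(\cdot, \cdot)$

    Each third-order clique term is computed from $(x_i, x_j, x_k)$ [its two arguments are illegible in the transcript]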
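
    One way to see why a clique of three points can carry a pose-invariant prior: quantities such as the inner angles of the triangle they form do not change under translation, rotation or scaling. Whether the lecture's prior is built on exactly these angles is an assumption here, and the sketch works in 2D for brevity although the slides deal with 3D meshes.

```python
import math

def inner_angles(a, b, c):
    """Inner angles (radians) at vertices a and b of the triangle (a, b, c)."""
    def angle(origin, u, v):
        ux, uy = u[0] - origin[0], u[1] - origin[1]
        vx, vy = v[0] - origin[0], v[1] - origin[1]
        dot = ux * vx + uy * vy
        cross = ux * vy - uy * vx
        return abs(math.atan2(cross, dot))
    return angle(a, b, c), angle(b, c, a)

def similarity(point, scale, theta, t):
    """Apply scaling, rotation by theta and translation t to a 2D point."""
    x, y = point
    return (scale * (math.cos(theta) * x - math.sin(theta) * y) + t[0],
            scale * (math.sin(theta) * x + math.cos(theta) * y) + t[1])

triangle = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]
moved = [similarity(p, scale=2.5, theta=0.7, t=(10.0, -3.0)) for p in triangle]
angles_before = inner_angles(*triangle)
angles_after = inner_angles(*moved)      # same angles up to rounding error
```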

  • 47

    Qualitative Results

    Fig. Segmentation results of 3D CT volumes

    Accurate boundaries with low contrast images

  • 48

    Dense Image Registration using MRFs

  • 49

    Basic Idea of Intensity-based Registration

    Image registration as an optimization problem

    Target and source Image:

    Transformation:

    Image metric:

  • 50

    Dimensionality Reduction

    Linear combination of control points

    e.g. Free-Form Deformations (Sederberg et al. 1986; Rueckert et al. 1999)
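
    A minimal 1D sketch of the dimensionality reduction: the dense displacement field is a linear combination of a few control-point displacements. Linear (hat) basis functions are swapped in here for simplicity; Free-Form Deformations typically use cubic B-splines. All names and values are illustrative.

```python
def hat_weights(x, spacing, num_ctrl):
    """Normalized linear (hat) basis weights of position x on a regular control lattice."""
    weights = [max(0.0, 1.0 - abs(x / spacing - i)) for i in range(num_ctrl)]
    total = sum(weights)
    return [w / total for w in weights]

def dense_displacement(ctrl_disp, spacing, length):
    """Displacement at every sample as a weighted sum of control-point displacements."""
    field = []
    for x in range(length):
        weights = hat_weights(x, spacing, len(ctrl_disp))
        field.append(sum(w * d for w, d in zip(weights, ctrl_disp)))
    return field

ctrl_disp = [0.0, 2.0, -1.0]             # 3 unknowns instead of 9
field = dense_displacement(ctrl_disp, spacing=4, length=9)
```

    The optimization then searches over the control-point displacements only, and the dense field follows by interpolation.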

  • 51

    (Weighted) Block Matching

    Redefinition of data term w.r.t. control lattice

    Pixel-wise image metrics weighted by normalized basis functions

    image points closer to a control point gain more influence on its matching energy

    Statistical image metrics (e.g. mutual information, cross correlation)

    evaluation of image metric in local patches centered at the control points

    block size depends on control lattice resolution
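
    A hedged 1D sketch of the weighted block-matching data term: pixel-wise squared differences are weighted by a normalized basis function centered at the control point, so nearby samples count more. The hat-shaped weight, the squared-difference metric and the convention target(x) ≈ source(x + d) are assumptions of this example.

```python
def block_matching_cost(target, source, ctrl_x, spacing, disp):
    """Weighted SSD for moving the control point at ctrl_x by integer displacement disp."""
    num = den = 0.0
    for x in range(len(target)):
        w = max(0.0, 1.0 - abs(x - ctrl_x) / spacing)   # hat basis around the control point
        xs = x + disp
        if w > 0.0 and 0 <= xs < len(source):
            num += w * (target[x] - source[xs]) ** 2
            den += w
    return num / den if den > 0.0 else float("inf")

source = [0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0]
target = [0, 1, 2, 3, 2, 1, 0, 0, 0, 0, 0, 0]   # source shifted by two samples

# One cost per candidate displacement label of the control point at x = 3
costs = {d: block_matching_cost(target, source, ctrl_x=3, spacing=3, disp=d)
         for d in range(-3, 4)}
best_disp = min(costs, key=costs.get)
```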

  • 52

    Discrete Labeling Problem

    Markov Random Field formulation with pairwise interactions

    Unary potentials (matching):

    Pairwise potentials (smoothness):

    Nodes and Edges (control lattice figure):

    p q r
    s t u
    v w x
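
    To illustrate the labeling view, the sketch below finds the exact minimum of a pairwise energy on a chain of control points by dynamic programming (min-sum Viterbi). A chain is chosen because it can be solved exactly; the slide's lattice contains loops and needs other MRF solvers. The costs and the smoothness weight are made up.

```python
def chain_minimum(unary, pairwise):
    """Exact minimizer of a chain MRF via min-sum dynamic programming."""
    num_labels = len(unary[0])
    cost = [list(unary[0])]
    back = []
    for i in range(1, len(unary)):
        row, ptr = [], []
        for b in range(num_labels):
            a_best = min(range(num_labels),
                         key=lambda a: cost[-1][a] + pairwise(a, b))
            row.append(cost[-1][a_best] + pairwise(a_best, b) + unary[i][b])
            ptr.append(a_best)
        cost.append(row)
        back.append(ptr)
    labels = [min(range(num_labels), key=lambda b: cost[-1][b])]
    for ptr in reversed(back):               # backtrack from the last node
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels, min(cost[-1])

def smoothness(a, b, weight=1.5):
    """Hypothetical pairwise term: penalize different displacements of neighbors."""
    return weight * abs(a - b)

# unary[i][l]: matching cost of control point i taking displacement label l
unary = [[0.0, 1.0, 4.0], [2.0, 1.5, 0.0], [0.0, 1.0, 4.0]]
labels, energy = chain_minimum(unary, smoothness)
```

    With this weight, the middle control point gives up its individually cheapest label so that the displacement field stays smooth.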

  • 53

    Some Results

                               Affine 12-DOF   Rueckert   Ours
    Average Surface Distance   1.66 mm         1.14 mm    1.00 mm
    Average Surface Distance   1.92 mm         1.31 mm    1.06 mm

  • 54

    Experimental Validation: Data Set [CMA GMH Harvard]

  • 55

    Experimental Validation: Qualitative Results

  • 56

    Experimental Validation: Qualitative Results

  • 57

    HIGHER-ORDER NON-RIGID 3D SURFACE MATCHING

  • 58

    High-order Graph Matching

    Graph 1 Graph 2

  • 59

    High-order Graph Matching

    Graph 1 Graph 2

  • 60

    High-order Graph Matching

    Graph 1 Graph 2

  • 61

    High-order Graph Matching

    a b c

    Graph 1 Graph 2

  • 62

    Experimental Results

  • 63

    Experimental Results

  • 64

    High-level Vision

    View-Point invariant 2.5D-3D/ Large-scale Parsing with shape grammars

    Variables: control points

    Labels: 2d/3d displacements

    Graph Connectivity: Pair-wise / higher order

  • 65

    Goals

    Segmentation, Tracking, Depth ordering (2.5D)

    2D image plane 3D Scene

    Occlusion Relationship

  • 66

    Joint 2.5D Layered Modeling

    An illustrative example:

    Object-level representation

    2D parametric shape model

    Relative depth $d_i$ of each object: $d_1 = 0$, $d_2 = 1$, $d_0 = 2$

    An object $i$ can occlude object $j$ only if $d_i < d_j$; the background has the biggest depth

    Object 0 (Background), Object 2, Object 1

    Pixel-level representation

    Label: the index of the associated layer

    Depth: the depth of the associated layer

    Example pixel: label = 1, depth = 0
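
    The link between depth ordering and pixel labels can be sketched as follows: each pixel takes the index and depth of the covering object with the smallest depth. Axis-aligned boxes stand in for the 2D parametric shape model (an assumption of this sketch); the depth values follow the example on the slide.

```python
def covers(box, y, x):
    """True if pixel (y, x) lies inside the box (y0, x0, y1, x1), end-exclusive."""
    y0, x0, y1, x1 = box
    return y0 <= y < y1 and x0 <= x < x1

def pixel_labels(boxes, depth, height, width):
    """Per pixel: (label, depth) of the covering object with the smallest depth."""
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            visible = min((i for i, box in boxes.items() if covers(box, y, x)),
                          key=lambda i: depth[i])
            row.append((visible, depth[visible]))
        result.append(row)
    return result

# Object 0 is the background and covers the whole image
boxes = {0: (0, 0, 4, 6), 1: (1, 1, 3, 4), 2: (0, 3, 4, 6)}
depth = {0: 2, 1: 0, 2: 1}               # smaller depth = closer to the camera
layers = pixel_labels(boxes, depth, height=4, width=6)
```

    Where objects 1 and 2 overlap, object 1 wins because its depth is smaller, so it occludes object 2.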

  • 67

    Joint 2.5D Layered Modeling

    What do we infer?

    Object-level representation

    Object 0 (Background): depth = const

    Object 2: shape parameters = ?; depth = ?

    Object 1: shape parameters = ?; depth = ?

    Pixel-level representation

    Each pixel: label = ?; depth = ?

    What is the connection between the two representations?

  • 68

    Markov Random Field Formulation (1) An example for two tracked objects

  • 69

    Markov Random Field Formulation (1) An example for two tracked objects

    Object variable = (shape parameters, depth)

  • 70

    Markov Random Field Formulation (1) An example for two tracked objects

    Pixel variable = (label, depth)

    Object variable = (shape parameters, depth)

  • 71

    Experimental Results (1)

    Video from: Huang & Essa. Tracking Multiple Objects through Occlusions. CVPR05

  • 72

    Goals

    2D image plane 3D Scene

    Shape matching, Statistical shape modeling, Knowledge-based 3D segmentation

    Segmentation, Tracking, Depth ordering (2.5D)

    3D landmark model inference from 2D images (2D-3D)

  • 73

    Illustration of projection prior:

    (2)|(3), exp ,

    Error function defined on a quadruplet of points:

    , = , \t, \t

    , = ,

  • 74

    Qualitative Results for 3D Model Inference

    Setting

    Dataset: 101 samples from BU-4DFE [Yin et al. 2008]

    Leave-one-out cross validation

    Blue: ground truth; Red: result

    Experimental Results

  • 75

    Image-based Modeling of Architecture using Shape Grammars

  • 76

    Procedural Modeling of Facades

    Start from an axiom (Image)

    Sequentially apply replacement rules

    The derivation tree keeps track of the building structure

  • 77

    Procedural Modeling of Facades

    Start from an axiom (Image)

    Sequentially apply replacement rules

    The derivation tree keeps track of the building structure

  • 78

    Procedural Modeling of Facades

    Start from an axiom (Image)

    Sequentially apply replacement rules

    The derivation tree keeps track of the building structure

  • 79

    Procedural Modeling of Facades

    Start from an axiom (Image)

    Sequentially apply replacement rules

    The derivation tree keeps track of the building structure

  • 80

    Procedural Modeling of Facades

    Start from an axiom (Image)

    Sequentially apply replacement rules

    The derivation tree keeps track of the building structure

    To be optimized: topology & geometry
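
    The three steps above can be sketched with a toy grammar: start from an axiom, apply replacement rules, and keep the derivation tree. The rule set and symbol names are hypothetical, and geometry (for example split positions and sizes) is left out, so only the topology part is shown.

```python
# Hypothetical replacement rules: non-terminal -> sequence of symbols
RULES = {
    "Facade": ["Floor", "Floor"],
    "Floor": ["Wall", "Window", "Wall"],
}

def derive(symbol, rules):
    """Apply replacement rules recursively; returns the tree (symbol, children)."""
    children = [derive(child, rules) for child in rules.get(symbol, [])]
    return (symbol, children)

def terminals(tree):
    """Read the terminal symbols off the leaves of the derivation tree."""
    symbol, children = tree
    if not children:
        return [symbol]
    return [leaf for child in children for leaf in terminals(child)]

tree = derive("Facade", RULES)           # "Facade" plays the role of the axiom
layout = terminals(tree)
```

    In an image-based setting, the choice of rules (topology) and their parameters (geometry) would be the quantities to optimize against the image.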

  • 81

    What about real buildings?

  • 82

    A small district

  • 83

  • 84

    Segmentation energy

    Single Pixel x:

    $\phi_x(c) = -\log p(c \mid f_x)$

    Single Region R:

    $\phi_R(c) = \sum_{x \in R} \phi_x(c) = -\sum_{x \in R} \log p(c \mid f_x)$

  • 85

    Segmentation energy

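    Single Pixel x:

    $\phi_x(c) = -\log p(c \mid f_x)$

    Single Region R:

    $\phi_R(c) = \sum_{x \in R} \phi_x(c) = -\sum_{x \in R} \log p(c \mid f_x)$

    Segmentation:

    $E = \sum_i \phi_{R_i}(c_i) = -\sum_i \sum_{x \in R_i} \log p(c_i \mid f_x)$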
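
    A minimal sketch of this energy, assuming the garbled formulas read as a negative log-probability of a class given the pixel's features, summed over the pixels of each region. The probability table below stands in for the output of some per-pixel classifier and is entirely made up.

```python
import math

def region_cost(pixels, label, prob):
    """Negative log-probability of `label`, summed over the pixels of one region."""
    return -sum(math.log(prob[x][label]) for x in pixels)

def segmentation_energy(regions, prob):
    """Sum of region costs; `regions` is a list of (pixel ids, class label)."""
    return sum(region_cost(pixels, label, prob) for pixels, label in regions)

# Hypothetical classifier output: prob[pixel][class]
prob = {
    0: {"wall": 0.9, "window": 0.1},
    1: {"wall": 0.8, "window": 0.2},
    2: {"wall": 0.2, "window": 0.8},
}
energy_two_regions = segmentation_energy([([0, 1], "wall"), ([2], "window")], prob)
energy_one_region = segmentation_energy([([0, 1, 2], "wall")], prob)
```

    The segmentation that labels pixel 2 as a window has the lower energy, since it agrees with the per-pixel probabilities.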

  • 86

    Qualitative Results

    Architecturally consistent

    Robust to illumination conditions, hard cast shadows, reflections

  • 87

    Qualitative Results

  • 88

    Multi-view parsing using genetic algorithm

    single rectified image calibrated image sequence

  • 89

    Qualitative Results

  • 90

    Discrete Artificial Vision

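    Given: Parameters (objects) $\mathcal{V}$ from a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$

    A neighborhood system $\mathcal{E}$ and a discrete label set $\mathcal{L}$

    Assign labels (to objects) that minimize the energy:

    $$\min_{x} \; \sum_{p \in \mathcal{V}} V_p(x_p) + \sum_{(p,q) \in \mathcal{E}} V_{pq}(x_p, x_q)$$

    The first sum runs over the objects (unary potentials $V_p$), the second over the edges (pairwise potentials $V_{pq}$)

    MRF optimization ubiquitous in vision (and beyond)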

  • 91

    Optimization of high-order models

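    Hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{C})$

    Parameters (nodes) $\mathcal{V}$

    Hyperedges/cliques $\mathcal{C}$

    High-order energy minimization problem:

    $$\min_{x} \; \sum_{p \in \mathcal{V}} V_p(x_p) + \sum_{c \in \mathcal{C}} V_c(x_c)$$

    The first sum runs over the parameters (one unary potential $V_p$ per node), the second over the hyperedges (one high-order potential $V_c$ per clique); $x_c$ collects the labels of the nodes in clique $c$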

  • 92

    Conclusions

    Discrete graphical models are a promising answer to artificial vision

    Curse of Dimensionality: prior knowledge, either through anatomy or through machine learning techniques, towards dimensionality reduction

    Curse of Non-linearity: model decomposition / data association allows parameter selection to be supported by direct estimation from the images

    Curse of Non-Convexity: regularization terms / dropping of constraints can improve the optimality properties of the obtained solution

    Curse of Non-Modularity: model / data association / inference decomposition and the use of gradient-free methods