Computer Vision Group University of California Berkeley 1 Scale-Invariant Random Fields for Mid-level Vision Xiaofeng Ren, Charless Fowlkes and Jitendra.

Computer Vision GroupUniversity of California Berkeley

1

Scale-Invariant Random Fieldsfor

Mid-level Vision

Xiaofeng Ren, Charless Fowlkes and Jitendra Malik

University of California at Berkeley


2

Overview• Scale Invariance in Natural Images• A Scale-Invariant Representation from bottom-up

using Constrained Delaunay Triangulation (CDT)• Scale-Invariant Models for Contour Completion

– A baseline local model

– A global model using Conditional Random Fields (CRF)

– Results and Quantitative Evaluations

• Extensions of the CDT/CRF Framework– Image segmentation

– Figure/ground organization

– Finding people in a single image


3










4

Scale Invariance• An intuitive understanding

• A mathematical characterization

… …

or ?

or ?

X1

X2

FX1( · ) = F( · ; s1)

FX2( · ) = F( · ; s2)


5

Sources of Scale Invariance• Arbitrary viewing distance

• Hierarchy of Parts

Finger

LegTorso


6

Scale Invariance in Natural Images• Scale Space Theory and Gaussian Pyramids

– Witkin, Lindenberg, Koenderink, …

• Empirical Studies: Power Laws in Natural Images– Power spectra (Ruderman, 1994)

– Wavelet coefficients (e.g. Simoncelli)

– Probabilistic Models (e.g. Mumford)

– Power laws in the statistics

of boundary contours

[Ren and Malik 2002]


7

How to Handle Scale Invariance?• Top-down Search

• Bottom-up Scale Selection– Scale space (e.g. zero crossing of the Laplacian)

– Lowe’s SIFT operator

– Our boundary-oriented approach based on piecewise approximation of edges and constrained Delaunay triangulation (CDT)

… 1 2 3 4 5 6 …


8










9

Low-level Contour Detection• Use Pb (probability of boundary) as input

– Combines local brightness, texture and color cues

– Trained from human-marked segmentation boundaries

– Outperforms existing local boundary detectors including Canny

• Canny’s hysteresis thresholding

• Use Pb (probability of boundary) as input – Combines local brightness, texture and color cues

– Trained from human-marked segmentation boundaries

– Outperforms existing local boundary detectors including Canny

• Canny’s hysteresis thresholding

[Martin, Fowlkes & Malik 02]


10

Piecewise Linear Approximation• Recursively split the boundaries until each piece is

approximately straight• Recursively split the boundaries until each piece is

approximately straight


11

Delaunay Triangulation

• Standard in computational geometry

• Dual of the Voronoi Diagram

• Unique triangulation that maximizes the minimum angle– avoiding long skinny triangles

• Efficient and simple randomized algorithm

• Standard in computational geometry

• Dual of the Voronoi Diagram

• Unique triangulation that maximizes the minimum angle– avoiding long skinny triangles

• Efficient and simple randomized algorithm


12

Constrained Delaunay Triangulation

• A variant of the standard Delaunay Triangulation

• Keeps a given set of edges in the triangulation

• A variant of the standard Delaunay Triangulation

• Keeps a given set of edges in the triangulation


13

“Gap-Filling” Property of CDT• A typical scenario of contour completion• A typical scenario of contour completion

low contrast

high contrasthigh contrast

• CDT picks the “right” edge, completing the gap• CDT picks the “right” edge, completing the gap


14


15

The CDT Graph

scale-invariant

fast to compute

<1000 edges

completes gaps

little loss of

structure


16

Berkeley Segmentation Dataset

[Martin et al, ICCV 2001]1,000 images, >14,000 segmentations


17

Evaluating Boundary Operators• Precision-Recall Curves

– threshold the output boundary map

– bipartite matching with the groundtruth

• Precision-Recall Curves– threshold the output boundary map

– bipartite matching with the groundtruth

m pixels on human-marked boundaries

n detected pixels above a given threshold

k matched pairs

Precision = k/n, percentage of true positives

Recall = k/m, percentage of groundtruth being detected

• Project CDT edges back to the pixel-grid• Project CDT edges back to the pixel-grid


18

No Loss of Structure

Use Phuman the soft groundtruthlabel defined on CDT graphs:precision close to 100%

Pb averaged over CDT edges: no worse than the orignal Pb

Increase in asymptotic recall rate: completion of gradientless contours


19

CDT vs. K-Neighbor Completion

An alternative scheme for completion: connect to k-nearest neighbor vertices, subject to visibility

CDT achieves higher asymptotic recall rates


20

The CDT Graph

scale-invariant

fast to compute

<1000 edges

completes gaps

little loss of

structure


21

“Superpixels”• Idea: use oversegmentation as preprocessing and

work with homogenous regions instead of pixels.

• Empirical Validation [Ren and Malik 2003]

How much do we lose when we go from pixels to superpixels?


22

So the CDT graph is nice to work with…

We can develop sophisticated models

on top of the CDT graph, e.g.

for Mid-level vision


23

Mid-level Vision• What is Mid-level Vision?

– It’s not low-level vision (e.g. edge detection)

– It’s not high-level vision (e.g. object recognition)

region grouping/image segmentation

contour grouping/visual completion

figure/groundorganization

• Key Problems in Mid-level Vision


24

Mid-level Vision• What is Mid-level Vision?

– It’s not low-level vision (e.g. edge detection)

– It’s not high-level vision (e.g. object recognition)

region grouping/image segmentation

contour grouping/visual completion

figure/groundorganization

• Key Problems in Mid-level Vision


25

Boundary Detection• Edge detection: 20 years after Canny• Edge detection: 20 years after Canny

• Pb (Probability of Boundary): learning to combine brightness, color and texture contrasts

• Pb (Probability of Boundary): learning to combine brightness, color and texture contrasts

• There is psychophysical evidence that we might have been approaching the limit of local edge detection

• There is psychophysical evidence that we might have been approaching the limit of local edge detection


26

Curvilinear Continuity• Boundaries are smooth in nature

• A number of associated phenomena– Good continuation

– Visual completion

– Illusory contours

• Well studied in human vision– Wertheimer, Kanizsa, von der Heydt, Kellman,

Field, Geisler, …

• Extensively explored in computer vision– Shashua, Zucker, Mumford, Williams, Jacobs,

Elder, Jermyn, Wang, …

• Is the net effect of completion positive? Or negative? Lack of quantitative evaluation

• Boundaries are smooth in nature

• A number of associated phenomena– Good continuation

– Visual completion

– Illusory contours

• Well studied in human vision– Wertheimer, Kanizsa, von der Heydt, Kellman,

Field, Geisler, …

• Extensively explored in computer vision– Shashua, Zucker, Mumford, Williams, Jacobs,

Elder, Jermyn, Wang, …

• Is the net effect of completion positive? Or negative? Lack of quantitative evaluation


27










28

Inference on the CDT Graph

Xe

Xe

Xe

Xe

Xe

Xe

Xe

Xe

Xe

XeXeXe

Xe

Xe

Xe

Xe

Xe

Xe

Local inference:

XeXe

Global inference:


29

Baseline Local Model

“Bi-gram” model:

contrast + continuity

binary classification (0,0) vs (1,1)

logistic classifier

“Tri-gram” model:

1 2

L LPbL

=

Xe


30

For each edge i, define a set of features{g1,g2,…,gh}

Potential function exp(i) at edge i

hhi ggg 2211expexp

For each junction j, define a set of features{f1,f2,…,fk}

Potential function exp(j) at juncion j

kkj fff 2211expexp

X={X1,X2,…,Xm}

[Pietra, Pietra & Lafferty 97][Lafferty, McCallum & Pereira 01]

Conditional Random Fields


31

X={X1,X2,…,Xm}

Conditional Random Fields

Potential function on edges {exp(i)}

Potential function on junctions {exp(j)}

This defines a probability distribution over X:

Z

XX

XPj

ji

i exp

X j

ji

i XXZ expwhere

Estimate P(Xi|) an undirected graphical model with potential functions in the exponential family


32

Building a CRF Model

• What are the features?– edge features: Pb, G

– junction features:

• Junction type

• Continuity

• How to make inference?– Loopy Belief Propagation

• How to learn the parameters?– Gradient Descent on Max. Likelihood

• What are the features?– edge features: Pb, G

– junction features:

• Junction type

• Continuity

• How to make inference?– Loopy Belief Propagation

• How to learn the parameters?– Gradient Descent on Max. Likelihood

X={X1,X2,…,Xm}

Estimate P(Xi|)


33

Junctions and Continuity• Junction types (degg,degc):• Junction types (degg,degc):

baXf cgba degdeg),(

degg=1,degc=0 degg=0,degc=2 degg=1,degc=2

• Continuity term for degree-2 junctions• Continuity term for degree-2 junctions

degg+degc=2

),(),( exp baba f

2degdeg exp cgg

degg=0,degc=0


34

Inference w/ Belief Propagation

• Loopy Belief Propagation

– just like belief propagation

– iterates message passing until convergence

– lack of theoretical foundations and known to have convergence issues

– however becoming popular in practice

– typically applied on pixel-grid

• Works well on CDT graphs– converges fast (<10 iterations)

– produces empirically sound results

• Loopy Belief Propagation

– just like belief propagation

– iterates message passing until convergence

– lack of theoretical foundations and known to have convergence issues

– however becoming popular in practice

– typically applied on pixel-grid

• Works well on CDT graphs– converges fast (<10 iterations)

– produces empirically sound results


35

Maximum Likelihood Learning

• Maximum-likelihood estimation in CRFLet denote the groundtruth labeling on the CDT graph

• Maximum-likelihood estimation in CRFLet denote the groundtruth labeling on the CDT graph

XXq

qtt

ti

iss

s XfXgZ

XL~factor edge

exp1~

Xtqqt

t ZFfXL exp

1~~log

factor

• Many possible optimization techniques– gradient descent, iterative scaling, conjugate gradient, …

• Gradient descent works well

• Many possible optimization techniques– gradient descent, iterative scaling, conjugate gradient, …

• Gradient descent works well

X~

)|()(~

XPtXPt ff


36

Interpreting the Parameters

=2.46 =0.87 =1.14 =0.01

=-0.59

=-0.98

Line endings and junctions are rare

Completed edges are weak


37

Quantitative Evaluation

• Baseball player dataset [Mori et al 04]

– 30 news photos of baseball players in various poses, 15 training and 15 testing

• Horse dataset [Borenstein & Ullman 02]

– 350 images of standing horses facing left, 175 training and 175 testing

• Berkeley Segmentation Dataset [Martin et al 01]

– 300 Corel images of various natural scenes and ~2500 segmentations, 200 training and 100 testing

• Baseball player dataset [Mori et al 04]

– 30 news photos of baseball players in various poses, 15 training and 15 testing

• Horse dataset [Borenstein & Ullman 02]

– 350 images of standing horses facing left, 175 training and 175 testing

• Berkeley Segmentation Dataset [Martin et al 01]

– 300 Corel images of various natural scenes and ~2500 segmentations, 200 training and 100 testing


38

Continuity improves boundary detection in both low-recall and high-recall ranges

Global inference helps; mostly in low-recall/high-precision

Roughly speaking,

CRF>Local>CDT only>Pb


39


40


41

Image Pb Local Global


42



43



44

Conclusion I• Constrained Delaunay Triangulation is a scale-invariant

discretization of images with little loss of structure;• Constrained Delaunay Triangulation is a scale-invariant

discretization of images with little loss of structure;

• Moving from 100,000 pixels to <1000 edges, CDT achieves great statistical and computational efficiency;

• Moving from 100,000 pixels to <1000 edges, CDT achieves great statistical and computational efficiency;

• Curvilinear Continuity improves boundary detection;

– the local model of continuity is simple yet very effective

– global inference of continuity further improves performance

– Conditional Random Fields w/ loopy belief propagation works well on CDT graphs

• Curvilinear Continuity improves boundary detection;

– the local model of continuity is simple yet very effective

– global inference of continuity further improves performance

– Conditional Random Fields w/ loopy belief propagation works well on CDT graphs

• Mid-level vision is useful.• Mid-level vision is useful.


45










46










47

Summary

CRF

Conditional Random Fields on triangulated images, trained to integrate low/mid/high-level grouping cues


48

Inference on the CDT Graph

Xe

Xe

Xe

Xe

Xe

Xe

Xe

Xe

Xe

XeXeXe

Xe

Xe

Xe

Xe

Xe

Xe

Yt

Yt

Yt

Yt

Yt

Yt

Yt

Yt

YtYt

Z

Contour variables {Xe}

Region variables {Yt}

Object variable {Z}

Integrating {Xe},{Yt} and{Z}: low/mid/high-level cues

Xe

Xe

Xe

Xe

Xe

Xe

Xe

Xe

Xe

XeXeXe

Xe

Xe

Xe

Xe

Xe

Xe

Yt

Yt

Yt

Yt

Yt

Yt

Yt

Yt

YtYt

Z


49

Grouping Cues• Low-level Cues

– Edge energy along edge e

– Brightness/texture similarity between two regions s and t

• Mid-level Cues– Edge collinearity and junction frequency at

vertex V

– Consistency between edge e and two adjoining regions s and t

• High-level Cues– Texture similarity of region t to exemplars

– Compatibility of region support with pose

– Compatibility of local edge shape with pose

• Low-level Cues– Edge energy along edge e

– Brightness/texture similarity between two regions s and t

• Mid-level Cues– Edge collinearity and junction frequency at

vertex V

– Consistency between edge e and two adjoining regions s and t

• High-level Cues– Texture similarity of region t to exemplars

– Compatibility of region support with pose

– Compatibility of local edge shape with pose

L1(Xe|I)L2(Ys,Yt|I)

M1(XV|I)

M2(Xe,Ys,Yt)

H1(Yt|I)H2(Yt,Z|I)H3(Xe,Z|I)


50

Conditional Random Fields for Cue Integration

,,|,exp),(

1,,, IZYXE

IZIZYXP

ts

tse

e IYYLIXLE,

21 |,|

ts

etsV

V XYYMIXM,

21 ,,|

e

et

tt

t IZXHIZYHIYH |,|,| 321

Estimate the marginal posteriors of X, Y and Z


51

Encoding Object Knowledge

(Region-based) Support Mask

Yt

Z

(Edge-based) Shapemes

Xe

Z

Xe


52

H3(Xe,Z|I): local shape and pose

distribution ON(x,y,i)

shapeme j(vertical pairs) distribution ON(x,y,j)

Let S(x,y) be the shapeme at image location (x,y); (xo,yo) be the object location in Z. Compute average log likelihood SON(e,Z) as:

eyx

oo yxSyyxxONe ,

),(,,log1

eOFFOFF

eONONe

XZeS

XZeSIZXH

),(

),(|,3

Then we have:

SOFF(e,Z) is defined similarly.

shapeme i(horizontal line)


53


54


55

Results

Input Input Pb Output Contour Output Figure


56



57



58

Conclusion II• Constrained Delaunay Triangulation provides a scale-

invariant discrete structure which enables efficient probabilistic inference.

• Conditional random fields combine joint contour and region grouping and can be efficiently trained.

• Mid-level cues are useful for figure/ground segmentation, even when object-specific cues are present.

• Constrained Delaunay Triangulation provides a scale-invariant discrete structure which enables efficient probabilistic inference.

• Conditional random fields combine joint contour and region grouping and can be efficiently trained.

• Mid-level cues are useful for figure/ground segmentation, even when object-specific cues are present.


59










60

Figure/Ground Organization

• A classical problem in Gestalt psychology– [Rubin 1921]

• “Perceptual organization after grouping”

• Gestalt principles for figure/ground– surroundedness, size, convexity, parallelism, symmetry,

lower-region, common fate, familiar configuration, …

• Very few computational studies[Hinton 86], [von der Heydt 93], …

• A classical problem in Gestalt psychology– [Rubin 1921]

• “Perceptual organization after grouping”

• Gestalt principles for figure/ground– surroundedness, size, convexity, parallelism, symmetry,

lower-region, common fate, familiar configuration, …

• Very few computational studies[Hinton 86], [von der Heydt 93], …


61

Shapemes: Prototypical Shapes

local figure/ground cues: size/convexity, parallelism, …


62

Reasoning about T-junction

F

G

F

F

GG

common

F

G

F

G

GF

uncommon


63

CRF for Figure/Ground

F={F1,F2,…,Fm}

Fi{Left,Right}

• One feature for each junction type

F

G

F

F

GG

F

G

F

G

G F

FG

F

G


64

Results on Natural Images

• Chance error rate

• Baseline size/convexity

• Local model w/ shapemes

• Using human segmentations:– Averaging local cues on human-marked

boundaries

– CRF w/ junction type and continuity

• Using low-level edges– Averaging local cues


• Dataset inconsistency

• Chance error rate

• Baseline size/convexity

• Local model w/ shapemes

• Using human segmentations:– Averaging local cues on human-marked

boundaries


• Using low-level edges– Averaging local cues


• Dataset inconsistency

50%

44%

36%

29%

22%

34%

32%

12%

50%

44%

36%

29%

22%

34%

32%

12%


65


66


67










68

Finding People

The Challenges: pose, clothing, lighting, , clutter, …

The Setting: single image, no pose restriction

Approaches:• Top-down holistic template matching• Assembly of 2D parts from bottom-up

The Assembly Problem:• Dynamic programming on tree-model• Brute-force search / pruning• Sampling


69

Our Pipeline• Preprocessing with Constrained Delaunay Triangulation

• Detecting Candidate Parts using Parallelism

• Learning Pairwise Constraints between Parts

• Assembling Parts by Integer Quadratic Programming (IQP)

• Preprocessing with Constrained Delaunay Triangulation

• Detecting Candidate Parts using Parallelism

• Learning Pairwise Constraints between Parts

• Assembling Parts by Integer Quadratic Programming (IQP)


70

Detecting Parts with Parallelism• Candidate parts as parallel

line segments (Ebenbreite) • Automatic scale selection

from bottom-up• Feature combination with a

logistic classifier

• Candidate parts as parallel line segments (Ebenbreite)

• Automatic scale selection from bottom-up

• Feature combination with a logistic classifier


71

Assembly as Assignment

Candidates {Ci} Parts {Lj}

(Lj1,Ci1=(Lj1))

(Lj2,Ci2=(Lj2))

Cost for a partial assignment {(Lj1,Ci1), (Lj2,Ci2)}:

assignment

2

)22)(11( 2,1

2,1)22(),11(

kijij jj

k

jjk

ijijkf

H


72


73


74


75

Conclusion (again)• Scale-invariance is a key issue in vision and we develop

a scale-invariant representation using Constrained Delaunay Triangulation (CDT).

• The CDT graph is a superpixel representation with little loss of structure and enables efficient training and testing of probabilistic models.

• On top of the CDT graph we use Conditional Random Fields (CRF) to solve mid-level vision problems including contour completion, image segmentation and figure/ground organization.

• By quantitative evaluation on large datasets of natural images, we show that our CRF models significantly improve performance.

• Scale-invariance is a key issue in vision and we develop a scale-invariant representation using Constrained Delaunay Triangulation (CDT).

• The CDT graph is a superpixel representation with little loss of structure and enables efficient training and testing of probabilistic models.

• On top of the CDT graph we use Conditional Random Fields (CRF) to solve mid-level vision problems including contour completion, image segmentation and figure/ground organization.

• By quantitative evaluation on large datasets of natural images, we show that our CRF models significantly improve performance.


76

Thank You


77


78


79


80

Contour Geometry• First-Order Markov Model

– [Mumford 94, Williams & Jacobs 95]

– Curvature: white noise ( independent from position to position )

– Tangent t(s): random walk

– Markov assumption: the tangent at the next position, t(s+1), only depends on the current tangent t(s)

• First-Order Markov Model

– [Mumford 94, Williams & Jacobs 95]

– Curvature: white noise ( independent from position to position )

– Tangent t(s): random walk

– Markov assumption: the tangent at the next position, t(s+1), only depends on the current tangent t(s)

t(s)

t(s+1)

s

s+1


81

Contours are Smooth

P( t(s+1) | t(s) )

marginal distribution of tangent change

t(s)

t(s+1)

s

s+1


82

Testing the Markov Assumption

Segment the contours at high-curvature positions


83

Prediction: Exponential

• If the first-order Markov assumption holds…• At every step, there is a constant probability p that a high curvature

event will occur

• High curvature events are independent from step to step

– Let L be the length of a segment between high-curvature points

– P( L>=k ) = (1-p)k

– P( L=k ) = p(1-p)k

– L has an exponential distribution

• If the first-order Markov assumption holds…• At every step, there is a constant probability p that a high curvature

event will occur

• High curvature events are independent from step to step

– Let L be the length of a segment between high-curvature points

– P( L>=k ) = (1-p)k

– P( L=k ) = p(1-p)k

– L has an exponential distribution


84

Empirical: Power Law

Contour segment length L

Pro

babi

lit

y62.1)length(

1.Prob


85

Scale Invariance of CDT

Computer Vision Group University of California Berkeley 1 Scale-Invariant Random Fields for Mid-level Vision Xiaofeng Ren, Charless Fowlkes and Jitendra.

Documents

berkeley slide

scale selection scale

overview scale invariance

scaleinvariant representation

scaleinvariant random

single image slide

baseline local model

global model