Carsten Rother Microsoft Research Cambridge

Feb 13, 2017

Transcript
Page 1: Carsten Rother Microsoft Research Cambridge

Carsten Rother Microsoft Research Cambridge

Page 2: Carsten Rother Microsoft Research Cambridge

~140 employees (~100 Researchers, ~30 RSDEs, ~10 Admin)

Six different groups:

Computer-Mediated Living

Machine Learning & Perception

Cambridge Innovation Development

Computational Science

Programming Principles & Tools

Systems & Networking

Page 3: Carsten Rother Microsoft Research Cambridge

• Computer Vision group: medical vision, recognition, reconstruction, image editing, …

• Machine learning group: Infer.Net, Online Services and Advertisement, Xbox Ranking

• Constrained Reasoning group: Planning and Optimization

• Socio-Digital Systems: Understanding human needs for future technology

• Sensors and Devices: SenseCam, Gadgeteer, …

• Interactive 3D Technologies group

Page 4: Carsten Rother Microsoft Research Cambridge

Machine Learning

Hardware design

Human studies

I3D mission: new user experiences

Graphics

Computer Vision

Intersection workshop (May 2012, Cambridge): http://research.microsoft.com/en-us/events/intersection12/

Page 5: Carsten Rother Microsoft Research Cambridge
Page 6: Carsten Rother Microsoft Research Cambridge
Page 7: Carsten Rother Microsoft Research Cambridge

Decision/Regression Trees + Random Fields

• All factors in the graph are trees

• Discriminative training of millions of parameters

• We can handle many loss functions

Page 8: Carsten Rother Microsoft Research Cambridge

Discrete labelling tasks:

Noisy input Ours [Zoran, Weiss, ICCV ‘11]

Continuous labelling tasks:

Test input Ground Truth

Trees Trees & Field

Page 9: Carsten Rother Microsoft Research Cambridge

• PatchMatch stereo, BMVC ’11

• PatchMatchBP stereo, BMVC ’12

• SurfaceStereo, ObjectStereo, CVPR ’10, ’11 (a review)

• SceneStereo, ECCV ’12

• Learning interactive image segmentation, IJCV ‘12

Page 10: Carsten Rother Microsoft Research Cambridge

• PatchMatch stereo, BMVC ’11

• PatchMatchBP stereo, BMVC ’12

• SurfaceStereo, ObjectStereo, CVPR ’10, ’11 (a review)

• SceneStereo, ECCV ’12

• Learning interactive image segmentation, IJCV ‘12

Page 11: Carsten Rother Microsoft Research Cambridge
Page 12: Carsten Rother Microsoft Research Cambridge

Left view Right view

Page 13: Carsten Rother Microsoft Research Cambridge

Depth map

Page 14: Carsten Rother Microsoft Research Cambridge

Left view Right view

Page 15: Carsten Rother Microsoft Research Cambridge

Local stereo matching: check photo-consistency over a rectangular region (patch)
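A minimal sketch of local matching on a single scanline, assuming a sum-of-absolute-differences (SAD) cost as the photo-consistency measure and a winner-takes-all choice of disparity. The slides do not name a specific cost; the function names, window size and synthetic data are hypothetical.

```python
import random

def sad(a, b):
    """Sum of absolute differences between two equally long patches."""
    return sum(abs(x - y) for x, y in zip(a, b))

def block_match_row(left, right, half_win, max_disp):
    """For each pixel of the left scanline, pick the disparity whose
    patch in the right scanline has the lowest SAD cost."""
    n = len(left)
    disp = []
    for x in range(n):
        lo, hi = max(0, x - half_win), min(n, x + half_win + 1)
        best_cost, best_d = float("inf"), 0
        for d in range(max_disp + 1):
            if lo - d < 0:
                break  # shifted patch would leave the right scanline
            cost = sad(left[lo:hi], right[lo - d:hi - d])
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp.append(best_d)
    return disp

# Synthetic fronto-parallel scene: the left row is the right row shifted by 3 pixels.
random.seed(0)
right = [random.random() for _ in range(64)]
TRUE_DISP = 3
left = right[-TRUE_DISP:] + right[:-TRUE_DISP]  # left[x] == right[x - 3] for x >= 3
disp = block_match_row(left, right, half_win=2, max_disp=6)
```

On this idealised input a constant disparity per patch is the right model; the failure cases listed on the following slides arise when that assumption does not hold.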

Page 16: Carsten Rother Microsoft Research Cambridge

Local stereo matching: check photo-consistency over a rectangular region (patch)

Page 17: Carsten Rother Microsoft Research Cambridge

Fails at discontinuities

Fails at non-fronto-parallel planes

No continuous depth label

Slow

Page 18: Carsten Rother Microsoft Research Cambridge
Page 19: Carsten Rother Microsoft Research Cambridge
Page 20: Carsten Rother Microsoft Research Cambridge

Adaptive support weights [Yoon, CVPR ‘05]
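As a rough illustration of adaptive support weights, the sketch below uses the commonly cited form in which a window pixel q supports the centre pixel p more strongly when it is similar in colour and close in space. The function name and the gamma values are hypothetical choices for this example, not values taken from the slides.

```python
import math

def support_weight(color_p, color_q, pos_p, pos_q, gamma_c=10.0, gamma_p=10.0):
    """Weight of pixel q inside the window centred at pixel p.
    gamma_c and gamma_p are assumed tuning constants."""
    dc = math.dist(color_p, color_q)  # colour difference
    dg = math.dist(pos_p, pos_q)      # spatial distance
    return math.exp(-(dc / gamma_c + dg / gamma_p))

# A neighbour with the same colour counts far more than one across a colour edge.
w_same = support_weight((100, 100, 100), (100, 100, 100), (0, 0), (1, 0))
w_edge = support_weight((100, 100, 100), (200, 100, 100), (0, 0), (1, 0))
```

Weighting the patch this way is what lets a window straddle a depth discontinuity without mixing foreground and background costs, provided the discontinuity coincides with a colour edge.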

Page 21: Carsten Rother Microsoft Research Cambridge

Fails at discontinuities

Fails at non-fronto-parallel planes

No continuous depth label

Slow

Page 22: Carsten Rother Microsoft Research Cambridge
Page 23: Carsten Rother Microsoft Research Cambridge

3 continuous parameters (depth + normal) for each pixel
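One common way to hold three continuous parameters per pixel is a disparity plane d(x, y) = a*x + b*y + c. The sketch below converts a point plus a normal into that form; the function names and the use of disparity in place of depth are assumptions made for illustration.

```python
def plane_from_point_normal(x0, y0, d0, normal):
    """Return (a, b, c) of the plane through (x0, y0, d0) with the given normal.
    The normal must have a non-zero third component."""
    nx, ny, nz = normal
    a = -nx / nz
    b = -ny / nz
    c = (nx * x0 + ny * y0 + nz * d0) / nz
    return a, b, c

def disparity(plane, x, y):
    """Evaluate the plane at pixel (x, y)."""
    a, b, c = plane
    return a * x + b * y + c

# A fronto-parallel normal gives a constant disparity; a tilted normal gives a slant.
flat = plane_from_point_normal(10, 20, 5.5, (0.0, 0.0, 1.0))
slanted = plane_from_point_normal(10, 20, 5.0, (1.0, 0.0, 1.0))
```

With this parameterisation the patch no longer has to be fronto-parallel, and the disparity at each pixel is a real number rather than one of a fixed set of labels.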

Page 24: Carsten Rother Microsoft Research Cambridge
Page 25: Carsten Rother Microsoft Research Cambridge

Fails at discontinuities

Fails at non-fronto-parallel planes

No continuous depth label

Slow

Page 26: Carsten Rother Microsoft Research Cambridge
Page 27: Carsten Rother Microsoft Research Cambridge

Depth map

Page 28: Carsten Rother Microsoft Research Cambridge

Depth map

Page 29: Carsten Rother Microsoft Research Cambridge

Depth map

Page 30: Carsten Rother Microsoft Research Cambridge

A red pixel means that a better solution exists in its 4-neighborhood

Page 31: Carsten Rother Microsoft Research Cambridge

A red pixel means that a better solution exists in its 4-neighborhood

Page 32: Carsten Rother Microsoft Research Cambridge

A red pixel means that a better solution exists in its 4-neighborhood

Page 33: Carsten Rother Microsoft Research Cambridge

A red pixel means that a better solution exists in its 4-neighborhood

Page 34: Carsten Rother Microsoft Research Cambridge

A red pixel means that a better solution exists in its 4-neighborhood

Page 35: Carsten Rother Microsoft Research Cambridge

A red pixel means that a better solution exists in its 4-neighborhood

Page 36: Carsten Rother Microsoft Research Cambridge

1. Random initialization
2. Go through pixels in sequential order:
2a. Consider the solution from the left/top neighbour
2b. Sample around the current solution
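The steps above can be sketched on a 1D toy problem with one scalar label in [0, 1] per pixel. This is only an illustration under stated assumptions: the cost function, the halving search radius and the alternating sweep direction are choices made for this example, and `patchmatch_1d` is a hypothetical name.

```python
import random

def patchmatch_1d(cost, n, iters=4, seed=0):
    """cost(x, u) is the matching cost of label u at pixel x."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]             # 1. random initialization
    c = [cost(x, u[x]) for x in range(n)]
    init_total = sum(c)
    for it in range(iters):
        step = 1 if it % 2 == 0 else -1              # assumed: alternate sweep direction
        order = range(n) if step == 1 else range(n - 1, -1, -1)
        for x in order:                              # 2. sequential order
            nb = x - step                            # 2a. already visited neighbour
            if 0 <= nb < n:
                cand = cost(x, u[nb])
                if cand < c[x]:
                    u[x], c[x] = u[nb], cand
            radius = 0.5                             # 2b. sample around current solution
            while radius > 1e-3:
                trial = min(1.0, max(0.0, u[x] + rng.uniform(-radius, radius)))
                cand = cost(x, trial)
                if cand < c[x]:
                    u[x], c[x] = trial, cand
                radius /= 2
    return u, init_total, sum(c)

# Toy scene made of three constant segments.
target = [0.2] * 20 + [0.5] * 20 + [0.8] * 20
labels, init_total, final_total = patchmatch_1d(
    lambda x, u: (u - target[x]) ** 2, len(target)
)
```

A candidate is accepted only when it lowers the cost, so the total cost never increases; one good random guess anywhere inside a segment can then spread to the rest of that segment through step 2a.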

Left image: Reindeer (Middlebury)

Left and right disparity maps (intermediate step of iteration 1)

Page 37: Carsten Rother Microsoft Research Cambridge
Page 38: Carsten Rother Microsoft Research Cambridge

Left image: Sawtooth (Middlebury)

Image consists of 3 planes

~80,000 guesses for the yellow plane

Ground truth disparities

Randomization is in our favour

No cost volume needed: well suited for large images and large depth range

Page 39: Carsten Rother Microsoft Research Cambridge

Left view Right view

Page 40: Carsten Rother Microsoft Research Cambridge

PatchMatch Stereo result

Page 41: Carsten Rother Microsoft Research Cambridge

Add a Markov Random Field:

Unary term (photo-consistency)

Pairwise term (local curvature)

Continuous 3-dimensional labels
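A hedged way to write such a model is the standard pairwise MRF energy below. The symbols $\psi_s$, $\psi_{st}$ and the neighbourhood set $\mathcal{N}$ are notation assumed here and are not taken from the slides; $\lambda$ weights the pairwise term and $u_s$ is the continuous 3-dimensional label (depth + normal) of pixel $s$.

```latex
E(u) = \sum_{s} \psi_s(u_s)
     + \lambda \sum_{(s,t) \in \mathcal{N}} \psi_{st}(u_s, u_t),
\qquad u_s \in \mathbb{R}^3
```

With $\lambda = 0$ only the unary (photo-consistency) term remains and every pixel can be optimised independently of its neighbours.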

Page 42: Carsten Rother Microsoft Research Cambridge

Cost ≠ 0: local curvature or discontinuity

Cost = 0: both planes are aligned in 3D

Page 43: Carsten Rother Microsoft Research Cambridge

So far, we have been running with λ = 0

For non-zero λ, with super high-dimensional u:

Gradient descent

Gradient descent + Fusion move

Relaxation + Gradient descent

Simulated Annealing

Continuous Belief Propagation

Page 44: Carsten Rother Microsoft Research Cambridge

Operation 1: compute the neg-log belief $B_s(u_s)$ at node s

Operation 2: re-compute the message between nodes s and t

Sequential schedule (figure: messages M1->2, M2->3, M1->4)

Final output: $u_s^* = \arg\min_{u_s} B_s(u_s)$
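The two operations and the final argmin can be illustrated with discrete min-sum (neg-log max-product) BP on a chain, where a forward and a backward sweep play the role of the sequential schedule. This is a sketch under assumptions: the chain topology, the discrete label set and the names `min_sum_chain` and `chain_energy` are illustrative, not the setting of the slides.

```python
def min_sum_chain(unary, pairwise):
    """unary[s][k]: cost of label k at node s; pairwise(k, l): cost between neighbours.
    Returns the label with the lowest neg-log belief at each node."""
    n, K = len(unary), len(unary[0])
    fwd = [[0.0] * K for _ in range(n)]  # message into s from s - 1
    bwd = [[0.0] * K for _ in range(n)]  # message into s from s + 1
    # Re-compute messages in sequential order, left to right, then right to left.
    for s in range(1, n):
        for k in range(K):
            fwd[s][k] = min(unary[s - 1][l] + fwd[s - 1][l] + pairwise(l, k)
                            for l in range(K))
    for s in range(n - 2, -1, -1):
        for k in range(K):
            bwd[s][k] = min(unary[s + 1][l] + bwd[s + 1][l] + pairwise(l, k)
                            for l in range(K))
    # Neg-log belief = unary cost + all incoming messages; output its argmin.
    labels = []
    for s in range(n):
        belief = [unary[s][k] + fwd[s][k] + bwd[s][k] for k in range(K)]
        labels.append(min(range(K), key=belief.__getitem__))
    return labels

def chain_energy(labels, unary, pairwise):
    """Total energy of a labelling on the chain."""
    e = sum(unary[s][k] for s, k in enumerate(labels))
    return e + sum(pairwise(a, b) for a, b in zip(labels, labels[1:]))

# Toy problem: noisy observations, quadratic unary and smoothness terms.
obs = [0.1, 0.2, 2.9, 3.2, 2.8]
K = 4
unary = [[(k - o) ** 2 for k in range(K)] for o in obs]
pairwise = lambda k, l: 0.5 * (k - l) ** 2
bp_labels = min_sum_chain(unary, pairwise)
```

On a chain the beliefs are exact min-marginals, so the per-node argmin recovers the minimum-energy labelling whenever that labelling is unique. On a grid with loops the same updates are only approximate.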

Page 45: Carsten Rother Microsoft Research Cambridge

target

Page 46: Carsten Rother Microsoft Research Cambridge

Source (shifted 4.0 + noise)

Ground Truth

Page 47: Carsten Rother Microsoft Research Cambridge

Error: 0.618; Unary only

Error: 0.251

Ground Truth

12x12 discrete labels

Page 48: Carsten Rother Microsoft Research Cambridge

target

Page 49: Carsten Rother Microsoft Research Cambridge

Source (shifted 4.2 + noise)

GT

Page 50: Carsten Rother Microsoft Research Cambridge

Error: 0.66

Error: 1.9; unary only GT

12x12 discrete labels

Page 51: Carsten Rother Microsoft Research Cambridge

Error: 5.68

Error: 3.46; unary only GT

12x12 discrete labels

Page 52: Carsten Rother Microsoft Research Cambridge

Sequential schedule (figure: messages M1->2, M2->3, M1->4)

Each pixel has a different set of particles.

(Figure: neg-log beliefs $B_s(u_s)$ and $B_t(u_t)$ over the label range [0, 1] for nodes s and t)

Comment: we do max-product, hence we may not want to approximate the true continuous distributions

Page 53: Carsten Rother Microsoft Research Cambridge

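Sequential schedule (figure: messages M1->2, M2->3, M1->4)

Smoothness term between nodes s and t: $(u_s - u_t)^2$ (figure: particles $u_s$ and $u_t$ over the label range [0, 1])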

Page 54: Carsten Rother Microsoft Research Cambridge

Sequential schedule (figure: messages M1->2, M2->3, M1->4)

Sample around current particles (figure: particles $u_s$ of node s over the label range [0, 1])

Final output: $u_s^* = \arg\min_{u_s} B_s(u_s)$

Page 55: Carsten Rother Microsoft Research Cambridge

GT

• Discrete: Error: 5.68

• Random init: Energy: 47308, Error: 0.9713

• Best unary init (144 discrete): Energy: 42628, Error: 0.8259

Page 56: Carsten Rother Microsoft Research Cambridge

The message $M_{t \to s}$ has high values for $u_s = u_t$, since the smoothness term is $(u_s - u_t)^2$

PM idea: also sample at your neighbours' solutions!

We call this variant of Particle BP PatchMatch BP (PMBP)
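A compact sketch of the idea on a chain with scalar labels in [0, 1]: every node keeps a small particle set, proposals come from the neighbours' particles and from perturbing the node's own particles, and the particles with the lowest neg-log belief survive. The smoothness term $(u_s - u_t)^2$ is the one named on the slide; the chain topology, the perturbation radius of 0.1, the particle count and the name `pmbp_chain` are assumptions, and this is not the authors' implementation.

```python
import random

def pmbp_chain(unary, n, lam, n_particles=3, iters=5, seed=0):
    rng = random.Random(seed)
    # Each node keeps its own particle set (candidate labels in [0, 1]).
    P = [[rng.random() for _ in range(n_particles)] for _ in range(n)]
    # inc[s][i][t]: stored neg-log message from neighbour t at particle i of node s.
    inc = [[{} for _ in range(n_particles)] for _ in range(n)]

    def nbrs(s):
        return [t for t in (s - 1, s + 1) if 0 <= t < n]

    def message(t, s, u_s):
        # Min over particles of t: unary + smoothness + messages into t except from s.
        return min(
            unary(t, u_t) + lam * (u_s - u_t) ** 2
            + sum(v for r, v in inc[t][i].items() if r != s)
            for i, u_t in enumerate(P[t])
        )

    def scored(s, u_s):
        msgs = {t: message(t, s, u_s) for t in nbrs(s)}
        return unary(s, u_s) + sum(msgs.values()), u_s, msgs

    for it in range(iters):
        order = range(n) if it % 2 == 0 else range(n - 1, -1, -1)
        for s in order:
            cands = list(P[s])
            for t in nbrs(s):
                cands += P[t]                       # PM idea: neighbours' solutions
            cands += [min(1.0, max(0.0, u + rng.uniform(-0.1, 0.1))) for u in P[s]]
            cands = list(dict.fromkeys(cands))      # drop duplicates, keep order
            best = sorted((scored(s, c) for c in cands), key=lambda x: x[0])
            best = best[:n_particles]
            P[s] = [u for _, u, _ in best]
            inc[s] = [m for _, _, m in best]

    # Final output: the particle with the lowest neg-log belief at each node.
    return [min(P[s], key=lambda u: scored(s, u)[0]) for s in range(n)]

target = [0.2] * 5 + [0.8] * 5
unary = lambda s, u: (u - target[s]) ** 2
labels = pmbp_chain(unary, len(target), lam=0.5)
```

Calling the same function with `lam=0.0` and `n_particles=1` makes the belief differ from the unary cost only by a constant, so each node simply keeps the cheapest label it has seen among its own perturbations and its neighbours' labels.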

Page 57: Carsten Rother Microsoft Research Cambridge

GT

• Best unary init: Energy: 42628, Error: 0.8259

• Random init, 50 particles: Energy: 21959, Error: 0.4159

• Random init, 1 particle: Energy: 22593, Error: 0.3864

Page 58: Carsten Rother Microsoft Research Cambridge

1 particle: Energy: 22593, Error: 0.3864

50 particles: Energy: 21959, Error: 0.4159

Page 59: Carsten Rother Microsoft Research Cambridge
Page 60: Carsten Rother Microsoft Research Cambridge

PatchMatch is a special form of Particle BP:

λ = 0

1 particle per node

Sample from neighbour nodes

Page 61: Carsten Rother Microsoft Research Cambridge
Page 62: Carsten Rother Microsoft Research Cambridge

Iterate two steps (in a nutshell):

1) Run full BP until convergence (convex version which solves the LP relaxation)

2) Sample all nodes individually

Page 63: Carsten Rother Microsoft Research Cambridge

Highly ranked in Middlebury Table

Page 64: Carsten Rother Microsoft Research Cambridge
Page 65: Carsten Rother Microsoft Research Cambridge
Page 66: Carsten Rother Microsoft Research Cambridge

• PatchMatch stereo, BMVC ’11

• PatchMatchBP stereo, BMVC ’12

• SurfaceStereo, ObjectStereo, CVPR ’10, ’11

• SceneStereo, ECCV ’12

• Learning interactive image segmentation, IJCV ‘12

Page 67: Carsten Rother Microsoft Research Cambridge

Ultimate Goal:
- Recover: geometry, light, material
- Recognise: object instances, attributes
- … and do that jointly

Theoretical Challenges:
- Statistical models of the world and the captured images
- Combine statistical priors and physical constraints

Practical Challenges:
- Robustness
- Real-time inference
- Task-driven, e.g. robotics

To achieve this:
- Latest machine learning
- Latest optimization techniques

Page 68: Carsten Rother Microsoft Research Cambridge

Assignment of pixels to surfaces

Simple explanation: describe the scene by a few low-degree surfaces (splines, planes)

Goal: depth estimation improves

Without prior With prior

Page 69: Carsten Rother Microsoft Research Cambridge

Simple explanation: describe the scene by a few objects:
- compact in 3D
- connected in 3D
- each object has a color model

Goal: depth estimation improves

Objects o

Depth d

Objects o

Page 70: Carsten Rother Microsoft Research Cambridge

Simple explanation: describe the scene by a few objects:
- compact in 3D (use bbox)
- each object has a color model
- physical constraints

Goals:
1) depth estimation improves
2) object extraction improves

Page 71: Carsten Rother Microsoft Research Cambridge

1) Create proposal pool

2) Rank proposal pool

3) Combine best objects and recognize

Use stereo images

boat

sky

water

Goals:

• Reason in 3D with physical constraints

• Improve depth estimation

Page 72: Carsten Rother Microsoft Research Cambridge

Left input image

Object labelling proposal 1

Object labelling proposal 2

Output:
- Object labelling
- Depth labelling
- Object 3D bounding boxes
- Object colour distribution

Page 73: Carsten Rother Microsoft Research Cambridge

Stereo: photo-consistency

Objects:

colour model

Prior on number of objects

Left input image PatchMatch Stereo Result

Object mask

Depth map

Page 74: Carsten Rother Microsoft Research Cambridge

Physical properties:

Bounding Box tightness

Bounding Box intersection

Bounding Box Gravity

Page 75: Carsten Rother Microsoft Research Cambridge

Merging (simulated annealing, patchmatch)

Exploration (mean-shift, patchmatch)

Object maps

Multiple Scene Proposals by varying the prior on number of objects

Page 76: Carsten Rother Microsoft Research Cambridge

Good rank in Middlebury table

Green: this term is useful

All Terms are useful

Page 77: Carsten Rother Microsoft Research Cambridge

(Figure: input images, ground truth and our labelling; comparison panels labelled 2D, Object stereo, Ours and GT)

Page 78: Carsten Rother Microsoft Research Cambridge

Large-scale training and testing

Real-time

Do full 3D reconstruction (KinectFusion)

Model all physical properties: Light, Material

Use a graphics engine for training and testing (“analysis by synthesis”)

Page 79: Carsten Rother Microsoft Research Cambridge

• PatchMatch stereo, BMVC ’11

• PatchMatchBP stereo, BMVC ’12

• SurfaceStereo, ObjectStereo, CVPR ’10, ’11

• SceneStereo, ECCV ’12

• Learning interactive image segmentation, IJCV ‘12

Page 80: Carsten Rother Microsoft Research Cambridge

Weights w

Training Time

How much user input shall we use for learning?

predictions

Testing Time

prediction

prediction

Page 81: Carsten Rother Microsoft Research Cambridge

Static brush

Static trimap

Training Time Testing Time

Goal: User should reach a satisfying result in as few interactions as possible

Define: “interaction” and “satisfying”

Page 82: Carsten Rother Microsoft Research Cambridge

Human (averaged over 6 users)

Computer (simulated brush strokes)

Algorithmic State

Suggested action

Ground Truth

Current Solution

Page 83: Carsten Rother Microsoft Research Cambridge

What type of user? (novice user, advanced user)

Adjusting weights with the learning curve of the user

Other interactive systems

Page 84: Carsten Rother Microsoft Research Cambridge

• PatchMatch stereo, BMVC ’11

• PatchMatchBP stereo, BMVC ’12

• SurfaceStereo, ObjectStereo, CVPR ’10, ’11

• SceneStereo, ECCV ’12

• Learning interactive image segmentation, IJCV ‘12