Learning Correspondence from the Cycle-consistency of Time (CVPR 2019 Oral). Xiaolong Wang (CMU), Allan Jabri (UC Berkeley), Alexei A. Efros (UC Berkeley)


Jul 21, 2020



Transcript
Page 1: Learning Correspondence from the Cycle-consistency of Time

Learning Correspondence from the Cycle-consistency of Time (CVPR 2019 Oral)

Xiaolong Wang (CMU), Allan Jabri (UC Berkeley), Alexei A. Efros (UC Berkeley)

Page 2:

Task: Visual Correspondence

— A Young Student: “What are the three most important problems in computer vision?”
— Takeo Kanade: “Correspondence, correspondence, correspondence!”

This paper: a self-supervised method for learning visual correspondence from unlabeled videos.
https://ajabri.github.io/timecycle/

“Correspondence is the glue that links disparate visual percepts into persistent entities and underlies visual reasoning in space and time”

Page 3:

The main idea is to use cycle-consistency in time as a free supervisory signal.

Motivation: Cycle-Consistency

The feature representation and the tracker are complementary.

Learn both the representation and the tracker simultaneously, in a self-supervised manner. The learned representation can be used at test time as a distance metric for correspondence.

In this example, the blue patch in frame t is tracked backward to frame t-2 and then tracked forward back to frame t. The distance between the blue and red patches in frame t serves as the loss function.

Because the supervision is self-generated, the training data is effectively unlimited.
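The cycle described above can be sketched as a toy loss: a position is pushed backward through time and then forward again, and the loss is how far it ends up from where it started. Everything here (the toy `track` step, the flow lists) is illustrative, not the paper's actual tracker.

```python
import numpy as np

def track(pos, flow):
    """Toy tracker: move a 2-D position by a per-step displacement."""
    return pos + flow

def cycle_consistency_loss(start, backward_flows, forward_flows):
    """Track backward k steps, then forward k steps; return the
    distance between the end position and the start position."""
    pos = np.asarray(start, dtype=float)
    for f in backward_flows:   # t -> t-1 -> ... -> t-k
        pos = track(pos, f)
    for f in forward_flows:    # t-k -> ... -> t
        pos = track(pos, f)
    return float(np.linalg.norm(pos - np.asarray(start, dtype=float)))

# A perfect cycle returns to the start, so the loss is zero:
backward = [np.array([-1.0, 0.0]), np.array([0.0, -2.0])]
forward  = [np.array([0.0, 2.0]), np.array([1.0, 0.0])]
print(cycle_consistency_loss([5.0, 5.0], backward, forward))  # 0.0
```

Minimizing this quantity over many patches and videos is what drives both the tracker and the features, without any labels.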

Page 4:

Motivation: Challenges

1. Learning can take shortcuts (e.g. a static tracker) >>> force re-localization

2. The cycle may break (e.g. sudden changes in object pose, or occlusions) >>> skip-cycles

3. Correspondence may be poor early in training (shorter cycles are easier to learn) >>> cycles of different lengths

Page 5:

Method: Formulation

Feature encoder: used to find correspondence at test time.

Differentiable tracker: used only during training. It should be weak, so that we are forced to learn a strong representation.

Recurrent tracking formulation:

1. Encode the image sequence and the patch to track.

2. Find the most similar patch in the image features.

3. Iterative backward tracking.

4. Iterative forward tracking.
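Step 2 above (finding the most similar patch) amounts to an affinity computation between the patch feature and every spatial location of the frame's feature map. A minimal sketch, assuming a pooled patch feature and cosine similarity; the function name and shapes are hypothetical:

```python
import numpy as np

def best_match(img_feat, patch_feat):
    """img_feat: (H, W, C) feature map; patch_feat: (C,) pooled patch feature.
    Returns the (row, col) of the most similar spatial location."""
    h, w, c = img_feat.shape
    flat = img_feat.reshape(-1, c)
    # cosine similarity between the patch and each spatial location
    sims = flat @ patch_feat / (
        np.linalg.norm(flat, axis=1) * np.linalg.norm(patch_feat) + 1e-8)
    idx = int(np.argmax(sims))
    return idx // w, idx % w

rng = np.random.default_rng(0)
fmap = rng.standard_normal((30, 30, 8))
target = fmap[12, 7]                # plant the query at a known location
print(best_match(fmap, target))     # (12, 7)
```

In the paper's recurrent formulation this matching step is applied iteratively, frame by frame, first backward in time and then forward; here the hard argmax stands in for the soft, differentiable localization the tracker uses during training.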

Page 6:

Method: Learning Objectives

Cycle-consistency (Full)

Cycle-consistency (Skip)

Patch Similarity

Cycles with different lengths, k = 4

Page 7:

Method: Encoder

The architecture of the encoder determines the type of correspondence.

A mid-level deep feature map is used, which is coarser than pixel space but with sufficient spatial resolution to support tasks that require localization.

• ResNet-50 architecture without the final 3 residual blocks
• Input frames are 240 × 240 pixels; spatial features are thus 30 × 30.
• Patches are randomly cropped to 80 × 80; spatial features are thus 10 × 10.
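As a quick sanity check, the resolutions quoted above follow from the stride-8 downsampling of the truncated network (the helper below is just that arithmetic, not the authors' code):

```python
def feat_size(pixels, stride=8):
    """Spatial size of a stride-`stride` feature map for a square input."""
    return pixels // stride

print(feat_size(240), feat_size(80))  # 30 10
# full frames: 240 x 240 -> 30 x 30 features
# patches:      80 x 80  -> 10 x 10 features
```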

Page 8:

Method: Tracker

Page 9:

Method: Sampler, θ = (2, 1, 180°)

Image (5 × 5, shown with x and y axes):
1 2 3 4 3
3 2 0 3 0
0 2 4 4 5
1 1 3 2 2
2 2 2 3 1

Page 10:

Method: Sampler, θ = (2, 1, 180°)

Translation: (2, 1)

[Figure: the same 5 × 5 image with a 3 × 3 sampling grid translated by (2, 1); grid x-coordinates run 2–4 and y-coordinates 1–3.]

Page 11:

Method: Sampler, θ = (2, 1, 180°)

Translation: (2, 1)
Rotation: 180°

[Figure: the 3 × 3 sampling grid after translation and a 180° rotation; the grid x-coordinates now run 4–2 and y-coordinates 3–1.]

Page 12:

Method: Sampler, θ = (2, 1, 180°)

Translation: (2, 1)
Rotation: 180°

[Figure: the rotated sampling grid written as (x, y) pairs, from (4, 3) down to (2, 1).]

Page 13:

Method: Sampler, θ = (2, 1, 180°)

Translation: (2, 1)
Rotation: 180°

[Figure: sampling the image at the rotated grid locations (4, 3) through (2, 1) yields the 3 × 3 patch
2 2 3
5 4 4
0 3 0]
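The sampler walkthrough above can be reproduced in miniature. The sketch below is a toy version under stated assumptions: nearest-neighbour lookup on an integer grid instead of the bilinear interpolation a differentiable sampler would use, and a hypothetical `sample_patch` helper whose θ is a translation (the patch center) plus a rotation angle, as in the slides' θ = (2, 1, 180°) example:

```python
import numpy as np

def sample_patch(image, center, angle_deg, size=3):
    """Sample a size x size patch around `center` (row, col),
    with the sampling grid rotated by `angle_deg`."""
    h, w = image.shape
    ang = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(ang), -np.sin(ang)],
                    [np.sin(ang),  np.cos(ang)]])
    half = size // 2
    out = np.zeros((size, size), dtype=image.dtype)
    for i in range(size):
        for j in range(size):
            # rotate the (x, y) grid offset, then read the nearest pixel
            offset = rot @ np.array([j - half, i - half], dtype=float)
            y = int(round(center[0] + offset[1]))
            x = int(round(center[1] + offset[0]))
            out[i, j] = image[y % h, x % w]   # wrap to stay in bounds
    return out

img = np.arange(25).reshape(5, 5)
# With a 180-degree rotation the sampled patch is the neighbourhood
# around the center, flipped in both axes:
print(sample_patch(img, center=(2, 2), angle_deg=180))
```

In the actual method the grid coordinates stay continuous and the lookup is bilinear, so gradients flow from the loss back through θ into the tracker and the features.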



Page 17:

Experiments: Setup

Training:
• VLOG dataset: 114K videos, 344 hours
• No annotation, no pre-training, no fine-tuning

Tasks (label propagation from the first frame):
• Video object segmentation (DAVIS-2017)
• Human pose keypoints (JHMDB)
• Instance-level and semantic-level masks (VIP)

Testing: propagation by k-NN

Compared to:
- Baselines: identity propagation, optical flow, SIFT Flow
- Other self-supervised methods: video colorization, Transitive Invariance, DeepCluster
- ImageNet pre-training
- Fully-supervised methods
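The test-time protocol (propagation by k-NN) can be sketched as follows: each location in the current frame copies the label of its nearest neighbours, in feature space, among the labeled reference frame's locations. Shapes and names here are illustrative, not the authors' code:

```python
import numpy as np

def propagate_labels(ref_feats, ref_labels, cur_feats, k=1):
    """ref_feats: (N, C), ref_labels: (N,) int labels,
    cur_feats: (M, C) -> (M,) propagated labels."""
    # pairwise squared distances between current and reference features
    d = ((cur_feats[:, None, :] - ref_feats[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d, axis=1)[:, :k]        # indices of k nearest refs
    votes = ref_labels[nn]                   # (M, k) candidate labels
    # majority vote over the k neighbours
    return np.array([np.bincount(v).argmax() for v in votes])

ref = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 1])                    # e.g. background / object
cur = np.array([[0.5, 0.2], [9.0, 9.5]])
print(propagate_labels(ref, labels, cur))    # [0 1]
```

Because propagation uses only nearest-neighbour lookups in the learned feature space, no fine-tuning is needed for any of the three tasks.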

Page 18:

Experiments: Example

Page 19:

Experiments: Video object segmentation

Page 20:

Experiments: Keypoints propagation

Page 21:

Experiments: Instance-level and semantic-level masks propagation

Page 22:

Experiments: Visualization

Page 23:

Thank you