Discriminative Correlation Filters for Visual Tracking Martin Danelljan

Jan 15, 2022

Page 1: Discriminative Correlation Filters for Visual Tracking

Discriminative Correlation Filters for Visual Tracking

Martin Danelljan

Page 2: Discriminative Correlation Filters for Visual Tracking

Overview – Part I

Part I: Basics of Discriminative Correlation Filters

1. The Visual Tracking problem

2. DCF – the simple case

3. Multi-channel, multi-sample DCF

4. Special cases and approximate inference

5. Tracking pipeline and practical considerations

6. Kernels

7. Scale estimation

8. Periodic assumption: problem and solutions

Page 3: Discriminative Correlation Filters for Visual Tracking

Overview – Part II

Part II: Advanced topics in DCF tracking

1. Training set management

2. Deep image representations for tracking

3. Continuous-space formulation

4. Efficient Convolution Operators (ECO)

5. End-to-end Learning with DCF

6. Empowering deep features

Page 4: Discriminative Correlation Filters for Visual Tracking

Visual Tracking

Page 5: Discriminative Correlation Filters for Visual Tracking

Visual Tracking

Page 6: Discriminative Correlation Filters for Visual Tracking

Visual Tracking

• Only the initial target location is known

• Challenges

– Environmental: occlusions, blur, clutter, illumination

– Motion/transformations: rotations, fast motion, scale change

– Appearance changes: deformations

Page 7: Discriminative Correlation Filters for Visual Tracking

Applications

Robotics, AR/VR, autonomous driving, video analysis …

Page 8: Discriminative Correlation Filters for Visual Tracking

Discriminative Correlation Filters (DCF) – The Basics

Page 9: Discriminative Correlation Filters for Visual Tracking

Discriminative Correlation Filters

What is it?

• Discriminatively learn a correlation filter

• Utilize the Fourier transform for efficiency

Why use it?

• Translation invariance ⇒ Correlation

• State-of-the-art since 2014

• Accuracy (even sub-pixel)

• Generic and customizable

Page 10: Discriminative Correlation Filters for Visual Tracking

DCF Popularity and Performance

• Hundreds of papers since 2014

• Winner of Visual Object Tracking (VOT) Challenge 2014, 2016, 2017 and 2018

• In VOT 2018: all top-5 trackers are based on DCF

Page 11: Discriminative Correlation Filters for Visual Tracking

DCF – the Simple Case

DFT

Page 12: Discriminative Correlation Filters for Visual Tracking

DCF – the Simple Case

Page 13: Discriminative Correlation Filters for Visual Tracking

DCF – the Simple Case

Target prediction:
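The simple case can be sketched in NumPy as follows. This is a minimal illustration, not the slide's own formulas (which did not survive extraction): it assumes one grayscale training patch `x`, a Gaussian desired output `y`, a regularization weight `lam`, and circular convolution. All function and variable names are hypothetical.

```python
import numpy as np

def train_filter(x, y, lam=1e-2):
    """Closed-form filter (Fourier domain) for one sample and one channel."""
    x_hat = np.fft.fft2(x)
    y_hat = np.fft.fft2(y)
    # Minimizes ||f * x - y||^2 + lam * ||f||^2 under circular convolution.
    return np.conj(x_hat) * y_hat / (np.abs(x_hat) ** 2 + lam)

def predict_target(f_hat, z):
    """Score map for a test patch z and the location of its maximum."""
    scores = np.real(np.fft.ifft2(f_hat * np.fft.fft2(z)))
    peak = np.unravel_index(np.argmax(scores), scores.shape)
    return scores, (int(peak[0]), int(peak[1]))

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))              # stand-in training patch
rows, cols = np.mgrid[0:32, 0:32]
y = np.exp(-((rows - 16) ** 2 + (cols - 16) ** 2) / (2 * 2.0 ** 2))
f_hat = train_filter(x, y)

z = np.roll(x, shift=(3, -5), axis=(0, 1))     # target moved by (3, -5)
scores, peak = predict_target(f_hat, z)
print(peak)  # the peak follows the target: (19, 11)
```

Training and detection each cost a few FFTs and element-wise operations, which is the efficiency argument made earlier for using the Fourier transform.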

Page 14: Discriminative Correlation Filters for Visual Tracking
Page 15: Discriminative Correlation Filters for Visual Tracking

Standard DCF Formulation

1. Multiple training samples

2. Multidimensional sophisticated features

Page 16: Discriminative Correlation Filters for Visual Tracking

Standard DCF Formulation

Page 17: Discriminative Correlation Filters for Visual Tracking

Standard DCF Formulation

weights

Page 18: Discriminative Correlation Filters for Visual Tracking

Inference

• DFT and Parseval’s theorem:

Page 19: Discriminative Correlation Filters for Visual Tracking

Inference

Page 20: Discriminative Correlation Filters for Visual Tracking

Inference

Page 21: Discriminative Correlation Filters for Visual Tracking

Inference

Page 22: Discriminative Correlation Filters for Visual Tracking

Inference

Blocks of 𝑚 × 𝐷

Blocks of D × 𝐷

Page 23: Discriminative Correlation Filters for Visual Tracking

Special Case 1: 𝐷 = 1

Only a single feature channel:

The original MOSSE filter [Bolme et al., CVPR 2010].

Page 24: Discriminative Correlation Filters for Visual Tracking

Dual form

Blocks of 𝑚 ×𝑚

Page 25: Discriminative Correlation Filters for Visual Tracking

Special Case 2: m = 1

Only a single training sample:

[Danelljan et al., BMVC 2014, PAMI 2017]
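For a single training sample with D feature channels, the filter has a per-frequency closed form in which all channels share one denominator. The sketch below assumes a feature map of shape `(D, H, W)`; the names and the value of `lam` are illustrative, not taken from the slides.

```python
import numpy as np

def train_multichannel(x, y, lam=1e-2):
    """x: (D, H, W) features of one sample, y: (H, W) desired output."""
    x_hat = np.fft.fft2(x)                       # FFT over the last two axes
    y_hat = np.fft.fft2(y)
    denominator = np.sum(np.abs(x_hat) ** 2, axis=0) + lam
    return np.conj(x_hat) * y_hat / denominator  # one filter per channel

def predict_multichannel(f_hat, z):
    """Sum the per-channel responses, then return to the spatial domain."""
    z_hat = np.fft.fft2(z)
    return np.real(np.fft.ifft2(np.sum(f_hat * z_hat, axis=0)))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16, 16))
rows, cols = np.mgrid[0:16, 0:16]
y = np.exp(-((rows - 8) ** 2 + (cols - 8) ** 2) / (2 * 1.5 ** 2))
f_hat = train_multichannel(x, y)

z = np.roll(x, shift=(2, -3), axis=(1, 2))       # same target, shifted
scores = predict_multichannel(f_hat, z)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))
```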

Page 26: Discriminative Correlation Filters for Visual Tracking

Approximate inference

1. Independent samples:

– Optimal for 𝑚 = 1

2. Independent channels:

– Optimal for D = 1

3. Combination:

– Optimal for 𝑚 = 1

– Optimal for D = 1

Page 27: Discriminative Correlation Filters for Visual Tracking

General tracking pipeline

1. Initialize model in first frame

2. Track in the new frame

3. Update model and go to step 2.

Page 28: Discriminative Correlation Filters for Visual Tracking

Tracking pipeline: example

1. Initialize

2. Track

3. Update

Target location

Learning rate
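The three pipeline steps can be sketched as a small class. This is a simplified single-channel illustration that assumes a running-average update of the filter's numerator and denominator; the class name, the learning-rate value, and the synthetic frames are assumptions made for the example.

```python
import numpy as np

class SimpleDCFTracker:
    """Single-channel DCF with a running-average model update (a sketch)."""

    def __init__(self, x, y, lam=1e-2, learning_rate=0.025):
        self.lam = lam
        self.learning_rate = learning_rate
        self.y_hat = np.fft.fft2(y)
        x_hat = np.fft.fft2(x)
        # 1. Initialize: numerator and denominator of the closed-form filter.
        self.numerator = np.conj(x_hat) * self.y_hat
        self.denominator = np.abs(x_hat) ** 2

    def track(self, z):
        # 2. Track: the argmax of the score map is the new target location.
        f_hat = self.numerator / (self.denominator + self.lam)
        scores = np.real(np.fft.ifft2(f_hat * np.fft.fft2(z)))
        peak = np.unravel_index(np.argmax(scores), scores.shape)
        return int(peak[0]), int(peak[1])

    def update(self, x):
        # 3. Update: blend in the new sample with the learning rate.
        x_hat = np.fft.fft2(x)
        lr = self.learning_rate
        self.numerator = (1 - lr) * self.numerator + lr * np.conj(x_hat) * self.y_hat
        self.denominator = (1 - lr) * self.denominator + lr * np.abs(x_hat) ** 2

rng = np.random.default_rng(1)
frame0 = rng.standard_normal((32, 32))
rows, cols = np.mgrid[0:32, 0:32]
label = np.exp(-((rows - 16) ** 2 + (cols - 16) ** 2) / 8.0)

tracker = SimpleDCFTracker(frame0, label)
frame1 = np.roll(frame0, shift=(2, 4), axis=(0, 1))
location = tracker.track(frame1)
# Re-center the patch on the estimated location before updating the model.
recentered = np.roll(frame1, shift=(16 - location[0], 16 - location[1]), axis=(0, 1))
tracker.update(recentered)
```

A real tracker would crop each patch from the video frame around the current estimate instead of using `np.roll`, which here only simulates target motion.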

Page 29: Discriminative Correlation Filters for Visual Tracking

Practical considerations

1. Multiply samples with cosine window

– Reduces boundary effects
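A cosine (Hann) window can be built as the outer product of two 1-D windows. In this sketch `np.hanning` is the NumPy routine; the helper name and the patch sizes are illustrative.

```python
import numpy as np

def cosine_window(height, width):
    """2-D Hann window that tapers a patch to zero at its borders."""
    return np.outer(np.hanning(height), np.hanning(width))

patch = np.ones((4, 6))
windowed = patch * cosine_window(4, 6)

# For a multi-channel feature map of shape (D, H, W), broadcasting applies
# the same window to every channel.
features = np.ones((3, 4, 6))
windowed_features = features * cosine_window(4, 6)
```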

Page 30: Discriminative Correlation Filters for Visual Tracking

Practical considerations

2. For the desired output y: use a Gaussian function

– Centered at target location

– Peak width parameter

– Motivation: a Gaussian attains the lower bound of the uncertainty principle, so it is well localized in both space and frequency
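A Gaussian desired output can be generated as below. The sketch wraps distances around the patch borders so that the label is periodic, matching the DFT; the function name and the parameter `sigma` (standing in for the peak width parameter) are assumptions.

```python
import numpy as np

def gaussian_label(height, width, center, sigma):
    """Periodic Gaussian label with its peak at `center` (row, col)."""
    rows = np.arange(height) - center[0]
    cols = np.arange(width) - center[1]
    # Wrap offsets into [-size/2, size/2) so the label is periodic.
    rows = (rows + height // 2) % height - height // 2
    cols = (cols + width // 2) % width - width // 2
    dist2 = rows[:, None] ** 2 + cols[None, :] ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))

label = gaussian_label(32, 32, center=(0, 0), sigma=2.0)
```

A smaller `sigma` gives a sharper peak, and a larger one gives a wider peak.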

Page 31: Discriminative Correlation Filters for Visual Tracking

Kernelized Correlation Filters

Page 32: Discriminative Correlation Filters for Visual Tracking

Kernelized Correlation Filters (KCF)

• Henriques et al. [ECCV 2012, PAMI 2014]

• Idea: apply the kernel trick to the DCF

Kernel:

Shift invariant:

Example:

Shift operator:

Page 33: Discriminative Correlation Filters for Visual Tracking

Kernelized Correlation Filters (KCF)

• Kernelized correlation:

• Train model:

• Target scores:

• Approximate update rules [Henriques et al., 2012; Danelljan et al., 2014].
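The three steps (kernelized correlation, training, target scores) can be sketched for a Gaussian kernel and a single-channel patch. The slide's own equations did not survive extraction, so the normalization by `x.size`, the parameter values, and all names below are assumptions for illustration.

```python
import numpy as np

def gaussian_correlation(x, z, sigma):
    """Kernel values between z and every cyclic shift of x, via the FFT."""
    cross = np.real(np.fft.ifft2(np.fft.fft2(z) * np.conj(np.fft.fft2(x))))
    dist2 = np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * cross
    # Dividing by x.size is a normalization choice made for this sketch.
    return np.exp(-np.maximum(dist2, 0.0) / (sigma ** 2 * x.size))

def train_kcf(x, y, sigma=0.5, lam=1e-4):
    """Dual coefficients (Fourier domain) of kernel ridge regression."""
    k_xx = gaussian_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k_xx) + lam)

def detect_kcf(alpha_hat, x, z, sigma=0.5):
    """Target scores for every cyclic shift of the test patch z."""
    k_xz = gaussian_correlation(x, z, sigma)
    return np.real(np.fft.ifft2(alpha_hat * np.fft.fft2(k_xz)))

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
rows, cols = np.mgrid[0:32, 0:32]
y = np.exp(-((rows - 16) ** 2 + (cols - 16) ** 2) / 8.0)
alpha_hat = train_kcf(x, y)

z = np.roll(x, shift=(3, -5), axis=(0, 1))
scores = detect_kcf(alpha_hat, x, z)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))
```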

Page 34: Discriminative Correlation Filters for Visual Tracking

Kernelized Correlation Filters (KCF)

Should you use kernels?

More complicated learning

Harder to generalize

More costly

Similar or poorer performance

Essence of deep learning:

– Learn your feature mapping instead

Page 35: Discriminative Correlation Filters for Visual Tracking

Scale Estimation

Martin Danelljan, Gustav Häger, Fahad Shahbaz Khan, and Michael Felsberg. “Discriminative Scale Space Tracking”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 39.8 (2017), pp. 1561–1575.

Page 36: Discriminative Correlation Filters for Visual Tracking

Scale Estimation

Page 37: Discriminative Correlation Filters for Visual Tracking

Approach 1: Multi-scale detection

1. Extract test samples at multiple scales

2. Compute scores at each scale

3. Find max position and scale
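Step 3 amounts to a joint argmax over a stack of score maps, one per scale. In the sketch below the number of scales and the scale step are illustrative values, and the score maps are synthetic.

```python
import numpy as np

def scale_factors(num_scales=5, step=1.02):
    """Geometric scale factors centered on 1.0 (the current target size)."""
    exponents = np.arange(num_scales) - (num_scales - 1) / 2
    return step ** exponents

def best_position_and_scale(score_maps, factors):
    """score_maps: (S, H, W), one score map per scale factor."""
    s, r, c = np.unravel_index(np.argmax(score_maps), score_maps.shape)
    return (int(r), int(c)), float(factors[s])

factors = scale_factors()
score_maps = np.zeros((5, 8, 8))
score_maps[3, 2, 6] = 1.0                      # synthetic maximum
position, scale = best_position_and_scale(score_maps, factors)
```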

Page 38: Discriminative Correlation Filters for Visual Tracking

Approach 2: Scale filter

• Idea: train a separate 1-dimensional scale DCF

• Directly discriminates between scales

• Discriminative Scale Space Tracker (DSST) [Danelljan et al., BMVC 2014, PAMI 2017]

Page 39: Discriminative Correlation Filters for Visual Tracking

Discriminative Scale Space Tracker

[Figure: scale training sample x (axis: target size) and desired output y (axis: confidence score)]

Page 40: Discriminative Correlation Filters for Visual Tracking

Discriminative Scale Space Tracker

Page 41: Discriminative Correlation Filters for Visual Tracking

Discriminative Scale Space Tracker

Page 42: Discriminative Correlation Filters for Visual Tracking

Evaluation Measures

Predicted box

Ground-truth box
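The slide's figure is lost. Assuming it shows the usual bounding-box overlap (intersection over union) between the predicted and ground-truth boxes, that measure and the overlap precision derived from it can be sketched as follows. The `(x, y, width, height)` box format and the 0.5 threshold are assumptions.

```python
def overlap(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0

def overlap_precision(predicted, ground_truth, threshold=0.5):
    """Fraction of frames whose overlap exceeds the threshold."""
    hits = [overlap(p, g) > threshold for p, g in zip(predicted, ground_truth)]
    return sum(hits) / len(hits)

iou = overlap((0, 0, 10, 10), (5, 5, 10, 10))   # 25 / 175
precision = overlap_precision(
    [(0, 0, 10, 10), (0, 0, 10, 10)],
    [(0, 0, 10, 10), (20, 20, 5, 5)],
)
```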

Page 43: Discriminative Correlation Filters for Visual Tracking

Scale Estimation Results

OTB-2013 dataset [Wu et al., CVPR 2013]

Page 44: Discriminative Correlation Filters for Visual Tracking

Scale estimation: Comparison

Approach 2 (scale filter):

• Faster

• Generic (used in many different trackers)

• Often more accurate for simple DCF trackers

Approach 1 (multi-scale detection):

• Slower

• Often more accurate for advanced DCF trackers (presented next)

Page 45: Discriminative Correlation Filters for Visual Tracking

The Periodic Assumption: Problem and Solutions

Page 46: Discriminative Correlation Filters for Visual Tracking

Periodic Assumption in DCF

What we want… What actually happens…

Page 47: Discriminative Correlation Filters for Visual Tracking

Larger Samples?

Page 48: Discriminative Correlation Filters for Visual Tracking

Why?

Learned filter

Page 49: Discriminative Correlation Filters for Visual Tracking

Effects of Periodic Assumption

Forces a small sample size in training/detection

Effects:

• Limits training data

• Corrupts data

• Limits search region

Page 50: Discriminative Correlation Filters for Visual Tracking

Tackling the Periodic Assumption

We need means of controlling the filter extent!

• Enables larger samples.

Approaches:

1. Constrained optimization

2. Spatial regularization

Page 51: Discriminative Correlation Filters for Visual Tracking

Constrained Optimization

• Idea: Constrain filter coefficients to be zero outside the target bounding box.

• Rewrite constraint:

background pixels

Inverse Fourier transform

Page 52: Discriminative Correlation Filters for Visual Tracking

Constrained Optimization

• Fourier domain formulation:

target pixels

Page 53: Discriminative Correlation Filters for Visual Tracking

Constrained Optimization

• Generates dense normal equations

• Use iterative solvers:

– ADMM [H.K. Galoogahi, CVPR 2015]

– Proximal gradient [J.A. Fernandez, PAMI 2015]

• Requires iterating between the spatial and Fourier domains
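The constraint itself is easy to enforce in the spatial domain, which is why these solvers alternate between domains. The sketch below shows only that projection step, not a full ADMM or proximal gradient solver; the mask layout and names are illustrative.

```python
import numpy as np

def project_onto_target(f_hat, target_mask):
    """Zero the filter outside the target box (done in the spatial domain)."""
    f = np.real(np.fft.ifft2(f_hat))           # inverse Fourier transform
    return np.fft.fft2(f * target_mask)        # back to the Fourier domain

rng = np.random.default_rng(0)
f_hat = np.fft.fft2(rng.standard_normal((16, 16)))
target_mask = np.zeros((16, 16))
target_mask[4:12, 5:11] = 1.0                  # 1 on target pixels, 0 on background

f_hat_constrained = project_onto_target(f_hat, target_mask)
f_constrained = np.real(np.fft.ifft2(f_hat_constrained))
```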

Page 54: Discriminative Correlation Filters for Visual Tracking

Spatially Regularized DCF (SRDCF) [M. Danelljan, ICCV 2015]

Page 55: Discriminative Correlation Filters for Visual Tracking

Spatially Regularized DCF (SRDCF) [M. Danelljan, ICCV 2015]
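Spatial regularization replaces the hard constraint with a penalty that grows away from the target center. The sketch below assumes a quadratic weight function in the style of the SRDCF paper. The equations on these slides are lost, so the exact form and the values of `mu` and `eta` are assumptions.

```python
import numpy as np

def spatial_weights(height, width, target_h, target_w, mu=0.1, eta=3.0):
    """Penalty weights: small on the target, large in the background."""
    rows = np.arange(height) - height // 2
    cols = np.arange(width) - width // 2
    return (mu
            + eta * (rows[:, None] / target_h) ** 2
            + eta * (cols[None, :] / target_w) ** 2)

def regularization_penalty(f, w):
    """Weighted penalty sum |w * f|^2 that replaces lam * ||f||^2."""
    return float(np.sum((w * f) ** 2))

w = spatial_weights(64, 64, target_h=16, target_w=16)

centered = np.zeros((64, 64))
centered[32, 32] = 1.0                         # filter energy on the target
off_target = np.zeros((64, 64))
off_target[0, 0] = 1.0                         # same energy in the background
```

Filter energy in the background is penalized far more than the same energy on the target, so the filter extent is controlled without a hard constraint.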

Page 56: Discriminative Correlation Filters for Visual Tracking

Spatially Regularized DCF (SRDCF)

DFT

Page 57: Discriminative Correlation Filters for Visual Tracking

Spatially Regularized DCF (SRDCF)

DFT

Page 58: Discriminative Correlation Filters for Visual Tracking

Spatially Regularized DCF (SRDCF)

Convolution matrix

Page 59: Discriminative Correlation Filters for Visual Tracking

Spatially Regularized DCF (SRDCF)

What we had… What we achieved…

Page 60: Discriminative Correlation Filters for Visual Tracking

Spatially Regularized DCF

Page 61: Discriminative Correlation Filters for Visual Tracking

Spatially Regularized DCF

Page 62: Discriminative Correlation Filters for Visual Tracking

Spatially Regularized DCF

OTB-2015 dataset

Page 63: Discriminative Correlation Filters for Visual Tracking

Adaptive Training Set Management

Martin Danelljan, Gustav Häger, Fahad Shahbaz Khan, and Michael Felsberg. “Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking”. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016.

Page 64: Discriminative Correlation Filters for Visual Tracking

Model Drift

Page 65: Discriminative Correlation Filters for Visual Tracking

Adaptive Training Set Management

Page 66: Discriminative Correlation Filters for Visual Tracking

Discriminative Tracking Methods

Page 67: Discriminative Correlation Filters for Visual Tracking

Our Approach - Motivation

• Continuous weights

– More control of importance

– Helps in ambiguous cases (e.g. partial occlusions)

• Re-determination of importance in each frame

– Exploit later samples

– Use all available information

• Prior information

– E.g. how old the sample is

– Or number of samples in a frame

Page 68: Discriminative Correlation Filters for Visual Tracking

Our Approach

Page 69: Discriminative Correlation Filters for Visual Tracking

Adaptive Sample Weights

Page 70: Discriminative Correlation Filters for Visual Tracking

Deep Image Representations For Tracking

Page 71: Discriminative Correlation Filters for Visual Tracking

Hand-crafted Features

Color features [M. Danelljan, CVPR 2014]

• Color Names [Weijer and Schmid, TIP 2009]

Shape features

• Histogram of Oriented Gradients (HOG) [Dalal and Triggs, 2005]

Page 72: Discriminative Correlation Filters for Visual Tracking

Deep Convolutional Features

Page 73: Discriminative Correlation Filters for Visual Tracking

Evaluation of Convolutional Feature Layers

• On OTB-2013 dataset

[M. Danelljan, ICCVW 2015]

Page 74: Discriminative Correlation Filters for Visual Tracking

Learning Continuous Convolution Operators

Martin Danelljan, Andreas Robinson, Fahad Shahbaz Khan, and Michael Felsberg. “Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking”. In: European Conference on Computer Vision (ECCV) 2016.

Page 75: Discriminative Correlation Filters for Visual Tracking

Discriminative Correlation Filters (DCF)

Limitations:

• Single-resolution feature map

• Coarse output scores

Page 76: Discriminative Correlation Filters for Visual Tracking

Our Approach: Overview

Multi-resolution features, continuous filters, continuous output

Page 77: Discriminative Correlation Filters for Visual Tracking

DCF Limitations: 1. Single-resolution feature map

• Why a problem?

– Combine convolutional layers of a CNN

• Shallow layers: low invariance – high resolution

• Deep layers: high invariance – low resolution

• How to solve?

– Explicit resampling?

• Artefacts, information loss, redundant data

– Independent DCFs with late fusion?

• Sub-optimal, correlations between layers

Page 78: Discriminative Correlation Filters for Visual Tracking

DCF Limitations: 2. Coarse output scores

• Why a problem?

– Accurate localization

• Sub-grid (e.g. HOG grid) or sub-pixel accuracy

• More accurate annotations ⇒ less drift

• How to solve?

– Interpolation?

• Which interpolation strategy?

– Interweaving?

• Costly

Page 79: Discriminative Correlation Filters for Visual Tracking

DCF Limitations: 3. Coarse labels

• Why a problem?

– Accurate learning

• Sub-grid or sub-pixel supervision

• How to solve?

– Interweaving?

• Costly

– Explicit interpolation of features?

• Artefacts

Page 80: Discriminative Correlation Filters for Visual Tracking

Interpolation Operator

Page 81: Discriminative Correlation Filters for Visual Tracking

Convolution Operator

Page 82: Discriminative Correlation Filters for Visual Tracking

Training Loss

Page 83: Discriminative Correlation Filters for Visual Tracking

Training Loss – Fourier Domain

Page 84: Discriminative Correlation Filters for Visual Tracking

Optimization: Conjugate Gradient

• Solve

• Use Conjugate Gradient:

– Only need to evaluate

– => No sparse matrix handling!

– Warm start estimate and search direction

– Preconditioner important

• Details: “On the Optimization of Advanced DCF-Trackers”, J. Johnander, G. Bhat, M. Danelljan, F. Khan, M. Felsberg. VOT Challenge ECCV Workshop, 2018.
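A matrix-free Conjugate Gradient loop needs only a function that applies the system operator to a vector. The sketch below is generic and assumes a symmetric positive definite operator; it supports a warm start through `x0` but leaves out the preconditioner and the reuse of the search direction. The dense test matrix is illustrative only.

```python
import numpy as np

def conjugate_gradient(apply_A, b, x0=None, num_iter=50, tol=1e-20):
    """Solve A x = b using only evaluations of apply_A(v) = A v."""
    x = np.zeros_like(b) if x0 is None else x0.copy()   # warm start if given
    r = b - apply_A(x)
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(num_iter):
        if rs < tol:                           # squared residual small enough
            break
        Ap = apply_A(p)
        alpha = rs / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p              # new search direction
        rs = rs_new
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)                        # symmetric positive definite
b = rng.standard_normal(5)
solution = conjugate_gradient(lambda v: A @ v, b)
```

In a tracker, `apply_A` would be implemented with FFTs and element-wise products, so the matrix is never formed.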

Page 85: Discriminative Correlation Filters for Visual Tracking

How to set y and b?

• Use periodic summation of functions:

• Gaussian function for the desired output y

• Cubic spline kernel for the interpolation function b

• Fourier coefficients with Poisson’s summation formula:
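For a Gaussian, the Poisson summation formula gives the Fourier coefficients of the periodic summation in closed form. The sketch below checks that closed form against a numerical estimate. It assumes the coefficient convention (1/T) ∫ y(t) exp(-i 2π k t / T) dt, and the values of `T`, `sigma`, and `u` are illustrative.

```python
import numpy as np

def gaussian_fourier_coefficients(k, T, sigma, u):
    """Fourier coefficients of a T-periodic sum of Gaussians centered at u."""
    k = np.asarray(k, dtype=float)
    return (np.sqrt(2 * np.pi * sigma ** 2) / T
            * np.exp(-2 * sigma ** 2 * (np.pi * k / T) ** 2)
            * np.exp(-2j * np.pi * u * k / T))

T, sigma, u, N = 32.0, 2.0, 5.3, 256
t = np.arange(N) * T / N
# Periodic summation, truncated to a few periods (the tails are negligible).
y = sum(np.exp(-(t - u - n * T) ** 2 / (2 * sigma ** 2)) for n in range(-3, 4))

numeric = np.fft.fft(y) / N                    # numerical Fourier coefficients
analytic = gaussian_fourier_coefficients(np.arange(6), T, sigma, u)
max_error = np.max(np.abs(numeric[:6] - analytic))
```

Note that the center `u` does not need to lie on the sampling grid.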

Page 86: Discriminative Correlation Filters for Visual Tracking

Results

• Layer fusion on OTB-2015 dataset

[Bar chart: mean overlap precision (%), vertical axis from 75 to 83, for Conv1, Conv1 + Conv5, and RGB + Conv1 + Conv5; annotated gains: +3.8% and +0.6%]

Page 87: Discriminative Correlation Filters for Visual Tracking

Sub-pixel Localization with CCOT

Page 88: Discriminative Correlation Filters for Visual Tracking

Sub-pixel Localization with CCOT

Page 89: Discriminative Correlation Filters for Visual Tracking

Feature Point Tracking Framework

• Grayscale pixel features,

• Uniform regularization,

Page 90: Discriminative Correlation Filters for Visual Tracking

CCOT Feature Point Tracking

Page 91: Discriminative Correlation Filters for Visual Tracking

Experiments: Feature Point Tracking

• The Sintel dataset

Page 92: Discriminative Correlation Filters for Visual Tracking

Efficient Convolution Operators (ECO)

Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. “ECO: Efficient Convolution Operators for Tracking”. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR 2017.

Page 93: Discriminative Correlation Filters for Visual Tracking

Issues With C-COT

1. Slow

– ~10 FPS with hand-crafted features

– ~1 FPS with deep features

2. Overfitting

– ~0.5M parameters updated online

– Memory focusing on recent samples

Page 94: Discriminative Correlation Filters for Visual Tracking

Factorized Convolution

• Learn filter 𝑓 and matrix 𝑃 jointly

• Gauss Newton iterations with Conjugate Gradient

• 80% reduction in parameters
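The factorization can be illustrated by projecting D feature channels onto C < D channels with a matrix `P` and filtering only the projected channels. In this sketch `P` is random, whereas in ECO it is learned jointly with the filter. The sizes are chosen to mirror the 80% figure and are not ECO's actual settings.

```python
import numpy as np

def project_features(x, P):
    """Map D feature channels to C < D channels with the matrix P (D x C)."""
    return np.einsum('dhw,dc->chw', x, P)

def scores_factorized(f_hat, x, P):
    """Detection scores summed over the C projected channels."""
    projected_hat = np.fft.fft2(project_features(x, P))
    return np.real(np.fft.ifft2(np.sum(f_hat * projected_hat, axis=0)))

rng = np.random.default_rng(0)
D, C, H, W = 10, 2, 8, 8
x = rng.standard_normal((D, H, W))
P = rng.standard_normal((D, C))                # random stand-in for the learned P
f_hat = np.fft.fft2(rng.standard_normal((C, H, W)))

scores = scores_factorized(f_hat, x, P)
reduction = 1 - C / D                          # fraction of filters removed
```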

Page 95: Discriminative Correlation Filters for Visual Tracking

Factorized Convolution

C-COT filters ECO filters

Page 96: Discriminative Correlation Filters for Visual Tracking

Generative Sample Space Model

• Online Gaussian Mixture Model of training samples

• ⟹ 90% reduction in training samples

ECO: GMM clusters

Previous: Linear memory
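The compact sample set can be sketched as a list of component means and weights: each new sample enters as its own component, and once the budget is exceeded the two closest components are merged. This is a simplified illustration; the learning rate, the budget, and the nearest-pair rule are assumptions.

```python
import numpy as np

def add_sample(means, weights, x, lr=0.1, max_components=3):
    """Insert sample x as a new component; merge the closest pair if over budget."""
    if not means:
        return [x], [1.0]
    means = means + [x]
    weights = [w * (1.0 - lr) for w in weights] + [lr]
    if len(means) > max_components:
        pairs = [(float(np.sum((means[i] - means[j]) ** 2)), i, j)
                 for i in range(len(means)) for j in range(i + 1, len(means))]
        _, i, j = min(pairs, key=lambda pair: pair[0])
        w = weights[i] + weights[j]
        merged = (weights[i] * means[i] + weights[j] * means[j]) / w
        means = [m for k, m in enumerate(means) if k not in (i, j)] + [merged]
        weights = [v for k, v in enumerate(weights) if k not in (i, j)] + [w]
    return means, weights

means, weights = [], []
for value in (0.0, 10.0, 20.0, 0.2):
    means, weights = add_sample(means, weights, np.array([value]))
```

The fourth sample (0.2) is merged with the first (0.0), so the set stays at three components while the weights still sum to one.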

Page 97: Discriminative Correlation Filters for Visual Tracking

Speedup

• 10x speedup compared to C-COT

• Same or better performance

• 60 FPS on CPU with handcrafted features

• 15 FPS on GPU with deep features

Notes:

• Matlab/Mex

• “Slow” network

Page 98: Discriminative Correlation Filters for Visual Tracking

End-to-end Learning with DCF

Page 99: Discriminative Correlation Filters for Visual Tracking

End-to-end Learning

• Could we learn the underlying features?

• Use the DCF solution for a single training sample as a layer in a deep network:

• Train in Siamese fashion:

– On image pairs

Network parameters

test sample

desired output

Page 100: Discriminative Correlation Filters for Visual Tracking

End-to-end Learning: CFNet

[J. Valmadre et al., CVPR 2017]

• Logistic loss

Page 101: Discriminative Correlation Filters for Visual Tracking

End-to-end Learning: CFCF

[E. Gundogdu and A. Alatan, TIP 2018]

• 𝐿2-loss. Finetune VGG-m.

• Integrate learned features in C-COT

Page 102: Discriminative Correlation Filters for Visual Tracking

Unveiling the Power of Deep Tracking

Goutam Bhat, Joakim Johnander, Martin Danelljan, Fahad Shahbaz Khan, and Michael Felsberg. “Unveiling the Power of Deep Tracking”. In: European Conference on Computer Vision (ECCV) 2018.

Page 103: Discriminative Correlation Filters for Visual Tracking

ECO

Tracking Performance, NFS

Page 104: Discriminative Correlation Filters for Visual Tracking

Motivation

• Challenges: Deformations, In-plane/Out-of-plane rotations

• Can we utilize the invariance of deep features?

Page 105: Discriminative Correlation Filters for Visual Tracking

Motivation

• How about using deeper networks?

Tracking Performance, NFS

Page 106: Discriminative Correlation Filters for Visual Tracking

Motivation

• Features unsuitable for tracking?

– Let's train features for tracking

Tracking Performance, NFS

Page 107: Discriminative Correlation Filters for Visual Tracking

Cause 1: Training data

• Limited training data in the first frame

• Training data only models translations

Page 108: Discriminative Correlation Filters for Visual Tracking

Data augmentation

• Can simulate commonly encountered challenges in object tracking, e.g. rotations, motion blur, occlusions

Page 109: Discriminative Correlation Filters for Visual Tracking

Data augmentation

Impact of data augmentation, OTB-2015

Deep: ResNet-50 (Conv4)

Shallow: HOG+Color Names

Page 110: Discriminative Correlation Filters for Visual Tracking

Cause 2: Accuracy-Robustness Tradeoff

Image Shallow Model Deep Model

Page 111: Discriminative Correlation Filters for Visual Tracking

Cause 2: Accuracy-Robustness Tradeoff

Let’s revisit training in ECO

• Training data: Shifted versions of the target

• Width of label function determines how the samples are labelled

• Sharp label function ⇒ Enforce Accuracy

• Wide label function ⇒ Prefer Robustness

Page 112: Discriminative Correlation Filters for Visual Tracking

Cause 2: Accuracy-Robustness Tradeoff

Impact of label width, OTB-2015

Deep: ResNet-50 (Conv4)

Shallow: HOG+Color Names

Page 113: Discriminative Correlation Filters for Visual Tracking

Accuracy-Robustness tradeoff

Tracking Performance, NFS

Page 114: Discriminative Correlation Filters for Visual Tracking

Accuracy-Robustness tradeoff

Image Shallow Model Deep Model

Page 115: Discriminative Correlation Filters for Visual Tracking

New framework

Extract features (shallow and deep) → Train separate filters → Apply filters → Fusion?

Page 116: Discriminative Correlation Filters for Visual Tracking

Adaptive Model Fusion

We want the score function to have a single, sharp peak

Image Deep Score Shallow Score

Page 117: Discriminative Correlation Filters for Visual Tracking

Adaptive Model Fusion

• Prediction Quality Measure

Page 118: Discriminative Correlation Filters for Visual Tracking

Results

Need For Speed dataset (100 videos)

Page 119: Discriminative Correlation Filters for Visual Tracking

Results

Generalization to networks

Page 120: Discriminative Correlation Filters for Visual Tracking

State-of-the-Art and Conclusions

Page 121: Discriminative Correlation Filters for Visual Tracking

Current state-of-the-art

• VOT2018 sequestered dataset

Directly based on ECO

[“The Visual Object Tracking VOT2018 Challenge Results”, M. Kristan et al., 2018]

Page 122: Discriminative Correlation Filters for Visual Tracking

Conclusions and Future Work

• DCF is a versatile framework for tracking

• Highly adaptable for specific applications

• Efficient online learning

• Future work:

– Richer output: towards segmentation

– Long-term tracking robustness

– Better end-to-end integration and learning

Page 123: Discriminative Correlation Filters for Visual Tracking

Gustav Häger, Fahad Khan, Michael Felsberg, Goutam Bhat, Joakim Johnander

Acknowledgements

• EMC2, funded by Vetenskapsrådet

• Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation

Page 124: Discriminative Correlation Filters for Visual Tracking

www.liu.se

Martin Danelljan