Juergen Gall An Introduction to Temporal Action Segmentation From Fully Supervised Learning to Weakly Supervised Learning
Apr 08, 2022

Page 1: Juergen Gall An Introduction to Temporal Action ...

Juergen Gall

An Introduction to Temporal Action Segmentation

From Fully Supervised Learning to

Weakly Supervised Learning

Page 2: Juergen Gall An Introduction to Temporal Action ...

Action Recognition

• Large annotated datasets

• UCF101 (98.2%), HMDB (82.5%), Kinetics-400 (82.8%), Epic-Kitchens (36.7%)

• http://actionrecognition.net

• Continuous data streams

03.08.2020 Juergen Gall – Institute of Computer Science III – Computer Vision Group

Page 3: Juergen Gall An Introduction to Temporal Action ...

Action Segmentation vs. Action Detection

• Action Detection (THUMOS, ActivityNet)

• Action Segmentation (Breakfast, 50 Salads, GTEA)

Page 4: Juergen Gall An Introduction to Temporal Action ...

Action Segmentation vs. Action Detection

• Action Detection (Object Detection)

• Action Segmentation (Semantic Segmentation)

Page 5: Juergen Gall An Introduction to Temporal Action ...

Action Segmentation vs. Action Detection

• Action Detection (THUMOS, ActivityNet)

• Action Segmentation (Breakfast, 50 Salads, GTEA)

Page 6: Juergen Gall An Introduction to Temporal Action ...

Why Action Segmentation?

Page 7: Juergen Gall An Introduction to Temporal Action ...

Datasets

• Breakfast

https://serre-lab.clps.brown.edu/resource/breakfast-actions-dataset/

• 50 Salads

https://cvip.computing.dundee.ac.uk/datasets/foodpreparation/50salads/

• GTEA

http://cbs.ic.gatech.edu/fpv/#gtea

• COIN

https://coin-dataset.github.io/

Page 8: Juergen Gall An Introduction to Temporal Action ...

Let’s build a baseline…

Page 9: Juergen Gall An Introduction to Temporal Action ...

Hidden Markov Model

[ S. Prince. Computer Vision: Models, Learning, and Inference. Cambridge University Press ]

Page 10: Juergen Gall An Introduction to Temporal Action ...

Inference

MAP inference:

Substituting:

HMM:

Global minimum by dynamic programming

[ S. Prince. Computer Vision: Models, Learning, and Inference. Cambridge University Press ]
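A minimal sketch of the dynamic-programming step, assuming the usual chain-model formulation in which MAP inference minimizes a sum of unary costs (negative log-likelihoods) and pairwise costs (negative log transition probabilities). The function name `viterbi_min_cost` and the array layout are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def viterbi_min_cost(unary, pairwise):
    """Return the state sequence with the globally minimal total cost.

    unary:    (T, K) array, unary[t, k] = -log Pr(x_t | w_t = k)
              (a prior over the first state can be added to unary[0])
    pairwise: (K, K) array, pairwise[i, j] = -log Pr(w_t = j | w_{t-1} = i)
    """
    T, K = unary.shape
    cost = np.empty((T, K))             # best cost of any path ending in state k at time t
    back = np.zeros((T, K), dtype=int)  # best predecessor of state k at time t
    cost[0] = unary[0]
    for t in range(1, T):
        cand = cost[t - 1][:, None] + pairwise  # cand[i, j]: come from i, go to j
        back[t] = np.argmin(cand, axis=0)
        cost[t] = cand[back[t], np.arange(K)] + unary[t]
    # Backtrack from the cheapest final state.
    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With sticky transitions, a single noisy frame is smoothed away: the frame-wise decision and the jointly optimal path can differ.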

Page 11: Juergen Gall An Introduction to Temporal Action ...

Features: Dense Trajectories

• Dense sampling of features

• Feature tracking

[ H. Wang et al. Dense Trajectories and Motion Boundary Descriptors for Action Recognition. International Journal of Computer Vision 2013 ]

Page 12: Juergen Gall An Introduction to Temporal Action ...

Hidden Markov Model

• Hidden Markov Model (HMM) for each activity

[ H. Kuehne et al. An end-to-end generative framework for video segmentation and recognition. WACV 2016 ]

Page 13: Juergen Gall An Introduction to Temporal Action ...

Baseline

• HMM + GMM (IDT)

[ H. Kuehne et al. An end-to-end generative framework for video segmentation and recognition. WACV 2016 ]

Page 16: Juergen Gall An Introduction to Temporal Action ...

Grammar

• Transitions between activity HMMs are modeled by a context-free grammar

• SIL: start and end points

• Transition probability is 1 if a connection exists, otherwise 0

[ H. Kuehne et al. An end-to-end generative framework for video segmentation and recognition. WACV 2016 ]
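The 0/1 transition rule can be sketched as a lookup table. This is a simplification: it stores only the allowed successors of each activity rather than a full context-free grammar, and the activity names and the split of SIL into `SIL_start` and `SIL_end` are hypothetical.

```python
# Hypothetical successor table: which activity may follow which.
ALLOWED = {
    "SIL_start": {"take_cup"},
    "take_cup": {"pour_coffee"},
    "pour_coffee": {"pour_milk", "SIL_end"},
    "pour_milk": {"SIL_end"},
}

def transition_prob(prev_action, next_action):
    """1.0 if the grammar connects prev_action to next_action, otherwise 0.0."""
    return 1.0 if next_action in ALLOWED.get(prev_action, set()) else 0.0

def is_valid(sequence):
    """A sequence is valid if every consecutive pair has transition probability 1."""
    return all(transition_prob(a, b) == 1.0 for a, b in zip(sequence, sequence[1:]))
```

Under this rule, decoding only has to consider activity sequences for which `is_valid` holds.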

Page 17: Juergen Gall An Introduction to Temporal Action ...

Baseline

• Breakfast dataset (~65 hours)

Method                          Frame-wise Accuracy (%)
Kuehne et al. 2016 (HMM+GMM)    56.3

[ H. Kuehne et al. An end-to-end generative framework for video segmentation and recognition. WACV 2016 ]
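For reference, frame-wise accuracy is the percentage of frames whose predicted label matches the annotation. A minimal sketch, with a hypothetical function name and toy labels:

```python
def frame_accuracy(predicted, ground_truth):
    """Percentage of frames with a correctly predicted label."""
    if len(predicted) != len(ground_truth):
        raise ValueError("label sequences must have the same length")
    if not ground_truth:
        raise ValueError("label sequences must not be empty")
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return 100.0 * correct / len(ground_truth)
```

For example, `frame_accuracy(list("AABBC"), list("AABCC"))` counts four matching frames out of five.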

Page 18: Juergen Gall An Introduction to Temporal Action ...

Hybrid RNN-HMM

• HMM + RNN with Gated Recurrent Units (GRU)

Page 19: Juergen Gall An Introduction to Temporal Action ...

Gated Recurrent Units (GRU)

• Similar to LSTM, but it does not need an additional memory cell

[ K. Cho et al. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. Workshop SSST 2014 ]

[ J. Chung et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. NIPS Workshop 2014 ]
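A minimal NumPy sketch of one GRU step shows why no separate memory cell is needed: the hidden state itself is interpolated between its old value and a candidate. The parameter layout is an assumption for illustration, and the sign convention of the update gate differs between papers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU update for input x of shape (D,) and hidden state h of shape (H,).

    params = (Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh), with W* of shape (H, D),
    U* of shape (H, H) and b* of shape (H,). Layout assumed for this sketch.
    """
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)             # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)             # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    # Convention assumed here: z close to 1 favors the candidate state.
    return (1.0 - z) * h + z * h_cand
```

Because the new state is a convex combination of the old state and a tanh output, it stays in [-1, 1] when started from zeros.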

Page 20: Juergen Gall An Introduction to Temporal Action ...

Hybrid RNN-HMM

• Breakfast dataset (~65 hours)

Method                          Frame-wise Accuracy (%)
Kuehne et al. 2016 (HMM+GMM)    56.3
Richard et al. 2017 (HMM+RNN)   60.6
Kuehne et al. 2020 (HMM+RNN)    61.3

[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

[ H. Kuehne et al. A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation. PAMI 2020 ]

Page 21: Juergen Gall An Introduction to Temporal Action ...

Temporal Convolutional Neural Network

[ C. Lea et al. Temporal Convolutional Networks for Action Segmentation and Detection. CVPR 2017 ]

Page 22: Juergen Gall An Introduction to Temporal Action ...

Temporal Convolutional Network

• Breakfast dataset (~65 hours)

Method                          Frame-wise Accuracy (%)
Lea et al. 2017 (ED-TCN)*       43.3
Kuehne et al. 2016 (HMM+GMM)    56.3
Richard et al. 2017 (HMM+RNN)   60.6
Kuehne et al. 2020 (HMM+RNN)    61.3

[ C. Lea et al. Temporal Convolutional Networks for Action Segmentation and Detection. CVPR 2017 ]

*[ L. Ding and C. Xu. Weakly-supervised action segmentation with iterative soft boundary assignment. CVPR 2018 ]

Page 23: Juergen Gall An Introduction to Temporal Action ...

Temporal Convolutional Neural Network

• Dilated convolutions for audio

[ van den Oord et al. WaveNet: A Generative Model for Raw Audio. SSW 2016 ]

Page 24: Juergen Gall An Introduction to Temporal Action ...

Temporal Convolutional Neural Network

Dilated convolutions capture a long temporal receptive field

Causal convolutions: the output at time t depends only on the current and previous observations

[ C. Lea et al. Temporal Convolutional Networks for Action Segmentation and Detection. CVPR 2017 ]
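Both properties can be illustrated on a one-dimensional signal. This is a toy sketch with a single channel and hypothetical function names, not the architecture from the cited paper.

```python
import numpy as np

def causal_dilated_conv1d(x, weights, dilation):
    """Causal dilated convolution of a 1-D signal.

    weights[-1] multiplies the current frame, weights[-2] the frame
    `dilation` steps earlier, and so on. Left zero-padding ensures that
    out[t] never reads x[t + 1] or later.
    """
    k = len(weights)
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    out = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            out[t] += weights[i] * padded[t + i * dilation]
    return out

def receptive_field(kernel_size, dilations):
    """Number of input frames that influence one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Doubling the dilation in every layer makes the receptive field grow exponentially with depth, for example `receptive_field(2, [1, 2, 4, 8])` covers 16 frames with only four layers.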

Page 25: Juergen Gall An Introduction to Temporal Action ...

Temporal Convolutional Network

• 50 Salads

Method                          Frame-wise Accuracy (%)
Lea et al. 2017 (ED-TCN)        64.7
Lea et al. 2017 (Dilated TCN)   59.3

[ C. Lea et al. Temporal Convolutional Networks for Action Segmentation and Detection. CVPR 2017 ]

Page 26: Juergen Gall An Introduction to Temporal Action ...

Temporal Convolutional Network

• 50 Salads

• Edit distance (sensitive to over-segmentation):

Method                          1 – Norm. Edit Distance (%)   Frame-wise Accuracy (%)
Lea et al. 2017 (ED-TCN)        59.8                          64.7
Lea et al. 2017 (Dilated TCN)   43.1                          59.3
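A minimal sketch of the edit score, assuming the common definition: frame labels are collapsed into a sequence of segments, the Levenshtein distance between predicted and annotated segment sequences is computed, and the distance is normalized by the longer sequence. Function names are illustrative.

```python
def to_segments(frame_labels):
    """Collapse runs of identical frame labels into one entry per segment."""
    segments = []
    for label in frame_labels:
        if not segments or segments[-1] != label:
            segments.append(label)
    return segments

def edit_score(predicted, ground_truth):
    """100 * (1 - normalized Levenshtein distance) between segment sequences."""
    p, g = to_segments(predicted), to_segments(ground_truth)
    m, n = len(p), len(g)
    if max(m, n) == 0:
        return 100.0
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if p[i - 1] == g[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,       # deletion
                             dist[i][j - 1] + 1,       # insertion
                             dist[i - 1][j - 1] + substitution)
    return 100.0 * (1.0 - dist[m][n] / max(m, n))
```

A single misclassified frame in the middle of a segment barely changes frame-wise accuracy but creates two spurious segments, which the edit score penalizes heavily.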

Page 27: Juergen Gall An Introduction to Temporal Action ...

Multi-Stage Temporal Convolutional Network

[ Y. Abu Farha and J. Gall. MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation. CVPR 2019 ]

Page 30: Juergen Gall An Introduction to Temporal Action ...

Over-segmentation

• Frame-wise classification loss:

• Additional loss is required to avoid over-segmentation:

[ Y. Abu Farha and J. Gall. MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation. CVPR 2019 ]

Page 31: Juergen Gall An Introduction to Temporal Action ...

Loss

• Frame-wise classification loss

• Additional loss is required to avoid over-segmentation:

• Loss functions of all stages s:

[ Y. Abu Farha and J. Gall. MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation. CVPR 2019 ]
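The formulas themselves are not preserved in this transcript. The sketch below follows the loss as it is commonly described for the cited MS-TCN paper: a frame-wise cross-entropy term plus a truncated mean squared error over frame-to-frame changes of the log-probabilities, summed over all stages. The weight `lam`, the threshold `tau`, and the normalization are assumptions and should be checked against the paper.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def stage_loss(logits, labels, lam=0.15, tau=4.0):
    """Loss of one stage for logits of shape (T, C) and integer labels of shape (T,)."""
    T, C = logits.shape
    log_probs = log_softmax(logits)
    # Frame-wise classification loss (cross-entropy averaged over frames).
    classification = -log_probs[np.arange(T), labels].mean()
    # Smoothing loss: truncated squared change of log-probabilities over time.
    delta = np.minimum(np.abs(log_probs[1:] - log_probs[:-1]), tau)
    smoothing = (delta ** 2).sum() / (T * C)
    return classification + lam * smoothing

def total_loss(stage_logits, labels):
    """Sum of the per-stage losses over all stages s."""
    return sum(stage_loss(logits, labels) for logits in stage_logits)
```

The smoothing term is zero when the predicted distribution does not change between frames, so flickering predictions cost more than temporally consistent ones.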

Page 32: Juergen Gall An Introduction to Temporal Action ...

Loss

• Additional loss is required to avoid over-segmentation

Page 33: Juergen Gall An Introduction to Temporal Action ...

Impact of stages

• Impact of stages (50 Salads)

[ Y. Abu Farha and J. Gall. MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation. CVPR 2019 ]

Page 34: Juergen Gall An Introduction to Temporal Action ...

Multi-Stage Temporal Convolutional Network

• Breakfast dataset (~65 hours)

Method                          Frame-wise Accuracy (%)
Lea et al. 2017 (ED-TCN)*       43.3
Kuehne et al. 2016 (HMM+GMM)    56.3
Richard et al. 2017 (HMM+RNN)   60.6
Kuehne et al. 2020 (HMM+RNN)    61.3
MS-TCN (TCN)                    65.1
MS-TCN (TCN+I3D)                66.3

[ Y. Abu Farha and J. Gall. MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation. CVPR 2019 ]

[ J. Carreira and A. Zisserman. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. CVPR 2017 ]

Page 35: Juergen Gall An Introduction to Temporal Action ...

Temporal Action Segmentation

Page 36: Juergen Gall An Introduction to Temporal Action ...

MS-TCN

[ Y. Abu Farha and J. Gall. MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation. CVPR 2019 ]

Page 37: Juergen Gall An Introduction to Temporal Action ...

MS-TCN++

[ S. Li et al. MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation. arXiv ]

Page 38: Juergen Gall An Introduction to Temporal Action ...

MS-TCN++

• Breakfast dataset

Method                          Frame-wise Accuracy (%)
Lea et al. 2017 (TCN)*          43.3
Kuehne et al. 2016 (HMM+GMM)    56.3
Richard et al. 2017 (HMM+RNN)   60.6
Kuehne et al. 2020 (HMM+RNN)    61.3
MS-TCN (TCN)                    65.1
MS-TCN (TCN+I3D)                66.3
MS-TCN++ (TCN+I3D)              67.6

[ S. Li et al. MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation. arXiv ]

Page 39: Juergen Gall An Introduction to Temporal Action ...

MS-TCN++ vs. MS-TCN

Page 40: Juergen Gall An Introduction to Temporal Action ...

Weakly Supervised Learning

• Training video

• Fully supervised:

[Figure: video timeline divided into the segments A, C, F, D, A, E, H]

• Weakly supervised (transcripts)

A → C → F → D → A → E → H

[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

Page 41: Juergen Gall An Introduction to Temporal Action ...

Recall: Hybrid RNN-HMM

• HMM + RNN with Gated Recurrent Units (GRU)

Page 42: Juergen Gall An Introduction to Temporal Action ...

Weakly Supervised Learning

Page 44: Juergen Gall An Introduction to Temporal Action ...

Model

• The transcripts define the order of activities:
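One way to use a transcript is to constrain decoding: frames may only stay in the current action or advance to the next one in the given order. A minimal sketch of such an alignment, assuming frame-wise log-probabilities from some classifier are available; the function name and interface are hypothetical and not the exact model of the cited papers.

```python
import numpy as np

def align_transcript(log_probs, transcript):
    """Assign every frame to a transcript entry, preserving the transcript order.

    log_probs:  (T, C) array of frame-wise class log-probabilities
    transcript: ordered list of class indices; each entry gets at least one frame
    Returns the list of T frame labels with the highest total log-probability.
    """
    T, N = log_probs.shape[0], len(transcript)
    if N == 0 or N > T:
        raise ValueError("transcript must have between 1 and T entries")
    score = np.full((T, N), -np.inf)
    stay = np.zeros((T, N), dtype=bool)  # True: frame t-1 belongs to the same entry
    score[0, 0] = log_probs[0, transcript[0]]
    for t in range(1, T):
        for n in range(min(t + 1, N)):
            keep = score[t - 1, n]
            move = score[t - 1, n - 1] if n > 0 else -np.inf
            stay[t, n] = keep >= move
            score[t, n] = max(keep, move) + log_probs[t, transcript[n]]
    # Backtrack from the last frame, which must belong to the last entry.
    n = N - 1
    labels = [transcript[n]]
    for t in range(T - 1, 0, -1):
        if not stay[t, n]:
            n -= 1
        labels.append(transcript[n])
    return labels[::-1]
```

The returned frame labels can serve as pseudo ground truth for training a frame-wise classifier.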

Page 47: Juergen Gall An Introduction to Temporal Action ...

Weakly Supervised Learning

[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

Page 52: Juergen Gall An Introduction to Temporal Action ...

Results

• Disadvantage: Offline and sensitive to initialization

Method                                            Breakfast frame accuracy (%)
pseudo-GT (HMM+RNN), Richard et al. 2017          33.3
pseudo-GT (HMM+RNN), Kuehne et al. 2020           36.7
Fully supervised (HMM+RNN), Kuehne et al. 2020    61.3

[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

[ H. Kuehne et al. A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation. PAMI 2020 ]

Page 53: Juergen Gall An Introduction to Temporal Action ...

Incremental learning

[Figure: neural network (input video, forward pass) → Viterbi decoding (action transcript) → backprop]

[ A. Richard et al. NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning. CVPR 2018 ]

Page 54: Juergen Gall An Introduction to Temporal Action ...

Results

Method                                            Breakfast frame accuracy (%)
pseudo-GT (HMM+RNN), Richard et al. 2017          33.3
pseudo-GT (HMM+RNN), Kuehne et al. 2020           36.7
NN-Viterbi (HMM+RNN), Richard et al. 2018         43.0
Fully supervised (HMM+RNN), Kuehne et al. 2020    61.3

[ A. Richard et al. NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning. CVPR 2018 ]

Page 55: Juergen Gall An Introduction to Temporal Action ...

Pseudo GT vs. NN-Viterbi

Page 56: Juergen Gall An Introduction to Temporal Action ...

Evaluation Issues

• Weakly supervised approaches are sensitive to initialization

[ Y. Souri et al. On Evaluating Weakly Supervised Action Segmentation Methods. arXiv ]

[ J. Li et al. Weakly supervised energy-based learning for action segmentation. ICCV 2019 ]

[ L. Ding and C. Xu. Weakly-supervised action segmentation with iterative soft boundary assignment. CVPR 2018 ]

Page 57: Juergen Gall An Introduction to Temporal Action ...

Features

• Some approaches struggle with pre-trained features (I3D)

• Dimensionality is just one issue

[ Y. Souri et al. On Evaluating Weakly Supervised Action Segmentation Methods. arXiv ]

Page 58: Juergen Gall An Introduction to Temporal Action ...

Weakly Supervised Learning

• Training video

• Fully supervised:

[Figure: video timeline divided into the segments A, C, F, D, A, E, H]

• Weakly supervised (transcripts)

A → C → F → D → A → E → H

[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

Page 59: Juergen Gall An Introduction to Temporal Action ...

Weakly Supervised Learning

• Fully supervised:

[Figure: video timeline divided into the segments A, C, F, D, A, E, H]

• Weakly supervised (transcripts)

A → C → F → D → A → E → H

• Weakly supervised (action set)

{A, C, D, E, F, H}

• Order unknown

• Number of occurrences unknown

[ M. Fayyaz and J. Gall. SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation. CVPR 2020 ]
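The three supervision levels can be derived from one another, which makes the loss of information explicit. A small sketch with made-up frame labels and hypothetical helper names:

```python
def to_transcript(frame_labels):
    """Ordered list of actions: collapse runs of identical frame labels."""
    transcript = []
    for label in frame_labels:
        if not transcript or transcript[-1] != label:
            transcript.append(label)
    return transcript

def to_action_set(frame_labels):
    """Unordered set of actions: order and number of occurrences are lost."""
    return set(frame_labels)

# Full supervision: one label per frame (toy example).
frames = list("AAACCFFFFDDAAEEEH")
transcript = to_transcript(frames)  # keeps the order and the repeated A
action_set = to_action_set(frames)  # keeps only which actions occur
```

The transcript drops segment lengths, and the action set additionally drops the order and the second occurrence of A.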

Page 60: Juergen Gall An Introduction to Temporal Action ...

Results

Method                                      Supervision   Frame accuracy (%)
SCT (TCN+I3D), Fayyaz and Gall 2020         Action set    30.4
pseudo-GT (HMM+RNN), Kuehne et al. 2020     Transcript    36.7
NN-Viterbi (HMM+RNN), Richard et al. 2018   Transcript    43.0
HMM+RNN, Kuehne et al. 2020                 Full          61.3
MS-TCN++ (TCN+I3D), Li et al. arXiv         Full          67.6

Page 61: Juergen Gall An Introduction to Temporal Action ...

Source Code

• MS-TCN: https://github.com/yabufarha/ms-tcn

• ISBA: https://github.com/Zephyr-D/TCFPN-ISBA

• NN-Viterbi: https://github.com/alexanderrichard/NeuralNetwork-Viterbi

• CDFL: https://github.com/JunLi-Galios/CDFL

• Action sets: https://github.com/alexanderrichard/action-sets

• SCT: https://github.com/MohsenFayyaz89/SCT (code not uploaded yet)

• Unsupervised learning: https://github.com/Annusha/unsup_temp_embed

Page 62: Juergen Gall An Introduction to Temporal Action ...

Thank you for your attention.
