Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs. Slides by Alberto Montes, Computer Vision Group Reading Group, June 13th, 2016. [arXiv] [code] Zheng Shou, Dongang Wang and Shih-Fu Chang

Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Jan 12, 2017



Xavier Giro
Transcript
Page 1: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs

Slides by Alberto Montes

Computer Vision Group Reading Group, June 13th, 2016

[arXiv] [code]

Zheng Shou, Dongang Wang and Shih-Fu Chang

Page 2: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Introduction

Page 3: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Previous Work

Improved Dense Trajectory (iDT)

Fisher Vector

2D Convolution

Page 4: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Segment-CNN

Page 5: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Segment-CNN

Page 6: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Segment-CNN

Page 7: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Segment-CNN

Page 8: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Problem Definition

Video:

frame, # frames

Annotations:

Candidates:

action category

action category

start and ending frame

Page 9: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Multi-Scale Segment Generation

◉ Each frame resized to 171x128 pixels

◉ Temporal sliding windows:

○ 16, 32, 64, 128, 256, 512 frames

○ 75% overlap

◉ Construct segment s by uniformly sampling 16 frames
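
The segment generation described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `generate_segments`, the half-open `(start, end)` convention, and the handling of windows longer than the video are hypothetical choices, not taken from the authors' code.

```python
import numpy as np

WINDOW_LENGTHS = (16, 32, 64, 128, 256, 512)  # window lengths in frames (from the slide)
OVERLAP = 0.75                                # 75% overlap between consecutive windows
FRAMES_PER_SEGMENT = 16                       # frames uniformly sampled per segment


def generate_segments(num_frames):
    """Return (start, end, frame_indices) tuples; start inclusive, end exclusive.

    Assumption: windows longer than the video are simply skipped.
    """
    segments = []
    for length in WINDOW_LENGTHS:
        if length > num_frames:
            break
        # 75% overlap means the window advances by a quarter of its length.
        stride = max(1, round(length * (1 - OVERLAP)))
        for start in range(0, num_frames - length + 1, stride):
            # Uniformly sample 16 frame indices inside the window.
            idx = np.linspace(start, start + length - 1, FRAMES_PER_SEGMENT)
            segments.append((start, start + length, np.round(idx).astype(int)))
    return segments
```

For a 64-frame video this yields 13 windows of length 16, 5 of length 32 and 1 of length 64, each represented by exactly 16 frame indices.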

Page 10: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Network Architecture

C3D Network

Page 11: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Training Proposal and Classification Network

◉ lr=0.0001 except fc8 lr=0.01, momentum=0.9, weight decay factor=0.0005

◉ Drop lr by factor of 2 every 10K iterations

Proposal Network:

● fc8: 2 nodes

Classification Network:

● fc8: K+1 nodes
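
The step schedule quoted above (halve the learning rate every 10K iterations) can be written as a small helper. This is a minimal sketch: the constant and function names are hypothetical, and the assumption that the fc8 rate follows the same decay as the other layers is mine, not stated on the slide.

```python
BASE_LR = 1e-4        # learning rate for all layers except fc8 (from the slide)
FC8_LR = 1e-2         # learning rate for fc8 (from the slide)
MOMENTUM = 0.9        # from the slide
WEIGHT_DECAY = 5e-4   # from the slide
STEP_SIZE = 10_000    # iterations between drops
DROP_FACTOR = 2.0     # "drop lr by factor of 2"


def learning_rate(iteration, base_lr=BASE_LR):
    """Learning rate at a given iteration under a step-decay schedule."""
    return base_lr / (DROP_FACTOR ** (iteration // STEP_SIZE))
```

For example, `learning_rate(25_000, FC8_LR)` has been halved twice and equals 0.0025.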

Page 12: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Localization Network

Add Custom Loss function

Page 13: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Localization Network

true class label

overlap sensitivity

Try to boost segments with high overlap

Works best with: λ = 1, α = 0.25
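
The loss equation itself is not reproduced in this transcript. As a sketch, assume the form given in the Segment-CNN paper: L = L_softmax + λ · L_overlap, with L_overlap = (1/N) Σ_n ½ · ((P_n^(k_n))² / v_n^α − 1) · [k_n > 0], where k_n is the true class label (0 for background) and v_n is the overlap of segment n with the ground truth. The NumPy code below implements that assumed form; function and argument names are hypothetical.

```python
import numpy as np


def overlap_loss(probs, labels, overlaps, alpha=0.25):
    """Assumed overlap loss.

    probs:    (N, K+1) softmax outputs, column 0 is background
    labels:   (N,) true class labels k_n, 0 means background
    overlaps: (N,) overlap v_n of each segment with its ground-truth instance
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    overlaps = np.asarray(overlaps, dtype=float)
    p_true = probs[np.arange(len(labels)), labels]  # P_n^(k_n)
    positive = labels > 0                           # indicator [k_n > 0]
    # Background rows are masked out, so give them a dummy overlap of 1
    # to avoid dividing by zero.
    safe_v = np.where(positive, overlaps, 1.0)
    per_sample = 0.5 * (p_true ** 2 / safe_v ** alpha - 1.0) * positive
    return per_sample.mean()


def total_loss(probs, labels, overlaps, lam=1.0, alpha=0.25):
    """Softmax (cross-entropy) loss plus lambda times the overlap loss."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    p_true = probs[np.arange(len(labels)), labels]
    softmax_loss = -np.log(p_true).mean()
    return softmax_loss + lam * overlap_loss(probs, labels, overlaps, alpha)
```

Under this form, minimising the overlap term pushes the true-class score toward the square root of v_n^α, so the target score is higher for segments with higher overlap. This is consistent with the slide's note about boosting segments with high overlap.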

Page 14: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Localization Network

Learning target:

Page 15: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Localization Network

Page 16: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Prediction and Post-processing

◉ Keep segments with P_pro > 0.7

◉ Remove background segments

◉ Multiply P_loc by the class-specific frequency of occurrence of each window length in the training data, to leverage window length distribution patterns

◉ NMS based on P_loc to remove redundancy (overlap threshold θ - 0.1)
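
The post-processing steps on this slide can be sketched as a small pipeline. This is an illustration under assumptions: the dictionary layout, the `length_prior` lookup table, running NMS per class on the re-weighted score, and reading θ as the evaluation overlap threshold are all hypothetical choices, not taken from the authors' code.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) frame intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(segments, length_prior, theta=0.5, p_pro_min=0.7):
    """segments: dicts with keys start, end, p_pro, label, p_loc.

    length_prior: maps (label, window_length) to the frequency of that
    window length for that class in the training data (assumed layout).
    """
    kept = []
    for s in segments:
        if s["p_pro"] <= p_pro_min:   # keep only confident proposals
            continue
        if s["label"] == 0:           # drop segments predicted as background
            continue
        prior = length_prior.get((s["label"], s["end"] - s["start"]), 0.0)
        kept.append({**s, "score": s["p_loc"] * prior})

    # Greedy NMS, highest score first, with threshold theta - 0.1.
    kept.sort(key=lambda s: s["score"], reverse=True)
    nms_threshold = theta - 0.1
    result = []
    for s in kept:
        span = (s["start"], s["end"])
        if all(
            r["label"] != s["label"]
            or temporal_iou(span, (r["start"], r["end"])) <= nms_threshold
            for r in result
        ):
            result.append(s)
    return result
```

With θ = 0.5, two same-class segments covering frames 0-16 and 4-20 overlap with IoU 0.6, so only the higher-scoring one survives.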

Page 17: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Experiments

Page 18: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

MEXaction2

“Bull Charge Cape” and

“Horse Riding” videos

77 hours of videos

Training set: 1336 instances

Validation set: 310 instances

Test set: 329 instances

Datasets

THUMOS 2014

Temporal Action Detection Task

20 categories

Training set: 2755 videos

Validation set: 1010 videos and 3007 instances

Test set: 1574 videos and 3358 instances

Page 19: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Results MEXaction2

DTF: Dense Trajectory Features + SVM

Page 20: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Results MEXaction2

Page 21: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Results MEXaction2

Page 22: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Evaluation

Page 23: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Evaluation

Page 24: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Evaluation

Impact of individual networks:

Page 25: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Conclusions

Propose a multi-stage framework, Segment-CNN, to address temporal action localization

Page 26: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs

Thank you!

Questions?