
【ECCV 2016 BNMW】Human Action Recognition without Human

Apr 16, 2017

Transcript
Page 1: 【ECCV 2016 BNMW】Human Action Recognition without Human

Human Action Recognition without Human

He Yun (1,2), Soma Shirakabe (1,2), Yutaka Satoh (1,2), Hirokatsu Kataoka (1)

(1) Computer Vision Research Group, AIST, Japan
(2) Human-Centered Vision Lab., University of Tsukuba, Japan

Page 2: 【ECCV 2016 BNMW】Human Action Recognition without Human

Motion representation

•  Database: UCF101, HMDB51, ActivityNet

•  Approach: IDT, Two-Stream CNN

–  Standard databases and approaches are already established in the field

Page 3: 【ECCV 2016 BNMW】Human Action Recognition without Human

Action Database

http://www.thumos.info/

Page 4: 【ECCV 2016 BNMW】Human Action Recognition without Human

The problem setting in action recognition

•  Video-level prediction

–  1 action-label prediction per input video

[Figure labels: TennisSwing; Motion Descriptor]
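
As a minimal sketch of video-level prediction, the snippet below averages per-frame class scores and returns a single label for the whole video. The function name `predict_video_label`, the score values, and the two classes other than TennisSwing are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def predict_video_label(frame_scores, class_names):
    """Average per-frame class scores and return one label for the video."""
    # frame_scores has shape (num_frames, num_classes)
    frame_scores = np.asarray(frame_scores, dtype=float)
    video_scores = frame_scores.mean(axis=0)
    return class_names[int(np.argmax(video_scores))]

# Hypothetical class list and per-frame scores, for illustration only.
class_names = ["TennisSwing", "GolfSwing", "Basketball"]
frame_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.7, 0.2, 0.1],
]
label = predict_video_label(frame_scores, class_names)
print(label)
```

Averaging is only one possible aggregation; the slides state that one label is predicted per video, not how the scores are pooled.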

Page 5: 【ECCV 2016 BNMW】Human Action Recognition without Human

Dense Trajectories (DT) [Wang+, CVPR11]

•  Trajectory-based representation

–  A large number of trajectories

–  Feature description (HOG, HOF, MBH)

–  A codeword vector is generated
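
The codeword-vector step can be sketched as a hard-assignment bag-of-words encoding: each local descriptor votes for its nearest codeword, and the votes form a normalized histogram. This is an assumption about the encoding, since the slide does not name one. The function `encode_bag_of_words`, the toy two-dimensional descriptors, and the two-entry codebook are hypothetical.

```python
import numpy as np

def encode_bag_of_words(descriptors, codebook):
    """Return an L1-normalized histogram of nearest-codeword assignments."""
    descriptors = np.asarray(descriptors, dtype=float)  # (n, d)
    codebook = np.asarray(codebook, dtype=float)        # (k, d)
    # Squared Euclidean distance between every descriptor and every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=2)                    # (n, k)
    nearest = dists.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy data: real HOG/HOF/MBH descriptors are much higher-dimensional.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.1, 0.8], [0.0, 0.2]]
vec = encode_bag_of_words(descriptors, codebook)
print(vec)
```

In practice the codebook would be learned from training descriptors, for example with k-means, and one histogram would be built per descriptor type.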

Page 6: 【ECCV 2016 BNMW】Human Action Recognition without Human

Two-Stream CNN [Simonyan+, NIPS14]

•  Spatial and temporal convolution

–  Spatial stream: from an RGB image

–  Temporal stream: from stacked optical flows

–  Score fusion: Average or SVM
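
A minimal sketch of the averaging variant of score fusion, assuming each stream outputs one vector of class logits per video. The helper names and the logit values are hypothetical; the SVM variant mentioned above is not shown.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

def fuse_by_average(spatial_logits, temporal_logits):
    """Average the class probabilities of the spatial and temporal streams."""
    return (softmax(spatial_logits) + softmax(temporal_logits)) / 2.0

# Hypothetical logits for three classes.
spatial_logits = [2.0, 1.0, 0.0]
temporal_logits = [0.0, 3.0, 0.0]
fused = fuse_by_average(spatial_logits, temporal_logits)
predicted = int(np.argmax(fused))
print(fused, predicted)
```

Here the spatial stream alone would pick class 0, but the more confident temporal stream shifts the fused decision to class 1.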

Page 7: 【ECCV 2016 BNMW】Human Action Recognition without Human

Is background enough to classify actions?

•  RGB input is too strong!

–  The two-stream CNN [Simonyan+, NIPS14] reported that the spatial stream alone recognizes actions better than expected

•  72.4% with spatial-stream (RGB) @UCF101

•  “Human Action Recognition without Human”

Page 8: 【ECCV 2016 BNMW】Human Action Recognition without Human

Without Human?

•  Can human action recognition be done using only the motion of the background?

[Figure labels: TennisSwing; Motion Descriptor; TennisSwing?; Motion Descriptor]

Page 9: 【ECCV 2016 BNMW】Human Action Recognition without Human

Detailed setting of w/ and w/o Human

•  With and without human setting

–  Without human setting: UCF101 images with the center region masked out (center-blind)

–  With human setting: inverse of the without human setting

[Figure: I(x,y) * f(x,y) = I’(x,y), shown once for the Without Human Setting and once for the With Human Setting. Each side of the filter f(x,y) is labeled 1/4, 1/2, 1/4.]
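
The two settings can be sketched as complementary masks. The sketch assumes that the central region spans the middle half of the image height and width (a reading of the 1/4, 1/2, 1/4 labels in the figure) and that masked pixels are set to zero. The function names and the 8x8 test image are hypothetical.

```python
import numpy as np

def center_mask(height, width):
    """Boolean mask that is True inside the central half-by-half region."""
    mask = np.zeros((height, width), dtype=bool)
    top, left = height // 4, width // 4
    mask[top:top + height // 2, left:left + width // 2] = True
    return mask

def without_human(image):
    """Zero out the center so that only the surrounding background remains."""
    out = image.copy()
    out[center_mask(*image.shape[:2])] = 0
    return out

def with_human(image):
    """Inverse setting: keep the center and zero out the surroundings."""
    out = image.copy()
    out[~center_mask(*image.shape[:2])] = 0
    return out

# A uniform white 8x8 RGB frame stands in for a video frame.
image = np.full((8, 8, 3), 255, dtype=np.uint8)
wo = without_human(image)
w = with_human(image)
print(wo[:, :, 0])
```

Because the two masks are exact complements, adding the two outputs reproduces the original frame. The fixed central crop is only a rough proxy for the human region.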

Page 10: 【ECCV 2016 BNMW】Human Action Recognition without Human

Framework

–  Baseline: Very deep two-stream CNN [Wang+, arXiv15]

–  Two different scenarios: without human and with human

Page 11: 【ECCV 2016 BNMW】Human Action Recognition without Human

Exploration experiment

•  @UCF101

–  Very deep two-stream CNN model pre-trained on UCF101

–  With/Without Human Setting

Page 12: 【ECCV 2016 BNMW】Human Action Recognition without Human

Visual results (Full Image)

Page 13: 【ECCV 2016 BNMW】Human Action Recognition without Human

Visual results (Without Human Setting)

Page 14: 【ECCV 2016 BNMW】Human Action Recognition without Human

Without Human

•  The concept of “Human Action Recognition without Human”

–  The accuracies of the two settings are very close

•  The with human setting is only +9.49% better than the without human setting

–  Current motion representations rely heavily on the background

Page 15: 【ECCV 2016 BNMW】Human Action Recognition without Human

Future work

•  This result is thought-provoking

–  We must accept it in order to build better motion representations

–  A pure motion representation is urgently needed!

•  More sophisticated approach

•  Human-only motion