Video Understanding: What to Expect Today and Tomorrow? Cees Snoek
Cees Snoek (UvA) @ CMC Video Formats

Feb 19, 2017

iMMovator
Transcript
Page 1: Cees Snoek (UvA) @ CMC Video Formats

Video Understanding: What to Expect Today and Tomorrow?

Cees Snoek

Page 2: Cees Snoek (UvA) @ CMC Video Formats

Today: Find the cat

Page 3: Cees Snoek (UvA) @ CMC Video Formats

How difficult is the problem?

Human vision consumes 50% brain power…

Van Essen, Science 1992

Page 4: Cees Snoek (UvA) @ CMC Video Formats

Video recognition in a nutshell

Visualization by Jasper Schulte

Page 5: Cees Snoek (UvA) @ CMC Video Formats

• NIST TRECVID Benchmark

• Promote progress in video retrieval research

• Open big data, tasks, evaluation and innovation

International video competition

http://trecvid.nist.gov/

Aircraft

Beach

Mountain

People marching

Police/Security

Flower

Concept detection task

Page 6: Cees Snoek (UvA) @ CMC Video Formats

Competition overview

From university lab to spin-off and your mobile phone

• = 1000+ others

Universities win

Start-ups win

Snoek et al., TRECVID 2004-2015

Page 7: Cees Snoek (UvA) @ CMC Video Formats

Progress in video recognition

Latest jump due to deep learning

Plot of mean average precision; years marked: 2006, 2009, 2015
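
Mean average precision, the measure named on this slide's plot, can be illustrated with a minimal sketch. It assumes each concept comes with a ranked list of shots marked relevant or not, and that every relevant shot appears somewhere in that list. The function names are illustrative and not taken from the TRECVID evaluation tools, which may differ in detail.

```python
def average_precision(ranked_relevance):
    """Average precision for one concept.

    ranked_relevance: list of booleans, best-ranked shot first.
    Assumption: every relevant shot appears somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at the rank where a relevant shot is found.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings_per_concept):
    """Mean of the per-concept average precisions.

    rankings_per_concept: dict mapping a concept name to its ranked list.
    """
    aps = [average_precision(r) for r in rankings_per_concept.values()]
    return sum(aps) / len(aps)
```

A system that ranks relevant shots near the top for every concept scores close to 1; one that buries them scores close to 0.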

Page 8: Cees Snoek (UvA) @ CMC Video Formats

Video search demos

Social media

Forensics

Cultural heritage

Page 9: Cees Snoek (UvA) @ CMC Video Formats

Qualcomm Zeroth provides on-device video recognition

Page 10: Cees Snoek (UvA) @ CMC Video Formats

Tomorrow: The Internet of things that video

Page 11: Cees Snoek (UvA) @ CMC Video Formats

Need to understand what is happening where and when?

Visualization by Jan van Gemert

Page 12: Cees Snoek (UvA) @ CMC Video Formats

Proposals facilitate complex representations

1. Generate action proposals

Ground truth

Super-voxel segmentation

Proposals from merged voxels

[Jain et al., CVPR14]
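
The idea of building proposals from merged voxels can be sketched as a greedy hierarchical grouping. This is a toy illustration under stated assumptions, not the method of the cited paper: each super-voxel is a set of (t, y, x) coordinates with a normalized feature histogram, similarity is plain histogram intersection, and spatial adjacency is ignored. The names `histogram_intersection` and `merge_proposals` are hypothetical.

```python
from itertools import combinations


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def merge_proposals(supervoxels):
    """Greedily merge the most similar pair until one region remains.

    supervoxels: list of (voxels, hist) pairs, where voxels is a set of
    (t, y, x) tuples (assumed disjoint) and hist is a normalized list.
    Returns the initial super-voxels plus every merged region.
    """
    regions = [(frozenset(v), list(h)) for v, h in supervoxels]
    proposals = [v for v, _ in regions]
    while len(regions) > 1:
        # Pick the pair of regions with the highest similarity.
        i, j = max(
            combinations(range(len(regions)), 2),
            key=lambda p: histogram_intersection(regions[p[0]][1], regions[p[1]][1]),
        )
        (vi, hi), (vj, hj) = regions[i], regions[j]
        ni, nj = len(vi), len(vj)
        # A size-weighted average keeps the merged histogram normalized.
        merged_hist = [(ni * a + nj * b) / (ni + nj) for a, b in zip(hi, hj)]
        merged = (vi | vj, merged_hist)
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(merged)
        proposals.append(merged[0])
    return proposals
```

Starting from n super-voxels, this yields 2n - 1 candidate regions, from small parts up to the whole video, each of which can be scored as a possible action location.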

Page 13: Cees Snoek (UvA) @ CMC Video Formats

2. Encode video proposals as 15,000 object scores

Actions have object preference, relation is generic

[Jain et al., CVPR15]

AlexNet learned from 15K objects

Objects make sense for actions
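
Encoding a proposal as object scores might look like the sketch below. It assumes a classifier (not shown) has already produced one object score vector per frame of the proposal, and that the frames are pooled by simple averaging. The pooling choice and the name `encode_proposal` are assumptions made for illustration.

```python
def encode_proposal(frame_scores):
    """Average per-frame object scores into one vector per proposal.

    frame_scores: non-empty list of equal-length score vectors, one per
    frame, e.g. 15,000 values each in the setting described on the slide.
    """
    n_frames = len(frame_scores)
    # zip(*...) walks the frames column by column, i.e. object by object.
    return [sum(column) / n_frames for column in zip(*frame_scores)]
```

The resulting vector has one entry per object category, so proposals of any length end up with a representation of the same size.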

Page 14: Cees Snoek (UvA) @ CMC Video Formats

Enables understanding without video and action examples

3. Translate objects to actions

[Jain et al., ICCV15]

Prediction

Ground truth
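
Translating objects to actions without any action examples can be sketched as a weighted vote. The sketch assumes that object names and action names both have vectors in a shared word-embedding space, and that an action's score is the sum of object scores weighted by the cosine similarity between object and action. The vectors and the names `cosine` and `action_scores` are hypothetical, and the cited paper may use a different formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors, 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def action_scores(object_scores, object_vecs, action_vecs):
    """Score each action from object evidence alone.

    object_scores: dict mapping object name to its score for the video.
    object_vecs, action_vecs: dicts mapping names to embedding vectors
    (assumed to come from the same embedding space).
    """
    return {
        action: sum(
            score * cosine(object_vecs[obj], a_vec)
            for obj, score in object_scores.items()
        )
        for action, a_vec in action_vecs.items()
    }
```

With this setup, a video that scores high on an object semantically close to an action name ranks that action first, even though no video of the action was used for training.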

Page 15: Cees Snoek (UvA) @ CMC Video Formats

• Video recognition technology is ready for use, today

• Spatiotemporal video understanding is coming soon

Conclusion

www.ceessnoek.info