From Unsupervised to Semi-Supervised Event Detection. Wen-Sheng Chu, Robotics Institute, Carnegie Mellon University, with Jeffery Cohn and Fernando De la Torre. July 9, 2013.
Transcript
1. From Unsupervised to Semi-Supervised Event Detection. Wen-Sheng Chu, Robotics Institute, Carnegie Mellon University. Jeffery Cohn, Fernando De la Torre. July 9, 2013.
2. Outline
1. Unsupervised Temporal Commonality Discovery (Chu et al., ECCV 2012)
2. Personalized Facial Action Unit Detection (Chu et al., CVPR 2013)
3. Unsupervised Commonality Discovery in Images. Where are the repeated patterns? (Chu10, Mukherjee11, Collins12)
4. Unsupervised Commonality Discovery in Videos? We name it Temporal Commonality Discovery (TCD). Goal: given two videos, discover common events in an unsupervised fashion.
5. TCD is hard!
1) No prior knowledge of commonalities: we do not know what, where, or how many commonalities exist in the videos.
2) Exhaustive search is computationally prohibitive: each sequence has (possible locations) x (possible lengths) candidate subsequences, so two videos of 300 frames each yield over 8,000,000,000 possible matches.
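As a sanity check on the slide's figure, counting every (begin, end) index pair per video (my reading of "possible locations" times "possible lengths") reproduces the number:

```python
# How many interval-pair matches an exhaustive TCD search would score.
# Assumption (mine): a candidate subsequence is any (begin, end) index
# pair, so an n-frame video has n * n candidates.
n = 300                   # frames per video, as on the slide
per_video = n * n         # candidate (begin, end) pairs in one video
matches = per_video ** 2  # every interval of video 1 vs. every interval of video 2
print(f"{matches:,}")     # 8,100,000,000 -- over 8 billion
```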
6. Formulation: integer programming!
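One way to write the integer program behind this slide (my reconstruction; the symbols φ, d, and ℓ are my notation for the per-interval feature histogram, the distance function, and a minimum event length, not necessarily the paper's):

```latex
\min_{b_1, e_1, b_2, e_2 \in \mathbb{Z}}\;
  d\!\left(\phi\big(S_1[b_1:e_1]\big),\, \phi\big(S_2[b_2:e_2]\big)\right)
\quad \text{s.t.}\quad \ell \le e_i - b_i,\; 1 \le b_i < e_i \le n_i,\; i = 1, 2
```

Here S_i is sequence i with n_i frames; the four integer variables are the begin/end points of the two candidate subsequences.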
7. Optimization: Interpretation
8. Optimization: Naive Search Complexity
9. Optimization: Branch-and-Bound. Similar to the idea of Efficient Subwindow Search (ESS) (Lampert08), we search the space by splitting intervals.
12. Searching Structure. A state is S = (rectangle set (B1,E1,B2,E2); bound score). States live in a priority queue sorted by bound scores (e.g., -105, -50, -10, 32); unlikely search regions carry large bounds and sink toward the back of the queue.
13. Algorithm. 1. Pop the top state, the one with the smallest bound score (e.g., (B1,E1,B2,E2; -105)), off the priority queue. 2. Split it.
14. Algorithm. 3. Compute bound scores for the two split states (e.g., (B1,E1,B2,E2; -76) and (B1,E1,B2,E2; -61)). 4. Push the split states back into the priority queue.
15. Algorithm. The algorithm stops when the top state contains a unique rectangle. Most of the search space, the regions with large distances, is never explored.
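The loop on slides 12-15 can be sketched in code. Below is a toy branch-and-bound in the same shape: states are boxes of candidate rectangles (B1,E1,B2,E2) in a priority queue keyed by a lower bound; the top state is popped and split until it pins down a unique rectangle. The objective here (closest subsequence sums, bounded via prefix sums) is my stand-in for the paper's histogram distance and its bounding functions, which the slides do not spell out.

```python
import heapq
import itertools

def prefix(seq):
    """Prefix sums: p[e] - p[b] == sum(seq[b:e])."""
    p = [0]
    for v in seq:
        p.append(p[-1] + v)
    return p

def sum_range(p, bs, es):
    """Range of achievable sums p[e] - p[b] for b in bs, e in es (inclusive)."""
    blo, bhi = bs
    elo, ehi = es
    pe = p[elo:ehi + 1]
    pb = p[blo:bhi + 1]
    return min(pe) - max(pb), max(pe) - min(pb)

def bound(px, py, box):
    """Admissible lower bound on |sum1 - sum2| over every rectangle in box."""
    lo1, hi1 = sum_range(px, box[0], box[1])
    lo2, hi2 = sum_range(py, box[2], box[3])
    if lo1 <= hi2 and lo2 <= hi1:      # the two sum ranges overlap
        return 0
    return max(lo1 - hi2, lo2 - hi1)

def split(box):
    """Halve the widest coordinate range, yielding two sub-boxes."""
    i = max(range(4), key=lambda k: box[k][1] - box[k][0])
    lo, hi = box[i]
    mid = (lo + hi) // 2
    left, right = list(box), list(box)
    left[i], right[i] = (lo, mid), (mid + 1, hi)
    return tuple(left), tuple(right)

def feasible(box):
    """Some rectangle in the box satisfies b < e for both sequences."""
    return all(lo <= hi for lo, hi in box) and \
        box[1][1] > box[0][0] and box[3][1] > box[2][0]

def tcd_bnb(x, y):
    """Find (b1, e1, b2, e2) minimizing |sum(x[b1:e1]) - sum(y[b2:e2])|."""
    px, py = prefix(x), prefix(y)
    root = ((0, len(x) - 1), (1, len(x)), (0, len(y) - 1), (1, len(y)))
    tick = itertools.count()           # tie-breaker so boxes never compare
    heap = [(bound(px, py, root), next(tick), root)]
    while heap:
        score, _, box = heapq.heappop(heap)
        if all(lo == hi for lo, hi in box):  # top state: unique rectangle
            return score, tuple(lo for lo, _ in box)
        for child in split(box):
            if feasible(child):
                heapq.heappush(heap, (bound(px, py, child), next(tick), child))

score, (b1, e1, b2, e2) = tcd_bnb([1, 2, 3, 4, 5], [2, 2, 2])
print(score)   # 0: the two discovered subsequences have equal sums
```

Because the bound never exceeds the true objective of any rectangle in the box, the first unique rectangle popped is a global optimum, and boxes with large bounds are simply never expanded.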
16. Compare with Relevant Work
1. Differences between TCD and ESS [1] / STBB [2]: a different learning framework (unsupervised vs. supervised), and new bounding functions for TCD.
2. Difference between TCD and [3]: a different objective (commonality discovery vs. temporal clustering).
[1] Efficient subwindow search: A branch and bound framework for object localization, PAMI 2009.
[2] Discriminative video pattern search for efficient action detection, PAMI 2011.
17. Experiment (1): Synthesized Sequences. Histograms of the discovered pair of subsequences.
18. Experiment (2): Discover Common Facial Actions. RU-FACS dataset*: interview videos with 29 subjects, 5000~8000 frames/video. We collect 100 segments containing smiles (AU 12) and evaluate in terms of average precision.
* Automatic recognition of facial actions in spontaneous expressions, Journal of Multimedia 2006.
19. Experiment (2): Discover Common Facial Actions
20. Experiment (2): Speed Evaluation. Speed is measured by the number of evaluations of the distance function (log scale): n_SW, with parametric settings required for sliding windows (SW), vs. n_TCD for TCD. Quality of the discovered patterns is measured by the distances d(r_i^SW) and d(r_TCD).
21. Experiment (2): Discover Common Facial Actions. Compare with LCCS* in terms of distance.
* Frame-level temporal calibration of unsynchronized cameras by using Longest Consecutive Common Subsequence, ICASSP 2009.
22. Experiment (3): Discover Multiple Common Human Motions. CMU-Mocap dataset: http://mocap.cs.cmu.edu/. 15 sequences from Subject 86, with 1200~2600 frames and up to 10 actions per sequence. We exclude the comparison with SW because it needs >10^12 evaluations.
23. Experiment (3): Discover Multiple Common Human Motions
24. Experiment (3): Discover Multiple Common Human Motions. Compare with LCCS* in terms of distance.
25. Extension: Video Indexing. Goal: given a query, find the best common subsequence in the target video. A straightforward extension of the temporal search space.
26. A Prototype for Video Indexing
27. Summary
28. Questions?
[1] Common Visual Pattern Discovery via Spatially Coherent Correspondences, CVPR 2010.
[2] MOMI-cosegmentation: simultaneous segmentation of multiple objects among multiple images, ACCV 2010.
[3] Scale invariant cosegmentation for image groups, CVPR 2011.
[4] Random walks based multi-image segmentation: Quasiconvexity results and GPU-based solutions, CVPR 2012.
[5] Frame-level temporal calibration of unsynchronized cameras by using Longest Consecutive Common Subsequence, ICASSP 2009.
[6] Efficient ESS with submodular score functions, CVPR 2011.
http://humansensing.cs.cmu.edu/wschu/
29. Outline
1. Unsupervised Temporal Commonality Discovery (Chu et al., ECCV 2012)
2. Selective Transfer Machine for Personalized Facial Action Unit Detection (Chu et al., CVPR 2013)
30. Facial Action Units (AU): AU 6+12
31. Main Idea
32. Related Work: Features
33. Related Work: Classifiers
34. Feature Bias: person specific!
35. Occurrence Bias
36. Selective Transfer Machine (STM) Formulation: maximize the margin of a penalized SVM while minimizing the distribution mismatch between training and test data.
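The two goals on this slide combine into a single objective. The following is my sketch of its shape (notation mine, not necessarily the paper's): an instance-weighted SVM term plus a distribution-mismatch penalty over the training-instance weights s.

```latex
\min_{\mathbf{w},\, \mathbf{s}}\;
  \underbrace{\tfrac{1}{2}\lVert \mathbf{w} \rVert^2
    + C \sum_{i=1}^{n_{tr}} s_i\, \ell\big(y_i, \mathbf{w}^{\top}\mathbf{x}_i\big)}_{\text{penalized SVM margin}}
  \;+\; \lambda\,
  \underbrace{\Big\lVert \tfrac{1}{n_{tr}} \sum_{i=1}^{n_{tr}} s_i\, \varphi(\mathbf{x}_i)
    - \tfrac{1}{n_{te}} \sum_{j=1}^{n_{te}} \varphi\big(\mathbf{x}^{te}_j\big) \Big\rVert^2}_{\text{distribution mismatch}}
```

Here ℓ is the hinge loss, φ the kernel feature map, and λ trades the two goals off against each other.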
38. Goal (2): Minimize Distribution Mismatch. Kernel Mean Matching (KMM)*.
* Covariate shift by kernel mean matching, Dataset shift in machine learning, 2009.
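A minimal sketch of KMM, assuming the standard objective (match the weighted training mean to the test mean in an RKHS) and substituting projected gradient descent for the usual QP solver; the box constraint alone, with the original's sum constraint on the weights omitted, is a simplification of mine.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian kernel matrix between row-vector sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kmm_weights(x_tr, x_te, B=10.0, steps=2000):
    """Reweight training points so their kernel mean matches the test set's.

    Minimizes 0.5 * beta^T K beta - kappa^T beta  s.t.  0 <= beta <= B,
    which is KMM's quadratic program, solved here approximately by
    projected gradient descent.
    """
    n_tr, n_te = len(x_tr), len(x_te)
    K = rbf(x_tr, x_tr)
    kappa = (n_tr / n_te) * rbf(x_tr, x_te).sum(axis=1)
    beta = np.ones(n_tr)
    step = 1.0 / np.linalg.norm(K, 2)   # 1/L step for the quadratic term
    for _ in range(steps):
        beta = np.clip(beta - step * (K @ beta - kappa), 0.0, B)
    return beta

# Training data spans [-2, 2]; test data sits at 1.5, so KMM should
# upweight training points near 1.5 and downweight the rest.
x_tr = np.linspace(-2, 2, 21).reshape(-1, 1)
x_te = np.full((15, 1), 1.5)
beta = kmm_weights(x_tr, x_te)
```

The resulting weights then feed any instance-weighted classifier, which is the role KMM-style weights play inside STM.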
39. Goal (2): Minimize Distribution Mismatch. Without reweighting, the fit is a bad estimator for the test data (compared with the ground truth)!
40. Goal (2): Minimize Distribution Mismatch. Selection by reweighting the training data gives a better fit to the ground truth!
41.
42. Optimization: Alternate Convex Search
43. Optimization: Alternate Convex Search
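The alternation on slides 42-43 has a simple skeleton: with the weights s fixed, solving for the classifier is convex; with the classifier fixed, solving for s is convex; iterate. The sketch below uses toy sub-solvers of my own (a weighted-centroid threshold, and RBF proximity to the test mean) in place of the paper's SVM/KMM pair; only the alternation structure is the point.

```python
import numpy as np

def acs_personalize(x_tr, y_tr, x_te, iters=10):
    """Alternate Convex Search skeleton for STM-style personalization.

    Each half-step is convex on its own: fit the classifier under fixed
    instance weights s, then refresh s under the fixed classifier.
    """
    s = np.ones(len(x_tr))
    for _ in range(iters):
        # Step 1 (fix s, solve classifier): weighted class centroids
        # define a 1-D decision threshold w.
        c0 = np.average(x_tr[y_tr == 0], weights=s[y_tr == 0])
        c1 = np.average(x_tr[y_tr == 1], weights=s[y_tr == 1])
        w = (c0 + c1) / 2.0
        # Step 2 (fix classifier, solve s): upweight training points that
        # resemble the test data (stand-in for the mismatch term).
        s = np.exp(-(x_tr - x_te.mean()) ** 2)
    return w, s

# Negatives cluster left, positives right; the test subject sits at 1.5,
# so the learned threshold drifts toward the test cluster.
x_tr = np.array([-2.0, -1.0, 1.0, 2.0])
y_tr = np.array([0, 0, 1, 1])
w, s = acs_personalize(x_tr, y_tr, np.array([1.5]))
```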
44. Compare with Relevant Work
[1] "Covariate shift by kernel mean matching," Dataset shift in machine learning, 2009.
[2] "Transductive inference for text classification using support vector machines," ICML 1999.
[3] "Domain adaptation problems: A DASVM classification technique and a circular validation strategy," PAMI 2010.
45. Experiments. Features: SIFT descriptors on 49 facial landmarks; preserve 98% energy using PCA.
Dataset      #Subjects  #Videos  #Frames/video  Content
CK+          123        593      ~20            Neutral to peak
GEMEP-FERA   7          87       20~60          Acting
RU-FACS      29         29       5000~7500      Interview
46. Experiment (1): Synthetic Data
47. Experiment (2): Comparison with Person-specific (PS) Classifiers. Two protocols on GEMEP-FERA: PS1: train/test are separate data of the same subject. PS2: training subjects include the test subject (same protocol as in [2]).
52. Summary. Person-specific biases exist among face-related problems, especially facial expression. We propose to alleviate the biases by personalizing classifiers using STM. Next steps: joint optimization, reducing the memory cost using SMO, and exploring more potential biases in face problems, e.g., occurrence bias.
53. Questions?
[1] "Covariate shift by kernel mean matching," Dataset shift in machine learning, 2009.
[2] "Transductive inference for text classification using support vector machines," ICML 1999.
[3] "Domain adaptation problems: A DASVM classification technique and a circular validation strategy," PAMI 2010.
[4] "Integrating structured biological data by kernel maximum mean discrepancy," Bioinformatics 2006.
[5] "Meta-analysis of the first facial expression recognition challenge," IEEE Trans. on Systems, Man, and Cybernetics, Part B, 2012.