Unsupervised Modelling , Detection and Localization of Anomalies in Surveillance Videos Project Advisor : Prof. Amitabha Mukerjee Deepak Pathak (10222) Abhijit Sharang (10007)
Dec 17, 2015
Unsupervised Modelling , Detection and Localization of Anomalies in Surveillance Videos
Project Advisor : Prof. Amitabha Mukerjee
Deepak Pathak (10222)Abhijit Sharang (10007)
What is an “Anomaly” ?
• Anomaly refers to the unusual (or rare event) occurring in the video• Definition is ambiguous and depends on context
Idea :• Learn the “usual” events in the video and use the
information to tag the rare events.
Modelling • UnsupervisedModelling
Detection• Anomalous
Clip Detection
Localization• Spatio-
Temporal Anomaly Localization
Step 1 : Unsupervised Modelling• Model the “usual” behaviour of scene using parametric bayesian
modelling.
• Topic Models : Leveraged from Natural Language Processing
• Given: Document and Vocabulary• Document is histogram over vocabulary• Goal: Identify topics in a given set of Documents
[Topics are latent variables]
Alternate view : • Clustering in topic space• Dimensionality reduction
NLP to Vision : Notations
Text Analysis Video Analysis
Vocabulary of words Vocabulary of visual words
Text documents Video clips
Topics Actions/Events
Video Clips (or Documents)
• 45 minute video footage of traffic available• 25 frames per second• 4 kinds of anomaly• Divided into clips of fixed size of 4 seconds (obtained
empirically last semester)
Feature Extraction
• Three components of visual word :• Location• Spatio-Temporal Gradient and Flow Information• Object size
• Features are extracted only from foreground pixels for increasing the efficiency
Foreground Extraction
• Extracted using ViBe foreground algorithm and smoothened afterwards using morphological filters
Visual Word• Location :
• Each frame of dimension m x n is divided into blocks of 20 x 20
• HOG - HOF descriptor :• For each block, a foreground pixel was selected at random and spatio-temporal
descriptor was computed around it.• From the descriptors obtained from the training set, 200,000 descriptors were randomly
selected. 20 cluster centres were obtained from these descriptors by k-means clustering.• Each descriptor was assigned to one of these centres.
• Size :• In each block , we compute the connected components of the foreground pixels• The size of the connected components is quantised to two values: large and small
pLSA : Topic Model• Fixed number of topics : . Each word in
the vocabulary is attached with a single topic.
• Topics are hidden variables. Used for modelling the probability distribution
• Computation• Marginalise over hidden variables• Conditional independence assumption:
p(w|z) and p(d|z) are independent of each other
Step 2 : Detection
• We propose “Projection Model Algorithm” with the following key idea –
Project the information learnt in training onto the test document word space, and analyze each word individually to tag it as usual or anomalous.
• Robust to the quantity of anomaly present in video clip.
Preliminaries• Bhattacharyya Distance between documents :
• For documents and represented by the probability distributions in topic space and respectively, the distance is defined by
• Cumulative histogram of m documents: • A histogram obtained by stacking the word count histogram of the m
documents.
• Spatial neighbourhood of a word : • For a word at location , all words at locations , and with the same flow and
size quantisation
• Significant distribution of neighbourhood word : The distribution of a word is significant if its frequency in the cumulative histogram is greater than a threshold
word
Test document
m nearest training documents
Bhattacharya distance
Cumulative histogram of
words
Check Frequency
Eight Spatial neighbours of
wordWord occurs more than times
More than neighbours have significant distribution
Word is “Usual”
Detection :
• Now each visual word has been labelled as “anomalous” or “usual”.
• Depending on the amount of anomalous words, call the complete test document as anomalous or usual.
Step 3 : Localization
• Spatial Localization :
Since every word has location information in it, w can directly localize the anomalous words in test document to their spatial locality.
• Temporal Localization :
This requires some book-keeping while creating term-frequency matrix of documents. We could maintain a list of frame numbers corresponding to document-word pair.
Results Demo
• Anomaly detection• Anomaly localization
Results : Precision-Recall Curve
Results : ROC Curve
Main Contributions
• Richer word feature space by incorporating local spatio-temporal gradient-flow information.
• Proposed “projection model algorithm” which is agnostic to quantity of anomaly present.
• Anomaly Localization in spatio-temporal domain.
• Other Benefit :Extraction of common actions corresponding to mostprobable topics.
References• Varadarajan, Jagannadan, and J-M. Odobez. "Topic models for scene analysis and abnormality
detection." Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on. IEEE, 2009.
• Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei. "Unsupervised learning of human action categories using spatial-temporal words." International Journal of Computer Vision 79.3 (2008): 299-318.
• Olivier Barnich and Marc Van Droogenbroeck. “Vibe: A universal background subtraction algorithm for video sequences”. Image Processing, IEEE Transactions on, 20(6):1709-1724, 2011.
• Mahadevan, Vijay, et al. "Anomaly detection in crowded scenes." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
• Roshtkhari, Mehrsan Javan, and Martin D. Levine. "Online Dominant and Anomalous Behavior Detection in Videos.“
• Ivan Laptev, Marcin Marszalek, Cordelia Schmid, and Benjamin Rozenfeld. “Learning realistic human actions from movies”. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-8. IEEE, 2008.
• Hofmann, Thomas. "Probabilistic latent semantic indexing." Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1999.
• Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent dirichlet allocation." the Journal of machine Learning research 3 (2003): 993-1022.
Summary (Last Semester)• Related Work• Image Processing
– Foreground Extraction– Dense Optical Flow– Blob extraction
• Implementing adapted pLSA• Empirical estimation of certain parameters• Tangible Actions/Topics Extraction
Extra Slides
• About• Background subtraction• HOG HOF• pLSA and its EM• Previous results
Background subtraction
• Extraction of foreground from image• Frame difference• D(t+1) = | I(x,y,t+1) – I(x,y,t) |• Thresholding on the value to get a binary output
• Simplistic approach(can do with extra data but cannot miss any essential element)
• Foreground smoothened using median filter
Optical flow example
(a) Translation perpendicular to a surface. (b) Rotation about axis perpendicular to image plane. (c) Translation parallel to a surface at a constant distance. (d) Translation parallel to an obstacle in front of a more distant background.
Slides from Apratim Sharma’s presentation on optical flow,CS676
Optical flow mathematics
• Gradient based optical flow• Basic assumption:• I(x+Δx,y+Δy,t+Δt) = I(x,y,t)• Expanded to get IxVx+IyVy+It = 0
• Sparse flow or dense flow• Dense flow constraint:
• Smoothness : motion vectors are spatially smooth• Minimise a global energy function
pLSA : Topic Model• Fixed number of topics : . Each word in
the vocabulary is attached with a single topic.
• Topics are hidden variables. Used for modelling the probability distribution
• Computation• Marginalise over hidden variables• Conditional independence assumption:
p(w|z) and p(d|z) are independent of each other
EM Algorithm: Intuition
• E-Step• Expectation step where expectation of the likelihood function is
calculated with the current parameter values• M-Step• Update the parameters with the calculated posterior probabilities• Find the parameters that maximizes the likelihood function
EM: Formalism
EM in pLSA: E Step
• It is the probability that a word w occurring in a document d, is explained by aspect z
(based on some calculations)
EM in pLSA: M Step
• All these equations use p(z|d,w) calculated in E Step
• Converges to local maximum of the likelihood function
Results (ROC Plot)
Results (PR Curve)