Unsupervised Modelling, Detection and Localization of Anomalies in Surveillance Videos

Project Advisor: Prof. Amitabha Mukerjee

Deepak Pathak (10222), Abhijit Sharang (10007)

What is an “Anomaly”?

• An anomaly is an unusual (rare) event occurring in the video
• The definition is ambiguous and depends on context

Idea:
• Learn the “usual” events in the video and use this information to tag the rare events.

Modelling
• Unsupervised modelling

Detection
• Anomalous clip detection

Localization
• Spatio-temporal anomaly localization

Step 1: Unsupervised Modelling
• Model the “usual” behaviour of the scene using parametric Bayesian modelling.

• Topic models: borrowed from Natural Language Processing

• Given: documents and a vocabulary
• Each document is a histogram over the vocabulary
• Goal: identify the topics in a given set of documents

[Topics are latent variables]

Alternate view:
• Clustering in topic space
• Dimensionality reduction
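The idea that a document is a histogram over the vocabulary can be made concrete with a minimal sketch. It assumes each clip has already been converted into a list of integer visual-word indices. The function name and input format are illustrative, not the project's code. The function builds the document-word count matrix n(d, w) that a topic model such as pLSA takes as input:

```python
import numpy as np

def term_frequency_matrix(documents, vocab_size):
    """Build the (num_docs, vocab_size) count matrix n(d, w).

    `documents` is a list of documents, each a list of integer visual-word
    indices in [0, vocab_size); row d is the word histogram of document d.
    """
    counts = np.zeros((len(documents), vocab_size), dtype=np.int64)
    for d, words in enumerate(documents):
        # bincount turns a list of word indices into a histogram over the vocabulary
        counts[d] = np.bincount(np.asarray(words, dtype=np.int64), minlength=vocab_size)
    return counts
```

Each row is one "document" (video clip) and each column one visual word. This is the same layout used for text, which is what lets the NLP topic models carry over unchanged.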

NLP to Vision: Notations

Text Analysis → Video Analysis
• Vocabulary of words → Vocabulary of visual words
• Text documents → Video clips
• Topics → Actions/Events

Video Clips (or Documents)

• 45 minutes of traffic video footage available
• 25 frames per second
• 4 kinds of anomaly
• Divided into clips of a fixed length of 4 seconds (length chosen empirically last semester)
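The clip arithmetic implied by the numbers above can be checked quickly. This assumes non-overlapping clips at the stated 25 fps. The helper `clip_frame_range` is hypothetical:

```python
FPS = 25               # frames per second (from the slide)
CLIP_SECONDS = 4       # fixed clip length (from the slide)
FOOTAGE_MINUTES = 45   # total footage (from the slide)

frames_per_clip = FPS * CLIP_SECONDS          # 100 frames
total_frames = FOOTAGE_MINUTES * 60 * FPS     # 67,500 frames
num_clips = total_frames // frames_per_clip   # 675 clips (documents)

def clip_frame_range(clip_index):
    """Frame numbers belonging to clip `clip_index` (non-overlapping clips)."""
    start = clip_index * frames_per_clip
    return range(start, start + frames_per_clip)
```

So the footage yields at most 675 documents. The slides do not say how these are split between training and testing.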

Feature Extraction

• Three components of a visual word:
– Location
– Spatio-temporal gradient and flow information
– Object size

• Features are extracted only from foreground pixels, to increase efficiency

Foreground Extraction

• Extracted using the ViBe background subtraction algorithm, then smoothed using morphological filters
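The ViBe step itself is not reproduced here. The sketch below covers the smoothing step only. It assumes a binary foreground mask and a 3 × 3 square structuring element; the slide does not say which filters or element size were used. A morphological opening removes isolated specks, and a closing then fills small holes:

```python
import numpy as np

def _shifted_stack(mask, pad_value):
    """Stack the 9 shifted copies of a 2-D mask seen through a 3x3 window."""
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def erode(mask):
    # Keep a pixel only if its whole 3x3 window is foreground
    # (padding with True so the image border does not erode objects).
    return _shifted_stack(mask, True).all(axis=0)

def dilate(mask):
    # Turn a pixel on if any pixel in its 3x3 window is foreground.
    return _shifted_stack(mask, False).any(axis=0)

def smooth_foreground(mask):
    """Opening (remove specks) followed by closing (fill small holes)."""
    mask = np.asarray(mask, dtype=bool)
    opened = dilate(erode(mask))
    return erode(dilate(opened))
```

In practice a library routine (e.g. OpenCV's morphology functions) would replace these hand-written loops. They are spelled out here only to show what the filters do.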

Visual Word
• Location:
– Each frame of dimension m × n is divided into blocks of 20 × 20 pixels

• HOG-HOF descriptor:
– For each block, a foreground pixel was selected at random and a spatio-temporal descriptor was computed around it.
– From the descriptors obtained from the training set, 200,000 descriptors were randomly selected, and 20 cluster centres were obtained from them by k-means clustering.
– Each descriptor was assigned to the nearest of these centres.

• Size:
– In each block, we compute the connected components of the foreground pixels.
– The size of the connected components is quantised to two values: large and small.
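The three components can be combined into a single visual-word index, as in the minimal NumPy sketch below. It rests on several assumptions:
• k-means is plain Lloyd's algorithm with random initial centres; the slides do not say which variant or initialisation was used.
• `word_index` folds block location, flow cluster and size bin together in one hypothetical ordering.
• The area threshold in `size_bin` is a hypothetical parameter.

Only the 20-pixel block, the 20 centres and the two size bins come from the slide.

```python
import numpy as np

BLOCK = 20   # block side in pixels (from the slide)
N_FLOW = 20  # number of HOG-HOF cluster centres (from the slide)
N_SIZE = 2   # size bins: 0 = small, 1 = large

def nearest_centre(X, centres):
    """Index of the nearest centre (squared Euclidean) for each row of X."""
    d2 = ((X ** 2).sum(1)[:, None] - 2.0 * X @ centres.T
          + (centres ** 2).sum(1)[None, :])
    return d2.argmin(axis=1)

def kmeans(X, k=N_FLOW, iters=50, seed=0):
    """Plain Lloyd's k-means on descriptor rows of X; returns (centres, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest_centre(X, centres)
        new = centres.copy()
        for c in range(k):
            members = X[labels == c]
            if len(members):  # an empty cluster keeps its old centre
                new[c] = members.mean(axis=0)
        converged = np.allclose(new, centres)
        centres = new
        if converged:
            break
    return centres, nearest_centre(X, centres)

def size_bin(component_area, threshold):
    # `threshold` (in pixels) is hypothetical; the slides do not state it.
    return 1 if component_area >= threshold else 0

def ceil_div(a, b):
    return -(-a // b)

def word_index(row, col, flow_cluster, size_bin_value, frame_width):
    """Fold (20x20 block location, flow cluster, size bin) into one integer."""
    location = (row // BLOCK) * ceil_div(frame_width, BLOCK) + col // BLOCK
    return (location * N_FLOW + flow_cluster) * N_SIZE + size_bin_value

def vocab_size(frame_height, frame_width):
    n_blocks = ceil_div(frame_height, BLOCK) * ceil_div(frame_width, BLOCK)
    return n_blocks * N_FLOW * N_SIZE
```

With this ordering, a 40 × 60 frame gives 2 × 3 blocks and a vocabulary of 6 × 20 × 2 = 240 words. The distance computation in `nearest_centre` uses the expansion |x − c|² = |x|² − 2x·c + |c|², which avoids a (samples × centres × dimensions) temporary when clustering 200,000 descriptors.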

pLSA: Topic Model
• Fixed number of topics. Each word occurrence in a document is associated with a single (latent) topic.

• Topics are hidden variables, used to model the joint distribution of documents and words.

• Computation:
– Marginalise over the hidden variables: $p(d, w) = \sum_z p(z)\, p(d|z)\, p(w|z)$
– Conditional independence assumption: given the topic z, d and w are independent, i.e. $p(d, w|z) = p(d|z)\, p(w|z)$
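The marginalisation above can be fitted with EM, as in the minimal NumPy sketch below. It uses the symmetric parameterisation p(z), p(d|z), p(w|z) with random initialisation; the slides do not state the initialisation or iteration count. It is not the project's "adapted pLSA", whose modifications are not described here. The dense (D, W, K) posterior is kept for clarity only:

```python
import numpy as np

def plsa(counts, n_topics, iters=100, seed=0):
    """Fit pLSA by EM on a (D, W) count matrix n(d, w).

    Returns p(z) [K], p(d|z) [D, K], p(w|z) [W, K] and each document's
    topic distribution p(z|d) [D, K] (obtained by Bayes' rule).
    """
    n = np.asarray(counts, dtype=float)
    D, W = n.shape
    rng = np.random.default_rng(seed)
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.random((D, n_topics))
    p_d_z /= p_d_z.sum(axis=0)
    p_w_z = rng.random((W, n_topics))
    p_w_z /= p_w_z.sum(axis=0)
    tiny = 1e-300  # guards against division by zero
    for _ in range(iters):
        # E-step: p(z|d,w) is proportional to p(z) p(d|z) p(w|z)
        joint = p_z[None, None, :] * p_d_z[:, None, :] * p_w_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=2, keepdims=True), tiny)
        # M-step: normalise the expected counts n(d,w) p(z|d,w)
        expected = n[:, :, None] * post
        mass = np.maximum(expected.sum(axis=(0, 1)), tiny)  # per-topic total
        p_w_z = expected.sum(axis=0) / mass
        p_d_z = expected.sum(axis=1) / mass
        p_z = mass / mass.sum()
    # Topic distribution of each document, p(z|d) ∝ p(z) p(d|z)
    p_z_d = p_z[None, :] * p_d_z
    p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), tiny)
    return p_z, p_d_z, p_w_z, p_z_d
```

The returned p(z|d) rows are the "probability distributions in topic space" that the detection step compares between documents.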

Step 2: Detection

• We propose the “Projection Model Algorithm”, with the following key idea:

Project the information learnt in training onto the word space of the test document, and analyze each word individually to tag it as usual or anomalous.

• Robust to the quantity of anomaly present in the video clip.

Preliminaries
• Bhattacharyya distance between documents:
– For documents $d_i$ and $d_j$, represented by their probability distributions $p(z|d_i)$ and $p(z|d_j)$ in topic space, the distance is the Bhattacharyya distance between these two distributions.

• Cumulative histogram of m documents:
– A histogram obtained by summing (stacking) the word-count histograms of the m documents.

• Spatial neighbourhood of a word:
– For a word at block location $(x, y)$, all words at the eight adjacent block locations $(x+i, y+j)$, with $i, j \in \{-1, 0, 1\}$ and $(i, j) \neq (0, 0)$, that have the same flow and size quantisation.

• Significant distribution of a neighbourhood word:
– The distribution of a word is significant if its frequency in the cumulative histogram is greater than a threshold.
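The first two preliminaries can be sketched as follows, assuming the common definition $D_B(p, q) = -\ln \sum_z \sqrt{p(z)\, q(z)}$. Another common variant is $\sqrt{1 - \sum_z \sqrt{p(z)\, q(z)}}$. Both decrease as the overlap grows, so they select the same m nearest documents. The function names are illustrative:

```python
import numpy as np

def bhattacharyya_distance(p, q, eps=1e-12):
    """D_B(p, q) = -ln sum_z sqrt(p(z) q(z)) for two topic distributions."""
    bc = np.sum(np.sqrt(np.asarray(p, dtype=float) * np.asarray(q, dtype=float)))
    return float(-np.log(max(bc, eps)))  # eps avoids log(0) for disjoint supports

def m_nearest(test_topics, train_topics, m):
    """Indices of the m training documents with the smallest D_B to the test document."""
    train_topics = np.asarray(train_topics, dtype=float)
    dists = [bhattacharyya_distance(test_topics, t) for t in train_topics]
    return np.argsort(dists, kind="stable")[:m]

def cumulative_histogram(train_counts, indices):
    """Sum the word-count histograms (rows of n(d, w)) of the chosen documents."""
    return np.asarray(train_counts)[indices].sum(axis=0)
```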

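Projection Model Algorithm (flowchart)
• Test document → m nearest training documents, by Bhattacharyya distance
• Cumulative histogram of the words in these m documents
• Check frequency: if the word occurs more than a threshold number of times, the word is “usual”
• Otherwise, check the eight spatial neighbours of the word: if more than a threshold number of neighbours have a significant distribution, the word is “usual”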
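The per-word decision can be sketched as below, following one reading of the flowchart above: the frequency test comes first, then the neighbourhood test. For readability, a word is represented as a tuple (block_row, block_col, flow_cluster, size_bin), and the cumulative histogram as a mapping from word to count. The names `freq_threshold`, `sig_threshold` and `min_neighbours` are hypothetical, because the slides do not give the threshold values:

```python
def neighbours(word):
    """Eight spatial neighbours: adjacent blocks, same flow and size quantisation."""
    r, c, flow, size = word
    return [(r + dr, c + dc, flow, size)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

def label_words(test_doc, hist, freq_threshold, sig_threshold, min_neighbours):
    """Label each distinct word of a test document as "usual" or "anomalous".

    `hist` maps word -> count in the cumulative histogram of the m nearest
    training documents; missing words (including off-frame neighbours) count 0.
    """
    labels = {}
    for word in set(test_doc):
        if hist.get(word, 0) > freq_threshold:
            labels[word] = "usual"  # seen often enough in similar clips
            continue
        # Count neighbours whose distribution is significant
        significant = sum(hist.get(nb, 0) > sig_threshold for nb in neighbours(word))
        labels[word] = "usual" if significant > min_neighbours else "anomalous"
    return labels
```

The neighbourhood test lets a word that is rare at its exact block still count as usual when the same kind of motion is common just next to it. This tolerates small shifts in where objects move.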

Detection:

• Each visual word has now been labelled as “anomalous” or “usual”.

• Depending on the number of anomalous words, the complete test document is labelled anomalous or usual.
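The clip-level decision can be sketched as a simple count threshold. The cut-off `max_anomalous` is a hypothetical parameter, and whether labels are counted per distinct word or per occurrence is left to the caller:

```python
def label_clip(word_labels, max_anomalous):
    """Label a whole test document from its per-word labels.

    `max_anomalous` is a hypothetical threshold: the clip is called anomalous
    when more than this many of its words were labelled "anomalous".
    """
    n_anomalous = sum(1 for label in word_labels if label == "anomalous")
    return "anomalous" if n_anomalous > max_anomalous else "usual"
```

Sweeping `max_anomalous` trades false alarms against missed anomalies, which is the kind of sweep that produces precision-recall and ROC curves.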

Step 3: Localization

• Spatial Localization:

Since every word carries its location, we can directly localize the anomalous words in the test document to their spatial locality.

• Temporal Localization:

This requires some book-keeping while creating the term-frequency matrix of documents: we could maintain a list of frame numbers corresponding to each document-word pair.
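Both kinds of localization are covered by the minimal sketch below. Spatial localization maps a word's block back to a pixel rectangle. Temporal localization keeps, for each (document, word) pair, the frames in which the word was seen. Words are represented as (block_row, block_col, flow, size) tuples, and `FrameIndex` and `localize` are hypothetical names. Blocks at the right or bottom edge may extend past a frame whose size is not a multiple of 20:

```python
from collections import defaultdict

BLOCK = 20  # block side in pixels (from the Visual Word slide)

def block_to_pixels(block_row, block_col):
    """Pixel rectangle (top, left, bottom, right) of a block; bottom/right exclusive."""
    top, left = block_row * BLOCK, block_col * BLOCK
    return top, left, top + BLOCK, left + BLOCK

class FrameIndex:
    """Book-keeping kept while building the term-frequency matrix:
    (document id, word) -> frame numbers in which that word was observed."""

    def __init__(self):
        self._frames = defaultdict(list)

    def add(self, doc_id, word, frame):
        self._frames[(doc_id, word)].append(frame)

    def frames(self, doc_id, word):
        # .get avoids inserting empty entries for unseen pairs
        return self._frames.get((doc_id, word), [])

def localize(doc_id, anomalous_words, index):
    """Map each anomalous word to its pixel rectangle and the sorted frames where it occurred."""
    return {w: (block_to_pixels(w[0], w[1]), sorted(index.frames(doc_id, w)))
            for w in anomalous_words}
```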

Results Demo

• Anomaly detection
• Anomaly localization

Results: Precision-Recall Curve

Results: ROC Curve

Main Contributions

• Richer word feature space by incorporating local spatio-temporal gradient-flow information.

• Proposed the “projection model algorithm”, which is agnostic to the quantity of anomaly present.

• Anomaly localization in the spatio-temporal domain.

• Other benefit: extraction of common actions corresponding to the most probable topics.

References
• Varadarajan, Jagannadan, and J-M. Odobez. "Topic models for scene analysis and abnormality detection." Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on. IEEE, 2009.

• Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei. "Unsupervised learning of human action categories using spatial-temporal words." International Journal of Computer Vision 79.3 (2008): 299-318.

• Barnich, Olivier, and Marc Van Droogenbroeck. "ViBe: A universal background subtraction algorithm for video sequences." IEEE Transactions on Image Processing 20.6 (2011): 1709-1724.

• Mahadevan, Vijay, et al. "Anomaly detection in crowded scenes." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.

• Roshtkhari, Mehrsan Javan, and Martin D. Levine. "Online Dominant and Anomalous Behavior Detection in Videos."

• Laptev, Ivan, Marcin Marszalek, Cordelia Schmid, and Benjamin Rozenfeld. "Learning realistic human actions from movies." Computer Vision and Pattern Recognition (CVPR), 2008 IEEE Conference on, pp. 1-8. IEEE, 2008.

• Hofmann, Thomas. "Probabilistic latent semantic indexing." Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1999.

• Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3 (2003): 993-1022.

Summary (Last Semester)
• Related Work
• Image Processing
– Foreground Extraction
– Dense Optical Flow
– Blob extraction
• Implementing adapted pLSA
• Empirical estimation of certain parameters
• Tangible Actions/Topics Extraction

Extra Slides

• About
• Background subtraction
• HOG-HOF
• pLSA and its EM
• Previous results

Background subtraction

• Extraction of the foreground from an image
• Frame difference: $D(x, y, t+1) = |I(x, y, t+1) - I(x, y, t)|$
• Thresholding on this value to get a binary output

• Simplistic approach (it may pick up extra pixels, but it must not miss any essential element)

• Foreground smoothed using a median filter
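This last-semester pipeline is sketched below in NumPy, assuming grayscale frames given as 2-D arrays. The threshold value is a hypothetical parameter and the 3 × 3 median window is an assumption. On a binary mask, a 3 × 3 median reduces to a majority vote over the nine pixels:

```python
import numpy as np

def frame_difference_mask(prev, curr, threshold):
    """Binary foreground mask from |I(t+1) - I(t)| > threshold."""
    # Cast before subtracting so uint8 frames do not wrap around
    diff = np.abs(curr.astype(np.int32) - prev.astype(np.int32))
    return diff > threshold

def median3x3(mask):
    """3x3 median filter on a binary mask: on if at least 5 of 9 pixels are on."""
    padded = np.pad(mask.astype(np.uint8), 1, mode="edge")
    h, w = mask.shape
    window_sum = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return window_sum >= 5
```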

Optical flow example

(a) Translation perpendicular to a surface. (b) Rotation about axis perpendicular to image plane. (c) Translation parallel to a surface at a constant distance. (d) Translation parallel to an obstacle in front of a more distant background.

Slides from Apratim Sharma’s presentation on optical flow, CS676

Optical flow mathematics

• Gradient-based optical flow
• Basic assumption (brightness constancy): $I(x+\Delta x, y+\Delta y, t+\Delta t) = I(x, y, t)$
• A first-order Taylor expansion (divided by $\Delta t$) gives $I_x V_x + I_y V_y + I_t = 0$

• Sparse flow or dense flow
• Dense flow constraint:
– Smoothness: motion vectors are spatially smooth
– Minimise a global energy function
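The slides do not name the dense-flow method. Horn-Schunck is the classic method that minimises such a global energy: the brightness-constancy error $I_x V_x + I_y V_y + I_t$ plus a weighted smoothness term. The sketch below uses it as a stand-in. It assumes the derivatives $I_x$, $I_y$, $I_t$ are already computed (e.g. with finite differences), and it simplifies the neighbour average to the four direct neighbours:

```python
import numpy as np

def _neighbour_mean(f):
    """Average of the 4-connected neighbours, replicating edges."""
    p = np.pad(f, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

def horn_schunck(Ix, Iy, It, alpha=1.0, iters=100):
    """Dense flow (Vx, Vy) via Horn-Schunck iterations; alpha weights smoothness."""
    vx = np.zeros_like(Ix, dtype=float)
    vy = np.zeros_like(Ix, dtype=float)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(iters):
        vx_bar, vy_bar = _neighbour_mean(vx), _neighbour_mean(vy)
        # Brightness-constancy residual at the smoothed flow, scaled per pixel
        residual = (Ix * vx_bar + Iy * vy_bar + It) / denom
        vx = vx_bar - Ix * residual
        vy = vy_bar - Iy * residual
    return vx, vy
```

A larger `alpha` gives smoother but less detailed flow fields. This is the trade-off the smoothness constraint introduces.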


EM Algorithm: Intuition

• E-Step
– Expectation step: the expected log-likelihood is computed using the posterior of the hidden variables under the current parameter values
• M-Step
– Update the parameters with the calculated posterior probabilities
– Find the parameters that maximize this expected likelihood

EM: Formalism
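For reference, the standard EM iteration alternates the two steps described above. Here $X$ is the observed data, $Z$ the hidden variables and $\theta$ the parameters:

```latex
\begin{aligned}
\text{E-step:}\quad & Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}}\big[\log p(X, Z \mid \theta)\big] \\
\text{M-step:}\quad & \theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)})
\end{aligned}
```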

EM in pLSA: E Step

• $p(z|d,w)$: the probability that an occurrence of word w in document d is explained by aspect z, computed from the current parameters by Bayes' rule
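For pLSA with parameters $p(z)$, $p(d|z)$ and $p(w|z)$, this posterior follows from Bayes' rule and the conditional independence of d and w given z. The standard form, following Hofmann (1999), is:

```latex
p(z \mid d, w) = \frac{p(z)\, p(d \mid z)\, p(w \mid z)}{\sum_{z'} p(z')\, p(d \mid z')\, p(w \mid z')}
```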

EM in pLSA: M Step

• All these equations use the $p(z|d,w)$ calculated in the E-step

• Converges to a local maximum of the likelihood function
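The standard pLSA M-step updates (Hofmann, 1999) re-estimate each parameter from the expected counts $n(d,w)\, p(z|d,w)$. Here $n(d,w)$ is the number of times word $w$ occurs in document $d$:

```latex
\begin{aligned}
p(w \mid z) &= \frac{\sum_{d} n(d,w)\, p(z \mid d,w)}{\sum_{d}\sum_{w'} n(d,w')\, p(z \mid d,w')} \\
p(d \mid z) &= \frac{\sum_{w} n(d,w)\, p(z \mid d,w)}{\sum_{d'}\sum_{w} n(d',w)\, p(z \mid d',w)} \\
p(z) &= \frac{1}{R}\sum_{d}\sum_{w} n(d,w)\, p(z \mid d,w), \qquad R = \sum_{d}\sum_{w} n(d,w)
\end{aligned}
```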

Results (ROC Plot)

Results (PR Curve)
