Extract and structure knowledge (invariants & models) for:
• Perception for video understanding (perceptual, visual world)
• Maintenance of the 3D coherency throughout time (physical world of 3D spatio-temporal objects)
• Event recognition (semantics world)
• Evaluation, control and learning (systems world)
Video Understanding: Approach
• Strong impact for visual surveillance in transportation (metro station, trains, airports, aircraft, harbors)
• Access control, intrusion detection and video surveillance in buildings
• Traffic monitoring (parking, vehicle counting, street monitoring, driver assistance)
• Bank agency monitoring
• Risk management (simulation)
• Video communication (Mediaspace)
• Sports monitoring (Tennis, Soccer, F1, Swimming pool monitoring)
• New application domains : Aware House, Health (HomeCare), Teaching, Biology, Animal Behaviors, …
Creation of the start-up Keeneo in July 2005 (15 people): http://www.keeneo.com/
Video Understanding Applications
Scene Models (3D): scene objects, zones, calibration matrices
• Petri net, grammar, constraint resolution and propagation, verification of temporal constraints.
Event Recognition
Outline:
• Event Representation
• Temporal Scenario Recognition
• Scenarios representation
• Recognition process
• Results: recognition of several scenarios
• CARETAKER: management of large multimedia collections
• Trajectory clustering
• Activity clustering
• Frequent Composite Event Discovery in Videos (Learning Scenario Models)
Event Representation
Several entities are involved in the scene understanding process:
• Moving region: any intensity change between images.
• Context object: predefined static object of the scene environment (entrance zone, wall, equipment, door...).
• Physical object: any moving region which has been tracked and classified (person, group of persons, vehicle, etc.).
• Physical object of interest: a meaningful object, which depends on the application (person/door, parked vehicle, etc.).
Event Representation
Events and scenarios: large variety
• more or less composed of sub-events (running/fighting).
• involving few/many actors (football game).
• general (standing) / sensor and application/view dependent (sit down, stop).
• spatial granularity: the view observed by one camera / the whole site.
• temporal granularity: instantaneous / long term with complex relationships (synchronize).
3 levels of complexity depending on the complexity of temporal relations and on the number of physical objects :
• non-temporal constraint relative to one physical object (sitting). Intuitive association of probabilities to get precision.
• temporal sequence of sub-scenarios relative to one physical object (open the door, go toward the chair then sit down).
• complex temporal constraints relative to several physical objects (A meets B at the coffee machine then C gets up and leaves). Need of logic expressiveness but high complexity for meaningful probabilities and sensitive to vision errors.
Video events: real world notions corresponding to short actions up to activities.
• Primitive State: a spatio-temporal property linked to vision routines involving one or several actors, valid at a given time point or stable on a time interval
Ex : « close», « walking», « sitting»
• Composite State: a combination of primitive states
• Primitive Event: a significant change of states
Ex : « enters», « stands up», « leaves »
• Composite Event: a combination of states and events. Corresponds to a long term (symbolic, application dependent) activity.
Ex : « fighting», « vandalism»
Event Representation
A video event is mainly constituted of five parts:
• Physical objects: all real world objects present in the scene observed by the cameras (mobile objects, contextual objects, zones of interest)
• Components: list of states and sub-events involved in the event
• Forbidden Components: list of states and sub-events that must not be detected in the event
• Constraints: symbolic, logical, spatio-temporal relations between components or physical objects
• Action: a set of tasks to be performed when the event is recognized
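The five parts listed above map naturally onto a small record type. The following is only a minimal sketch, not the authors' representation language: the class name `EventModel`, its field names and the two physical-object variables are hypothetical, while the component names and constraints reuse the "Bank attack" fragment shown in these slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EventModel:
    """Hypothetical container mirroring the five parts of a video event."""
    name: str
    # Physical objects: variable name -> expected object type.
    physical_objects: Dict[str, str]
    # Components: states and sub-events involved in the event.
    components: List[str] = field(default_factory=list)
    # Forbidden components: states/sub-events that must not be detected.
    forbidden_components: List[str] = field(default_factory=list)
    # Constraints are kept symbolic (plain strings) in this sketch.
    constraints: List[str] = field(default_factory=list)
    # Actions: tasks to perform once the event is recognized.
    actions: List[Callable[[], str]] = field(default_factory=list)


# Object variables below are assumptions made for illustration only.
bank_attack = EventModel(
    name="Bank_attack",
    physical_objects={"employee": "Person", "robber": "Person"},
    components=["e1", "e2", "e3", "e4"],
    constraints=["e1 before e3", "e2 before e4", "e4 during e3"],
    actions=[lambda: "Bank attack!!!"],
)

alerts = [action() for action in bank_attack.actions]
```

Keeping the constraints symbolic leaves their evaluation to a separate recognition step, which is the separation the slides describe between event models and the recognition algorithm.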
Event Representation
Event Representation
Representation language to describe temporal events of interest.
(e1 before e3) (e2 before e4) (e4 during e3) ) action (“Bank attack!!!”) )
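The temporal operators `before` and `during` used in this fragment can be illustrated on time intervals. This is a sketch under the assumption that they follow the usual strict definitions of Allen's interval algebra; the `Interval` type and the frame numbers are invented for the example and do not come from the slides.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # first frame of the event instance
    end: int    # last frame of the event instance


def before(a: Interval, b: Interval) -> bool:
    """a ends strictly before b starts."""
    return a.end < b.start


def during(a: Interval, b: Interval) -> bool:
    """a lies strictly inside b."""
    return b.start < a.start and a.end < b.end


# Hypothetical instances of the four components e1..e4.
e1, e2 = Interval(0, 5), Interval(2, 8)
e3, e4 = Interval(10, 30), Interval(12, 20)

constraints_hold = before(e1, e3) and before(e2, e4) and during(e4, e3)
```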
Scenario Recognition: Temporal Constraints
• Overview of the recognition process
• Recognition of elementary scenarios
• Scenario compilation
• Recognition of composed scenarios
• Scenario recognition and uncertainty
• Example of the recognition of a “Bank attack” scenario and more…
Scenario Recognition: issues
Many event representations:
• are not easy to use and not re-usable (need learning, e.g. incremental, supervised),
• define scenarios only at one time point / interval,
• do not let experts describe their scenarios in a natural way (context awareness, user feedback).
Event recognition approaches :
• allow an efficient recognition of events, but
• some temporal constraints cannot be processed (e.g. static scenes, synchronization),
• they require that the events are bounded in time (temporal complexity),
• they deal only partially with inaccuracy and uncertainty (e.g. lost tracks) and are not linked to the signal.
• Scenario (algorithmic notion): any type of video events
• Two types of scenarios:
• elementary (primitive states)
• composed (composite states and events).
• Algorithm in two steps.
Scenario Recognition: Temporal Constraints (T. Vu)
1) Recognize all Elementary Scenario models
2) Trigger the recognition of selected Composed Scenario models
1) Recognize all triggered Composed Scenario models
2) Trigger the recognition of other Composed Scenario models
Tracked Mobile Objects
Recognized Scenarios
A priori knowledge:
- Scenario knowledge base
- 3D geometric & semantic information of the observed environment
Scenario Recognition: Elementary Scenario
• The recognition of a compiled elementary scenario model me consists of a loop:
1. Choosing a physical object for each physical-object variable
2. Verifying all constraints linked to this variable
me is recognized if all the physical-object variables are assigned a value and all the linked constraints are satisfied.
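The loop described above (choose an object for a variable, then verify the constraints linked to it) can be sketched as a small backtracking search. This is an illustration under stated assumptions, not the compiled recognizer from the slides: the function name, the dictionary-based scene objects, the coordinates and the value of `close_distance` are all hypothetical.

```python
import math


def recognize_elementary(variables, objects, constraints, assignment=None):
    """Backtracking search: returns a variable -> object mapping, or None."""
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment  # every variable has a value, all checks passed
    var = variables[len(assignment)]
    for obj in objects:
        # Step 1: choose a physical object not already bound to a variable.
        if any(obj is used for used in assignment.values()):
            continue
        candidate = {**assignment, var: obj}
        # Step 2: verify the constraints linked to this variable whose
        # other variables are already assigned.
        linked = [check for names, check in constraints
                  if var in names and all(n in candidate for n in names)]
        if all(check(candidate) for check in linked):
            result = recognize_elementary(variables, objects, constraints, candidate)
            if result is not None:
                return result
    return None


# Hypothetical scene: positions in metres on the ground plane.
scene = [
    {"id": "car_far", "type": "Vehicle", "pos": (50.0, 50.0)},
    {"id": "person1", "type": "Person", "pos": (0.0, 0.0)},
    {"id": "car_near", "type": "Vehicle", "pos": (1.0, 1.0)},
]
close_distance = 2.0  # assumed threshold
constraints = [
    (("p",), lambda a: a["p"]["type"] == "Person"),
    (("v",), lambda a: a["v"]["type"] == "Vehicle"),
    (("p", "v"), lambda a: math.dist(a["p"]["pos"], a["v"]["pos"]) <= close_distance),
]
match = recognize_elementary(["p", "v"], scene, constraints)
```

Checking each constraint as soon as all of its variables are bound prunes bad assignments early instead of testing every complete combination.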
Scenario Recognition: Composed Scenario
• Problem:
• given a scenario model mc = (m1 before m2 before m3);
• if a scenario instance i3 of m3 has been recognized
• then the main scenario model mc may be recognized.
• However, the classical algorithms will try all combinations of scenario instances of m1 and of m2 with i3, which leads to a combinatorial explosion.
• Solution:
decompose the composed scenario models into simpler scenario models in an initial (compilation) stage, such that each composed scenario model is composed of only two components: mc = (m4 before m3), where m4 = (m1 before m2).
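The decomposition can be sketched for the simple case of a chain of `before` relations. This is a hedged illustration only: the function name and the generated intermediate names (`mc_part1` plays the role of m4) are hypothetical, and the real compilation stage handles more operators than this.

```python
def compile_sequence(name, components):
    """Rewrite (m1 before m2 before ... mn) into two-component models."""
    if len(components) < 2:
        raise ValueError("a composed scenario needs at least two components")
    models = {}
    left = components[0]
    for i, right in enumerate(components[1:], start=1):
        is_last = i == len(components) - 1
        # Intermediate models get a generated name; the last keeps `name`.
        new_name = name if is_last else f"{name}_part{i}"
        models[new_name] = (left, "before", right)
        left = new_name
    return models


compiled = compile_sequence("mc", ["m1", "m2", "m3"])
```

With binary models, a new instance of m3 only has to be paired with already recognized instances of the intermediate model rather than with every combination of m1 and m2 instances.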
• A generic formalism to help experts model intuitively states, events and scenarios.
• Recognition algorithm processes temporal operators in an efficient way.
• The recognition of complex scenarios (large number of actors) becomes real time.
• However, uncertainty still needs to be taken care of.
Scenario recognition: capacity of prediction
• Issue: in the bank monitoring application, an alert “Bank attack!!!” is triggered when a scenario “Bank_attack” is recognized. However, it can be too late for security agents to cope with the situation.
• Requirement: is the temporal scenario recognition method able to predict scenarios that may occur in the future?
• Answer: YES, the recognition algorithm can predict scenarios that may occur by adding automatically alerts (during the compilation) to some generated scenario models. This task can be specified in scenario models.
Scenario recognition : uncertainty
• Temporal precision
• Issue: several scenario models are defined with temporal constraints that are too precise, so they cannot be recognized in real videos.
• Solution: we define a temporal tolerance Δt as an integer; all temporal comparisons are then evaluated up to an approximation of Δt.
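One way to read the temporal tolerance is as slack added to each comparison. The sketch below assumes this reading; the function names, the frame numbers and the exact way Δt enters the comparison are assumptions, since the slides do not give the formula.

```python
def before_tolerant(end_a: int, start_b: int, delta_t: int) -> bool:
    """'a before b', accepted even if a overruns b's start by less than delta_t."""
    return end_a < start_b + delta_t


def equal_tolerant(t1: int, t2: int, delta_t: int) -> bool:
    """Two time points are considered equal if they differ by at most delta_t."""
    return abs(t1 - t2) <= delta_t


# Tracking noise: event a is reported to end at frame 102,
# although event b already started at frame 100.
strict = before_tolerant(102, 100, delta_t=0)
relaxed = before_tolerant(102, 100, delta_t=3)
```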
• Incorrect mobile object tracking
• Issue: the vision algorithms may lose the track of several detected mobile objects, so the system cannot correctly recognize scenario occurrences in several videos.
• Solution1: experts describe different scenario models representing various situations corresponding to several combinations of physical objects.
Uncertainty Representation
Solution2: management of the vision uncertainty (likelihood):
• within predefined event models (off-line)
– coefficients (on mobile objects and components) are provided by default.
• propagated (on-line) through the event instances
1. mobile objects: computed by vision algorithms.
2. primitive states (elementary):
– a coefficient to each physical object for representing the likelihood relation between the state and each involved mobile object.
3. events and composite states (composed):
– a coefficient to each component for representing the likelihood relation between the event and each component.
– defining a threshold into each state/event model for specifying at which likelihood level the given state/event should be recognized.
Uncertainty Representation
PrimitiveState (Person_Close_To_Vehicle,
  Physical Objects ( (p : Person, 0.7), (v : Vehicle, 0.3) )
  Constraints ((p distance v ≤ close_distance)
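The coefficients 0.7 and 0.3 above weight each physical object's contribution to the likelihood of the state. The slides do not state how the values are combined, so the sketch below assumes a normalized weighted sum compared against a threshold; the per-object likelihoods and the threshold value are invented for the example.

```python
def state_likelihood(coefficients, object_likelihoods):
    """Weighted average of object likelihoods (assumed combination rule)."""
    total = sum(coefficients.values())
    weighted = sum(coefficients[name] * object_likelihoods[name]
                   for name in coefficients)
    return weighted / total


coefficients = {"p": 0.7, "v": 0.3}          # from the model above
vision_output = {"p": 0.9, "v": 0.5}         # hypothetical tracker likelihoods
threshold = 0.6                              # hypothetical recognition threshold

likelihood = state_likelihood(coefficients, vision_output)
recognized = likelihood >= threshold
```

Under this assumed rule, a poorly tracked vehicle lowers the likelihood less than a poorly tracked person would, which is what the larger coefficient on `p` expresses.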
Scenario recognition: Results
Bank agency monitoring: Paris (M. Maziere)
Scenario Recognition: Results
Vandalism in metro in Nuremberg
Video Understanding for Trichogramma Monitoring
Scenario recognition: Results
HomeCare Monitoring (N. Zouba)
Visualization of a recognized event in the Gerhome laboratory
• The person is recognized with the posture "standing with one arm up", "located in the kitchen" and "using the microwave".
•Example of the Unloading Front Operation (global)
• Semantic knowledge extracted by the off-line long term analysis of on-line interactions between moving objects and contextual objects:
• 70% of people are coming from the north entrance
• Most people spend 10 sec in the hall
• 64% of people are going directly to the gates without stopping at the ticket machine
• At rush hours people are 40% quicker to buy a ticket, …
• Issues:
• At which level(s) should clustering techniques be designed: low level (image features) / middle level (trajectories, shapes) / high level (primitive events)?
• What to learn: visual concepts, scenario models?
• Uncertainty (noise/outliers/rare): what are the activities of interest?
• Parameter tuning (e.g. distance, clustering tech.) and performance evaluation (criteria, ground-truth).
Knowledge Discovery: achievements
Video Understanding : Learning Scenario Models (A. Toshev)
or Frequent Composite Event Discovery in Videos
[Figure: event time series]
• Why unsupervised model learning in Video Understanding?
• Complex models containing many events,
• Large variety of models,
• Different parameters for different models
The learning of models should be automated.
Learning Scenarios: Motivation
Video surveillance in a parking lot
• Input: A set of primitive events from the vision module:
object-inside-zone(Vehicle, Entrance) [5,16]
• Output: frequent event patterns.
• A pattern is a set of events:
object-inside-zone(Vehicle, Road) [0, 35]
object-inside-zone(Vehicle, Parking_Road) [36, 47]
object-inside-zone(Vehicle, Parking_Places) [62, 374]
object-inside-zone(Person, Road) [314, 344]
Learning Scenarios: Problem Definition
• Goals:
• Automatic data-driven modeling of composite events,
• Reoccurring patterns of primitive events correspond to frequent activities,
• Find classes with large size & similar patterns.
Zones
• Approach:
• Iterative method from data mining for efficient discovery of frequent patterns in large datasets,
• A PRIORI: sub-patterns of frequent patterns are also frequent (Agrawal & Srikant, 1995),
• At the i-th step, consider only i-patterns which have frequent (i-1)-sub-patterns; the search space is thus pruned.
• A PRIORI-property for activities represented as classes:
size(C_{m-1}) ≥ size(C_m)
where C_m is a class containing patterns of length m and C_{m-1} is a sub-activity of C_m.
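The level-wise pruning can be sketched with the classic Apriori algorithm on exact sets of items. This is a simplification of the method in the slides, which groups similar event patterns into classes rather than counting exact matches; the observation data below are invented, with each observation reduced to the set of zones a vehicle was seen in.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([item]) for item in items}
    level = {s for s in level if support(s) >= min_support}
    frequent = {}
    size = 1
    while level:
        frequent.update({s: support(s) for s in level})
        size += 1
        # Join step: unions of two frequent sets that have exactly `size` items.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: keep a candidate only if all of its (size-1)-subsets
        # are frequent, then confirm its own support.
        level = {c for c in candidates
                 if all(frozenset(sub) in frequent
                        for sub in combinations(c, size - 1))
                 and support(c) >= min_support}
    return frequent


# Hypothetical observations of one vehicle each.
observations = [
    frozenset({"Road", "Parking_Road", "Parking_Places"}),
    frozenset({"Road", "Parking_Road", "Parking_Places"}),
    frozenset({"Road", "Parking_Road"}),
    frozenset({"Road"}),
]
frequent = apriori(observations, min_support=2)
full = frozenset({"Road", "Parking_Road", "Parking_Places"})
```

The supports returned here never increase when an item is added to a set, which is the exact-match counterpart of size(C_{m-1}) ≥ size(C_m).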
Learning Scenarios: A PRIORI Method
Learning Scenarios: A PRIORI Method
Merge two i-patterns with (i-1) primitive events in common to form an (i+1)-pattern:
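A minimal sketch of this merge step, assuming patterns are compared by exact event identity and ignoring time intervals (the slides use a similarity measure instead). The function name and the tuple encoding of events are hypothetical.

```python
def merge_patterns(p, q):
    """Merge two i-patterns sharing i-1 events into an (i+1)-pattern, else None."""
    p, q = frozenset(p), frozenset(q)
    if len(p) != len(q) or len(p & q) != len(p) - 1:
        return None
    return p | q


# Events encoded as (predicate, object type, zone); intervals are omitted.
road = ("object-inside-zone", "Vehicle", "Road")
parking_road = ("object-inside-zone", "Vehicle", "Parking_Road")
parking_places = ("object-inside-zone", "Vehicle", "Parking_Places")

merged = merge_patterns({road, parking_road}, {road, parking_places})
rejected = merge_patterns({road}, {road})
```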
2 types of Similarity Measure between event patterns:
• similarities between event attributes
• similarities between pattern structures
Generic Similarity Measure:
• Generic properties when possible, for easy usage in different domains,
• It should incorporate domain-dependent properties, for relevance to the concrete application.
Learning Scenarios: Similarity Measure
Attributes: the corresponding events in two patterns should have similar (same) attributes (duration, names, object types,...).
Learning Scenarios: Attribute Similarity
• Comparison between corresponding events (same type, same color).
• For numeric attributes: G(x, y) = [formula garbled in the source; only the symbols x, y, e and an exponent 2 survive]
• attr(pi, pj) = average of all event attribute similarities.
Test data:
• Video surveillance at a parking lot,
• 4 hours of recordings from 2 days, in 2 test sets,
• Every test set contains approx. 100 primitive events.
Learning Scenarios: Evaluation
Results: In both test sets the following event pattern was recognized:object-inside-zone(Vehicle, Road)
object-inside-zone(Vehicle, Parking_Road)
object-inside-zone(Vehicle, Parking_Places)
object-inside-zone(Person, Parking_Road)
Maneuver Parking!
Conclusion:
• Application of a data mining approach,
• Handling of uncertainty without losing computational effectiveness,
• General framework: only a similarity measure and a primitive event library must be specified.
Future Work:
• Other similarities,
• Handling of different aspects of uncertainty,
• Qualification of the learned patterns,
• Does frequent equal interesting?
• Different applications: different event libraries or features.
Learning Scenarios: Conclusion & Future Work
A global framework for building video understanding systems,
• For Individuals, Groups of People, Vehicles, Crowd, or Animals …
Perspectives:
• Object and video event detection
• Finer human shape description: gesture models, face detection
• Video analysis robustness: reliability computation (e.g. tracking)
• Knowledge Acquisition and Learning
• Design of learning techniques (clustering of low/middle/high level features) to complement a
• System Reusability
• Use of program supervision techniques: dynamic configuration of programs and parameters
• Scaling issue: managing large network of heterogeneous sensors (cameras, PTZ,