3rd SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES
REPRESENTATION, RECOGNITION AND VISUALISATION OF HUMAN BEHAVIOURS FOR VIDEO INTERPRETATION
François BREMOND, Monique THONNAT and Thinh VU VAN
ORION lab, INRIA Sophia-Antipolis, FRANCE
Plan of presentation
Part I: Video interpretation
– Global framework
– Scenario recognition
Part II: Visualisation of the interpretation
– Scene context (3D geometry)
– Human body
– Human behaviour
– Results
Conclusion
Part I: Video interpretation: Global framework
Our goal: to model the interpretation process of video sequences from pixel up to behaviour.
Main issue: current video interpretation systems are based on specific (ad hoc) routines:
– they depend on the sensors (camera orientation).
– they are dedicated to specific scenarios (detection of fighting people) and sites (metro stations).
[Diagram: video stream → mobile object detection & tracking → scenario recognition → recognised scenario over time, e.g. "Car accident?", "Two strangers exchanging objects?"]
Video interpretation: Global framework
We define several entities:
– Context object: predefined static object of the scene environment (entrance zone, bench, walls, equipment, ...).
– Moving region: any intensity change between a reference image and the current image.
– Mobile object: any moving region that has been tracked and classified (person, group of persons, vehicle, noise, etc.).
– Basic action: spatio-temporal property; instantaneous, numerical and generic; states and events.
– Scenario: long-term, symbolic and application-dependent; behaviours and activities.
Video interpretation: Global framework
[Diagram: a priori knowledge (sensor information, mobile object classes, tracked object types, context objects, descriptions of action recognition routines, scenario library) feeds a pipeline: video stream → moving region detection → mobile object tracking → recognition of actions → scenario recognition module (recognition of scenarios 1 to n) → recognised scenario.]
Definition: a priori knowledge describing:
– the sensors (cameras, optical cells and contact sensors): 3D position of the sensor, camera type (colour, resolution), field of view and calibration matrix.
– the context objects: equipment (bench, trash, door), walls, interesting zones (entrance zone) and areas of interest.
  – 3D geometry: 3D location of the object and its volume.
  – Semantic information: type of the object (equipment), its characteristics (yellow, fragile) and its function (seat).
Role:
– to keep the interpretation independent from the sensors and the sites.
– to provide additional knowledge to interpret up to the scenario level.
Video interpretation: scene context
Issues: large variety of actions and scenarios:
– more or less abstract (running / fighting).
– general (standing) versus sensor- and application-dependent (sit down).
– spatial granularity: from the view observed by one camera to the whole site.
– temporal granularity: from instantaneous to long-term.
– 3 levels of complexity, depending on the complexity of the temporal relations and on the number of actors:
  – non-temporal constraint relative to one actor (being seated).
  – temporal sequence of sub-scenarios relative to one actor (open the door, go toward the chair, then sit down).
  – complex temporal constraints relative to several actors (A meets B at the coffee machine, then C gets up and leaves).
Video interpretation: basic actions and scenarios
We use several formalisms.
Action and scenario representation:
– n-ary tree.
– finite state automaton.
– graph.
– set of constraints.
Action and scenario recognition:
– specific routines.
– classification.
– Bayesian methods.
– HMM.
– propagation of temporal constraints.
– constraint resolution.
Video interpretation: basic actions and scenarios
Example: a scenario is represented by a set of constraints.

Scenario(vandalism_against_ticket_machine,
  Actors( (p : Person),
          (eq : Equipment, Name = "Ticket_Machine") )
  Constraints(
    (exist ( (action s1: p move_close_to  eq)
             (action s2: p stay_at        eq)
             (action s3: p move_away_from eq)
             (action s4: p move_close_to  eq)
             (action s5: p stay_at        eq) ))
    ( (s1 != s4) (s2 != s5)
      (s1 before s2) (s2 before s3)
      (s3 before s4) (s4 before s5) ) )
  Production( (sc : Scenario)
    ( (Name      of sc := "vandalism_against_ticket_machine")
      (StartTime of sc := StartTime of s1)
      (EndTime   of sc := EndTime   of s5) ) ) )
Video interpretation: basic actions and scenarios
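A constraint-based scenario like the one above can be recognised by searching for a binding of detected basic actions that satisfies all the constraints. The following is a minimal sketch, assuming a hypothetical interval representation of basic actions; the `Action` class and the brute-force search are illustrative, not the authors' actual implementation:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Action:
    label: str   # basic action type, e.g. "move_close_to"
    start: int   # first frame of the action
    end: int     # last frame of the action

def before(a, b):
    # Allen's "before" relation: a ends strictly before b starts.
    return a.end < b.start

def recognize_vandalism(actions):
    """Search for a binding s1..s5 satisfying the scenario constraints;
    return the (StartTime, EndTime) production, or None."""
    close = [a for a in actions if a.label == "move_close_to"]
    stay  = [a for a in actions if a.label == "stay_at"]
    away  = [a for a in actions if a.label == "move_away_from"]
    for s1, s4 in product(close, repeat=2):
        if s1 == s4:                      # constraint: s1 != s4
            continue
        for s2, s5 in product(stay, repeat=2):
            if s2 == s5:                  # constraint: s2 != s5
                continue
            for s3 in away:
                if (before(s1, s2) and before(s2, s3) and
                        before(s3, s4) and before(s4, s5)):
                    return (s1.start, s5.end)   # Production
    return None
```

On a sequence where a person approaches the machine twice with a retreat in between, the function returns the span from the start of s1 to the end of s5; a real system would replace the brute-force search by constraint propagation, as listed among the recognition formalisms.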
Video interpretation: basic actions and scenarios
Video interpretation: Part I: conclusion
Approach: a framework combining several formalisms:
– to structure the knowledge to obtain a general model.
– to have a declarative description of the knowledge.
– to make the knowledge explicit.
– to mix bottom-up and top-down processing.
– to use evaluation and learning techniques.
Part II: Visualisation of the interpretation
Development of a test platform for an AVIS (Automatic Video Interpretation System):
(a) visualisation of the scenarios recognised by an AVIS.
(b) simulation of the input of an AVIS.
(c) verification that the test platform is coherent with the AVIS.
(d) validation of the AVIS.
Visualisation of the interpretation
3 tasks of the test platform:
(1) generation of realistic 3D animations corresponding to the scenarios recognised by an interpretation system.
(2) generation of videos from 3D animations using a model of a virtual camera.
(3) generation of realistic 3D animations corresponding to the scenarios described by an expert.
[Diagram: the AVIS — an image sequence acquired by a camera feeds scenario recognition, which uses the scene context model and the state, event and scenario models to produce the recognised scenario (task 1).]
[Diagram: the AVIS and the test platform — scenario recognition takes an image sequence acquired by a camera (tasks 1, 2) or a generated image sequence (tasks 2, 3); scenario visualisation takes the recognised scenario (tasks 1, 2, 3) or a scenario described by experts (task 3) and produces the 3D animation corresponding to the scenario. Recognition uses the scene context model and the state, event and scenario models; visualisation uses its own scene context model and the human body, action, scenario and animation models.]
Visualisation of the interpretation: approach
Conception of the test platform based on six generic models: scene context, camera, human body, actions, scenarios and animation.
Visualisation by using GEOMVIEW.
(1): visualisation of a scene context for a metro station.
(2): example of a context object: a bench.
Visualisation of the interpretation: Scene context
Visualisation of the interpretation: Human body
Model: hierarchical and articulated.
The human body parts are built from three primitives:
(1) sphere.
(2) truncated cone.
(3) parallelepiped.
Generic model of the human body parts:
(1) the relative position of the body part in the referential of the super (parent) body part; for example, the hand is defined relative to the arm.
(2) the angular co-ordinates of the body part in its referential.
(3) the size of the body part along its referential axes.
(4) the sub-parts and/or geometric primitives that constitute the body part.
(5) the colour of the body part.
Visualisation of the interpretation: Human body
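The hierarchical, articulated model can be captured by a small recursive data structure. A sketch in Python; the field names and the example parts are assumptions for illustration, not the actual ORION implementation:

```python
from dataclasses import dataclass, field

@dataclass
class BodyPart:
    name: str
    position: tuple   # (x, y, z) in the referential of the super part
    angles: tuple     # angular co-ordinates in its own referential
    size: tuple       # extent along each referential axis
    colour: str
    primitives: list = field(default_factory=list)  # "sphere", "truncated_cone", "parallelepiped"
    sub_parts: list = field(default_factory=list)   # child BodyPart instances

def count_parts(part):
    """Number of parts in the hierarchy rooted at `part`."""
    return 1 + sum(count_parts(p) for p in part.sub_parts)

# The hand is defined relative to the arm, as in the generic model.
hand = BodyPart("hand", (0.0, -0.25, 0.0), (0, 0, 0), (0.08, 0.18, 0.03),
                "skin", primitives=["parallelepiped"])
arm  = BodyPart("arm",  (0.20, 0.0, 0.0), (0, 0, 0), (0.08, 0.55, 0.08),
                "skin", primitives=["truncated_cone"], sub_parts=[hand])
```

A full body built this way would contain the 25 primitives mentioned in the results, organised into the 14 body-part classes.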
Definition of 14 classes of human body parts: human body, head, arm, leg, shoulder,…
Different views: [Figures (1)–(5): the human body model shown from five different viewpoints.]
Visualisation of the interpretation: Human body
Human behaviours for interpretation systems:
– basic action:
  – state: characterises an individual at a given time.
  – event: change of states at two successive times.
– scenario: combination of actions.
Human behaviours for the test platform:
– posture: corresponds to all body parameters of an individual at a given time.
– action: change of the body parameters of an individual.
– scenario: combination of actions.
Visualisation of the interpretation: Human behaviour
Generic model of action:
– concerned human body part.
– initial/final positions.
– variation of the rotation angles around its referential.
– global period of the action.
– list of sub-actions, each with:
  – the concerned sub-part of the human body.
  – the variation of the rotation angles around the sub-part referential.
  – their relative period.
– visualisation speed.
– fixed part of the human body on the ground.
Visualisation of the interpretation: Human behaviour: action
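The action model can be animated by interpolating the rotation angles over the action's global period. A minimal sketch with assumed field names and simple linear interpolation (the real system may interpolate differently):

```python
from dataclasses import dataclass, field

@dataclass
class SubAction:
    body_part: str           # concerned sub-part of the human body
    angle_variation: tuple   # rotation-angle deltas around the sub-part referential
    relative_period: tuple   # (start, end) as fractions of the global period

@dataclass
class Action:
    body_part: str           # concerned human body part
    angle_variation: tuple   # rotation-angle deltas around its referential
    period: float            # global duration of the action (frames)
    sub_actions: list = field(default_factory=list)
    speed: float = 1.0       # visualisation speed
    fixed_part: str = "foot" # part of the body kept fixed on the ground

def angles_at(action, t):
    """Linearly interpolate the main rotation angles at time t in [0, period]."""
    f = max(0.0, min(1.0, t / action.period))
    return tuple(f * d for d in action.angle_variation)

walk = Action("leg", (40.0, 0.0, 0.0), period=10.0,
              sub_actions=[SubAction("knee", (25.0, 0.0, 0.0), (0.2, 0.8))])
```

Evaluating `angles_at` frame by frame yields the posture sequence that the visualisation step renders through GEOMVIEW.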
21 classes of actions: «walking», «running», ...
[Figures at t = t1 and t = t2: the actions «walking», «running» and «hand up».]
Visualisation of the interpretation: Human behaviour: action
Calculation of the current posture from the posture at the previous instant.
Calculation of the global position of the individual:
– automatic recognition: use the position of the detected individual.
– expert description: based on a fixed point on the ground.
Visualisation of the geometric primitives through GEOMVIEW.
[Figures at t = 100 and t = 150, showing the fixed point on the ground.]
Visualisation of the interpretation: Human behaviour: visualisation of action
A scenario is a set of actions combining the individuals of the scene and the context objects that are relevant to the same activity.
– Sequence of sub-scenarios ordered by their period.
– Elementary scenario: an action.
[Figures: scenario snapshots at t = 80 and t = 240.]
Visualisation of the interpretation: Human behaviour: scenario
«Walking on the platform»; «Person A and person B meet at the coffee machine M».
Visualisation of the interpretation: Human behaviour: animation
«Pushing someone on the tracks»; «Following another person».
Visualisation of the interpretation: Human behaviour: animation
Construction of models:
– human body with 25 primitives.
– 21 types of individual actions.
– 4 types of scenarios.
– 4 types of animations.
Generation of 7 types of 3D animations from descriptions.
Generation of 3D animations visualising the individuals tracked by the AVIS.
Checking the coherence by taking the animations as input for the AVIS.
Visualisation of the interpretation: Results
[Figures: raw video; tracked individuals; animation of the tracked individuals; animation from a synthesised video.]
Visualisation of the interpretation: Results: comparison of 2 animations
Six generic models:
– scene context.
– virtual camera.
– human body.
– individual actions.
– scenarios.
– animations.
A description language for modelling the knowledge of the scene.
Validation of these models on metro scenes.
Visualisation of the interpretation: Part II: conclusion and contributions
Help the developer:
– visualisation of the results of the interpretation (multi-camera case).
– generation of test sequences (adding noisy phenomena) for validating and establishing the limits of an AVIS.
Help the expert in describing new scenarios.
Define a unified platform using the same models for the interpretation and the test platform.
Part I & II: conclusion and perspectives
Visualisation of the interpretation: Scene context
A scene context is composed of 4 elements:
– zones (e.g. the zone of a bench) with semantic information (e.g. expected mobile objects): represented by polygons.
– walls: represented by vertical polygons.
– context objects (e.g. a bench) with semantic information (e.g. the function of the object, time and distance of utilisation): represented by 3D geometric primitives (sphere, truncated cone, parallelepiped).
– camera information: a calibration matrix containing the parameters of the virtual camera (e.g. position, direction, field of view).
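These four elements map naturally onto a small data model. An illustrative sketch; the class and field names are assumptions, not the system's actual description language:

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    name: str
    polygon: list      # ground polygon [(x, y), ...]
    semantics: str     # e.g. expected mobile objects

@dataclass
class Wall:
    polygon: list      # vertical polygon [(x, y, z), ...]

@dataclass
class ContextObject:
    name: str
    function: str      # semantic function, e.g. "seat"
    primitives: list   # "sphere" / "truncated_cone" / "parallelepiped"

@dataclass
class CameraInfo:
    calibration: list  # rows of the calibration matrix

@dataclass
class SceneContext:
    zones: list = field(default_factory=list)
    walls: list = field(default_factory=list)
    context_objects: list = field(default_factory=list)
    cameras: list = field(default_factory=list)

# A fragment of a metro-station context, for illustration.
ctx = SceneContext(
    zones=[Zone("bench_zone", [(0, 0), (2, 0), (2, 1), (0, 1)], "seated persons")],
    context_objects=[ContextObject("bench", "seat", ["parallelepiped"])])
```

Keeping the four element types separate is what lets the same context model serve both the interpretation system and the test platform.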
Issue: detection errors, non-rigid objects, occlusions, merging and splitting of trajectories.
Approach: combining different types of tracking:
– frame-to-frame tracker: computes correspondences between successive mobile objects.
– individual tracker: tracks specific individuals using a time delay.
– group tracker: global tracking of groups of persons.
For example, a group of persons is defined as a set of individuals with four characteristics:
– spatial coherence: the mobile objects are close to each other.
– size coherence: the mobile objects are bigger than a person.
– temporal coherence: the motion of the mobile objects corresponds to the motion of a person.
– structure coherence: the number and the size of the mobile objects are stable.
This enables computing a reliable history of all mobile objects.
Video interpretation: tracking of mobile objects
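The four coherence characteristics of a group can be sketched as predicates over the tracked mobile objects. All thresholds and field names below are assumptions for illustration, not the values used by the authors:

```python
from dataclasses import dataclass

@dataclass
class MobileObject:
    x: float; y: float   # ground position (metres)
    width: float         # blob width (metres)
    speed: float         # metres per second

MAX_GAP = 2.0            # assumed maximum distance between group members
PERSON_WIDTH = 0.6       # assumed width of a single person
PERSON_MAX_SPEED = 2.5   # assumed upper bound on walking speed

def spatial_coherence(objs):
    """Each mobile object is close to at least one other one."""
    return all(any(o is not p and abs(o.x - p.x) + abs(o.y - p.y) <= MAX_GAP
                   for p in objs) for o in objs)

def size_coherence(objs):
    """The mobile objects are at least as big as a person."""
    return all(o.width >= PERSON_WIDTH for o in objs)

def temporal_coherence(objs):
    """The motion of the mobile objects is compatible with a walking person."""
    return all(o.speed <= PERSON_MAX_SPEED for o in objs)

def is_coherent_group(objs, previous_count):
    # structure coherence: the number of mobile objects is stable.
    return (len(objs) == previous_count and spatial_coherence(objs)
            and size_coherence(objs) and temporal_coherence(objs))
```

A group tracker would evaluate such predicates at each frame and keep a group alive only while all four hold, which is what makes the resulting history of mobile objects reliable.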
An animation combines and instantiates all the previously defined models:
– the scene context with its set-up values (e.g. colour of the context objects).
– the actors with their set-up values (e.g. position).
– the scenarios with the involved actors and their period of occurrence.
– the virtual camera used to visualise the scene.
– the visualisation speed.
Visualisation of the interpretation: Human behaviour: animation