3rd SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES: REPRESENTATION, RECOGNITION AND VISUALISATION OF HUMAN BEHAVIOURS FOR VIDEO INTERPRETATION. François BREMOND, Monique THONNAT and Thinh VU Van, ORION lab, INRIA Sophia-Antipolis, FRANCE
Jan 07, 2016

- 1 - 10/11/2014. 3 rd SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES. REPRESENTATION, RECOGNITION AND VISUALISATION OF HUMAN BEHAVIOURS FOR VIDEO INTERPRETATION. François BREMOND, Monique THONNAT and Thinh VU Van - PowerPoint PPT Presentation
Page 1: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

3rd SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

REPRESENTATION, RECOGNITION AND VISUALISATION OF HUMAN BEHAVIOURS FOR VIDEO INTERPRETATION

- 1 - 04/20/23

François BREMOND, Monique THONNAT and Thinh VU Van

ORION lab, INRIA Sophia-Antipolis, FRANCE

Page 2: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Plan of presentation

Part I: Video interpretation
– Global framework
– Scenario recognition

Part II: Visualisation of the interpretation
– Scene context (3D geometry)
– Human body
– Human behaviour
– Results

Conclusion

Page 3: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Part I: Video interpretation: Global framework

Our goal: to model the interpretation process of video sequences, from pixels up to behaviours.

Main issue: current video interpretation systems are based on specific (ad hoc) routines that:
– depend on the sensors (e.g. camera orientation).
– are dedicated to specific scenarios (e.g. detection of fighting people) and sites (e.g. metro stations).

Figure (processing over time): Video stream → Mobile object detection & tracking → Scenario recognition → Recognised scenario (e.g. “Car accident?”, “Two strangers exchanging objects?”).

Page 4: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Video interpretation: Global framework

We define several entities:
– Context object: predefined static object of the scene environment (entrance zone, bench, walls, equipment, ...).
– Moving region: any intensity change between a reference image and the current image.
– Mobile object: any moving region which has been tracked and classified (person, group of persons, vehicle, noise, etc.).
– Basic action: spatio-temporal property, instantaneous, numerical, generic; state and event.
– Scenario: long term, symbolic, application dependent; behaviour and activity.
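The "moving region" entity defined on this slide (an intensity change between a reference image and the current image) can be illustrated with a minimal sketch. The function name `moving_region_mask`, the threshold value and the use of plain nested lists as grey-level images are assumptions made for this illustration; the slides do not specify how the change is measured.

```python
def moving_region_mask(reference, current, threshold=25):
    """Return a binary mask: 1 where |current - reference| exceeds threshold.

    `threshold` is a hypothetical tuning value, not taken from the slides.
    """
    return [
        [1 if abs(c - r) > threshold else 0 for r, c in zip(ref_row, cur_row)]
        for ref_row, cur_row in zip(reference, current)
    ]


# Tiny 2x3 grey-level images: the middle column changes strongly.
reference = [[10, 10, 10], [10, 10, 10]]
current = [[10, 200, 10], [10, 190, 12]]
mask = moving_region_mask(reference, current)
```

In such a scheme, connected groups of 1-pixels would then be handed to the tracker to become mobile objects.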

Page 5: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Video interpretation: Global framework

Figure (processing chain): Video stream → Moving region detection → Mobile object tracking → Recognition of actions → Scenario recognition module (Recognition of scenario 1, Recognition of scenario 2, ..., Recognition of scenario n) → Recognised scenario.

A priori knowledge used by the chain: sensors information, mobile object classes, tracked object types, context objects, descriptions of action recognition routines, scenario library.

Page 6: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Video interpretation: scene context

Definition: a priori knowledge describing:
– the sensors (cameras, optical cells and contact sensors): 3D position of the sensor, camera type (colour, resolution), field of view and calibration matrix.
– context objects: equipment (bench, trash, door), walls, interesting zones (entrance zone), areas of interest.
  • 3D geometry: 3D location of the object and its volume.
  • Semantic information: type of the object (equipment), its characteristics (yellow, fragile) and its function (seat).

Role:
– to keep the interpretation independent from the sensors and the sites.
– to provide additional knowledge to interpret up to the scenario level.
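As an illustration of the context object description on this slide (3D geometry plus semantic information), here is a minimal sketch. The class name `ContextObject`, its field names and the axis-aligned box used as "volume" are assumptions for this example, not the representation used by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class ContextObject:
    name: str
    location: tuple   # 3D geometry: (x, y, z) corner of the object, hypothetical scene frame
    size: tuple       # 3D geometry: (dx, dy, dz) extent of its volume (assumed to be a box)
    obj_type: str     # semantic information: type of the object
    characteristics: list = field(default_factory=list)
    function: str = ""

    def contains(self, point):
        """True if a 3D point lies inside the object's box volume."""
        return all(o <= p <= o + s
                   for o, p, s in zip(self.location, point, self.size))


bench = ContextObject("bench_1", (2.0, 0.0, 0.0), (1.5, 0.5, 0.5),
                      "equipment", ["yellow", "fragile"], "seat")
# Semantic query: which context objects can be used as a seat?
seats = [o.name for o in [bench] if o.function == "seat"]
```

Keeping geometry and semantics in one declarative record is what lets the later recognition steps stay independent of a particular site.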

Page 7: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Video interpretation: basic actions and scenarios

Issues: large variety of actions and scenarios:
– more or less abstract (running/fighting).
– general (standing) or sensor and application dependent (sit down).
– spatial granularity: the view observed by one camera/the whole site.
– temporal granularity: instantaneous/long term.
– 3 levels of complexity, depending on the complexity of temporal relations and on the number of actors:
  • non-temporal constraint relative to one actor (being seated).
  • temporal sequence of sub-scenarios relative to one actor (open the door, go toward the chair, then sit down).
  • complex temporal constraints relative to several actors (A meets B at the coffee machine, then C gets up and leaves).
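The second level of complexity on this slide (a temporal sequence of sub-scenarios relative to one actor) can be sketched as a small finite state automaton. The event names and the choice to ignore unrelated events between steps are assumptions made for this illustration.

```python
# Hypothetical event labels for "open the door, go toward the chair, then sit down".
SEQUENCE = ["open_door", "go_toward_chair", "sit_down"]


def recognise_sequence(events, sequence=SEQUENCE):
    """Return True if all steps of `sequence` occur in order within `events`.

    The automaton state is the index of the next expected step;
    events that do not match the expected step are simply skipped.
    """
    state = 0
    for event in events:
        if state < len(sequence) and event == sequence[state]:
            state += 1
    return state == len(sequence)


observed = ["walk", "open_door", "walk", "go_toward_chair", "sit_down"]
recognised = recognise_sequence(observed)
```

The third level (several actors, complex temporal constraints) does not fit a single linear automaton, which is one reason the slides list several formalisms.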

Page 8: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Video interpretation: basic actions and scenarios

We use several formalisms.

Action and scenario representation:
– n-ary tree.
– finite state automaton.
– graph.
– set of constraints.

Action and scenario recognition:
– specific routines.
– classification.
– Bayes.
– HMM.
– propagation of temporal constraints.
– constraint resolution.

Page 9: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Video interpretation: basic actions and scenarios

Example: a scenario is represented by a set of constraints.

Scenario(vandalism_against_ticket_machine,
  Actors( (p : Person), (eq : Equipment, Name = "Ticket_Machine") )
  Constraints(
    (exist ( (action s1: p move_close_to eq)
             (action s2: p stay_at eq)
             (action s3: p move_away_from eq)
             (action s4: p move_close_to eq)
             (action s5: p stay_at eq) )
           ( (s1 != s4) (s2 != s5)
             (s1 before s2) (s2 before s3)
             (s3 before s4) (s4 before s5) ) ) )
  Production( (sc : Scenario)
    ( (Name of sc := "vandalism_against_ticket_machine")
      (StartTime of sc := StartTime of s1)
      (EndTime of sc := EndTime of s5) ) ) )
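To make the constraint-based example on this slide concrete, the following is a minimal sketch of how such a scenario could be checked against a list of detected actions. It is not the authors' recognition algorithm: the `Action` class, the backtracking search and the reading of "before" as "ends no later than the other starts" are all assumptions, and every action is assumed to involve the same person p and equipment eq.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    start: float
    end: float


# s1..s5 from the scenario description, in their required temporal order.
PATTERN = ["move_close_to", "stay_at", "move_away_from",
           "move_close_to", "stay_at"]


def before(a, b):
    # Assumed meaning of "before": a ends no later than b starts.
    return a.end <= b.start


def recognise(actions, pattern=PATTERN, chosen=()):
    """Backtracking search for distinct actions matching the pattern in order."""
    if len(chosen) == len(pattern):
        # "Production" part: build the recognised scenario.
        return {"name": "vandalism_against_ticket_machine",
                "start": chosen[0].start, "end": chosen[-1].end}
    for a in actions:
        if (a.name == pattern[len(chosen)] and a not in chosen
                and (not chosen or before(chosen[-1], a))):
            found = recognise(actions, pattern, chosen + (a,))
            if found:
                return found
    return None


detected = [Action("move_close_to", 0, 2), Action("stay_at", 2, 5),
            Action("move_away_from", 5, 7), Action("move_close_to", 8, 9),
            Action("stay_at", 9, 12)]
result = recognise(detected)
```

The `a not in chosen` test plays the role of the `s1 != s4` and `s2 != s5` constraints, and the start and end times of the result follow the Production clause.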

Page 10: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Video interpretation: basic actions and scenarios

Page 11: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Video interpretation: Part I: conclusion

Approach: a framework combining several formalisms:
– to structure the knowledge to obtain a general model.
– to have a declarative description of the knowledge.
– to make the knowledge explicit.
– to mix bottom-up and top-down processing.
– to use evaluation and learning techniques.

Page 12: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Part II: Visualisation of the interpretation

Development of a test platform for an AVIS (Automatic Video Interpretation System):
(a) visualisation of the scenarios recognised by an AVIS.
(b) simulation of the input of an AVIS.
(c) verification that the test platform is coherent with the AVIS.
(d) validation of the AVIS.

Page 13: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation

3 tasks of the test platform:
(1) generation of realistic 3D animations corresponding to the scenarios recognised by an interpretation system.
(2) generation of videos from 3D animations using a model of a virtual camera.
(3) generation of realistic 3D animations corresponding to the scenarios described by an expert.
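Task (2) on this slide relies on a virtual camera model. The slides do not give that model, so the sketch below uses a standard pinhole projection as a stand-in. The function name `project`, the focal length and the principal point (cx, cy) are hypothetical values chosen for illustration.

```python
def project(point, focal=800.0, cx=320.0, cy=240.0):
    """Project a 3D point given in camera coordinates (z pointing forward)
    to pixel coordinates with a simple pinhole model.

    Returns None for points at or behind the camera plane.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (focal * x / z + cx, focal * y / z + cy)


centre = project((0.0, 0.0, 5.0))      # a point on the optical axis
off_axis = project((1.0, 0.5, 4.0))    # a point to the right and below
```

Rendering a generated video would apply such a projection to every geometric primitive of the 3D animation at each frame.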

Page 14: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Figure (task 1): Image sequence acquired by a camera → AVIS: Scenario recognition → Recognised scenario.

Models used by the AVIS: state, event and scenario models for the recognition; scene context model.

Page 15: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Figure (tasks 1, 2 and 3): AVIS and test platform.

AVIS: Scenario recognition, using state, event and scenario models for the recognition and the scene context model.
Test platform: Scenario visualisation, using human body, action, scenario and animation models for the visualisation and the scene context model.

Data exchanged, with the tasks in which they appear:
– Image sequence acquired by a camera (tasks 1, 2).
– Generated image sequence (tasks 2, 3).
– Scenario described by experts (task 3).
– Recognised scenario (tasks 1, 2, 3).
– 3D animation corresponding to the scenario (tasks 1, 2, 3).

Page 16: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: approach

Design of the test platform based on six generic models: animation, scenarios, actions, human body, scene context and camera.

Visualisation by using GEOMVIEW.

Page 17: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Scene context

Figure:
(1) visualisation of a scene context for a metro station.
(2) example of a context object: a bench.

Page 18: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human body

Model: hierarchical and articulated.

The human body parts are built from three primitives:
(1) sphere.
(2) truncated cone.
(3) parallelepiped.

Page 19: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human body

Generic model of the human body parts:
(1) the relative position of the body part in the referential of the super body part. For example, the hand is defined relative to the arm.
(2) the angular co-ordinates of the body part in its referential.
(3) the size of the body part along its referential axis.
(4) the sub-parts and/or geometric primitives that constitute the body part.
(5) the colour of the body part.
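The five items of the generic body part model can be sketched as a recursive data structure. The class `BodyPart`, its field names and the example offsets are assumptions for illustration. To stay short, the sketch accumulates relative positions only and deliberately ignores the angular co-ordinates, which a real articulated model would have to apply.

```python
from dataclasses import dataclass, field


@dataclass
class BodyPart:
    name: str
    offset: tuple                      # (1) position relative to the super body part
    angles: tuple = (0.0, 0.0, 0.0)    # (2) angular co-ordinates (unused in this sketch)
    size: tuple = (0.1, 0.1, 0.1)      # (3) size along the referential axes
    primitive: str = "sphere"          # (4) geometric primitive
    sub_parts: list = field(default_factory=list)  # (4) sub-parts
    colour: str = "grey"               # (5) colour


def global_positions(part, origin=(0.0, 0.0, 0.0)):
    """Walk the hierarchy and sum offsets to get each part's global position."""
    pos = tuple(o + d for o, d in zip(origin, part.offset))
    result = {part.name: pos}
    for sub in part.sub_parts:
        result.update(global_positions(sub, pos))
    return result


hand = BodyPart("hand", (0.0, 0.0, -0.5))
arm = BodyPart("arm", (0.25, 0.0, 1.5), primitive="truncated cone", sub_parts=[hand])
body = BodyPart("human_body", (0.0, 0.0, 0.0), sub_parts=[arm])
positions = global_positions(body)
```

Because the hand is stored relative to the arm, moving the arm moves the hand with it, which is the point of the hierarchical model.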

Page 20: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human body

Definition of 14 classes of human body parts: human body, head, arm, leg, shoulder, …

Figure: different views, (1) to (5).

Page 21: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human behaviour

Human behaviours for interpretation systems:
– basic action:
  • state: characterises an individual at a given time.
  • event: change of state between two successive times.
– scenario: combination of actions.

Human behaviours for the test platform:
– posture: corresponds to all body parameters of an individual at a given time.
– action: change of body parameters of an individual.
– scenario: combination of actions.
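The distinction between state and event on this slide (an event is a change of state between two successive times) can be shown with a minimal sketch. The function name `detect_events` and the state labels are assumptions for illustration.

```python
def detect_events(states):
    """Given a time-ordered list of (time, state) pairs, return one
    (time, old_state, new_state) tuple for each change between successive times."""
    return [
        (t1, s0, s1)
        for (_, s0), (t1, s1) in zip(states, states[1:])
        if s0 != s1
    ]


states = [(0, "standing"), (1, "standing"), (2, "seated"), (3, "seated")]
events = detect_events(states)
```

Here the only event is the change from "standing" to "seated" at time 2, which a scenario could then refer to as a "sits down" event.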

Page 22: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human behaviour: action

Generic model of action:
• concerned human body part.
• initial/final positions.
• variation of rotation angles around its referential.
• global period of the action.
• list of sub-actions with:
  – the concerned sub-part of the human body.
  – the variation of rotation angles around the sub-part referential.
  – their relative period.
• visualisation speed.
• fixed part of the human body on the ground.

Figure: postures at t = t1 and t = t2.
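The action model above describes an action by an initial position, a variation of rotation angles and a period. One simple way to turn that into intermediate postures is linear interpolation, sketched below. The slides do not state which interpolation is used, so the linear law, the function name `angle_at` and the parameter names are assumptions.

```python
def angle_at(t, t_start, period, initial_angle, angle_variation):
    """Rotation angle of a body part at time t, assuming the angle varies
    linearly from initial_angle to initial_angle + angle_variation
    over `period` time units starting at t_start."""
    progress = (t - t_start) / period
    progress = min(max(progress, 0.0), 1.0)  # clamp outside the action's period
    return initial_angle + progress * angle_variation


# Hypothetical "hand up" action: the arm rotates by 90 degrees between t=100 and t=150.
start_angle = angle_at(100, 100, 50, 0.0, 90.0)
mid_angle = angle_at(125, 100, 50, 0.0, 90.0)
end_angle = angle_at(200, 100, 50, 0.0, 90.0)
```

Sub-actions would apply the same computation to sub-parts, each with its own relative period.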

Page 23: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human behaviour: action

21 classes of actions: «walking», «running», …

Figure: actions «walking», «running» and «hand up».

Page 24: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human behaviour: visualisation of action

• calculation of the current posture from the previous instant.
• calculation of the global position of the individual:
  – automatic recognition: use the position of the detected individual.
  – expert description: based on a fixed point on the ground.
• visualisation of geometric primitives through GEOMVIEW.

Figure: postures at t = 100 and t = 150, with the fixed point on the ground.

Page 25: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human behaviour: scenario

A scenario is a set of actions combining the individuals of the scene and the context objects which are relevant to the same activity.
• Sequence of sub-scenarios ordered by their period.
• Elementary scenario: action.

Figure: scenario at t = 80 and t = 240.
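The recursive definition on this slide (a scenario is a sequence of sub-scenarios ordered by their period, and an elementary scenario is an action) can be sketched as follows. The class `Scenario`, its fields and the example names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    name: str
    start: int
    end: int
    sub_scenarios: list = field(default_factory=list)


def elementary_actions(scenario):
    """Flatten a scenario into its elementary scenarios (actions),
    visiting sub-scenarios in order of the start of their period."""
    if not scenario.sub_scenarios:
        return [scenario]  # elementary scenario: an action
    ordered = sorted(scenario.sub_scenarios, key=lambda s: s.start)
    return [a for sub in ordered for a in elementary_actions(sub)]


meet = Scenario("meet_at_coffee_machine", 80, 240, [
    Scenario("stay_at", 160, 240),
    Scenario("walking", 80, 160),
])
timeline = [a.name for a in elementary_actions(meet)]
```

The flattened timeline is what an animation would play back, one action after another.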

Page 26: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human behaviour: animation

Figure: animations «Walking on the platform» and «Person A and person B meet at the coffee machine M».

Page 27: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human behaviour: animation

Figure: animations «Pushing someone on the tracks» and «Following another person».

Page 28: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Results

Construction of models:
• human body with 25 primitives.
• 21 types of individual actions.
• 4 types of scenarios.
• 4 types of animations.

Generation of 7 types of 3D animations from descriptions.

Generation of 3D animations visualising individuals tracked by AVIS.

Checking the coherence by taking animations as input for AVIS.

Page 29: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Results: comparison of 2 animations

Figure: raw video; tracked individuals; animation of tracked individuals; animation from a synthesised video.

Page 30: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Part II: Conclusion and contributions

Six generic models:
• scene context.
• virtual camera.
• human body.
• individual actions.
• scenarios.
• animations.

A description language for modelling the knowledge of the scene.

Validation of these models on metro scenes.

Page 31: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Part I & II: conclusion and perspectives

Help the developer:
• visualisation of the results of the interpretation (multi-camera case).
• generation of test sequences (adding noisy phenomena) for validating and establishing the limits of an AVIS.

Help the expert to describe new scenarios.

Define a unified platform using the same models for the interpretation and the test platform.

Page 32: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Scene context

A scene context is composed of 4 elements:
• zones (e.g. zone of bench) with semantic information (e.g. expected mobile objects): represented by polygons.
• walls: represented by vertical polygons.
• context objects (e.g. bench) with semantic information (e.g. function of the object, time and distance of utilization): represented by 3D geometric primitives (sphere, truncated cone, parallelepiped).
• camera information: calibration matrix containing the parameters of the virtual camera (e.g. position, direction, FOV).
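Since zones are represented by polygons, a typical query is whether a tracked individual's ground position falls inside a zone. The sketch below uses the standard ray-casting test. The function name `point_in_zone` and the example zone co-ordinates are assumptions; the slides do not describe how zone membership is computed.

```python
def point_in_zone(point, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from `point` towards +x crosses; an odd count means the point is inside."""
    x, y = point
    inside = False
    # Pair each vertex with the next one, wrapping around to close the polygon.
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular "zone of bench" on the ground plane, in metres.
bench_zone = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
in_zone = point_in_zone((1.0, 1.0), bench_zone)
out_of_zone = point_in_zone((5.0, 1.0), bench_zone)
```

A state such as "person is in the zone of bench" could be derived from this test and then used by the scenario models.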

Page 33: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Video interpretation: tracking of mobile objects

Issue: detection errors, non-rigid objects, occlusions, merging and splitting of trajectories.

Approach: combining different types of tracking:
- frame to frame tracker: to compute correspondences between successive mobile objects.
- individual tracker: tracking of specific individuals using time delay.
- group tracker: global tracking of groups of persons.

For example: a group of persons is defined as a set of individuals which has four characteristics:
- spatial coherence: the mobile objects are close to each other.
- size coherence: the mobile objects are bigger than a person.
- temporal coherence: the motion of mobile objects corresponds to the motion of a person.
- structure coherence: the number and the size of the mobile objects are stable.

This makes it possible to compute a reliable history of all mobile objects.
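Two of the four group characteristics listed on this slide (spatial and size coherence) lend themselves to a short sketch. The thresholds `max_gap` and `person_width`, the dictionary layout of a mobile object and the reading of "bigger than a person" as "combined width exceeds one person's width" are all assumptions; temporal and structure coherence, which need the track history, are left out.

```python
from math import dist


def spatially_coherent(objects, max_gap=1.5):
    """Every mobile object has at least one other object within max_gap metres."""
    if len(objects) < 2:
        return True
    return all(
        any(dist(a["pos"], b["pos"]) <= max_gap for b in objects if b is not a)
        for a in objects
    )


def size_coherent(objects, person_width=0.6):
    """The combined width exceeds the (assumed) width of a single person."""
    return sum(o["width"] for o in objects) > person_width


def is_group_candidate(objects):
    return bool(objects) and spatially_coherent(objects) and size_coherent(objects)


near = [{"pos": (0.0, 0.0), "width": 0.5}, {"pos": (1.0, 0.0), "width": 0.5}]
far = [{"pos": (0.0, 0.0), "width": 0.5}, {"pos": (10.0, 0.0), "width": 0.5}]
```

With these example values, `near` passes both checks while `far` fails the spatial one.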

Page 34: 3 rd  SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES

Visualisation of the interpretation: Human behaviour: animation

An animation combines and instantiates all previously defined scenarios:
– scene context: with the set up values (e.g. colour of the context objects).
– actors with their set up values (e.g. position).
– scenarios with the involved actors and their period of occurrence.
– virtual camera used to visualise the scene.
– visualisation speed.