MPEG-4 Coding of audio-visual objects Presentation By: Ihab Ilyas

Jan 04, 2016

MPEG-4 Overview

• Background

• Scope of MPEG-4 Standard

• Objectives

• Requirements

• Object Model

• Tools

• Version 2

Background

• MPEG (Moving Picture Experts Group): “Compactly representing digital video and audio signals for consumer distribution”

• MPEG-1: Standard for storage and retrieval of audio and video on storage media

• MPEG-2: Standard for digital TV

Scope of MPEG-4 Standard

• Authors: greater production flexibility and reusability

• Network service providers: transport information that can be interpreted on various network platforms

• End users: higher levels of interaction with content, within the limits set by the author.

Objectives

• Interactivity : Interacting with the different audio-visual objects

• Scalability: adapting content to match the available bandwidth

• Reusability : For both tools and data

Objectives - Interactivity

• Client-side interaction: manipulating the scene description and the properties of audio-visual objects

• Audio-visual object behavior: triggered by user actions and other events

• Client-server interaction: possible when a return channel is available

Objectives - Scalability

• Scalability refers to the ability to decode only part of a bitstream and still reconstruct images or image sequences with:
  – Reduced decoder complexity (reduced quality)
  – Reduced spatial resolution
  – Reduced temporal resolution

• A scalable object has a base layer that carries basic-quality information for presentation. When enough bitrate or resources are available, enhancement layers can be added to improve quality.
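The base-plus-enhancement idea can be illustrated with a simple selection rule. The following is a minimal Python sketch and is not taken from the standard. The layer names, the bitrates, and the `select_layers` helper are hypothetical. The sketch also assumes that each enhancement layer depends on every layer below it.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    kbps: float  # bitrate this layer adds on top of the layers below it


def select_layers(layers, budget_kbps):
    """Return the base layer plus as many enhancement layers as the budget allows.

    Layers are ordered base-first. An enhancement layer is only decodable when
    every layer below it is also received, so selection stops at the first
    layer that does not fit.
    """
    chosen, used = [], 0.0
    for layer in layers:
        if used + layer.kbps > budget_kbps:
            break
        chosen.append(layer)
        used += layer.kbps
    return chosen


# Hypothetical layer bitrates, for illustration only.
stack = [Layer("base", 10), Layer("spatial", 24), Layer("temporal", 32)]
print([layer.name for layer in select_layers(stack, 40)])  # ['base', 'spatial']
```

If the budget is below the base layer's bitrate, nothing is selected. This matches the point that the base layer carries the minimum presentable quality.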

Objectives – Scalability (Cont.)

• Scalability is a key factor in many applications, making moving video possible at very low bitrates, notably on mobile devices.

• MPEG-4 has been found usable for streaming wireless video at 10 kbps over GSM.

• Low bitrates are accommodated by the use of scalable objects.

Objectives - Reusability

• Authors can easily organize and manipulate individual components and reuse existing decoded objects.

• Each type of content can be coded using the most effective algorithms.

Requirements

• Traditional requirements (MPEG-1 and MPEG-2):
  – Streaming: for live broadcast
  – Synchronization: to process data received at the right instants of time
  – Stream management: to allow the application to consume the content (content type, dependencies, etc.)

• Specific MPEG-4 requirements:
  – Audio-visual objects
  – Scene description

Audio-Visual Objects

• The representation of a natural or synthetic object that has an audio and/or visual manifestation

• Examples:
  – Video sequence (with shape information)
  – Audio track
  – Animated 3D face
  – Speech synthesized from text

• Advantages: Interaction – Scalability – Reusability

Scene Description

The coding of information that describes the spatio-temporal relationships between the various audio-visual objects.

Scene Graph

Scene Description (Cont.)

• Place media objects anywhere in a given coordinate system.

• Apply transforms to change the geometrical or acoustical appearance of a media object.

• Group primitive media objects to form compound media objects.

• Apply streamed data to media objects to modify their attributes (sound, moving texture…)

• Change, interactively, the user’s viewing and listening point anywhere in the scene.
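The placement, transform, and grouping operations listed above can be illustrated with a tiny 2D scene graph. This Python sketch is a hypothetical model and does not use the BIFS node set. The `Node` class, its `translation` and `scale` fields, and the object names are all assumptions. Each node's position is expressed in its parent's coordinate system, so moving a group moves every object inside it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    translation: tuple = (0.0, 0.0)  # offset in the parent's coordinate system
    scale: float = 1.0               # uniform scale applied to this node's children
    children: list = field(default_factory=list)


def world_positions(node, origin=(0.0, 0.0), parent_scale=1.0, out=None):
    """Walk the graph depth-first and return each node's position in scene coordinates."""
    if out is None:
        out = {}
    x = origin[0] + parent_scale * node.translation[0]
    y = origin[1] + parent_scale * node.translation[1]
    out[node.name] = (x, y)
    for child in node.children:
        world_positions(child, (x, y), parent_scale * node.scale, out)
    return out


# A compound object: a presenter video and its caption grouped under one transform.
scene = Node("scene", children=[
    Node("background"),
    Node("presenter_group", translation=(100.0, 50.0), scale=0.5, children=[
        Node("presenter_video", translation=(20.0, 0.0)),
        Node("caption", translation=(0.0, 40.0)),
    ]),
])
print(world_positions(scene)["presenter_video"])  # (110.0, 50.0)
```

Changing only `presenter_group.translation` relocates both the video and its caption together. This is the "compound media object" behavior described above.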

Logical Structure of a Scene

Scene Description (Cont.)

• Starting from VRML, MPEG has developed a binary language called BInary Format for Scenes (BIFS).

• The standard distinguishes between parameters used to improve the coding efficiency of an object (e.g., motion vectors in video coding) and parameters used as modifiers of an object (e.g., its position in the scene).

• Modifying the latter does not require re-decoding the primitive media objects.
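The practical consequence of this separation can be shown with a hypothetical media object that caches its decoded output. In this sketch, the class, its methods, and the placeholder "decoder" (an upper-casing step) are assumptions made for illustration. Moving the object updates only a composition parameter, so the decoder runs once no matter how often the object is repositioned.

```python
class MediaObject:
    """Hypothetical media object: decoding is expensive, repositioning is not."""

    def __init__(self, coded_data: str):
        self.coded_data = coded_data  # stands in for ES payload (motion vectors, residuals, ...)
        self.position = (0, 0)        # scene-level modifier set by the scene description
        self._decoded = None
        self.decode_count = 0

    def decoded(self) -> str:
        # Decode lazily and cache the result; repositioning never clears the cache.
        if self._decoded is None:
            self.decode_count += 1
            self._decoded = self.coded_data.upper()  # placeholder for a real decoder
        return self._decoded

    def move(self, x, y):
        # Changing a modifier touches only composition parameters.
        self.position = (x, y)

    def render(self):
        return self.position, self.decoded()


obj = MediaObject("frame")
obj.render()
obj.move(10, 20)
print(obj.render(), obj.decode_count)  # ((10, 20), 'FRAME') 1
```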

MPEG-4 Mission

Develop a coded, streamable representation for audio-visual objects and their associated time-variant data along with a description of how they are combined.

MPEG-4 Mission (Cont.)

• Coded Vs. Textual

• Streamable Vs. Downloaded

• Audio-Visual objects Vs. Individual Audio or Visual Streams

Object Model

• Visual objects in the scene are described mathematically and given a position in two- or three-dimensional space. Similarly, audio objects are placed in a sound space.

• “Create once, access everywhere”: objects are defined once, and the calculations to update the screen and sound are done locally.

Objectifying the Visual

• Classical video (from the camera) is one of the visual objects defined in the standard.

• Objects with arbitrary shapes can be encoded separately from their background and described in one of two ways:
  – Binary shape: for low-bitrate environments
  – Gray-scale (alpha) shape: for higher-quality content
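The difference between the two shape representations becomes visible when the object is composited over a background. This Python sketch makes several assumptions. It uses 8-bit samples and an 8-bit alpha plane in which 0 is transparent and 255 is opaque. A binary shape uses only those two values, while a gray-scale shape can use intermediate values to blend soft edges. The `composite` function is illustrative and is not a normative MPEG-4 process.

```python
def composite(background, foreground, alpha, max_alpha=255):
    """Blend one row of samples: out = (a*fg + (max_alpha - a)*bg) / max_alpha, rounded."""
    return [
        (a * f + (max_alpha - a) * b + max_alpha // 2) // max_alpha
        for b, f, a in zip(background, foreground, alpha)
    ]


background = [0, 0, 0, 0]
foreground = [200, 200, 200, 200]
binary_shape = [0, 255, 255, 0]      # each pixel is either inside or outside the object
grayscale_shape = [0, 64, 192, 255]  # partial transparency for soft edges

print(composite(background, foreground, binary_shape))     # [0, 200, 200, 0]
print(composite(background, foreground, grayscale_shape))  # [0, 50, 151, 200]
```

The binary mask produces hard cut-out edges and needs only one bit per pixel. The gray-scale mask costs more bits but blends smoothly, which is why the two suit low-bitrate and higher-quality content respectively.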

Objectifying the Visual (Cont.)

• MPEG does not specify how shapes are to be extracted. Current extraction methods still have limitations (e.g., the weatherman example).

• MPEG-4 specifies only the decoding process; encoding is left to the marketplace.

2D Animated Meshes

• A 2D mesh is a partition of a 2D planar region into polygonal patches.

• A 2D dynamic mesh consists of the 2D mesh geometry plus motion information.

[Figure: a 2D mesh and a face mesh]

2D Animated Meshes (Cont.)

• The most entertaining feature of MPEG-4 is the ability to map images onto computer-generated shapes (meshes: 2D in this version, 3D in the next version).

• A few parameters that deform the mesh can create the impression of moving video from a still image (e.g., a waving flag).

• Predefined faces are particularly interesting meshes. Individual features (such as the lips or eyes) can be animated by special commands that make them move in synchronization with speech.
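How a few mesh parameters can animate a still image can be sketched with one triangular patch. The sketch makes several assumptions. It treats the mesh as triangles with a piecewise-affine mapping. Only the per-vertex motion vectors change, and every texture point inside the patch follows its vertices through barycentric coordinates. The function names and motion values are hypothetical, and the code is not a conforming MPEG-4 mesh decoder.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return l1, l2, 1.0 - l1 - l2


def warp_point(p, src_tri, dst_tri):
    """Map p from a patch in its original position to the same patch after its vertices move."""
    weights = barycentric(p, *src_tri)
    return tuple(sum(w * v[k] for w, v in zip(weights, dst_tri)) for k in range(2))


# One triangular patch; only the vertex motion vectors need to be transmitted.
src = ((0, 0), (10, 0), (0, 10))
motion = ((0, 0), (0, 2), (0, 0))  # hypothetical per-vertex motion vectors
dst = tuple((x + dx, y + dy) for (x, y), (dx, dy) in zip(src, motion))
print(warp_point((5, 0), src, dst))  # (5.0, 1.0)
```

Repeating this for every patch with slowly varying motion vectors produces effects such as the waving flag, at the cost of a few numbers per frame.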

System Architecture

• Streaming data for media objects.

• Different architecture layers:
  – Delivery layer
  – Sync layer
  – Compression layer
  – Composition layer

• Syntax Description

Streaming data for media objects

• Needed data for media objects can be conveyed in one or more Elementary Streams (ESs).

• An Object Descriptor (OD) identifies all streams associated with one media object.

• The OD contains a set of descriptors that characterize the ESs (required decoder resources, encoder timing, ...).

Architecture Layers

[Figure: MPEG-4 architecture layers. Elementary streams pass through the Elementary Stream Interface into the Sync Layer (SL). The SL emits SL-packetized streams across the DMIF Application Interface into the Delivery Layer (DMIF layer with FlexMux). FlexMux streams cross the DMIF Network Interface into the TransMux layer, which is not specified in MPEG-4. Example TransMux options are (RTP)/UDP/IP, (PES) MPEG-2 TS, AAL2/ATM, H.223/PSTN, and DAB Mux, covering file, broadcast, and interactive delivery.]

Delivery Layer

• Contains a two-layer multiplexer:
  – FlexMux: a tool defined according to DMIF (Delivery Multimedia Integration Framework). It allows ESs to be grouped with low overhead.
  – TransMux: the second layer, which offers transport service interfaces to different transport protocols (UDP/IP, MPEG-2, ...).
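FlexMux's job of grouping several SL-packetized streams with little overhead can be sketched as byte-level framing. This is a minimal Python sketch under stated assumptions. It follows a simplified form of FlexMux's "simple mode", where a one-byte channel index and a one-byte length precede each SL packet, and it ignores the MuxCode mode and other details of the real syntax. The function names are hypothetical. The length byte is also where the packet length that the SL header omits comes from.

```python
def flexmux_pack(packets):
    """Interleave (channel_index, sl_packet) pairs into one byte stream.

    Simplified framing: 1-byte channel index, 1-byte length, then the SL packet.
    """
    out = bytearray()
    for index, payload in packets:
        if not 0 <= index <= 255 or len(payload) > 255:
            raise ValueError("index and payload length must each fit in one byte")
        out += bytes([index, len(payload)]) + payload
    return bytes(out)


def flexmux_unpack(stream):
    """Split the byte stream back into per-channel lists of SL packets."""
    channels, pos = {}, 0
    while pos < len(stream):
        index, length = stream[pos], stream[pos + 1]
        channels.setdefault(index, []).append(stream[pos + 2:pos + 2 + length])
        pos += 2 + length
    return channels


muxed = flexmux_pack([(0, b"video-AU1"), (1, b"audio"), (0, b"video-AU2")])
print(len(muxed))             # 29: 23 payload bytes + 2 bytes of overhead per packet
print(flexmux_unpack(muxed))  # {0: [b'video-AU1', b'video-AU2'], 1: [b'audio']}
```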

Delivery layer (Cont.)

• The functionality of the DMIF is expressed by an interface called DMIF Application Interface (DAI)

• The DAI is the reference point at which the elementary streams can be accessed as sync-layer-packetized (SL-packetized) streams.

• Sync layer talks to the delivery layer through DAI.

Sync Layer

• The SL is a flexible and configurable packetization facility that attaches timing, fragmentation, and continuity information to the associated data packets (SL-packetized elementary streams).

• It does not provide framing information (the SL packet header carries no packet length); the delivery layer provides it.

Sync Layer Functionality

• Identifying time-stamped access units (AUs): data units that form a complete representation unit.

• Each SL packet carries either a whole access unit or a fragment of one.

• Access units form the only semantic structure of ESs at this layer.

• Time-stamping access units supplies the timing information for decoding and composition.

• On reception, the SL retrieves the ESs from the SL-packetized streams.
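Fragmentation, time stamping, and continuity checking can be illustrated together. In this Python sketch, the `SLPacket` field names are loosely modeled on configurable SL header flags. The real header, however, is bit-packed and configured per stream by an SLConfigDescriptor. The sketch also ignores sequence-number wrap-around, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SLPacket:
    seq: int              # packet sequence number, used for continuity checking
    au_start: bool        # first fragment of an access unit
    au_end: bool          # last fragment of an access unit
    dts: Optional[int]    # decoding time stamp (first fragment only)
    cts: Optional[int]    # composition time stamp (first fragment only)
    payload: bytes


def packetize(access_units, max_payload, start_seq=0) -> List[SLPacket]:
    """Split (dts, cts, data) access units into packets of at most max_payload bytes."""
    packets, seq = [], start_seq
    for dts, cts, data in access_units:
        chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)] or [b""]
        for i, chunk in enumerate(chunks):
            first, last = i == 0, i == len(chunks) - 1
            packets.append(SLPacket(seq, first, last,
                                    dts if first else None, cts if first else None, chunk))
            seq += 1
    return packets


def depacketize(packets) -> List[Tuple[int, int, bytes]]:
    """Reassemble access units, rejecting gaps in the sequence numbers."""
    units, current = [], None
    expected = packets[0].seq if packets else 0
    for p in packets:
        if p.seq != expected:
            raise ValueError(f"lost packet: expected {expected}, got {p.seq}")
        expected += 1
        if p.au_start:
            current = [p.dts, p.cts, bytearray()]
        elif current is None:
            raise ValueError("fragment without a preceding access unit start")
        current[2] += p.payload
        if p.au_end:
            units.append((current[0], current[1], bytes(current[2])))
            current = None
    return units


aus = [(0, 40, b"ABCDEFG"), (40, 80, b"HI")]
pkts = packetize(aus, max_payload=3)
print(len(pkts), depacketize(pkts) == aus)  # 4 True
```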

Compression Layer

• The streams are sent to their respective decoders that process the data and produce composition units.

• To relate ESs to media objects, object descriptors (ODs) convey the number and properties of the set of ESs that belongs to each media object.

Compression Layer (Cont.)

• The scene description defines:
  – The spatial and temporal positions of the various objects
  – The objects' dynamic behavior
  – Interactivity features

• The scene description contains unique identifiers that point to object descriptors.

• It is tree-structured and based on VRML.

Decoding Buffer Architecture

[Figure: decoding buffer architecture. The DMIF Application Interface (which encapsulates the demultiplexer) feeds elementary streams across the Elementary Stream Interface into decoding buffers DB1 to DBn. Media object decoders 1 to n read from these buffers and write composition units into composition memories CB1 to CBn, which the compositor reads.]

Composition Layer

• Uses the scene description and the decoded audio-visual object data to render the final scene presented to the user.

• MPEG-4 does not specify how information is rendered

• Composition is performed at the receiver

Syntax Description

• MPEG-4 defines a syntactic description language (MSDL) to describe the exact binary syntax of bitstreams carrying media objects and of bitstreams carrying scene description information.

• This language is an extension of C++ and is used to describe the syntactic representation of objects.
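A syntax description of this kind declares each field with an explicit bit width, and a parser reads those fields in order. The Python sketch below shows that idea. The SDL-style declaration in the comment is a hypothetical example and is not taken from the standard. `BitReader` and `parse_example_header` are also hypothetical names, and the sketch assumes most-significant-bit-first field packing.

```python
class BitReader:
    """Read MSB-first bit fields of declared widths from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Hypothetical SDL-style declaration mirrored by the parser below:
#   class ExampleHeader {
#       unsigned int(4) version;
#       bit(1)          flag;
#       bit(3)          reserved;
#       unsigned int(8) length;
#   }
def parse_example_header(data: bytes) -> dict:
    r = BitReader(data)
    return {
        "version": r.read(4),
        "flag": r.read(1),
        "reserved": r.read(3),
        "length": r.read(8),
    }


header = parse_example_header(bytes([0b0101_1000, 0x2A]))
print(header)  # {'version': 5, 'flag': 1, 'reserved': 0, 'length': 42}
```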

Tools

• Stream Management: The Object Description Framework (ODF)

• Presentation Engine: (BIFS)

• Timing and synchronization: The System Decoder Model (SDM)

Tools – ODF

• Provides the glue between the scene description and the elementary streams.

• Unique identifiers are used in the scene description to point to the OD.

• The OD is a structure that encapsulates the setup and association information for a set of ESs.

• ODs are transported in dedicated ESs called object descriptor streams (ODS).

• This makes it possible to associate timing information with a set of ODs.

• Provides mechanisms to describe hierarchical relations between streams, reflecting scalable encoding of the content.
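The "glue" role of the ODF, along with its hierarchical stream relations, can be sketched as a lookup. A scene node references an OD by identifier, and the OD lists ES descriptors that may depend on one another. In this sketch the field names are loosely modeled on the ES descriptor's stream ID and "depends on" reference. The classes, helper, and identifiers are hypothetical, and dependency graphs are assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ESDescriptor:
    es_id: int
    stream_type: str
    depends_on: Optional[int] = None  # ES_ID of the stream this one enhances, if any


@dataclass
class ObjectDescriptor:
    od_id: int
    es_descriptors: List[ESDescriptor] = field(default_factory=list)


def streams_for_node(od_id: int, ods: Dict[int, ObjectDescriptor]) -> List[int]:
    """Resolve a scene node's OD reference to ES_IDs, each base stream before its enhancements.

    Assumes the dependency graph inside one OD has no cycles.
    """
    by_id = {es.es_id: es for es in ods[od_id].es_descriptors}
    ordered, seen = [], set()

    def visit(es):
        if es.es_id in seen:
            return
        if es.depends_on is not None:
            visit(by_id[es.depends_on])  # the stream it enhances must be set up first
        seen.add(es.es_id)
        ordered.append(es.es_id)

    for es in ods[od_id].es_descriptors:
        visit(es)
    return ordered


# A scene node pointing at OD 5, whose video is coded as a base layer plus two enhancements.
ods = {5: ObjectDescriptor(5, [
    ESDescriptor(102, "visual", depends_on=101),
    ESDescriptor(101, "visual"),
    ESDescriptor(103, "visual", depends_on=102),
])}
print(streams_for_node(5, ods))  # [101, 102, 103]
```

Because the scene only holds the OD identifier, the streams behind a node can change without touching the scene description itself.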

Tools – ODF (Cont.)

• The initial object descriptor (IOD), a derivative of the object descriptor, is a key element needed to access MPEG-4 content.

• It contains at least two elementary stream descriptors:
  – One points to the scene description stream.
  – Others may point to object descriptor streams.

Tools - BIFS

• Used to describe scene decomposition information:
  – Spatial and temporal locations of objects
  – Object attributes and behavior
  – Relationships between elements in the scene graph

• Relies heavily on VRML.

VRML

A file format for describing interactive 3D worlds (scenes) and objects. It can be used in conjunction with the WWW, for example to create 3D representations of complex scenes such as virtual reality environments.

VRML Example – Shape node

Shape {
  geometry IndexedFaceSet {
    coordIndex [ 0, 1, 3, -1, 0, 2, 5, -1, … ]   # -1 terminates each face
    coord Coordinate { point [ 0.0 5.0 … ] }
    color Color { color [ 0.2 0.7 … ] }
    normal Normal { vector [ 0.0 1.0 0.0 … ] }
    texCoord TextureCoordinate { point [ 0 1.0, … ] }
  }
  appearance Appearance { material Material { transparency 0.5 } }
}

BIFS vs. VRML

• VRML lacks important features:
  – Support for natural audio and video
  – Its timing model is only loosely specified
  – VRML worlds (scenes) are often very large

• BIFS builds on VRML and adds:
  – A binary rather than textual format (more compact)
  – Real-time streaming
  – Definition of 2D objects
  – Facial animation
  – Enhanced audio
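Part of the reason a binary format is shorter is that numeric fields can be quantized to a fixed number of bits instead of being spelled out as text. BIFS's actual encoding, including its node coding and quantization parameters, is more elaborate than this. The Python sketch below only compares a textual, VRML-style coordinate list with an 8-bit quantized version of the same values. The value range, the bit width, and the helper names are assumptions.

```python
import math


def quantize(values, vmin, vmax, bits):
    """Map floats in [vmin, vmax] to the nearest of 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return [round((v - vmin) / (vmax - vmin) * levels) for v in values]


def dequantize(codes, vmin, vmax, bits):
    """Map quantized codes back to approximate float values."""
    levels = (1 << bits) - 1
    return [vmin + c / levels * (vmax - vmin) for c in codes]


BITS = 8
coords = [0.0, 5.0, 2.5, 10.0, 7.25, 1.125]
text = " ".join(str(v) for v in coords)             # textual, VRML-style field
codes = quantize(coords, 0.0, 10.0, BITS)
binary_bytes = math.ceil(len(codes) * BITS / 8)     # bit-packed size
print(len(text), binary_bytes)                      # 27 6
```

The price is bounded precision. Each reconstructed value is within half a quantization step, here 10 / 255 / 2 ≈ 0.02, of the original value.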

Tools - SDM

• An adaptation of the MPEG-2 System Target Decoder (STD), which describes temporal and buffer constraints for packetizing ESs.

• MPEG-4 chose not to define multiplexing constraints in the SDM.

• The SDM assumes the concurrent delivery of already demultiplexed ESs to the decoding buffers.
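The kind of buffer constraint the SDM expresses can be checked with a small simulation of one decoding buffer. This sketch idealizes the model in several ways. Each access unit is assumed to arrive all at once at a given arrival time and to be removed whole and instantaneously at its DTS. The composition memory and CTS side of the model is omitted. The function name and the numbers are hypothetical.

```python
def check_decoding_buffer(access_units, buffer_size):
    """Simulate one decoding buffer under an idealized, SDM-style timing model.

    access_units: iterable of (arrival_time, dts, size_bytes).
    Each AU enters the buffer at arrival_time and is removed, whole and
    instantaneously, at its DTS. Returns the peak occupancy in bytes.
    """
    events = []
    for arrival, dts, size in access_units:
        if arrival > dts:
            raise ValueError(f"AU with DTS {dts} arrives too late (t={arrival})")
        events.append((arrival, 1, size))  # kind 1: data enters the buffer
        events.append((dts, 0, -size))     # kind 0: removal sorts before arrivals at the same instant
    occupancy = peak = 0
    for _, _, delta in sorted(events):
        occupancy += delta
        if occupancy > buffer_size:
            raise OverflowError(f"occupancy {occupancy} exceeds buffer size {buffer_size}")
        peak = max(peak, occupancy)
    return peak


# Hypothetical (arrival, DTS, size) triples for three access units of one ES.
aus = [(0, 10, 400), (5, 20, 300), (12, 30, 500)]
print(check_decoding_buffer(aus, buffer_size=1000))  # 800
```

An encoder or multiplexer that respects such a check guarantees the decoder neither starves (late arrival) nor overflows its buffer. The SDM leaves the multiplexing details themselves unconstrained.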

Version 2

• Intellectual Property Management & Protection (IPMP)

• Advanced BIFS

• MPEG-4 File Format

• MPEG-J

• Coding of 3D Meshes

• Body Animation

Advanced BIFS

• Multi-user functionality, so that several users can access the same scene.

• Advanced audio BIFS for more natural sound and for sound-environment modeling (air absorption, natural distance attenuation).

• Face and body animation.

• The PROTO, EXTERNPROTO, and Script VRML constructs.

• Other VRML nodes not included in version 1.

MPEG-4 File Format (MP4)

• Designed to contain the media information of an MPEG-4 presentation in a flexible extensible format that facilitates interchange, management, editing and presentation.

• The design is based on the QuickTime® format.

• The file is composed of object-oriented structures called atoms.
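Each atom starts with a small header: a 32-bit big-endian size that includes the header, followed by a four-character type. In the ISO base media file format that later MP4 files build on, atoms are called boxes. A size of 1 means a 64-bit size follows the type, and a size of 0 means the box runs to the end of the enclosing data. The helper `parse_boxes` below is a hypothetical minimal sketch that walks one level of boxes and does not interpret their contents.

```python
import struct


def parse_boxes(data: bytes, offset: int = 0, end=None):
    """Yield (type, payload_offset, payload_size) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:    # a 64-bit "largesize" follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # the box extends to the end of the enclosing data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"bad box size {size} at offset {offset}")
        yield box_type.decode("latin-1"), offset + header, size - header
        offset += size


# Two hand-built boxes: 'ftyp' with an 8-byte payload, then an empty 'free' box.
data = (struct.pack(">I4s", 16, b"ftyp") + b"isom\x00\x00\x02\x00"
        + struct.pack(">I4s", 8, b"free"))
print(list(parse_boxes(data)))  # [('ftyp', 8, 8), ('free', 24, 0)]
```

Container boxes nest other boxes in their payload. The same function can descend into a container by calling `parse_boxes(data, payload_offset, payload_offset + payload_size)`.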

MPEG-J

• Specification of Java APIs in MPEG-4 Systems (Scene Graph, Resource Manager, etc.).

• Content creators may embed complex control and data-processing mechanisms to intelligently manage the operation of the audio-visual session.

• The Java application is delivered to the terminal as a separate ES and then directed to the MPEG-J run-time environment.

Coding of 3D Meshes

• Coding of generic 3D meshes to efficiently code synthetic 3D objects.

• LOD (Level of Detail) scalability to reduce rendering time for objects that are distant from the viewer.

• 3D progressive geometric meshes (temporal enhancement of 3D mesh).
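LOD scalability comes down to choosing a coarser version of an object as it moves away from the viewer. This Python sketch picks a level from the viewing distance, in the same way a VRML LOD node interprets its ascending `range` values. Level i is used when ranges[i-1] <= d < ranges[i], and level 0 is the most detailed. The triangle counts and distance thresholds are hypothetical.

```python
import bisect


def select_lod(distance, ranges):
    """Pick a level index: 0 is the most detailed level.

    ranges must be ascending; level i is used when ranges[i-1] <= distance < ranges[i].
    """
    return bisect.bisect_right(ranges, distance)


# Hypothetical triangle counts for three versions of the same object.
levels = [5000, 1200, 300]
ranges = [10.0, 50.0]
for d in (5.0, 10.0, 80.0):
    print(d, levels[select_lod(d, ranges)])
# 5.0 5000
# 10.0 1200
# 80.0 300
```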

Body Animation

• A body is an object capable of producing virtual body models and animations in the form of a set of 3D polygon meshes ready for rendering.

BIFS

• A set of nodes to represent the primitive scene objects, the scene graph constructs, the behavior and activity.

• A BIFS scene specifies where and when to render the media.

BIFS

• In addition to VRML, BIFS defines:
  – 2D capabilities
  – Integration of 2D and 3D
  – Advanced audio features
  – A timing model
  – The BIFS-Update protocol, to update the scene over time
  – The BIFS-Anim protocol, to animate the scene over time
  – A binary encoding of the scene
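The idea behind a scene-update protocol is that time-stamped commands modify a scene that already exists on the terminal, so the whole scene is never resent. The following Python sketch illustrates this. The command vocabulary (insert, delete, replace a field) is loosely modeled on the kinds of operations such a protocol carries. The dictionary-based scene, the command format, and the helper names are hypothetical and do not reflect the BIFS bitstream syntax.

```python
import copy


def apply_command(scene, cmd):
    """Apply one update command to a scene stored as {node_id: {field: value}}."""
    op = cmd["op"]
    if op == "insert":
        if cmd["node"] in scene:
            raise KeyError(f"node {cmd['node']!r} already exists")
        scene[cmd["node"]] = dict(cmd["fields"])
    elif op == "delete":
        del scene[cmd["node"]]
    elif op == "replace_field":
        scene[cmd["node"]][cmd["field"]] = cmd["value"]
    else:
        raise ValueError(f"unknown command {op!r}")


def scene_at(time, initial_scene, timed_commands):
    """Replay every command whose time stamp is <= time on a copy of the initial scene."""
    scene = copy.deepcopy(initial_scene)
    for t, cmd in sorted(timed_commands, key=lambda tc: tc[0]):
        if t > time:
            break
        apply_command(scene, cmd)
    return scene


initial = {"logo": {"translation": (0, 0)}}
updates = [
    (1.0, {"op": "insert", "node": "caption", "fields": {"text": "Hello"}}),
    (2.0, {"op": "replace_field", "node": "logo", "field": "translation", "value": (50, 0)}),
    (3.0, {"op": "delete", "node": "caption"}),
]
print(scene_at(2.5, initial, updates))
# {'logo': {'translation': (50, 0)}, 'caption': {'text': 'Hello'}}
```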