Ø Called motion JPEG
Ø Compress each frame individually, without reference to any other frames in the sequence
  v and thus does not consider inter-frame redundancies
Ø audio is not supported in an integrated fashion
Ø Motion JPEG hardware (chips, boards) for near real-time compression/decompression is available, but storage and retrieval from a hard disc still takes a second or more.
  v High-quality video requires fast SCSI discs or caching of short video sequences in large memory buffers.
Spatial and temporal redundancy video compression – MPEG
We have seen with JPEG how spatial redundancy can be exploited. MPEG utilises not only spatial redundancy but also the fact that successive frames in a sequence are similar to each other. This is what is known as temporal redundancy.
A few definitions are required here:
Ø Macroblocks
  v a 16x16 pixel block, composed of four 8x8 luminance blocks and two colour-difference blocks
Ø Motion Vectors
  v indicates the spatial translation of a macroblock relative to its position in a reference frame
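The two definitions above can be made concrete with a small sketch (the frame size, contents, and function names here are illustrative, not from the slides): a 16x16 macroblock in the current frame is predicted by copying the block that its motion vector points to in the reference frame, and only the residual needs to be coded.

```python
import numpy as np

MB = 16  # macroblock size: 16x16 luminance pixels

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
# Pretend the whole scene shifted down by 2 and right by 3 pixels.
current = np.roll(np.roll(reference, 2, axis=0), 3, axis=1)

def predict_block(ref, top, left, mv):
    """Fetch the 16x16 block displaced by motion vector mv = (dy, dx)."""
    dy, dx = mv
    return ref[top + dy : top + dy + MB, left + dx : left + dx + MB]

# Macroblock at (16, 16) in the current frame; the motion vector (-2, -3)
# points back to where that content was in the reference frame.
actual = current[16:16 + MB, 16:16 + MB]
predicted = predict_block(reference, 16, 16, (-2, -3))
residual = actual.astype(int) - predicted.astype(int)
print(np.abs(residual).sum())  # 0: a pure shift is predicted perfectly
```

For real video the residual is rarely zero, but it is typically much cheaper to code than the raw macroblock, which is exactly the temporal redundancy MPEG exploits.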
Integrated Coding And Multiplexing)
Ø delays, for VLSI implementation:
  v max. 30 ms encoding
  v max. 10 ms decoding
Ø SW codec delays vary for different layers, implementations, and computers (a rule of thumb may be 50/100/150 ms for layer 1/2/3, which makes MP3 rather inappropriate for real-time conversation)
MPEG - Follow up
Ø MPEG-2:
  v higher data rates for high-quality audio/video
  v multiple layers and profiles
  v studio-quality TV and CD-quality audio channels; 4 to 6 Mbps typically
Ø MPEG-3:
  v initially HDTV
  v MPEG-2 was scaled up to subsume MPEG-3
Ø MPEG-4:
  v initially, lower data rates, e.g. for mobile communication
  v then: focus on coding & additional functionalities based on image contents
  v video conferencing at very low bit rates: 4.8 to 64 Kbps, with 10 fps
Ø MPEG-7 (EC = "experimental core" status):
  v content description
  v basis for search and retrieval
  v see section on databases
Ø MPEG-21 (upcoming):
  v framework for multimedia business, delivery... what's
(two modest) extensions to MPEG-1 audio:
1. "low sample rate extension" (LSE):
   v 1/2 of all MPEG-1 rates: 16, 22.05, 24 kHz
   v quantization down to 8 bits/sample
2. "multichannel extension": more channels, i.e. up to
   v 5 full-bandwidth channels (surround system)
     • left and right front
     • center (in front)
     • left and right back
   v "multilingual extension": 7 more, i.e. up to 12 channels (multiple languages, commentary)
Ø Backward compatibility with MPEG-1 audio
   v only three MPEG-2 audio codecs will not provide backward compatibility (in the range of 256-448 kbps)
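As a quick sanity check, the "low sample rate" figures above are exactly half the three MPEG-1 sampling rates (32, 44.1 and 48 kHz); a trivial sketch:

```python
# The "low sample rate extension" halves the three MPEG-1 sampling rates.
mpeg1_rates = (32.0, 44.1, 48.0)               # kHz
lse_rates = tuple(r / 2 for r in mpeg1_rates)  # -> (16.0, 22.05, 24.0) kHz
print(lse_rates)
```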
MPEG-4: Schedule for Standardization
Ø 1993: Work started
Ø 1997: Committee Draft
Ø 1998: Final Committee Draft
Ø 1998: Draft International Standard
Ø 1999-2000: International Standard
Ø Again:
  v started from the original goal of providing an audio-visual coding standard for very-low-bit-rate channels (e.g., for mobile applications)
  v evolved into a complex tool kit
  v MPEG-4 innovates on the MPEG-2 information production and consumption paradigm by the way audio and video information is represented
  v deals with audio and video no longer as packaged "bitstreams", produced by encoding, but as "audio-visual objects" (AVOs)
Ø Content-Based Scalability
Ø Content-Based Manipulation and Bitstream Editing
Ø Content-Based Multimedia Data Access Tools
Ø Hybrid Natural and Synthetic Data Coding
Ø Coding of Multiple Concurrent Data Streams
Ø Improved Coding Efficiency
Ø Robustness in Error-Prone Environments
Ø Improved Temporal Random Access
Ø MPEG-4 provides the ability to achieve scalability with a fine granularity in content, spatial resolution, temporal resolution, quality and complexity.
Ø Content scalability may imply the existence of a prioritization of the objects in the scene. The combination of more than one scalability case may yield interesting scene representations, where the more relevant objects are represented with higher spatial-temporal resolution.
Ø Example uses:
  v user selection of the decoded quality of individual objects in the scene;
  v database browsing at different scales,
Ø MPEG-4 provides a syntax and coding schemes to support content-based manipulation and bitstream editing without the need for transcoding.
Ø This means the user should be able to access one specific object in the scene/bitstream and perhaps change some of its characteristics.
Ø Example uses:
  v home movie production and editing;
  v interactive home shopping;
  v insertion of sign language interpreter or
Ø MPEG-4 supports efficient methods for combining synthetic scenes with natural scenes (e.g. text and graphics overlays), the ability to code and manipulate natural and synthetic audio and video data, and decoder-controllable methods of mixing synthetic data with ordinary video and audio, allowing for interactivity.
Ø harmonious integration of natural and synthetic audio-visual objects
Ø a first step towards the integration of all types of audio-visual information
Ø Example uses:
  v virtual reality applications;
  v animations and synthetic audio (e.g. MIDI) can be mixed with ordinary audio and video in a game;
  v graphics can be rendered from different viewpoints.
Ø ability to efficiently code multiple views/soundtracks of a scene, as well as sufficient synchronisation between the resulting elementary streams
Ø For stereoscopic and multiview video applications, MPEG-4 shall include the ability to exploit redundancy in multiple views of the same scene, also permitting solutions that allow compatibility with normal (mono) video. This functionality should provide efficient representations of 3D natural objects, provided a sufficient number of views is available. Again, this may require a complex analysis process. It is expected that this functionality could substantially benefit applications such as virtual reality, where until now almost only synthetic objects have been used.
Ø Example uses:
  v multimedia entertainment, e.g. virtual reality games, 3D movies;
  v training and flight simulations;
  v multimedia presentations and education.
Ø the growth of mobile networks provides a strong need for improved coding efficiency
Ø MPEG-4 is required to provide subjectively better audio-visual quality compared to existing or other emerging standards (such as H.263), at comparable bit rates.
Ø The results of the MPEG-4 video subjective tests, held in November 1995, showed however that, in terms of coding efficiency, the available coding standards still perform very well in comparison with most of the other coding techniques proposed.
Ø Example uses:
  v efficient transmission of audio-visual data on low-bandwidth channels;
  v efficient storage of audio-visual data on
Ø universal accessibility implies access to applications over a variety of wireless and wired networks and storage media
Ø MPEG-4 shall provide an error-robustness capability, particularly for low bit-rate applications under severe error conditions.
Ø The idea is not to substitute the error-control techniques implemented by the network, but to provide resilience against the residual errors, e.g. through selective forward error correction, error containment or error concealment.
Ø Example uses:
  v transmitting from a database over a wireless network;
  v communicating with a mobile terminal;
  v gathering audio-visual data from a remote
Ø MPEG-4 shall provide efficient methods to randomly access, within a limited time and with fine resolution, parts of an audio-visual sequence. This includes 'conventional' random access at very low bit rates.
Ø Example uses:
  v audio-visual data can be randomly accessed from a remote terminal over limited-capacity media;
  v a 'fast forward' can be performed on a single
Seven Architectural ‘Elements’ in the Multimedia Framework:
1. Digital Item Declaration
2. Digital Item Representation
3. Digital Item Identification and Description
4. Content Management and Usage
5. Intellectual Property Management and Protection
6. Terminals and Networks
7. Event Reporting
v differential PCM (DPCM) with motion estimation for interframe coding and
v variable word-length entropy coding (such as Huffman)
Ø very high compression ratios for full-color, real-time motion video transmission
Ø combines intraframe and interframe coding
Ø optimized for applications such as
  v video-conferencing, which are not motion-intensive
Ø limited motion search and estimation strategies
Ø compression ratios from 100:1 to 2,000:1
Ø covers the entire ISDN channel capacity (p x 64 kbps, p = 1, 2, ..., 30)
  v for p = 1 or 2: videophone, desktop video-conferencing applications
  v for p = 6 or higher, more complex pictures are
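The p x 64 kbps channel structure above spans a wide bitrate range; a trivial sketch of the arithmetic for a few values of p:

```python
# H.261 targets p ISDN B-channels of 64 kbps each, p = 1..30,
# so the usable bitrate runs from 64 kbps up to 1920 kbps.
for p in (1, 2, 6, 30):
    print(f"p = {p:2d}: {p * 64} kbps")
```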
H.261
Ø Intraframe coding takes no advantage of redundancy between frames.
  v intraframe coding yields a "reference frame" f0
  v each 8x8 block is transformed by DCT
  v the DCT uses the same quantization factor for all AC values
  v this factor may be adjusted by a loopback filter
  v intraframes are rare (bandwidth!; the main application is videophone)
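A minimal sketch of the intraframe step described above (the matrix construction, block contents and quantizer value q=16 are illustrative assumptions, not taken from H.261): build the orthonormal 8x8 DCT-II basis, transform a block, quantize all coefficients with one uniform factor, then invert.

```python
import numpy as np

N, q = 8, 16  # block size; illustrative uniform quantization factor

# Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector.
k = np.arange(N)
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] = 1 / np.sqrt(N)  # DC row scaled for orthonormality

block = np.arange(64).reshape(N, N).astype(float)  # toy 8x8 pixel block

coeffs = C @ block @ C.T        # forward 2D DCT
quant = np.round(coeffs / q)    # uniform quantization, same q everywhere
recon = C.T @ (quant * q) @ C   # dequantize + inverse 2D DCT

print(np.abs(block - recon).max())  # reconstruction error from quantization
```

Without the quantization step the transform is lossless (C is orthonormal, so `C.T @ coeffs @ C` recovers the block exactly); all the loss comes from rounding the coefficients.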
Ø Interframe coding (corresponds to the P-frames of MPEG) → motion estimation
  v interframes f1, f2, f3, ... are coded relative to f0 (differential encoding)
  v search for a similar macroblock (16x16) in the previous image
  v the position of this macroblock defines the motion vector
  v the search range is up to the implementation:
    • max. ±15 pixels
    • but: the motion vector may also always be 0 (a "bad" software encoder)
    • e.g. H.261 also allows a simple implementation that considers only the differences between macroblocks located in the same position, thus a zero motion vector
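The motion search above can be sketched as a brute-force full search (names, frame contents and the ±4 range here are illustrative assumptions; H.261 allows up to ±15): for each candidate displacement, compare the current macroblock against the displaced block of the previous frame using the sum of absolute differences (SAD), and keep the best.

```python
import numpy as np

MB, RANGE = 16, 4  # macroblock size; illustrative search range (H.261: up to 15)

def motion_search(prev, cur, top, left):
    """Return the (dy, dx) minimizing SAD for the macroblock at (top, left)."""
    block = cur[top:top + MB, left:left + MB].astype(int)
    best, best_mv = None, (0, 0)
    for dy in range(-RANGE, RANGE + 1):
        for dx in range(-RANGE, RANGE + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + MB > prev.shape[0] or x + MB > prev.shape[1]:
                continue  # candidate block falls outside the previous frame
            cand = prev[y:y + MB, x:x + MB].astype(int)
            sad = np.abs(block - cand).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, size=(48, 48), dtype=np.uint8)
cur = np.roll(prev, (3, -2), axis=(0, 1))  # scene shifted down 3, left 2
print(motion_search(prev, cur, 16, 16))    # finds (-3, 2): content came from there
```

The "bad software encoder" case from the slide corresponds to skipping this loop entirely and always returning (0, 0), i.e. coding only the co-located difference, which is still syntactically valid H.261.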
Ø Extension to H.261
Ø max. bitrate: H.263 approx. 2.5 x H.261; lowest bitrates suitable for modems
Main Differences between H.261 and H.263
Ø Base-Level Differences (always ON)
  v no filter for HF noise in the feedback loop
  v motion vectors produced with 1/2-pixel resolution
  v picture format for sub-QCIF (128x96)
  v Huffman tables designed specifically for low bit rates
  v JPEG is the still-picture mode
Ø Optional-Level Differences (negotiated)
  v unlimited search space for the motion vector → a fast encoder can do better
  v syntax-based arithmetic coding
  v advanced prediction mode
  v PB-frames (2 combined pictures: 1 B- and 1 P-frame)