Multimedia Communication Systems 1: “MULTIMEDIA SIGNAL CODING AND TRANSMISSION”, DR. AFSHIN EBRAHIMI

Multimedia Communication Systems 1fa.ee.sut.ac.ir/Downloads/AcademicStaff/1/Courses/36/MCS...7.5 Scalable video coding 7.5.1 Scalable hybrid coding 7.5.2 Scalable 3D frequency coding

Aug 11, 2020


dariahiddleston

Table of Contents

1 Introduction
  1.1 Concepts and terminology
    1.1.1 Signal representation by source coding
    1.1.2 Optimization of transmission
  1.2 Signal sources and acquisition
  1.3 Digital representation of multimedia signals
    1.3.1 Image and video signals
    1.3.2 Speech and audio signals
    1.3.3 Need for compression technology


2 Fundamentals of Signal Processing and Statistics
  2.1 Signals and systems
    2.1.1 Elementary signals
    2.1.2 Systems operations
  2.2 Signals and Fourier spectra
    2.2.1 Two- and multi-dimensional spectra
    2.2.2 Spatio-temporal signals
  2.3 Sampling of multimedia signals
    2.3.1 Separable two-dimensional sampling
    2.3.2 Sampling of video signals
  2.4 Digital signal processing in multiple dimensions
  2.5 Statistical analysis
    2.5.1 Sample statistics
    2.5.2 Joint statistical properties
    2.5.3 Spectral properties of random signals
    2.5.4 Markov chain models
    2.5.5 Statistical foundations of information theory


  2.6 Linear prediction
    2.6.1 Autoregressive models
    2.6.2 Linear prediction
  2.7 Linear block transforms
    2.7.1 Orthogonal basis functions
    2.7.2 Basis functions of orthogonal transforms
    2.7.3 Efficiency of transforms
    2.7.4 Transforms with block overlap
  2.8 Filterbank transforms
    2.8.1 Properties of subband filters
    2.8.2 Implementation of filterbank structures
    2.8.3 Discrete wavelet transform (DWT)
    2.8.4 Two- and multi-dimensional filter banks
    2.8.5 Pyramid decomposition


3 Perception and Quality
  3.1 Properties of vision
    3.1.1 Physiology of the eye
    3.1.2 Sensitivity functions
    3.1.3 Color vision
  3.2 Properties of hearing
    3.2.1 Physiology of the ear
    3.2.2 Sensitivity functions
  3.3 Quality metrics
    3.3.1 Objective signal quality metrics
    3.3.2 Subjective assessment


4 Quantization and Coding
  4.1 Scalar quantization and pulse code modulation
  4.2 Coding theory
    4.2.1 Source coding theorem and rate-distortion function
    4.2.2 Rate-distortion function for correlated signals
    4.2.3 Rate-distortion function for multi-dimensional signals
  4.3 Rate-distortion optimization of quantizers
  4.4 Entropy coding
    4.4.1 Properties of variable-length codes
    4.4.2 Huffman codes
    4.4.3 Systematic variable-length codes
    4.4.4 Arithmetic coding
    4.4.5 Adaptive and context-dependent entropy coding
    4.4.6 Entropy coding and transmission errors
    4.4.7 Lempel-Ziv coding
  4.5 Vector quantization (VQ)
    4.5.1 Basic principles of VQ
    4.5.2 VQ with uniform codebooks
    4.5.3 VQ with non-uniform codebooks
    4.5.4 Structured codebooks
    4.5.5 Rate-constrained VQ


5 Methods of Signal Compression
  5.1 Binary signal coding
    5.1.1 Run-length coding
  5.2 Predictive coding
    5.2.1 Open-loop and closed-loop prediction systems
    5.2.2 Non-linear and shift-variant prediction
    5.2.3 Effects of transmission losses
    5.2.4 Vector prediction
    5.2.5 Prediction in multi-resolution pyramids
  5.3 Transform coding
    5.3.1 Gain through discrete transform coding
    5.3.2 Quantization of transform coefficients
    5.3.3 Coding of transform coefficients
    5.3.4 Transform coding under transmission losses
  5.4 Bitstreams with multiple decoding capability
    5.4.1 Simulcast and transcoding
    5.4.2 Scalable coding
    5.4.3 Multiple-description coding


6 Still Image Coding
  6.1 Compression of binary images
    6.1.1 Compression of bi-level images
    6.1.2 Binary shape coding
    6.1.3 Contour shape coding
  6.2 Vector quantization of images
  6.3 Predictive coding
    6.3.1 2D prediction
    6.3.2 2D vector prediction
    6.3.3 Quantization and encoding of prediction errors
    6.3.4 Error propagation in 2D DPCM
  6.4 Transform coding of images
    6.4.1 Block transform coding
    6.4.2 Overlapping-block transform coding
    6.4.3 Subband and wavelet transform coding
    6.4.4 Local adaptation of transform bases by signal properties
  6.5 Synthesis based image coding
    6.5.1 Region-based coding
    6.5.2 Colour and texture synthesis
    6.5.3 Post filtering
  6.6 Still image coding standards


7 Video Coding
  7.1 Intraframe-only and frame replenishment coding
  7.2 Hybrid video coding
    7.2.1 Motion-compensated hybrid coders
    7.2.2 Characteristics of interframe prediction error signals
    7.2.3 Quantization error feedback and error propagation
    7.2.4 Reference pictures in motion-compensated prediction
    7.2.5 Accuracy of motion compensation
    7.2.6 Hybrid coding of interlaced video signals
    7.2.7 Optimization of hybrid encoders
  7.3 Spatio-temporal transform coding
    7.3.1 Interframe transform and subband coding
    7.3.2 Motion-compensated temporal filtering
    7.3.3 Quantization and encoding of MCTF frames
  7.4 Coding of side information (motion, modes)
  7.5 Scalable video coding
    7.5.1 Scalable hybrid coding
    7.5.2 Scalable 3D frequency coding
  7.6 Multi-view video coding
  7.7 Synthesis based video coding
    7.7.1 Region-based video coding
    7.7.2 Distributed source coding
    7.7.3 Super-resolution synthesis
    7.7.4 Dynamic texture synthesis
  7.8 Video coding standards


8 Speech and Audio Coding
  8.1 Coding of speech signals
    8.1.1 Linear predictive coding
    8.1.2 Parametric (synthesis) coding
    8.1.3 Speech coding standards
  8.2 Audio (music and sound) coding
    8.2.1 Transform coding of audio signals
    8.2.2 Synthesis based coding of audio and sound signals
    8.2.3 Coding of stereo and multi-channel audio signals
    8.2.4 Music and sound coding standards


9 Transmission and Storage of Multimedia Data
  9.1 Digital multimedia services
  9.2 Network interfaces
  9.3 Adaptation to channel characteristics
    9.3.1 Rate and transmission control
    9.3.2 Error control
  9.4 Definitions at systems level
  9.5 Digital broadcast
  9.6 Media streaming


Introduction

Multimedia communication systems are a flagship of the information technology revolution. The combination of multiple information types, particularly audiovisual information (speech/audio/sound/image/video/graphics) with abstracted (text), olfactory or tactile information, provides new degrees of freedom in the exchange, distribution and acquisition of information. Communication includes the exchange of information between different persons, between persons and machines, or between machines only. Sufficient perceptual quality must be provided, which depends on the compression and on its interrelationship with transmission over networks. Advanced methodologies are based on content analysis and identification, which is of high importance for automatic user assistance and interactivity. In multimedia communication, concepts and methods from signal processing, systems and communications theory play a dominant role, and audiovisual signals are the primary challenge regarding transmission, storage and processing complexity.


Introduction

Books
• Steinmetz, R.; Nahrstedt, K.: Media Coding and Content Processing. Prentice Hall, 2002.
• Steinmetz, R.; Nahrstedt, K.: Multimedia Systems. Springer Verlag, 2004.
• Steinmetz, R.; Nahrstedt, K.: Multimedia Applications. Springer Verlag, 2004.
• Ohm, J.R.: Multimedia Communication Technology. Springer, 2004.

Magazines
• Multimedia Systems, ACM/Springer
• Multimedia Magazine, IEEE


Introduction

What is ‘Multimedia’?

Simple definition of Multimedia:
• Multi - Media
• Any kind of system that supports more than one kind of medium
• Is Television Multimedia?

Definition:
Multimedia means the integration of continuous media (e.g., audio, video) and discrete media (e.g., text, graphics, images) through which the digital information can be conveyed to the user in an appropriate way.
• Multi: many, much, multiple
• Medium: a means to distribute and represent information


Facets of “Medium”

1. Perception Medium - How do humans perceive information in a computer environment? (by seeing, by hearing, ...)
2. Representation Medium - How is the information encoded in the computer? (ASCII, PCM, MPEG, ...)
3. Presentation Medium - Which medium is used to output information from the computer or to bring it into the computer? Input: keyboard, microphone, camera, ...
4. Storage Medium - Where is the information stored?
5. Transmission Medium - Which kind of medium is used to transmit the information? (copper cable, radio, ...)
6. Information Exchange Medium (combination of storage and transmission media) - Which information carrier will be used for information exchange between different locations?


Classification of Media

Each medium defines
• Representation values
• Representation space

Representation values determine the information representation of different media:
• continuous representation values (e.g. electro-magnetic waves)
• discrete representation values (e.g. characters of a text in digital form)

Representation space determines the technique to output the media information, usually visually (e.g., paper, slideshow) or acoustically (e.g., speakers)
• Spatial dimensions:
  Two dimensional (2D graphics)
  Three dimensional (holography)
• Temporal dimensions:
  Time independent (document) - discrete media (e.g. text of a book)
  Time dependent (movie) - continuous media (e.g. sound, video)


Data Streams

When transmitted or played out, continuous media need a set of data that changes over time, i.e. data streams. How to deal with such streams?

Asynchronous Transmission
• Suitable for communication with no time restrictions (discrete media)
• E.g. electronic mail

Synchronous Transmission
• Beginning of transmission may only take place at well-defined times
• A clock signal runs the synchronization between a sender and a receiver

Isochronous Transmission
• Periodic transmissions; the time separation between subsequent transmissions is a multiple of a certain unit interval
• A maximum and a minimum end-to-end delay for each packet of a data stream (limited jitter) is required
• An end-to-end network connection is isochronous if it has a guaranteed bit rate and if the jitter is also guaranteed and small


Data Stream Characteristics

Strongly periodic data streams
• Identical intervals T
• No jitter (optimally)
• Example: uncompressed audio

Weakly periodic data streams
• Periodic intervals T
• Timing variations in the intervals
• Example: segmented transmission

Aperiodic data streams
• Arbitrary intervals
• Example: transmission of mouse control signals

[Figure: packet timelines over time t for the three stream types: identical intervals T; a repeating interval pattern T1 T2 T3 T1 T2 T3; arbitrary intervals T1 T2 T3 T4 T5 T6]
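The three categories can be told apart from the sequence of intervals between consecutive data units. The following Python sketch is only an illustration: the function name `classify_periodicity`, the tolerance `tol` and the pattern search are assumptions made here, not part of the lecture.

```python
def classify_periodicity(intervals, tol=1e-9):
    """Classify a stream from the intervals between consecutive data units.

    "strongly periodic": all intervals are identical.
    "weakly periodic":   a shorter pattern of intervals repeats (e.g. T1 T2 T3 T1 T2 T3).
    "aperiodic":         neither of the above.
    """
    n = len(intervals)
    if n < 2:
        raise ValueError("need at least two intervals")

    def same(a, b):
        return abs(a - b) <= tol

    if all(same(x, intervals[0]) for x in intervals):
        return "strongly periodic"

    # Try every pattern length p that fits at least twice into the sequence.
    for p in range(2, n // 2 + 1):
        if all(same(intervals[i], intervals[i % p]) for i in range(n)):
            return "weakly periodic"
    return "aperiodic"


print(classify_periodicity([20, 20, 20, 20]))          # strongly periodic
print(classify_periodicity([10, 30, 20, 10, 30, 20]))  # weakly periodic
print(classify_periodicity([5, 17, 3, 40, 8, 11]))     # aperiodic
```

A real classifier would have to tolerate jitter, so `tol` would be chosen from the expected timing accuracy rather than set close to zero.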


Data Stream Characteristics

Strongly regular data streams
• Quantity remains constant during the entire lifetime of the stream
• Typical for uncompressed video/audio

Weakly regular data streams
• Quantity varies periodically
• Can result from some compression techniques
• E.g. videos coded with MPEG

Irregular data streams
• Quantity is neither constant nor periodically changing
• Typical for compressed audio/video
• Harder to transmit/process

[Figure: data quantity per interval T over time t for the three stream types, with data units labelled D1, D2, D3, ..., Dn]


Data Stream Characteristics

Continuous media consist of a time-dependent sequence of individual information units: Logical Data Units (LDUs)

Example: Symphony
• A symphony consists of independent movements; movements consist of scores
• Using e.g. PCM, 44,100 samples are taken per second. On a CD, samples are grouped into units with a duration of 1/75 second
• Possible LDUs with different granularity: movements, scores, groups, samples. Used in digital signal processing: sampling values as LDUs

Example: Movie
• Consists of scenes represented by clips; clips consist of single frames; frames consist of blocks of e.g. 16x16 pixels. Pixels can consist of chrominance and luminance values
• Using e.g. MPEG, inter-frame coding is used, thus image sequences are the smallest sufficient LDUs

[Figure: LDU hierarchy of a movie: Movie, Clips, Frames, Blocks, Pixels]
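The CD figures in the symphony example can be checked with a short calculation: 44,100 samples per second grouped into units of 1/75 second gives a fixed number of samples per unit. The Python sketch below is illustrative; the names `SAMPLE_RATE_HZ`, `UNITS_PER_SECOND` and `samples_per_unit` are chosen here and do not come from the slides.

```python
SAMPLE_RATE_HZ = 44_100   # PCM sampling rate quoted in the symphony example
UNITS_PER_SECOND = 75     # each CD unit lasts 1/75 second


def samples_per_unit(sample_rate_hz=SAMPLE_RATE_HZ, units_per_second=UNITS_PER_SECOND):
    """Number of samples (per channel) that fall into one unit."""
    samples, remainder = divmod(sample_rate_hz, units_per_second)
    if remainder:
        raise ValueError("sample rate is not an integer multiple of the unit rate")
    return samples


print(samples_per_unit())  # 588
```

Each unit of 1/75 second therefore groups 588 sampling values per channel, which is one possible LDU granularity between "sample" and "movement".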

Fields of the Lecture

Content

Basics
• Audio Technology
• Images and Graphics
• Video and Animation

Multimedia Systems - Communication Aspects and Services
• Voice over IP, Video conferencing
• Group Communication, Synchronization
• Quality of Service and Resource Management

Multimedia Systems - Storage Aspects
• Optical storage media
• Multimedia file systems, Multimedia databases

Multimedia Usage
• Design and User Interfaces, Abstractions for Programming


Concepts and terminology

The classical model assumes that optimizing source coding and channel coding independently gives the best overall performance. However, a source coding method which achieves optimum compression can be extremely sensitive to errors occurring in the channel, e.g. due to feedback of previous reconstruction errors into the decoding process. This requires joint optimization of the entire chain, such that the best quality is retained for the user while the rate to be transmitted over the physical channel is made as low as possible.

The classical model assumes a passive receiver ('sink'), which is closely tied to broadcast services. In multimedia systems, the user can interact and can influence any part of the chain, even back to the signal generation; this is reflected by providing a back channel, which can also be used by automatic mechanisms that serve the user with the best possible quality. Calling the devices at the front and back ends server and client, instead of transmitter and receiver, better reflects this new paradigm.


The classical model assumes one monolithic channel for which the optimization of source coding, channel coding and modulation is made once. Multimedia communication mostly uses heterogeneous networks, which typically have widely varying characteristics; as a consequence, it is desirable to consider the channels at a more abstract level and to adapt properly to the instantaneous channel characteristics. Channels can be networks or storage devices. Recovery at the client side may include analysis which goes far beyond traditional channel coding, e.g. by conveying loss characteristics to the server via the back channel.

Multimedia services are becoming more 'intelligent', including elements of signal content analysis to assist the user. This includes support for content-related interaction and support in finding the multimedia information which best serves the needs of the user. Hence, the information source is not just encoded at the front end, but more abstract analysis can be performed in addition; the encoding part itself may also include meta information about the content.

Multimedia communication systems typically are distributed systems, which means that the actual processing steps involved are performed at different places. Elements of adaptation of the content to the needs of the network, to the client configuration, or to the user's needs can be found anywhere in the chain. Finally, temporary or permanent storage of content can also reside anywhere, as storage elements are a specific type of channel, intended for the purpose of later review instead of instantaneous transmission.


Concepts and terminology: Quality of Service (QoS)

The QoS relating to network transmission includes aspects like transmission bandwidth, delay and losses. It indirectly contributes to the perceived quality. This will be denoted as Network QoS.

The QoS relating to perceived signal quality includes the entire transmission chain, including the compression performance of source encoding/decoding, and the interrelationship with the channel characteristics. This is denoted as Perceptual QoS. An overview of measurement methods is given in Appendix A.1.

The QoS relating to the overall service quality is at the highest level. It includes aspects like the level of user satisfaction with the content itself, but also the satisfaction concerning additional services, e.g. how well the service adapts to the user's needs. Some methods that are used to express this category of QoS with regard to content identification are described in Appendix A.2. This may be denoted as the Semantic QoS.


Signal representation by source coding

By multimedia signal compression, systems for transmission and storage of multimedia signals shall generate the most compact representation, such that the highest possible perceptual quality is achieved. Immediately after capturing, the signal is converted into a digital representation having a finite number of samples and amplitude levels. This step already influences the final quality. If the range of rates that a prospective channel can convey, or the resolution required by an application, is not known at the time of acquisition, it is advisable to capture the signal at the highest possible quality and scale it later.

In the source coder, the data rate needed for digital representation shall be reduced as much as possible. Properties of the signal which allow reduction of the rate can be expressed in terms of redundancy (e.g. the typically expected similarity of samples from the signal). The judgment of the quality of the overall system depends on the purpose of consumption at the end of the chain. If the sink is a human observer, it is useful to adapt the source coding method to the perceptual properties of humans, as it would be useless to convey a finer granularity of quality than the user can (or would desire to) perceive.

In advanced methods of source coding, content-related properties can also be taken into consideration. This can e.g. be done by putting more emphasis on parts or pieces of the signal in which the user is expected to be most interested.


The encoded information is usually represented in the form of binary digits (bits). The bit rate is measured either in bit/sample or in bit per second (bit/s), where the latter results from the bit/sample ratio multiplied by the sampling rate (samples/s). An important criterion to judge the performance of a source coding scheme is the compression ratio. This is the ratio between the bit rate necessary for representation of the uncompressed source and that of its compressed counterpart. If e.g. for digital TV the uncompressed source requires 165 Mbit/s, and the rate after compression is 4 Mbit/s, the compression ratio is 165:4 = 41.25. If compressed signal streams are stored as files on computer discs, the file size can be evaluated to judge the compression performance. When translating into bit rates, it must be observed that file sizes are often measured in KByte, MByte etc., where one Byte consists of 8 bit, 1 KByte = 1,024 Byte, 1 MByte = 1,024 KByte etc.
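The arithmetic in this paragraph can be written down as a few helper functions. This is a minimal Python sketch; the function names and the 30 MByte / 60 s file are hypothetical values chosen for illustration, while the 165 Mbit/s and 4 Mbit/s figures are the ones quoted in the text.

```python
KBYTE = 1024            # 1 KByte = 1,024 Byte
MBYTE = 1024 * KBYTE    # 1 MByte = 1,024 KByte


def bit_rate(bits_per_sample, samples_per_second):
    """Bit rate in bit/s from bit/sample and the sampling rate."""
    return bits_per_sample * samples_per_second


def compression_ratio(uncompressed_bps, compressed_bps):
    """Ratio between the uncompressed and the compressed bit rate."""
    return uncompressed_bps / compressed_bps


def file_bit_rate(file_size_bytes, duration_s):
    """Average bit rate in bit/s of a stream stored as a file (8 bit per Byte)."""
    return file_size_bytes * 8 / duration_s


print(compression_ratio(165e6, 4e6))      # 41.25 (digital TV example)
print(file_bit_rate(30 * MBYTE, 60.0))    # 4194304.0 bit/s (hypothetical file)
```

Note that the hypothetical 30 MByte file yields about 4.19 Mbit/s rather than 4 Mbit/s, because MByte is counted in powers of 1,024 while Mbit/s is counted in powers of 1,000.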


Signal analysis: Important principles for this are prediction of signals and frequency analysis by transforms. In coding applications, the analysis step shall be reversible; by a complementary synthesis performed at the decoder, the signal shall be reconstructed with as much fidelity as possible. Hence, typical approaches of signal analysis used in coding are reversible transformations of the signal into equivalent forms in which the encoded representation is as free of redundancy as possible. If linear systems or transforms are used for this purpose, the removal of redundancy is often called decorrelation, as correlation expresses linear statistical dependencies between signal samples. To optimize such systems, availability of good and simple models reflecting the properties of the signal is crucial. Methods of signal analysis can also be related to the generation (e.g. properties of the acquisition process) and to the content of signals. Besides the samples of the signal or its equivalent representation, additional side information parameters can be generated by the analysis stage, such as adaptation parameters which are needed during decoding and synthesis.
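As an illustration of a reversible analysis step, the Python sketch below uses the simplest possible predictor, the previous sample. This choice, the test signal and all function names are assumptions made for this example; the lecture does not prescribe a specific predictor here.

```python
import math


def variance(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def prediction_error(x):
    """Analysis: e[n] = x[n] - x[n-1], with e[0] = x[0]."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]


def reconstruct(e):
    """Synthesis (inverse of the analysis): x[n] = e[n] + x[n-1]."""
    x, previous = [], 0
    for value in e:
        previous += value
        x.append(previous)
    return x


# A slowly varying, strongly correlated integer test signal.
signal = [round(100 * math.sin(2 * math.pi * n / 64)) for n in range(256)]
error = prediction_error(signal)

print(reconstruct(error) == signal)          # True: the analysis is reversible
print(variance(error) < variance(signal))    # True: the error signal has far less energy
```

Because neighbouring samples are similar, the prediction error has a much smaller variance than the signal itself and is cheaper to represent, while the synthesis step recovers the signal exactly.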


Quantization: maps the signal, its equivalent representation or additional parameters into a discrete form. If the required compression ratio does not allow lossless reconstruction of the signal at the decoder output, perceptual properties or circumstances of usage should be considered during quantization to retain as much of the relevant information as possible.

Bit-level encoding: has the goal of representing the discrete set of quantized values at the lowest possible rate. The optimization of encoding is mostly performed on the basis of statistical criteria.
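The two stages can be sketched in Python as follows. The uniform mid-tread quantizer, the step size 0.25 and the sample values are assumptions chosen for illustration; the empirical entropy is used only as an estimate of the rate a memoryless bit-level encoder could approach, not as an actual encoder.

```python
import math
from collections import Counter


def quantize(x, step):
    """Uniform mid-tread quantizer: map amplitude x to an integer index."""
    return math.floor(x / step + 0.5)


def dequantize(index, step):
    """Reconstruction value that belongs to an index."""
    return index * step


def empirical_entropy(indices):
    """Entropy in bit/symbol of the observed index distribution."""
    counts = Counter(indices)
    total = len(indices)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


samples = [0.12, -0.40, 0.33, 0.05, -0.07, 0.91, 0.02, -0.01]  # hypothetical values
step = 0.25                                                     # hypothetical step size

indices = [quantize(s, step) for s in samples]
reconstructed = [dequantize(i, step) for i in indices]

print(indices)   # [0, -2, 1, 0, 0, 4, 0, 0]
print(max(abs(s - r) for s, r in zip(samples, reconstructed)) <= step / 2)  # True
print(empirical_entropy(indices))
```

The index 0 occurs far more often than the others, which is exactly the kind of statistical property that bit-level encoding exploits by giving frequent values shorter codes.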


Important parameters to optimize a source coding algorithm are rate, distortion, latency and complexity. These parameters mutually influence each other. The relationship between rate and distortion is determined by the rate-distortion function, which gives a lower bound on the rate if a certain maximum distortion limit is required. Improved rate/distortion performance (which means an improved compression ratio while keeping distortion constant) can usually be achieved by increasing the complexity of the encoding/decoding algorithm. Alternatively, increased latency also helps to increase compression performance; if for example an encoder is able to look ahead at the effects of current decisions on future encoding steps, this provides an advantage.
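To make the notion of a rate-distortion function concrete, the sketch below evaluates the well-known closed form for one special case: a memoryless Gaussian source with variance σ² under mean squared error distortion D, for which R(D) = max(0, ½·log2(σ²/D)) bit/sample. Choosing this source model is an assumption made here for illustration; the function name is hypothetical.

```python
import math


def gaussian_rate_distortion(variance, distortion):
    """Lower bound on the rate (bit/sample) for a memoryless Gaussian source.

    variance:   signal variance (sigma^2)
    distortion: maximum allowed mean squared error D
    """
    if variance <= 0 or distortion <= 0:
        raise ValueError("variance and distortion must be positive")
    return max(0.0, 0.5 * math.log2(variance / distortion))


print(gaussian_rate_distortion(1.0, 0.25))  # 1.0 bit/sample
print(gaussian_rate_distortion(1.0, 2.0))   # 0.0: no rate needed if D exceeds the variance
```

In this model, each reduction of the allowed distortion by a factor of four costs one additional bit per sample.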


Optimization of transmission

The interface between the source coder and the channel is also of high importance for the overall Perceptual QoS.
- The source encoder removes redundancy from the signal.
- The channel encoder adds redundancy to the bit stream for the purpose of protection and recovery in case of losses.

At the receiver side, the channel decoder removes the redundancy inserted by the channel encoder, while the source decoder restores the redundancy which was removed by the source encoder. Source encoding and channel decoding are therefore similar operations (both remove redundancy), as are source decoding and channel encoding (both add it). The more complex part is usually on the side where redundancy is removed, which means finding the relevant information within an overcomplete representation. Source and channel encoding play counteracting roles and should be optimized jointly for optimum performance.
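How a channel encoder adds redundancy and a channel decoder removes it can be illustrated with the simplest error-protection scheme, a repetition code with majority decision. The scheme and the function names are assumptions chosen for brevity; practical systems use far more efficient channel codes.

```python
def channel_encode(bits, n=3):
    """Add redundancy: repeat every bit n times (n should be odd)."""
    return [b for b in bits for _ in range(n)]


def channel_decode(coded, n=3):
    """Remove redundancy: majority vote over each group of n received bits."""
    return [1 if 2 * sum(coded[i:i + n]) > n else 0
            for i in range(0, len(coded), n)]


message = [1, 0, 1, 1]
sent = channel_encode(message)   # 12 bits on the channel instead of 4

received = list(sent)
received[4] ^= 1                 # one bit error inside the second group

print(channel_decode(received) == message)  # True: the error is corrected
```

The protection triples the transmitted rate, which shows why the redundancy removed by the source encoder and the redundancy added by the channel encoder have to be balanced against each other.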


Optimization of transmission

In the context of multimedia systems it is often advantageous to view the channel as a 'black box' for which a model exists. This in particular concerns error/loss characteristics, bandwidth, delay (latency) etc., which are the most important parameters of Network QoS. When parameters of Network QoS are guaranteed by the network, adaptation between source coding and the network transmission can be made in an almost optimum way. This is usually done by negotiation protocols. If no Network QoS is supported, specific mechanisms can be introduced for adaptation at the server and client sides. This includes application-specific error protection based on estimated network quality, or usage of re-transmission protocols. Introduction of latency is also a viable method to improve the transmission quality, e.g. by optimization of transmission schedules, temporary buffering of information at the receiver side before presentation is started, or scrambling/interleaving of streams when bursty losses are expected.
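The benefit of interleaving under bursty losses can be shown with a small block interleaver. The sketch below is an illustration under assumed parameters (a 3 x 4 block, symbols numbered 0 to 11); the function names are hypothetical.

```python
def interleave(symbols, rows, cols):
    """Write symbols row by row into a rows x cols block, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("block size does not match the number of symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse operation: restore the original row-by-row order."""
    if len(symbols) != rows * cols:
        raise ValueError("block size does not match the number of symbols")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]


block = list(range(12))              # three groups of four symbols: 0-3, 4-7, 8-11
sent = interleave(block, rows=3, cols=4)
print(sent)                          # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]

burst = sent[3:6]                    # a burst wipes out three consecutive symbols
print(sorted(burst))                 # [1, 5, 9]: one loss in each group of four
print(deinterleave(sent, 3, 4) == block)  # True
```

A burst of three consecutive losses on the channel is spread so that each group of four original symbols loses only one, which is much easier for an error-protection scheme to repair. The price is latency, since a whole block must be collected before it can be sent or restored.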


Today's digital communication networks used for multimedia signal transmission are based on distinct layers with clearly defined interfaces. On top of the physical transmission layer, a hierarchy of protocol stacks performs the adaptation up to the application layers. In such a configuration, optimization over the entire transmission chain can only be achieved by cross-layer signaling, which however adds complexity to the transmission.


Signal sources and acquisition

Multimedia systems mainly process digital representations of signals, while the acquisition and generation of natural signals is in many cases not performed directly by a digital device; electro-magnetic (microphone), optical (lens) or chemical (film) media may be involved. In such cases, the properties of the digital signal are influenced by the signal conversion process during acquisition. The analog-to-digital conversion itself consists of a sampling step, which maps a spatio-temporally continuous signal into discrete samples, and a quantization step, which maps an amplitude-continuous signal into numerical values. If natural signals are captured, part of the information originally available in the outside (three-dimensional) world is lost due to
- limited bandwidth or resolution of the acquisition device;
- "non-pervasiveness" of the acquisition device, which resides at a single position in the 3D exterior world, such that the properties of the signal are available only for this specific viewing or listening point; a possible solution is the usage of multiple cameras or microphones, where however the acquisition of 3D spatial information will always be incomplete.

In digital imaging the signal is also sampled in the horizontal dimension, and is converted (quantized) into numerical values instead of continuous-amplitude electrical signals. The image plane of width S1 and height S2 is mapped into N1 and N2 discrete sampling locations, respectively, and represents one frame sample within a time-dependent sequence. Sampled and spatially bounded images can be expressed as matrices. Often, in the indexing of the samples, the top left pixel of the image is assigned the coordinate (0,0) and is the top left element of the matrix as well.
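
The matrix view described above can be sketched in a few lines of Python. This is only an illustration: the sizes, the ramp content and the helper names make_image and sampling_distances are hypothetical, and a uniform sampling grid is assumed.

```python
# Hypothetical sizes: N1 samples per line (horizontal), N2 lines (vertical).
N1, N2 = 8, 6

def make_image(n1, n2):
    """Return an image as a matrix with n2 rows (lines) and n1 columns.

    Rows are counted from the top and columns from the left, so
    image[0][0] is the top left pixel, i.e. coordinate (0,0).
    """
    # Illustrative content: a horizontal ramp, identical on every line.
    return [[col for col in range(n1)] for _row in range(n2)]

def sampling_distances(s1, s2, n1, n2):
    """Horizontal and vertical distance between neighboring samples.

    Assumes a uniform grid over an image plane of width s1 and height s2.
    """
    return s1 / n1, s2 / n2

image = make_image(N1, N2)
top_left = image[0][0]
bottom_right = image[N2 - 1][N1 - 1]
```

Note that the first index selects the line (vertical position) and the second index the sample within the line, which is the usual matrix convention.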

In analog video technology (and still in the first generations of digital video cameras), interlaced acquisition is widely used, where the even and odd lines are captured at different time instants. Here, a video frame consists of two fields, each containing only half the number of lines. This method incurs a time shift between the even and odd lines of the composite frame. When the entire frame is captured simultaneously (as done by movie cameras), the acquisition is progressive. It is expected that in the future most content will be captured progressively.
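
The line structure of interlaced frames can be illustrated with a short Python sketch. The helper names split_fields and weave are hypothetical, and the sketch only shows which lines belong to which field; in a real interlaced camera the two fields are additionally captured at different time instants.

```python
def split_fields(frame):
    """Split a frame (a list of lines) into its even-line and odd-line fields."""
    even_field = frame[0::2]  # lines 0, 2, 4, ...
    odd_field = frame[1::2]   # lines 1, 3, 5, ...
    return even_field, odd_field

def weave(even_field, odd_field):
    """Interleave two fields back into one composite frame."""
    frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.append(even_line)
        frame.append(odd_line)
    # With an odd number of lines, the even field holds one extra line.
    frame.extend(even_field[len(odd_field):])
    return frame

frame = [f"line {k}" for k in range(6)]
even_field, odd_field = split_fields(frame)
```

Weaving the two fields of a moving scene into one frame is what makes the time shift visible as comb-like artifacts along moving edges.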

Digital representation of multimedia signals

The process of digitization of a signal consists of sampling (see sec. 2.3) and quantization (see more details in chapter 4). The resultant 'raw' digital format is denoted as Pulse Code Modulation (PCM) representation. These formats are often regarded as the original references in digital multimedia signal processing applications.

To capture and represent color images, the most common representation consists of three primary components of active light: red (R), green (G) and blue (B). These components are separately acquired and sampled. This results in a number of samples which is higher by a factor of three as compared to monochrome images. True representation of color may require even more components in a multi-spectral representation.

Color images and video are often represented by a luminance component Y and two chrominance (color difference) components. For the transformation between R,G,B and luminance/chrominance representations, different definitions exist, depending on the particular application domain.

For example, in standard TV resolution video, the following transform is mainly used:

[equation not reproduced in this transcript]

For high definition (HD) video formats, the transform

[equation not reproduced in this transcript]

is more commonly used. The possible color variations in the R,G,B color space are restricted such that perceptually and statistically more important colors are represented more accurately. In addition, chrominance components are usually sub-sampled, which is reasonable as the human visual sense cannot perceive differences in color with the same high spatial resolution as for the luminance component.
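
The two transforms themselves are not legible in this transcript. As a hedged sketch, the Python code below assumes that they correspond to the luminance weights of ITU-R BT.601 (standard definition) and ITU-R BT.709 (high definition); this is an assumption, not something the text confirms. The function names and the normalization of R, G, B to the range [0, 1] are likewise chosen for illustration only.

```python
# Luminance weights (K_R, K_B); K_G follows as 1 - K_R - K_B.
# Assumption: BT.601 for standard definition, BT.709 for high definition.
BT601 = (0.299, 0.114)
BT709 = (0.2126, 0.0722)

def rgb_to_ycbcr(r, g, b, weights):
    """Convert R, G, B in [0, 1] to Y in [0, 1] and Cb, Cr in [-0.5, 0.5]."""
    k_r, k_b = weights
    k_g = 1.0 - k_r - k_b
    y = k_r * r + k_g * g + k_b * b
    # Scale the color differences so that each spans [-0.5, 0.5].
    cb = (b - y) / (2.0 * (1.0 - k_b))
    cr = (r - y) / (2.0 * (1.0 - k_r))
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr, weights):
    """Inverse of rgb_to_ycbcr for the same pair of weights."""
    k_r, k_b = weights
    k_g = 1.0 - k_r - k_b
    r = y + 2.0 * (1.0 - k_r) * cr
    b = y + 2.0 * (1.0 - k_b) * cb
    g = (y - k_r * r - k_b * b) / k_g
    return r, g, b
```

With the BT.601 weights this gives Cb = 0.564 (B - Y) and Cr = 0.713 (R - Y) after rounding. Practical formats additionally add offsets and scale to integer ranges, which is omitted here.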

In interlaced sampling, sub-sampling of chrominances is mostly performed only in horizontal direction to avoid color artifacts in case of motion, while for progressive sampling both horizontal and vertical directions of chrominance can be sub-sampled into lower resolution. Component sampling ratios are often expressed in a notation C1:C2:C3 to express the relative numbers of samples. For example,
- when the same number of samples is used for all three components like in R,G,B, the expression is '4:4:4';
- a Y,Cb,Cr sampling structure with horizontal-only sub-sampling of the two chrominances by a factor of 2 is expressed by the notation '4:2:2', while '4:1:1' indicates horizontal sub-sampling by a factor of 4;
- if sub-sampling is performed in both directions, i.e. half the number of samples in chrominances along both horizontal and vertical directions, the notation is '4:2:0'.
The respective source format standards also specify the sub-sampled component sample positions in relation to the luminance sample positions.
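
The effect of the four notations on the number of samples can be sketched as follows. The table CHROMA_SUBSAMPLING and the function names are hypothetical, the 720x576 example size is an assumption, and the sketch assumes that width and height are divisible by the sub-sampling factors.

```python
# (horizontal factor, vertical factor) applied to each chrominance component.
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),  # no sub-sampling
    "4:2:2": (2, 1),  # horizontal only, factor 2
    "4:1:1": (4, 1),  # horizontal only, factor 4
    "4:2:0": (2, 2),  # factor 2 in both directions
}

def component_sizes(width, height, notation):
    """Return (luminance size, chrominance size) as (width, height) pairs."""
    h_factor, v_factor = CHROMA_SUBSAMPLING[notation]
    return (width, height), (width // h_factor, height // v_factor)

def samples_per_frame(width, height, notation):
    """Total samples in one frame: one luminance plus two chrominance planes."""
    (lw, lh), (cw, ch) = component_sizes(width, height, notation)
    return lw * lh + 2 * cw * ch

for notation in CHROMA_SUBSAMPLING:
    print(notation, samples_per_frame(720, 576, notation))
```

'4:1:1' and '4:2:0' yield the same total number of samples; they differ only in how the chrominance resolution is distributed between the horizontal and vertical directions.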

For video representation, besides the total number of bits required e.g. to store a movie, the number of bits per second is important for transmission. It is obtained by multiplying the number of bits per frame by the number of frames per second instead of the total number of frames.
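
As a worked example of this arithmetic, the Python sketch below computes the raw (uncompressed) rate and storage size. The function names are hypothetical, and the example values (720x576 samples, 8 bit per sample, 4:2:0 sampling with 1.5 samples per pixel on average, 25 frames per second, 90 minutes) are assumptions chosen for illustration.

```python
def bits_per_frame(width, height, bit_depth=8, samples_per_pixel=1.5):
    """Raw PCM bits in one frame.

    samples_per_pixel is 3 for 4:4:4, 2 for 4:2:2 and 1.5 for 4:2:0 or 4:1:1.
    """
    return int(width * height * samples_per_pixel * bit_depth)

def bit_rate(width, height, frames_per_second, **kwargs):
    """Raw bits per second: bits per frame times frames per second."""
    return bits_per_frame(width, height, **kwargs) * frames_per_second

def total_bits(width, height, frames_per_second, duration_seconds, **kwargs):
    """Raw bits for a whole sequence: bits per frame times number of frames."""
    return bit_rate(width, height, frames_per_second, **kwargs) * duration_seconds

sd_rate = bit_rate(720, 576, 25)               # bits per second
sd_movie = total_bits(720, 576, 25, 90 * 60)   # bits for a 90 minute movie
```

Under these assumptions the raw rate is about 124 Mbit/s and the movie occupies about 84 GByte, which already indicates why compression is needed.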

For standard TV resolution, the source of the digital TV signal is the analog TV signal of 625 lines in Europe (525 lines in Japan or the US), typically recorded by an interlaced scheme. These analog signals are sampled at a rate of 13.5 MHz for the luminance. After removal of vertical blanking intervals, 575 (480) active lines remain. The horizontal blanking intervals (for line synchronization) are also removed, which gives around 704 active pixels per line. The digital formats listed in Tab. 1.2 store only those active pixels, with a very small overhead of a few surplus pixels from the blanking intervals. Japanese and US (NTSC) formats traditionally use 60 fields per second (30 frames per second), while in Europe, 50 fields per second (25 frames per second) are used in analog TV (PAL, SECAM). The digital standards defining HD formats are more flexible in terms of frame and field rates, allowing 24, 25, 30, 50 or 60 frames/second and 50 or 60 fields/second; movie material, interlaced and progressive video are supported. For higher resolutions, the '720p' format (720 lines progressive) is widely used in professional digital video cameras. All 'true' HDTV formats have 1080 lines in the digital signal.

There are other commonly used formats, some of which are generated by digitally down-converting the standard TV resolution, e.g. the half horizontal resolution (HHR), the Common Intermediate Format (CIF) or Standard Intermediate Format (SIF) and the Quarter CIF (QCIF). For computer displays or mobile devices, formats such as VGA and QVGA are also commonly used. Higher resolutions beyond HD are currently expected to emerge from the professional area ('Digital Cinema') into consumer applications. Current plans are to introduce formats with double the number of samples horizontally and vertically as compared to HD1080, then called '4Kx2K', or with quadruple the number ('8Kx4K'). Those formats will only support progressive sampling, but frame rates may become even higher in the future (72 frames per second and beyond).

The accompanying figure gives a coarse impression of the sampled image areas supported in formats between QCIF and HDTV. An increased number of samples can either be used to increase the resolution (spatial detail), or to display scenes with a wider angle. For example, in a cinema movie close-up views of human faces are rarely shown. Movies displayed on a cinema screen allow the observer's eye to explore the scene, while on standard definition TV screens and even more so for the smaller formats, this capability is very limited.

For medical and scientific purposes, digital images with much higher resolution than in movie production are used; resolutions of up to 10,000x10,000 = 100,000,000 pixels are quite common. Such formats are not yet realistic for real-time acquisition by digital video cameras, as the clock rates for sampling would be extremely high.
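
To give a rough feeling for these clock rates, the following Python sketch compares the number of samples per second for a 10,000x10,000 image sequence with that of an HD sequence. The frame rate of 25 frames per second and the 1920x1080 HD size are assumptions for illustration, and blanking intervals are ignored.

```python
def pixel_rate(width, height, frames_per_second):
    """Samples per second for one component, ignoring blanking intervals."""
    return width * height * frames_per_second

# Assumed frame rate of 25 frames per second for both cases.
high_res_rate = pixel_rate(10_000, 10_000, 25)
hd_rate = pixel_rate(1920, 1080, 25)
ratio = high_res_rate / hd_rate
```

Under these assumptions the high-resolution format would need 2.5 billion samples per second, roughly 48 times the rate of the HD format.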

Speech and audio signals

For audio signals, parameters such as sampling rate and precision (bit depth) have the greatest influence on the resulting data rates of the digital representation. These parameters depend strongly on the properties of the signals and on the requirements for quality. In speech signal quantization, nonlinear mappings using logarithmic amplitude compression are used, which at low amplitudes provide quantization noise as low as in 12 bit linear quantization, even though only 8 bit/sample are used. For music signals to be acquired in audio CD quality, a linear 16 bit representation is most commonly used. For some specialized applications, even higher bit depths and higher sampling rates than for CD are used.
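
The 8 bit versus 12 bit statement can be checked numerically with a small Python sketch. The text does not name a specific compression law; the sketch assumes the continuous A-law characteristic with A = 87.6, whose slope of about 16 near zero corresponds to 4 additional bits of resolution at low amplitudes. Practical telephony codecs use a segmented approximation of this curve, which is not modeled here, and all function names are hypothetical.

```python
import math

A = 87.6  # assumption: A-law parameter; not given in the text

def compress(x):
    """Logarithmic (A-law) compression of x in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))               # linear part near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)

def expand(y):
    """Inverse of compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)

def quantize_uniform(v, bits):
    """Uniform quantizer on [-1, 1) with 2**bits levels, mid-cell reconstruction."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor(v / step)
    index = max(-levels // 2, min(levels // 2 - 1, index))
    return (index + 0.5) * step

def companded_8bit(x):
    """Compress, quantize with 8 bit, then expand again."""
    return expand(quantize_uniform(compress(x), 8))

# Compare the errors for a low-amplitude sample.
x = 0.001
error_linear_8bit = abs(quantize_uniform(x, 8) - x)
error_companded_8bit = abs(companded_8bit(x) - x)
```

For low amplitudes the companded 8 bit error stays below half the step size of a linear 12 bit quantizer, while at high amplitudes the error is larger than for linear 8 bit quantization; this trade-off suits speech, where low amplitudes dominate.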

Need for compression technology

Due to the very high data rates necessary for representation of the original uncoded formats, the requirement for data compression by application of image, video and audio coding is permanently present, even though the available transmission bandwidth keeps increasing through advances in communications technology. In general, past experience has shown that multimedia traffic increases faster than new capacity becomes available, and compressed transmission of data is inherently cheaper. Even if sufficient bandwidth is available, it serves the user better in terms of quality to use it for increasing the resolution of the signal. Further, certain types of communication channels (in particular in mobile transmission) exist where the bandwidth is inherently expensive due to physical limitations. This must however be weighed against the complexity that is necessary for the implementation of a compression algorithm, which may lead to higher cost of the device and higher power consumption, the latter being particularly critical for mobile devices.
