Multimedia Middle Ware

Apr 10, 2018
  • 8/8/2019 Multimedia Middle Ware

    1/100

    Helwan University

    Faculty of Engineering

    Department of Electronics,Communications, and Computers

    MULTIMEDIA MIDDLEWARE

    by

    Nora Abdel gaffar Naguib El-morsy

B.Sc. in Telecommunication Engineering, 2005

    [email protected]

    A thesis submitted in partial fulfillment of the requirements for the degree of

Master of Science in Telecommunications Engineering

    Supervised by:

Prof. Mohamed I. El-Adawy, Faculty of Engineering, Helwan University

Dr. Hesham A. Keshk, Faculty of Engineering, Helwan University

Dr. Ahmed E. Hussein, Faculty of Engineering, Helwan University

    2010


    ACKNOWLEDGEMENT

    It is a pleasure to thank those who made this thesis possible. I would like to

    express my gratitude to Prof. Mohamed I. El-Adawy for his constant support

    and most valuable advice. I would like to thank the rest of the supervisory

committee for all their help and Dr. Ahmed E. Hussein for the suggestion of

    reference titles.

    I would also like to thank my family for the support they provided me

    through my entire life and in particular, I really cannot express my full

    gratitude to my brother Yasser Naguib who patiently proofread this entire

    thesis. Special thanks go to my brother Wael Naguib without whose

motivation and encouragement I would not have considered a postgraduate

    degree. Above all, to my mother who stood beside me all the time.

    Lastly, I offer my regards to all of those who supported me in any respect

    during the completion of the project.

    I dedicate this thesis to My Mother


    PUBLICATIONS

Nora A. Naguib, Ahmed E. Hussein, Hesham A. Keshk, and Mohamed I. El-Adawy, "Contrast Error Distribution Measurement for Full Reference Image Quality Assessment," The 18th International Conference on Computer Theory and Applications, 2008, Alexandria, Egypt.

Nora A. Naguib, Ahmed E. Hussein, Hesham A. Keshk, and Mohamed I. El-Adawy, "Using PFA in Feature Analysis and Selection for H.264 Adaptation," World Academy of Science, Engineering and Technology, Volume 54, June 2009, Paris, France, ISSN: 2070-3724.


    ABSTRACT

In today's world, users have heterogeneous devices connected to a mesh of networks, each

    with different capabilities and restrictions. Multimedia content providers need innovative

approaches that keep not only one version of each video but also offer

different bitstreams for a variety of client capabilities. The previously used

"one size fits all" design cannot apply in the diverse environments present today. A single

bitstream with static parameters cannot satisfy the diversity presented on the client side. This is

    why the researchers in Universal Multimedia Access (UMA) are working on the development

    of new techniques for coding multimedia objects with maximum compression efficiency

    along with flexibility in the parameters of the provided video when dealing with client devices.

    The transcoding of multimedia objects requires the presence of intermediate systems that are

    capable of altering the bitstream on demand. Those systems should have the capability of

manipulating different formats of bitstreams. A large number of adaptation techniques exist

in today's literature, each specialized in altering the video bitstream with respect to only one

dimension, namely temporal (frame rate), spatial (resolution), Signal to Noise Ratio (SNR), or format conversion. In the real world, adaptation of video sequences should take the form of

    multi-dimensional adaptation allowing the system to do a combination of reduction processes

on different parameters of the video sequence while providing the best possible quality.

    In this thesis, we have focused on the transcoder policy module. While most of the previous

studies in multimedia transcoding focused on the transcoding techniques, the lack of a control

algorithm rendered those techniques useless. The study was directed toward the creation of

an offline data analysis model for the transcoder's policy module.

The results and analysis provided in this thesis help toward the creation of a policy module that

controls the transcoder operation for universal multimedia access.

    KEYWORDS: Multimedia Transcoding, Objective Quality Assessment, Universal

    Multimedia Access.


4-3-3 Prediction Accuracy 48
4-3-4 Prediction Monotonicity 48
4-3-5 Prediction Consistency 49
4-4 Results 49
4-4-1 Overall Performance 50
4-4-2 Cross-Distortion Performance 50
4-4-3 Logistic Regression Performance 53
4-4-4 Complexity Performance 52
Data Analysis 63
5-1 Introduction 63
5-2 Offline Data Analysis Model 64
5-3 H.264 Setup 65
5-4 Test Sequences 66
5-5 Features 66
5-5-1 Feature Definitions 68
5-5-1-1 Source Domain Features 68
5-5-1-2 Resources Required 68
5-5-1-3 Coded Domain Features 69
5-5-2 Analysis and Selection 69
5-6 Results 70
5-7 Transcoder Configuration 73
5-8 Transcoder Setup 74
5-9 Clustering 76
Conclusion and Future Work 79
6-1 Conclusion 79
6-2 Future Work 82
Bibliography 83


    LIST OF FIGURES

Figure 1-1 Multimedia Middleware 3
Figure 2-1 Multimedia Communications Study Areas (2001 ITU-T) 8
Figure 2-2 General Architecture of Coding Algorithms 9
Figure 2-3 Scalable Bitstreams 10
Figure 3-1 Block Diagram of the Perceptual Distortion Metric (PDM) 19
Figure 3-2 Block Diagram of the Structural Similarity 20
Figure 3-3 Block Diagram of the Multi-Scale Structural Similarity. L: low-pass filtering; 2: downsampling by 2 22
Figure 3-4 Conceptual Diagram of the VIF 22
Figure 3-5 Subjective Experiments: viewing modes (on the left), score scale (on the right). (a) Double Stimulus Impairment Scale (DSIS) (b) Double Stimulus Continuous Quality Scale (DSCQS) (c) Single Stimulus Continuous Quality Scale (SSCQS) 25
Figure 3-6 (a) Video Coding Layer (VCL) and Network Abstraction Layer (NAL) arrangement. (b) NAL unit 29
Figure 3-7 Block Diagram of the H.264 Encoder 30
Figure 3-8 Block Diagram of the H.264 Decoder 30
Figure 3-9 H.264 Profiles 33
Figure 3-10 Homogeneous Transcoding 35
Figure 3-11 Transcoder Implementation 36
Figure 3-12 Utility Model 36
Figure 3-13 Info-Pyramid Based Control Scheme 37
Figure 3-14 Three-Dimensional View 38
Figure 3-15 System Overview 39
Figure 3-16 Adaptation, Resource, Utility Spaces 41
Figure 4-1 Block Diagram of the Contrast Error Distribution (CED) 46
Figure 4-2 Scatter plot of VQRs against DMOS values (blue), and nonlinear logistic fitting curve (black). This was calculated for 6 VQMs: PSNR, SSIM, VIF, PD-VIF, CED, log(CED) respectively 54
Figure 4-3 Scatter plot of predicted DMOS (VQRs after logistic regression) against DMOS values. This was calculated for 6 VQMs: PSNR, SSIM, VIF, PD-VIF, CED, log(CED) respectively 57
Figure 4-4 Calibration curves for each error domain: JPEG2K (green), JPEG (red), White Noise (blue), Gaussian Blur (magenta), Fast Fading (cyan) and all error domains (black). This was calculated for 6 VQMs: PSNR, SSIM, VIF, PD-VIF, CED, log(CED) 60
Figure 5-1 Block Diagram of Multimedia Middleware 65
Figure 5-2 Test Sequences Description 67
Figure 5-3 Standard Transcoder Configuration 73


Figure 5-4 Adopted Transcoder Configuration 74
Figure 5-5 Normalized Bitrate against different transcoding parameters for all the test sequences 76
Figure 5-6 Dendrogram of the generated clusters 77
Figure 5-7 Normalized Bitrate after adding the no-transcoding values 77


    LIST OF TABLES

Table 1 Comparison between the PSNR, SSIM, CED, PD-VIF, log(CED), log(VIF) with respect to CC: Pearson Correlation Coefficient, SROCC: Spearman Rank-Order Correlation Coefficient, RMSE: Root Mean Square Error 51
Table 2 Pearson Correlation Coefficient of the SSIM, CED, PD-VIF, log(CED), log(VIF). Calculated for the distortion domains JPEG2000, JPEG, White Noise, Gaussian Blur, and Fast Fading 51
Table 3 Spearman Rank Correlation Coefficient of the SSIM, CED, PD-VIF, log(CED), log(VIF). Calculated for the distortion domains JPEG2000, JPEG, White Noise, Gaussian Blur, and Fast Fading 51
Table 4 Root Mean Square Error of the SSIM, CED, PD-VIF, log(CED), log(VIF). Calculated for the distortion domains JPEG2000, JPEG, White Noise, Gaussian Blur, and Fast Fading 52
Table 5 Evaluation of the Quality Metrics 52
Table 6 Source Domain Features 71
Table 7 Resource Features 72
Table 8 Coded Domain Features 72
Table 9 Final Trial 72


    ACRONYM

    ARU Adaptation / Resource / Utility

    CED Contrast Error Distribution

    CPDT Cascaded Pixel Domain Transcoder

    DCT Discrete Cosine Transform

DFT Discrete Fourier Transform

DMOS Differential Mean Opinion Score

    DSCQS Double Stimulus Continuous Quality Scale

    DSIS Double Stimulus Impairment Scale

    DWT Discrete Wavelet Transform

    FIR Finite Impulse Response

FR QA Full Reference Quality Assessment

HVS Human Visual System

    ISO/IEC International Organization for Standardization

/ International Electrotechnical Commission

    IT Information Technology

    ITU-R International Telecommunication Union

Radiocommunication

    ITU-T International Telecommunication Union

    Telecommunications

    MM FSA MultiMedia Framework Study Areas

MPEG Moving Picture Experts Group

    MSE Mean Square Error

    NAL Network Abstraction Layer


    NR QA No Reference Quality Assessment

    NSS Natural Scene Statistics

PCA Principal Component Analysis

    PDM Perceptual Distortion Metric

PFA Principal Feature Analysis

    PSNR Peak Signal to Noise Ratio

    QoE Quality of Experience

RR QA Reduced Reference Quality Assessment

SDOs Standards Development Organizations

    SG Study Group

    SNR Signal to Noise Ratio

    SSCQS Single Stimulus Continuous Quality Scale

    SSIM Structural Similarity

UMA Universal Multimedia Access

VCL Video Coding Layer

    VIF Visual Information Fidelity

    VQEG Video Quality Experts Group

    VQM Video Quality Metric

    VQR Video Quality Rating


Chapter 1

Introduction

1-1 Motivation

    Multimedia plays an important role in our life. We now have terms that were

introduced to industry, culture and leisure that solely depend on the evolution of the Multimedia Communications field. Working with another

team member overseas through your laptop would never have been possible

were it not for video conferencing capabilities. The term "webinar" was not used until

a few years ago, when it was found that a web-based seminar would be more

effective in reaching all its target audience regardless of the distances apart.

Multimedia objects can be described as the most demanding objects

transferred between networks, where the Quality of Experience (QoE) [1] is

paramount. The slightest delay or error would heavily affect the

quality and render the multimedia object useless. This, however, doesn't

change the fact that multimedia is the most popular type of data on the

internet.


The growth in the number of users with access to the internet, along with the tremendous

increase in their network capabilities and mobility, made way for an increase

in the amount of data accessed and uploaded through the internet. Multimedia

objects account for at least 70% of this data. Those users spend

more than 20% of their time away from their primary workplace.

For a relatively long time now, we have been used to having two types of networks

available to us: telecommunications and IT (Information Technology) networks. Though we have interconnections between them, we haven't yet

reached a full merger of the two. To achieve this merge, the ITU-T

(International Telecommunication Union - Telecommunication Standardization Sector) is working

on the standardization of what are called Next Generation Networks.

The work of Study Group 16 is focused on providing guidelines for a "Network

of Networks" that unifies the viewpoints of end users, standards

committees, and telecommunication and IT providers. This will allow the

convergence of all services under the umbrella of one network, and the

cooperation of content providers and network service providers to serve end

users better.

This advancement in telecommunications networks and device interoperability has increased the importance of multimedia objects.

Multimedia communication is expected to dominate the field of

communications in the following 10 years. This makes it crucial for us to

tackle the problem of exchanging multimedia objects seamlessly in these

changing environments. The research presented in this thesis is an attempt to

examine some of the open issues in the field of multimedia communications.


    1-2 Problem Statement

Multimedia middleware are intermediate systems between the client and the

content server that provide a number of complementary services. The

generalized block diagram of multimedia middleware is illustrated in Figure

1-1. Those servers are used to transcode multimedia objects before delivery

to client devices. This transcoding helps in situations where we do not

want to exhaust network resources or device processing power when users

are just previewing multimedia objects to select one, or when the client device

does not have a high screen resolution.

    Figure 1-1 Multimedia Middleware

    Transcoding can be done with respect to numerous domains, none of which

    will result in the same combination of resources. The transcoding middleware

    should be able to evaluate the client request, analyze the content of the

multimedia object requested, choose a transcoding scheme, then transcode


    and deliver it to the user. This middleware server will need to fit within the

    existing system and be transparent to both content server and client.

    A multimedia middleware should possess the following qualities in order to

    be transparent to the client side:

When adding a new multimedia object to the content server, the time required for the transcoding server to analyze the content of the video should be minimized.

The time from the reception of a client request till delivery of the content back to the user should be minimized.

The transcoding server should not require the presence of any pixel-domain information in any of its processes.

The server should have the means to assess the quality of the generated version of the multimedia object and choose between different transcoding schemes.

    The above qualities provide a roadmap for the implementation of transcoding

servers. However, for those servers to function properly, a set of offline data

analysis studies for multimedia objects should be done. In the available

literature, a number of studies worked on this point but none have reached

the optimal criteria satisfying the above stated qualities. Our work in multimedia middleware is focused toward the implementation of the

transcoder policy module. We have divided the analysis into two points: a

quality assessment model has been developed for use in offline data

analysis, along with an overall feature analysis for the selection of transcoding

schemes.

    1-3 Objectives and contributions

    The middleware server request cycle consists of the following:


Data analysis of the pre-encoded video stream.

Policy module: choosing a transcoding scheme that best fits the client requirements and has the best quality of all possible solutions.

Transcoding the video stream.
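The three stages above can be sketched as a minimal pipeline. This is an illustrative Python sketch only, not an implementation from the thesis; every function name and the placeholder features are hypothetical.

```python
# Hypothetical sketch of the middleware request cycle: analyze the
# pre-encoded stream, let the policy module pick a scheme, transcode.

def analyze(bitstream):
    """Stage 1: offline data analysis of the pre-encoded stream.
    Returns placeholder coded-domain features."""
    return {"bitrate": len(bitstream), "frame_rate": 30}

def choose_scheme(features, client_request):
    """Stage 2: policy module -- pick a scheme that fits the client
    constraints; a real module would also rank by predicted quality."""
    if client_request["max_bitrate"] < features["bitrate"]:
        return {"op": "reduce_bitrate", "target": client_request["max_bitrate"]}
    return {"op": "passthrough"}

def transcode(bitstream, scheme):
    """Stage 3: apply the chosen scheme (stubbed truncation here)."""
    if scheme["op"] == "reduce_bitrate":
        return bitstream[: scheme["target"]]
    return bitstream

def serve(bitstream, client_request):
    features = analyze(bitstream)
    scheme = choose_scheme(features, client_request)
    return transcode(bitstream, scheme)
```

The point of the sketch is the separation of concerns: the policy module only sees features and constraints, never pixel data.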

    The objective of this research is to examine the first two stages. This work

will help toward the practical implementation of the middleware server

control module. The contribution of this research was concentrated in the following:

Discovering the features that best serve in clustering multimedia objects and provide a means of predicting the way those objects would react to different transcoding schemes.

Developing a new quality assessment metric for the evaluation and choice of the best available transcoding scheme.

    1-4 Thesis Outline

This thesis is organized as follows: chapter 2 introduces some of the

multimedia communications concepts used in the discussions presented in this

thesis, chapter 3 provides a review of the related literature, chapter 4

introduces the proposed objective quality assessment model along with the evaluation of its performance, chapter 5 presents the offline data analysis and

the feature analysis for the implementation of the transcoder policy module,

and chapter 6 presents the conclusion and future work.


Chapter 2

Multimedia Communications Basics

2-1 ITU-T MediaCom 2004 Project

The advances in multimedia communications depend not only on fields

that study multimedia objects but also on the development of the underlying

networks and services that will allow the integration of complex multimedia

objects in resource-limited networks, taking into consideration the quality

received by end users.

ITU-T SG16, the lead Study Group for Multimedia, is working on the

MEDIACOM 2004 (Multimedia Communication 2004) project [2]. The objective of

the MEDIACOM 2004 project is to establish a framework for multimedia standardization for use both inside and outside the ITU. This framework

will support the harmonized and coordinated development of global

multimedia communication standards across all ITU-T and ITU-R Study

Groups, in close cooperation with other regional and international

standards development organizations (SDOs).


    Figure 2-1 presents the Multimedia framework study areas (MM FSA) as

    defined by the Mediacom project.

    Figure 2-1 Multimedia Communications Study areas (2001 ITU-T)

    2-2 MPEG-7 and MPEG-21

    Another important segment of research is the semantic annotation of

multimedia content. This annotation provides a bigger-picture view of the

    overall information that resides in a webpage. As a result, the content of this

    webpage can be classified based on its importance and then delivered.

MPEG-7 and MPEG-21 are two standards developed by the Moving

Picture Experts Group (MPEG) in 2003. Those standards are not intended

for the coding of multimedia objects as the preceding standards were. Rather, they aim to integrate with the other coding algorithms to allow the


    transmission of user preference and context information back and forth

    between clients and content servers.

    2-3 Coding Standards

    Multimedia objects are known to contain a large amount of correlated data.

    Coding algorithms are designed to decouple these associations in both the

temporal and spatial dimensions and thereby achieve a high compression rate

without losing valuable information. Figure 2-2 illustrates the main components of coding algorithms.

    Figure 2-2 General Architecture of Coding Algorithms

    MPEG-4 and H.264 are the newest standards for multimedia coding

developed by the MPEG. They both rely on the same coding principles but with significantly different visions: MPEG-4 is mainly concerned with

flexibility, whereas H.264 features efficient compression and reliability.

    As stated above, the difference between the two standards does not reside in

    the theory of the compression module itself, but in how the input is treated.

    In MPEG-4, the input of the compression module is a series of multimedia


objects that are contained in video frames. H.264 uses frame-based

    compression.

2-4 Transcoding vs. Scalable Coding

    Scalable Video Encoding is the coding of video streams to contain a number

    of substreams that can be decoded separately. The bitstream structure is

shown in Figure 2-3. First comes a base substream containing the most basic

information, which allows client devices to render the video with the lowest obtainable quality. This is usually the case for mobile devices

where the client is connected to a low-bandwidth network. That base

substream is followed by a series of enhancement layers that can be

downloaded on demand; this is usually the case when the client can afford

more resources to increase the quality of the received video.

    Figure 2-3 Scalable Bitstreams
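The layered structure in Figure 2-3 can be illustrated with a toy sketch; the layer names and bitrates below are hypothetical, not values from the thesis. The client takes the base substream unconditionally, then adds enhancement layers while its bandwidth allows.

```python
# Toy sketch of a scalable bitstream: a base substream plus
# enhancement layers, downloaded on demand. Layer contents and
# bitrates are illustrative placeholders.

layers = [
    {"name": "base",  "bitrate": 200},  # lowest obtainable quality
    {"name": "enh-1", "bitrate": 300},  # optional enhancement layers,
    {"name": "enh-2", "bitrate": 500},  # taken on demand
]

def select_layers(available_bandwidth):
    """Take the base layer, then enhancements while they fit.
    The base layer is taken even if bandwidth is below its bitrate,
    since without it nothing can be rendered at all."""
    chosen, used = [], 0
    for layer in layers:
        if used + layer["bitrate"] > available_bandwidth and chosen:
            break
        chosen.append(layer["name"])
        used += layer["bitrate"]
    return chosen
```

Note how the quality steps are fixed in advance by the layer boundaries; this is exactly the "predefined steps" limitation discussed below.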

    On the other hand, transcoding can be achieved by the presence of

    intermediate systems (Multimedia Middleware) between server and client. On

these subsystems the video is re-encoded upon receiving client requests. Those requests contain the characteristics of the client device along with


the network resources available. In this thesis the terms "transcoding" and

"adaptation" will be used interchangeably.

The most basic form of a transcoder is a back-to-back decoder-encoder

configuration. However, this configuration requires heavy processing power

on the intermediate system. Another form is based on partially decoding the

stream and manipulating the data in its pre-coded form without referring to

the pixel-domain data. Such transcoding systems exploit dependencies between the coded domain and pixel domain information, along with a full

understanding of the coding scheme itself.
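The two transcoder forms can be contrasted in a schematic sketch; `decode` and `encode` here are stubbed placeholders standing in for real codec operations, not an API from the thesis or any codec library.

```python
# Schematic contrast of the two transcoder forms. All functions are
# hypothetical stubs; a real transcoder would wrap an actual codec.

def decode(bitstream):
    """Stub for a full decode to the pixel domain (costly)."""
    return {"pixels": bitstream, "params": {}}

def encode(frames, params):
    """Stub for a full re-encode (costly)."""
    return (frames["pixels"], params)

def cascaded_pixel_domain_transcode(bitstream, new_params):
    """Back-to-back decoder/encoder: simple but processing-heavy."""
    frames = decode(bitstream)
    return encode(frames, new_params)

def coded_domain_transcode(bitstream, new_params):
    """Partially parse the stream and rewrite coded-domain data
    (e.g. requantize coefficients) without reaching the pixel domain.
    Cheaper, but needs detailed knowledge of the coding scheme."""
    header, payload = bitstream[:4], bitstream[4:]
    # Placeholder: real logic would edit `payload` using `new_params`.
    return header + payload
```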

Scalable coding and transcoding are the two coexisting lines of UMA

research, each with its advantages and limitations. Scalable coding has

the advantage of processing videos in advance; therefore it does not require

any intermediate system. However, it means that the video bitstream

resource/quality degradation can be done only in predefined steps and

therefore does not comply with the exact client requirements.

In other words, scalable coding introduces an error margin between the provided

bitstream and the requested resource/quality. Meanwhile, transcoding tailors

video bitstreams to the exact device/network requirements provided by the client requests.

Two other limitations of the practical implementation of scalable

coding are as follows:


The decoder's compliance with the scalable coding format: non-compliant decoders will only decode the base layer of the bitstream, yielding a low-quality video on clients that can support higher quality.

An enormous number of single-layer video bitstreams is available on today's networks; in order to accommodate scalable coding techniques, transcoding is required for all present videos.

    2-5 Quality Assessment

Quality assessment is an important step in the transcoding / adaptation process.

In a proxy / middleware, the choice of the transcoding dimension and the exact

parameters depends on the quality produced. Although meeting client

requests and resources is the steering wheel of the transcoding middleware,

the QoE on the client side is what this whole system is about.

During assessment of the reduced bitstream, we should bear in mind that quality

measurement of multimedia objects is not defined as fidelity of the new

bitstream to the original. Quality, when it comes to multimedia objects, is defined

as the perceived quality, which means that some errors are more important

than others. The perceived quality is related to limitations within the

Human Visual System (HVS), whereby some errors are neutral while others are severely perceived by it.

    Peak Signal to Noise Ratio (PSNR) is considered to be the most recognized

quality metric. This metric calculates the error power within the image.

Consequently, it overlooks the significance of the affected data within the

image, along with the modification in the HVS response due to this variation in data.


The degree to which the alteration of a video bitstream has affected the

perceived quality can be calculated by either subjective experiments or

objective quality metrics. Subjective experiments refer to viewing videos by

human observers, where each observer rates the video quality and then a

mean opinion score is calculated for this video. Objective quality metrics

measure the degradation of visual perceptual quality by defining a criterion for

describing perceptual error.
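As a small illustration of the subjective route, the mean opinion score and its differential form (DMOS) reduce to simple averages over observer ratings; the actual rating scale depends on the protocol used (DSIS, DSCQS, and so on), so the numbers below are purely illustrative.

```python
# Illustrative sketch: MOS and DMOS computation from per-observer
# ratings. The 0-100 scale is an assumption for the example only.

def mean_opinion_score(ratings):
    """Average rating over all observers for one video."""
    return sum(ratings) / len(ratings)

def dmos(reference_ratings, distorted_ratings):
    """Average per-observer score difference between the reference
    and the distorted video: a higher DMOS means worse quality."""
    diffs = [r - d for r, d in zip(reference_ratings, distorted_ratings)]
    return sum(diffs) / len(diffs)
```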


on the screen with respect to the original video stream. This clarifies why any

fidelity measure such as the SNR would fail in describing the opinion of the

observer.

Although the HVS is a complex system, it is limited when it comes to error

perception. These limitations are the reason why an error with less power

might contribute much more severely to the degradation of image quality.

    Up until now, subjective experiments have been used for the assessment of

multimedia quality. However, those experiments are impractical, expensive

and time consuming. Hence, they cannot be used in estimating the quality of

multimedia objects during their reproduction. Researchers in the field of

multimedia quality assessment are working on the development of objective

metrics that can predict the observer's opinion about the quality of

    multimedia objects.

3-1-2 Simple Quality Metrics

Simple error power models are considered to be the most recognized quality

metrics. These metrics calculate the error power within the image.

Consequently, they overlook the significance of the affected data within the

image, along with the modification in the HVS response due to this variation in data.

To calculate the PSNR between the original and distorted images, we start by

calculating the MSE (Mean Square Error) of the pixels' grayscale values:

MSE = \frac{1}{XYF} \sum_{f=1}^{F} \sum_{y=1}^{Y} \sum_{x=1}^{X} \left[ P(x,y,f) - \hat{P}(x,y,f) \right]^2   [3]


Where: the images have a width of X pixels and a height of Y pixels, the video

sequence contains F frames, and P and \hat{P} denote the original and distorted pixel values.

PSNR = 10 \log_{10} \left( \frac{I^2}{MSE} \right)   [3]

    Where: I is the maximum value that a pixel can take.
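The two formulas above translate directly to a few lines of NumPy; a minimal sketch, assuming 8-bit grayscale video stored as an (F, Y, X) array (the function names are our own, not from the thesis):

```python
import numpy as np

def mse(original, distorted):
    """Mean Square Error over all pixels of all frames.

    `original` and `distorted` are arrays of shape (F, Y, X):
    F frames, each Y pixels high and X pixels wide."""
    diff = original.astype(np.float64) - distorted.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, distorted, max_value=255.0):
    """PSNR in dB; `max_value` is I, the maximum a pixel can take."""
    m = mse(original, distorted)
    if m == 0:
        return float("inf")  # identical sequences: infinite fidelity
    return 10.0 * np.log10(max_value ** 2 / m)
```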

From the above we can see that the MSE defines the difference between the two signals and the PSNR defines the fidelity of the distorted image to the

original. In [4] the authors illustrate why error power cannot be used as a

metric for perceptual quality. They considered the following cases:

Different types of visual error with equal power introduced to the same image.

Identical error introduced to different images.

    In these two cases, although the error has identical power values, the two

    images may enclose different perceptual quality. In other words, the type of

    error should be studied with respect to its effect on HVS and the image in

    hand.

3-1-3 Objective Quality Metrics

The above argument about error power based metrics led researchers to explore and formulate a definition for the perceived quality. Some of the

    metrics were designed to be generic and utilized the basic understanding of

    the limitations in the HVS. The metric itself was designed to mimic the

    processing done in the human eye and brain. Other metrics were more


specific and relied on prior information about the distortion process that the multimedia object went through (for example, coding algorithms introduce blocking artifacts).

    Three types of references can be used for quality assessment: Full Reference

    (FR), Reduced Reference (RR), and No Reference (NR). In FR QA (Full

    Reference Quality Assessment) the original image is compared to the

reproduced image, while in RR QA only some features of the original image are used in the comparison. NR QA refers to techniques that rely on natural image features to decide on the quality of the image without referring to any outside information. Obviously, FR and RR are not very suitable for

    the transmission quality problem, due to the need for the original image or

    some of its features at the receiver. However, FR and RR are very useful in

    cases of developing coding and transcoding techniques. These metrics are

    used in order to judge the quality of the image where the original is already

    available.

    In the following sections, we are going to present a number of FR QA

    metrics that have been developed by researchers in the quality assessment

    field, along with the underlying definition of the perceptual quality.

3-1-3-1 Using DCT, DWT, and DFT

The authors in [5] examined the effect of decoupling inter-pixel dependencies by

    using transforms like Discrete Cosine Transform (DCT), Discrete Wavelet

    Transform (DWT) or Discrete Fourier Transform (DFT). Their study shows

that by transforming images to the frequency domain and then taking a simple pixel difference, the resulting performance surpasses that of complex quality measures.


3-1-3-2 Perceptual Distortion Metric (PDM)

    Figure 3-1 Block diagram of the Perceptual Distortion Metric (PDM)

    In [3], a generic model of HVS is used as an objective quality assessment

    metric. The block diagram of the metric is illustrated in Figure 3-1. The color

space conversion block relies on the fact that the HVS treats colors as nonlinear color differences rather than RGB values, i.e., white-black, red-green, and blue-yellow. The perceptual decomposition is a set of spatio-temporal filters that mimic the nonlinearity of the neuron responses in the HVS to different spatio-temporal patterns. Since the HVS sensitivity decreases at high spatial frequencies, the contrast gain control module is used to compensate for this behavior.


3-1-3-3 Structural Similarity

    Figure 3-2 Block diagram of the Structural Similarity

The argument behind this metric is that the human eye is tuned to detect structural error. Three types of error can be introduced into multimedia objects: variation of average local luminance, variation of contrast, and structural error. The first two do not contribute to the degradation of the perceived quality. Thus, by removing those two error types, we can calculate the structural error, which defines the amount of degradation in the image quality. The block diagram of the Structural Similarity (SSIM) metric is shown in Figure 3-2.

The definitions of those three types of error are as follows:

Luminance error:

\[ l(x,y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \]

Contrast error:

\[ c(x,y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \]

Structure error:

\[ s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3} \]

Where:

μ_x: mean of image X
μ_y: mean of image Y
σ_x: standard deviation of image X
σ_y: standard deviation of image Y
σ_xy: covariance between images X and Y
C_1, C_2, C_3: small constants that stabilize each division when the denominator approaches zero

Building on the above, the authors in [4], [6], and [7] presented the structural error as the cosine of the angle between the mean-removed original (x − μ_x) and distorted (y − μ_y) image vectors. This logic assumes that after the removal of the luminance and contrast errors, the remaining errors can be illustrated as a circle on which all errors have the same power but different angles, the angle defining the effect on the perceived quality.
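To make the three terms concrete, here is a plain-Python sketch of a single-window SSIM computation. The stabilizing constants C1 and C2 (with C3 = C2/2) follow the commonly used values K1 = 0.01 and K2 = 0.03 with a dynamic range of 255; note that the full metric applies this over sliding windows and averages the results:

```python
import math

def ssim_components(x, y, c1=6.5025, c2=58.5225):
    """Luminance, contrast, and structure terms for two pixel lists.

    c1 = (0.01 * 255)^2 and c2 = (0.03 * 255)^2 are the usual stabilizing
    constants; c3 = c2 / 2 is the common choice for the structure term."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / n          # variance of x
    vy = sum((p - my) ** 2 for p in y) / n          # variance of y
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    c3 = c2 / 2
    l = (2 * mx * my + c1) / (mx * mx + my * my + c1)
    c = (2 * sx * sy + c2) / (vx + vy + c2)
    s = (cov + c3) / (sx * sy + c3)
    return l, c, s

def ssim(x, y):
    """SSIM index for one window: product of the three terms."""
    l, c, s = ssim_components(x, y)
    return l * c * s
```

For identical inputs all three terms equal 1, so the index is exactly 1; any luminance, contrast, or structure mismatch pulls it below 1.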


Figure 3-3 Block Diagram of the Multi-scale Structural Similarity. L: low-pass filtering; 2: downsampling by 2

In [8], an improvement of the system showed that running the metric on downscaled images and combining the results is more effective in catching all the structural error in the image, and compensates for different viewing distances. A diagram of the Multi-scale SSIM is shown in Figure 3-3.
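A minimal sketch of the multi-scale idea follows: the image is repeatedly low-pass filtered and decimated by 2, a single-scale score is computed at each level, and the scores are combined as a weighted product. The weights are those reported for MS-SSIM in the literature; applying the full SSIM at every scale is a simplification, since the original scheme combines the luminance term only at the coarsest scale:

```python
def downsample2(img):
    """Low-pass (2x2 mean) filtering followed by decimation by 2."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]

def ms_ssim(x, y, ssim_fn, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333)):
    """Weighted product of per-scale similarity scores.

    ssim_fn is any single-scale similarity function over two 2-D images."""
    score = 1.0
    for w in weights:
        score *= ssim_fn(x, y) ** w
        x, y = downsample2(x), downsample2(y)  # move to the next, coarser scale
    return score
```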

3-1-3-4 Visual Information Fidelity and Natural Scene Statistics

    Figure 3-4 Conceptual diagram of the VIF

At the beginning of this discussion we argued that fidelity measures do not correlate well with the perceived quality. Nevertheless, the authors in [9-10] presented the concept of a fidelity measure, illustrated in Figure 3-4, that uses natural scene statistics to calculate the amount of information


conveyed correctly from the original image, through the distortion channel, to the observer.

Natural Scene Statistics (NSS) rely on the fact that natural scenes occupy a tiny subspace of all possible permutations of pixel values; consequently, it is easy to describe natural undistorted images with a small number of statistical features. Visual Information Fidelity (VIF) defines the perceived quality as the difference in mutual information between the input and output of the HVS for the no-distortion and distortion channels.

    3-2 Subjective Experiments

Subjective experiments [11] are required for the evaluation of Video Quality Metrics (VQMs). In these experiments, human subjects are requested to review, evaluate, and assess the quality of the images in the database. The subjects are normally screened for visual acuity and color blindness, to make sure the quality scores accurately describe the perceived quality of each image. Moreover, each viewing session should last less than 30 minutes to reduce the effect of fatigue on the observers.

The output of these experiments is the Differential Mean Opinion Score (DMOS) of each image in the database. Those DMOS values serve as a benchmark for perceived quality and are compared with the output values of the objective models when they are evaluated. Generally, the significance of the evaluation is affected by the size of the database and the different error types it contains.


There are a number of internationally accepted test methods to perform subjective experiments. They are illustrated in Figure 3-5, and the following is a description of each scheme:

3-2-1 Double Stimulus Impairment Scale (DSIS)

Human subjects review reference/test image sets, then rate the images on a discrete scale ranging over: imperceptible, perceptible, slightly annoying, annoying, and very annoying.

3-2-2 Double Stimulus Continuous Quality Scale (DSCQS)

In this test method, subjects are blind as to which image is the reference. Each reference/test set is viewed twice. The rating of the images is scored on two scales: continuous and discrete.

3-2-3 Single Stimulus Continuous Quality Scale (SSCQS)

This method differs from DSCQS in the number of viewings of the reference/test sets. Therefore, it is used for longer sequences (several minutes), whereas the DSCQS is only suitable for sequences of about 20-30 seconds. Furthermore, the SSCQS resembles real viewing conditions more closely than the DSCQS.


Figure 3-5 Subjective Experiments: Viewing Modes (on the left), Score Scale (on the right). (A) Double Stimulus Impairment Scale (DSIS) (B) Double Stimulus Continuous Quality Scale (DSCQS) (C) Single Stimulus Continuous Quality Scale (SSCQS)

    3-3 VQEG

The Video Quality Experts Group (VQEG) was formed in 1997. Its main objective is to validate and standardize objective quality assessment models. Moreover, the group works toward the standardization of performance metrics for validating the objective models. So far, the VQEG has completed two sets of tests.

- Phase I (1998): The subjective experiment used DSCQS. Nine objective quality assessment models were evaluated. This test showed that 8 out of 9 models gave results that are indistinguishable from PSNR.


at least 20-29 human observers. A single stimulus method was used. The database was rated in 7 separate viewing sessions.

The fact that the images were reviewed in more than one session led to a scale mismatch in the scores given to those images. Therefore, an extra round of review was performed using a double stimulus methodology and 50 randomly selected images.

3-4-3 Realignment Process

The raw scores for each subject were converted to difference scores (between the test and the reference), then to Z-scores, and then scaled and shifted to the full range (1 to 100). Finally, a Difference Mean Opinion Score (DMOS) value for each distorted image was computed.

For a single image, a score is considered an outlier if it lies outside a certain interval, defined by the standard deviation, around the mean score for that image. Such a point is removed from the DMOS calculation for that image.

A subject is rejected if his or her number of outliers exceeds a specific accepted rate; in that case, all the ratings given by that subject are excluded from the final dataset.
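The realignment pipeline described above can be sketched as follows. This is a simplified illustration only: the exact outlier interval and the score handling used in the original experiments are described in the source, and the function and variable names below are assumptions:

```python
import statistics

def realign(raw_scores_per_subject):
    """Per-subject Z-scores of the difference scores, rescaled to 1-100.

    raw_scores_per_subject: list of lists, one inner list per subject,
    each holding that subject's difference scores (test minus reference)."""
    z = []
    for scores in raw_scores_per_subject:
        mu = statistics.mean(scores)
        sd = statistics.pstdev(scores) or 1.0  # guard against all-equal scores
        z.append([(s - mu) / sd for s in scores])
    # shift/scale every score to the full 1-100 range used in the thesis
    flat = [v for row in z for v in row]
    lo, hi = min(flat), max(flat)
    return [[1 + 99 * (v - lo) / (hi - lo) for v in row] for row in z]
```

The per-image DMOS would then be the mean of these rescaled scores across subjects, after outlier removal.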

3-4-4 Datasets

The database of images is accompanied by a number of datasets that define the benchmark values of the perceived quality for each of the 982 images in the database.


dmos.mat: contains two arrays of length 982 each: DMOS and orgs.

    o orgs(i)==0 for distorted images, and orgs(i)==1 for reference images.
    o DMOS(1:227): JP2K, DMOS(228:460): JPEG, DMOS(461:634): White Noise, DMOS(635:808): Gaussian Blur, DMOS(809:982): Fast Fading.
    o The DMOS values corresponding to orgs==1 are zero (they are reference images).

refnames_all.mat: contains a cell array refnames_all.

    o refnames_all{i} is the name of the reference image for image i, whose DMOS value is given by DMOS(i).
    o If orgs(i)==0, then this is a valid DMOS entry. Else, if orgs(i)==1, then image i denotes a copy of the reference image.

DMOS_realigned.mat: DMOS values after realignment.
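Assuming the DMOS, orgs, and refnames_all arrays have been loaded (for example with scipy.io.loadmat in Python), a hypothetical helper for pulling out only the valid DMOS entries could look like this:

```python
def distorted_entries(dmos, orgs, refnames_all):
    """Pair each distorted image with its DMOS value and reference name.

    orgs[i] == 0 marks a distorted image (a valid DMOS entry);
    orgs[i] == 1 marks a copy of a reference image (DMOS fixed at 0)."""
    return [(i, dmos[i], refnames_all[i])
            for i in range(len(dmos)) if orgs[i] == 0]
```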

    3-5 H.264 Review

    Throughout this study, the H.264 standard was used as the main compression

    technique for encoding and transcoding all test sequences. In this section we

    are going to review this standard and demonstrate its new features.

H.264 is the newest in its series, known as International Standard 14496-10 or MPEG-4 Part 10 Advanced Video Coding of ISO/IEC. The standard was finalized in March 2003 and approved by the ITU-T in May 2003 [14-16].

The encoder/decoder configuration is separated into two stages: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). Figure 3-6 shows the arrangement of both layers.


Figure 3-6 (A) Video Coding Layer (VCL) and Network Abstraction Layer (NAL) arrangement. (B) NAL unit

The VCL is responsible for efficient coding of the video frames and for delivering the coded information to be formatted by the NAL. The main aim of the NAL is to arrange all of the coded information in a way that can be comprehended by the receiver. All the information is sent in what are known as NAL units; these units act as packets that can be handled separately by the transport layer for transmission or storage in a file. Each NAL unit consists of a NAL header, which specifies the sequencing of the information within the unit, and the payload data.

The H.264 coding standard falls into the category of block-based motion-compensated video compression. Figure 3-7 and Figure 3-8 show the detailed block diagrams of the encoder and decoder.


    Figure 3-7 Block diagram of H.264 Encoder

    Figure 3-8 Block diagram of the H.264 Decoder

The term slice refers to a set of macroblocks in raster order that are to be coded with the same type, i.e., I, P, B, SI, or SP. A macroblock is defined as an area of 16 × 16 pixels; it is the main building block on which the processing occurs.

The slice type is defined by the type of coding applied to the macroblocks contained in the slice. The different slice types are as follows:

- I (Intra) slice: macroblocks are coded through prediction from macroblocks in the same frame.
- P (Predicted) slice: macroblocks are coded with reference to previously coded frames.
- B (Bi-directionally predicted) slice: macroblocks use both previous and next frames.
- SI and SP (Switching) slices: used to switch between different substreams.

The processing in the macroblock layer is divided into two categories: intra and inter coding. In intra coding, a macroblock is predicted using only spatial information, i.e., macroblocks from the same frame. In inter coding, however, the prediction relies on temporal dependencies. This is done by copying an area from previously coded frames and assigning it to the currently encoded macroblock. The encoder then sends the reference frame indices, the error signal between the predicted and the current macroblock, and the motion information. Full motion vectors, however, are not sent to the receiver; only a displacement motion vector is sent to adjust the values predicted by the receiver. This relies on the fact that motion vector prediction in the encoder and decoder is identical: motion vectors are predicted from the surrounding macroblocks, and then a compensation MV is sent to the receiver to correct the value.


The motion prediction in H.264 supports half- and quarter-pixel accuracy. The intensity values for fractional pixels are determined by means of interpolation:

- Luma half pixel: 6-tap FIR filter.
- Luma quarter pixel: averaging of half- and integer-position pixels.
- Chroma: all fractional pixels are computed through averaging.
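As an illustration, the luma interpolation rules can be sketched in integer arithmetic. The 6-tap filter with coefficients (1, -5, 20, 20, -5, 1)/32 is the one specified by the standard for half-sample positions; the rounding offsets below follow the usual fixed-point form:

```python
def clip255(v):
    """Clamp an interpolated value to the 8-bit sample range."""
    return max(0, min(255, v))

def half_pel(e, f, g, h, i, j):
    """H.264 luma half-sample: 6-tap FIR (1,-5,20,20,-5,1)/32 applied to six
    neighbouring integer-position samples, with rounding and clipping."""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)

def quarter_pel(a, b):
    """Luma quarter-sample: rounded average of two neighbouring samples
    (integer and/or half-sample positions)."""
    return (a + b + 1) >> 1
```

On a flat region (all samples equal) the filter reproduces the input value, as expected of an interpolator.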

The following is a list of differences between H.264 and earlier standards:

- H.264 includes a deblocking filter.
- H.264 allows for multiple reference frames.
- H.264 introduces spatial prediction in intra frames.
- H.264 uses a 4×4 integer transform instead of the former 8×8 DCT.

    The standard defines a set of profiles in which H.264 can operate: baseline,

    main, and extended. Each profile defines accepted syntax and tools to be

used. The profiles are shown in Figure 3-9. In this study we have used the Baseline profile.


    Figure 3-9 H.264 profiles

H.264 is the most efficient coding algorithm with respect to bit rate reduction, yet the most complex among its peers. In [17], the authors performed a number of tests to analyze the complexity-distortion relationship within H.264. They found that P frames are more efficient with respect to distortion and complexity, but require a higher bit rate than sequences containing B frames. The authors in [18] show that the processing time of H.264 is dominated by the deblocking filter (49.01%) and fractional pixel interpolation (19.98%).

    3-6 Multimedia Transcoding

The research in multimedia transcoding is categorized into the following:

- Transcoding techniques: the design of transcoding techniques that adapt the video stream to fit fewer resources.
- Transcoder analysis: the analysis of resource utilization in transcoders and its optimization schemes.


    Control Schemes: to control the selection process of transcoding

    techniques along with amount of transcoding done by each of them.

Although there is now a large number of studies on the design of transcoding techniques, the lack of a policy module that supports transcoder implementation has left those designs largely unused. In the following section, we review the first category to familiarize the reader with baseline knowledge about transcoding. The rest of the section provides a review of control schemes.

3-6-1 Transcoding Techniques

Transcoding has different types based on the kind of change induced in the bitstream [19]:

- Homogeneous: the modification of one or more resources required by the bitstream. The different types of resources are demonstrated in Figure 3-10.
- Heterogeneous: the change of the bitstream syntax from one standard coding scheme to another.
- Error resilience: the injection of some bits to increase the bitstream's robustness to error.


    Figure 3-10 Homogeneous transcoding

Another categorization of transcoding techniques is from the implementation point of view. The simplest implementation is the back-to-back encoder/decoder configuration, also known as the cascaded pixel domain transcoder (CPDT); it is the simplest yet most time-consuming transcoder implementation. In Figure 3-11, we demonstrate that the deeper we go into the structure of the bitstream, the higher the transcoding quality we get, at the cost of transcoder complexity.


    Figure 3-11 Transcoder Implementation

3-6-2 Control Schemes

In [20], the authors proposed a utility model based on the maximization of utility under a certain amount of resources. The system supported neither dynamic nor online transcoding. Three profiles were defined for each multimedia object, namely: gold, silver, and bronze.

    Figure 3-12 Utility Model

Another argument was made by the authors in [21]: offline transcoded objects can be arranged in what is called an info-pyramid. The info-pyramid is by definition a progressive data representation scheme. Objects stored in the info-pyramid have different resolutions and abstraction levels:


    Fidelity: is spatial and temporal resolution using lossy compression

    technique.

    Modality: can be the selection of either: key frame images, audiotrack, and closed captions.

When the customization and selection module receives a client request, it assigns the object that best fits the request and sends it back to the user. The architecture of the system is illustrated in Figure 3-13.

    Figure 3-13 Info-pyramid based control scheme

    On the other hand, authors in [22] proposed a model with three dimensions:

    Device Modality: Display, audio, memory, CPU, and color


    Network conditions: Bandwidth, Latency, and BER

    User preferences

The dimensions and the overall system architecture are illustrated in Figure 3-14 and Figure 3-15, respectively.

For each dimension, a number of classes were defined, and offline transcoding of the multimedia objects was performed. Storage and mapping of the different bitstreams is done using the MPEG-7 standard. When a user's request is received, the system chooses the most appropriate class from a matrix of classes and sends it to the user.

    Figure 3-14 Three dimensional view


    Figure 3-15 System overview

Another type of control scheme was proposed in [23]. The system operates in real time and uses single-dimensional transcoding to fit videos to the available bit rate. A buffer-based control scheme was used: the system utilizes the relation between delay, buffer occupancy, and bit rate. Two types of transcoding were used: re-quantization and frame dropping. The number of bits required to encode a frame is estimated using information gathered from previously encoded frames.

A control scheme can also be simplified to serve as a solution for a specific application. In [24], the authors proposed a control scheme for a map viewing application. The scheme is user-centric, where information about the type of usage is important in defining the amount of detail to be sent to the user. For example, a hiker would require finer details than a car driver.


    Figure 3-16 Adaptation, Resource, Utility spaces

The curves for these three spaces cannot be developed from a single video sequence, since each video sequence can react differently to adaptation processes. The authors developed a system for the generation of utility functions by extracting a set of features from video sequences. Those features are then used to cluster the sequences into a number of predefined clusters that are expected to behave in the same way with respect to the different adaptation processes. Those clusters are defined through the analysis of a set of test sequences.


Chapter 4

Quality Assessment


    4-1 Introduction

Our work in objective quality assessment was mainly driven by the need for an objective model to be used in the policy module of the transcoding engine. This FR QA model should possess the following properties in order to replace the need for subjective experiments:

- High correlation to the output of subjective experiments.
- Consistency in its reaction to different types of visual error and image content.
- Inexpensiveness with respect to time consumption.

These features are crucial if the metric is to be practically used in place of human observers. Research in quality assessment has revealed different perspectives on perceptual error. Although these definitions of perceptual error made use of high-level image features, none of them


    have reached the optimal criteria for providing the metric features described

    above.

In [28], the authors studied 10 state-of-the-art FR QA metrics. This extensive evaluation shows that most of these metrics produce results worse than or indistinguishable from PSNR. Although these metrics are based on high-level visual features, they did not correlate well with the subjective data.

    In this chapter, we are going to present our work on the formulation of an

    objective metric that would comply with the above criteria, along with the

    logic behind its design.

    4-2 Proposed Metric

Studies examining how the HVS treats the received visual information found that the HVS does not treat images as luminance values but as contrast differences. Moreover, this contrast-based response varies with the viewing distance. This led to the use of a contrast sensitivity function, after the decomposition of the image into spatial and temporal bands, in HVS-based metrics.

The metric presented here builds on this fact. If the change in contrast values is distributed evenly over the entire image, the HVS will not perceive this type of error, since the relations between the contrast values are maintained. Conversely, a contrast change due to a distortion with a large standard deviation would modify the contrast relations in the image.

    The proposed algorithm for calculating the Contrast Error Distribution

    (CED) metric is as follows [29]:


Calculate the metric as defined in [29].

    Figure 4-1 Block Diagram of the Contrast Error Distribution (CED)

    4-3 Metric Evaluation Process

The metric evaluation process is not just a simple measurement of how much resemblance there is between the DMOS values and the Video Quality Ratings (VQRs). A number of performance metrics are applied to the VQRs to confirm whether the metric gives good results regardless of error type, image content, or the amount of quality degradation.

In short, all of the above complies with a single definition: generalizability. VQEG defines it as: "the ability of a model to perform reliably over a very broad set of video content. This is obviously a critical selection factor given the very wide variety of content found in real applications. There is no specific metric that is specific to generalizability so this objective testing procedure requires the selection of as broad a set of representative test sequences as is possible." [12]


As stated above, to achieve this generalizability, we have to perform VQM tests over a wide range of images and use performance tests that describe every aspect of generalizability. For this reason, the VQEG standardized the evaluation domains for VQMs as follows:

- Prediction Accuracy: the ability to predict the subjective quality ratings with low error.
- Prediction Monotonicity: the degree to which the model's predictions agree with the relative magnitudes of the subjective quality ratings.
- Prediction Consistency: the degree to which the model maintains prediction accuracy over the range of video test sequences, i.e., its response is robust to a variety of video impairments.

4-3-1 Subjective Data Rescaling

DMOS values after realignment might take invalid values, for example negative values. Therefore, a linear scaling is required to map the values to the range from 0 to 1, with zero being the worst perceived quality. The scaling function is as follows:

\[ Z = \frac{\text{Raw Difference Score} - \text{Minimum Value}}{\text{Maximum Value} - \text{Minimum Value}} \]

4-3-2 Nonlinear Regression

The relation between DMOS and VQRs is not linear. Therefore, the direct application of performance metrics to the VQM output would lead to inaccurate results. This nonlinearity is due to the fact that subjective test results tend to


be compressed at the extremes of the test range. Consequently, a nonlinear regression process is required to compensate for this.

We have used a 5-parameter logistic regression function as follows:

\[ \mathrm{DMOS}_p = \beta_1\,\mathrm{logistic}\!\left(\beta_2,\,(\mathrm{VQR}-\beta_3)\right) + \beta_4\,\mathrm{VQR} + \beta_5, \qquad \mathrm{logistic}(\tau,x) = \frac{1}{2} - \frac{1}{1+e^{\tau x}} \quad [28] \]

The nonlinear regression converts the VQRs into DMOS_p (predicted) values that can then be compared to the DMOS (subjective) values.
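For illustration, the fitting function can be written out directly; the parameters β1 through β5 would then be obtained with any nonlinear least-squares routine (for example scipy.optimize.curve_fit) against the subjective DMOS values. This is a sketch consistent with the form above, not the thesis' exact code:

```python
import math

def logistic5(vqr, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping from a raw VQR to a predicted DMOS.

    The logistic term 0.5 - 1/(1 + exp(b2*(vqr - b3))) vanishes at vqr == b3;
    b4 and b5 add a linear component and an offset."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (vqr - b3)))) + b4 * vqr + b5
```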

4-3-3 Prediction Accuracy

Pearson linear correlation coefficient:

\[ \mathrm{CC}^2 = \frac{\sigma_{xy}^2}{\sigma_x^2\,\sigma_y^2} \]

Where σ_xy, σ_x², and σ_y² are defined as follows:

\[ \sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N} x_i y_i - \bar{x}\bar{y}, \qquad \sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2, \qquad \sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N} y_i^2 - \bar{y}^2 \]
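The coefficient can be computed directly from these moment definitions (plain-Python sketch over two equally long score lists):

```python
import math

def pearson_cc(x, y):
    """Pearson linear correlation between predicted and subjective scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) / n - mx * my   # covariance
    sx2 = sum(a * a for a in x) / n - mx * mx              # variance of x
    sy2 = sum(b * b for b in y) / n - my * my              # variance of y
    return sxy / math.sqrt(sx2 * sy2)
```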

4-3-4 Prediction Monotonicity

The Spearman rank order correlation coefficient is a measure of monotonic association that is used when the distribution of the data makes the Pearson correlation coefficient undesirable or misleading.


\[ \rho = 1 - \frac{6\sum_{i=1}^{N} d_i^2}{N(N^2-1)} \]

Where d_i is the difference between the ranks assigned to the i-th data point in the two score sets.
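With the ranks computed explicitly, the coefficient follows directly from this formula (the d_i shortcut assumes no tied scores):

```python
def spearman_rocc(x, y):
    """Spearman rank-order correlation via the rank-difference shortcut."""
    def ranks(v):
        # rank 1 for the smallest value, rank n for the largest
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```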

4-3-5 Prediction Consistency

Outlier Ratio:

\[ \mathrm{OR} = \frac{N_o}{N} \]

Where:

N_o is the number of outlier points
N is the total number of data points

A point i, with Q_error[i] = DMOS[i] − DMOS_p[i] for 1 ≤ i ≤ N, is considered an outlier if the following condition is satisfied:

\[ \left| Q_{error}[i] \right| > 2\,\sigma_{\mathrm{DMOS}[i]} \]

where σ_DMOS[i] is the standard deviation of the subjective scores for image i.

The root mean square error (RMSE) of Q_error is also considered a metric for consistency.
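Both consistency measures can be sketched together. The per-image standard deviations dmos_std would come from the subjective experiment; the function and parameter names here are assumptions:

```python
import math

def consistency(dmos, dmos_p, dmos_std):
    """Outlier ratio and RMSE of the prediction errors.

    A point counts as an outlier when its absolute error exceeds twice the
    standard deviation of the subjective scores for that image."""
    err = [a - b for a, b in zip(dmos, dmos_p)]
    outliers = sum(1 for e, s in zip(err, dmos_std) if abs(e) > 2 * s)
    rmse = math.sqrt(sum(e * e for e in err) / len(err))
    return outliers / len(err), rmse
```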

    4-4 Results

In the evaluation cycle, we have chosen 6 FR QA metrics to be compared:

    Peak Signal to Noise Ratio (PSNR)

    Structural Similarity (SSIM) [31]

    Visual Information Fidelity (Log(VIF)) [32]

Pixel-Domain Visual Information Fidelity (VIF-PD): a less complex implementation of the VIF [33]


    Contrast Error Distribution (CED) [Proposed]

    Contrast Error Distribution (Log(CED)) [Proposed]

4-4-1 Overall Performance

The overall performance was measured by computing the Pearson correlation coefficient, the Spearman rank correlation coefficient, and the root mean square error of the 6 quality assessment metrics mentioned above. The results, shown in Table 1, demonstrate that CED gives results similar to more sophisticated metrics such as the VIF.

4-4-2 Cross-Distortion Performance

The results shown in Table 2 through Table 4 are the detailed values of the above performance metrics for each distortion domain. The tables show that CED's output is consistent across all distortion domains, whereas the other metrics perform worse in the Fast Fading domain.


Table 1 Comparison between PSNR, SSIM, CED, PD-VIF, Log(CED), and Log(VIF) with respect to CC: Pearson Correlation Coefficient, SROCC: Spearman Rank Correlation Coefficient, RMSE: Root Mean Square Error

            PSNR      SSIM      CED        PD-VIF    Log(CED)    Log(VIF)
                                (Proposed)           (Proposed)
    CC      0.8700    0.8959    0.9369     0.9326    0.9525      0.9544
    SROCC   0.8755    0.9075    0.9550     0.9471    0.9550      0.9637
    RMSE    13.4713   12.1396   9.9549     9.8798    8.3168      8.1708

Table 2 Pearson Correlation Coefficient of SSIM, CED, PD-VIF, Log(CED), and Log(VIF), calculated for the distortion domains JPEG2000, JPEG, White Noise, Gaussian Blur, and Fast Fading

                            JP2K      JPEG      WN        GBlur     FF
    SSIM                    0.9311    0.9436    0.9693    0.8622    0.9271
    CED (Proposed)          0.9561    0.9688    0.9325    0.9368    0.9466
    PD-VIF                  0.9702    0.9749    0.9717    0.9538    0.8698
    Log(CED) (Proposed)     0.9598    0.9738    0.9716    0.9696    0.9635
    Log(VIF)                0.9744    0.9688    0.9804    0.9707    0.9490

Table 3 Spearman Rank Correlation Coefficient of SSIM, CED, PD-VIF, Log(CED), and Log(VIF), calculated for the distortion domains JPEG2000, JPEG, White Noise, Gaussian Blur, and Fast Fading

                            JP2K      JPEG      WN        GBlur     FF
    SSIM                    0.9331    0.9389    0.9684    0.8827    0.9380
    CED (Proposed)          0.9545    0.9712    0.9719    0.9699    0.9658
    PD-VIF                  0.9717    0.9840    0.9872    0.9695    0.8675
    Log(CED) (Proposed)     0.9545    0.9712    0.9719    0.9699    0.9658
    Log(VIF)                0.9698    0.9600    0.9856    0.9734    0.9658


Table 4 Root Mean Square Error of SSIM, CED, PD-VIF, Log(CED), and Log(VIF), calculated for the distortion domains JPEG2000, JPEG, White Noise, Gaussian Blur, and Fast Fading

                            JP2K      JPEG      WN        GBlur     FF
    SSIM                    9.2222    10.5526   6.8789    9.3565    10.6995
    CED (Proposed)          7.6804    8.2344    10.4274   6.8455    9.6306
    PD-VIF                  6.1433    7.1296    6.6276    5.5593    14.0610
    Log(CED) (Proposed)     7.0897    7.2565    6.6182    4.5263    7.6321
    Log(VIF)                5.6908    7.8561    5.5314    4.4474    9.0253

4-4-3 Complexity Performance

VQEG has not yet standardized a complexity measure for VQMs. However, the complexity of the metrics was evaluated on a Pentium M 1.86 GHz laptop, using the time consumed in calculating each quality metric for all the JPEG2000-distorted images (227 images). The complexity measure is shown in Table 5.

From the results, it can be seen that CED provides a good tradeoff between performance and complexity: it operates in about 1.5 seconds per image, whereas metrics with comparable results operate in about 12 seconds per image.

Table 5 Evaluation of the Quality Metrics

                               MSSIM        CED (Proposed)   PD-VIF       VIF
    Total time (227 images)    224.11 sec   310.91 sec       498.26 sec   2768.4 sec
    Time per average image     0.99 sec     1.37 sec         2.2 sec      12.2 sec
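The per-image figures above can be reproduced with a simple timing harness of the following shape; `dummy_metric` and the image pairs below are placeholders standing in for an actual metric implementation and the 227 test images:

```python
import time

def time_metric(metric, image_pairs):
    """Average wall-clock seconds per image for a quality metric,
    measured as in Table 5: total elapsed time / number of images."""
    start = time.perf_counter()
    for ref, dist in image_pairs:
        metric(ref, dist)
    total = time.perf_counter() - start
    return total, total / len(image_pairs)

# Stand-in metric and data (the thesis timed MSSIM, CED, PD-VIF and VIF
# over the 227 JPEG2000-distorted images)
dummy_metric = lambda ref, dist: sum(abs(a - b) for a, b in zip(ref, dist))
pairs = [([1, 2, 3], [1, 2, 4])] * 10
total, per_image = time_metric(dummy_metric, pairs)
print(per_image)
```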

4-4-4 Logistic Regression Performance

Figure 4-2 shows the scatter plots of the VQM outputs against the DMOS values, along with the logistic regression fit of the data. The plot for CED shows that the VQR points are distributed evenly across the perceived quality range.

Figure 4-3 shows the scatter plots of DMOS against the predicted DMOS values. These scatter plots reveal outlier points. For a metric to perform well, the scatter points should lie near the diagonal of the graph. Moreover, the points should be distributed evenly across the range of the perceived quality.
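The mapping from raw VQR scores to predicted DMOS can be sketched as follows. The 4-parameter logistic below is the form commonly used in VQEG-style evaluations and is an assumption about the exact function fitted in this chapter; the data are synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, b1, b2, b3, b4):
    """4-parameter logistic commonly used in VQEG-style evaluations
    to map raw VQR scores onto the DMOS scale (assumed form)."""
    return b2 + (b1 - b2) / (1.0 + np.exp(-(x - b3) / abs(b4)))

# Synthetic VQR/DMOS pairs following a known logistic (illustration only)
rng = np.random.default_rng(0)
vqr = np.linspace(0.0, 1.0, 50)
dmos = logistic(vqr, 90.0, 10.0, 0.5, 0.1) + rng.normal(0.0, 1.0, 50)

# Fit the logistic, then report the residual RMSE of the fit
p0 = [max(dmos), min(dmos), 0.5, 0.1]
params, _ = curve_fit(logistic, vqr, dmos, p0=p0, maxfev=10000)
predicted = logistic(vqr, *params)
rmse = float(np.sqrt(np.mean((predicted - dmos) ** 2)))
print(rmse)
```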

It can be seen from Figure 4-3 that the metrics have two empty spots, one near the origin and the other at the far side of the graph, as highlighted in red. The empty spot near the origin means that the zero point is translated to a different value in the predicted DMOS. The graph for the CED shows that the empty spots have shrunk significantly, and therefore the response of the CED is improved for error figures located in those areas of the graph.

Figure 4-4 shows the calibration curves of the five distortion domains from the database used in the experiment. Evaluating the stability of VQM performance across different types of distortion requires that the calibration curves be indistinguishable. In the figure, we can see that the calibration curves do not overlie one another; however, they are adjacent. The points of intersection highlight the amount of error at which the metric reacts to different types of error indifferently; elsewhere, the metric is more or less sensitive to certain types of error.

Figure 4-2 Scatter plots of VQM outputs against DMOS values with logistic regression fits (figure spans several pages)

Figure 4-3 Scatter plots of predicted DMOS (VQRs after logistic regression) against DMOS values, calculated for six VQMs: PSNR, SSIM, VIF, PD-VIF, CED, and Log(CED). (Figure spans several pages; the per-panel RMSE annotations are 13.4713, 12.1396, 9.8798, 8.1708, 9.9549, and 8.3168.)

Figure 4-4 Calibration curves for each error domain: JPEG2000 (green), JPEG (red), White Noise (blue), Gaussian Blur (magenta), Fast Fading (cyan), and all error domains (black), calculated for six VQMs: PSNR, SSIM, VIF, PD-VIF, CED, and Log(CED). (Figure spans several pages.)

Chapter 5

Data Analysis

    5-1 Introduction

Nowadays, a large number of video transcoding schemes exist. These schemes convert a pre-encoded video bitstream into another one with a lower bit rate or complexity, and consequently lower quality.

Currently, the main problem in video adaptation is the management of the process itself. More specifically, the problem lies in how to determine the following:

The transcoding scheme to be used.

The amount of transcoding.

The problem stems from the fact that not all video sequences react in the same way to transcoding processes. The same amount of transcoding can result in different amounts of resource reduction for different video sequences, owing to the varied complexity of video content.

    5-2 Offline Data Analysis Model

The authors in [34] put together a systematic procedure for designing video adaptation technologies; the steps are as follows:

1. Identify the adequate entities for adaptation, e.g., frame, shot, sequence of shots, etc.

2. Identify the feasible adaptation operators, e.g., de-quantization, frame dropping, coefficient dropping, etc.

3. Develop models for measuring and estimating the resource and utility values associated with video entities undergoing the identified operators.

4. Given user preferences and constraints on resources or utility, develop strategies to find the optimal adaptation operator(s) satisfying the constraints.

Figure 5-1 shows a conceptual diagram of the three-stage process of the transcoder: offline data analysis, policy module, and transcoding engine. The work in this thesis focuses mainly on the offline data analysis module. The policy module decides which transcoding algorithm is to be used and how much transcoding is needed. This is done by extracting features from the pre-encoded videos and mapping them to a certain class. Each of the classes defined in the policy module contains information about the resource-transcoding relations. Those classes are created in the offline data analysis stage.

The main aim of the offline data analysis stage is to define the main classes of multimedia objects. Each class has its own resource-transcoding-quality graph, which contributes to the policy module decision.

    Figure 5-1 Block diagram of Multimedia Middleware

    The presented study relies mainly on the idea of finding key features that

    would characterize the differences between video sequences. Those video

    sequences usually reach the transcoding server in a pre-encoded form.

Transcoding servers should distinguish the class of a sequence using only the information present in the coded domain.

    5-3 H.264 Setup

The C++ reference implementation of the H.264 video coding algorithm [35], version JM 13.0, was used. The baseline profile was chosen for encoding the test sequences.

    This profile contains the following features:

I slices: intra coding; only spatial prediction is allowed.

P slices: inter coding with forward temporal prediction.

CAVLC: Context-Adaptive Variable Length Coding.

Configuration parameters for the coding algorithm:

Baseline profile

QP = 28

IPPP coding structure
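A minimal excerpt of a JM encoder configuration matching this setup might look as follows; the parameter names follow a typical JM `encoder.cfg` and are an assumption, since they can differ between JM versions:

```
# encoder.cfg excerpt (illustrative; names may differ in JM 13.0)
ProfileIDC           = 66   # 66 = Baseline profile
IntraPeriod          = 0    # only the first frame is intra (IPPP)
QPISlice             = 28   # quantization parameter for I slices
QPPSlice             = 28   # quantization parameter for P slices
SymbolMode           = 0    # 0 = CAVLC, 1 = CABAC
```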

    5-4 Test Sequences

The test video sequences used in this study are presented in [36]. These video sequences are single-shot video segments. Therefore, each video sequence is encoded with the first frame as an I-frame and the remaining frames as P-frames. The content complexity of each video sequence is described in Figure 5-2.

    5-5 Features

By classifying videos based on their content, the video bitstreams can be grouped based on their behavior within the transcoding engine. This classification depends mainly on features extracted from the video sequences. A number of studies on transcoding control schemes have adopted the idea of classifying video content based on its complexity. However, the choice of features has been the main point of debate in this concept. In this chapter, the proposed feature analysis is presented. This analysis was performed on most of the features used in the available literature [37-40]. The study conducted in this

thesis concluded that many of these features convey the same information, and some of them can be omitted from the proposed model.

    Figure 5-2 Test Sequences Description

5-5-1 Feature Definitions

All of the feature definitions described in this section are calculated on a per-frame basis. To obtain a single value for each sequence, the average over all frames was computed. For the source domain features only, the averaged values were also compared against the first-frame (I-frame) values.

5-5-1-1 Source Domain Features

Variance: average variance of the luminance pixels

Pelact: standard deviation of the luminance pixels

Pelspread: standard deviation of Pelact

Edgeact: magnitude of the pixel gradient

Edgespread: standard deviation of Edgeact
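The pixel-domain features above can be sketched for a single luminance frame as follows. The block-based computation of Pelspread is an assumption about the exact definition (the thesis averages all of these over the frames of a sequence):

```python
import numpy as np

def source_features(frame, block=16):
    """Pixel-domain features of one luminance frame (a sketch; the
    block-based spread measure is an assumption about the definition)."""
    f = frame.astype(float)
    h, w = f.shape
    # Per-block standard deviation of luminance (Pelact per block)
    blocks = [f[r:r + block, c:c + block]
              for r in range(0, h - block + 1, block)
              for c in range(0, w - block + 1, block)]
    pelact_blocks = np.array([b.std() for b in blocks])
    gy, gx = np.gradient(f)                 # pixel gradient components
    edge_mag = np.hypot(gx, gy)             # gradient magnitude
    return {
        "variance": f.var(),                # variance of luminance pixels
        "pelact": f.std(),                  # std of luminance pixels
        "pelspread": pelact_blocks.std(),   # spread of per-block activity
        "edgeact": edge_mag.mean(),         # mean gradient magnitude
        "edgespread": edge_mag.std(),       # spread of gradient magnitude
    }

frame = np.tile(np.arange(64, dtype=np.uint8), (64, 1))  # simple ramp image
feats = source_features(frame)
print(feats["edgeact"])
```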

5-5-1-2 Resources Required

bitcount: bitcount for coding each macroblock, accumulated over the whole frame

bitcount Y: bitcount used for coding only the Y component of the frame

ME time: time consumed in motion estimation

SNR Y: signal-to-noise ratio calculated on the Y frame

SNR U: signal-to-noise ratio calculated on the U frame

SNR V: signal-to-noise ratio calculated on the V frame

Time: total time consumed in coding

5-5-1-3 Coded Domain Features

MV magn: motion vector magnitude (calculated only for non-static macroblocks)

MV magn var: motion vector magnitude variance (calculated only for non-static macroblocks)

sub MV: percentage of MVs that require subpixel interpolation (either half-pixel or quarter-pixel)

non zero MV: percentage of non-static macroblocks

ave energy I: average energy of the AC coefficients in I-frames

ave energy P: average energy of the AC coefficients in P-frames

MV accel: motion vector acceleration

MV dir: motion vector change of direction

5-5-2 Analysis and Selection

Using principal component analysis (PCA) [41-42] would only change the axes onto which the features are projected, selecting the directions of highest variance. Therefore, PCA by itself is not suitable, as the main purpose is to omit some features and to inspect whether the source video features are important for differentiating between the video sequences.

Principal Feature Analysis [43] provides a way to do this: by clustering the features in the high-variance axes and finding the most dominant feature groups, only one feature from each dominant group needs to be chosen.

First, this algorithm was applied to each of the three feature domains separately.
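The selection step can be sketched as follows; this is a simplified reading of Principal Feature Analysis [43] (details such as the exact loading weights and clustering settings are assumptions), producing the cluster index and distance-from-center values reported in Tables 6 through 9:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def principal_feature_analysis(X, n_clusters, retained=0.99):
    """Sketch of Principal Feature Analysis: project the feature loadings
    into the leading PCA subspace, cluster them with k-means, and keep
    the feature closest to each cluster center."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize features
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigvals)[::-1]                # descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep enough components to retain the requested variability
    q = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), retained)) + 1
    loadings = eigvecs[:, :q]                        # one row per feature
    np.random.seed(1)                                # deterministic k-means init
    centers, labels = kmeans2(loadings, n_clusters, minit='++')
    selected = []
    for k in range(n_clusters):
        members = np.where(labels == k)[0]
        if members.size:
            dist = np.linalg.norm(loadings[members] - centers[k], axis=1)
            selected.append(int(members[np.argmin(dist)]))
    return sorted(selected)

# Toy data: features 0 and 1 are near-duplicates, feature 2 is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), rng.normal(size=200)])
sel = principal_feature_analysis(X, n_clusters=2)
print(sel)
```

One of the two redundant features is dropped, while the independent feature is kept, mirroring how duplicate-information features are omitted in the tables below.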

indistinguishable. The three source features to be selected are Ave Variance, Pelspread, and Edgeact. The retained variability is equal to 99.3974 %.

In Table 7, the trial on the resource features is presented. This analysis demonstrates that ME time can be used instead of encoding time without any loss of information, and that the SNR can be calculated on any of the frame components (Y, U, or V) without any difference. The retained variability of this trial was 99.77155 %.

In Table 8, the trial on the coded domain features is presented. The four selected features are MV magn, sub MV, Ave energy I, and Ave energy P.

In the final trial, the source and coded domain features are compared together. The results of this trial are illustrated in Table 9. The retained variability for this trial is 99.9966 %.

    Table 6 Source Domain Features

    Cluster Index Feature Distance from center

    2 Ave Variance (I-frame) 0.063633

    2 Ave Variance (Averaged) 0.063633

    3 Pelact (I-frame) 0.0015095

3 Pelact (Averaged) 0.0019788

3 Pelspread (I-frame) 0.00086648

    3 Pelspread (Averaged) 0.0013133

    1 Edgeact (I-frame) 0.0045721

    1 Edgeact (Averaged) 0.0045721

    3 Edgespread (I-frame) 0.012588

    3 Edgespread (Averaged) 0.0014647

    Table 7 Resource Features

    Cluster Index Feature Distance from center

    3 Bitcount 0.18841

    3 Bitcount Y 1.1781

    2 ME Time 0.0012094

    1 SNR V 0.02584

    1 SNR U 0.025837

    1 SNR Y 0.02599

    2 Time 0.0012094

    Table 8 Coded Domain Features

    Cluster Index Feature Distance from center

    1 MV magn 0

    1 MV magn var 0

2 Sub MV 0

2 Non zero MV 0

    3 Ave energy I 0

    4 Ave energy P 0

    1 MV accel 0

    1 MV dir 0

    Table 9 Final Trail

    Cluster Index Feature Distance from center

    2 MV magn 0.0016

    2 Sub MV 0.0018

    3 Ave energy I 0

    1 Ave energy P 0

2 Ave variance 0.0481

2 PelSpread 0

    2 Edgeact 0.0096

    5-7 Transcoder Configuration

Figure 5-3 presents the architecture of the transcoding system. Videos are pre-encoded at the best supported quality, then passed through a transcoder that only decodes the NAL units into a set of VCL information. The transcoder changes some of this information in the coded domain and then re-encodes it into NAL units. The modified bitstream is then sent to the decoder at the client side to retrieve the pixel-domain video sequence.

    Figure 5-3 Standard Transcoder Configuration

    The implementation used for the transcoder is presented in Figure 5-4. This

    configuration was adopted to simplify the implementation of the transcoder.

This relies on the fact that the NAL encoder and decoder blocks are identical and can therefore be omitted.

    Figure 5-4 Adopted transcoder configuration

    5-8 Transcoder Setup

The implementation of the transcoder is based on the coefficient-dropping transcoding scheme. It was applied to all test sequences, and the same features were extracted. The details are as follows:

Transcoding parameters and amount of reduction:

Drop 1 coefficient (6.25% reduction)

Drop 3 coefficients (18.75% reduction)

Drop 5 coefficients (31.25% reduction)

Drop 7 coefficients (43.75% reduction)
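The percentages above correspond to fractions of the 16 coefficients of H.264's 4x4 transform (e.g., 5/16 = 31.25 %). The dropping step on one block can be sketched as follows, using the standard zig-zag scan (whether the thesis implementation uses exactly this scan is an assumption):

```python
import numpy as np

def zigzag(n=4):
    """Positions of an n x n transform block in zig-zag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def drop_coefficients(block, n_drop):
    """Coefficient-dropping step: zero the last n_drop coefficients of a
    4x4 transform block in zig-zag (highest-frequency-last) order."""
    out = block.copy()
    for r, c in zigzag(4)[16 - n_drop:]:
        out[r, c] = 0
    return out

block = np.arange(16).reshape(4, 4) + 1   # dummy coefficient block, all non-zero
reduced = drop_coefficients(block, 5)     # the "31.25% reduction" setting
print(int((reduced == 0).sum()))
```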

In this experiment, we used the features selected by the feature analysis discussed in the previous section. Those features are as follows:

    Bitcount

    ME time

    SNR Y

    Sub MV

    Ave Energy I

    Ave Energy P

MV Magn

Figure 5-5 shows the bitrate relations between the different bitstreams and transcoding parameters. The bitrate values are normalized using MATLAB's zscore function, which, for each column vector V of the data matrix D, is defined as:

    z = (V - mean(V)) / std(V)

where mean(V) and std(V) are the mean and sample standard deviation of V.
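The same column-wise normalization can be written in Python; note that MATLAB's zscore uses the sample standard deviation (N-1 in the denominator):

```python
import numpy as np

def zscore(D):
    """Column-wise z-score normalization, matching MATLAB's zscore:
    each column V of D becomes (V - mean(V)) / std(V)."""
    D = np.asarray(D, dtype=float)
    return (D - D.mean(axis=0)) / D.std(axis=0, ddof=1)  # ddof=1: sample std

D = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Z = zscore(D)
print(Z)
```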

    Figure 5-6 Dendrogram of the generated clusters

Figure 5-7 Normalized bitrate after adding the no-transcoding values

The cluster analysis performed in this study was able to predict the reaction of the test videos to the transcoding process. The dendrogram shows the presence of two clusters in the test sequences: one where the video bitrate without transcoding is higher than the transcoded bitrate, and a second where the bitrate without transcoding is lower than some of the transcoded bitrates. These two clusters are marked in the bitrate graph in Figure 5-7.
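The clustering step behind the dendrogram can be sketched with SciPy's agglomerative clustering; the rows below are toy normalized-bitrate vectors, and the Ward linkage is an assumption about the exact linkage used:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy normalized-bitrate rows, one per sequence (illustration only):
# the first three sequences behave alike under transcoding, the last two differently
rows = np.array([
    [1.0, 0.8, 0.6, 0.4, 0.2],
    [1.1, 0.9, 0.7, 0.5, 0.3],
    [0.9, 0.7, 0.5, 0.3, 0.1],
    [0.2, 0.9, 1.0, 1.1, 1.2],
    [0.1, 0.8, 1.1, 1.2, 1.3],
])
Z = linkage(rows, method='ward')                  # agglomerative clustering
labels = fcluster(Z, t=2, criterion='maxclust')   # cut the tree into 2 clusters
print(labels)
```

Plotting `scipy.cluster.hierarchy.dendrogram(Z)` yields a tree like the one in Figure 5-6.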

Chapter 6

Conclusion and Future Work

    6-1 Conclusion

Research in multimedia transcoding has become an essential part of the field of multimedia communications. This is due to the fact that users are increasingly turning to multimedia as a key source of information. In reality, most of those users are using devices or networks that cannot yet handle the large amount of resources required for the transmission of multimedia objects.

Multimedia middleware servers perform the required transcoding to allow the video sequence to be transferred over these networks or devices seamlessly, without any intervention from the user's side. Such a system requires a thorough understanding of the video characteristics, device capabilities, and network resources. The overall objective of this structure is to provide users with exactly the right amount of information, without requiring more resources than needed.

A large number of transcoding techniques have been developed in the available literature. These techniques can alter the video sequence by modifying one or more of its parameters, which leads to a variety of potential transcoded objects that can be delivered to the user. Currently, managing the selection of the video sequence that best fits the requirements of the client devices and networks remains a challenge. The management system for multimedia content adaptation should be capable of using client-side resources efficiently while keeping the response time to client requests minimal. The concept adopted in this thesis for the implementation of the transcoding system relies mainly on studying the video content and providing different transcoding plans for different content types.

The transcoding cycle starts with an offline analysis stage that clusters the multimedia objects into categories based on their characteristics. This analysis predicts the behavior of the multimedia objects with respect to the transcoding techniques. Next, the best transcoding plan is chosen. This requires the presence of a quality assessment metric to evaluate the result and guarantee the transmission of the best option available given the resources at hand.

In our study, we have explored these two points. The work done in this thesis contributes toward the implementation of the transcoding server, and more specifically the policy module of that server.

First, we examined quality assessment methods in order to define a valid approach for computing the amount of degradation in object quality. We defined the Contrast Error Distribution (CED)

metric, which provides a good tradeoff between performance and complexity. This makes it suitable for use in transcoders, where real-time response is highly valued.

The results showed that CED is consistent across different error domains and visual content. This characteristic allows it to be used in the loopback analysis cycle, where both time and generalizability matter most.

The proposed metric defines the perceived quality using a simple mathematical model deduced from common knowledge about the HVS. Previous studies of FR QA models suggested that, to perform well, a metric has to be based on a complex analysis of the image. The CED overcomes this weakness: it shows performance as high as that of the complex metrics while requiring very low computational time.

Secondly, we conducted an analytical study of the types of features to be included in the offline analysis of videos. This study led to a set of features that can be used to classify videos and predict their behavior with respect to changes in the transcoding parameters.

The analysis showed that pixel-domain features can be omitted. This is an important fact, since all videos on the content servers will be in a pre-encoded form, and therefore the pixel-domain features will not be available for use in the transcoding server. As a result, the offline analysis will not require any external information other than the pre-encoded video sequence.

In our study, we ran some preliminary experiments which showed that, using the selected features, a clustering system is able to predict the behavior of a set of video sequences.

    6-2 Future Work

The contributions discussed so far have examined the implementation of the offline data analysis and the quality assessment metric. We have examined these two segments of the transcoding server separately. Consequently, the next step would be to integrate both of the proposed structures into the implementation of a transcoding server to validate the whole theory.

Moreover, we need to expand the analysis done in this thesis to include the following:

Expand the evaluation process of the CED to include a database that contains compound error components instead of a single error component.

Change the CED to use 16x16 windows instead of 8x8, and apply it on DCT coefficients instead of luminance values.

Build a transcoding server that uses multiple transcoding techniques, and validate the ability of the clustering algorithm to detect the most significant clusters.

Criterion for Image Quality Assessment Using Natural Scene Statistics," IEEE Transactions on Image Processing, vol. 14, no. 12, 2005.

[11] H. R. Wu, Digital Video Image Quality and Perceptual Coding. CRC Press, 2005 (ISBN 978-1420027822).

[12] P. Corriveau and A. Webster (2003), VQEG F