Research Article
QoE Evaluation: The TRIANGLE Testbed Approach

Almudena Díaz Zayas,1 Laura Panizo,1 Janie Baños,2 Carlos Cárdenas,2 and Michael Dieudonne3

1 University of Malaga, Spain; 2 DEKRA, Spain; 3 Keysight Technologies, Belgium

Correspondence should be addressed to Almudena Díaz Zayas; adz@uma.es

Received 11 October 2018; Accepted 25 November 2018; Published 18 December 2018

Academic Editor: Giovanni Stea

Copyright © 2018 Almudena Díaz Zayas et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi Wireless Communications and Mobile Computing, Volume 2018, Article ID 6202854, 12 pages. https://doi.org/10.1155/2018/6202854

This paper presents the TRIANGLE testbed approach to score the Quality of Experience (QoE) of mobile applications, based on measurements extracted from tests performed on an end-to-end network testbed. The TRIANGLE project approach is a methodology flexible enough to generalize the computation of the QoE for any mobile application. The process produces a final TRIANGLE mark, a quality score, which could eventually be used to certify applications.

1. Introduction

The success of 5G (the fifth generation of mobile communications), and to some extent that of 4G, depends on its ability to seamlessly deliver applications and services with good Quality of Experience (QoE). Along with the user, QoE is important to network operators, product manufacturers (both hardware and software), and service providers. However, there is still no consensus on the definition of QoE, and a number of acronyms and related concepts (e.g., see [1]) add confusion to the subject: QoE (Quality of Experience), QoS (Quality of Service), QoSD (Quality of Service Delivered/achieved by the service provider), QoSE (Quality of Service Experienced/perceived by the customer/user), and so forth. This is a field in continuous evolution, where methodologies and algorithms are the subject of study of many organisations and standardization bodies such as the ITU-T.

The TRIANGLE project has adopted the definition of QoE provided by the ITU-T in Recommendation P.10/G.100 (2006) Amendment 1, "Definition of Quality of Experience (QoE)" [2]:

"The overall acceptability of an application or service, as perceived subjectively by the end-user."

In [2], the ITU-T emphasizes that the Quality of Experience includes the complete end-to-end system effects: client (app), device, network, services infrastructure, and so on. Therefore, TRIANGLE brings in a complete end-to-end network testbed and a methodology for the evaluation of the QoE.

Consistent with the definition, the majority of the work in this area has been concerned with subjective measurements of experience. Typically, users rate the perceived quality on a scale, resulting in the typical MOS (Mean Opinion Score). Even in this field, the methodology for subjective assessment is the subject of many studies [3].

However, there is a clear need to relate QoE scores to technical parameters that can be monitored and whose improvement or worsening can be driven through changes in the configurations of the different elements of the end-to-end communication channel. The E-model [4], which is based on modelling the results from a large number of subjective tests done in the past on a wide range of transmission parameters, is the best-known example of a parametric technique for the computation of QoE. Also, one of the conclusions of the Project P-SERQU, conducted by the NGMN (Next Generation Mobile Networks) Alliance [5] and focused on the QoE analysis of HTTP Adaptive Streaming (HAS), is that it is less complex and more accurate to measure and predict QoE based on traffic properties than to make a one-to-one mapping between generic radio and core network QoS and QoE.



The TRIANGLE project also follows a parametric approach to compute the QoE.

Conclusions in [5] point out that a large number of parameters in the model could be cumbersome, due to the difficulty of obtaining the required measurements and because it would require significantly more data points and radio scenarios to tune the model. The TRIANGLE approach has overcome this limitation through the large variety of measurements collected, the variety of end-to-end network scenarios designed, and, mostly, the degree of automation reached, which enables the execution of intensive test campaigns covering all scenarios.

Although there are many proposals to calculate the quality of experience, in general they are very much oriented to specific services, for example, voice [6] or video streaming [7, 8]. This paper introduces a methodology to compute the QoE of any application, even if the application supports more than one service.

The QoE as perceived by the user depends on many factors: the network conditions, both at the core (CN) and at the radio access (RAN), the terminal, the service servers, and human factors difficult to control. Due to the complexity and the time needed to run experiments or make measurements, most studies limit the evaluation of the QoE to a limited set of, or even noncontrolled, network conditions, especially those that affect the radio interface (fading, interference, etc.). TRIANGLE presents a methodology and a framework to compute the QoE out of technical parameters, weighting the impact of the network conditions based on the actual use cases for the specific application. As in ITU Recommendations G.1030 [9] and G.1031 [10], the user's influence factors are outside the scope of the methodology developed in TRIANGLE.

TRIANGLE has developed an end-to-end cellular network testbed and a set of test cases to automatically test applications under multiple changing network conditions and/or terminals and provide a single quality score. The score is computed by weighting the results obtained when testing the different use cases applicable to the application, for the different aspects relevant to the user (the domains in TRIANGLE), and under the network scenarios relevant for the application. The framework allows specific QoS-to-QoE translations to be incorporated based on the outcome of subjective experiments on new services.

Note that although the TRIANGLE project also provides means to test devices and services, only the process to test applications is presented here.

The rest of the paper is organized as follows. Section 2 provides an overview of related work. Section 3 presents an overview of the TRIANGLE testbed. Section 4 introduces the TRIANGLE approach. Section 5 describes in detail how the quality score is obtained in the TRIANGLE framework. Section 6 provides an example and the outcome of this approach applied to the evaluation of a simple app, the Exoplayer. Finally, Section 7 summarizes the conclusions.

2. State of the Art

Modelling and evaluating QoE in current and next generations of mobile networks is an important and active research area [8]. Different types of testbeds can be found in the literature, ranging from simulated to emulated mobile/wireless testbeds, which are used to obtain subjective or objective QoE metrics, to extract a QoE model, or to assess the correctness of a previously generated QoE model. Many of the testbeds reviewed have been developed for a specific research purpose instead of for a more general purpose such as the TRIANGLE testbed, which can serve a wide range of users (researchers, app developers, service providers, etc.). In this section some QoE-related works that rely on testbeds are reviewed.

The QoE Doctor tool [12] is closely related to the TRIANGLE testbed, since its main purpose is the evaluation of mobile apps' QoE in an accurate, systematic, and repeatable way. However, QoE Doctor is just an Android tool that can take measurements at different layers, from the app user interface (UI) to the network, and quantify the factors that impact the app QoE. It can be used to identify the causes of a degraded QoE, but it is not able to control or monitor the mobile network. QoE Doctor uses a UI automation tool to reproduce user behaviour in the terminal (app user flows in TRIANGLE nomenclature) and to measure the user-perceived latency by detecting changes on the screen. Other QoE metrics computed by QoE Doctor are the mobile data consumption and the network energy consumption of the app, by means of an offline analysis of the TCP flows. The authors have used QoE Doctor to evaluate the QoE of popular apps such as YouTube, Facebook, or mobile web browsers. One of the drawbacks of this approach is that most metrics are based on detecting specific changes on the UI. Thus, the module in charge of detecting UI changes has to be adapted for each specific app under test.

QoE-Lab [13] is a multipurpose testbed that allows the evaluation of QoE in mobile networks. One of its purposes is to evaluate the effect of new network scenarios on services such as VoIP, video streaming, or web applications. To this end, QoE-Lab extends the BERLIN testbed framework [14] with support for next generation mobile networks and some new services such as VoIP and video streaming. The testbed allows the study of the effect of network handovers between wireless technologies, dynamic migrations, and virtualized resources. Similar to TRIANGLE, the experiments are executed in a repeatable and controlled environment. However, in the experiments presented in [13] the user equipment were laptops, which usually have better performance and more resources (battery, memory, and CPU) than smartphones. The experiments also evaluated the impact of different scenarios on the multimedia streaming services included in the testbed. The main limitations are that it is not possible to evaluate different mobile apps running on different smartphones or to relate the QoE with the CPU or battery usage and so forth.

De Moor et al. [15] proposed a user-centric methodology for the multidimensional evaluation of QoE in a mobile real-life environment. The methodology relies on a distributed testbed that monitors the network QoS and context information and integrates the subjective user experience based on real-life settings. The main component of the proposed architecture is the Mobile Agent, a component to be installed in the user device that monitors contextual data (location, velocity, on-body sensors, etc.) and QoS parameters (CPU, memory, signal strength, throughput, etc.) and provides an interface to collect user experience feedback. A processing entity receives the (device and network) monitored data and analyzes the incoming data. The objective of this testbed infrastructure is to study the effects of different network parameters on the QoE in order to define new estimation models for QoE.

In [16] the authors evaluated the routing protocols BATMAN and OLSR to support VoIP and video traffic from a QoS and QoE perspective. The evaluation took place by running experiments in two different testbeds. First, experiments were run in the Omnet++ simulator using the InetManet framework. Second, the same network topology and network scenarios were deployed in the Emulab test bench, a real (emulated) testbed, and the same experiments were carried out. Finally, the results of both testbeds (simulated and real-emulated) were statistically compared in order to find inconsistencies. The experiments in the simulated and emulated environments showed that BATMAN performs better than OLSR and determined the relation between different protocol parameters and their performance. These results can be applied to implement network nodes that control in-stack protocol parameters as a function of the observed traffic.

In [17] a testbed to automatically extract a QoE model of encrypted video streaming services was presented. The testbed includes a software agent to be installed in the user device, which is able to reproduce the user interaction and collect the end-user application-level measurements; the network emulator NetEm, which changes the link conditions emulating the radio or core network; and Probe software, which processes all the traffic at different levels, computes the TCP/IP metrics, and compares the end-user and network level measurements. This testbed has been used to automatically construct (and validate) the model of the video performance of encrypted YouTube traffic over a Wi-Fi connection.

More recently, in [18] Solera et al. presented a testbed for evaluating video streaming services in LTE networks. In particular, the QoE of 3D video streaming services over LTE was evaluated. The testbed consists of a streaming server, the NetEm network emulator, and a streaming client. One of the main contributions of the work is the extension of NetEm to better model the characteristics of the packet delay in bursty services such as video streaming. Prior to running the experiments in the emulation-based testbed, the authors carried out a simulation campaign with an LTE simulator to obtain the configuration parameters of NetEm for four different network scenarios. These scenarios combine different positions of the user in the cell and different network loads. From the review of these works it becomes clear that the setup of a simulation or emulation framework for wireless or mobile environments requires, in many cases, a deep understanding of the network scenarios. TRIANGLE aims to reduce this effort by providing a set of preconfigured real network scenarios and the computation of the MOS, in order to allow both researchers and app developers to focus on the evaluation of new apps, services, and devices.

3. TRIANGLE

The testbed, the test methodology, and the set of test cases have been developed within the European funded TRIANGLE project. Figure 1 shows the main functional blocks that make up the TRIANGLE testbed architecture.

To facilitate the use of the TRIANGLE testbed for different objectives (testing, benchmarking, and certifying), to remotely access the testbed, and to gather and present results, a web portal offering an intuitive interface has been implemented. It provides access to the testbed, hiding unnecessary complexity from app developers. For advanced users interested in deeper access to the configuration parameters of the testbed elements or the test cases, the testbed offers direct access to the Keysight TAP (Testing Automation Platform), which is a programmable sequencer of actions with plugins that expose the configuration and control of the instruments and tools integrated into the testbed.

In addition to the testbed itself, TRIANGLE has developed a test methodology and has implemented a set of test cases, which are made available through the portal. To achieve full test case automation, all the testbed components are under the control of the testbed management framework, which coordinates their configuration and execution, processes the measurements made in each test case, and computes QoE scores for the application tested.

In addition, as part of the testbed management framework, each testbed component is controlled through a TAP driver, which serves as a bridge between the TAP engine and the actual component interface. The configuration of the different elements of the testbed is determined by the test case to run, within the set of test cases provided as part of TRIANGLE or the customized test cases built by users. The testbed translates the test cases' specific configurations, settings, and actions into TAP commands that take care of commanding each testbed component.

TRIANGLE test cases specify the measurements that should be collected to compute the KPIs (Key Performance Indicators) of the feature under test. Some measurements are obtained directly from measurement instruments, but others require specific probes (either software or hardware) to help extract the specific measurements. Software probes running on the same device (UE, LTE User Equipment) as the application under test include DEKRA Agents and the TestelDroid [19] tool from UMA. TRIANGLE also provides an instrumentation library so that app developers can deliver measurement outputs which cannot otherwise be extracted and must be provided by the application itself. Hardware probes include a power analyzer connected to the UE to measure power consumption and the radio access emulator, which, among others, provides internal logs about the protocol exchange and low-layer radio interface metrics.

The radio access (LTE RAN) emulator plays a key role in the TRIANGLE testbed. The testbed RAN is provided by an off-the-shelf E7515A UXM Wireless Test Set from Keysight, an emulator that provides state-of-the-art test features. Most importantly, the UXM also provides radio channel emulation for the downlink radio channel.


[Figure 1: TRIANGLE testbed architecture. The diagram shows the web portal (interface and visualization), the testbed management layer (orchestration, compositor, executor, databases, and an ETL framework with ETL modules), measurements and data collections, the TAP engine with its drivers (DEKRA, WebDriver, Android TAP, iOS TAP, EPC, app instrumentation, etc.), and the end-to-end data path: UE, RAN, transport, EPC, and local application servers.]

In order to provide an end-to-end system, the testbed integrates a commercial EPC (LTE Evolved Packet Core) from Polaris Networks, which includes the main elements of a standard 3GPP-compliant LTE core network, that is, MME (Mobility Management Entity), SGW (Serving Gateway), PGW (Packet Gateway), HSS (Home Subscriber Server), and PCRF (Policy and Charging Rules Function). In addition, this EPC includes the ePDG (Evolved Packet Data Gateway) and ANDSF (Access Network Discovery and Selection Function) components for dual connectivity scenarios. The RAN emulator is connected to the EPC through the standard S1 interface. The testbed also offers the possibility of introducing artificial impairments in the interfaces between the core network and the application servers.

The Quamotion WebDriver, another TRIANGLE element, is able to automate user actions on both iOS and Android applications, whether they are native, hybrid, or fully web-based. This tool is also used to prerecord the app's user flows, which are needed to automate the otherwise manual user actions in the test cases. This completes the full automation of the operation.

Finally, the testbed also incorporates commercial mobile devices (UEs). The devices are physically connected to the testbed. In order to preserve the radio conditions configured at the radio access emulator, the RAN emulator is cable-conducted to the mobile device antenna connector. To accurately measure the power consumption, the N6705B power analyzer directly powers the device. Other measurement instruments may be added in the future.

4. TRIANGLE Approach

The TRIANGLE testbed is an end-to-end framework devoted to testing and benchmarking mobile applications, services, and devices. The idea behind the testing approach adopted in the TRIANGLE testbed is to generalize QoE computation and provide a programmatic way of computing it. With this approach, the TRIANGLE testbed can accommodate the computation of the QoE for any application.

The basic concept in TRIANGLE's approach to QoE evaluation is that the quality perceived by the user depends on many aspects (herein called domains) and that this perception depends on the targeted use case. For example, battery life is critical for patient monitoring applications but less important in live streaming ones.

To define the different 5G use cases, TRIANGLE based its work on the Next Generation Mobile Network (NGMN) Alliance foundational White Paper, which specifies the expected services and network performance in future 5G networks [20]. More precisely, the TRIANGLE project has adopted a modular approach, subdividing the so-called "NGMN Use Cases" into blocks. The name Use Case was kept in the TRIANGLE approach for describing the application, service, or vertical using the network services. The diversification of services expected in 5G requires a concrete categorization to have a sharp picture of what the user will be expected to interact with. This is essential for understanding which aspect of the QoE evaluation needs to be addressed. The final use case categorization was defined in [11] and encompasses both the services normally accessible via mobile phones (UEs) and the ones that can be integrated in, for example, gaming consoles, advanced VR gear, car units, or IoT systems.


[Figure 2: The process to obtain the synthetic-MOS score in a TRIANGLE test case. For each scenario 1..K, each of the N iterations of the test case produces measurements 1..P; per scenario, the measurements are combined into KPIs 1..R, each KPI is mapped to a synthetic MOS, and the per-scenario synthetic-MOS values are aggregated into the test case score.]

Table 1: Use cases defined in the TRIANGLE project.

Identifier | Use Case
VR | Virtual Reality
GA | Gaming
AR | Augmented Reality
CS | Content Distribution Streaming Services
LS | Live Streaming Services
SN | Social Networking
HS | High Speed Internet
PM | Patient Monitoring
ES | Emergency Services
SM | Smart Metering
SG | Smart Grids
CV | Connected Vehicles


The TRIANGLE domains group different aspects that can affect the final QoE perceived by the users. The current testbed implementation supports three of the several domains that have been identified: Apps User Experience (AUE), Apps Energy Consumption (AEC), and Apps Device Resources Usage (RES).

Table 1 provides the use cases and Table 2 lists the domains initially considered in TRIANGLE.

To produce data to evaluate the QoE, a series of test cases have been designed, developed, and implemented to be run on the TRIANGLE testbed. Obviously, not all test cases are applicable to all applications under test, because not all applications need, or are designed to support, all the functionalities that can be tested in the testbed. In order to automatically determine the test cases that are applicable to an application under test, a questionnaire (identified as the features questionnaire in the portal), equivalent to the classical conformance testing ICS (Implementation Conformance Statement), has been developed and is accessible through the portal. After filling in the questionnaire, the applicable test plan, that is, the test campaign with the list of applicable test cases, is automatically generated, as sketched below.
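For illustration only, this applicability check can be viewed as evaluating one boolean predicate per test case over the questionnaire answers. The Python sketch below uses hypothetical field names modelled on the applicability entry shown later in Table 3; it is not the portal's actual implementation.

```python
# Minimal sketch of applicability filtering: each test case carries a
# predicate over the features questionnaire (the ICS), and the test plan
# is the subset of test cases whose predicate holds.
# Field names are hypothetical, modelled on the Table 3 applicability entry.

ics = {
    "ProductType": "Application",
    "UseCases": {"CS"},   # content distribution streaming services
    "CS_Pause": True,     # the app supports pause/resume
}

test_cases = {
    "AUE/CS/001": lambda q: q["ProductType"] == "Application"
                            and "CS" in q["UseCases"],
    "AUE/CS/002": lambda q: q["ProductType"] == "Application"
                            and "CS" in q["UseCases"] and q["CS_Pause"],
}

test_plan = [name for name, applies in test_cases.items() if applies(ics)]
print(test_plan)  # ['AUE/CS/001', 'AUE/CS/002']
```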

The sequence of user actions (type, swipe, tap, etc.) a user needs to perform on the terminal (UE) to complete a task (e.g., play a video) is called the "app user flow". In order to be able to automatically run a test case, the actual application user flow, with the user actions a user would need to perform on the phone to complete the tasks defined in the test case, also has to be provided.

Each test case unambiguously defines the conditions of execution: the sequence of actions the user would perform (i.e., the app user flow), the sequence of actions that the elements of the testbed must perform, the traffic injected, the collection of measurements to take, and so forth. In order to obtain statistical significance, each test case includes a number of executions (iterations) under certain network conditions (herein called scenarios). Out of the various measurements made in the different iterations under any specific network conditions (scenario), a number of KPIs (Key Performance Indicators) are computed. The KPIs are normalized onto a standard 1-to-5 scale, as typically used in MOS scores, and referred to as synthetic-MOS, a terminology that has been adopted from previous works [7, 21]. The synthetic-MOS values are aggregated across network scenarios to produce a number of intermediate synthetic-MOS scores, which finally are aggregated to obtain a single synthetic-MOS score for each test case (see Figure 2).

The process to obtain the final TRIANGLE mark is sequential. First, for each domain, a weighted average of the synthetic-MOS scores obtained in each test case in the domain is calculated. Next, a weighted average of the synthetic-MOS values in all the domains of a use case is calculated to provide a single synthetic-MOS value per use case. An application will usually be developed for one specific use case, as those defined in Table 1, but may be designed for more than one use case. In the latter case, a further weighted average is made with the synthetic-MOS scores obtained in each use case supported by the application. These sequential steps produce a single TRIANGLE mark, an overall quality score, as shown in Figure 3.


Table 2: TRIANGLE domains.

Applications:
  AUE | Apps User Experience
  AEC | Apps Energy Consumption
  RES | Device Resources Usage
  REL | Reliability
  NWR | Network Resources

Devices (Mobile Devices):
  DEC | Energy Consumption
  DDP | Data Performance
  DRF | Radio Performance
  DRA | User experience with reference apps

Devices (IoT Devices):
  IDR | Reliability
  IDP | Data Performance
  IEC | Energy Consumption

[Figure 3: The process to obtain the TRIANGLE mark. Per-test-case synthetic-MOS scores (e.g., Test Case Domain A/Use Case X/01 and /02) are averaged into a synthetic MOS per domain and use case; the domain scores are averaged into a synthetic MOS per use case (X, Y); and the use case scores are averaged into the final TRIANGLE mark for the app.]


This approach provides a common framework for testing applications, for benchmarking applications, or even for certifying disparate applications. The overall process for an app that implements features of different use cases is depicted in Figure 3.

5. Details of the TRIANGLE QoE Computation

For each use case identified (see Table 1) and domain (see Table 2), a number of test cases have been developed within the TRIANGLE project. Each test case intends to test an individual feature, aspect, or behaviour of the application under test, as shown in Figure 4.

Each test case defines a number of measurements, and because the results of the measurements depend on many factors, they are not in general deterministic; thus each test case has been designed not to perform just one single measurement but to run a number of iterations (N) of the same measurement. Out of those measurements, KPIs are computed. For example, if the time to load the first media frame is the measurement taken in one specific test case, the average user waiting time KPI can be calculated by computing the mean of the values across all iterations, as in the sketch below. In general, different use case-domain pairs have a different set of KPIs. The reader is encouraged to read [11] for further details about the terminology used in TRIANGLE.
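As a minimal illustration of this step (with made-up measurement values), the KPI is simply a statistic over the per-iteration measurements:

```python
from statistics import mean, stdev

# Time to load the first media frame (seconds), one value per iteration
# of the test case under a given network scenario; values are illustrative.
time_to_first_frame = [0.42, 0.55, 0.48, 0.61, 0.50]

# KPI: average user waiting time over the N iterations.
avg_waiting_time = mean(time_to_first_frame)
print(f"Average user waiting time: {avg_waiting_time:.2f} s "
      f"(std dev {stdev(time_to_first_frame):.2f} s)")
```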

Recommendation P.10/G.100 Amendment 1, "Definition of Quality of Experience" [2], notes that the overall acceptability may be influenced by user expectations and context.


[Figure 4: QoE computation steps. (i) Use cases; (ii) domains (Apps User Experience, Apps Energy Consumption, Apps Device Resources); (iii) test case (e.g., AUE/CS/01) for a feature such as noninteractive playback, with measurements including time to load first media frame, playback cut-off, video resolution, and content stall; (iv) context (network scenarios, app user flow); (v) test case execution producing measurements and KPIs; (vi) synthetic MOS.]

For the definition of the context, the technical specifications ITU-T G.1030, "Estimating end-to-end performance in IP networks for data applications" [9], and ITU-T G.1031, "QoE factors in web-browsing" [10], have been considered in TRIANGLE. In particular, ITU-T G.1031 [10] identifies the following context influence factors: location (cafeteria, office, and home), interactivity (high-level versus low-level interactivity), task type (business, entertainment, etc.), and task urgency (urgent versus casual). The user's influence factors are, however, outside the scope of the ITU recommendation.

In the TRIANGLE project, the context information has been captured in the network scenarios defined (Urban - Internet Cafe Off-Peak, Suburban - Shopping Mall Busy Hours, Urban - Pedestrian, Urban - Office, High Speed Train - Relay, etc.) and in the test cases specified in [11].

The test cases specify the conditions of the test but also a sequence of actions that have to be executed by the application (app user flows) to test its features. For example, the test case that tests the "Play and Pause" functionality defines the app user flow shown in Figure 5.

The transformation of KPIs into QoE scores is the most challenging step in the TRIANGLE framework. The execution of the test cases generates a significant amount of raw measurements about several aspects of the system. Specific KPIs can then be extracted through statistical analysis: mean, deviation, cumulative distribution function (CDF), or ratio.

The KPIs are individually interpolated in order to provide a common, homogeneous comparison and aggregation space. The interpolation is based on the application of two functions, named Type I and Type II. By using the proposed two types of interpolation, the vast majority of KPIs can be translated into a normalized MOS-type metric (synthetic-MOS), easy to average in order to provide a simple, unified evaluation.

Type I. This function performs a linear interpolation on the original data. The variables min_KPI and max_KPI are the worst and best known values of a KPI from a reference case.

[Figure 5: App user flow used in the "AUE/CS/002 Play and Pause" test case: perform the login step (if required) and wait for 10 seconds; start playing a 5-minute video for 10 seconds; pause the reproduction; resume the reproduction after 2 minutes and play until the end of the video.]

The function maps a value v of a KPI to v' (synthetic-MOS) in the range [1, 5] by computing the following formula:

    v' = (v - min_KPI) / (max_KPI - min_KPI) * (5.0 - 1.0) + 1.0    (1)

This function transforms a KPI into a synthetic-MOS value by applying a simple linear interpolation between the worst and best expected values from a reference case. If a future input case falls outside the data range of the KPI, the new value will be set to the extreme value min_KPI (if it is worse) or max_KPI (if it is better).


Table 3: AUE/CS/002 test case description.

Identifier: AUE/CS/002 (App User Experience/Content Streaming/002)
Title: Play and pause
Objective: Measure the ability of the AUT to pause and then resume a media file.
Applicability: (ICSG ProductType = Application) AND (ICSG UseCases includes CS) AND ICSA CSPause
Initial Conditions: AUT is in [AUT STARTED] mode (Note: defined in D2.2 [11], Appendix 4).
Steps:
  (1) The Test System commands the AUT to replay the Application User Flow (the flow that presses first the Play button and later the Pause button).
  (2) The Test System measures whether the pause operation was successful or not.
Postamble: (i) Execute the Postamble sequence (see Section 2.6 in D2.2 [11], Appendix 4).
Measurements (Raw):
  (i) Playback Cut-off: probability that a successfully started stream reproduction is ended by a cause other than intentional termination by the user.
  (ii) Pause Operation: whether the pause operation is successful or not.
  (iii) Time to load first media frame (s) after resuming: the time elapsed since the user clicks the resume button until the media reproduction starts. (Note: for Exoplayer the RESUME button is the PLAY button.)


Type II. This function performs a logarithmic interpolation and is inspired by the opinion model recommended by the ITU-T in [9] for a simple web search task. This function maps a value v of a KPI to v' (synthetic-MOS) in the range [1, 5] by computing the following formula:

    v' = (5.0 - 1.0) / ln((a * worst_KPI + b) / worst_KPI) * (ln(v) - ln(a * worst_KPI + b)) + 5.0    (2)

The default values of a and b correspond to the simple web search task case (a = 0.003 and b = 0.12) [9, 22], and the worst value has been extracted from ITU-T G.1030. If, during experimentation, a future input case falls outside the data range of the KPI, the parameters a and b will be updated accordingly. Likewise, if through subjective experimentation other values are considered better adjustments for specific services, the function can be easily updated.
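A minimal Python sketch of the two interpolation functions as reconstructed above, including the clamping to the reference range described for Type I; the defaults are the web-search values a = 0.003 and b = 0.12, and the code illustrates the formulas rather than reproducing the testbed's implementation:

```python
import math

def type1_mos(v, kpi_min, kpi_max):
    """Type I: linear mapping of a KPI value v onto the 1-to-5 scale.
    kpi_min/kpi_max are the worst/best reference values (either may be
    the larger number); out-of-range inputs are clamped to the extremes."""
    lo, hi = min(kpi_min, kpi_max), max(kpi_min, kpi_max)
    v = min(max(v, lo), hi)
    return (v - kpi_min) / (kpi_max - kpi_min) * (5.0 - 1.0) + 1.0

def type2_mos(v, kpi_worst, a=0.003, b=0.12):
    """Type II: logarithmic mapping inspired by the ITU-T G.1030 web-search
    opinion model; a * kpi_worst + b plays the role of the best-case value
    (mapped to 5) and kpi_worst maps to 1."""
    best = a * kpi_worst + b
    lo, hi = min(best, kpi_worst), max(best, kpi_worst)
    v = min(max(v, lo), hi)
    return ((5.0 - 1.0) / math.log(best / kpi_worst)
            * (math.log(v) - math.log(best)) + 5.0)

# Table 5 examples: power consumption (worst 1.0 W, best 0.8 W) and
# playback cut-off ratio (worst 50%, best 0%).
print(type1_mos(0.9, kpi_min=1.0, kpi_max=0.8))    # 3.0
print(type1_mos(25.0, kpi_min=50.0, kpi_max=0.0))  # 3.0
```

For instance, a power consumption of 0.9 W, halfway between the 1.0 W worst and 0.8 W best references, maps to a synthetic MOS of 3.0.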

Once all KPIs are translated into synthetic-MOS values, they can be averaged with suitable weights. In the averaging process, the first step is to average over the network scenarios considered relevant for the use case, as shown in Figure 2. This provides the synthetic-MOS output value for the test case. If there is more than one test case per domain, which is generally the case, a weighted average is calculated in order to provide one synthetic-MOS value per domain, as depicted in Figure 3. The final step is to average the synthetic-MOS scores over all use cases supported by the application (see Figure 3). This provides the final score, that is, the TRIANGLE mark.
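Under the assumption of equal weights (as in the current implementation, per Section 6), the whole chain can be sketched as nested means; the data layout and the scores below are illustrative only:

```python
def wmean(scores, weights=None):
    """Weighted mean of synthetic-MOS scores; equal weights by default."""
    weights = weights or [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Illustrative layout: use case -> domain -> per-test-case synthetic MOS,
# where each test-case score has already been averaged over scenarios.
app_scores = {
    "CS": {                 # Content Distribution Streaming use case
        "AUE": [3.1, 3.4],  # e.g., AUE/CS/001 and AUE/CS/002
        "AEC": [4.7, 4.8],
        "RES": [4.2, 4.3],
    },
}

per_use_case = {
    uc: wmean([wmean(tc_scores) for tc_scores in domains.values()])
    for uc, domains in app_scores.items()
}
triangle_mark = wmean(list(per_use_case.values()))
print(round(triangle_mark, 1))  # 4.1 for these illustrative numbers
```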

6. A Practical Case: Exoplayer under Test

For a better understanding of the complete process of obtaining the TRIANGLE mark for a specific application, the evaluation of the Exoplayer is described in this section. This application has only one use case: content distribution streaming services (CS).

Table 4: Measurement points associated with test case AUE/CS/002.

Measurements | Measurement points
Time to load first media frame | Media File Playback - Start; Media File Playback - First Picture
Playback cut-off | Media File Playback - Start; Media File Playback - End
Pause | Media File Playback - Pause


Exoplayer is an application-level media player for Android promoted by Google. It provides an alternative to Android's MediaPlayer API for playing audio and video, both locally and over the Internet. Exoplayer supports features not currently supported by Android's MediaPlayer API, including DASH and SmoothStreaming adaptive playbacks.

The TRIANGLE project has concentrated on testing just two of the Exoplayer features: "Noninteractive Playback" and "Play and Pause". These features result in 6 applicable test cases out of the test cases defined in TRIANGLE. These are test cases AUE/CS/001 and AUE/CS/002 in the App User Experience domain, test cases AEC/CS/001 and AEC/CS/002 in the App Energy Consumption domain, and test cases RES/CS/001 and RES/CS/002 in the Device Resources Usage domain.

The AUE/CS/002 "Play and Pause" test case description, belonging to the AUE domain, is shown in Table 3. The test case description specifies the test conditions, the generic app user flow, and the raw measurements which shall be collected during the execution of the test.

The TRIANGLE project also offers a library that includes the measurement points that should be inserted in the source code of the app to enable the collection of the specified measurements. Table 4 shows the measurement points required to compute the measurements specified in test case AUE/CS/002.


Table 5: Reference values for interpolation.

Feature | Domain | KPI | Synthetic MOS calculation | KPI min | KPI max
Non-Interactive Playback | AEC | Average power consumption | Type I | 1.0 W | 0.8 W
Non-Interactive Playback | AUE | Time to load first media frame | Type II | KPI worst = 20 ms | -
Non-Interactive Playback | AUE | Playback cut-off ratio | Type I | 50% | 0%
Non-Interactive Playback | AUE | Video resolution | Type I | 240p | 720p
Non-Interactive Playback | RES | Average CPU usage | Type I | 100% | 16%
Non-Interactive Playback | RES | Average memory usage | Type I | 100% | 40%
Play and Pause | AEC | Average power consumption | Type I | 1.0 W | 0.8 W
Play and Pause | AUE | Pause operation success rate | Type I | 50% | 100%
Play and Pause | RES | Average CPU usage | Type I | 100% | 16%
Play and Pause | RES | Average memory usage | Type I | 100% | 40%

The time to load first media picture measurement is obtained by subtracting the timestamp of the measurement point "Media File Playback - Start" from that of the measurement point "Media File Playback - First Picture".
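The instrumentation library's actual API is not reproduced in the paper; the sketch below only illustrates the idea of emitting timestamped measurement points from the app and deriving the time to load the first media frame from them (the function name is hypothetical):

```python
import time

log = []  # collected measurement points: (timestamp, label)

def measurement_point(label):
    """Hypothetical stand-in for the TRIANGLE instrumentation call."""
    log.append((time.monotonic(), label))

# Inside the app's playback path:
measurement_point("Media File Playback - Start")
time.sleep(0.3)  # stands in for buffering and decoding work
measurement_point("Media File Playback - First Picture")

# Post-processing: subtract the Start timestamp from the First Picture one.
ts = {label: t for t, label in log}
ttff = ts["Media File Playback - First Picture"] - ts["Media File Playback - Start"]
print(f"Time to load first media frame: {ttff:.2f} s")
```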

As specified in [11], all the scenarios defined are applicable to the content streaming use case. Therefore, test cases in the three domains currently supported by the testbed are executed in all the scenarios.

Once the test campaign has finished, the raw measurement results are processed to obtain the KPIs associated with each test case: average current consumption, average time to load first media frame, average CPU usage, and so forth. The processes applied are detailed in Table 5. Based on previous experiments performed by the authors, the behaviour of the time to load the first media frame KPI resembles the web response time KPI (i.e., the amount of time the user has to wait for the service), and thus, as recommended in the opinion model for web search introduced in [9], a logarithmic interpolation (Type II) has been used for this metric.

The results of the initial process, that is, the computed KPIs, are translated into synthetic-MOS values. To compute these values, reference benchmarking values for each of the KPIs need to be used, according to the normalization and interpolation process described in Section 5. Table 5 shows what has currently been used by TRIANGLE for the App User Experience domain, which is also used by NGMN as a reference in their precommercial trials document [23].

For example, for the "time to load first media frame" KPI shown in Table 5, the type of aggregation applied is averaging and the interpolation formula used is Type II.

To achieve stable results, each test case is executed 10 times (10 iterations) in each network scenario. The synthetic-MOS value in each domain is calculated by averaging the measured synthetic-MOS values in the domain. For example, the synthetic-MOS value in the RES domain is obtained by averaging the synthetic-MOS values of "average CPU usage" and "average memory usage" from the two test cases.
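As a worked illustration using the scenario averages reported in Table 6 for test case RES/CS/001 (average CPU usage 4.4, average RAM usage 4.1), that test case alone would contribute (4.4 + 4.1) / 2 = 4.25 to the RES domain; the domain score then averages this with the corresponding values from RES/CS/002.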

Although Exoplayer supports several video streaming protocols, in this work only DASH [24] (Dynamic Adaptive Streaming over HTTP) has been tested. DASH clients should seamlessly adapt to changing network conditions by making decisions on which video segment to download (videos are encoded at multiple bitrates). The Exoplayer's default adaptation algorithm is basically throughput-based, and some parameters control how often and when switching can occur.
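As a rough illustration of such throughput-based selection (a simplification: real track selection also involves buffer levels and switch hysteresis, and this is not ExoPlayer's actual code), a client can repeatedly pick the highest representation whose bitrate fits within a fraction of the estimated throughput:

```python
# DASH representations from the manifest: (vertical resolution, bitrate kbps).
representations = [(240, 400), (360, 800), (480, 1500), (720, 3000)]

def select_representation(throughput_kbps, safety_factor=0.75):
    """Pick the highest-bitrate representation that fits within a safety
    fraction of the estimated throughput; fall back to the lowest one."""
    budget = throughput_kbps * safety_factor
    feasible = [r for r in representations if r[1] <= budget]
    return max(feasible, key=lambda r: r[1]) if feasible else representations[0]

print(select_representation(2500))  # (480, 1500): 720p exceeds the budget
```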

[Figure 6: Video resolution evolution in the Driving Urban Normal scenario: horizontal resolution (0 to 1200) plotted against the timestamp, showing the client switching resolutions over time.]


During the testing, the testbed was configured with the different network scenarios defined in [11]. In these scenarios, the network configuration changes dynamically, following a random pattern, resulting in different maximum throughput rates. The expected behaviour of the application under test is that the video streaming client adapts to the available throughput by decreasing or increasing the resolution of the received video. Figure 6 depicts how the client effectively adapts to the channel conditions.

However, the objective of the testing carried out in the TRIANGLE testbed is not just to verify that the video streaming client actually adapts to the available maximum throughput, but also to check whether this adaptation improves the users' quality of experience.

Table 6 shows a summary of the synthetic-MOS values obtained per scenario in one test case of each domain. The scores obtained in the RES and AEC domains are always high. In the AUE domain, the synthetic MOS associated with the video resolution shows low scores in some of the scenarios, because the resolution decreases; reasonably good scores in the time to load first media; and high scores in the playback cut-off ratio. Overall, it can be concluded that the DASH implementation of the video streaming client under test is able to adapt to the changing conditions of the network, maintaining an acceptable rate of video cut-offs, rebuffering times, and resource usage.


Table 6: Synthetic-MOS values per test case and scenario for the feature "Noninteractive Playback". The AUE columns come from Test Case AUE/CS/001, the AEC column from Test Case AEC/CS/001, and the RES columns from Test Case RES/CS/001.

Scenario | Time to load first media frame | Playback cut-off ratio | Video resolution mode | Average power consumption | Average CPU usage | Average RAM usage
High Speed Direct Passenger | 2.1 | 3.1 | 2.3 | 4.7 | 4.3 | 4.2
Suburban Festival | 3.8 | 4.7 | 3.1 | 4.8 | 4.3 | 4.1
Suburban Shopping Mall Busy Hours | 3.7 | 3.7 | 1.3 | 4.8 | 4.4 | 4.1
Suburban Shopping Mall Off-Peak | 3.6 | 3.1 | 2.3 | 4.8 | 4.3 | 4.1
Suburban Stadium | 3.8 | 2.9 | 2.1 | 4.7 | 4.4 | 4.1
Urban Driving Normal | 2.6 | 3.9 | 2.8 | 4.7 | 4.4 | 4.0
Urban Driving Traffic Jam | 3.4 | 3.7 | 1.6 | 4.8 | 4.4 | 4.0
Urban Internet Cafe Busy Hours | 3.8 | 3.7 | 1.9 | 4.8 | 4.4 | 4.0
Urban Internet Cafe Off-Peak | 3.8 | 3.1 | 2.3 | 4.8 | 4.3 | 4.0
Urban Office | 3.8 | 4.7 | 3.3 | 4.8 | 4.5 | 4.3
Urban Pedestrian | 3.9 | 2.6 | 2.0 | 4.7 | 4.4 | 4.0
Average | 3.5 | 3.6 | 2.3 | 4.7 | 4.4 | 4.1


The final score in each domain is obtained by averaging the synthetic-MOS values from all the tested network scenarios. Figure 7 shows the spider diagram for the three domains tested. In the User Experience domain the score obtained is lower than in the other domains, due to the low synthetic-MOS values obtained for the video resolution.

The final synthetic MOS for the use case Content Distribution Streaming is obtained as a weighted average of the three domains, representing the overall QoE as perceived by the user. The final score for the Exoplayer version 1.5.16 and the features tested (Noninteractive Playback and Play and Pause) is 4.2, which means that the low score obtained in the video resolution is compensated by the high scores in other KPIs.

If an application under test has more than one use case, the next steps in the TRIANGLE approach would be the aggregation per use case and the aggregation over all use cases. The final score, the TRIANGLE mark, is an estimation of the overall QoE as perceived by the user.

In the current TRIANGLE implementation the weights in all aggregations are the same. Further research is needed to appropriately define the weights of each domain and each use case in the overall score of the applications.

7. Conclusions

The main contribution of the TRIANGLE project is the provision of a framework that generalizes QoE computation and enables the execution of extensive and repeatable test campaigns to obtain meaningful QoE scores. The TRIANGLE project has also defined a methodology, based on the computation of KPIs, their transformation into synthetic-MOS values, and their aggregation over the different domains and use cases.

The TRIANGLE approach is a methodology flexible enough to generalize the computation of QoE for any application or service. The methodology has been validated by testing the DASH implementation in the Exoplayer app. To confirm the suitability of the weights used in the averaging process and the interpolation parameters, as well as to verify the correlation of the obtained MOS with that scored by users, the authors have started experiments with real users, and initial results are encouraging.

The process described produces a final TRIANGLE mark, a single quality score, which could eventually be used to certify applications, after achieving a consensus on the different values of the process (weights, limits, etc.) to use.

Data Availability

The methodology and results used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


[Figure 7: Exoplayer synthetic-MOS values per domain: a spider diagram over the User Experience (AUE), Energy Consumption (AEC), and Device Resource Usage (RES) domains.]

Acknowledgments

The TRIANGLE project is funded by the European Union's Horizon 2020 Research and Innovation Programme (Grant Agreement no. 688712).

References

[1] ETSI, "Human factors; quality of experience (QoE) requirements for real-time communication services," Tech. Rep. 102 643, 2010.
[2] ITU-T, "P.10/G.100 (2006) Amendment 1 (01/07): New Appendix I - Definition of quality of experience (QoE)," 2007.
[3] F. Kozamernik, V. Steinmann, P. Sunna, and E. Wyckens, "SAMVIQ - A new EBU methodology for video quality evaluations in multimedia," SMPTE Motion Imaging Journal, vol. 114, no. 4, pp. 152-160, 2005.
[4] ITU-T, "G.107: The E-model, a computational model for use in transmission planning," 2015.
[5] J. De Vriendt, D. De Vleeschauwer, and D. C. Robinson, "QoE model for video delivered over an LTE network using HTTP adaptive streaming," Bell Labs Technical Journal, vol. 18, no. 4, pp. 45-62, 2014.
[6] S. Jelassi, G. Rubino, H. Melvin, H. Youssef, and G. Pujolle, "Quality of experience of VoIP service: a survey of assessment approaches and open issues," IEEE Communications Surveys & Tutorials, vol. 14, no. 2, pp. 491-513, 2012.
[7] M. Li, C.-L. Yeh, and S.-Y. Lu, "Real-time QoE monitoring system for video streaming services with adaptive media playout," International Journal of Digital Multimedia Broadcasting, vol. 2018, Article ID 2619438, 11 pages, 2018.
[8] S. Barakovic and L. Skorin-Kapov, "Survey and challenges of QoE management issues in wireless networks," Journal of Computer Networks and Communications, vol. 2013, Article ID 165146, 28 pages, 2013.
[9] ITU-T, "G.1030: Estimating end-to-end performance in IP networks for data applications," 2014.
[10] ITU-T, "G.1031: QoE factors in web-browsing," 2014.
[11] EU H2020 TRIANGLE Project, Deliverable D2.2, "Final report on the formalization of the certification process, requirements and use cases," 2017, https://www.triangle-project.eu/project-old/deliverables.
[12] Q. A. Chen, H. Luo, S. Rosen et al., "QoE doctor: diagnosing mobile app QoE with automated UI control and cross-layer analysis," in Proceedings of the Conference on Internet Measurement Conference (IMC '14), pp. 151-164, ACM, Vancouver, Canada, November 2014.
[13] M. A. Mehmood, A. Wundsam, S. Uhlig, D. Levin, N. Sarrar, and A. Feldmann, "QoE-Lab: towards evaluating quality of experience for future internet conditions," in Testbeds and Research Infrastructure: Development of Networks and Communities (TridentCom 2011), vol. 90 of LNICST, pp. 286-301, Springer, Berlin, Germany, 2012.
[14] D. Levin, A. Wundsam, A. Mehmood, and A. Feldmann, "Berlin: the Berlin experimental router laboratory for innovative networking," in TridentCom 2010, vol. 46 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 602-604, Springer, Heidelberg, Germany, 2011.
[15] K. De Moor, I. Ketyko, W. Joseph et al., "Proposed framework for evaluating quality of experience in a mobile, testbed-oriented living lab setting," Mobile Networks and Applications, vol. 15, no. 3, pp. 378-391, 2010.
[16] R. Sanchez-Iborra, M.-D. Cano, J. J. P. C. Rodrigues, and J. Garcia-Haro, "An experimental QoE performance study for the efficient transmission of high demanding traffic over an ad hoc network using BATMAN," Mobile Information Systems, vol. 2015, Article ID 217106, 14 pages, 2015.
[17] P. Oliver-Balsalobre, M. Toril, S. Luna-Ramírez, and R. García Garaluz, "A system testbed for modeling encrypted video-streaming service performance indicators based on TCP/IP metrics," EURASIP Journal on Wireless Communications and Networking, vol. 2017, no. 1, 2017.
[18] M. Solera, M. Toril, I. Palomo, G. Gomez, and J. Poncela, "A testbed for evaluating video streaming services in LTE," Wireless Personal Communications, vol. 98, no. 3, pp. 2753-2773, 2018.
[19] A. Alvarez, A. Díaz, P. Merino, and F. J. Rivas, "Field measurements of mobile services with Android smartphones," in Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC '12), pp. 105-109, Las Vegas, Nev, USA, January 2012.
[20] NGMN Alliance, "NGMN 5G white paper," 2015, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf.
[21] "Infrastructure and design for adaptivity and flexibility," in Mobile Information Systems, Springer, 2006.
[22] J. Nielsen, "Response times: the three important limits," in Usability Engineering, 1993.
[23] NGMN Alliance, "Definition of the testing framework for the NGMN 5G pre-commercial networks trials," 2018, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2018/180220_NGMN_PreCommTrials_Framework_definition_v1_0.pdf.
[24] 3GPP TS 26.247, "Transparent end-to-end packet-switched streaming services (PSS); progressive download and dynamic adaptive streaming over HTTP (3GP-DASH)," 2018.


Page 2: QoE Evaluation: The TRIANGLE Testbed Approach

2 Wireless Communications and Mobile Computing

project follows also a parametric approach to compute theQoE

Conclusions in [5] point out that a large number ofparameters in the model could be cumbersome due tothe difficulty of obtaining the required measurements andbecause it would require significantly more data pointsand radio scenarios to tune the model The TRIANGLEapproach has overcome this limitation through the largevariety of measurements collected the variety of end-to-end network scenarios designed and mostly the degree ofautomation reached which enables the execution of intensivetest campaigns covering all scenarios

Although there are many proposals to calculate thequality of experience in general they are very much orientedto specific services for example voice [6] or video streaming[7 8] This paper introduces a methodology to compute theQoE of any application even if the application supportsmorethan one service

The QoE as perceived by the user depends on manyfactors the network conditions both at the core (CN) and atthe radio access (RAN) the terminal the service servers andhuman factors difficult to control Due to the complexity andthe time needed to run experiments or make measurementsmost of the studies limit the evaluation of theQoE to a limitedset of or even noncontrolled network conditions especiallythose that affect the radio interface (fading interference etc)TRIANGLE presents a methodology and a framework tocompute the QoE out of technical parameters weighting theimpact of the network conditions based on the actual usescases for the specific application As in ITU recommendationG1030 [9] and G1031 [10] the userrsquos influence factors areoutside of the scope of the methodology developed inTRIANGLE

TRIANGLE has developed an end-to-end cellular net-work testbed and a set of test cases to automatically testapplications under multiple changing network conditionsandor terminals and provide a single quality score Thescore is computed weighting the results obtained testingthe different uses cases applicable to the application forthe different aspects relevant to the user (the domains inTRIANGLE) and under the network scenarios relevant forthe application The framework allows specific QoS-to-QoEtranslations to be incorporated into the framework based onthe outcome of subjective experiments on new services

Note that although the TRIANGLE project also providesmeans to test devices and services only the process to testapplications is presented here

The rest of the paper is organized as follows Section 2provides an overview of related work Section 3 presents anoverview of the TRIANGLE testbed Section 4 introducesthe TRIANGLE approach Section 5 describes in detail howthe quality score is obtained in the TRIANGLE frameworkSection 6 provides an example and the outcome of thisapproach applied to the evaluation of a simple App theExoplayer Finally Section 7 summarizes the conclusions

2. State of the Art

Modelling and evaluating QoE in current and next generations of mobile networks is an important and active research area [8]. Different types of testbeds can be found in the literature, ranging from simulated to emulated mobile/wireless testbeds, which are used to obtain subjective or objective QoE metrics, to extract a QoE model, or to assess the correctness of a previously generated QoE model. Many of the testbeds reviewed have been developed for a specific piece of research instead of for a more general purpose such as the TRIANGLE testbed, which can serve a wide range of users (researchers, app developers, service providers, etc.). In this section, some QoE-related works that rely on testbeds are reviewed.

The QoE Doctor tool [12] is closely related to the TRIANGLE testbed, since its main purpose is the evaluation of mobile apps' QoE in an accurate, systematic, and repeatable way. However, QoE Doctor is just an Android tool that can take measurements at different layers, from the app user interface (UI) to the network, and quantify the factors that impact the app QoE. It can be used to identify the causes of a degraded QoE, but it is not able to control or monitor the mobile network. QoE Doctor uses a UI automation tool to reproduce user behaviour in the terminal (app user flows in TRIANGLE nomenclature) and to measure the user-perceived latency by detecting changes on the screen. Other QoE metrics computed by QoE Doctor are the mobile data consumption and the network energy consumption of the app, by means of an offline analysis of the TCP flows. The authors have used QoE Doctor to evaluate the QoE of popular apps such as YouTube, Facebook, or mobile web browsers. One of the drawbacks of this approach is that most metrics are based on detecting specific changes on the UI. Thus, the module in charge of detecting UI changes has to be adapted for each specific app under test.

QoE-Lab [13] is a multipurpose testbed that allows the evaluation of QoE in mobile networks. One of its purposes is to evaluate the effect of new network scenarios on services such as VoIP, video streaming, or web applications. To this end, QoE-Lab extends the BERLIN [14] testbed framework with support for next generation mobile networks and some new services, such as VoIP and video streaming. The testbed allows the study of the effect of network handovers between wireless technologies, dynamic migrations, and virtualized resources. Similar to TRIANGLE, the experiments are executed in a repeatable and controlled environment. However, in the experiments presented in [13], the user equipment used were laptops, which usually have better performance and more resources (battery, memory, and CPU) than smartphones. The experiments also evaluated the impact of different scenarios on the multimedia streaming services included in the testbed. The main limitations are that it is not possible to evaluate different mobile apps running in different smartphones or to relate the QoE to the CPU or battery usage, and so forth.

De Moor et al. [15] proposed a user-centric methodology for the multidimensional evaluation of QoE in a mobile real-life environment. The methodology relies on a distributed testbed that monitors the network QoS and context information and integrates the subjective user experience based on real-life settings. The main component of the proposed architecture is the Mobile Agent, a component to be installed in the user device that monitors contextual data (location, velocity, on-body sensors, etc.) and QoS parameters (CPU, memory, signal strength, throughput, etc.) and provides an interface to collect user experience feedback. A processing entity receives the (device and network) monitored data and analyzes the incoming data. The objective of this testbed infrastructure is to study the effects of different network parameters on the QoE, in order to define new QoE estimation models.

In [16], the authors evaluated the routing protocols BATMAN and OLSR, to support VoIP and video traffic, from a QoS and QoE perspective. The evaluation took place by running experiments in two different testbeds. First, experiments were run in the Omnet++ simulator using the InetManet framework. Second, the same network topology and network scenarios were deployed in the Emulab test bench, a real (emulated) testbed, and the same experiments were carried out. Finally, the results of both testbeds (simulated and real-emulated) were statistically compared in order to find inconsistencies. The experiments in the simulated and emulated environments showed that BATMAN performs better than OLSR and determined the relation between different protocol parameters and their performance. These results can be applied to implement network nodes that control in-stack protocol parameters as a function of the observed traffic.

In [17], a testbed to automatically extract a QoE model of encrypted video streaming services was presented. The testbed includes a software agent to be installed in the user device, which is able to reproduce the user interaction and collect the end-user application-level measurements; the network emulator NetEm, which changes the link conditions, emulating the radio or core network; and probe software, which processes all the traffic at different levels, computes the TCP/IP metrics, and compares the end-user and network level measurements. This testbed has been used to automatically construct (and validate) the model of the video performance of encrypted YouTube traffic over a Wi-Fi connection.

More recently, in [18], Solera et al. presented a testbed for evaluating video streaming services in LTE networks. In particular, the QoE of 3D video streaming services over LTE was evaluated. The testbed consists of a streaming server, the NetEm network emulator, and a streaming client. One of the main contributions of the work is the extension of NetEm to better model the characteristics of the packet delay in bursty services such as video streaming. Prior to running the experiments in the emulation-based testbed, the authors carried out a simulation campaign with an LTE simulator to obtain the configuration parameters of NetEm for four different network scenarios. These scenarios combine different positions of the user in the cell and different network loads. From the review of these works, it becomes clear that the setup of a simulation or emulation framework for wireless or mobile environments requires, in many cases, a deep understanding of the network scenarios. TRIANGLE aims to reduce this effort by providing a set of preconfigured real network scenarios and the computation of the MOS, in order to allow both researchers and app developers to focus on the evaluation of new apps, services, and devices.

3. TRIANGLE

The testbed, the test methodology, and the set of test cases have been developed within the European-funded TRIANGLE project. Figure 1 shows the main functional blocks that make up the TRIANGLE testbed architecture.

To facilitate the use of the TRIANGLE testbed for different objectives (testing, benchmarking, and certifying), to remotely access the testbed, and to gather and present results, a web portal offering an intuitive interface has been implemented. It provides access to the testbed, hiding unnecessary complexity from app developers. For advanced users interested in deeper access to the configuration parameters of the testbed elements or the test cases, the testbed offers direct access to the Keysight TAP (Test Automation Platform), which is a programmable sequencer of actions with plugins that expose the configuration and control of the instruments and tools integrated into the testbed.

In addition to the testbed itself, TRIANGLE has developed a test methodology and has implemented a set of test cases, which are made available through the portal. To achieve full test case automation, all the testbed components are under the control of the testbed management framework, which coordinates their configuration and execution, processes the measurements made in each test case, and computes QoE scores for the application tested.

Moreover, as part of the testbed management framework, each testbed component is controlled through a TAP driver, which serves as a bridge between the TAP engine and the actual component interface. The configuration of the different elements of the testbed is determined by the test case to run, within the set of test cases provided as part of TRIANGLE or the customized test cases built by users. The testbed translates the test cases' specific configurations, settings, and actions into TAP commands that take care of commanding each testbed component.

TRIANGLE test cases specify the measurements that should be collected to compute the KPIs (Key Performance Indicators) of the feature under test. Some measurements are obtained directly from measurement instruments, but others require specific probes (either software or hardware) to help extract the specific measurements. Software probes running on the same device (UE, LTE User Equipment) as the application under test include the DEKRA Agents and the TestelDroid [19] tool from UMA. TRIANGLE also provides an instrumentation library so that app developers can deliver measurement outputs which cannot otherwise be extracted and must be provided by the application itself. Hardware probes include a power analyzer, connected to the UE to measure power consumption, and the radio access emulator, which, among others, provides internal logs about the protocol exchange and low-layer radio interface metrics.

The radio access (LTE RAN) emulator plays a key role in the TRIANGLE testbed. The testbed RAN is provided by an off-the-shelf E7515A UXM Wireless Test Set from Keysight, an emulator that provides state-of-the-art test features. Most importantly, the UXM also provides radio channel emulation for the downlink radio channel.

[Figure 1 here: block diagram of the testbed. An interface and visualization layer (the portal) sits on top of the testbed management layer, which comprises TAP with its orchestration, compositor, and executor modules, databases (DBs), an ETL framework with ETL modules, and the TAP drivers (DEKRA, WebDriver, Android, EPC, App Instrumentation, iOS, etc.). Below, the measurements and data collection layer spans the system under test: UE, RAN, transport, EPC, and local application servers.]

Figure 1: TRIANGLE testbed architecture.

In order to provide an end-to-end system, the testbed integrates a commercial EPC (LTE Evolved Packet Core) from Polaris Networks, which includes the main elements of a standard 3GPP-compliant LTE core network, that is, MME (Mobility Management Entity), SGW (Serving Gateway), PGW (Packet Gateway), HSS (Home Subscriber Server), and PCRF (Policy and Charging Rules Function). In addition, this EPC includes the EPDG (Evolved Packet Data Gateway) and ANDSF (Access Network Discovery and Selection Function) components for dual connectivity scenarios. The RAN emulator is connected to the EPC through the standard S1 interface. The testbed also offers the possibility of introducing artificial impairments in the interfaces between the core network and the application servers.

The Quamotion WebDriver, another TRIANGLE element, is able to automate user actions on both iOS and Android applications, whether they are native, hybrid, or fully web-based. This tool is also used to prerecord the app's user flows, which are needed to automate the otherwise manual user actions in the test cases. This completes the full automation of the operation.

Finally, the testbed also incorporates commercial mobile devices (UEs). The devices are physically connected to the testbed. In order to preserve the radio conditions configured at the radio access emulator, the RAN emulator is cable-conducted to the mobile device antenna connector. To accurately measure the power consumption, the N6705B power analyzer directly powers the device. Other measurement instruments may be added in the future.

4. TRIANGLE Approach

The TRIANGLE testbed is an end-to-end framework devoted to testing and benchmarking mobile applications, services, and devices. The idea behind the testing approach adopted in the TRIANGLE testbed is to generalize QoE computation and provide a programmatic way of computing it. With this approach, the TRIANGLE testbed can accommodate the computation of the QoE for any application.

The basic concept in TRIANGLE's approach to QoE evaluation is that the quality perceived by the user depends on many aspects (herein called domains) and that this perception depends on the targeted use case. For example, battery life is critical for patient monitoring applications but less important in live streaming ones.

To define the different 5G use cases, TRIANGLE based its work on the Next Generation Mobile Networks (NGMN) Alliance foundational white paper, which specifies the expected services and network performance in future 5G networks [20]. More precisely, the TRIANGLE project has adopted a modular approach, subdividing the so-called "NGMN Use-Cases" into blocks. The name Use Case was kept in the TRIANGLE approach for describing the application, service, or vertical using the network services. The diversification of services expected in 5G requires a concrete categorization to have a sharp picture of what the user will be expected to interact with. This is essential for understanding which aspect of the QoE evaluation needs to be addressed. The final use case categorization was defined in [11].

[Figure 2 here: within a test case, each of the K network scenarios is executed for N iterations, and each iteration yields P measurements. Per scenario, the measurements are condensed into R KPIs, each KPI is mapped to a synthetic-MOS value, and the R values are aggregated into a synthetic-MOS score for that scenario.]

Figure 2: The process to obtain the synthetic-MOS score in a TRIANGLE test case.

Table 1: Use cases defined in the TRIANGLE project.

Identifier   Use Case
VR           Virtual Reality
GA           Gaming
AR           Augmented Reality
CS           Content Distribution Streaming Services
LS           Live Streaming Services
SN           Social Networking
HS           High Speed Internet
PM           Patient Monitoring
ES           Emergency Services
SM           Smart Metering
SG           Smart Grids
CV           Connected Vehicles

This categorization encompasses both the services normally accessible via mobile phones (UEs) and the ones that can be integrated in, for example, gaming consoles, advanced VR gear, car units, or IoT systems.

The TRIANGLE domains group the different aspects that can affect the final QoE perceived by the users. The current testbed implementation supports three of the several domains that have been identified: Apps User Experience (AUE), Apps Energy Consumption (AEC), and Apps Device Resources Usage (RES).

Table 1 provides the use cases, and Table 2 lists the domains initially considered in TRIANGLE.

To produce data to evaluate the QoE, a series of test cases have been designed, developed, and implemented to be run on the TRIANGLE testbed. Obviously, not all test cases are applicable to all applications under test, because not all applications need, or are designed to support, all the functionalities that can be tested in the testbed. In order to automatically determine the test cases that are applicable to an application under test, a questionnaire (identified as the features questionnaire in the portal), equivalent to the classical conformance testing ICS (Implementation Conformance Statement), has been developed and is accessible through the portal. After filling in the questionnaire, the applicable test plan, that is, the test campaign with the list of applicable test cases, is automatically generated; a sketch of this selection step follows.
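As an illustration of how such an applicability statement can be evaluated automatically, the sketch below encodes questionnaire answers and applicability conditions in the style of Table 3 as simple predicates; the dictionary layout and names are our own assumptions, not the portal's actual format.

from typing import Callable, Dict

# Hypothetical questionnaire answers for a streaming app (names are ours).
QUESTIONNAIRE = {"ProductType": "Application", "UseCases": {"CS"}, "CSPause": True}

# Applicability predicates mirroring ICS expressions such as the one in Table 3.
TEST_CASES: Dict[str, Callable[[dict], bool]] = {
    "AUE/CS/001": lambda q: q["ProductType"] == "Application" and "CS" in q["UseCases"],
    "AUE/CS/002": lambda q: (q["ProductType"] == "Application"
                             and "CS" in q["UseCases"] and q["CSPause"]),
}

# The applicable test plan is the set of test cases whose predicate holds.
test_plan = [tc for tc, applies in TEST_CASES.items() if applies(QUESTIONNAIRE)]
print(test_plan)  # ['AUE/CS/001', 'AUE/CS/002']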

The sequence of user actions (type, swipe, tap, etc.) a user needs to perform in the terminal (UE) to complete a task (e.g., play a video) is called the "app user flow". In order to be able to automatically run a test case, the actual application user flow, with the user actions a user would need to perform on the phone to complete the tasks defined in the test case, also has to be provided.
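An app user flow can thus be thought of as a declarative, replayable list of UI actions. The sketch below is a hypothetical encoding; the action vocabulary and the driver interface are our assumptions, not the actual Quamotion WebDriver format.

# Hypothetical app user flow for the task "play a video" (names are ours).
PLAY_VIDEO_FLOW = [
    {"action": "tap",  "target": "search_box"},
    {"action": "type", "target": "search_box", "text": "test clip"},
    {"action": "tap",  "target": "first_result"},
    {"action": "tap",  "target": "play_button"},
    {"action": "wait", "seconds": 10},  # let playback run before measuring
]

def replay(flow, driver):
    """Replay a prerecorded flow through a generic UI-automation driver."""
    for step in flow:
        if step["action"] == "wait":
            driver.sleep(step["seconds"])
        elif step["action"] == "type":
            driver.type_text(step["target"], step["text"])
        else:  # "tap"
            driver.tap(step["target"])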

Each test case univocally defines the conditions of execution: the sequence of actions the user would perform (i.e., the app user flow), the sequence of actions that the elements of the testbed must perform, the traffic injected, the collection of measurements to take, and so forth. In order to obtain statistical significance, each test case includes a number of executions (iterations) under certain network conditions (herein called scenarios). Out of the various measurements made in the different iterations under any specific network conditions (scenario), a number of KPIs (Key Performance Indicators) are computed. The KPIs are normalized onto a standard 1-to-5 scale, as typically used in MOS scores, and referred to as synthetic-MOS, a terminology that has been adopted from previous works [7, 21]. The synthetic-MOS values are aggregated across network scenarios to produce a number of intermediate synthetic-MOS scores, which finally are aggregated to obtain a synthetic-MOS score for each test case (see Figure 2 and the sketch below).
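A minimal sketch of this per-test-case aggregation, assuming the raw results are grouped by scenario and iteration, that measurement keys are named after their KPIs, and that the aggregation is a plain mean (TRIANGLE allows other statistics and weights):

from statistics import mean

def test_case_synthetic_mos(results, kpi_defs):
    """results: scenario name -> list of N iterations, each a dict of raw
    measurements. kpi_defs: KPI name -> (reduce_fn, to_mos), where reduce_fn
    condenses the N iteration values into one KPI value and to_mos is the
    Type I or Type II interpolation for that KPI (see Section 5)."""
    scenario_scores = []
    for scenario, iterations in results.items():
        kpi_mos = []
        for kpi, (reduce_fn, to_mos) in kpi_defs.items():
            value = reduce_fn([it[kpi] for it in iterations])  # KPI per scenario
            kpi_mos.append(to_mos(value))                      # synthetic-MOS per KPI
        scenario_scores.append(mean(kpi_mos))   # synthetic-MOS per scenario
    return mean(scenario_scores)                # synthetic-MOS of the test case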

The process to obtain the final TRIANGLE mark is sequential. First, for each domain, a weighted average of the synthetic-MOS scores obtained in each test case in the domain is calculated. Next, a weighted average of the synthetic-MOS values in all the domains of a use case is calculated to provide a single synthetic-MOS value per use case.

Table 2: TRIANGLE domains.

Category           Identifier   Domain
Applications       AUE          Apps User Experience
                   AEC          Apps Energy Consumption
                   RES          Device Resources Usage
                   REL          Reliability
                   NWR          Network Resources
Devices (Mobile)   DEC          Energy Consumption
                   DDP          Data Performance
                   DRF          Radio Performance
                   DRA          User Experience with Reference Apps
Devices (IoT)      IDR          Reliability
                   IDP          Data Performance
                   IEC          Energy Consumption

[Figure 3 here: for an app, the synthetic-MOS scores of the test cases in each domain of a use case (e.g., test cases Domain A/Use Case X/01 and /02) are combined into a synthetic-MOS score per domain (Domain A, Use Case X); the domain scores are combined into a synthetic-MOS score per use case (X, Y); and the use case scores are finally combined into the TRIANGLE mark.]

Figure 3: The process to obtain the TRIANGLE mark.

An application will usually be developed for one specific use case, as those defined in Table 1, but may be designed for more than one use case. In the latter case, a further weighted average is made with the synthetic-MOS scores obtained in each use case supported by the application. These sequential steps produce a single TRIANGLE mark, an overall quality score, as shown in Figure 3.

This approach provides a common framework for testing applications, for benchmarking applications, or even for certifying disparate applications. The overall process for an app that implements features of different use cases is depicted in Figure 3.
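The sequential rollup can be sketched as follows, with equal weights by default (which, as noted in Section 6, is what the current implementation uses); the nesting of the input dictionary is our assumption:

def weighted_average(scores, weights=None):
    """Weighted mean of a dict of scores; equal weights when none are given."""
    weights = weights or {k: 1.0 for k in scores}
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total

def triangle_mark(mos):
    """mos: use case -> domain -> test case -> synthetic-MOS score."""
    per_use_case = {}
    for uc, domains in mos.items():
        per_domain = {d: weighted_average(tcs) for d, tcs in domains.items()}
        per_use_case[uc] = weighted_average(per_domain)  # one value per use case
    return weighted_average(per_use_case)                # the TRIANGLE mark

For an app with a single use case, such as the one evaluated in Section 6, this reduces to the plain mean of its domain scores.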

5. Details of the TRIANGLE QoE Computation

For each use case identified (see Table 1) and domain (see Table 2), a number of test cases have been developed within the TRIANGLE project. Each test case intends to test an individual feature, aspect, or behaviour of the application under test, as shown in Figure 4.

Each test case defines a number of measurements, and, because the results of the measurements depend on many factors, they are not in general deterministic; thus, each test case has been designed not to perform just one single measurement but to run a number of iterations (N) of the same measurement. Out of those measurements, KPIs are computed. For example, if the time to load the first media frame is the measurement taken in one specific test case, the average user waiting time KPI can be calculated by computing the mean of the values across all iterations. In general, different use case-domain pairs have different sets of KPIs. The reader is encouraged to read [11] for further details about the terminology used in TRIANGLE.

Recommendation P.10/G.100 Amendment 1, "Definition of Quality of Experience" [2], notes that the overall acceptability may be influenced by user expectations and context.

[Figure 4 here: the QoE computation steps: (i) use cases; (ii) domains (Apps User Experience, Apps Energy Consumption, Apps Device Resources); (iii) test case (e.g., AUE/CS/01, for the feature "Non-interactive playback" with the measurements time to load first media frame, playback cut-off, video resolution, and content stall); (iv) context (network scenarios); (v) test case execution (app user flow, measurements, KPIs, MOS); (vi) synthetic MOS.]

Figure 4: QoE computation steps.

For the definition of the context, the technical specifications ITU-T G.1030, "Estimating end-to-end performance in IP networks for data applications" [9], and ITU-T G.1031, "QoE factors in web-browsing" [10], have been considered in TRIANGLE. In particular, ITU-T G.1031 [10] identifies the following context influence factors: location (cafeteria, office, and home), interactivity (high-level versus low-level interactivity), task type (business, entertainment, etc.), and task urgency (urgent versus casual). The user's influence factors are, however, outside the scope of the ITU recommendation.

In the TRIANGLE project, the context information has been captured in the network scenarios defined (Urban - Internet Cafe Off Peak, Suburban - Shopping Mall Busy Hours, Urban - Pedestrian, Urban - Office, High Speed Train - Relay, etc.) and in the test cases specified in [11].

The test cases specify not only the conditions of the test but also a sequence of actions that have to be executed on the application (app user flows) to test its features. For example, the test case that tests the "Play and Pause" functionality defines the app user flow shown in Figure 5.

The transformation of KPIs into QoE scores is the most challenging step in the TRIANGLE framework. The execution of the test cases generates a significant amount of raw measurements about several aspects of the system. Specific KPIs can then be extracted through statistical analysis: mean, deviation, cumulative distribution function (CDF), or ratio.

The KPIs are individually interpolated in order to provide a common, homogeneous comparison and aggregation space. The interpolation is based on the application of two functions, named Type I and Type II. By using the proposed two types of interpolation, the vast majority of KPIs can be translated into a normalized MOS-type metric (synthetic-MOS), easy to average in order to provide a simple, unified evaluation.

[Figure 5 here: the app user flow of the "AUE/CS/002 Play and Pause" test case: (1) perform the login step (if required) and wait for 10 seconds; (2) start playing a 5-minute video and let it play for 10 seconds; (3) pause the reproduction; (4) resume the reproduction after 2 minutes and play until the end of the video.]

Figure 5: App user flow used in the "AUE/CS/002 Play and Pause" test case.

Type I. This function performs a linear interpolation on the original data. The variables min_KPI and max_KPI are the worst and best known values of a KPI from a reference case. The function maps a value v of a KPI to v' (synthetic-MOS) in the range [1, 5] by computing the following formula:

$$ v' = \frac{v - min_{KPI}}{max_{KPI} - min_{KPI}} \, (5.0 - 1.0) + 1.0 \quad (1) $$

This function transforms a KPI into a synthetic-MOS value by applying a simple linear interpolation between the worst and best expected values from a reference case. If a future input case falls outside the data range of the KPI, the new value will be set to the extreme value min_KPI (if it is worse) or max_KPI (if it is better).
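A direct implementation of formula (1), including the clamping rule just described (a sketch; handling of "lower is better" KPIs, where the worst reference value is numerically larger, follows from the formula itself):

def type_i(v, min_kpi, max_kpi):
    """Map a KPI value v to a synthetic-MOS in [1, 5] via formula (1).
    min_kpi is the worst reference value (mapped to 1.0) and max_kpi the
    best (mapped to 5.0); out-of-range inputs are clamped to the extremes."""
    lo, hi = sorted((min_kpi, max_kpi))
    v = min(max(v, lo), hi)
    return (v - min_kpi) / (max_kpi - min_kpi) * (5.0 - 1.0) + 1.0

# Example with the playback cut-off ratio of Table 5 (worst 50 %, best 0 %):
assert type_i(50.0, 50.0, 0.0) == 1.0   # worst reference -> 1.0
assert type_i(0.0, 50.0, 0.0) == 5.0    # best reference  -> 5.0
assert type_i(25.0, 50.0, 0.0) == 3.0   # halfway         -> 3.0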

Table 3: AUE/CS/002 test case description.

Identifier: AUE/CS/002 (App User Experience / Content Streaming / 002)
Title: Play and pause
Objective: Measure the ability of the AUT to pause and then resume a media file
Applicability: (ICSG ProductType = Application) AND (ICSG UseCases includes CS) AND ICSA CSPause
Initial Conditions: AUT is in [AUT STARTED] mode (Note: defined in D2.2 [11], Appendix 4)
Steps: (1) The Test System commands the AUT to replay the Application User Flow (the flow that presses first the Play button and later the Pause button). (2) The Test System measures whether the pause operation was successful or not.
Postamble: (i) Execute the Postamble sequence (see Section 2.6 in D2.2 [11], Appendix 4)
Measurements (Raw): (i) Playback Cut-off: probability that a successfully started stream reproduction is ended by a cause other than intentional termination by the user. (ii) Pause Operation: whether the pause operation is successful or not. (iii) Time to load first media frame (s) after resuming: the time elapsed from when the user clicks the resume button until the media reproduction starts. (Note: for Exoplayer, the RESUME button is the PLAY button.)


Type II. This function performs a logarithmic interpolation and is inspired by the opinion model recommended by the ITU-T in [9] for a simple web search task. This function maps a value v of a KPI to v' (synthetic-MOS) in the range [1, 5] by computing the following formula:

$$ v' = \frac{5.0 - 1.0}{\ln\big((a \cdot worst_{KPI} + b) / worst_{KPI}\big)} \cdot \big(\ln(v) - \ln(a \cdot worst_{KPI} + b)\big) + 5 \quad (2) $$

The default values of a and b correspond to the simple web search task case (a = 0.003 and b = 0.12) [9, 22], and the worst value has been extracted from ITU-T G.1030. If, during experimentation, a future input case falls outside the data range of the KPI, the parameters a and b will be updated accordingly. Likewise, if through subjective experimentation other values are considered better adjustments for specific services, the function can be easily updated.
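A direct implementation of formula (2) with the default web search parameters (a sketch; the final clamp to the MOS scale is our addition, since the paper instead updates a and b when inputs fall outside the expected range):

import math

def type_ii(v, worst_kpi, a=0.003, b=0.12):
    """Map a KPI value v to a synthetic-MOS in [1, 5] via formula (2).
    The model's best achievable value is a * worst_kpi + b (mapped to 5.0),
    while v = worst_kpi maps to 1.0."""
    best = a * worst_kpi + b
    scale = (5.0 - 1.0) / math.log(best / worst_kpi)
    mos = scale * (math.log(v) - math.log(best)) + 5.0
    return min(max(mos, 1.0), 5.0)  # clamp (our addition, see above)

# With a hypothetical worst waiting time of 60 s, 60 s -> 1.0 and 0.3 s -> 5.0:
assert abs(type_ii(60.0, 60.0) - 1.0) < 1e-9
assert abs(type_ii(0.3, 60.0) - 5.0) < 1e-9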

Once all KPIs are translated into synthetic-MOS values, they can be averaged with suitable weights. In the averaging process, the first step is to average over the network scenarios considered relevant for the use case, as shown in Figure 2. This provides the synthetic-MOS output value for the test case. If there is more than one test case per domain, which is generally the case, a weighted average is calculated in order to provide one synthetic-MOS value per domain, as depicted in Figure 3. The final step is to average the synthetic-MOS scores over all use cases supported by the application (see Figure 3). This provides the final score, that is, the TRIANGLE mark.

6. A Practical Case: Exoplayer under Test

To better understand the complete process of obtaining the TRIANGLE mark for a specific application, the evaluation of the Exoplayer is described in this section. This application has only one use case: content distribution streaming services (CS).

Table 4: Measurement points associated with test case AUE/CS/002.

Measurement                       Measurement points
Time to load first media frame    Media File Playback - Start; Media File Playback - First Picture
Playback cut-off                  Media File Playback - Start; Media File Playback - End
Pause                             Media File Playback - Pause


Exoplayer is an application-level media player for Android promoted by Google. It provides an alternative to Android's MediaPlayer API for playing audio and video, both locally and over the Internet. Exoplayer supports features not currently supported by Android's MediaPlayer API, including DASH and SmoothStreaming adaptive playbacks.

The TRIANGLE project has concentrated on testing just two of the Exoplayer features: "Noninteractive Playback" and "Play and Pause". These features result in 6 applicable test cases out of the test cases defined in TRIANGLE: test cases AUE/CS/001 and AUE/CS/002 in the App User Experience domain, test cases AEC/CS/001 and AEC/CS/002 in the App Energy Consumption domain, and test cases RES/CS/001 and RES/CS/002 in the Device Resources Usage domain.

The AUE/CS/002 "Play and Pause" test case description, belonging to the AUE domain, is shown in Table 3. The test case description specifies the test conditions, the generic app user flow, and the raw measurements which shall be collected during the execution of the test.

The TRIANGLE project also offers a library that includes the measurement points that should be inserted in the source code of the app to enable the collection of the specified measurements. Table 4 shows the measurement points required to compute the measurements specified in test case AUE/CS/002.

Table 5: Reference values for interpolation.

Feature                    Domain   KPI                              Synthetic MOS calculation   KPI min           KPI max
Non-Interactive Playback   AEC      Average power consumption        Type I                      1.0 W             0.8 W
Non-Interactive Playback   AUE      Time to load first media frame   Type II                     KPI worst = 20 ms
Non-Interactive Playback   AUE      Playback cut-off ratio           Type I                      50 %              0 %
Non-Interactive Playback   AUE      Video resolution                 Type I                      240p              720p
Non-Interactive Playback   RES      Average CPU usage                Type I                      100 %             16 %
Non-Interactive Playback   RES      Average memory usage             Type I                      100 %             40 %
Play and Pause             AEC      Average power consumption        Type I                      1.0 W             0.8 W
Play and Pause             AUE      Pause operation success rate     Type I                      50 %              100 %
Play and Pause             RES      Average CPU usage                Type I                      100 %             16 %
Play and Pause             RES      Average memory usage             Type I                      100 %             40 %

The time to load first media frame measurement is obtained by subtracting the timestamp of the measurement point "Media File Playback - Start" from that of the measurement point "Media File Playback - First Picture".
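In code, assuming each iteration's measurement points are collected as name-to-timestamp pairs (a layout we assume for illustration), the KPI is a simple difference:

def time_to_first_frame(points):
    """'Time to load first media frame' from the Table 4 measurement points;
    points maps measurement-point names to timestamps in seconds."""
    return (points["Media File Playback - First Picture"]
            - points["Media File Playback - Start"])

# Hypothetical timestamps from one iteration:
points = {"Media File Playback - Start": 12.40,
          "Media File Playback - First Picture": 14.25}
print(time_to_first_frame(points))  # 1.85 s for this iteration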

As specified in [11], all the scenarios defined are applicable to the content streaming use case. Therefore, the test cases in the three domains currently supported by the testbed are executed in all the scenarios.

Once the test campaign has finished, the raw measurement results are processed to obtain the KPIs associated with each test case: average current consumption, average time to load first media frame, average CPU usage, and so forth. The processes applied are detailed in Table 5. Based on previous experiments performed by the authors, the behaviour of the time to load the first media frame KPI resembles the web response time KPI (i.e., the amount of time the user has to wait for the service), and thus, as recommended in the opinion model for web search introduced in [9], a logarithmic interpolation (Type II) has been used for this metric.

The results of the initial process, that is, the computed KPIs, are translated into synthetic-MOS values. To compute these values, reference benchmarking values for each of the KPIs need to be used, according to the normalization and interpolation process described in Section 5. Table 5 shows the values currently used by TRIANGLE for the App User Experience domain, which are also used by NGMN as reference in their precommercial trials document [23].

For example, for the "time to load first media frame" KPI shown in Table 5, the type of aggregation applied is averaging and the interpolation formula used is Type II.

To achieve stable results, each test case is executed 10 times (10 iterations) in each network scenario. The synthetic-MOS value in each domain is calculated by averaging the synthetic-MOS values measured in the domain. For example, the synthetic-MOS value in the RES domain is obtained by averaging the synthetic-MOS values of "average CPU usage" and "average memory usage" from the two test cases.

Although Exoplayer supports several video streaming protocols, in this work only DASH [24] (Dynamic Adaptive Streaming over HTTP) has been tested. DASH clients should seamlessly adapt to changing network conditions by making decisions on which video segment to download (videos are encoded at multiple bitrates). Exoplayer's default adaptation algorithm is basically throughput-based, and some parameters control how often and when switching can occur.
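As a rough sketch of what such a throughput-based algorithm does (a generic illustration, not Exoplayer's actual code; the safety factor is a common heuristic):

def pick_representation(bitrates_bps, measured_throughput_bps, safety=0.75):
    """Pick the highest-bitrate DASH representation the measured throughput
    can sustain, discounted by a safety factor; fall back to the lowest."""
    usable = safety * measured_throughput_bps
    fitting = [b for b in sorted(bitrates_bps) if b <= usable]
    return fitting[-1] if fitting else min(bitrates_bps)

# Illustrative ladder for 240p / 480p / 720p renditions:
ladder = [400_000, 1_200_000, 2_500_000]
print(pick_representation(ladder, 2_000_000))  # -> 1200000 (switches down)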

[Figure 6 here: horizontal video resolution (from 0 to 1200 pixels) plotted against the timestamp.]

Figure 6: Video resolution evolution in the Driving Urban Normal scenario.


During the testing, the testbed was configured with the different network scenarios defined in [11]. In these scenarios, the network configuration changes dynamically following a random pattern, resulting in different maximum throughput rates. The expected behaviour of the application under test is that the video streaming client adapts to the available throughput by decreasing or increasing the resolution of the received video. Figure 6 depicts how the client effectively adapts to the channel conditions.

However, the objective of the testing carried out in the TRIANGLE testbed is not just to verify that the video streaming client actually adapts to the available maximum throughput, but also to check whether this adaptation improves the users' experienced quality.

Table 6 shows a summary of the synthetic-MOS values obtained per scenario in one test case of each domain. The scores obtained in the RES and AEC domains are always high. In the AUE domain, the synthetic MOS associated with the video resolution shows low scores in some of the scenarios, because the resolution decreases, while the time to load first media frame obtains reasonably good scores and the playback cut-off ratio obtains high scores. Overall, it can be concluded that the DASH implementation of the video streaming client under test is able to adapt to the changing conditions of the network, maintaining an acceptable rate of video cut-offs, rebuffering times, and resource usage.

Table 6: Synthetic-MOS values per test case and scenario for the feature "Noninteractive Playback".

                                     AUE domain (Test Case AUE/CS/001)                   AEC domain (AEC/CS/001)   RES domain (RES/CS/001)
Scenario                             Time to load   Playback        Video                Average Power             Average CPU   Average RAM
                                     first media    Cut-off ratio   Resolution mode      Consumption               Usage         Usage
                                     frame
High Speed Direct Passenger          2.1            3.1             2.3                  4.7                       4.3           4.2
Suburban Festival                    3.8            4.7             3.1                  4.8                       4.3           4.1
Suburban Shopping Mall Busy Hours    3.7            3.7             1.3                  4.8                       4.4           4.1
Suburban Shopping Mall Off-Peak      3.6            3.1             2.3                  4.8                       4.3           4.1
Suburban Stadium                     3.8            2.9             2.1                  4.7                       4.4           4.1
Urban Driving Normal                 2.6            3.9             2.8                  4.7                       4.4           4.0
Urban Driving Traffic Jam            3.4            3.7             1.6                  4.8                       4.4           4.0
Urban Internet Cafe Busy Hours       3.8            3.7             1.9                  4.8                       4.4           4.0
Urban Internet Cafe Off Peak         3.8            3.1             2.3                  4.8                       4.3           4.0
Urban Office                         3.8            4.7             3.3                  4.8                       4.5           4.3
Urban Pedestrian                     3.9            2.6             2.0                  4.7                       4.4           4.0
Average                              3.5            3.6             2.3                  4.7                       4.4           4.1


The final score in each domain is obtained by averaging the synthetic-MOS values from all the tested network scenarios. Figure 7 shows the spider diagram for the three domains tested. In the User Experience domain, the score obtained is lower than in the other domains, due to the low synthetic-MOS values obtained for the video resolution.

The final synthetic MOS for the Content Distribution Streaming use case is obtained as a weighted average of the three domains, representing the overall QoE as perceived by the user. The final score for Exoplayer version 1.5.16 and the features tested (Noninteractive Playback and Play and Pause) is 4.2, which means that the low score obtained in the video resolution is compensated by the high scores in the other KPIs.

If an application under test has more than one use case, the next steps in the TRIANGLE mark approach would be the aggregation per use case and the aggregation over all use cases. The final score, the TRIANGLE mark, is an estimation of the overall QoE as perceived by the user.

In the current TRIANGLE implementation, the weights in all aggregations are the same. Further research is needed to appropriately define the weights of each domain and each use case in the overall score of the applications.

7. Conclusions

The main contribution of the TRIANGLE project is the provision of a framework that generalizes QoE computation and enables the execution of extensive and repeatable test campaigns to obtain meaningful QoE scores. The TRIANGLE project has also defined a methodology, based on the computation of KPIs, their transformation into synthetic-MOS values, and their aggregation over the different domains and use cases.

The TRIANGLE approach is a methodology flexible enough to generalize the computation of QoE for any application or service. The methodology has been validated by testing the DASH implementation in the Exoplayer app. To confirm the suitability of the weights used in the averaging process and of the interpolation parameters, as well as to verify the correlation of the obtained MOS with that scored by users, the authors have started experiments with real users, and the initial results are encouraging.

The process described produces a final TRIANGLE mark, a single quality score, which could eventually be used to certify applications, after achieving a consensus on the different values (weights, limits, etc.) to use in the process.

Data Availability

The methodology and results used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

[Figure 7 here: spider diagram of the synthetic-MOS scores over the three domains: AUE (User Experience), AEC (Energy Consumption), and RES (Device Resource Usage).]

Figure 7: Exoplayer synthetic-MOS values per domain.

Acknowledgments

The TRIANGLE project is funded by the European Union's Horizon 2020 Research and Innovation Programme (Grant Agreement no. 688712).

References

[1] ETSI, "Human factors, quality of experience (QoE) requirements for real-time communication services," Tech. Rep. 102 643, 2010.
[2] ITU-T, "P.10/G.100 (2006) Amendment 1 (01/07), New Appendix I - Definition of Quality of Experience (QoE)," 2007.
[3] F. Kozamernik, V. Steinmann, P. Sunna, and E. Wyckens, "SAMVIQ - A new EBU methodology for video quality evaluations in multimedia," SMPTE Motion Imaging Journal, vol. 114, no. 4, pp. 152-160, 2005.
[4] ITU-T, "G.107: The E-model, a computational model for use in transmission planning," 2015.
[5] J. De Vriendt, D. De Vleeschauwer, and D. C. Robinson, "QoE model for video delivered over an LTE network using HTTP adaptive streaming," Bell Labs Technical Journal, vol. 18, no. 4, pp. 45-62, 2014.
[6] S. Jelassi, G. Rubino, H. Melvin, H. Youssef, and G. Pujolle, "Quality of experience of VoIP service: a survey of assessment approaches and open issues," IEEE Communications Surveys & Tutorials, vol. 14, no. 2, pp. 491-513, 2012.
[7] M. Li, C.-L. Yeh, and S.-Y. Lu, "Real-time QoE monitoring system for video streaming services with adaptive media playout," International Journal of Digital Multimedia Broadcasting, vol. 2018, Article ID 2619438, 11 pages, 2018.
[8] S. Barakovic and L. Skorin-Kapov, "Survey and challenges of QoE management issues in wireless networks," Journal of Computer Networks and Communications, vol. 2013, Article ID 165146, 28 pages, 2013.
[9] ITU-T, "G.1030: Estimating end-to-end performance in IP networks for data applications," 2014.
[10] ITU-T, "G.1031: QoE factors in web-browsing," 2014.
[11] EU H2020 TRIANGLE Project, Deliverable D2.2, "Final report on the formalization of the certification process, requirements and use cases," 2017, https://www.triangle-project.eu/project-old/deliverables.
[12] Q. A. Chen, H. Luo, S. Rosen et al., "QoE doctor: diagnosing mobile app QoE with automated UI control and cross-layer analysis," in Proceedings of the Conference on Internet Measurement Conference (IMC '14), pp. 151-164, ACM, Vancouver, Canada, November 2014.
[13] M. A. Mehmood, A. Wundsam, S. Uhlig, D. Levin, N. Sarrar, and A. Feldmann, "QoE-Lab: towards evaluating quality of experience for future internet conditions," in Testbeds and Research Infrastructure: Development of Networks and Communities, T. Korakis, H. Li, P. Tran-Gia, and H.-S. Park, Eds., vol. 90 of TridentCom 2011, LNICST, pp. 286-301, Springer, Berlin, Germany, 2012.
[14] D. Levin, A. Wundsam, A. Mehmood, and A. Feldmann, "Berlin: the Berlin experimental router laboratory for innovative networking," in TridentCom 2010, T. Magedanz, A. Gavras, N. H. Thanh, and J. S. Chase, Eds., vol. 46 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 602-604, Springer, Heidelberg, Germany, 2011.
[15] K. De Moor, I. Ketyko, W. Joseph et al., "Proposed framework for evaluating quality of experience in a mobile, testbed-oriented living lab setting," Mobile Networks and Applications, vol. 15, no. 3, pp. 378-391, 2010.
[16] R. Sanchez-Iborra, M.-D. Cano, J. J. P. C. Rodrigues, and J. Garcia-Haro, "An experimental QoE performance study for the efficient transmission of high demanding traffic over an ad hoc network using BATMAN," Mobile Information Systems, vol. 2015, Article ID 217106, 14 pages, 2015.
[17] P. Oliver-Balsalobre, M. Toril, S. Luna-Ramírez, and R. García Garaluz, "A system testbed for modeling encrypted video-streaming service performance indicators based on TCP/IP metrics," EURASIP Journal on Wireless Communications and Networking, vol. 2017, no. 1, 2017.
[18] M. Solera, M. Toril, I. Palomo, G. Gomez, and J. Poncela, "A testbed for evaluating video streaming services in LTE," Wireless Personal Communications, vol. 98, no. 3, pp. 2753-2773, 2018.
[19] A. Alvarez, A. Díaz, P. Merino, and F. J. Rivas, "Field measurements of mobile services with Android smartphones," in Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC '12), pp. 105-109, Las Vegas, Nev, USA, January 2012.
[20] NGMN Alliance, "NGMN 5G white paper," 2015, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf.
[21] "Infrastructure and design for adaptivity and flexibility," in Mobile Information Systems, Springer, 2006.
[22] J. Nielsen, "Response times: the three important limits," in Usability Engineering, 1993.
[23] NGMN Alliance, "Definition of the testing framework for the NGMN 5G pre-commercial networks trials," 2018, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2018/180220_NGMN_PreCommTrials_Framework_definition_v1_0.pdf.
[24] 3GPP TS 26.247, "Transparent end-to-end Packet-switched Streaming Services (PSS); Progressive Download and Dynamic Adaptive Streaming over HTTP (3GP-DASH)," 2018.


Page 3: QoE Evaluation: The TRIANGLE Testbed Approach

Wireless Communications and Mobile Computing 3

velocity on-body sensors etc) and QoS parameters (CPUmemory signal strength throughput etc) and provides aninterface to collect user experience feedback A processingentity receives the (device and network) monitored data andanalyzes the incoming data The objective of this testbedinfrastructure is to study the effects of different networkparameters in the QoE in order to define new estimationmodels for QoE

In [16] the authors evaluated routing protocols BATMANand OLSR to support VoIP and video traffic from a QoSand QoE perspective The evaluation took place by runningexperiments in two different testbeds First experimentswere run in the Omnet++ simulator using the InetManetframework Second the same network topology and networkscenarios were deployed in the Emulab test bench a real(emulated) testbed and the same experiments were carriedout Finally the results of both testbeds (simulated andreal-emulated) were statistically compared in order to findinconsistencies The experiments in the simulated and emu-lated environments showed that BATMAN achieves betterthan OLSR and determined the relation between differentprotocol parameters and their performanceThese results canbe applied to implement network nodes that control in-stackprotocol parameters as a function of the observed traffic

In [17] a testbed to automatically extract a QoE modelof encrypted video streaming services was presented Thetestbed includes a software agent to be installed in theuser device which is able to reproduce the user interactionand collect the end-user application-level measurements thenetwork emulator NetEm which changes the link conditionsemulating the radio or core network and a Probe softwarewhich processes all the traffic at different levels computesthe TCPIP metrics and compares the end-user and networklevel measurements This testbed has been used to automat-ically construct the model (and validate the model) of thevideo performance of encrypted YouTube traffic over a Wi-Fi connection

More recently in [18] Solera et al presented a testbedfor evaluating video streaming services in LTE networks Inparticular the QoE of 3D video streaming services over LTEwas evaluated The testbed consists of a streaming serverthe NetEm network emulator and a streaming client Oneof the main contributions of the work is the extension ofNetEm to better model the characteristics of the packet delayin bursty services such as video streaming Previously torunning the experiments in the emulation-based testbedthe authors carried out a simulation campaign with anLTE simulator to obtain the configuration parameters ofNetEm for four different network scenarios These scenarioscombine different positions of the user in the cell anddifferent network loads From the review of these works itbecomes clear that the setup of a simulation or emulationframework for wireless or mobile environments requires inmany cases a deep understanding of the network scenariosTRIANGLE aims to reduce this effort by providing a set ofpreconfigured real network scenarios and the computationof the MOS in order to allow both researchers and appdevelopers to focus on the evaluation of new apps servicesand devices

3 TRIANGLE

The testbed the test methodology and the set of test caseshave been developed within the European funded TRIAN-GLE project Figure 1 shows the main functional blocks thatmake up the TRIANGLE testbed architecture

To facilitate the use of the TRIANGLE testbed fordifferent objectives (testing benchmarking and certifying)to remotely access the testbed and to gather and presentresults a web portal which offers an intuitive interfacehas been implemented It provides access to the testbedhiding unnecessary complexity to App developers Foradvanced users interested in deeper access to configurationparameters of the testbed elements or the test cases thetestbed offers a direct access to the Keysight TAP (TestingAutomation Platform) which is a programmable sequencerof actions with plugins that expose the configuration andcontrol of the instruments and tools integrated into thetestbed

In addition to the testbed itself TRIANGLE has devel-oped a test methodology and has implemented a set oftest cases which are made available through the portal Toachieve full test case automation all the testbed componentsare under the control of the testbed management frame-work which coordinates their configuration and executionprocesses the measurements made in each test case andcomputes QoE scores for the application tested

In addition as part of the testbed management frame-work each testbed component is controlled through a TAPdriver which serves as bridge between the TAP engine andthe actual component interface The configuration of thedifferent elements of the testbed is determined by the testcase to run within the set of test cases provided as partof TRIANGLE or the customized test cases built by usersThe testbed translates the test cases specific configurationssettings and actions into TAP commands that take care ofcommanding each testbed component

TRIANGLE test cases specify the measurements thatshould be collected to compute the KPI (Key PerformanceIndicators) of the feature under test Some measurementsare obtained directly from measurement instruments butothers require specific probes (either software or hardware)to help extract the specific measurements Software probesrunning on the same device (UE LTE User Equipment) thatthe application under test include DEKRA Agents and theTestelDroid [19] tool from UMA TRIANGLE also providesan instrumentation library so that app developers can delivermeasurement outputs which cannot otherwise be extractedand must be provided by the application itself Hardwareprobes include a power analyzer connected to the UE tomeasure power consumption and the radio access emulatorthat among others provides internal logs about the protocolexchange and radio interface low layers metrics

The radio access (LTE RAN) emulator plays a key role inthe TRIANGLE testbed The testbed RAN is provided by anoff-the-shelf E7515A UXM Wireless Test Set from Keysightan emulator that provides state-of-the-art test features Mostimportant the UXM also provides radio channel emulationfor the downlink radio channel

4 Wireless Communications and Mobile Computing

DBs

Interface and visualization (Portal)

Testbed management

Measurements and datacollections

UE

RAN

Transport

AppEPC Local application

serversApp App

TAP

Orcomposutor

Orchestration

Compositor ExecutorDBs

ETLframework

ETL modules

DEKRADriver

WebDriverDriver

AndroidTap Driver

EPCDriver

AppInstrumentat

ion TAPDriver

iOS TAPDriver

hellipTAP Driver

Figure 1 TRIANGLE testbed architecture

In order to provide an end-to-end system the testbedintegrates a commercial EPC (LTE Evolved Packet Core)from Polaris Networks which includes the main elements ofa standard 3GPP compliant LTE core network that is MME(Mobility Management Entity) SGW (Serving Gateway)PGW (Packet Gateway) HSS (Home Subscriber Server) andPCRF (Policy and Charging Rules Function) In additionthis EPC includes the EPDG (Evolved Packet Data Gateway)and ANDSF (Access Network Discovery and Session Func-tion) components for dual connectivity scenarios The RANemulator is connected to the EPC through the standard S1interface The testbed also offers the possibility of integratingartificial impairments in the interfaces between the corenetwork and the application servers

The Quamotion WebDriver another TRIANGLE ele-ment is able to automate user actions on both iOS andAndroid applications whether they are native hybrid offully web-based This tool is also used to prerecord the apprsquosuser flows which are needed to automate the otherwisemanual user actions in the test cases This completes the fullautomation operation

Finally the testbed also incorporates commercial mobiledevices (UEs) The devices are physically connected to thetestbed In order to preserve the radio conditions configuredat the radio access emulator the RAN emulator is cable con-ducted to the mobile device antenna connector To accuratelymeasure the power consumption theN6705B power analyzerdirectly powers the device Other measurement instrumentsmay be added in the future

4 TRIANGLE Approach

TheTRIANGLE testbed is an end-to-end framework devotedto testing and benchmarking mobile applications servicesand devices The idea behind the testing approach adoptedin the TRIANGLE testbed is to generalize QoE computationand provide a programmatic way of computing it Withthis approach the TRIANGLE testbed can accommodate thecomputation of the QoE for any application

The basic concept in TRIANGLErsquos approach to QoEevaluation is that the quality perceived by the user dependson many aspects (herein called domains) and that thisperception depends on its targeted use case For examplebattery life is critical for patient monitoring applications butless important in live streaming ones

To define the different 5G uses cases TRIANGLE basedits work in the Next Generation Mobile Network (NGMN)Alliance foundational White Paper which specifies theexpected services and network performance in future 5Gnetworks [20] More precisely the TRIANGLE project hasadopted a modular approach subdividing the so-calledldquoNGMN Use-Casesrdquo into blocks The name Use Case waskept in the TRIANGLE approach for describing the appli-cation service or vertical using the network services Thediversification of services expected in 5G requires a concretecategorization to have a sharp picture of what the user will beexpected to interact with This is essential for understandingwhich aspect of the QoE evaluation needs to be addressedThe final use cases categorization was defined in [11] andencompasses both the services normally accessible via mobile

Wireless Communications and Mobile Computing 5

Synthetic MOS

Test case

Scenario 1

Iteration 1

Meas 1

Meas

Meas P

Iteration

Meas 1

Meas

Meas P

Iteration N

Meas 1

Meas

Meas P

Scenario K

Iteration 1

Meas 1

Meas

Meas P

Iteration

Meas 1

Meas

Meas P

Iteration N

Meas 1

Meas

Meas P

KPI 1 Synthetic MOS 1

KPI Synthetic MOS

KPI R Synthetic MOS R

Aggregation Synthetic MOS Scenario 1

KPI 1 Synthetic MOS 1

KPI Synthetic MOS

KPI R Synthetic MOS R

Aggregation Synthetic MOS Scenario K

Figure 2 The process to obtain the synthetic-MOS score in a TRIANGLE test case

Table 1 Uses cases defined in the TRIANGLE project

Identifier Use CaseVR Virtual RealityGA GamingAR Augmented RealityCS Content Distribution Streaming ServicesLS Live Streaming ServicesSN Social NetworkingHS High Speed InternetPM Patient MonitoringES Emergency ServicesSM Smart MeteringSG Smart GridsCV Connected Vehicles

phones (UEs) and the ones that can be integrated in forexample gaming consoles advanced VR gear car units orIoT systems

The TRIANGLE domains group different aspects thatcan affect the final QoE perceived by the users The cur-rent testbed implementation supports three of the severaldomains that have been identified Apps User Experience(AUE) Apps Energy consumption (AEC) and ApplicationsDevice Resources Usage (RES)

Table 1 provides the use cases and Table 2 lists thedomains initially considered in TRIANGLE

To produce data to evaluate the QoE a series of testcases have been designed developed and implemented tobe run on the TRIANGLE testbed Obviously not all testcases are applicable to all applications under test becausenot all applications need or are designed to support all thefunctionalities that can be tested in the testbed In orderto automatically determine the test cases that are applicableto an application under test a questionnaire (identified as

features questionnaire in the portal) equivalent to the classi-cal conformance testing ICS (Implementation ConformanceStatement) has been developed and is accessible through theportal After filling the questionnaire the applicable test planthat is the test campaign with the list of applicable test casesis automatically generated

The sequence of user actions (type swipe tap etc) a userneeds to perform in the terminal (UE) to complete a task (egplay a video) is called the ldquoapp user flowrdquo In order to be ableto automatically run a test case the actual application userflow with the user actions a user would need to perform onthe phone to complete certain tasks defined in the test casealso has to be provided

Each test case univocally defines the conditions of execu-tion the sequence of actions the user would perform (ie theapp user flow) the sequence of actions that the elements ofthe testbed must perform the traffic injected the collectionof measurements to take and so forth In order to obtainstatistical significance each test case includes a numberof executions (iterations) under certain network conditions(herein called scenarios) Out of the various measurementsmade in the different iterations under any specific networkconditions (scenario) a number of KPIs (Key PerformanceIndicators) are computed The KPIs are normalized into astandard 1-to-5 scale as typically used in MOS scores andreferred to as synthetic-MOS a terminology that has beenadopted from previous works [7 21] The synthetic-MOSvalues are aggregated across network scenarios to produce anumber of intermediate synthetic-MOS scores which finallyare aggregated to obtain a synthetic-MOS score in each testcase (see Figure 2)

The process to obtain the final TRIANGLE mark issequential First for each domain a weighted average ofthe synthetic-MOS scores obtained in each test case inthe domain is calculated Next a weighted average of thesynthetic-MOS values in all the domains of a use case iscalculated to provide a single synthetic-MOS value per use


Table 2: TRIANGLE domains

Category                    Identifier   Domain
Applications                AUE          Apps User Experience
                            AEC          Apps Energy Consumption
                            RES          Device Resources Usage
                            REL          Reliability
                            NWR          Network Resources
Devices (Mobile Devices)    DEC          Energy Consumption
                            DDP          Data Performance
                            DRF          Radio Performance
                            DRA          User experience with reference apps
Devices (IoT Devices)       IDR          Reliability
                            IDP          Data Performance
                            IEC          Energy Consumption

[Figure 3: The process to obtain the TRIANGLE mark. Per-test-case synthetic-MOS scores (e.g., Test Case Domain A/Use Case X/01, .../02) are averaged into a synthetic MOS per domain and use case (Domain A/Use Case X, Domain B/Use Case X, ...), then into a synthetic MOS per use case (Use Case X, Use Case Y), and finally into the app's TRIANGLE mark.]

An application will usually be developed for one specific use case, as those defined in Table 1, but may be designed for more than one use case. In the latter case, a further weighted average is made with the synthetic-MOS scores obtained in each use case supported by the application. These sequential steps produce a single TRIANGLE mark, an overall quality score, as shown in Figure 3.

This approach provides a common framework for testing applications, for benchmarking applications, or even for certifying disparate applications. The overall process for an app that implements features of different use cases is depicted in Figure 3.

5. Details of the TRIANGLE QoE Computation

For each use case identified (see Table 1) and domain (see Table 2), a number of test cases have been developed within the TRIANGLE project. Each test case intends to test an individual feature, aspect, or behaviour of the application under test, as shown in Figure 4.

Each test case defines a number of measurements, and because the results of the measurements depend on many factors, they are not in general deterministic; thus each test case has been designed not to perform just one single measurement, but to run a number of iterations (N) of the same measurement. Out of those measurements, KPIs are computed. For example, if the time to load the first media frame is the measurement taken in one specific test case, the average user waiting time KPI can be calculated by computing the mean of the values across all iterations. In general, different use case-domain pairs have a different set of KPIs. The reader is encouraged to read [11] for further details about the terminology used in TRIANGLE.

Recommendation P.10/G.100 Amendment 1 "Definition of Quality of Experience" [2] notes that the overall acceptability may be influenced by user expectations and context. For the definition of the context, the technical specifications ITU-T G.1030 "Estimating end-to-end performance in IP networks for data applications" [9] and ITU-T G.1031 "QoE factors in web-browsing" [10] have been considered in TRIANGLE.


[Figure 4: QoE computation steps: (i) use cases; (ii) domains (Apps User Experience, Apps Energy Consumption, Apps/Device Resources); (iii) test case (e.g., AUE/CS/01); (iv) context (network scenarios); (v) test case execution (app user flow, measurements, KPIs, MOS); (vi) synthetic MOS. The example feature is noninteractive playback, with measurements such as time to load first media frame, playback cut-off, video resolution, and content stall.]

In particular, ITU-T G.1031 [10] identifies the following context influence factors: location (cafeteria, office, and home), interactivity (high-level interactivity versus low-level interactivity), task type (business, entertainment, etc.), and task urgency (urgent versus casual). User influence factors are, however, outside the scope of the ITU recommendation.

In the TRIANGLE project, the context information has been captured in the network scenarios defined (Urban - Internet Cafe Off Peak, Suburban - Shopping Mall Busy Hours, Urban - Pedestrian, Urban - Office, High Speed Train - Relay, etc.) and in the test cases specified in [11].

The test cases specify the conditions of the test, but also a sequence of actions that have to be executed by the application (app user flows) to test its features. For example, the test case that tests the "Play and Pause" functionality defines the app user flow shown in Figure 5.

The transformation of KPIs into QoE scores is the most challenging step in the TRIANGLE framework. The execution of the test cases will generate a significant amount of raw measurements about several aspects of the system. Specific KPIs can then be extracted through statistical analysis: mean, deviation, cumulative distribution function (CDF), or ratio.
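As a simple sketch of this extraction step, the snippet below computes one KPI of each kind (a mean, a deviation, one point of the empirical CDF, and a ratio) from invented per-iteration measurements:

```python
import statistics

# Invented raw measurements from N iterations of one test case.
time_to_first_frame = [1.8, 2.1, 1.9, 2.4, 2.0]    # seconds per iteration
pause_succeeded = [True, True, False, True, True]  # outcome per iteration

mean_wait = statistics.mean(time_to_first_frame)   # mean-type KPI
wait_dev = statistics.stdev(time_to_first_frame)   # deviation-type KPI

# One point of the empirical CDF: fraction of iterations at or below 2.0 s.
cdf_at_2s = sum(t <= 2.0 for t in time_to_first_frame) / len(time_to_first_frame)

# Ratio-type KPI: pause operation success rate.
pause_success_rate = sum(pause_succeeded) / len(pause_succeeded)
print(mean_wait, wait_dev, cdf_at_2s, pause_success_rate)  # 2.04 0.23... 0.6 0.8
```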

The KPIs will be individually interpolated in order to provide a common homogeneous comparison and aggregation space. The interpolation is based on the application of two functions, named Type I and Type II. By using the proposed two types of interpolations, the vast majority of KPIs can be translated into a normalized MOS-type metric (synthetic-MOS), easy to average in order to provide a simple unified evaluation.

Type I. This function performs a linear interpolation on the original data. The variables min_KPI and max_KPI are the worst and best known values of a KPI from a reference case. The function maps a value v of a KPI to v' (synthetic-MOS) in the range [1, 5] by computing the following formula:

\[ v' = \frac{v - min_{KPI}}{max_{KPI} - min_{KPI}} \cdot (5.0 - 1.0) + 1.0 \qquad (1) \]

[Figure 5: App user flow used in the "AUE/CS/02 Play and Pause" test case: perform the login step (if required) and wait for 10 seconds; start playing a 5-minute video for 10 seconds; pause the reproduction; resume the reproduction after 2 minutes and play until the end of the video.]

This function transforms a KPI to a synthetic-MOS value by applying a simple linear interpolation between the worst and best expected values from a reference case. If a future input case falls outside the data range of the KPI, the new value will be set to the extreme value min_KPI (if it is worse) or max_KPI (if it is better).


Table 3: AUE/CS/002 test case description

Identifier: AUE/CS/002 (App User Experience/Content Streaming/002)
Title: Play and pause
Objective: Measure the ability of the AUT to pause and then resume a media file
Applicability: (ICS_G ProductType = Application) AND (ICS_G UseCases includes CS) AND ICS_A CS_Pause
Initial Conditions: AUT in [AUT STARTED] mode (Note: defined in D2.2 [11], Appendix 4)
Steps:
(1) The Test System commands the AUT to replay the Application User Flow (the flow that presses first the Play button and later the Pause button).
(2) The Test System measures whether the pause operation was successful or not.
Postamble: (i) Execute the Postamble sequence (see Section 2.6 in D2.2 [11], Appendix 4).
Measurements (Raw):
(i) Playback Cut-off: probability that a successfully started stream reproduction is ended by a cause other than intentional termination by the user.
(ii) Pause Operation: whether the pause operation is successful or not.
(iii) Time to load first media frame (s) after resuming: the time elapsed since the user clicks the resume button until the media reproduction starts. (Note: for Exoplayer, the RESUME button is the PLAY button.)


Type II. This function performs a logarithmic interpolation and is inspired by the opinion model recommended by the ITU-T in [9] for a simple web search task. This function maps a value v of a KPI to v' (synthetic-MOS) in the range [1, 5] by computing the following formula:

\[ v' = \frac{5.0 - 1.0}{\ln((a \cdot worst_{KPI} + b)/worst_{KPI})} \cdot \left(\ln(v) - \ln(a \cdot worst_{KPI} + a)\right) + 5 \qquad (2) \]

The default values of a and b correspond to the simple web search task case (a = 0.003 and b = 0.12) [9, 22], and the worst value has been extracted from ITU-T G.1030. If during experimentation a future input case falls outside the data range of the KPI, the parameters a and b will be updated accordingly. Likewise, if through subjective experimentation other values are considered better adjustments for specific services, the function can be easily updated.
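The two interpolation functions are straightforward to transcribe; the sketch below implements equations (1) and (2) as printed above, with the out-of-range handling described for Type I applied as a clamp of the resulting score to [1, 5]:

```python
import math

def type_i(v, min_kpi, max_kpi):
    """Type I, equation (1): linear mapping of KPI value v onto the
    1-to-5 synthetic-MOS scale. min_kpi is the worst and max_kpi the
    best reference value (either may be numerically larger); values
    outside the reference range are clipped to the extremes."""
    mos = (v - min_kpi) / (max_kpi - min_kpi) * (5.0 - 1.0) + 1.0
    return min(max(mos, 1.0), 5.0)

def type_ii(v, worst_kpi, a=0.003, b=0.12):
    """Type II, equation (2): logarithmic mapping with the default a, b
    of the simple web search opinion model; result clamped to [1, 5]."""
    mos = ((5.0 - 1.0) / math.log((a * worst_kpi + b) / worst_kpi)
           * (math.log(v) - math.log(a * worst_kpi + a)) + 5.0)
    return min(max(mos, 1.0), 5.0)
```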

Once all KPIs are translated into synthetic-MOS values, they can be averaged with suitable weights. In the averaging process, the first step is to average over the network scenarios considered relevant for the use case, as shown in Figure 2. This provides the synthetic-MOS output value for the test case. If there is more than one test case per domain, which is generally the case, a weighted average is calculated in order to provide one synthetic-MOS value per domain, as depicted in Figure 3. The final step is to average the synthetic-MOS scores over all use cases supported by the application (see Figure 3). This provides the final score, that is, the TRIANGLE mark.
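A compact sketch of this bottom-up chain, assuming equal weights (as in the current implementation, see Section 6) and purely illustrative scores:

```python
from statistics import mean

def weighted_avg(scores, weights=None):
    """Weighted average of synthetic-MOS scores; equal weights if none given."""
    if weights is None:
        return mean(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Illustrative scores only. Scenario-level synthetic-MOS -> test case:
test_case_mos = weighted_avg([3.8, 3.5, 4.1])
# Test cases -> domain, domains -> use case, use cases -> TRIANGLE mark:
domain_mos = weighted_avg([test_case_mos, 4.0])
use_case_mos = weighted_avg([domain_mos, 4.7, 4.3])
triangle_mark = weighted_avg([use_case_mos])  # single use case here
```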

6. A Practical Case: Exoplayer under Test

For a better understanding of the complete process of obtaining the TRIANGLE mark for a specific application, the Exoplayer case is described in this section.

Table 4: Measurement points associated with test case AUE/CS/002

Measurement                       Measurement points
Time to load first media frame    Media File Playback - Start; Media File Playback - First Picture
Playback cut-off                  Media File Playback - Start; Media File Playback - End
Pause                             Media File Playback - Pause

This application only has one use case: content distribution streaming services (CS).

Exoplayer is an application-level media player for Android promoted by Google. It provides an alternative to Android's MediaPlayer API for playing audio and video, both locally and over the Internet. Exoplayer supports features not currently supported by Android's MediaPlayer API, including DASH and SmoothStreaming adaptive playbacks.

The TRIANGLE project has concentrated on testing just two of the Exoplayer features: "Noninteractive Playback" and "Play and Pause". These features result in 6 applicable test cases out of the test cases defined in TRIANGLE. These are test cases AUE/CS/001 and AUE/CS/002 in the App User Experience domain, test cases AEC/CS/001 and AEC/CS/002 in the App Energy Consumption domain, and test cases RES/CS/001 and RES/CS/002 in the Device Resources Usage domain.

The AUE/CS/002 "Play and Pause" test case description, belonging to the AUE domain, is shown in Table 3. The test case description specifies the test conditions, the generic app user flow, and the raw measurements which shall be collected during the execution of the test.

The TRIANGLE project also offers a library that includes the measurement points that should be inserted in the source code of the app to enable the collection of the specified measurements. Table 4 shows the measurement points required to compute the measurements specified in test case AUE/CS/002.


Table 5: Reference values for interpolation

Feature                    Domain  KPI                             Synthetic MOS Calculation  KPI min   KPI max
Non-Interactive Playback   AEC     Average power consumption       Type I                     1.0 W     0.8 W
Non-Interactive Playback   AUE     Time to load first media frame  Type II                    KPI worst = 20 ms
Non-Interactive Playback   AUE     Playback cut-off ratio          Type I                     50%       0%
Non-Interactive Playback   AUE     Video resolution                Type I                     240p      720p
Non-Interactive Playback   RES     Average CPU usage               Type I                     100%      16%
Non-Interactive Playback   RES     Average memory usage            Type I                     100%      40%
Play and Pause             AEC     Average power consumption       Type I                     1.0 W     0.8 W
Play and Pause             AUE     Pause operation success rate    Type I                     50%       100%
Play and Pause             RES     Average CPU usage               Type I                     100%      16%
Play and Pause             RES     Average memory usage            Type I                     100%      40%

The time to load first media picture measurement is obtained by subtracting the timestamp of the measurement point "Media File Playback - Start" from that of the measurement point "Media File Playback - First Picture".
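The sketch below, with invented timestamps, derives the AUE/CS/002 measurements of Table 4 from timestamped measurement-point events of this kind:

```python
# Invented timestamps (seconds) for the measurement points of Table 4.
events = {
    "Media File Playback - Start": 12.40,
    "Media File Playback - First Picture": 14.15,
    "Media File Playback - Pause": 25.00,
    "Media File Playback - End": 310.00,
}

# Time to load first media picture: First Picture minus Start.
time_to_first_picture = (events["Media File Playback - First Picture"]
                         - events["Media File Playback - Start"])  # 1.75 s

# Pause operation success: the Pause measurement point was recorded.
pause_ok = "Media File Playback - Pause" in events
```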

As specified in [11], all the scenarios defined are applicable to the content streaming use case. Therefore, test cases in the three domains currently supported by the testbed are executed in all the scenarios.

Once the test campaign has finished, the raw measurement results are processed to obtain the KPIs associated with each test case: average current consumption, average time to load first media frame, average CPU usage, and so forth. The processes applied are detailed in Table 5. Based on previous experiments performed by the authors, the behaviour of the time to load the first media frame KPI resembles the web response time KPI (i.e., the amount of time the user has to wait for the service), and thus, as recommended in the opinion model for web search introduced in [9], a logarithmic interpolation (Type II) has been used for this metric.

The results of the initial process, that is, the KPIs computation, are translated into synthetic-MOS values. To compute these values, reference benchmarking values for each of the KPIs need to be used, according to the normalization and interpolation process described in Section 5. Table 5 shows what has been currently used by TRIANGLE for the App User Experience domain, which is also used by NGMN as reference in their precommercial trials document [23].

For example, for the "time to load first media frame" KPI shown in Table 5, the type of aggregation applied is averaging, and the interpolation formula used is Type II.
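As a hedged usage example of the type_i/type_ii sketches given in Section 5, applied with the Table 5 references (the numbers are illustrative, and the unit assumed for the Type II observation is an assumption, since Table 5 only states the worst reference):

```python
# Video resolution (Type I): 240p worst, 720p best; 480p lands mid-scale.
mos_resolution = type_i(480, min_kpi=240, max_kpi=720)
# (480 - 240) / (720 - 240) * 4 + 1 = 3.0

# Time to load first media frame (Type II): worst reference from Table 5,
# default a and b; the unit of the observed value is assumed here.
mos_load_time = type_ii(v=1.5, worst_kpi=20.0)
```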

To achieve stable results, each test case is executed 10 times (10 iterations) in each network scenario. The synthetic-MOS value in each domain is calculated by averaging the measured synthetic-MOS values in the domain. For example, the synthetic-MOS value in the RES domain is obtained by averaging the synthetic-MOS values of "average CPU usage" and "average memory usage" from the two test cases.

Although Exoplayer supports several video streaming protocols, in this work only DASH [24] (Dynamic Adaptive Streaming over HTTP) has been tested. DASH clients should seamlessly adapt to changing network conditions by making decisions on which video segment to download (videos are encoded at multiple bitrates). The Exoplayer's default adaptation algorithm is basically throughput-based, and some parameters control how often and when switching can occur.
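As a rough illustration of a throughput-based rule (this is not ExoPlayer's actual algorithm; the bitrate ladder and safety margin below are invented), a client could pick the highest representation whose bitrate fits under a fraction of the measured throughput:

```python
# Invented bitrate ladder: (bitrate in bit/s, advertised resolution).
LADDER = [(350_000, "240p"), (800_000, "360p"),
          (1_700_000, "480p"), (3_000_000, "720p")]

def select_representation(measured_bps, safety=0.75):
    """Pick the highest-bitrate representation below a safety fraction
    of the measured throughput (simplified throughput-based rule)."""
    usable = measured_bps * safety
    chosen = LADDER[0]  # fall back to the lowest representation
    for bitrate, resolution in LADDER:
        if bitrate <= usable:
            chosen = (bitrate, resolution)
    return chosen

print(select_representation(2_500_000))  # (1700000, '480p')
```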

[Figure 6: Video Resolution evolution in the Driving Urban Normal scenario: horizontal resolution (0 to 1200) plotted over the capture timestamps.]


During the testing, the testbed was configured with the different network scenarios defined in [11]. In these scenarios, the network configuration changes dynamically, following a random pattern, resulting in different maximum throughput rates. The expected behaviour of the application under test is that the video streaming client adapts to the available throughput by decreasing or increasing the resolution of the received video. Figure 6 depicts how the client effectively adapts to the channel conditions.

However, the objective of the testing carried out in the TRIANGLE testbed is not just to verify that the video streaming client actually adapts to the available maximum throughput, but also to check whether this adaptation improves the users' experience quality.

Table 6 shows a summary of the synthetic-MOS values obtained per scenario in one test case of each domain. The scores obtained in the RES and AEC domains are always high. In the AUE domain, the synthetic MOS associated with the video resolution shows low scores in some of the scenarios, because the resolution decreases; reasonably good scores are obtained in the time to load first media, and high scores in the playback cut-off ratio. Overall, it can be concluded that the DASH implementation of the video streaming client under test is able to adapt to the changing conditions of the network, maintaining an acceptable rate of video cut-off, rebuffering times, and resources usage.


Table 6: Synthetic-MOS values per test case and scenario for the feature "Noninteractive Playback". The AUE columns come from test case AUE/CS/001 (time to load first media frame; playback cut-off ratio; video resolution mode), the AEC column from AEC/CS/001 (average power consumption), and the RES columns from RES/CS/001 (average CPU usage; average RAM usage).

Scenario                            Load time  Cut-off  Resolution  Power  CPU  RAM
HighSpeed Direct Passenger          2.1        3.1      2.3         4.7    4.3  4.2
Suburban Festival                   3.8        4.7      3.1         4.8    4.3  4.1
Suburban Shopping Mall Busy Hours   3.7        3.7      1.3         4.8    4.4  4.1
Suburban Shopping Mall Off-Peak     3.6        3.1      2.3         4.8    4.3  4.1
Suburban Stadium                    3.8        2.9      2.1         4.7    4.4  4.1
Urban Driving Normal                2.6        3.9      2.8         4.7    4.4  4.0
Urban Driving Traffic Jam           3.4        3.7      1.6         4.8    4.4  4.0
Urban Internet Cafe Busy Hours      3.8        3.7      1.9         4.8    4.4  4.0
Urban Internet Cafe Off Peak        3.8        3.1      2.3         4.8    4.3  4.0
Urban Office                        3.8        4.7      3.3         4.8    4.5  4.3
Urban Pedestrian                    3.9        2.6      2.0         4.7    4.4  4.0
Average (all scenarios)             3.5        3.6      2.3         4.7    4.4  4.1


The final score in each domain is obtained by averaging the synthetic-MOS values from all the tested network scenarios. Figure 7 shows the spider diagram for the three domains tested. In the User Experience domain, the score obtained is lower than in the other domains, due to the low synthetic-MOS values obtained for the video resolution.

The final synthetic MOS for the use case Content Distribution Streaming is obtained as a weighted average of the three domains, representing the overall QoE as perceived by the user. The final score for Exoplayer version 1.5.16 and the features tested (Noninteractive Playback and Play and Pause) is 4.2, which means that the low score obtained in the video resolution is compensated by the high scores in the other KPIs.

If an application under test has more than one use case, the next steps in the TRIANGLE mark approach would be the aggregation per use case and the aggregation over all use cases. The final score, the TRIANGLE mark, is an estimation of the overall QoE as perceived by the user.

In the current TRIANGLE implementation, the weights in all aggregations are the same. Further research is needed to appropriately define the weights of each domain and each use case in the overall score of the applications.

7. Conclusions

The main contribution of the TRIANGLE project is the provision of a framework that generalizes QoE computation and enables the execution of extensive and repeatable test campaigns to obtain meaningful QoE scores. The TRIANGLE project has also defined a methodology, based on the transformation of KPIs into synthetic-MOS values and their aggregation over the different domains and use cases.

The TRIANGLE approach is a methodology flexible enough to generalize the computation of QoE for any application/service. The methodology has been validated by testing the DASH implementation in the Exoplayer app. To confirm the suitability of the weights used in the averaging process and the interpolation parameters, as well as to verify the correlation of the obtained MOS with that scored by users, the authors have started experiments with real users, and initial results are encouraging.

The process described produces a final TRIANGLE mark, a single quality score, which could eventually be used to certify applications, after achieving a consensus on the different values of the process (weights, limits, etc.) to use.

Data Availability

The methodology and results used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


[Figure 7: Exoplayer synthetic-MOS values per domain (spider diagram): Device Resource Usage, User Experience (AUE), and Energy Consumption (AEC).]

Acknowledgments

The TRIANGLE project is funded by the European Union's Horizon 2020 Research and Innovation Programme (Grant Agreement no. 688712).

References

[1] ETSI, "Human factors; quality of experience (QoE) requirements for real-time communication services," Tech. Rep. 102 643, 2010.

[2] ITU-T, "P.10/G.100 (2006) Amendment 1 (01/07): New Appendix I - Definition of quality of experience (QoE)," 2007.

[3] F. Kozamernik, V. Steinmann, P. Sunna, and E. Wyckens, "SAMVIQ - A new EBU methodology for video quality evaluations in multimedia," SMPTE Motion Imaging Journal, vol. 114, no. 4, pp. 152-160, 2005.

[4] ITU-T, "G.107: The E-model, a computational model for use in transmission planning," 2015.

[5] J. De Vriendt, D. De Vleeschauwer, and D. C. Robinson, "QoE model for video delivered over an LTE network using HTTP adaptive streaming," Bell Labs Technical Journal, vol. 18, no. 4, pp. 45-62, 2014.

[6] S. Jelassi, G. Rubino, H. Melvin, H. Youssef, and G. Pujolle, "Quality of experience of VoIP service: A survey of assessment approaches and open issues," IEEE Communications Surveys & Tutorials, vol. 14, no. 2, pp. 491-513, 2012.

[7] M. Li, C.-L. Yeh, and S.-Y. Lu, "Real-time QoE monitoring system for video streaming services with adaptive media playout," International Journal of Digital Multimedia Broadcasting, vol. 2018, Article ID 2619438, 11 pages, 2018.

[8] S. Barakovic and L. Skorin-Kapov, "Survey and challenges of QoE management issues in wireless networks," Journal of Computer Networks and Communications, vol. 2013, Article ID 165146, 28 pages, 2013.

[9] ITU-T, "G.1030: Estimating end-to-end performance in IP networks for data applications," 2014.

[10] ITU-T, "G.1031: QoE factors in web-browsing," 2014.

[11] EU H2020 TRIANGLE Project, Deliverable D2.2: Final report on the formalization of the certification process, requirements and use cases, 2017, https://www.triangle-project.eu/project-old/deliverables.

[12] Q. A. Chen, H. Luo, S. Rosen et al., "QoE doctor: Diagnosing mobile app QoE with automated UI control and cross-layer analysis," in Proceedings of the Internet Measurement Conference (IMC '14), pp. 151-164, ACM, Vancouver, Canada, November 2014.

[13] M. A. Mehmood, A. Wundsam, S. Uhlig, D. Levin, N. Sarrar, and A. Feldmann, "QoE-Lab: Towards evaluating quality of experience for future internet conditions," in Testbeds and Research Infrastructure. Development of Networks and Communities, T. Korakis, H. Li, P. Tran-Gia, and H.-S. Park, Eds., vol. 90 of TridentCom 2011, LNICST, pp. 286-301, Springer, Berlin, Germany, 2012.

[14] D. Levin, A. Wundsam, A. Mehmood, and A. Feldmann, "Berlin: The Berlin experimental router laboratory for innovative networking," in TridentCom 2010, T. Magedanz, A. Gavras, N. H. Thanh, and J. S. Chase, Eds., vol. 46 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 602-604, Springer, Heidelberg, Germany, 2011.

[15] K. De Moor, I. Ketyko, W. Joseph et al., "Proposed framework for evaluating quality of experience in a mobile, testbed-oriented living lab setting," Mobile Networks and Applications, vol. 15, no. 3, pp. 378-391, 2010.

[16] R. Sanchez-Iborra, M.-D. Cano, J. J. P. C. Rodrigues, and J. Garcia-Haro, "An experimental QoE performance study for the efficient transmission of high demanding traffic over an ad hoc network using BATMAN," Mobile Information Systems, vol. 2015, Article ID 217106, 14 pages, 2015.

[17] P. Oliver-Balsalobre, M. Toril, S. Luna-Ramírez, and R. García Garaluz, "A system testbed for modeling encrypted video-streaming service performance indicators based on TCP/IP metrics," EURASIP Journal on Wireless Communications and Networking, vol. 2017, no. 1, 2017.

[18] M. Solera, M. Toril, I. Palomo, G. Gomez, and J. Poncela, "A testbed for evaluating video streaming services in LTE," Wireless Personal Communications, vol. 98, no. 3, pp. 2753-2773, 2018.

[19] A. Alvarez, A. Díaz, P. Merino, and F. J. Rivas, "Field measurements of mobile services with Android smartphones," in Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC '12), pp. 105-109, Las Vegas, Nev, USA, January 2012.

[20] NGMN Alliance, "NGMN 5G white paper," 2015, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf.

[21] "Infrastructure and design for adaptivity and flexibility," in Mobile Information Systems, Springer, 2006.

[22] J. Nielsen, "Response times: The three important limits," in Usability Engineering, 1993.

[23] NGMN Alliance, "Definition of the testing framework for the NGMN 5G pre-commercial networks trials," 2018, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2018/180220_NGMN_PreCommTrials_Framework_definition_v1_0.pdf.

[24] 3GPP TS 26.246, "Transparent end-to-end packet-switched streaming services (PSS); Progressive download and dynamic adaptive streaming over HTTP (3GP-DASH)," 2018.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 4: QoE Evaluation: The TRIANGLE Testbed Approach

4 Wireless Communications and Mobile Computing

DBs

Interface and visualization (Portal)

Testbed management

Measurements and datacollections

UE

RAN

Transport

AppEPC Local application

serversApp App

TAP

Orcomposutor

Orchestration

Compositor ExecutorDBs

ETLframework

ETL modules

DEKRADriver

WebDriverDriver

AndroidTap Driver

EPCDriver

AppInstrumentat

ion TAPDriver

iOS TAPDriver

hellipTAP Driver

Figure 1 TRIANGLE testbed architecture

In order to provide an end-to-end system the testbedintegrates a commercial EPC (LTE Evolved Packet Core)from Polaris Networks which includes the main elements ofa standard 3GPP compliant LTE core network that is MME(Mobility Management Entity) SGW (Serving Gateway)PGW (Packet Gateway) HSS (Home Subscriber Server) andPCRF (Policy and Charging Rules Function) In additionthis EPC includes the EPDG (Evolved Packet Data Gateway)and ANDSF (Access Network Discovery and Session Func-tion) components for dual connectivity scenarios The RANemulator is connected to the EPC through the standard S1interface The testbed also offers the possibility of integratingartificial impairments in the interfaces between the corenetwork and the application servers

The Quamotion WebDriver another TRIANGLE ele-ment is able to automate user actions on both iOS andAndroid applications whether they are native hybrid offully web-based This tool is also used to prerecord the apprsquosuser flows which are needed to automate the otherwisemanual user actions in the test cases This completes the fullautomation operation

Finally the testbed also incorporates commercial mobiledevices (UEs) The devices are physically connected to thetestbed In order to preserve the radio conditions configuredat the radio access emulator the RAN emulator is cable con-ducted to the mobile device antenna connector To accuratelymeasure the power consumption theN6705B power analyzerdirectly powers the device Other measurement instrumentsmay be added in the future

4 TRIANGLE Approach

TheTRIANGLE testbed is an end-to-end framework devotedto testing and benchmarking mobile applications servicesand devices The idea behind the testing approach adoptedin the TRIANGLE testbed is to generalize QoE computationand provide a programmatic way of computing it Withthis approach the TRIANGLE testbed can accommodate thecomputation of the QoE for any application

The basic concept in TRIANGLErsquos approach to QoEevaluation is that the quality perceived by the user dependson many aspects (herein called domains) and that thisperception depends on its targeted use case For examplebattery life is critical for patient monitoring applications butless important in live streaming ones

To define the different 5G uses cases TRIANGLE basedits work in the Next Generation Mobile Network (NGMN)Alliance foundational White Paper which specifies theexpected services and network performance in future 5Gnetworks [20] More precisely the TRIANGLE project hasadopted a modular approach subdividing the so-calledldquoNGMN Use-Casesrdquo into blocks The name Use Case waskept in the TRIANGLE approach for describing the appli-cation service or vertical using the network services Thediversification of services expected in 5G requires a concretecategorization to have a sharp picture of what the user will beexpected to interact with This is essential for understandingwhich aspect of the QoE evaluation needs to be addressedThe final use cases categorization was defined in [11] andencompasses both the services normally accessible via mobile

Wireless Communications and Mobile Computing 5

Synthetic MOS

Test case

Scenario 1

Iteration 1

Meas 1

Meas

Meas P

Iteration

Meas 1

Meas

Meas P

Iteration N

Meas 1

Meas

Meas P

Scenario K

Iteration 1

Meas 1

Meas

Meas P

Iteration

Meas 1

Meas

Meas P

Iteration N

Meas 1

Meas

Meas P

KPI 1 Synthetic MOS 1

KPI Synthetic MOS

KPI R Synthetic MOS R

Aggregation Synthetic MOS Scenario 1

KPI 1 Synthetic MOS 1

KPI Synthetic MOS

KPI R Synthetic MOS R

Aggregation Synthetic MOS Scenario K

Figure 2 The process to obtain the synthetic-MOS score in a TRIANGLE test case

Table 1 Uses cases defined in the TRIANGLE project

Identifier Use CaseVR Virtual RealityGA GamingAR Augmented RealityCS Content Distribution Streaming ServicesLS Live Streaming ServicesSN Social NetworkingHS High Speed InternetPM Patient MonitoringES Emergency ServicesSM Smart MeteringSG Smart GridsCV Connected Vehicles

phones (UEs) and the ones that can be integrated in forexample gaming consoles advanced VR gear car units orIoT systems

The TRIANGLE domains group different aspects thatcan affect the final QoE perceived by the users The cur-rent testbed implementation supports three of the severaldomains that have been identified Apps User Experience(AUE) Apps Energy consumption (AEC) and ApplicationsDevice Resources Usage (RES)

Table 1 provides the use cases and Table 2 lists thedomains initially considered in TRIANGLE

To produce data to evaluate the QoE a series of testcases have been designed developed and implemented tobe run on the TRIANGLE testbed Obviously not all testcases are applicable to all applications under test becausenot all applications need or are designed to support all thefunctionalities that can be tested in the testbed In orderto automatically determine the test cases that are applicableto an application under test a questionnaire (identified as

features questionnaire in the portal) equivalent to the classi-cal conformance testing ICS (Implementation ConformanceStatement) has been developed and is accessible through theportal After filling the questionnaire the applicable test planthat is the test campaign with the list of applicable test casesis automatically generated

The sequence of user actions (type swipe tap etc) a userneeds to perform in the terminal (UE) to complete a task (egplay a video) is called the ldquoapp user flowrdquo In order to be ableto automatically run a test case the actual application userflow with the user actions a user would need to perform onthe phone to complete certain tasks defined in the test casealso has to be provided

Each test case univocally defines the conditions of execu-tion the sequence of actions the user would perform (ie theapp user flow) the sequence of actions that the elements ofthe testbed must perform the traffic injected the collectionof measurements to take and so forth In order to obtainstatistical significance each test case includes a numberof executions (iterations) under certain network conditions(herein called scenarios) Out of the various measurementsmade in the different iterations under any specific networkconditions (scenario) a number of KPIs (Key PerformanceIndicators) are computed The KPIs are normalized into astandard 1-to-5 scale as typically used in MOS scores andreferred to as synthetic-MOS a terminology that has beenadopted from previous works [7 21] The synthetic-MOSvalues are aggregated across network scenarios to produce anumber of intermediate synthetic-MOS scores which finallyare aggregated to obtain a synthetic-MOS score in each testcase (see Figure 2)

The process to obtain the final TRIANGLE mark issequential First for each domain a weighted average ofthe synthetic-MOS scores obtained in each test case inthe domain is calculated Next a weighted average of thesynthetic-MOS values in all the domains of a use case iscalculated to provide a single synthetic-MOS value per use

6 Wireless Communications and Mobile Computing

Table 2 TRIANGLE domains

Category Identifier Domain

Applications

AUE Apps User experienceAEC Apps Energy consumptionRES Device Resources UsageREL ReliabilityNWR Network Resources

Devices

Mobile Devices

DEC Energy ConsumptionDDP Data PerformanceDRF Radio PerformanceDRA User experience with reference apps

IoT DevicesIDR ReliabilityIDP Data PerformanceIEC Energy consumption

Synthetic MOS Domain A Use Case X

Synthetic MOS Domain B Use Case X

Synthetic MOS Use Case X

Synthetic MOS Use Case Y

TRIANGLE MARK

App

Synthetic MOS Test CaseDomain AUse Case X01

Synthetic MOS Test CaseDomain AUse Case X02

Synthetic MOS Test CaseDomain BUse Case X01

Synthetic MOS Test CaseDomain BUse Case X02

Synthetic MOS Test CaseDomain AUse Case Y01 Synthetic MOS Domain A Use Case Y

Synthetic MOS Test CaseDomain BUse Case Y01 Synthetic MOS Domain B Use Case Y

Figure 3 The process to obtain the TRIANGLE mark

case An application will usually be developed for one specificuse case as those defined in Table 1 but may be designed formore than one use case In the latter case a further weightedaverage is made with the synthetic-MOS scores obtained ineach use case supported by the application These sequentialsteps produce a single TRIANGLE mark an overall qualityscore as shown in Figure 3

This approach provides a common framework for testingapplications for benchmarking applications or even forcertifying disparate applications The overall process for anapp that implements features of different use cases is depictedin Figure 3

5 Details of the TRIANGLE QoE Computation

For each use case identified (see Table 1) and domain (seeTable 2) a number of test cases have been developed withinthe TRIANGLE project Each test case intends to test an

individual feature aspect or behaviour of the applicationunder test as shown in Figure 4

Each test case defines a number of measurements andbecause the results of the measurements depend on manyfactors they are not in general deterministic and thuseach test case has been designed not to perform just onesingle measurement but to run a number of iterations (N)of the same measurement Out of those measurements KPIsare computed For example if the time to load the firstmedia frame is the measurement taken in one specific testcase the average user waiting time KPI can be calculated bycomputing the mean of the values across all iterations Ingeneral different use case-domain pairs have a different set ofKPIsThe reader is encouraged to read [11] for further detailsabout the terminology used in TRIANGLE

Recommendation P10G100 Amendment 1 Definition ofQuality of Experience [2] notes that the overall acceptabilitymay be influenced by user expectations and context Forthe definition of the context technical specifications ITU-T

Wireless Communications and Mobile Computing 7

Feature Non-interactiveplayback

bullTime to load firstmedia frame

bullPlayback Cut-off

bullVideo resolution

bullContent Stall

i) Uses cases

vi) Synthetic MOS iv) Context

Networkscenarios

App User Flow Measurements KPIs MOS

ii) Domains

Apps Energy Consumption

Apps Device Resources

Apps User Experience

Mea

sure

men

ts

v) Test case execution

iii)Test caseAUECS01

Figure 4 QoE computation steps

G1030 ldquoEstimating end-to-end performance in IP networksfor data applicationsrdquo [9] and ITU-T G1031 ldquoQoE factors inweb-browsingrdquo [10] have been considered in TRIANGLEIn particular ITU-T G1031 [10] identifies the following con-text influence factors location (cafeteria office and home)interactivity (high-level interactivity versus low-level inter-activity) task type (business entertainment etc) and taskurgency (urgent versus casual) Userrsquos influence factors arehowever outside of the scope of the ITU recommendation

In the TRIANGLE project the context information hasbeen captured in the networks scenarios defined (Urban -Internet Cafe Off Peak Suburban - Shopping Mall BusyHours Urban ndash Pedestrian Urban ndash Office High speed trainndash Relay etc) and in the test cases specified in [11]

The test cases specify the conditions of the test butalso a sequence of actions that have to be executed by theapplication (app user flows) to test its features For examplethe test case that tests the ldquoPlay and Pauserdquo functionalitydefines the app user flow shown in Figure 5

The transformation of KPIs into QoE scores is the mostchallenging step in the TRIANGLE framework The execu-tion of the test cases will generate a significant amount of rawmeasurements about several aspects of the system SpecificKPIs can then be extracted through statistical analysis meandeviation cumulative distribution function (CDF) or ratio

TheKPIs will be individually interpolated in order to pro-vide a common homogeneous comparison and aggregationspace The interpolation is based on the application of twofunctions named Type I and Type II By using the proposedtwo types of interpolations the vast majority of KPIs can betranslated into normalized MOS-type of metric (synthetic-MOS) easy to be averaged in order to provide a simpleunified evaluation

Type I This function performs a linear interpolation on theoriginal data The variables 119898119894119899

119870119875119868and119898119886119909

119870119875119868are the worst

and best known values of a KPI from a reference case The

Perform login step (ifrequired) and wait for

10 seconds

Start playing avideo of 5

minutes during10 seconds

Pause thereproduction

Resume thereproduction after

2 minutes anduntil the end of

the video

Figure 5 App user flow used in the ldquoAUECS02 Play and Pauserdquotest case

function maps a value v of a KPI to vrsquo (synthetic-MOS) inthe range [1-to-5] by computing the following formula

V1015840 =V minus 119898119894119899

119870119875119868

119898119886119909119870119875119868minus 119898119894119899

119870119875119868

(50 minus 10) + 10 (1)

This function transforms a KPI to a synthetic-MOS value byapplying a simple linear interpolation between the worst andbest expected values from a reference case If a future inputcase falls outside the data range of the KPI the new value will

8 Wireless Communications and Mobile Computing

Table 3 AUECS002 test case description

Identifier AUECS002 (App User ExperienceContent Streaming002)Title Play and pauseObjective Measure the ability of the AUT to pause and the resume a media fileApplicability (ICSG ProductType = Application) AND (ICSG UseCases includes CS) AND ICSA CSPauseInitial Conditions AUT in in [AUT STARTED] mode (Note Defined in D22 [11] Appendix 4)

Steps(1) The Test System commands the AUT to replay the Application User Flow (Application User Flow that

presses first the Play button and later the Pause button)(2) The Test System measures whether pause operation was successful or not

Postamble (i) Execute the Postamble sequence (see section 26 in D22 [11] Appendix 4)

Measurements (Raw)

(i) Playback Cut-off Probability that successfully started stream reproduction is ended by a cause other thanthe intentional termination by the user

(ii) Pause Operation Whether pause operation is successful or not(iii) Time to load first media frame (s) after resuming The time elapsed since the user clicks resume button

until the media reproduction starts(Note For Exoplayer the RESUME button is the PLAY button)

be set to the extreme value minKPI (if it is worse) or maxKPI(if it is better)

Type II This function performs a logarithmic interpolationand is inspired on the opinion model recommended by theITU-T in [9] for a simple web search taskThis function mapsa value v of a KPI to vrsquo (synthetic-MOS) in the range [1-to-5]by computing the following formula

V1015840 =50 minus 10

ln ((119886 lowast 119908119900119903119904119905119870119875119868+ 119887) 119908119900119903119904119905

119870119875119868)

∙ (ln (V) minus ln (119886 lowast 119908119900119903119904119905119870119875119868+ 119886)) + 5

(2)

The default values of 119886 and 119887 correspond to the simple websearch task case (119886 = 0003 and 119887 = 012) [9 22] and theworst value has been extracted from the ITU-T G1030 Ifduring experimentation a future input case falls outside thedata range of the KPI the parameters 119886 and 119887will be updatedaccordingly Likewise if through subjective experimentationother values are considered better adjustments for specificservices the function can be easily updated

Once all KPIs are translated into synthetic-MOS valuesthey can be averaged with suitable weights In the averagingprocess the first step is to average over the network scenariosconsidered relevant for the use case as shown in Figure 2This provides the synthetic-MOS output value for the testcase If there is more than one test case per domain which isgenerally the case a weighted average is calculated in order toprovide one synthetic-MOS value per domain as depicted inFigure 3The final step is to average the synthetic-MOS scoresover all use cases supported by the application (see Figure 3)This provides the final score that is the TRIANGLE mark

6 A Practical Case Exoplayer under Test

For better understanding the complete process of obtainingtheTRIANGLEmark for a specific application the Exoplayer

Table4Measurement points associatedwith test caseAUECS002

Measurements Measurement points

Time to load first media frame Media File Playback - StartMedia File Playback - First Picture

Playback cut-off Media File Playback - StartMedia File Playback - End

Pause Media File Playback - Pause

is described in this section This application only has one usecase content distribution streaming services (CS)

Exoplayer is an application levelmedia player forAndroidpromoted by Google It provides an alternative to AndroidrsquosMediaPlayer API for playing audio and video both locally andover the Internet Exoplayer supports features not currentlysupported by Androidrsquos MediaPlayer API including DASHand SmoothStreaming adaptive playbacks

The TRIANGLE project has concentrated in testing justtwo of the Exoplayer features ldquoNoninteractive Playbackrdquoand ldquoPlay and Pauserdquo These features result in 6 test casesapplicable out of the test cases defined in TRIANGLETheseare test cases AUECS001 and AUECS002 in the App UserExperience domain test casesAECCS001 andAECCS002in the App Energy Consumption domain and test casesRESCS001 and RESCS002 in the Device Resources Usagedomain

The AUECS002 ldquoPlay and Pauserdquo test case descriptionbelonging to the AUE domain is shown in Table 3 The testcase description specifies the test conditions the generic appuser flow and the rawmeasurements which shall be collectedduring the execution of the test

The TRIANGLE project also offers a library that includesthe measurement points that should be inserted in thesource code of the app for enabling the collection of themeasurements specified Table 4 shows the measurementpoints required to compute the measurements specified intest case AUECS002

Wireless Communications and Mobile Computing 9

Table 5 Reference values for interpolation

Feature Domain KPI Synthetic MOS Calculation KPI min KPI maxNon-Interactive Playback AEC Average power consumption Type I 10 W 08 WNon-Interactive Playback AUE Time to load first media frame Type II KPI worst=20 msNon-Interactive Playback AUE Playback cut-off ratio Type I 50 0Non-Interactive Playback AUE Video resolution Type I 240p 720pNon-Interactive Playback RES Average CPU usage Type I 100 16Non-Interactive Playback RES Average memory usage Type I 100 40Play and Pause AEC Average power consumption Type I 10 W 08 WPlay and Pause AUE Pause operation success rate Type I 50 100Play and Pause RES Average CPU usage Type I 100 16Play and Pause RES Average memory usage Type I 100 40

The time to load first media picture measurement isobtained subtracting the timestamp of the measurementpoint ldquoMedia File Playback ndash Startrdquo from the measurementpoint ldquoMedia File Playback ndash First Picturerdquo

As specified in [11] all scenarios defined are applicableto the content streaming use case Therefore test cases inthe three domains currently supported by the testbed areexecuted in all the scenarios

Once the test campaign has finished the raw measure-ment results are processed to obtain the KPIs associated witheach test case average current consumption average time toload first media frame average CPU usage and so forth Theprocesses applied are detailed in Table 5 Based on previousexperiments performed by the authors the behaviour of thetime to load the first media frame KPI resembles the webresponse time KPI (ie the amount of time the user hasto wait for the service) and thus as recommended in theopinionmodel forweb search introduced in [9] a logarithmicinterpolation (type II) has been used for this metric

The results of the initial process that is the KPIs compu-tation are translated into synthetics-MOS values To computethese values reference benchmarking values for each of theKPIs need to be used according to the normalization andinterpolation process described in Section 5 Table 5 showswhat has been currently used by TRIANGLE for the AppUser Experience domain which is also used by NGMN asreference in their precommercial Trials document [23]

For example for the ldquotime to load first media framerdquo KPIshown in Table 5 the type of aggregation applied is averagingand the interpolation formula used is Type II

To achieve stable results each test case is executed 10times (10 iterations) in each network scenario The synthetic-MOS value in each domain is calculated by averaging themeasured synthetic-MOS values in the domain For examplesynthetic-MOS value is the RES domain obtained by aver-aging the synthetic-MOS value of ldquoaverage CPU usagerdquo andldquoaverage memory usagerdquo from the two test cases

Although Exoplayer supports several video streamingprotocols in this work only DASH [24] (Dynamic AdaptiveStreaming over HTTP) has been tested DASH clients shouldseamlessly adapt to changing network conditions by makingdecisions on which video segment to download (videosare encoded at multiple bitrates) The Exoplayerrsquos default

000

000

0

001

728

0

003

456

0

005

184

0

010

912

0

012

640

0

014

368

0

020

096

0

Timestamp

Video Resolution

0

200

400

600

800

1000

1200

Hor

izon

tal R

esol

utio

n

Figure 6 Video Resolution evolution in the Driving Urban Normalscenario

adaptation algorithm is basically throughput-based and someparameters control how often and when switching can occur

During the testing the testbed was configured with thedifferent network scenarios defined in [11] In these scenariosthe network configuration changes dynamically following arandom pattern resulting in different maximum throughputrates The expected behaviour of the application under testis that the video streaming client adapts to the availablethroughput by decreasing or increasing the resolution of thereceived video Figure 6 depicts how the client effectivelyadapts to the channel conditions

However the objective of the testing carried out in theTRIANGE testbed is not just to verify that the video stream-ing client actually adapts to the available maximum through-put but also to check whether this adaptation improves theusersrsquo experience quality

Table 6 shows a summary of the synthetic-MOS valuesobtained per scenario in one test case of each domain Thescores obtained in the RES andAECdomains are always highIn the AUE domain the synthetic MOS associated with theVideo Resolution shows low scores in some of the scenariosbecause the resolution decreases reasonable good scores inthe time to load first media and high scores in the time toplayback cut-off ratio Overall it can be concluded that the

10 Wireless Communications and Mobile Computing

Table 6 Synthetic MOS values per test case and scenario for the feature ldquoNoninteractive Playbackrdquo

AUE domain AEC domain RES domain

Test Case AUECS001 Test CaseAECCS001 Test Case RESCS001

ScenarioTime to loadfirst mediaframe

PlaybackCut-off ratio

VideoResolution

mode

AveragePower

Consumption

Average CPUUsage

AverageRAM Usage

HighSpeed DirectPassenger 21 31 23 47 43 42

Suburban Festival 38 47 31 48 43 41Suburban shopping mallbusy hours 37 37 13 48 44 41

Suburban shopping malloff-peak 36 31 23 48 43 41

Suburban stadium 38 29 21 47 44 41Urban Driving Normal 26 39 28 47 44 4Urban Driving TrafficJam 34 37 16 48 44 4

Urban Internet Cafe BusyHours 38 37 19 48 44 4

Urban Internet Cafe OffPeak 38 31 23 48 43 4

Urban Office 38 47 33 48 45 43Urban Pedestrian 39 26 2 47 44 4

35 36 23 47 44 41

DASH implementation of the video streaming client undertest is able to adapt to the changing conditions of the networkmaintaining an acceptable rate of video cut-off rebufferingtimes and resources usage

The final score in each domain is obtained by averagingthe synthetic-MOS values from all the tested network scenar-ios Figure 7 shows the spider diagram for the three domainstested In the User Experience domain the score obtained islower than the other domains due to the low synthetic-MOSvalues obtained for the video resolution

The final synthetic MOS for the use case Content Dis-tribution Streaming is obtained as a weighted average of thethree domains representing the overall QoE as perceived bythe userThefinal score for the Exoplayer version 1516 and thefeatures tested (Noninteractive Playback and Play and Pause)is 42 which means that the low score obtained in the videoresolution is compensated with the high scores in other KPIs

If an application under test has more than one use casethe next steps in the TRIANGLE mark project approachwould be the aggregation per use case and the aggregationover all use cases The final score the TRIANGLE mark is anestimation of the overall QoE as perceived by the user

In the current TRIANGLE implementation the weightsin all aggregations are the same Further research is neededto appropriately define the weights of each domain and eachuse case in the overall score of the applications

7 Conclusions

The main contribution of the TRIANGLE project is theprovision of a framework that generalizes QoE computation

and enables the execution of extensive and repeatable testcampaigns to obtainmeaningfulQoE scoresTheTRIANGLEproject has also defined amethodology which is based on thetransformation and aggregation of KPIs its transformationinto synthetic-MOS values and its aggregation over thedifferent domains and use cases

The TRIANGLE approach is a methodology flexibleenough to generalize the computation of QoE for any applica-tionservice Themethodology has been validated testing theDASH implementation in the Exoplayer App To confirm thesuitability of theweights used in the averaging process and theinterpolation parameters as well as to verify the correlationof the obtained MOS with that scored by users the authorshave started experiments with real users and initial results areencouraging

The process described produces a final TRIANGLEmarka single quality score which could eventually be used to cer-tify applications after achieving a consensus on the differentvalues of the process (weights limits etc) to use

Data Availability

Themethodology and results used to support the findings ofthis study are included within the article

Conflicts of Interest

The authors declare that they have no conflicts of inter-est

Wireless Communications and Mobile Computing 11

DER

AUE AEC

Device Resource Usage

User Experience Energy Consumption

Figure 7 Exoplayer synthetic-MOS values per domain

Acknowledgments

The TRIANGLE project is funded by the European UnionrsquosHorizon 2020 Research and Innovation Programme (GrantAgreement no 688712)

References

[1] ETSI ldquoHuman factors quality of experience (QoE) require-ments for real-time communication servicesrdquo Tech Rep 102643 2010



[Figure 2 depicts the aggregation hierarchy of a test case: each of the K network scenarios is run for N iterations, each iteration yielding P measurements; per scenario, R KPIs are computed and mapped to synthetic-MOS values, which are then aggregated into a per-scenario synthetic MOS.]

Figure 2: The process to obtain the synthetic-MOS score in a TRIANGLE test case.

Table 1: Use cases defined in the TRIANGLE project.

Identifier   Use Case
VR           Virtual Reality
GA           Gaming
AR           Augmented Reality
CS           Content Distribution Streaming Services
LS           Live Streaming Services
SN           Social Networking
HS           High Speed Internet
PM           Patient Monitoring
ES           Emergency Services
SM           Smart Metering
SG           Smart Grids
CV           Connected Vehicles

… phones (UEs) and the ones that can be integrated in, for example, gaming consoles, advanced VR gear, car units, or IoT systems.

The TRIANGLE domains group different aspects that can affect the final QoE perceived by the users. The current testbed implementation supports three of the several domains that have been identified: Apps User Experience (AUE), Apps Energy Consumption (AEC), and Applications Device Resources Usage (RES).

Table 1 provides the use cases and Table 2 lists the domains initially considered in TRIANGLE.

To produce data to evaluate the QoE, a series of test cases have been designed, developed, and implemented to be run on the TRIANGLE testbed. Obviously, not all test cases are applicable to all applications under test, because not all applications need, or are designed to support, all the functionalities that can be tested in the testbed. In order to automatically determine the test cases that are applicable to an application under test, a questionnaire (identified as the features questionnaire in the portal), equivalent to the classical conformance testing ICS (Implementation Conformance Statement), has been developed and is accessible through the portal. After filling in the questionnaire, the applicable test plan, that is, the test campaign with the list of applicable test cases, is automatically generated.
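As a rough illustration of this questionnaire-driven filtering, the sketch below treats each test case's applicability clause as a predicate over the declared features. The identifiers and feature keys are hypothetical; TRIANGLE's portal does not necessarily implement it this way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, object]  # answers to the features questionnaire

@dataclass
class TestCase:
    identifier: str                        # e.g., "AUE/CS/002"
    applicable: Callable[[Features], bool]

# Hypothetical catalogue entries mirroring the applicability clause of Table 3.
CATALOGUE: List[TestCase] = [
    TestCase("AUE/CS/001",
             lambda f: f.get("product_type") == "Application"
             and "CS" in f.get("use_cases", ())),
    TestCase("AUE/CS/002",
             lambda f: f.get("product_type") == "Application"
             and "CS" in f.get("use_cases", ())
             and bool(f.get("supports_pause"))),
]

def build_test_plan(features: Features) -> List[str]:
    """Return the identifiers of the test cases applicable to the app."""
    return [tc.identifier for tc in CATALOGUE if tc.applicable(features)]

print(build_test_plan({"product_type": "Application",
                       "use_cases": {"CS"},
                       "supports_pause": True}))
# ['AUE/CS/001', 'AUE/CS/002']
```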

The sequence of user actions (type, swipe, tap, etc.) a user needs to perform in the terminal (UE) to complete a task (e.g., play a video) is called the "app user flow". In order to be able to automatically run a test case, the actual application user flow, with the user actions a user would need to perform on the phone to complete the tasks defined in the test case, also has to be provided.
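For example, the "Play and Pause" flow described later (Figure 5) could be encoded as an ordered list of actions for the automation tool to replay. This encoding is hypothetical, not the testbed's actual format.

```python
# Hypothetical encoding of an app user flow: the ordered user actions a
# test automation tool would replay on the UE.
PLAY_AND_PAUSE_FLOW = [
    ("tap", "login_button"),   # login step, if the app requires it
    ("wait_s", 10),
    ("tap", "play_button"),    # start playing a 5-minute video
    ("wait_s", 10),
    ("tap", "pause_button"),   # pause the reproduction
    ("wait_s", 120),
    ("tap", "play_button"),    # resume and play until the end of the video
]
```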

Each test case univocally defines the conditions of execution: the sequence of actions the user would perform (i.e., the app user flow), the sequence of actions that the elements of the testbed must perform, the traffic injected, the collection of measurements to take, and so forth. In order to obtain statistical significance, each test case includes a number of executions (iterations) under certain network conditions (herein called scenarios). Out of the various measurements made in the different iterations under any specific network conditions (scenario), a number of KPIs (Key Performance Indicators) are computed. The KPIs are normalized onto the standard 1-to-5 scale typically used in MOS scores and referred to as synthetic-MOS, a terminology that has been adopted from previous works [7, 21]. The synthetic-MOS values are aggregated across network scenarios to produce a number of intermediate synthetic-MOS scores, which finally are aggregated to obtain a synthetic-MOS score for each test case (see Figure 2).

The process to obtain the final TRIANGLE mark is sequential. First, for each domain, a weighted average of the synthetic-MOS scores obtained in each test case in the domain is calculated. Next, a weighted average of the synthetic-MOS values in all the domains of a use case is calculated to provide a single synthetic-MOS value per use case.


Table 2: TRIANGLE domains.

Category         Identifier   Domain
Applications     AUE          Apps User Experience
Applications     AEC          Apps Energy Consumption
Applications     RES          Device Resources Usage
Applications     REL          Reliability
Applications     NWR          Network Resources
Mobile Devices   DEC          Energy Consumption
Mobile Devices   DDP          Data Performance
Mobile Devices   DRF          Radio Performance
Mobile Devices   DRA          User experience with reference apps
IoT Devices      IDR          Reliability
IoT Devices      IDP          Data Performance
IoT Devices      IEC          Energy Consumption

[Figure 3 shows the aggregation tree for an app: per-test-case synthetic-MOS values (e.g., Domain A/Use Case X/01, Domain A/Use Case X/02) are averaged into per-domain scores for each use case, the per-domain scores are averaged into a score per use case (X, Y), and the per-use-case scores are averaged into the TRIANGLE mark.]

Figure 3: The process to obtain the TRIANGLE mark.

An application will usually be developed for one specific use case, as those defined in Table 1, but may be designed for more than one use case. In the latter case, a further weighted average is made with the synthetic-MOS scores obtained in each use case supported by the application. These sequential steps produce a single TRIANGLE mark, an overall quality score, as shown in Figure 3.

This approach provides a common framework for testing applications, for benchmarking applications, or even for certifying disparate applications. The overall process for an app that implements features of different use cases is depicted in Figure 3.
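The sketch below illustrates this sequential aggregation with equal weights at every level, which is what the current implementation uses; the scores are made up for illustration, and weight vectors could be substituted at each call.

```python
from typing import Dict, List, Optional, Sequence

def weighted_mean(scores: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    weights = weights or [1.0] * len(scores)  # equal weights by default
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Per-test-case synthetic-MOS values, grouped by use case and domain.
test_case_mos: Dict[str, Dict[str, List[float]]] = {
    "CS": {"AUE": [3.1, 3.4], "AEC": [4.7, 4.8], "RES": [4.2, 4.1]},
}

# Step 1: test cases -> one synthetic MOS per domain of each use case.
domain_mos = {uc: {dom: weighted_mean(tcs) for dom, tcs in doms.items()}
              for uc, doms in test_case_mos.items()}
# Step 2: domains -> one synthetic MOS per use case.
use_case_mos = {uc: weighted_mean(list(doms.values()))
                for uc, doms in domain_mos.items()}
# Step 3: use cases -> the TRIANGLE mark.
triangle_mark = weighted_mean(list(use_case_mos.values()))
print(f"TRIANGLE mark: {triangle_mark:.1f}")
```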

5. Details of the TRIANGLE QoE Computation

For each use case identified (see Table 1) and domain (see Table 2), a number of test cases have been developed within the TRIANGLE project. Each test case intends to test an individual feature, aspect, or behaviour of the application under test, as shown in Figure 4.

Each test case defines a number of measurements, and because the results of the measurements depend on many factors, they are not in general deterministic; thus, each test case has been designed not to perform just one single measurement but to run a number of iterations (N) of the same measurement. Out of those measurements, KPIs are computed. For example, if the time to load the first media frame is the measurement taken in one specific test case, the average user waiting time KPI can be calculated by computing the mean of the values across all iterations. In general, different use-case/domain pairs have a different set of KPIs. The reader is encouraged to read [11] for further details about the terminology used in TRIANGLE.
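For instance, the reduction of N iterations to a single KPI might look as follows; the sample values are invented for illustration.

```python
from statistics import mean

# Ten iterations of "time to load first media frame" (seconds) in one scenario.
iterations_s = [0.81, 0.77, 0.92, 0.85, 0.79, 0.88, 0.90, 0.76, 0.84, 0.83]

# The "average user waiting time" KPI is the mean across iterations; it is
# this KPI, not the raw samples, that is later mapped to a synthetic MOS.
avg_waiting_time_kpi = mean(iterations_s)
print(f"{avg_waiting_time_kpi:.3f} s")  # 0.835 s
```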

Recommendation P.10/G.100 Amendment 1, "Definition of Quality of Experience" [2], notes that the overall acceptability may be influenced by user expectations and context. For the definition of the context, the technical specifications ITU-T G.1030, "Estimating end-to-end performance in IP networks for data applications" [9], and ITU-T G.1031, "QoE factors in web-browsing" [10], have been considered in TRIANGLE. In particular, ITU-T G.1031 [10] identifies the following context influence factors: location (cafeteria, office, and home), interactivity (high-level interactivity versus low-level interactivity), task type (business, entertainment, etc.), and task urgency (urgent versus casual). User influence factors are, however, outside of the scope of the ITU recommendation.

[Figure 4 lays out the QoE computation steps: (i) use cases; (ii) domains (Apps User Experience, Apps Energy Consumption, Apps Device Resources); (iii) a test case (e.g., AUE/CS/01) targeting a feature such as non-interactive playback, with measurements like time to load first media frame, playback cut-off, video resolution, and content stall; (iv) context (network scenarios); (v) test case execution (app user flow, measurements, KPIs); (vi) synthetic MOS.]

Figure 4: QoE computation steps.

In the TRIANGLE project, the context information has been captured in the network scenarios defined (Urban - Internet Cafe Off Peak, Suburban - Shopping Mall Busy Hours, Urban - Pedestrian, Urban - Office, High Speed Train - Relay, etc.) and in the test cases specified in [11].

The test cases specify not only the conditions of the test but also a sequence of actions that have to be executed by the application (app user flows) to test its features. For example, the test case that tests the "Play and Pause" functionality defines the app user flow shown in Figure 5.

The transformation of KPIs into QoE scores is the most challenging step in the TRIANGLE framework. The execution of the test cases will generate a significant amount of raw measurements about several aspects of the system. Specific KPIs can then be extracted through statistical analysis: mean, deviation, cumulative distribution function (CDF), or ratio.

The KPIs will be individually interpolated in order to provide a common, homogeneous comparison and aggregation space. The interpolation is based on the application of two functions, named Type I and Type II. By using the proposed two types of interpolations, the vast majority of KPIs can be translated into a normalized MOS-type metric (synthetic-MOS), easy to average in order to provide a simple, unified evaluation.

[Figure 5 outlines the app user flow: perform the login step (if required) and wait 10 seconds; start playing a 5-minute video and play it for 10 seconds; pause the reproduction; resume the reproduction after 2 minutes and play until the end of the video.]

Figure 5: App user flow used in the "AUE/CS/02 Play and Pause" test case.

Type I. This function performs a linear interpolation on the original data. The variables $min_{KPI}$ and $max_{KPI}$ are the worst and best known values of a KPI from a reference case. The function maps a value $v$ of a KPI to $v'$ (synthetic-MOS) in the range [1, 5] by computing the following formula:

$$v' = \frac{v - min_{KPI}}{max_{KPI} - min_{KPI}} \cdot (5.0 - 1.0) + 1.0 \qquad (1)$$

This function transforms a KPI into a synthetic-MOS value by applying a simple linear interpolation between the worst and best expected values from a reference case. If a future input case falls outside the data range of the KPI, the new value will be set to the extreme value $min_{KPI}$ (if it is worse) or $max_{KPI}$ (if it is better).
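A minimal sketch of this Type I mapping, including the clipping just described; the function assumes only that min_kpi is the worst and max_kpi the best reference value, so either may be numerically larger (as with the playback cut-off ratio in Table 5).

```python
def type_i(v: float, min_kpi: float, max_kpi: float) -> float:
    """Formula (1): linearly map a KPI value to a synthetic MOS in [1.0, 5.0]."""
    lo, hi = sorted((min_kpi, max_kpi))
    v = min(max(v, lo), hi)  # out-of-range inputs clip to the reference extremes
    return (v - min_kpi) / (max_kpi - min_kpi) * (5.0 - 1.0) + 1.0

# Playback cut-off ratio with worst = 50% and best = 0% (Table 5):
print(type_i(10.0, min_kpi=50.0, max_kpi=0.0))  # 4.2
```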


Table 3: AUE/CS/002 test case description.

Identifier: AUE/CS/002 (App User Experience/Content Streaming/002)
Title: Play and pause
Objective: Measure the ability of the AUT to pause and then resume a media file
Applicability: (ICSG ProductType = Application) AND (ICSG UseCases includes CS) AND ICSA CS Pause
Initial Conditions: AUT in [AUT STARTED] mode (Note: defined in D2.2 [11], Appendix 4)
Steps:
(1) The Test System commands the AUT to replay the Application User Flow (the flow that presses first the Play button and later the Pause button).
(2) The Test System measures whether the pause operation was successful or not.
Postamble: (i) Execute the Postamble sequence (see Section 2.6 in D2.2 [11], Appendix 4).
Measurements (Raw):
(i) Playback Cut-off: probability that a successfully started stream reproduction is ended by a cause other than intentional termination by the user.
(ii) Pause Operation: whether the pause operation is successful or not.
(iii) Time to load first media frame (s) after resuming: the time elapsed since the user clicks the resume button until the media reproduction starts. (Note: for Exoplayer, the RESUME button is the PLAY button.)


Type II. This function performs a logarithmic interpolation and is inspired by the opinion model recommended by the ITU-T in [9] for a simple web search task. This function maps a value $v$ of a KPI to $v'$ (synthetic-MOS) in the range [1, 5] by computing the following formula:

$$v' = \frac{5.0 - 1.0}{\ln\big((a \cdot worst_{KPI} + b) / worst_{KPI}\big)} \cdot \big(\ln(v) - \ln(a \cdot worst_{KPI} + a)\big) + 5 \qquad (2)$$

The default values of $a$ and $b$ correspond to the simple web search task case ($a = 0.003$ and $b = 0.12$) [9, 22], and the worst value has been extracted from ITU-T G.1030. If, during experimentation, a future input case falls outside the data range of the KPI, the parameters $a$ and $b$ will be updated accordingly. Likewise, if through subjective experimentation other values are considered better adjustments for specific services, the function can easily be updated.
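The following sketch transcribes formula (2) with the default web-search parameters. The final clamp to [1, 5] is our assumption, mirroring the Type I treatment of out-of-range inputs; the paper instead updates a and b when inputs fall outside the expected range.

```python
import math

def type_ii(v: float, worst_kpi: float, a: float = 0.003, b: float = 0.12) -> float:
    """Formula (2): logarithmically map a KPI value (e.g., a waiting time)
    to a synthetic MOS, given the worst reference value of the KPI."""
    scale = (5.0 - 1.0) / math.log((a * worst_kpi + b) / worst_kpi)
    mos = scale * (math.log(v) - math.log(a * worst_kpi + a)) + 5.0
    return min(max(mos, 1.0), 5.0)  # clamp to [1, 5]: our assumption
```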

Once all KPIs are translated into synthetic-MOS values, they can be averaged with suitable weights. In the averaging process, the first step is to average over the network scenarios considered relevant for the use case, as shown in Figure 2. This provides the synthetic-MOS output value for the test case. If there is more than one test case per domain, which is generally the case, a weighted average is calculated in order to provide one synthetic-MOS value per domain, as depicted in Figure 3. The final step is to average the synthetic-MOS scores over all use cases supported by the application (see Figure 3). This provides the final score, that is, the TRIANGLE mark.

6. A Practical Case: Exoplayer under Test

For a better understanding of the complete process of obtaining the TRIANGLE mark for a specific application, the testing of Exoplayer is described in this section. This application has only one use case: content distribution streaming services (CS).

Table 4: Measurement points associated with test case AUE/CS/002.

Measurement                      Measurement points
Time to load first media frame   Media File Playback - Start; Media File Playback - First Picture
Playback cut-off                 Media File Playback - Start; Media File Playback - End
Pause                            Media File Playback - Pause


Exoplayer is an application-level media player for Android promoted by Google. It provides an alternative to Android's MediaPlayer API for playing audio and video, both locally and over the Internet. Exoplayer supports features not currently supported by Android's MediaPlayer API, including DASH and SmoothStreaming adaptive playback.

The TRIANGLE project has concentrated on testing just two of the Exoplayer features: "Noninteractive Playback" and "Play and Pause". These features result in 6 applicable test cases out of the test cases defined in TRIANGLE. These are test cases AUE/CS/001 and AUE/CS/002 in the App User Experience domain, test cases AEC/CS/001 and AEC/CS/002 in the App Energy Consumption domain, and test cases RES/CS/001 and RES/CS/002 in the Device Resources Usage domain.

The AUE/CS/002 "Play and Pause" test case description, belonging to the AUE domain, is shown in Table 3. The test case description specifies the test conditions, the generic app user flow, and the raw measurements which shall be collected during the execution of the test.

The TRIANGLE project also offers a library that includes the measurement points that should be inserted in the source code of the app to enable the collection of the measurements specified. Table 4 shows the measurement points required to compute the measurements specified in test case AUE/CS/002.


Table 5: Reference values for interpolation.

Feature                    Domain   KPI                              Synthetic-MOS calculation   KPI min            KPI max
Non-Interactive Playback   AEC      Average power consumption        Type I                      1.0 W              0.8 W
Non-Interactive Playback   AUE      Time to load first media frame   Type II                     KPI worst = 20 ms
Non-Interactive Playback   AUE      Playback cut-off ratio           Type I                      50%                0%
Non-Interactive Playback   AUE      Video resolution                 Type I                      240p               720p
Non-Interactive Playback   RES      Average CPU usage                Type I                      100%               16%
Non-Interactive Playback   RES      Average memory usage             Type I                      100%               40%
Play and Pause             AEC      Average power consumption        Type I                      1.0 W              0.8 W
Play and Pause             AUE      Pause operation success rate     Type I                      50%                100%
Play and Pause             RES      Average CPU usage                Type I                      100%               16%
Play and Pause             RES      Average memory usage             Type I                      100%               40%

The time to load first media frame measurement is obtained by subtracting the timestamp of the measurement point "Media File Playback - Start" from that of the measurement point "Media File Playback - First Picture".
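A sketch of this post-processing over a measurement-point trace, using the point names of Table 4; the timestamps are invented for illustration.

```python
# Each log entry is (timestamp_s, measurement_point_name), as emitted by the
# instrumented app; the trace below is made up for illustration.
trace = [
    (12.40, "Media File Playback - Start"),
    (13.17, "Media File Playback - First Picture"),
    (55.02, "Media File Playback - End"),
]

def time_to_first_frame(entries):
    """Raw measurement: first-picture timestamp minus playback-start timestamp."""
    ts = {name: t for t, name in entries}
    return ts["Media File Playback - First Picture"] - ts["Media File Playback - Start"]

print(f"{time_to_first_frame(trace):.2f} s")  # 0.77 s
```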

As specified in [11], all the scenarios defined are applicable to the content streaming use case. Therefore, test cases in the three domains currently supported by the testbed are executed in all the scenarios.

Once the test campaign has finished, the raw measurement results are processed to obtain the KPIs associated with each test case: average current consumption, average time to load first media frame, average CPU usage, and so forth. The processes applied are detailed in Table 5. Based on previous experiments performed by the authors, the behaviour of the time to load the first media frame KPI resembles the web response time KPI (i.e., the amount of time the user has to wait for the service), and thus, as recommended in the opinion model for web search introduced in [9], a logarithmic interpolation (Type II) has been used for this metric.

The results of the initial process, that is, the KPI computation, are translated into synthetic-MOS values. To compute these values, reference benchmarking values for each of the KPIs need to be used, according to the normalization and interpolation process described in Section 5. Table 5 shows what is currently used by TRIANGLE for the App User Experience domain, which is also used by NGMN as a reference in their precommercial trials document [23].

For example, for the "time to load first media frame" KPI shown in Table 5, the type of aggregation applied is averaging and the interpolation formula used is Type II.

To achieve stable results, each test case is executed 10 times (10 iterations) in each network scenario. The synthetic-MOS value in each domain is calculated by averaging the measured synthetic-MOS values in the domain. For example, the synthetic-MOS value in the RES domain is obtained by averaging the synthetic-MOS values of "average CPU usage" and "average memory usage" from the two test cases.

Although Exoplayer supports several video streaming protocols, in this work only DASH [24] (Dynamic Adaptive Streaming over HTTP) has been tested. DASH clients should seamlessly adapt to changing network conditions by making decisions on which video segment to download (videos are encoded at multiple bitrates). Exoplayer's default adaptation algorithm is basically throughput-based, and some parameters control how often and when switching can occur.
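As a toy illustration of throughput-based adaptation (this is not ExoPlayer's actual algorithm; the bitrate ladder and safety factor are invented), a client might pick the highest representation whose bitrate fits within a fraction of the measured throughput:

```python
# (height_px, bitrate_kbps) of the available representations, highest last.
LADDER = [(240, 400), (360, 800), (480, 1500), (720, 3000)]

def select_representation(throughput_kbps: float, safety: float = 0.8):
    """Pick the highest-bitrate representation fitting the measured throughput."""
    usable = throughput_kbps * safety
    fitting = [rep for rep in LADDER if rep[1] <= usable]
    return fitting[-1] if fitting else LADDER[0]  # fall back to the lowest

print(select_representation(2000.0))  # (480, 1500)
```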

[Figure 6 plots the horizontal video resolution (from 0 to 1200 pixels) against the playback timestamp.]

Figure 6: Video resolution evolution in the Urban Driving Normal scenario.

During the testing, the testbed was configured with the different network scenarios defined in [11]. In these scenarios, the network configuration changes dynamically following a random pattern, resulting in different maximum throughput rates. The expected behaviour of the application under test is that the video streaming client adapts to the available throughput by decreasing or increasing the resolution of the received video. Figure 6 depicts how the client effectively adapts to the channel conditions.

However, the objective of the testing carried out in the TRIANGLE testbed is not just to verify that the video streaming client actually adapts to the available maximum throughput, but also to check whether this adaptation improves the users' experience quality.

Table 6 shows a summary of the synthetic-MOS values obtained per scenario in one test case of each domain. The scores obtained in the RES and AEC domains are always high. In the AUE domain, the synthetic MOS associated with the video resolution shows low scores in some of the scenarios, because the resolution decreases; reasonably good scores in the time to load first media frame; and high scores in the playback cut-off ratio. Overall, it can be concluded that the DASH implementation of the video streaming client under test is able to adapt to the changing conditions of the network, maintaining an acceptable rate of video cut-off, rebuffering times, and resource usage.


Table 6: Synthetic-MOS values per test case and scenario for the feature "Noninteractive Playback" (AUE domain: test case AUE/CS/001; AEC domain: test case AEC/CS/001; RES domain: test case RES/CS/001).

Scenario                            Time to load     Playback        Video resolution   Average power   Average     Average
                                    first frame      cut-off ratio   (mode)             consumption     CPU usage   RAM usage
High Speed Direct Passenger         2.1              3.1             2.3                4.7             4.3         4.2
Suburban Festival                   3.8              4.7             3.1                4.8             4.3         4.1
Suburban Shopping Mall Busy Hours   3.7              3.7             1.3                4.8             4.4         4.1
Suburban Shopping Mall Off-Peak     3.6              3.1             2.3                4.8             4.3         4.1
Suburban Stadium                    3.8              2.9             2.1                4.7             4.4         4.1
Urban Driving Normal                2.6              3.9             2.8                4.7             4.4         4.0
Urban Driving Traffic Jam           3.4              3.7             1.6                4.8             4.4         4.0
Urban Internet Cafe Busy Hours      3.8              3.7             1.9                4.8             4.4         4.0
Urban Internet Cafe Off-Peak        3.8              3.1             2.3                4.8             4.3         4.0
Urban Office                        3.8              4.7             3.3                4.8             4.5         4.3
Urban Pedestrian                    3.9              2.6             2.0                4.7             4.4         4.0
All scenarios (average)             3.5              3.6             2.3                4.7             4.4         4.1

The final score in each domain is obtained by averaging the synthetic-MOS values from all the tested network scenarios. Figure 7 shows the spider diagram for the three domains tested. In the User Experience domain, the score obtained is lower than in the other domains, due to the low synthetic-MOS values obtained for the video resolution.

The final synthetic MOS for the use case Content Distribution Streaming is obtained as a weighted average of the three domains, representing the overall QoE as perceived by the user. The final score for the Exoplayer version 1.5.16 and the features tested (Noninteractive Playback and Play and Pause) is 4.2, which means that the low score obtained in the video resolution is compensated by the high scores in the other KPIs.

If an application under test has more than one use case, the next steps in the TRIANGLE approach would be the aggregation per use case and the aggregation over all use cases. The final score, the TRIANGLE mark, is an estimation of the overall QoE as perceived by the user.

In the current TRIANGLE implementation, the weights in all aggregations are the same. Further research is needed to appropriately define the weights of each domain and each use case in the overall score of the applications.

7. Conclusions

The main contribution of the TRIANGLE project is the provision of a framework that generalizes QoE computation and enables the execution of extensive and repeatable test campaigns to obtain meaningful QoE scores. The TRIANGLE project has also defined a methodology based on the computation and aggregation of KPIs: their transformation into synthetic-MOS values and their aggregation over the different domains and use cases.

The TRIANGLE approach is a methodology flexible enough to generalize the computation of QoE for any application/service. The methodology has been validated by testing the DASH implementation in the Exoplayer app. To confirm the suitability of the weights used in the averaging process and the interpolation parameters, as well as to verify the correlation of the obtained MOS with that scored by users, the authors have started experiments with real users, and initial results are encouraging.

The process described produces a final TRIANGLE mark, a single quality score, which could eventually be used to certify applications, after achieving a consensus on the different values of the process (weights, limits, etc.) to use.

Data Availability

The methodology and results used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


[Figure 7 is a spider diagram with three axes: DER (Device Resource Usage), AUE (User Experience), and AEC (Energy Consumption).]

Figure 7: Exoplayer synthetic-MOS values per domain.

Acknowledgments

The TRIANGLE project is funded by the European Union's Horizon 2020 Research and Innovation Programme (Grant Agreement no. 688712).

References

[1] ETSI, "Human factors; quality of experience (QoE) requirements for real-time communication services," Tech. Rep. 102 643, 2010.

[2] ITU-T, "P.10/G.100 (2006) Amendment 1 (01/07): New Appendix I - Definition of quality of experience (QoE)," 2007.

[3] F. Kozamernik, V. Steinmann, P. Sunna, and E. Wyckens, "SAMVIQ - a new EBU methodology for video quality evaluations in multimedia," SMPTE Motion Imaging Journal, vol. 114, no. 4, pp. 152–160, 2005.

[4] ITU-T, "G.107: The E-model, a computational model for use in transmission planning," 2015.

[5] J. De Vriendt, D. De Vleeschauwer, and D. C. Robinson, "QoE model for video delivered over an LTE network using HTTP adaptive streaming," Bell Labs Technical Journal, vol. 18, no. 4, pp. 45–62, 2014.

[6] S. Jelassi, G. Rubino, H. Melvin, H. Youssef, and G. Pujolle, "Quality of experience of VoIP service: a survey of assessment approaches and open issues," IEEE Communications Surveys & Tutorials, vol. 14, no. 2, pp. 491–513, 2012.

[7] M. Li, C.-L. Yeh, and S.-Y. Lu, "Real-time QoE monitoring system for video streaming services with adaptive media playout," International Journal of Digital Multimedia Broadcasting, vol. 2018, Article ID 2619438, 11 pages, 2018.

[8] S. Barakovic and L. Skorin-Kapov, "Survey and challenges of QoE management issues in wireless networks," Journal of Computer Networks and Communications, vol. 2013, Article ID 165146, 28 pages, 2013.

[9] ITU-T, "G.1030: Estimating end-to-end performance in IP networks for data applications," 2014.

[10] ITU-T, "G.1031: QoE factors in web-browsing," 2014.

[11] EU H2020 TRIANGLE Project, "Deliverable D2.2: Final report on the formalization of the certification process, requirements and use cases," 2017, https://www.triangle-project.eu/project-old/deliverables.

[12] Q. A. Chen, H. Luo, S. Rosen et al., "QoE doctor: diagnosing mobile app QoE with automated UI control and cross-layer analysis," in Proceedings of the Internet Measurement Conference (IMC '14), pp. 151–164, ACM, Vancouver, Canada, November 2014.

[13] M. A. Mehmood, A. Wundsam, S. Uhlig, D. Levin, N. Sarrar, and A. Feldmann, "QoE-Lab: towards evaluating quality of experience for future internet conditions," in Testbeds and Research Infrastructure. Development of Networks and Communities, T. Korakis, H. Li, P. Tran-Gia, and H. S. Park, Eds., vol. 90 of LNICST (TridentCom 2011), pp. 286–301, Springer, Berlin, Germany, 2012.

[14] D. Levin, A. Wundsam, A. Mehmood, and A. Feldmann, "Berlin: the Berlin experimental router laboratory for innovative networking," in TridentCom 2010, T. Magedanz, A. Gavras, N. H. Thanh, and J. S. Chase, Eds., vol. 46 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 602–604, Springer, Heidelberg, Germany, 2011.

[15] K. De Moor, I. Ketyko, W. Joseph et al., "Proposed framework for evaluating quality of experience in a mobile, testbed-oriented living lab setting," Mobile Networks and Applications, vol. 15, no. 3, pp. 378–391, 2010.

[16] R. Sanchez-Iborra, M.-D. Cano, J. J. P. C. Rodrigues, and J. Garcia-Haro, "An experimental QoE performance study for the efficient transmission of high demanding traffic over an ad hoc network using BATMAN," Mobile Information Systems, vol. 2015, Article ID 217106, 14 pages, 2015.

[17] P. Oliver-Balsalobre, M. Toril, S. Luna-Ramírez, and R. García Garaluz, "A system testbed for modeling encrypted video-streaming service performance indicators based on TCP/IP metrics," EURASIP Journal on Wireless Communications and Networking, vol. 2017, no. 1, 2017.

[18] M. Solera, M. Toril, I. Palomo, G. Gomez, and J. Poncela, "A testbed for evaluating video streaming services in LTE," Wireless Personal Communications, vol. 98, no. 3, pp. 2753–2773, 2018.

[19] A. Alvarez, A. Díaz, P. Merino, and F. J. Rivas, "Field measurements of mobile services with Android smartphones," in Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC '12), pp. 105–109, Las Vegas, Nev, USA, January 2012.

[20] NGMN Alliance, "NGMN 5G white paper," 2015, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf.

[21] "Infrastructure and design for adaptivity and flexibility," in Mobile Information Systems, Springer, 2006.

[22] J. Nielsen, "Response times: the three important limits," in Usability Engineering, 1993.

[23] NGMN Alliance, "Definition of the testing framework for the NGMN 5G pre-commercial networks trials," 2018, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2018/180220_NGMN_PreCommTrials_Framework_definition_v1_0.pdf.

[24] 3GPP TS 26.246, "Transparent end-to-end packet-switched streaming services (PSS); progressive download and dynamic adaptive streaming over HTTP (3GP-DASH)," 2018.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 6: QoE Evaluation: The TRIANGLE Testbed Approach

6 Wireless Communications and Mobile Computing

Table 2 TRIANGLE domains

Category Identifier Domain

Applications

AUE Apps User experienceAEC Apps Energy consumptionRES Device Resources UsageREL ReliabilityNWR Network Resources

Devices

Mobile Devices

DEC Energy ConsumptionDDP Data PerformanceDRF Radio PerformanceDRA User experience with reference apps

IoT DevicesIDR ReliabilityIDP Data PerformanceIEC Energy consumption

Synthetic MOS Domain A Use Case X

Synthetic MOS Domain B Use Case X

Synthetic MOS Use Case X

Synthetic MOS Use Case Y

TRIANGLE MARK

App

Synthetic MOS Test CaseDomain AUse Case X01

Synthetic MOS Test CaseDomain AUse Case X02

Synthetic MOS Test CaseDomain BUse Case X01

Synthetic MOS Test CaseDomain BUse Case X02

Synthetic MOS Test CaseDomain AUse Case Y01 Synthetic MOS Domain A Use Case Y

Synthetic MOS Test CaseDomain BUse Case Y01 Synthetic MOS Domain B Use Case Y

Figure 3 The process to obtain the TRIANGLE mark

case An application will usually be developed for one specificuse case as those defined in Table 1 but may be designed formore than one use case In the latter case a further weightedaverage is made with the synthetic-MOS scores obtained ineach use case supported by the application These sequentialsteps produce a single TRIANGLE mark an overall qualityscore as shown in Figure 3

This approach provides a common framework for testingapplications for benchmarking applications or even forcertifying disparate applications The overall process for anapp that implements features of different use cases is depictedin Figure 3

5 Details of the TRIANGLE QoE Computation

For each use case identified (see Table 1) and domain (seeTable 2) a number of test cases have been developed withinthe TRIANGLE project Each test case intends to test an

individual feature aspect or behaviour of the applicationunder test as shown in Figure 4

Each test case defines a number of measurements andbecause the results of the measurements depend on manyfactors they are not in general deterministic and thuseach test case has been designed not to perform just onesingle measurement but to run a number of iterations (N)of the same measurement Out of those measurements KPIsare computed For example if the time to load the firstmedia frame is the measurement taken in one specific testcase the average user waiting time KPI can be calculated bycomputing the mean of the values across all iterations Ingeneral different use case-domain pairs have a different set ofKPIsThe reader is encouraged to read [11] for further detailsabout the terminology used in TRIANGLE

Recommendation P10G100 Amendment 1 Definition ofQuality of Experience [2] notes that the overall acceptabilitymay be influenced by user expectations and context Forthe definition of the context technical specifications ITU-T

Wireless Communications and Mobile Computing 7

Feature Non-interactiveplayback

bullTime to load firstmedia frame

bullPlayback Cut-off

bullVideo resolution

bullContent Stall

i) Uses cases

vi) Synthetic MOS iv) Context

Networkscenarios

App User Flow Measurements KPIs MOS

ii) Domains

Apps Energy Consumption

Apps Device Resources

Apps User Experience

Mea

sure

men

ts

v) Test case execution

iii)Test caseAUECS01

Figure 4 QoE computation steps

G1030 ldquoEstimating end-to-end performance in IP networksfor data applicationsrdquo [9] and ITU-T G1031 ldquoQoE factors inweb-browsingrdquo [10] have been considered in TRIANGLEIn particular ITU-T G1031 [10] identifies the following con-text influence factors location (cafeteria office and home)interactivity (high-level interactivity versus low-level inter-activity) task type (business entertainment etc) and taskurgency (urgent versus casual) Userrsquos influence factors arehowever outside of the scope of the ITU recommendation

In the TRIANGLE project the context information hasbeen captured in the networks scenarios defined (Urban -Internet Cafe Off Peak Suburban - Shopping Mall BusyHours Urban ndash Pedestrian Urban ndash Office High speed trainndash Relay etc) and in the test cases specified in [11]

The test cases specify the conditions of the test butalso a sequence of actions that have to be executed by theapplication (app user flows) to test its features For examplethe test case that tests the ldquoPlay and Pauserdquo functionalitydefines the app user flow shown in Figure 5

The transformation of KPIs into QoE scores is the mostchallenging step in the TRIANGLE framework The execu-tion of the test cases will generate a significant amount of rawmeasurements about several aspects of the system SpecificKPIs can then be extracted through statistical analysis meandeviation cumulative distribution function (CDF) or ratio

TheKPIs will be individually interpolated in order to pro-vide a common homogeneous comparison and aggregationspace The interpolation is based on the application of twofunctions named Type I and Type II By using the proposedtwo types of interpolations the vast majority of KPIs can betranslated into normalized MOS-type of metric (synthetic-MOS) easy to be averaged in order to provide a simpleunified evaluation

Type I This function performs a linear interpolation on theoriginal data The variables 119898119894119899

119870119875119868and119898119886119909

119870119875119868are the worst

and best known values of a KPI from a reference case The

Perform login step (ifrequired) and wait for

10 seconds

Start playing avideo of 5

minutes during10 seconds

Pause thereproduction

Resume thereproduction after

2 minutes anduntil the end of

the video

Figure 5 App user flow used in the ldquoAUECS02 Play and Pauserdquotest case

function maps a value v of a KPI to vrsquo (synthetic-MOS) inthe range [1-to-5] by computing the following formula

V1015840 =V minus 119898119894119899

119870119875119868

119898119886119909119870119875119868minus 119898119894119899

119870119875119868

(50 minus 10) + 10 (1)

This function transforms a KPI to a synthetic-MOS value byapplying a simple linear interpolation between the worst andbest expected values from a reference case If a future inputcase falls outside the data range of the KPI the new value will

8 Wireless Communications and Mobile Computing

Table 3 AUECS002 test case description

Identifier AUECS002 (App User ExperienceContent Streaming002)Title Play and pauseObjective Measure the ability of the AUT to pause and the resume a media fileApplicability (ICSG ProductType = Application) AND (ICSG UseCases includes CS) AND ICSA CSPauseInitial Conditions AUT in in [AUT STARTED] mode (Note Defined in D22 [11] Appendix 4)

Steps(1) The Test System commands the AUT to replay the Application User Flow (Application User Flow that

presses first the Play button and later the Pause button)(2) The Test System measures whether pause operation was successful or not

Postamble (i) Execute the Postamble sequence (see section 26 in D22 [11] Appendix 4)

Measurements (Raw)

(i) Playback Cut-off Probability that successfully started stream reproduction is ended by a cause other thanthe intentional termination by the user

(ii) Pause Operation Whether pause operation is successful or not(iii) Time to load first media frame (s) after resuming The time elapsed since the user clicks resume button

until the media reproduction starts(Note For Exoplayer the RESUME button is the PLAY button)

be set to the extreme value minKPI (if it is worse) or maxKPI(if it is better)

Type II This function performs a logarithmic interpolationand is inspired on the opinion model recommended by theITU-T in [9] for a simple web search taskThis function mapsa value v of a KPI to vrsquo (synthetic-MOS) in the range [1-to-5]by computing the following formula

V1015840 =50 minus 10

ln ((119886 lowast 119908119900119903119904119905119870119875119868+ 119887) 119908119900119903119904119905

119870119875119868)

∙ (ln (V) minus ln (119886 lowast 119908119900119903119904119905119870119875119868+ 119886)) + 5

(2)

The default values of 119886 and 119887 correspond to the simple websearch task case (119886 = 0003 and 119887 = 012) [9 22] and theworst value has been extracted from the ITU-T G1030 Ifduring experimentation a future input case falls outside thedata range of the KPI the parameters 119886 and 119887will be updatedaccordingly Likewise if through subjective experimentationother values are considered better adjustments for specificservices the function can be easily updated

Once all KPIs are translated into synthetic-MOS valuesthey can be averaged with suitable weights In the averagingprocess the first step is to average over the network scenariosconsidered relevant for the use case as shown in Figure 2This provides the synthetic-MOS output value for the testcase If there is more than one test case per domain which isgenerally the case a weighted average is calculated in order toprovide one synthetic-MOS value per domain as depicted inFigure 3The final step is to average the synthetic-MOS scoresover all use cases supported by the application (see Figure 3)This provides the final score that is the TRIANGLE mark

6 A Practical Case Exoplayer under Test

For better understanding the complete process of obtainingtheTRIANGLEmark for a specific application the Exoplayer

Table4Measurement points associatedwith test caseAUECS002

Measurements Measurement points

Time to load first media frame Media File Playback - StartMedia File Playback - First Picture

Playback cut-off Media File Playback - StartMedia File Playback - End

Pause Media File Playback - Pause

is described in this section This application only has one usecase content distribution streaming services (CS)

Exoplayer is an application levelmedia player forAndroidpromoted by Google It provides an alternative to AndroidrsquosMediaPlayer API for playing audio and video both locally andover the Internet Exoplayer supports features not currentlysupported by Androidrsquos MediaPlayer API including DASHand SmoothStreaming adaptive playbacks

The TRIANGLE project has concentrated in testing justtwo of the Exoplayer features ldquoNoninteractive Playbackrdquoand ldquoPlay and Pauserdquo These features result in 6 test casesapplicable out of the test cases defined in TRIANGLETheseare test cases AUECS001 and AUECS002 in the App UserExperience domain test casesAECCS001 andAECCS002in the App Energy Consumption domain and test casesRESCS001 and RESCS002 in the Device Resources Usagedomain

The AUECS002 ldquoPlay and Pauserdquo test case descriptionbelonging to the AUE domain is shown in Table 3 The testcase description specifies the test conditions the generic appuser flow and the rawmeasurements which shall be collectedduring the execution of the test

The TRIANGLE project also offers a library that includesthe measurement points that should be inserted in thesource code of the app for enabling the collection of themeasurements specified Table 4 shows the measurementpoints required to compute the measurements specified intest case AUECS002

Wireless Communications and Mobile Computing 9

Table 5 Reference values for interpolation

Feature Domain KPI Synthetic MOS Calculation KPI min KPI maxNon-Interactive Playback AEC Average power consumption Type I 10 W 08 WNon-Interactive Playback AUE Time to load first media frame Type II KPI worst=20 msNon-Interactive Playback AUE Playback cut-off ratio Type I 50 0Non-Interactive Playback AUE Video resolution Type I 240p 720pNon-Interactive Playback RES Average CPU usage Type I 100 16Non-Interactive Playback RES Average memory usage Type I 100 40Play and Pause AEC Average power consumption Type I 10 W 08 WPlay and Pause AUE Pause operation success rate Type I 50 100Play and Pause RES Average CPU usage Type I 100 16Play and Pause RES Average memory usage Type I 100 40

The time to load first media picture measurement isobtained subtracting the timestamp of the measurementpoint ldquoMedia File Playback ndash Startrdquo from the measurementpoint ldquoMedia File Playback ndash First Picturerdquo

As specified in [11] all scenarios defined are applicableto the content streaming use case Therefore test cases inthe three domains currently supported by the testbed areexecuted in all the scenarios

Once the test campaign has finished the raw measure-ment results are processed to obtain the KPIs associated witheach test case average current consumption average time toload first media frame average CPU usage and so forth Theprocesses applied are detailed in Table 5 Based on previousexperiments performed by the authors the behaviour of thetime to load the first media frame KPI resembles the webresponse time KPI (ie the amount of time the user hasto wait for the service) and thus as recommended in theopinionmodel forweb search introduced in [9] a logarithmicinterpolation (type II) has been used for this metric

The results of the initial process that is the KPIs compu-tation are translated into synthetics-MOS values To computethese values reference benchmarking values for each of theKPIs need to be used according to the normalization andinterpolation process described in Section 5 Table 5 showswhat has been currently used by TRIANGLE for the AppUser Experience domain which is also used by NGMN asreference in their precommercial Trials document [23]

For example for the ldquotime to load first media framerdquo KPIshown in Table 5 the type of aggregation applied is averagingand the interpolation formula used is Type II

To achieve stable results each test case is executed 10times (10 iterations) in each network scenario The synthetic-MOS value in each domain is calculated by averaging themeasured synthetic-MOS values in the domain For examplesynthetic-MOS value is the RES domain obtained by aver-aging the synthetic-MOS value of ldquoaverage CPU usagerdquo andldquoaverage memory usagerdquo from the two test cases

Although Exoplayer supports several video streamingprotocols in this work only DASH [24] (Dynamic AdaptiveStreaming over HTTP) has been tested DASH clients shouldseamlessly adapt to changing network conditions by makingdecisions on which video segment to download (videosare encoded at multiple bitrates) The Exoplayerrsquos default

000

000

0

001

728

0

003

456

0

005

184

0

010

912

0

012

640

0

014

368

0

020

096

0

Timestamp

Video Resolution

0

200

400

600

800

1000

1200

Hor

izon

tal R

esol

utio

n

Figure 6 Video Resolution evolution in the Driving Urban Normalscenario

adaptation algorithm is basically throughput-based and someparameters control how often and when switching can occur

During the testing the testbed was configured with thedifferent network scenarios defined in [11] In these scenariosthe network configuration changes dynamically following arandom pattern resulting in different maximum throughputrates The expected behaviour of the application under testis that the video streaming client adapts to the availablethroughput by decreasing or increasing the resolution of thereceived video Figure 6 depicts how the client effectivelyadapts to the channel conditions

However the objective of the testing carried out in theTRIANGE testbed is not just to verify that the video stream-ing client actually adapts to the available maximum through-put but also to check whether this adaptation improves theusersrsquo experience quality

Table 6 shows a summary of the synthetic-MOS valuesobtained per scenario in one test case of each domain Thescores obtained in the RES andAECdomains are always highIn the AUE domain the synthetic MOS associated with theVideo Resolution shows low scores in some of the scenariosbecause the resolution decreases reasonable good scores inthe time to load first media and high scores in the time toplayback cut-off ratio Overall it can be concluded that the

10 Wireless Communications and Mobile Computing

Table 6 Synthetic MOS values per test case and scenario for the feature ldquoNoninteractive Playbackrdquo

AUE domain AEC domain RES domain

Test Case AUECS001 Test CaseAECCS001 Test Case RESCS001

ScenarioTime to loadfirst mediaframe

PlaybackCut-off ratio

VideoResolution

mode

AveragePower

Consumption

Average CPUUsage

AverageRAM Usage

HighSpeed DirectPassenger 21 31 23 47 43 42

Suburban Festival 38 47 31 48 43 41Suburban shopping mallbusy hours 37 37 13 48 44 41

Suburban shopping malloff-peak 36 31 23 48 43 41

Suburban stadium 38 29 21 47 44 41Urban Driving Normal 26 39 28 47 44 4Urban Driving TrafficJam 34 37 16 48 44 4

Urban Internet Cafe BusyHours 38 37 19 48 44 4

Urban Internet Cafe OffPeak 38 31 23 48 43 4

Urban Office 38 47 33 48 45 43Urban Pedestrian 39 26 2 47 44 4

35 36 23 47 44 41

DASH implementation of the video streaming client undertest is able to adapt to the changing conditions of the networkmaintaining an acceptable rate of video cut-off rebufferingtimes and resources usage

The final score in each domain is obtained by averagingthe synthetic-MOS values from all the tested network scenar-ios Figure 7 shows the spider diagram for the three domainstested In the User Experience domain the score obtained islower than the other domains due to the low synthetic-MOSvalues obtained for the video resolution

The final synthetic MOS for the use case Content Dis-tribution Streaming is obtained as a weighted average of thethree domains representing the overall QoE as perceived bythe userThefinal score for the Exoplayer version 1516 and thefeatures tested (Noninteractive Playback and Play and Pause)is 42 which means that the low score obtained in the videoresolution is compensated with the high scores in other KPIs

If an application under test has more than one use casethe next steps in the TRIANGLE mark project approachwould be the aggregation per use case and the aggregationover all use cases The final score the TRIANGLE mark is anestimation of the overall QoE as perceived by the user

In the current TRIANGLE implementation the weightsin all aggregations are the same Further research is neededto appropriately define the weights of each domain and eachuse case in the overall score of the applications

7 Conclusions

The main contribution of the TRIANGLE project is theprovision of a framework that generalizes QoE computation

and enables the execution of extensive and repeatable testcampaigns to obtainmeaningfulQoE scoresTheTRIANGLEproject has also defined amethodology which is based on thetransformation and aggregation of KPIs its transformationinto synthetic-MOS values and its aggregation over thedifferent domains and use cases

The TRIANGLE approach is a methodology flexibleenough to generalize the computation of QoE for any applica-tionservice Themethodology has been validated testing theDASH implementation in the Exoplayer App To confirm thesuitability of theweights used in the averaging process and theinterpolation parameters as well as to verify the correlationof the obtained MOS with that scored by users the authorshave started experiments with real users and initial results areencouraging

The process described produces a final TRIANGLEmarka single quality score which could eventually be used to cer-tify applications after achieving a consensus on the differentvalues of the process (weights limits etc) to use

Data Availability

Themethodology and results used to support the findings ofthis study are included within the article

Conflicts of Interest

The authors declare that they have no conflicts of inter-est

Wireless Communications and Mobile Computing 11

DER

AUE AEC

Device Resource Usage

User Experience Energy Consumption

Figure 7 Exoplayer synthetic-MOS values per domain

Acknowledgments

The TRIANGLE project is funded by the European UnionrsquosHorizon 2020 Research and Innovation Programme (GrantAgreement no 688712)

References

[1] ETSI ldquoHuman factors quality of experience (QoE) require-ments for real-time communication servicesrdquo Tech Rep 102643 2010

[2] ITU-T ldquoP10G100 (2006) amendment 1 (0107) new appendixI - definition of quality of experience (QoE)rdquo 2007

[3] F Kozamernik V Steinmann P Sunna and E WyckensldquoSAMVIQ - A new EBUmethodology for video quality evalua-tions in multimediardquo SMPTE Motion Imaging Journal vol 114no 4 pp 152ndash160 2005

[4] ITU-T ldquoG107 the E-model a computational model for use intransmission planningrdquo 2015

[5] J De Vriendt D De Vleeschauwer and D C Robinson ldquoQoEmodel for video delivered over an LTE network using HTTPadaptive streamingrdquo Bell Labs Technical Journal vol 18 no 4pp 45ndash62 2014

[6] S Jelassi G Rubino H Melvin H Youssef and G PujolleldquoQuality of Experience of VoIP Service A Survey of AssessmentApproaches andOpen Issuesrdquo IEEECommunications Surveys ampTutorials vol 14 no 2 pp 491ndash513 2012

[7] M Li C-L Yeh and S-Y Lu ldquoReal-Time QoE MonitoringSystem forVideo Streaming ServiceswithAdaptiveMedia Play-outrdquo International Journal of Digital Multimedia Broadcastingvol 2018 Article ID 2619438 11 pages 2018

[8] S Barakovic and L Skorin-Kapov ldquoSurvey and Challengesof QoE Management Issues in Wireless Networksrdquo Journal ofComputer Networks and Communications vol 2013 Article ID165146 28 pages 2013

[9] ITU-T ldquoG1030 estimating end-to-end performance in IPnetworks for data applicationsrdquo 2014

[10] ITU-T ldquoG1031 QoE factors in web-browsingrdquo 2014[11] EU H2020 TRIANGLE Project Deliverable D22 Final report

on the formalization of the certification process requirementsanduse cases 2017 httpswwwtriangle-projecteuproject-olddeliverables

[12] Q A Chen H Luo S Rosen et al ldquoQoE doctor diagnosingmobile app QoE with automated UI control and cross-layeranalysisrdquo in Proceedings of the Conference on Internet Mea-surement Conference (IMC rsquo14) pp 151ndash164 ACM VancouverCanada November 2014

[13] M A Mehmood A Wundsam S Uhlig D Levin N Sarrarand A Feldmann ldquoQoE-Lab Towards Evaluating Quality ofExperience for Future Internet Conditionsrdquo in Testbeds andResearch Infrastructure Korakis T Li H Tran-Gia P and HS Park Eds vol 90 of TridentCom 2011 Lnicst pp 286ndash301Springer Development of Networks and Communities BerlinGermany 2012

[14] D Levin A Wundsam A Mehmood and A FeldmannldquoBerlin The Berlin Experimental Router Laboratory for Inno-vative Networkingrdquo in TridentCom 2010 Lnicst T MagedanzA Gavras N H Thanh and J S Chase Eds vol 46 of LectureNotes of the Institute for Computer Sciences Social Informaticsand Telecommunications Engineering pp 602ndash604 SpringerHeidelberg Germany 2011

12 Wireless Communications and Mobile Computing

[15] K De Moor I Ketyko W Joseph et al ldquoProposed frameworkfor evaluating quality of experience in a mobile testbed-oriented living lab settingrdquo Mobile Networks and Applicationsvol 15 no 3 pp 378ndash391 2010

[16] R Sanchez-Iborra M-D Cano J J P C Rodrigues and JGarcia-Haro ldquoAnExperimental QoE Performance Study for theEfficient Transmission of High Demanding Traffic over an AdHoc Network Using BATMANrdquo Mobile Information Systemsvol 2015 Article ID 217106 14 pages 2015

[17] P Oliver-Balsalobre M Toril S Luna-Ramırez and R GarcıaGaraluz ldquoA system testbed for modeling encrypted video-streaming service performance indicators based on TCPIPmetricsrdquo EURASIP Journal on Wireless Communications andNetworking vol 2017 no 1 2017

[18] M Solera M Toril I Palomo G Gomez and J Poncela ldquoATestbed for Evaluating Video Streaming Services in LTErdquoWireless Personal Communications vol 98 no 3 pp 2753ndash27732018

[19] A Alvarez A Dıaz P Merino and F J Rivas ldquoField mea-surements of mobile services with Android smartphonesrdquoin Proceedings of the IEEE Consumer Communications andNetworking Conference (CCNC rsquo12) pp 105ndash109 Las Vegas NevUSA January 2012

[20] NGMN Alliance ldquoNGMN 5G white paperrdquo 2015 httpswwwngmnorgfileadminngmncontentdownloadsTechnical2015NGMN 5G White Paper V1 0pdf

[21] ldquoInfrastructure and Design for Adaptivity and Flexibilityrdquo inMobile Information Systems Springer 2006

[22] J Nielsen ldquoResponse Times The Three Important Limitsrdquo inUsability Engineering 1993

[23] NGMN Alliance ldquoDefinition of the testing framework for theNGMN 5G pre-commercial networks trialsrdquo 2018 httpswwwngmnorgfileadminngmncontentdownloadsTechnical2018180220 NGMN PreCommTrials Framework definition v1 0pdf

[24] 3GPP TS 26246 ldquoTransparent end-to-end Packet-switchedStreaming Services (PSS) Progressive Download and DynamicAdaptive Streaming over HTTP (3GP-DASH)rdquo 2018


[Figure 4: QoE computation steps: (i) use cases; (ii) domains (Apps User Experience, Apps Energy Consumption, Apps Device Resources); (iii) test case (e.g., AUE/CS/01); (iv) context (network scenarios); (v) test case execution (app user flow, measurements, KPIs, MOS); (vi) synthetic MOS. For the feature "Non-interactive playback", the measurements are the time to load the first media frame, playback cut-off, video resolution, and content stall.]

ITU-T G.1030, "Estimating end-to-end performance in IP networks for data applications" [9], and ITU-T G.1031, "QoE factors in web-browsing" [10], have been considered in TRIANGLE. In particular, ITU-T G.1031 [10] identifies the following context influence factors: location (cafeteria, office, and home), interactivity (high-level versus low-level interactivity), task type (business, entertainment, etc.), and task urgency (urgent versus casual). The user's influence factors are, however, outside the scope of the ITU recommendation.

In the TRIANGLE project, the context information has been captured in the network scenarios defined (Urban - Internet Cafe Off Peak, Suburban - Shopping Mall Busy Hours, Urban - Pedestrian, Urban - Office, High Speed Train - Relay, etc.) and in the test cases specified in [11].

The test cases specify not only the conditions of the test but also a sequence of actions that have to be executed by the application (app user flows) to test its features. For example, the test case that tests the "Play and Pause" functionality defines the app user flow shown in Figure 5.

The transformation of KPIs into QoE scores is the most challenging step in the TRIANGLE framework. The execution of the test cases generates a significant amount of raw measurements about several aspects of the system. Specific KPIs can then be extracted through statistical analysis: mean, deviation, cumulative distribution function (CDF), or ratio.
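To make this step concrete, the following sketch (with made-up numbers, not TRIANGLE data) shows how mean-, ratio-, and CDF-type KPIs could be derived from the raw measurements of one test case:

import numpy as np

# Hypothetical raw measurements from 10 iterations of one test case
# (values are illustrative only).
load_times_s = np.array([1.2, 0.9, 1.5, 1.1, 1.3, 0.8, 1.4, 1.0, 1.2, 1.1])
playback_completed = np.array([True, True, False, True, True,
                               True, True, True, True, True])

# Mean and deviation of the time to load the first media frame.
mean_load = load_times_s.mean()
std_load = load_times_s.std()

# Ratio-type KPI: playback cut-off ratio (share of sessions ended by a
# cause other than intentional termination by the user).
cutoff_ratio = 1.0 - playback_completed.mean()

# Empirical CDF of the load time, evaluated at a 2-second target.
cdf_at_2s = (load_times_s <= 2.0).mean()

print(mean_load, std_load, cutoff_ratio, cdf_at_2s)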

The KPIs will be individually interpolated in order to provide a common, homogeneous comparison and aggregation space. The interpolation is based on the application of two functions, named Type I and Type II. By using these two types of interpolation, the vast majority of KPIs can be translated into a normalized MOS-type metric (synthetic MOS) that is easy to average in order to provide a simple, unified evaluation.

Type I. This function performs a linear interpolation on the original data. The variables $min_{KPI}$ and $max_{KPI}$ are the worst and best known values of a KPI from a reference case. The function maps a value $v$ of a KPI to $v'$ (synthetic MOS) in the range [1, 5] by computing the following formula:

$$v' = \frac{v - min_{KPI}}{max_{KPI} - min_{KPI}} \, (5.0 - 1.0) + 1.0 \quad (1)$$

[Figure 5: App user flow used in the "AUE/CS/02 Play and Pause" test case: perform the login step (if required) and wait for 10 seconds; start playing a 5-minute video for 10 seconds; pause the reproduction; resume the reproduction after 2 minutes and play until the end of the video.]

This function transforms a KPI into a synthetic-MOS value by applying a simple linear interpolation between the worst and best expected values from a reference case. If a future input case falls outside the data range of the KPI, the new value will be set to the extreme value $min_{KPI}$ (if it is worse) or $max_{KPI}$ (if it is better).

Table 3: AUE/CS/002 test case description.

Identifier: AUE/CS/002 (App User Experience/Content Streaming/002)
Title: Play and pause
Objective: Measure the ability of the AUT to pause and then resume a media file
Applicability: (ICS_G ProductType = Application) AND (ICS_G UseCases includes CS) AND ICS_A CS_Pause
Initial Conditions: AUT in [AUT STARTED] mode (Note: defined in D2.2 [11], Appendix 4)
Steps: (1) The Test System commands the AUT to replay the Application User Flow (the flow that first presses the Play button and later the Pause button). (2) The Test System measures whether the pause operation was successful or not.
Postamble: (i) Execute the Postamble sequence (see Section 2.6 in D2.2 [11], Appendix 4)
Measurements (Raw): (i) Playback Cut-off: probability that a successfully started stream reproduction is ended by a cause other than intentional termination by the user. (ii) Pause Operation: whether the pause operation is successful or not. (iii) Time to load first media frame (s) after resuming: the time elapsed from when the user clicks the resume button until the media reproduction starts. (Note: for Exoplayer, the RESUME button is the PLAY button.)

Type II. This function performs a logarithmic interpolation and is inspired by the opinion model recommended by the ITU-T in [9] for a simple web search task. This function maps a value $v$ of a KPI to $v'$ (synthetic MOS) in the range [1, 5] by computing the following formula:

$$v' = \frac{5.0 - 1.0}{\ln\!\big((a \cdot worst_{KPI} + b)/worst_{KPI}\big)} \cdot \big(\ln(v) - \ln(a \cdot worst_{KPI} + a)\big) + 5 \quad (2)$$

The default values of $a$ and $b$ correspond to the simple web search task case ($a = 0.003$ and $b = 0.12$) [9, 22], and the worst value has been extracted from ITU-T G.1030. If, during experimentation, a future input case falls outside the data range of the KPI, the parameters $a$ and $b$ will be updated accordingly. Likewise, if through subjective experimentation other values turn out to be better adjustments for specific services, the function can easily be updated.
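A direct transcription of formulas (1) and (2) into code could look as follows. This is a minimal sketch: the clamping of out-of-range inputs in Type I follows the text above, while clamping the Type II output to the MOS range [1, 5] is an assumption made for robustness.

import math

def type_i(v, kpi_min, kpi_max):
    """Linear interpolation of a KPI value to a synthetic MOS (formula (1)).

    kpi_min is the worst and kpi_max the best known value from a reference
    case; values outside the known range are set to the nearer extreme.
    """
    # kpi_min may be numerically larger than kpi_max (e.g., 100% CPU usage
    # is worse than 16%), so clamp against the sorted bounds.
    lo, hi = sorted((kpi_min, kpi_max))
    v = min(max(v, lo), hi)
    return (v - kpi_min) / (kpi_max - kpi_min) * (5.0 - 1.0) + 1.0

def type_ii(v, worst, a=0.003, b=0.12):
    """Logarithmic interpolation to a synthetic MOS (formula (2)).

    Defaults a and b come from the ITU-T G.1030 simple web search opinion
    model [9, 22]; worst is the worst expected value of the KPI.
    """
    scale = (5.0 - 1.0) / math.log((a * worst + b) / worst)
    mos = scale * (math.log(v) - math.log(a * worst + a)) + 5.0
    # Assumption: keep the result inside the MOS range [1, 5].
    return min(max(mos, 1.0), 5.0)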

Once all KPIs are translated into synthetic-MOS values, they can be averaged with suitable weights. In the averaging process, the first step is to average over the network scenarios considered relevant for the use case, as shown in Figure 2. This provides the synthetic-MOS output value for the test case. If there is more than one test case per domain, which is generally the case, a weighted average is calculated in order to provide one synthetic-MOS value per domain, as depicted in Figure 3. The final step is to average the synthetic-MOS scores over all use cases supported by the application (see Figure 3). This provides the final score, that is, the TRIANGLE mark.
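The full aggregation chain, from per-scenario synthetic MOS to the TRIANGLE mark, could then be sketched as follows (equal weights, as in the current TRIANGLE implementation; the nested data layout and the values in it are illustrative assumptions):

def weighted_mean(values, weights=None):
    """Weighted average; equal weights by default."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# scores[use_case][domain][test_case] = synthetic-MOS values, one per
# network scenario relevant to the use case (hypothetical numbers).
scores = {
    "CS": {
        "AUE": {"AUE/CS/001": [3.5, 3.6, 2.3], "AUE/CS/002": [3.8, 3.4, 3.1]},
        "AEC": {"AEC/CS/001": [4.7, 4.8, 4.8], "AEC/CS/002": [4.6, 4.7, 4.7]},
        "RES": {"RES/CS/001": [4.4, 4.3, 4.1], "RES/CS/002": [4.2, 4.3, 4.2]},
    },
}

per_use_case = []
for domains in scores.values():
    per_domain = []
    for test_cases in domains.values():
        # Step 1: average over scenarios -> one value per test case.
        per_test_case = [weighted_mean(ms) for ms in test_cases.values()]
        # Step 2: average over test cases -> one value per domain.
        per_domain.append(weighted_mean(per_test_case))
    # Step 3: average over domains -> one value per use case.
    per_use_case.append(weighted_mean(per_domain))

# Step 4: average over use cases -> the TRIANGLE mark.
triangle_mark = weighted_mean(per_use_case)
print(round(triangle_mark, 1))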

6. A Practical Case: Exoplayer under Test

To better illustrate the complete process of obtaining the TRIANGLE mark for a specific application, the evaluation of Exoplayer is described in this section. This application has only one use case: content distribution streaming services (CS).

Table 4: Measurement points associated with test case AUE/CS/002.

Measurement | Measurement points
Time to load first media frame | Media File Playback - Start; Media File Playback - First Picture
Playback cut-off | Media File Playback - Start; Media File Playback - End
Pause | Media File Playback - Pause

Exoplayer is an application-level media player for Android promoted by Google. It provides an alternative to Android's MediaPlayer API for playing audio and video, both locally and over the Internet. Exoplayer supports features not currently supported by Android's MediaPlayer API, including DASH and SmoothStreaming adaptive playback.

The TRIANGLE project has concentrated on testing just two of the Exoplayer features: "Noninteractive Playback" and "Play and Pause". These features result in 6 applicable test cases out of the test cases defined in TRIANGLE: test cases AUE/CS/001 and AUE/CS/002 in the App User Experience domain, test cases AEC/CS/001 and AEC/CS/002 in the App Energy Consumption domain, and test cases RES/CS/001 and RES/CS/002 in the Device Resources Usage domain.

The AUE/CS/002 "Play and Pause" test case description, belonging to the AUE domain, is shown in Table 3. The test case description specifies the test conditions, the generic app user flow, and the raw measurements which shall be collected during the execution of the test.

The TRIANGLE project also offers a library that includes the measurement points that should be inserted in the source code of the app to enable the collection of the specified measurements. Table 4 shows the measurement points required to compute the measurements specified in test case AUE/CS/002.


Table 5: Reference values for interpolation.

Feature | Domain | KPI | Synthetic MOS calculation | KPI min | KPI max
Non-Interactive Playback | AEC | Average power consumption | Type I | 1.0 W | 0.8 W
Non-Interactive Playback | AUE | Time to load first media frame | Type II | KPI worst = 20 ms | -
Non-Interactive Playback | AUE | Playback cut-off ratio | Type I | 50% | 0%
Non-Interactive Playback | AUE | Video resolution | Type I | 240p | 720p
Non-Interactive Playback | RES | Average CPU usage | Type I | 100% | 16%
Non-Interactive Playback | RES | Average memory usage | Type I | 100% | 40%
Play and Pause | AEC | Average power consumption | Type I | 1.0 W | 0.8 W
Play and Pause | AUE | Pause operation success rate | Type I | 50% | 100%
Play and Pause | RES | Average CPU usage | Type I | 100% | 16%
Play and Pause | RES | Average memory usage | Type I | 100% | 40%
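As an illustration of how Table 5 could feed the interpolation functions sketched earlier, the reference values might be encoded as a small lookup table (the keys and structure are assumptions for illustration, not the testbed's actual configuration format):

# Reference values from Table 5 for "Non-Interactive Playback".
REFERENCE = {
    "average_power_W":       ("type_i",  {"kpi_min": 1.0,   "kpi_max": 0.8}),
    "time_to_first_frame":   ("type_ii", {"worst": 20.0}),  # KPI worst = 20 ms
    "playback_cutoff_ratio": ("type_i",  {"kpi_min": 50.0,  "kpi_max": 0.0}),
    "video_resolution_p":    ("type_i",  {"kpi_min": 240.0, "kpi_max": 720.0}),
    "average_cpu_usage":     ("type_i",  {"kpi_min": 100.0, "kpi_max": 16.0}),
    "average_memory_usage":  ("type_i",  {"kpi_min": 100.0, "kpi_max": 40.0}),
}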

The time to load first media picture measurement is obtained by subtracting the timestamp of the measurement point "Media File Playback - Start" from that of the measurement point "Media File Playback - First Picture".
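Assuming the measurement points are collected as (name, timestamp) records (a hypothetical log layout; the library's real output format may differ), this reduces to simple timestamp arithmetic:

# Hypothetical log of measurement points: (point name, timestamp in s).
log = [
    ("Media File Playback - Start", 12.40),
    ("Media File Playback - First Picture", 13.95),
    ("Media File Playback - Pause", 22.40),
    ("Media File Playback - End", 310.20),
]
events = dict(log)  # one occurrence of each point in this simple case

# Time to load the first media frame: First Picture minus Start.
time_to_load = (events["Media File Playback - First Picture"]
                - events["Media File Playback - Start"])
print(f"Time to load first media frame: {time_to_load:.2f} s")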

As specified in [11], all the scenarios defined are applicable to the content streaming use case. Therefore, the test cases in the three domains currently supported by the testbed are executed in all the scenarios.

Once the test campaign has finished, the raw measurement results are processed to obtain the KPIs associated with each test case: average current consumption, average time to load the first media frame, average CPU usage, and so forth. The processes applied are detailed in Table 5. Based on previous experiments performed by the authors, the behaviour of the time to load the first media frame KPI resembles the web response time KPI (i.e., the amount of time the user has to wait for the service); thus, as recommended in the opinion model for web search introduced in [9], a logarithmic interpolation (Type II) has been used for this metric.

The results of this initial process, that is, the computed KPIs, are translated into synthetic-MOS values. To compute these values, reference benchmarking values for each of the KPIs are needed, according to the normalization and interpolation process described in Section 5. Table 5 shows the values currently used by TRIANGLE for the App User Experience domain, which are also used by NGMN as a reference in their precommercial trials document [23].

For example, for the "time to load first media frame" KPI shown in Table 5, the type of aggregation applied is averaging, and the interpolation formula used is Type II.
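As a worked example of the Type I case, a hypothetical measured average video resolution of 480p, interpolated with the Table 5 reference values (240p worst, 720p best), yields $v' = (480 - 240)/(720 - 240) \cdot (5.0 - 1.0) + 1.0 = 3.0$, a mid-scale synthetic MOS.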

To achieve stable results, each test case is executed 10 times (10 iterations) in each network scenario. The synthetic-MOS value in each domain is calculated by averaging the synthetic-MOS values measured in the domain. For example, the synthetic-MOS value in the RES domain is obtained by averaging the synthetic-MOS values of "average CPU usage" and "average memory usage" from the two test cases.

Although Exoplayer supports several video streaming protocols, in this work only DASH [24] (Dynamic Adaptive Streaming over HTTP) has been tested. DASH clients should seamlessly adapt to changing network conditions by making decisions on which video segment to download (videos are encoded at multiple bitrates). Exoplayer's default adaptation algorithm is basically throughput-based, and some parameters control how often and when switching can occur.

[Figure 6: Video resolution evolution in the Driving Urban Normal scenario, plotted as horizontal resolution (0 to 1200 pixels) over the playback timestamp.]
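Exoplayer's actual track selection logic is more elaborate, but the core idea of a throughput-based adaptation algorithm can be sketched as follows (the track ladder and safety fraction are illustrative assumptions, not Exoplayer's defaults):

# Available video tracks (bitrate in kbit/s, vertical resolution);
# illustrative values for a DASH manifest.
TRACKS = [(400, 240), (750, 360), (1500, 480), (3000, 720), (6000, 1080)]

def select_track(measured_throughput_kbps, safety_fraction=0.75):
    """Pick the highest-bitrate track below a fraction of the throughput.

    A safety fraction below 1.0 leaves headroom so that short throughput
    dips do not immediately stall playback.
    """
    budget = measured_throughput_kbps * safety_fraction
    viable = [t for t in TRACKS if t[0] <= budget]
    return viable[-1] if viable else TRACKS[0]  # fall back to lowest track

print(select_track(2400))  # -> (1500, 480)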

During the testing, the testbed was configured with the different network scenarios defined in [11]. In these scenarios the network configuration changes dynamically, following a random pattern, resulting in different maximum throughput rates. The expected behaviour of the application under test is that the video streaming client adapts to the available throughput by decreasing or increasing the resolution of the received video. Figure 6 depicts how the client effectively adapts to the channel conditions.

However, the objective of the testing carried out in the TRIANGLE testbed is not just to verify that the video streaming client actually adapts to the available maximum throughput, but also to check whether this adaptation improves the users' quality of experience.

Table 6 shows a summary of the synthetic-MOS values obtained per scenario in one test case of each domain. The scores obtained in the RES and AEC domains are always high. In the AUE domain, the synthetic MOS associated with the video resolution shows low scores in some of the scenarios, because the resolution decreases; the time to load the first media frame obtains reasonably good scores; and the playback cut-off ratio obtains high scores. Overall, it can be concluded that the DASH implementation of the video streaming client under test is able to adapt to the changing conditions of the network while maintaining an acceptable rate of video cut-off, rebuffering times, and resource usage.

Table 6: Synthetic-MOS values per test case and scenario for the feature "Noninteractive Playback". Columns: AUE domain (test case AUE/CS/001: time to load first media frame, playback cut-off ratio, video resolution mode), AEC domain (test case AEC/CS/001: average power consumption), RES domain (test case RES/CS/001: average CPU usage, average RAM usage).

Scenario | Time to load first media frame | Playback cut-off ratio | Video resolution (mode) | Average power consumption | Average CPU usage | Average RAM usage
HighSpeed Direct Passenger | 2.1 | 3.1 | 2.3 | 4.7 | 4.3 | 4.2
Suburban Festival | 3.8 | 4.7 | 3.1 | 4.8 | 4.3 | 4.1
Suburban Shopping Mall Busy Hours | 3.7 | 3.7 | 1.3 | 4.8 | 4.4 | 4.1
Suburban Shopping Mall Off-Peak | 3.6 | 3.1 | 2.3 | 4.8 | 4.3 | 4.1
Suburban Stadium | 3.8 | 2.9 | 2.1 | 4.7 | 4.4 | 4.1
Urban Driving Normal | 2.6 | 3.9 | 2.8 | 4.7 | 4.4 | 4.0
Urban Driving Traffic Jam | 3.4 | 3.7 | 1.6 | 4.8 | 4.4 | 4.0
Urban Internet Cafe Busy Hours | 3.8 | 3.7 | 1.9 | 4.8 | 4.4 | 4.0
Urban Internet Cafe Off Peak | 3.8 | 3.1 | 2.3 | 4.8 | 4.3 | 4.0
Urban Office | 3.8 | 4.7 | 3.3 | 4.8 | 4.5 | 4.3
Urban Pedestrian | 3.9 | 2.6 | 2.0 | 4.7 | 4.4 | 4.0
Average | 3.5 | 3.6 | 2.3 | 4.7 | 4.4 | 4.1

The final score in each domain is obtained by averaging the synthetic-MOS values from all the tested network scenarios. Figure 7 shows the spider diagram for the three domains tested. In the User Experience domain, the score obtained is lower than in the other domains, due to the low synthetic-MOS values obtained for the video resolution.

The final synthetic MOS for the use case Content Distribution Streaming is obtained as a weighted average of the three domains, representing the overall QoE as perceived by the user. The final score for Exoplayer version 1.5.16 and the features tested (Noninteractive Playback and Play and Pause) is 4.2, which means that the low score obtained for the video resolution is compensated by the high scores in the other KPIs.

If an application under test has more than one use case, the next steps in the TRIANGLE mark approach would be the aggregation per use case and the aggregation over all use cases. The final score, the TRIANGLE mark, is an estimation of the overall QoE as perceived by the user.

In the current TRIANGLE implementation, the weights in all aggregations are the same. Further research is needed to appropriately define the weights of each domain and each use case in the overall score of the applications.

7. Conclusions

The main contribution of the TRIANGLE project is the provision of a framework that generalizes QoE computation and enables the execution of extensive and repeatable test campaigns to obtain meaningful QoE scores. The TRIANGLE project has also defined a methodology based on the transformation and aggregation of KPIs: their transformation into synthetic-MOS values and their aggregation over the different domains and use cases.

The TRIANGLE approach is a methodology flexible enough to generalize the computation of QoE for any application/service. The methodology has been validated by testing the DASH implementation in the Exoplayer app. To confirm the suitability of the weights used in the averaging process and of the interpolation parameters, as well as to verify the correlation of the obtained MOS with that scored by users, the authors have started experiments with real users, and the initial results are encouraging.

The process described produces a final TRIANGLE mark, a single quality score, which could eventually be used to certify applications after achieving a consensus on the different values of the process (weights, limits, etc.) to use.

Data Availability

The methodology and results used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


[Figure 7: Exoplayer synthetic-MOS values per domain, shown as a spider diagram over User Experience (AUE), Energy Consumption (AEC), and Device Resource Usage (DER).]

Acknowledgments

The TRIANGLE project is funded by the European Union's Horizon 2020 Research and Innovation Programme (Grant Agreement no. 688712).

References

[1] ETSI, "Human factors: quality of experience (QoE) requirements for real-time communication services," Tech. Rep. 102 643, 2010.

[2] ITU-T, "P.10/G.100 (2006) amendment 1 (01/07): new appendix I - definition of quality of experience (QoE)," 2007.

[3] F. Kozamernik, V. Steinmann, P. Sunna, and E. Wyckens, "SAMVIQ - a new EBU methodology for video quality evaluations in multimedia," SMPTE Motion Imaging Journal, vol. 114, no. 4, pp. 152-160, 2005.

[4] ITU-T, "G.107: the E-model, a computational model for use in transmission planning," 2015.

[5] J. De Vriendt, D. De Vleeschauwer, and D. C. Robinson, "QoE model for video delivered over an LTE network using HTTP adaptive streaming," Bell Labs Technical Journal, vol. 18, no. 4, pp. 45-62, 2014.

[6] S. Jelassi, G. Rubino, H. Melvin, H. Youssef, and G. Pujolle, "Quality of experience of VoIP service: a survey of assessment approaches and open issues," IEEE Communications Surveys & Tutorials, vol. 14, no. 2, pp. 491-513, 2012.

[7] M. Li, C.-L. Yeh, and S.-Y. Lu, "Real-time QoE monitoring system for video streaming services with adaptive media playout," International Journal of Digital Multimedia Broadcasting, vol. 2018, Article ID 2619438, 11 pages, 2018.

[8] S. Barakovic and L. Skorin-Kapov, "Survey and challenges of QoE management issues in wireless networks," Journal of Computer Networks and Communications, vol. 2013, Article ID 165146, 28 pages, 2013.

[9] ITU-T, "G.1030: estimating end-to-end performance in IP networks for data applications," 2014.

[10] ITU-T, "G.1031: QoE factors in web-browsing," 2014.

[11] EU H2020 TRIANGLE Project, "Deliverable D2.2: final report on the formalization of the certification process, requirements and use cases," 2017, https://www.triangle-project.eu/project-old/deliverables.

[12] Q. A. Chen, H. Luo, S. Rosen et al., "QoE doctor: diagnosing mobile app QoE with automated UI control and cross-layer analysis," in Proceedings of the Internet Measurement Conference (IMC '14), pp. 151-164, ACM, Vancouver, Canada, November 2014.

[13] M. A. Mehmood, A. Wundsam, S. Uhlig, D. Levin, N. Sarrar, and A. Feldmann, "QoE-Lab: towards evaluating quality of experience for future internet conditions," in Testbeds and Research Infrastructure: Development of Networks and Communities, T. Korakis, H. Li, P. Tran-Gia, and H.-S. Park, Eds., vol. 90 of LNICST (TridentCom 2011), pp. 286-301, Springer, Berlin, Germany, 2012.

[14] D. Levin, A. Wundsam, A. Mehmood, and A. Feldmann, "Berlin: the Berlin experimental router laboratory for innovative networking," in TridentCom 2010, T. Magedanz, A. Gavras, N. H. Thanh, and J. S. Chase, Eds., vol. 46 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 602-604, Springer, Heidelberg, Germany, 2011.

[15] K. De Moor, I. Ketyko, W. Joseph et al., "Proposed framework for evaluating quality of experience in a mobile, testbed-oriented living lab setting," Mobile Networks and Applications, vol. 15, no. 3, pp. 378-391, 2010.

[16] R. Sanchez-Iborra, M.-D. Cano, J. J. P. C. Rodrigues, and J. Garcia-Haro, "An experimental QoE performance study for the efficient transmission of high demanding traffic over an ad hoc network using BATMAN," Mobile Information Systems, vol. 2015, Article ID 217106, 14 pages, 2015.

[17] P. Oliver-Balsalobre, M. Toril, S. Luna-Ramírez, and R. García Garaluz, "A system testbed for modeling encrypted video-streaming service performance indicators based on TCP/IP metrics," EURASIP Journal on Wireless Communications and Networking, vol. 2017, no. 1, 2017.

[18] M. Solera, M. Toril, I. Palomo, G. Gomez, and J. Poncela, "A testbed for evaluating video streaming services in LTE," Wireless Personal Communications, vol. 98, no. 3, pp. 2753-2773, 2018.

[19] A. Alvarez, A. Díaz, P. Merino, and F. J. Rivas, "Field measurements of mobile services with Android smartphones," in Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC '12), pp. 105-109, Las Vegas, Nev, USA, January 2012.

[20] NGMN Alliance, "NGMN 5G white paper," 2015, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf.

[21] "Infrastructure and design for adaptivity and flexibility," in Mobile Information Systems, Springer, 2006.

[22] J. Nielsen, "Response times: the three important limits," in Usability Engineering, 1993.

[23] NGMN Alliance, "Definition of the testing framework for the NGMN 5G pre-commercial networks trials," 2018, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2018/180220_NGMN_PreCommTrials_Framework_definition_v1_0.pdf.

[24] 3GPP TS 26.246, "Transparent end-to-end packet-switched streaming services (PSS): progressive download and dynamic adaptive streaming over HTTP (3GP-DASH)," 2018.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 8: QoE Evaluation: The TRIANGLE Testbed Approach

8 Wireless Communications and Mobile Computing

Table 3 AUECS002 test case description

Identifier AUECS002 (App User ExperienceContent Streaming002)Title Play and pauseObjective Measure the ability of the AUT to pause and the resume a media fileApplicability (ICSG ProductType = Application) AND (ICSG UseCases includes CS) AND ICSA CSPauseInitial Conditions AUT in in [AUT STARTED] mode (Note Defined in D22 [11] Appendix 4)

Steps(1) The Test System commands the AUT to replay the Application User Flow (Application User Flow that

presses first the Play button and later the Pause button)(2) The Test System measures whether pause operation was successful or not

Postamble (i) Execute the Postamble sequence (see section 26 in D22 [11] Appendix 4)

Measurements (Raw)

(i) Playback Cut-off Probability that successfully started stream reproduction is ended by a cause other thanthe intentional termination by the user

(ii) Pause Operation Whether pause operation is successful or not(iii) Time to load first media frame (s) after resuming The time elapsed since the user clicks resume button

until the media reproduction starts(Note For Exoplayer the RESUME button is the PLAY button)

be set to the extreme value minKPI (if it is worse) or maxKPI(if it is better)

Type II This function performs a logarithmic interpolationand is inspired on the opinion model recommended by theITU-T in [9] for a simple web search taskThis function mapsa value v of a KPI to vrsquo (synthetic-MOS) in the range [1-to-5]by computing the following formula

V1015840 =50 minus 10

ln ((119886 lowast 119908119900119903119904119905119870119875119868+ 119887) 119908119900119903119904119905

119870119875119868)

∙ (ln (V) minus ln (119886 lowast 119908119900119903119904119905119870119875119868+ 119886)) + 5

(2)

The default values of 119886 and 119887 correspond to the simple websearch task case (119886 = 0003 and 119887 = 012) [9 22] and theworst value has been extracted from the ITU-T G1030 Ifduring experimentation a future input case falls outside thedata range of the KPI the parameters 119886 and 119887will be updatedaccordingly Likewise if through subjective experimentationother values are considered better adjustments for specificservices the function can be easily updated

Once all KPIs are translated into synthetic-MOS valuesthey can be averaged with suitable weights In the averagingprocess the first step is to average over the network scenariosconsidered relevant for the use case as shown in Figure 2This provides the synthetic-MOS output value for the testcase If there is more than one test case per domain which isgenerally the case a weighted average is calculated in order toprovide one synthetic-MOS value per domain as depicted inFigure 3The final step is to average the synthetic-MOS scoresover all use cases supported by the application (see Figure 3)This provides the final score that is the TRIANGLE mark

6 A Practical Case Exoplayer under Test

For better understanding the complete process of obtainingtheTRIANGLEmark for a specific application the Exoplayer

Table4Measurement points associatedwith test caseAUECS002

Measurements Measurement points

Time to load first media frame Media File Playback - StartMedia File Playback - First Picture

Playback cut-off Media File Playback - StartMedia File Playback - End

Pause Media File Playback - Pause

is described in this section This application only has one usecase content distribution streaming services (CS)

Exoplayer is an application levelmedia player forAndroidpromoted by Google It provides an alternative to AndroidrsquosMediaPlayer API for playing audio and video both locally andover the Internet Exoplayer supports features not currentlysupported by Androidrsquos MediaPlayer API including DASHand SmoothStreaming adaptive playbacks

The TRIANGLE project has concentrated in testing justtwo of the Exoplayer features ldquoNoninteractive Playbackrdquoand ldquoPlay and Pauserdquo These features result in 6 test casesapplicable out of the test cases defined in TRIANGLETheseare test cases AUECS001 and AUECS002 in the App UserExperience domain test casesAECCS001 andAECCS002in the App Energy Consumption domain and test casesRESCS001 and RESCS002 in the Device Resources Usagedomain

The AUECS002 ldquoPlay and Pauserdquo test case descriptionbelonging to the AUE domain is shown in Table 3 The testcase description specifies the test conditions the generic appuser flow and the rawmeasurements which shall be collectedduring the execution of the test

The TRIANGLE project also offers a library that includesthe measurement points that should be inserted in thesource code of the app for enabling the collection of themeasurements specified Table 4 shows the measurementpoints required to compute the measurements specified intest case AUECS002

Wireless Communications and Mobile Computing 9

Table 5 Reference values for interpolation

Feature Domain KPI Synthetic MOS Calculation KPI min KPI maxNon-Interactive Playback AEC Average power consumption Type I 10 W 08 WNon-Interactive Playback AUE Time to load first media frame Type II KPI worst=20 msNon-Interactive Playback AUE Playback cut-off ratio Type I 50 0Non-Interactive Playback AUE Video resolution Type I 240p 720pNon-Interactive Playback RES Average CPU usage Type I 100 16Non-Interactive Playback RES Average memory usage Type I 100 40Play and Pause AEC Average power consumption Type I 10 W 08 WPlay and Pause AUE Pause operation success rate Type I 50 100Play and Pause RES Average CPU usage Type I 100 16Play and Pause RES Average memory usage Type I 100 40

The time to load first media picture measurement isobtained subtracting the timestamp of the measurementpoint ldquoMedia File Playback ndash Startrdquo from the measurementpoint ldquoMedia File Playback ndash First Picturerdquo

As specified in [11] all scenarios defined are applicableto the content streaming use case Therefore test cases inthe three domains currently supported by the testbed areexecuted in all the scenarios

Once the test campaign has finished the raw measure-ment results are processed to obtain the KPIs associated witheach test case average current consumption average time toload first media frame average CPU usage and so forth Theprocesses applied are detailed in Table 5 Based on previousexperiments performed by the authors the behaviour of thetime to load the first media frame KPI resembles the webresponse time KPI (ie the amount of time the user hasto wait for the service) and thus as recommended in theopinionmodel forweb search introduced in [9] a logarithmicinterpolation (type II) has been used for this metric

The results of the initial process that is the KPIs compu-tation are translated into synthetics-MOS values To computethese values reference benchmarking values for each of theKPIs need to be used according to the normalization andinterpolation process described in Section 5 Table 5 showswhat has been currently used by TRIANGLE for the AppUser Experience domain which is also used by NGMN asreference in their precommercial Trials document [23]

For example for the ldquotime to load first media framerdquo KPIshown in Table 5 the type of aggregation applied is averagingand the interpolation formula used is Type II

To achieve stable results each test case is executed 10times (10 iterations) in each network scenario The synthetic-MOS value in each domain is calculated by averaging themeasured synthetic-MOS values in the domain For examplesynthetic-MOS value is the RES domain obtained by aver-aging the synthetic-MOS value of ldquoaverage CPU usagerdquo andldquoaverage memory usagerdquo from the two test cases

Although Exoplayer supports several video streamingprotocols in this work only DASH [24] (Dynamic AdaptiveStreaming over HTTP) has been tested DASH clients shouldseamlessly adapt to changing network conditions by makingdecisions on which video segment to download (videosare encoded at multiple bitrates) The Exoplayerrsquos default

000

000

0

001

728

0

003

456

0

005

184

0

010

912

0

012

640

0

014

368

0

020

096

0

Timestamp

Video Resolution

0

200

400

600

800

1000

1200

Hor

izon

tal R

esol

utio

n

Figure 6 Video Resolution evolution in the Driving Urban Normalscenario

adaptation algorithm is basically throughput-based and someparameters control how often and when switching can occur

During the testing the testbed was configured with thedifferent network scenarios defined in [11] In these scenariosthe network configuration changes dynamically following arandom pattern resulting in different maximum throughputrates The expected behaviour of the application under testis that the video streaming client adapts to the availablethroughput by decreasing or increasing the resolution of thereceived video Figure 6 depicts how the client effectivelyadapts to the channel conditions

However the objective of the testing carried out in theTRIANGE testbed is not just to verify that the video stream-ing client actually adapts to the available maximum through-put but also to check whether this adaptation improves theusersrsquo experience quality

Table 6 shows a summary of the synthetic-MOS valuesobtained per scenario in one test case of each domain Thescores obtained in the RES andAECdomains are always highIn the AUE domain the synthetic MOS associated with theVideo Resolution shows low scores in some of the scenariosbecause the resolution decreases reasonable good scores inthe time to load first media and high scores in the time toplayback cut-off ratio Overall it can be concluded that the

10 Wireless Communications and Mobile Computing

Table 6 Synthetic MOS values per test case and scenario for the feature ldquoNoninteractive Playbackrdquo

AUE domain AEC domain RES domain

Test Case AUECS001 Test CaseAECCS001 Test Case RESCS001

ScenarioTime to loadfirst mediaframe

PlaybackCut-off ratio

VideoResolution

mode

AveragePower

Consumption

Average CPUUsage

AverageRAM Usage

HighSpeed DirectPassenger 21 31 23 47 43 42

Suburban Festival 38 47 31 48 43 41Suburban shopping mallbusy hours 37 37 13 48 44 41

Suburban shopping malloff-peak 36 31 23 48 43 41

Suburban stadium 38 29 21 47 44 41Urban Driving Normal 26 39 28 47 44 4Urban Driving TrafficJam 34 37 16 48 44 4

Urban Internet Cafe BusyHours 38 37 19 48 44 4

Urban Internet Cafe OffPeak 38 31 23 48 43 4

Urban Office 38 47 33 48 45 43Urban Pedestrian 39 26 2 47 44 4

35 36 23 47 44 41

DASH implementation of the video streaming client undertest is able to adapt to the changing conditions of the networkmaintaining an acceptable rate of video cut-off rebufferingtimes and resources usage

The final score in each domain is obtained by averagingthe synthetic-MOS values from all the tested network scenar-ios Figure 7 shows the spider diagram for the three domainstested In the User Experience domain the score obtained islower than the other domains due to the low synthetic-MOSvalues obtained for the video resolution

The final synthetic MOS for the use case Content Dis-tribution Streaming is obtained as a weighted average of thethree domains representing the overall QoE as perceived bythe userThefinal score for the Exoplayer version 1516 and thefeatures tested (Noninteractive Playback and Play and Pause)is 42 which means that the low score obtained in the videoresolution is compensated with the high scores in other KPIs

If an application under test has more than one use casethe next steps in the TRIANGLE mark project approachwould be the aggregation per use case and the aggregationover all use cases The final score the TRIANGLE mark is anestimation of the overall QoE as perceived by the user

In the current TRIANGLE implementation the weightsin all aggregations are the same Further research is neededto appropriately define the weights of each domain and eachuse case in the overall score of the applications

7 Conclusions

The main contribution of the TRIANGLE project is theprovision of a framework that generalizes QoE computation

and enables the execution of extensive and repeatable testcampaigns to obtainmeaningfulQoE scoresTheTRIANGLEproject has also defined amethodology which is based on thetransformation and aggregation of KPIs its transformationinto synthetic-MOS values and its aggregation over thedifferent domains and use cases

The TRIANGLE approach is a methodology flexibleenough to generalize the computation of QoE for any applica-tionservice Themethodology has been validated testing theDASH implementation in the Exoplayer App To confirm thesuitability of theweights used in the averaging process and theinterpolation parameters as well as to verify the correlationof the obtained MOS with that scored by users the authorshave started experiments with real users and initial results areencouraging

The process described produces a final TRIANGLEmarka single quality score which could eventually be used to cer-tify applications after achieving a consensus on the differentvalues of the process (weights limits etc) to use

Data Availability

Themethodology and results used to support the findings ofthis study are included within the article

Conflicts of Interest

The authors declare that they have no conflicts of inter-est

Wireless Communications and Mobile Computing 11

DER

AUE AEC

Device Resource Usage

User Experience Energy Consumption

Figure 7 Exoplayer synthetic-MOS values per domain

Acknowledgments

The TRIANGLE project is funded by the European UnionrsquosHorizon 2020 Research and Innovation Programme (GrantAgreement no 688712)

References

[1] ETSI ldquoHuman factors quality of experience (QoE) require-ments for real-time communication servicesrdquo Tech Rep 102643 2010

[2] ITU-T ldquoP10G100 (2006) amendment 1 (0107) new appendixI - definition of quality of experience (QoE)rdquo 2007

[3] F Kozamernik V Steinmann P Sunna and E WyckensldquoSAMVIQ - A new EBUmethodology for video quality evalua-tions in multimediardquo SMPTE Motion Imaging Journal vol 114no 4 pp 152ndash160 2005

[4] ITU-T ldquoG107 the E-model a computational model for use intransmission planningrdquo 2015

[5] J De Vriendt D De Vleeschauwer and D C Robinson ldquoQoEmodel for video delivered over an LTE network using HTTPadaptive streamingrdquo Bell Labs Technical Journal vol 18 no 4pp 45ndash62 2014

[6] S Jelassi G Rubino H Melvin H Youssef and G PujolleldquoQuality of Experience of VoIP Service A Survey of AssessmentApproaches andOpen Issuesrdquo IEEECommunications Surveys ampTutorials vol 14 no 2 pp 491ndash513 2012

[7] M Li C-L Yeh and S-Y Lu ldquoReal-Time QoE MonitoringSystem forVideo Streaming ServiceswithAdaptiveMedia Play-outrdquo International Journal of Digital Multimedia Broadcastingvol 2018 Article ID 2619438 11 pages 2018

[8] S Barakovic and L Skorin-Kapov ldquoSurvey and Challengesof QoE Management Issues in Wireless Networksrdquo Journal ofComputer Networks and Communications vol 2013 Article ID165146 28 pages 2013

[9] ITU-T ldquoG1030 estimating end-to-end performance in IPnetworks for data applicationsrdquo 2014

[10] ITU-T ldquoG1031 QoE factors in web-browsingrdquo 2014[11] EU H2020 TRIANGLE Project Deliverable D22 Final report

on the formalization of the certification process requirementsanduse cases 2017 httpswwwtriangle-projecteuproject-olddeliverables

[12] Q A Chen H Luo S Rosen et al ldquoQoE doctor diagnosingmobile app QoE with automated UI control and cross-layeranalysisrdquo in Proceedings of the Conference on Internet Mea-surement Conference (IMC rsquo14) pp 151ndash164 ACM VancouverCanada November 2014

[13] M A Mehmood A Wundsam S Uhlig D Levin N Sarrarand A Feldmann ldquoQoE-Lab Towards Evaluating Quality ofExperience for Future Internet Conditionsrdquo in Testbeds andResearch Infrastructure Korakis T Li H Tran-Gia P and HS Park Eds vol 90 of TridentCom 2011 Lnicst pp 286ndash301Springer Development of Networks and Communities BerlinGermany 2012

[14] D Levin A Wundsam A Mehmood and A FeldmannldquoBerlin The Berlin Experimental Router Laboratory for Inno-vative Networkingrdquo in TridentCom 2010 Lnicst T MagedanzA Gavras N H Thanh and J S Chase Eds vol 46 of LectureNotes of the Institute for Computer Sciences Social Informaticsand Telecommunications Engineering pp 602ndash604 SpringerHeidelberg Germany 2011

12 Wireless Communications and Mobile Computing

[15] K De Moor I Ketyko W Joseph et al ldquoProposed frameworkfor evaluating quality of experience in a mobile testbed-oriented living lab settingrdquo Mobile Networks and Applicationsvol 15 no 3 pp 378ndash391 2010

[16] R Sanchez-Iborra M-D Cano J J P C Rodrigues and JGarcia-Haro ldquoAnExperimental QoE Performance Study for theEfficient Transmission of High Demanding Traffic over an AdHoc Network Using BATMANrdquo Mobile Information Systemsvol 2015 Article ID 217106 14 pages 2015

[17] P Oliver-Balsalobre M Toril S Luna-Ramırez and R GarcıaGaraluz ldquoA system testbed for modeling encrypted video-streaming service performance indicators based on TCPIPmetricsrdquo EURASIP Journal on Wireless Communications andNetworking vol 2017 no 1 2017

[18] M Solera M Toril I Palomo G Gomez and J Poncela ldquoATestbed for Evaluating Video Streaming Services in LTErdquoWireless Personal Communications vol 98 no 3 pp 2753ndash27732018

[19] A Alvarez A Dıaz P Merino and F J Rivas ldquoField mea-surements of mobile services with Android smartphonesrdquoin Proceedings of the IEEE Consumer Communications andNetworking Conference (CCNC rsquo12) pp 105ndash109 Las Vegas NevUSA January 2012

[20] NGMN Alliance ldquoNGMN 5G white paperrdquo 2015 httpswwwngmnorgfileadminngmncontentdownloadsTechnical2015NGMN 5G White Paper V1 0pdf

[21] ldquoInfrastructure and Design for Adaptivity and Flexibilityrdquo inMobile Information Systems Springer 2006

[22] J Nielsen ldquoResponse Times The Three Important Limitsrdquo inUsability Engineering 1993

[23] NGMN Alliance ldquoDefinition of the testing framework for theNGMN 5G pre-commercial networks trialsrdquo 2018 httpswwwngmnorgfileadminngmncontentdownloadsTechnical2018180220 NGMN PreCommTrials Framework definition v1 0pdf

[24] 3GPP TS 26246 ldquoTransparent end-to-end Packet-switchedStreaming Services (PSS) Progressive Download and DynamicAdaptive Streaming over HTTP (3GP-DASH)rdquo 2018

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: QoE Evaluation: The TRIANGLE Testbed Approach

Wireless Communications and Mobile Computing 9

Table 5 Reference values for interpolation

Feature Domain KPI Synthetic MOS Calculation KPI min KPI maxNon-Interactive Playback AEC Average power consumption Type I 10 W 08 WNon-Interactive Playback AUE Time to load first media frame Type II KPI worst=20 msNon-Interactive Playback AUE Playback cut-off ratio Type I 50 0Non-Interactive Playback AUE Video resolution Type I 240p 720pNon-Interactive Playback RES Average CPU usage Type I 100 16Non-Interactive Playback RES Average memory usage Type I 100 40Play and Pause AEC Average power consumption Type I 10 W 08 WPlay and Pause AUE Pause operation success rate Type I 50 100Play and Pause RES Average CPU usage Type I 100 16Play and Pause RES Average memory usage Type I 100 40

The time to load first media picture measurement isobtained subtracting the timestamp of the measurementpoint ldquoMedia File Playback ndash Startrdquo from the measurementpoint ldquoMedia File Playback ndash First Picturerdquo

As specified in [11] all scenarios defined are applicableto the content streaming use case Therefore test cases inthe three domains currently supported by the testbed areexecuted in all the scenarios

Once the test campaign has finished the raw measure-ment results are processed to obtain the KPIs associated witheach test case average current consumption average time toload first media frame average CPU usage and so forth Theprocesses applied are detailed in Table 5 Based on previousexperiments performed by the authors the behaviour of thetime to load the first media frame KPI resembles the webresponse time KPI (ie the amount of time the user hasto wait for the service) and thus as recommended in theopinionmodel forweb search introduced in [9] a logarithmicinterpolation (type II) has been used for this metric

The results of the initial process that is the KPIs compu-tation are translated into synthetics-MOS values To computethese values reference benchmarking values for each of theKPIs need to be used according to the normalization andinterpolation process described in Section 5 Table 5 showswhat has been currently used by TRIANGLE for the AppUser Experience domain which is also used by NGMN asreference in their precommercial Trials document [23]

For example for the ldquotime to load first media framerdquo KPIshown in Table 5 the type of aggregation applied is averagingand the interpolation formula used is Type II

To achieve stable results each test case is executed 10times (10 iterations) in each network scenario The synthetic-MOS value in each domain is calculated by averaging themeasured synthetic-MOS values in the domain For examplesynthetic-MOS value is the RES domain obtained by aver-aging the synthetic-MOS value of ldquoaverage CPU usagerdquo andldquoaverage memory usagerdquo from the two test cases

Although Exoplayer supports several video streamingprotocols in this work only DASH [24] (Dynamic AdaptiveStreaming over HTTP) has been tested DASH clients shouldseamlessly adapt to changing network conditions by makingdecisions on which video segment to download (videosare encoded at multiple bitrates) The Exoplayerrsquos default

000

000

0

001

728

0

003

456

0

005

184

0

010

912

0

012

640

0

014

368

0

020

096

0

Timestamp

Video Resolution

0

200

400

600

800

1000

1200

Hor

izon

tal R

esol

utio

n

Figure 6 Video Resolution evolution in the Driving Urban Normalscenario

adaptation algorithm is basically throughput-based and someparameters control how often and when switching can occur

During the testing the testbed was configured with thedifferent network scenarios defined in [11] In these scenariosthe network configuration changes dynamically following arandom pattern resulting in different maximum throughputrates The expected behaviour of the application under testis that the video streaming client adapts to the availablethroughput by decreasing or increasing the resolution of thereceived video Figure 6 depicts how the client effectivelyadapts to the channel conditions

However the objective of the testing carried out in theTRIANGE testbed is not just to verify that the video stream-ing client actually adapts to the available maximum through-put but also to check whether this adaptation improves theusersrsquo experience quality

Table 6 shows a summary of the synthetic-MOS valuesobtained per scenario in one test case of each domain Thescores obtained in the RES andAECdomains are always highIn the AUE domain the synthetic MOS associated with theVideo Resolution shows low scores in some of the scenariosbecause the resolution decreases reasonable good scores inthe time to load first media and high scores in the time toplayback cut-off ratio Overall it can be concluded that the

10 Wireless Communications and Mobile Computing

Table 6 Synthetic MOS values per test case and scenario for the feature ldquoNoninteractive Playbackrdquo

AUE domain AEC domain RES domain

Test Case AUECS001 Test CaseAECCS001 Test Case RESCS001

ScenarioTime to loadfirst mediaframe

PlaybackCut-off ratio

VideoResolution

mode

AveragePower

Consumption

Average CPUUsage

AverageRAM Usage

HighSpeed DirectPassenger 21 31 23 47 43 42

Suburban Festival 38 47 31 48 43 41Suburban shopping mallbusy hours 37 37 13 48 44 41

Suburban shopping malloff-peak 36 31 23 48 43 41

Suburban stadium 38 29 21 47 44 41Urban Driving Normal 26 39 28 47 44 4Urban Driving TrafficJam 34 37 16 48 44 4

Urban Internet Cafe BusyHours 38 37 19 48 44 4

Urban Internet Cafe OffPeak 38 31 23 48 43 4

Urban Office 38 47 33 48 45 43Urban Pedestrian 39 26 2 47 44 4

35 36 23 47 44 41

DASH implementation of the video streaming client undertest is able to adapt to the changing conditions of the networkmaintaining an acceptable rate of video cut-off rebufferingtimes and resources usage

The final score in each domain is obtained by averagingthe synthetic-MOS values from all the tested network scenar-ios Figure 7 shows the spider diagram for the three domainstested In the User Experience domain the score obtained islower than the other domains due to the low synthetic-MOSvalues obtained for the video resolution

The final synthetic MOS for the use case Content Dis-tribution Streaming is obtained as a weighted average of thethree domains representing the overall QoE as perceived bythe userThefinal score for the Exoplayer version 1516 and thefeatures tested (Noninteractive Playback and Play and Pause)is 42 which means that the low score obtained in the videoresolution is compensated with the high scores in other KPIs

If an application under test has more than one use casethe next steps in the TRIANGLE mark project approachwould be the aggregation per use case and the aggregationover all use cases The final score the TRIANGLE mark is anestimation of the overall QoE as perceived by the user

In the current TRIANGLE implementation the weightsin all aggregations are the same Further research is neededto appropriately define the weights of each domain and eachuse case in the overall score of the applications

7 Conclusions

The main contribution of the TRIANGLE project is theprovision of a framework that generalizes QoE computation

and enables the execution of extensive and repeatable testcampaigns to obtainmeaningfulQoE scoresTheTRIANGLEproject has also defined amethodology which is based on thetransformation and aggregation of KPIs its transformationinto synthetic-MOS values and its aggregation over thedifferent domains and use cases

The TRIANGLE approach is a methodology flexibleenough to generalize the computation of QoE for any applica-tionservice Themethodology has been validated testing theDASH implementation in the Exoplayer App To confirm thesuitability of theweights used in the averaging process and theinterpolation parameters as well as to verify the correlationof the obtained MOS with that scored by users the authorshave started experiments with real users and initial results areencouraging

The process described produces a final TRIANGLEmarka single quality score which could eventually be used to cer-tify applications after achieving a consensus on the differentvalues of the process (weights limits etc) to use

Data Availability

Themethodology and results used to support the findings ofthis study are included within the article

Conflicts of Interest

The authors declare that they have no conflicts of inter-est

Wireless Communications and Mobile Computing 11

DER

AUE AEC

Device Resource Usage

User Experience Energy Consumption

Figure 7 Exoplayer synthetic-MOS values per domain

Acknowledgments

The TRIANGLE project is funded by the European UnionrsquosHorizon 2020 Research and Innovation Programme (GrantAgreement no 688712)

References

[1] ETSI ldquoHuman factors quality of experience (QoE) require-ments for real-time communication servicesrdquo Tech Rep 102643 2010


Table 6: Synthetic-MOS values per test case and scenario for the feature "Noninteractive Playback". AUE domain = Test Case AUE/CS/001; AEC domain = Test Case AEC/CS/001; RES domain = Test Case RES/CS/001.

Scenario | Time to load first media frame (AUE) | Playback cut-off ratio (AUE) | Video resolution, mode (AUE) | Average power consumption (AEC) | Average CPU usage (RES) | Average RAM usage (RES)
High Speed Direct Passenger | 2.1 | 3.1 | 2.3 | 4.7 | 4.3 | 4.2
Suburban Festival | 3.8 | 4.7 | 3.1 | 4.8 | 4.3 | 4.1
Suburban shopping mall, busy hours | 3.7 | 3.7 | 1.3 | 4.8 | 4.4 | 4.1
Suburban shopping mall, off-peak | 3.6 | 3.1 | 2.3 | 4.8 | 4.3 | 4.1
Suburban stadium | 3.8 | 2.9 | 2.1 | 4.7 | 4.4 | 4.1
Urban Driving Normal | 2.6 | 3.9 | 2.8 | 4.7 | 4.4 | 4.0
Urban Driving Traffic Jam | 3.4 | 3.7 | 1.6 | 4.8 | 4.4 | 4.0
Urban Internet Cafe Busy Hours | 3.8 | 3.7 | 1.9 | 4.8 | 4.4 | 4.0
Urban Internet Cafe Off-Peak | 3.8 | 3.1 | 2.3 | 4.8 | 4.3 | 4.0
Urban Office | 3.8 | 4.7 | 3.3 | 4.8 | 4.5 | 4.3
Urban Pedestrian | 3.9 | 2.6 | 2.0 | 4.7 | 4.4 | 4.0
All scenarios (average) | 3.5 | 3.6 | 2.3 | 4.7 | 4.4 | 4.1

These results show that the DASH implementation of the video streaming client under test is able to adapt to the changing conditions of the network, maintaining an acceptable rate of video cut-offs, rebuffering times, and resource usage.

The final score in each domain is obtained by averaging the synthetic-MOS values from all the tested network scenarios. Figure 7 shows the spider diagram for the three domains tested. In the User Experience domain, the score obtained is lower than in the other domains due to the low synthetic-MOS values obtained for the video resolution.
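To make the per-domain averaging concrete, here is a minimal Python sketch (illustrative only, not the TRIANGLE tooling; the dictionary holds two rows of Table 6 and all names are ours):

```python
# Synthetic-MOS values from Table 6; two scenarios shown for brevity.
# AUE: time to load first media frame, playback cut-off ratio, video resolution;
# AEC: average power consumption; RES: average CPU usage, average RAM usage.
scenarios = {
    "Urban Office":     {"AUE": [3.8, 4.7, 3.3], "AEC": [4.8], "RES": [4.5, 4.3]},
    "Urban Pedestrian": {"AUE": [3.9, 2.6, 2.0], "AEC": [4.7], "RES": [4.4, 4.0]},
    # ... remaining scenarios of Table 6
}

def domain_score(domain: str) -> float:
    """Average all synthetic-MOS values of one domain over the tested scenarios."""
    values = [v for kpis in scenarios.values() for v in kpis[domain]]
    return sum(values) / len(values)

for domain in ("AUE", "AEC", "RES"):
    print(domain, round(domain_score(domain), 2))
```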

The final synthetic MOS for the use case Content Distribution Streaming is obtained as a weighted average of the three domains, representing the overall QoE as perceived by the user. The final score for ExoPlayer version 1.5.16 and the features tested (Noninteractive Playback and Play and Pause) is 4.2, which means that the low score obtained for the video resolution is compensated by the high scores in the other KPIs.

If an application under test has more than one use case, the next steps in the TRIANGLE project approach would be the aggregation per use case and then the aggregation over all use cases. The final score, the TRIANGLE mark, is an estimation of the overall QoE as perceived by the user.

In the current TRIANGLE implementation, the weights in all aggregations are the same. Further research is needed to appropriately define the weights of each domain and each use case in the overall score of the applications.
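A sketch of the remaining aggregation levels under the current equal-weight assumption follows (the per-domain scores are hypothetical numbers, and `weighted_average` is our own helper, not a TRIANGLE API):

```python
def weighted_average(scores, weights=None):
    """Weighted average of a list of scores; falls back to equal weights."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical domain scores (AUE, AEC, RES) for one use case.
use_case_score = weighted_average([3.1, 4.7, 4.2])

# With several use cases, the same operation is applied once more;
# the result of the final aggregation is the TRIANGLE mark.
triangle_mark = weighted_average([use_case_score])  # single use case here
print(round(triangle_mark, 1))
```

Replacing the equal weights amounts to passing a different `weights` argument, which is exactly the calibration question raised above.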

7. Conclusions

The main contribution of the TRIANGLE project is the provision of a framework that generalizes QoE computation and enables the execution of extensive and repeatable test campaigns to obtain meaningful QoE scores. The TRIANGLE project has also defined a methodology based on the transformation of KPIs into synthetic-MOS values and their aggregation over the different domains and use cases.
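As an illustration of the transformation step, a minimal sketch is a linear interpolation between a worst and a best KPI limit, clipped to the 1-5 MOS scale (the mapping shape and the limit values below are assumptions for illustration, not the calibrated TRIANGLE interpolation parameters):

```python
def kpi_to_synthetic_mos(value: float, worst: float, best: float) -> float:
    """Map a raw KPI value onto the 1-5 MOS scale by linear interpolation.

    `worst` maps to MOS 1 and `best` to MOS 5; values beyond the limits
    are clipped. Ordering worst > best handles "lower is better" KPIs.
    """
    mos = 1.0 + 4.0 * (value - worst) / (best - worst)
    return max(1.0, min(5.0, mos))

# Illustrative limits: 10 s to the first media frame -> MOS 1, 0.5 s -> MOS 5.
print(round(kpi_to_synthetic_mos(2.0, worst=10.0, best=0.5), 1))  # 4.4
```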

The TRIANGLE approach is a methodology flexible enough to generalize the computation of QoE for any application or service. The methodology has been validated by testing the DASH implementation in the ExoPlayer app. To confirm the suitability of the weights used in the averaging process and of the interpolation parameters, as well as to verify the correlation of the obtained MOS with the scores given by users, the authors have started experiments with real users, and the initial results are encouraging.

The process described produces a final TRIANGLE mark, a single quality score, which could eventually be used to certify applications once a consensus is reached on the different values of the process (weights, limits, etc.).

Data Availability

The methodology and results used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Figure 7: ExoPlayer synthetic-MOS values per domain (AUE: User Experience; AEC: Energy Consumption; DER: Device Resource Usage).

Acknowledgments

The TRIANGLE project is funded by the European Union's Horizon 2020 Research and Innovation Programme (Grant Agreement no. 688712).

References

[1] ETSI, "Human factors; Quality of Experience (QoE) requirements for real-time communication services," Tech. Rep. 102 643, 2010.

[2] ITU-T, "P.10/G.100 (2006) Amendment 1 (01/07): New Appendix I – Definition of Quality of Experience (QoE)," 2007.

[3] F. Kozamernik, V. Steinmann, P. Sunna, and E. Wyckens, "SAMVIQ – A new EBU methodology for video quality evaluations in multimedia," SMPTE Motion Imaging Journal, vol. 114, no. 4, pp. 152–160, 2005.

[4] ITU-T, "G.107: The E-model, a computational model for use in transmission planning," 2015.

[5] J. De Vriendt, D. De Vleeschauwer, and D. C. Robinson, "QoE model for video delivered over an LTE network using HTTP adaptive streaming," Bell Labs Technical Journal, vol. 18, no. 4, pp. 45–62, 2014.

[6] S. Jelassi, G. Rubino, H. Melvin, H. Youssef, and G. Pujolle, "Quality of Experience of VoIP Service: A Survey of Assessment Approaches and Open Issues," IEEE Communications Surveys & Tutorials, vol. 14, no. 2, pp. 491–513, 2012.

[7] M. Li, C.-L. Yeh, and S.-Y. Lu, "Real-Time QoE Monitoring System for Video Streaming Services with Adaptive Media Playout," International Journal of Digital Multimedia Broadcasting, vol. 2018, Article ID 2619438, 11 pages, 2018.

[8] S. Barakovic and L. Skorin-Kapov, "Survey and Challenges of QoE Management Issues in Wireless Networks," Journal of Computer Networks and Communications, vol. 2013, Article ID 165146, 28 pages, 2013.

[9] ITU-T, "G.1030: Estimating end-to-end performance in IP networks for data applications," 2014.

[10] ITU-T, "G.1031: QoE factors in web-browsing," 2014.

[11] EU H2020 TRIANGLE Project, "Deliverable D2.2: Final report on the formalization of the certification process, requirements and use cases," 2017, https://www.triangle-project.eu/project-old/deliverables.

[12] Q. A. Chen, H. Luo, S. Rosen et al., "QoE Doctor: diagnosing mobile app QoE with automated UI control and cross-layer analysis," in Proceedings of the Internet Measurement Conference (IMC '14), pp. 151–164, ACM, Vancouver, Canada, November 2014.

[13] M. A. Mehmood, A. Wundsam, S. Uhlig, D. Levin, N. Sarrar, and A. Feldmann, "QoE-Lab: Towards Evaluating Quality of Experience for Future Internet Conditions," in Testbeds and Research Infrastructure: Development of Networks and Communities (TridentCom 2011), T. Korakis, H. Li, P. Tran-Gia, and H. S. Park, Eds., vol. 90 of LNICST, pp. 286–301, Springer, Berlin, Germany, 2012.

[14] D. Levin, A. Wundsam, A. Mehmood, and A. Feldmann, "Berlin: The Berlin Experimental Router Laboratory for Innovative Networking," in TridentCom 2010, T. Magedanz, A. Gavras, N. H. Thanh, and J. S. Chase, Eds., vol. 46 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 602–604, Springer, Heidelberg, Germany, 2011.

[15] K. De Moor, I. Ketyko, W. Joseph et al., "Proposed framework for evaluating quality of experience in a mobile, testbed-oriented living lab setting," Mobile Networks and Applications, vol. 15, no. 3, pp. 378–391, 2010.

[16] R. Sanchez-Iborra, M.-D. Cano, J. J. P. C. Rodrigues, and J. Garcia-Haro, "An Experimental QoE Performance Study for the Efficient Transmission of High Demanding Traffic over an Ad Hoc Network Using BATMAN," Mobile Information Systems, vol. 2015, Article ID 217106, 14 pages, 2015.

[17] P. Oliver-Balsalobre, M. Toril, S. Luna-Ramírez, and R. García Garaluz, "A system testbed for modeling encrypted video-streaming service performance indicators based on TCP/IP metrics," EURASIP Journal on Wireless Communications and Networking, vol. 2017, no. 1, 2017.

[18] M. Solera, M. Toril, I. Palomo, G. Gomez, and J. Poncela, "A Testbed for Evaluating Video Streaming Services in LTE," Wireless Personal Communications, vol. 98, no. 3, pp. 2753–2773, 2018.

[19] A. Alvarez, A. Díaz, P. Merino, and F. J. Rivas, "Field measurements of mobile services with Android smartphones," in Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC '12), pp. 105–109, Las Vegas, Nev, USA, January 2012.

[20] NGMN Alliance, "NGMN 5G white paper," 2015, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf.

[21] "Infrastructure and Design for Adaptivity and Flexibility," in Mobile Information Systems, Springer, 2006.

[22] J. Nielsen, "Response Times: The Three Important Limits," in Usability Engineering, 1993.

[23] NGMN Alliance, "Definition of the testing framework for the NGMN 5G pre-commercial networks trials," 2018, https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2018/180220_NGMN_PreCommTrials_Framework_definition_v1_0.pdf.

[24] 3GPP TS 26.247, "Transparent end-to-end Packet-switched Streaming Services (PSS); Progressive Download and Dynamic Adaptive Streaming over HTTP (3GP-DASH)," 2018.
