Transcoding Live Adaptive Video Streams at a Massive Scale in the Cloud

Ramon Aparicio-Pardo, Telecom Bretagne, France

Karine Pires, Telecom Bretagne, France

Alberto Blanc, Telecom Bretagne, France

Gwendal Simon, Telecom Bretagne, France

ABSTRACT

More and more users are watching online videos produced by non-professional sources (e.g., gamers, teachers of online courses, witnesses of public events) by using an increasingly diverse set of devices to access the videos (e.g., smartphones, tablets, HDTV). Live streaming service providers can combine adaptive streaming technologies and cloud computing to satisfy this demand. In this paper, we study the problem of preparing live video streams for delivery using cloud computing infrastructure, e.g., how many representations to use and the corresponding parameters (resolution and bit-rate). We present an integer linear program (ILP) to maximize the average user quality of experience (QoE) and a heuristic algorithm that can scale to a large number of videos and users. We also introduce two new datasets: one characterizing a popular live streaming provider (Twitch) and another characterizing the computing resources needed to transcode a video. They are used to set up realistic test scenarios. We compare the performance of the optimal ILP solution with current industry standards, showing that the latter are sub-optimal. The solution of the ILP also shows the importance of the type of video on the optimal stream preparation. By taking advantage of this, the proposed heuristic can efficiently satisfy a time-varying demand with an almost constant amount of computing resources.

Categories and Subject Descriptors

H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Video

General Terms

Algorithm, Design, Measurement

Keywords

Live streaming, cloud computing, video encoding

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
MMSys '15, March 18-20, 2015, Portland, OR, USA.
Copyright 2015 ACM 978-1-4503-3351-1/15/03 ...$15.00
http://dx.doi.org/10.1145/2713168.2713177

1. INTRODUCTION

The management of live video services is a complex task due to the demand for specialized resources and to real-time constraints. To guarantee the quality of experience (QoE) for end-users, live streaming service providers (e.g., TV operators and multimedia broadcasters) have traditionally relied on private data-centers (with dedicated hardware) and private networks. The widespread availability of cloud computing platforms, with ever decreasing prices, has changed the landscape [17]. Significant economies of scale can be obtained by using standard hardware, Virtual Machines (VMs), and shared resources in large data-centers. As illustrated in Figure 1, live streaming providers use these services in combination with widely available content delivery networks (CDNs) to build an elastic and scalable platform that can adapt itself to the dynamics of viewer demand. The only condition is to be able to use the standardized cloud computing platforms to prepare the video for delivery.

The emergence of cloud computing platforms has enabled some new trends, including: (i) the adoption of adaptive bit-rate (ABR) streaming technologies to address the heterogeneity of end-users. ABR streaming requires encoding multiple video representations, and thus increases the demand for hardware resources. Modern cloud computing platforms can meet this demand. And (ii) the growing diversity of live video streams to deliver. The popularity of services like Twitch [1] illustrates the emergence of new forms of live streaming services, where the video stream to be delivered comes from non-professional sources (e.g., gamers, teachers of online courses, witnesses of public events). Instead of a few high-quality well-defined video streams, live streaming providers now have to deal with many low-quality unreliable video streams.

Figure 1: Live streaming in the cloud (broadcasters upload their streams to a data-center, which prepares them for delivery through a CDN)

In comparison to the significance of the shift, relatively few academic studies have been published. The scientific literature contains papers related to ABR streaming in CDNs (e.g., [2, 14]). However, to the best of our knowledge, the preparation of the streams in data-centers has not been addressed by the scientific community. The preparation of a given video channel includes deciding the number of representations to encode, setting the encoder parameters, allocating the transcoding jobs to machines, and transcoding each raw video stream into multiple video representations.

Existing works address some of these problems individually. For instance, some papers [7, 9, 11-13] present algorithms to schedule transcoding jobs on a group of computers, typically in order to maximize CPU utilization [7, 11] or to minimize the finishing time [12, 13]. Other researchers have analyzed the performance of video encoding and the relationship between power, rate and distortion [8, 21, 25] using analytical models and empirical studies. In each case the encoding parameters (i.e., resolution and rate) are input parameters of these algorithms, and are assumed to be known. Yet, they can have a significant impact on the QoE and on the total bandwidth used, as discussed in [22].

Even though solutions have already been proposed for these subproblems, it is non-trivial to combine them to form a single solution, and there is no guarantee that a combination of optimal solutions of each subproblem is an optimal and feasible solution of the global problem. For example, selecting the available representations (resolution and bit-rate) without considering the available computing resources is likely to lead to unfeasible solutions.

In this paper we are interested in maximizing the average user QoE by selecting the optimal encoding parameters under given computing and CDN capacity constraints. More specifically, we make three contributions:

1. We provide three datasets to help the community study the problem of preparing ABR video in data-centers. The first dataset is based on a measurement campaign we have conducted on Twitch between January and April 2014. The second dataset, based on [4], gives bandwidth measurements on real clients downloading ABR streams. The third dataset is the result of a large number of transcoding operations that we have done on a standard server, which is typical of commoditized data-center hardware. Thanks to this dataset, it is possible to determine the computing resources needed to transcode a video and the QoE of the output video.

2. We formulate an optimization problem for the management of a data-center dealing with a large number of live video streams to prepare for delivery (Section 5). Our goal is to maximize the QoE for the end-users subject to the number of available machines in the data-center. With this problem, we highlight the complex interplay between the popularity of channels, the required computing resources for video transcoding, and the QoE of end-users. We formulate this problem as an integer linear program (ILP). We then use a generic solver to compare the performance of standard stream preparation strategies (where all the channels use the same encoding parameters for the transcoding operation) to the optimal. Our results highlight the gap between the standard preparation strategies and the optimal solution.

3. We propose a heuristic algorithm for the preparation of live ABR video streams (Section 6). This algorithm can decide on-the-fly the encoding parameters. Our results (Section 6) show that our proposal significantly improves the QoE of the end-users while using almost constant computing resources, even in the presence of a time-varying demand.

2. RELATED WORKS

Cloud-based transcoding has been the subject of several papers. Most of these works [7, 9, 11-13] take advantage of the fact that some modern video compression techniques divide the video stream into non-overlapping Groups of Pictures (GOPs) that can be treated independently of each other. The encoding time of each GOP depends on its duration and on the complexity of the corresponding scene. The algorithms exploit this fact to increase the utilization of each computing node at the expense of an increased complexity, including the time and resources needed to split the input video into appropriately sized GOPs.

One downside of these solutions is that they need to know the transcoding time of each GOP in order to assign it to the most suitable computing node. Some authors [7, 11] propose fairly complicated systems to estimate the encoding time of each GOP based on real-time measurements, while others [9, 12, 13] assume that this information is directly available, for instance [9] by profiling the encoding of a few representative videos of different types, similarly to what we have done (see Section 3.2). Another downside of a GOP-based solution is that the encoding of each GOP can be completed out of order, so the GOPs need to be reordered before being delivered to the users. This out-of-order problem is especially important when dealing with live content with real-time constraints. Only Huang et al. [9] explicitly consider real-time constraints in a GOP-based system.

Lao et al. [12] and Lin et al. [13] deal only with batches of videos to transcode and present different scheduling algorithms to minimize the overall encoding time.

Wang et al. [23] propose to leverage underused CDN computing resources to jointly transcode and deliver videos by having CDN servers transcode and store the most popular video segments. Such a solution can offer significant gains, especially for non-live popular streams, but it requires the cooperation of the CDN, which is not always owned and operated by the cloud provider.

Cheng et al. [6] present a framework for real-time cloud-based video transcoding in the context of mobile video conferencing. Depending on the number of participants and their locations, every video conference corresponds to one or more transcoding jobs, each one located in a potentially different data-center. They use a simple linear model to estimate the resources needed by each transcoding job; if the currently running VMs have enough spare capacity to handle the new job, they use them, otherwise they start new VMs, without a constraint on the total number of VMs used. They assume a linear relationship between the video encoding rate and CPU usage, based on some measurements, for which no details are given. As shown in Section 3.2, this is not consistent with our experiments using ffmpeg to encode H.264 videos.

The literature on video encoding is vast. A few papers have studied the relationship between power consumption, rate and distortion (often abbreviated as P-R-D). The first paper to investigate the P-R-D model, by He et al. [8], contains a detailed analysis and corresponding model of the video encoding process. The authors use this model to define an algorithm that, given rate and power constraints, minimizes the distortion of the compressed video. Su et al. [21] use a different definition for the distortion and propose a different algorithm to solve the same optimization problem. These works deal with a single video flow and take the rate as an input parameter; they do not address how to choose its value.

Yang et al. [25] present the results of an empirical study based on the H.264 Scalable Video Coding (SVC) reference software JSVM-9.19 [18]. While non-SVC H.264 can be considered as a special case consisting of only one layer, the authors emphasize the results related to the SVC part. Since the raw data is not publicly available, and since the figures in the paper do not correspond to the inputs we need, we run similar experiments, leading to the dataset presented in Section 3.2.

3. PROBLEM DEFINITION BY DATASETS

In this section, we present the three datasets used throughout the paper. They will help us to introduce the parameters influencing the stream preparation. The three datasets cover the chain of (directly or indirectly) involved actors: broadcasters, live service provider, and viewers.

3.1 The Broadcasters

We will interchangeably use the terms channel and broadcaster to indicate the people using the live streaming system to deliver a video. At any given time, a channel can be either online, when the broadcaster emits the video stream, or offline when the broadcaster is disconnected. Each online period is called a session. During a session, a broadcaster captures a video, encodes it, and uploads it to the service provider. We say that this video stream is the source or the raw video stream. The service provider is then in charge of transcoding this video into one or multiple video representations, and of delivering these representations to the viewers, or end-users. The number of viewers watching a session can change over time. Figure 2 shows the evolution of the popularity of a given channel over time; this channel contains two sessions.

Today's channels in cloud-based live streaming services are mostly non-professional. We focus here on the thousands of broadcasters who use live streaming services such as ustream,1 livestream,2 twitch,3 and dailymotion4 to broadcast live an event that they are capturing from their connected video device (e.g., camera, smartphone, and game console). As opposed to the traditional TV providers and the content owners from the entertainment industry, these broadcasters usually do not emit ultra-HD video streams (2160p, also known as 4K) and they tolerate a short lag in the delivery. However, these broadcasters are less reliable. First, a channel can switch from offline to online and vice versa at any time. Second, the emitted streams have various bit-rates and resolutions, as well as various encoding parameters. Third, the broadcasters do not give much information about their video streams.

1 http://www.ustream.tv/
2 http://new.livestream.com/
3 http://www.twitch.tv/
4 https://www.dmcloud.net/features/live-streaming

Figure 2: A life in a channel (number of viewers over time, with two online sessions separated by an offline period)

In this paper, we use a dataset based on Twitch, a popular live streaming system. Twitch provides an Application Programming Interface (API) that allows anybody to fetch information. We used a set of synchronized computers to obtain a global state every five minutes (in compliance with API restrictions) between January 6th and April 6th, 2014. We fetched information about the total number of viewers, the total number of concurrent online channels, the number of viewers per session, and some channel metadata. We then filtered out the broadcasters having abnormal behavior (no viewer, or online for less than five minutes during the last three months). The dataset is publicly available.5 We summarize the main statistics in Table 1.

    Data                                        Statistics
    total nb. of channels                       1,536,492
    total nb. of sessions                       6,242,609
    online less than 5 min. overall channels    25%
    no viewer channels                          11%
    filtered nb. of channels                    1,068,138 (69%)
    filtered nb. of sessions                    5,221,208 (83%)

Table 1: The Twitch dataset
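To make the collection procedure concrete, the sketch below shows a polling loop of the kind described above. It is only an illustration: the endpoint, request parameters and field names are assumptions modeled on the public Twitch API of that period, not the scripts actually used to build the dataset.

```python
# Hypothetical sketch of the 5-minute polling loop behind the Twitch dataset.
# Endpoint and field names are assumptions, not the authors' code.
import time
import requests

API_URL = "https://api.twitch.tv/kraken/streams"  # assumed 2014-era endpoint

def snapshot(page_size=100):
    """Page through the list of online streams and return one global state."""
    streams, offset = [], 0
    while True:
        resp = requests.get(API_URL, params={"limit": page_size, "offset": offset})
        page = resp.json().get("streams", [])
        if not page:
            break
        for s in page:
            streams.append({
                "channel": s["channel"]["name"],   # assumed field layout
                "viewers": s["viewers"],
            })
        offset += page_size
    return streams

while True:
    state = snapshot()
    # persist `state` with a timestamp, then wait for the next snapshot
    time.sleep(300)  # five minutes, in line with the API restrictions
```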

Figure 3 shows the average number of concurrent online channels, which is a useful metric to estimate the computing power needed and thus the data-center dimensions. There are always between 4,000 and 8,000 concurrent sessions requiring data-center processing.

To illustrate the diversity of the raw videos, Figure 4 shows the cumulative distribution function (CDF) of the bit-rates of sessions for the three most popular resolutions. The key observation is the wide range of the bit-rates, even for a given resolution. For example, the bit-rates of 360p sources range from 200 kbps to more than 3 Mbps.

5 http://dash.ipv6.enstb.fr/dataset/twitch/

Figure 3: Number of concurrent sessions in Twitch (min and max per day, over the 90 days of the campaign)

Figure 4: CDF of the session source bit-rates, for 1080p, 720p, and 360p sources

3.2 The Live Streaming Service Provider

One of the missions of the live streaming service is to transform the raw video into a multimedia object that can be delivered to a large number of users. We call this phase the preparation. In this paper, we focus on the task of transcoding the raw video stream into a set of ABR video streams for delivery, and we neglect other tasks such as content sanity checks and the implementation of the digital rights management policy. For each session, our goal is twofold: (i) to define the set of video representations to be transcoded, which supposes deciding the number of representations, and, for each representation, the bit-rate and the resolution; and, (ii) to assign the transcoding jobs to the data-center machines.

A key piece of information for our study is the amount of central processing unit (CPU) cycles required to transcode one raw video into a video stream in a different format. This quantity depends on various parameters, but mostly on (i) the bit-rate and the resolution of the source, (ii) the type of the source, and (iii) the bit-rate and the resolution of the target video stream. To obtain a realistic estimate, we have performed a set of transcoding operations from multiple types of sources encoded at different resolutions and rates to a wide range of target resolutions and rates. For each one of the transcoding operations, we have estimated the QoE of the transcoded video, and measured the CPU cycles required to perform it. This dataset as well is publicly available.6

Source Types. We consider four types of video content, corresponding to the test sequences available at [24]. Each of these test sequences corresponds to a representative video type as given in Table 2.

6 http://dash.ipv6.enstb.fr/dataset/transcoding/

    Video Type     Video Name
    Documentary    Aspen, Snow Mountain
    Sport          Touchdown Pass, Rush Field Cuts
    Cartoon        Big Buck Bunny, Sintel Trailer
    Video          Old Town Cross

Table 2: Test videos and corresponding type.

Source Encoding. In current live streaming systems, the encoding of the source is done at the broadcaster side. As shown in Figure 4, the raw video that is emitted by the broadcaster can be encoded with different parameters. Based on our analysis of the Twitch dataset, we consider only four resolutions, from 224p to 1080p, and we leave the analysis of 4K raw videos for future work. We also restrict the video bit-rates to ranges covering 90% of the sources that we observe in the Twitch dataset. See Table 6 in the Appendix for more details.

Target Videos. The format of the target videos depends on the source video. For each input video we consider all the resolutions that are smaller than or equal to the resolution of the input video, and for each resolution we consider all the rates that are smaller than or equal to the rate of the input video (see Table 6 in the Appendix for more details).

Transcoding. We perform the transcoding on a standard server, similar to what can be found in most public data-centers. The debate about whether graphics processing units (GPUs) can be used in a public cloud is still acute today. Those who do not believe in a wide availability of GPUs in the cloud emphasize the poor performance of standard virtualization tools on GPUs [19] and the preference of the main cloud providers for low-end servers (the so-called wimpy servers) in data-centers [3]. On the other hand, new middleware has been developed to improve GPU sharing and VM-GPU matching in data-centers [16], so it may be possible to envision a wider deployment of GPUs in the near future. Nevertheless, in this paper, we stick to a conservative position, which is the one adopted by today's live streaming service providers, and we consider only the availability of CPUs in the servers.

As for the physical aspect of the CPU cycle measurements, we consider that the virtualization has no impact on the performance, i.e., a transcoder running in a VM on a shared physical machine is as fast as if it ran directly on the physical machine. The server that we used has an Intel Xeon E5640 CPU at 2.67 GHz with 24 GB of RAM, running Linux 3.2 with Ubuntu 12.04.

Figure 5 shows the experimental results for all the target videos generated from a source of type movie, at 1080p resolution and encoded at 2,750 kbps. The empirical CPU cycle measurements are depicted as marks. Section A.2 in the Appendix gives more details on how these curves have been generated. Overall, 588 curves similar to these ones were prepared to cover the 12,168 transcoding operations. For the sake of brevity, we show only these four. The interested reader can consult the full set of curves in the publicly available dataset, as mentioned above.

Estimating QoE. We evaluate the QoE by means of the Peak Signal to Noise Ratio (PSNR) score [20], which is a full-reference metric commonly used due to its simplicity. We apply the PSNR filter7 provided by ffmpeg in two different cases, illustrated in Figure 7.

The first case, depicted on top of Figure 7, corresponds to the scenario where a target (transcoded) video at a given spatial resolution is watched on a display of the same size. The PSNR filter compares the target video against a reference video. The reference is the source encoded at the same resolution as the target but with the largest encoding bit-rate considered in this study (3,000 kbps). We repeat this measurement as many times as there are target videos, i.e., 12,168 times. As in the case of the live-transcoding CPU curves, we only depict in Figure 6 the PSNR curves corresponding to one example. We provide the remaining set of curves on the public site hosting the dataset.

The second scenario, shown at the bottom of Figure 7, refers to the situation where a target (transcoded) video at a given resolution needs to be upscaled to be watched on a display of a larger size. This up-scaling introduces a penalty on the final QoE for the viewer. To estimate these penalties, we carry out a new battery of transcoding operations, using the same ffmpeg command as before, but where the input and output videos are the target and the upscaled video, respectively. The upscaled video is compared against a reference with an encoding rate of 3,000 kbps but with the same resolution as the upscaled target. The penalty, using the example of up-scaling from 360p to 720p in Figure 7, can simply be computed by subtracting the PSNR measurement at the bottom of the figure from the PSNR measurement at the top. In Figure 8, we depict the up-scaling penalties for a 224p source of type movie.
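The following sketch illustrates how the two PSNR measurements of Figure 7 can be combined into an up-scaling penalty. It is not the authors' pipeline: the ffmpeg invocation is one common way of using the psnr filter, and the file names and log-parsing details are assumptions.

```python
# Sketch of the QoE estimation of Figure 7 (illustrative, not the paper's scripts).
import re
import subprocess

def measure_psnr(distorted, reference):
    """Run ffmpeg's psnr filter and return the average PSNR in dB."""
    cmd = ["ffmpeg", "-i", distorted, "-i", reference,
           "-lavfi", "psnr", "-f", "null", "-"]
    out = subprocess.run(cmd, capture_output=True, text=True).stderr
    match = re.search(r"average:([0-9.]+)", out)   # summary line printed by the filter
    return float(match.group(1)) if match else None

# Case 1: target watched at its native resolution (top of Figure 7)
psnr_native = measure_psnr("target_360p_1600k.mp4", "ref_360p_3000k.mp4")

# Case 2: same target up-scaled to a larger display (bottom of Figure 7)
psnr_upscaled = measure_psnr("target_360p_upscaled_720p.mp4", "ref_720p_3000k.mp4")

# Up-scaling penalty as described in Section 3.2
penalty = psnr_native - psnr_upscaled
```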

Figure 5: Transcoding CPU curves (CPU in GHz vs. target rate in kbps). Source: 1080p, 2,750 kbps, type movie. Target resolutions 224p, 360p, 720p, and 1080p.

Figure 6: Transcoding QoE curves (PSNR in dB vs. target rate in kbps). Source: 1080p, 2,750 kbps, type movie. Target resolutions 224p, 360p, 720p, and 1080p.

7 https://www.ffmpeg.org/ffmpeg-filters.html#psnr

Figure 7: Estimating the QoE for a target video. On top, a video at 360p and 1.6 Mbps is compared against a 360p, 3 Mbps reference. On the bottom, the same video is up-scaled to be watched on a 720p display and compared against a 720p, 3 Mbps reference.

Figure 8: Up-scaling penalty curves (PSNR penalty in dB vs. source rate in kbps). Sources: 224p, type movie, rates on the x-axis. Target resolutions 360p, 720p, and 1080p; target rates equal to source rates.

3.3 The Viewers

Finally, we need a dataset that captures the characteristics of a real population of viewers, and in particular its heterogeneity.

This dataset comes from [4]. Since the Twitch API does not provide any information about the viewers (neither their geographic positions, nor their devices and network connections), we need real data to set the download rates for the population of viewers. The dataset presented in [4] gives multiple measurements over a large number of 30-second-long DASH sessions from thousands of geographically distributed IP addresses. From their measurements, we infer the download rate of each IP address for every chunk of the session and thus obtain 500,000 samples of realistic download rates. After filtering this set to remove outliers, we randomly associate one download rate to one of the viewers watching a channel from the Twitch dataset snapshot.

4. CURRENT INDUSTRIAL STRATEGIES

Today's live service providers have to implement a strategy for stream preparation. To the best of our knowledge, no provider has yet implemented an optimal strategy. Typically one of the following two options is implemented. In the first one, used by Twitch, ABR streaming is only offered to some premium broadcasters. That is, only a small subset of channels is transcoded into multiple representations. For the other broadcasters, the raw video is forwarded to the viewers without transcoding. The problem with this solution is that many viewers of standard broadcasters cannot watch the stream because their downloading rate is too low. This problem has been recently discussed in [15].

The second option consists in delivering all channels with ABR streaming. This is the option we study in this paper. To the best of our knowledge, live streaming providers apply the same transcoder settings for all channels, although it has been shown in [22] that such a strategy is sub-optimal. In this paper, we consider two possible strategies.

Full-Cover Strategy. This corresponds to a strategy with one representation for each resolution smaller than or equal to the source resolution. The bit-rate is chosen to be the lowest possible for the given resolution (100 kbps for the low resolutions and 1,000 kbps for the high ones). With this strategy, viewers with low-bandwidth connections and display sizes smaller than or equal to the source resolution are guaranteed to find one representation. Moreover, since the CPU requirements are low for low bit-rates, this strategy is the least CPU-hungry possible strategy (among the strategies with at least one representation per resolution).

Zencoder Strategy. We follow here the recommendations of one of the main cloud transcoding providers, namely Zencoder. The recommendations are given on their public website.8 We give in Table 3 the characteristics of the set of representations; a sketch of how both strategies build a representation set for a given source is shown after the table. Again, only representations with a bit-rate and a resolution smaller than or equal to those of the video source are produced.

8 http://zencoder.com/en/hls-guide

    Video Resolution    Bit-rates (in kbps)
    224p                200, 400, 600
    360p                1000, 1500
    720p                2000
    1080p               2750

Table 3: Zencoder encoding recommendations for live streaming (adapted to our bit-rate ranges).

5. OPTIMIZING STREAM PREPARATION

We first address the problem of live video stream preparation with an optimization approach. As previously said, the preparation includes both the decision about the encoding parameters of the video representations and the assignment of transcoding jobs to the machines. Our goal is to maximize the QoE of viewers subject to the availability of hardware resources. In the following, we first provide a formal formulation of the problem, and then we present the ILP model that we use to solve the optimization problem. Finally, we compare the performance of the industry-standard strategies with the optimal one.

5.1 Notations

Let I be the set of raw video streams encoded at the broadcaster side. Each video stream i ∈ I is characterized by a type of video content v_i ∈ V, an encoding bit-rate r_i ∈ R and a spatial resolution s_i ∈ S, where V, R and S are the set of video types, the set of encoding bit-rates (in kbps) and the set of spatial resolutions, respectively. We have shown in Section 3.1 the diversity of raw videos.

Let O be the set of the possible video representations that are generated from the sources by transcoding jobs. Each representation o ∈ O corresponds to a triple (v_o, r_o, s_o), that is, to a video representation of content type v_o ∈ V encoded at the resolution s_o ∈ S and at the bit-rate r_o ∈ R.

    Name           Description
    f_iou ∈ R+     QoE level for representation o transcoded from stream i watched on a display of size s_u
    f_io ∈ R+      QoE level for representation o transcoded from stream i if display size s_u matches resolution s_o
    q_ou ∈ R+      Penalty due to up-scaling from resolution s_o to display size s_u
    d_iu ∈ Z+      Number of viewers of type u watching a stream i
    r_o ∈ R+       Value in kbps of the bit-rate of the representation o
    c_u ∈ R+       Connection capacity in kbps of viewer type u
    v_u ∈ V        Video stream requested by viewer type u
    s_u ∈ S        Display size (spatial resolution) of viewer type u
    N ∈ Z+         Overall number of viewers
    R ∈ [0, 1]     Minimum fraction of viewers that must be served
    p_io ∈ R+      CPU requirement to perform the live transcoding from stream i to representation o, in GHz
    P_m ∈ R+       CPU capacity of a machine m, in GHz

Table 4: ILP notation.

Let M be the set of physical machines where the transcoding tasks should be performed. Each machine m ∈ M can accommodate transcoding jobs up to a maximum CPU load of P_m GHz.

To reduce the size of the problem, and make it more tractable, we introduce the notion of viewer type. Let U be the set of viewer types. All viewers in a given viewer type u ∈ U have the same display resolution (i.e., the spatial resolution at which the video is displayed on the device) s_u ∈ S, request the same video type v_u ∈ V, and use an Internet connection with the same bandwidth of c_u kbps. However, viewers of the same type u watch different channels. We denote by d_iu the number of viewers of type u watching a given channel i. Note that a viewer of a given type u can play segments encoded at resolutions lower than its display size s_u by performing spatial up-sampling before rendering.

A viewer from viewer type u watching a video representation o transcoded from a stream i experiences a QoE level of f_iou, which is an increasing function of the bit-rate r_o. Based on the dataset presented in Section 3.2, we know that the QoE function depends on the video content type v_o, the resolution s_o and the original raw video stream i. As previously said, the QoE level f_iou also depends on whether the video should be up-scaled or not, since up-scaling introduces a penalty on the final QoE value. We incorporate this up-scaling penalty into the QoE computation by the following definition of f_iou:

    f_{iou} = \begin{cases} f_{io}, & \text{if } s_o = s_u \\ f_{io} - q_{ou}, & \text{if } s_o < s_u \end{cases} \qquad \forall i \in I, o \in O, u \in U \qquad (1)

where f_io is the QoE level when the display resolution and the target video resolution match, and q_ou is the penalty of the up-scaling process from resolution s_o to the viewer display size s_u. Table 4 summarizes the notation used throughout the paper.

Integer Linear Programming formulation

    \max_{x, y} \; \sum_{i \in I} \sum_{o \in O} \sum_{u \in U} f_{iou} \, y_{iou}                                    (2a)

    s.t.
    y_{iou} \le N \sum_{m \in M} x_{iom},                              \forall i \in I, o \in O, u \in U             (2b)
    \sum_{m \in M} x_{iom} \le \sum_{u \in U} y_{iou},                 \forall i \in I, o \in O                      (2c)
    y_{iou} \le \begin{cases} N, & \text{if } s_u \ge s_o \\ 0, & \text{otherwise} \end{cases}
                                                                       \forall i \in I, o \in O, u \in U             (2d)
    \sum_{o \in O} y_{iou} \le d_{iu},                                 \forall i \in I, u \in U                      (2e)
    \sum_{i \in I} \sum_{o \in O} (r_o - c_u) \, y_{iou} \le 0,        \forall u \in U                               (2f)
    \sum_{i \in I} \sum_{o \in O} \sum_{u \in U} y_{iou} \ge R \, N                                                  (2g)
    x_{iom} \le \begin{cases} 1, & \text{if } (v_i = v_o \;\&\; s_i = s_o \;\&\; r_i > r_o) \text{ or } (v_i = v_o \;\&\; s_i > s_o \;\&\; r_i \ge r_o) \\ 0, & \text{otherwise} \end{cases}
                                                                       \forall i \in I, o \in O, m \in M             (2h)
    \sum_{m \in M} x_{iom} \le 1,                                      \forall i \in I, o \in O                      (2i)
    \sum_{i \in I} \sum_{o \in O} p_{io} \, x_{iom} \le P_m,           \forall m \in M                               (2j)
    y_{iou} \in [0, N],                                                \forall i \in I, o \in O, u \in U             (2k)
    x_{iom} \in \{0, 1\},                                              \forall i \in I, o \in O, m \in M             (2l)

5.2 ILP Model

We now describe the ILP. The decision variables in the model are:

y_iou ∈ Z≥0: the number of viewers of type u watching a representation o transcoded from a stream i;

x_iom ∈ {0, 1}: equal to 1 if machine m transcodes stream i into representation o, and 0 otherwise.

With these definitions, the optimization problem can be formulated as shown in (2). The objective function (2a) maximizes the average viewer QoE. The constraints (2b) and (2c) set up a consistent relation between the decision variables y and x. The constraint (2d) establishes that a viewer of type u can play only the transcoded representations o with spatial resolutions equal to or smaller than the viewer display size s_u, that is, those that may undergo an up-sampling operation at rendering. The constraint (2e) ensures that the sum of all the viewers of type u watching any representation o transcoded from a given stream i does not exceed the number of viewers of type u originally watching the stream i. The constraint (2f) limits the viewer link capacity. The constraint (2g) forces us to serve at least a certain fraction R of viewers. The constraint (2h) enforces that only transcoding operations defined over the same video content type are allowed, and it forbids senseless transcoding operations, like transcoding to higher bit-rates or higher resolutions or transcoding to the same rate-resolution pair. The constraint (2i) guarantees that a given transcoding task (i, o) is performed on one unique machine m. Finally, (2j) sets the CPU capacity of each machine m.
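The structure of model (2) can be sketched with an off-the-shelf modeling library. The snippet below uses PuLP (the paper itself relies on CPLEX) and toy placeholder data; it only illustrates how the decision variables and constraints fit together, not the instances solved in the paper.

```python
# Compact sketch of ILP (2) with PuLP and toy data (illustrative only).
import pulp

I = ["ch1", "ch2"]                    # source streams
O = ["o_224p_400k", "o_360p_1000k"]   # candidate representations
U = ["u_slow", "u_fast"]              # viewer types
M = ["m1", "m2"]                      # machines

f = {(i, o, u): 30.0 for i in I for o in O for u in U}   # QoE f_iou (placeholder, dB)
d = {(i, u): 10 for i in I for u in U}                   # demand d_iu
r = {"o_224p_400k": 400, "o_360p_1000k": 1000}           # representation rate r_o (kbps)
c = {"u_slow": 600, "u_fast": 3000}                      # viewer capacity c_u (kbps)
p = {(i, o): 0.7 for i in I for o in O}                  # CPU cost p_io (GHz)
P = {m: 2.8 for m in M}                                  # machine capacity P_m (GHz)
feasible = {(i, o): 1 for i in I for o in O}             # encodes constraint (2h)
playable = {(o, u): 1 for o in O for u in U}             # s_o <= s_u, constraint (2d)
N = sum(d.values())
R = 0.9                                                  # minimum served fraction

prob = pulp.LpProblem("stream_preparation", pulp.LpMaximize)
y = pulp.LpVariable.dicts("y", (I, O, U), lowBound=0, upBound=N, cat="Integer")
x = pulp.LpVariable.dicts("x", (I, O, M), cat="Binary")

# (2a) sum of QoE over served viewers
prob += pulp.lpSum(f[i, o, u] * y[i][o][u] for i in I for o in O for u in U)

for i in I:
    for o in O:
        for u in U:
            # (2b) viewers only on representations that are actually transcoded
            prob += y[i][o][u] <= N * pulp.lpSum(x[i][o][m] for m in M)
            # (2d) resolution must not exceed the display size
            prob += y[i][o][u] <= N * playable[o, u]
        # (2c) do not transcode a representation nobody watches
        prob += pulp.lpSum(x[i][o][m] for m in M) <= pulp.lpSum(y[i][o][u] for u in U)
        # (2i) + (2h) at most one machine, only sensible transcodings
        prob += pulp.lpSum(x[i][o][m] for m in M) <= feasible[i, o]
    for u in U:
        # (2e) respect the demand of each (channel, viewer type)
        prob += pulp.lpSum(y[i][o][u] for o in O) <= d[i, u]
for u in U:
    # (2f) representation rate within the viewer download capacity
    prob += pulp.lpSum((r[o] - c[u]) * y[i][o][u] for i in I for o in O) <= 0
# (2g) serve at least a fraction R of the viewers
prob += pulp.lpSum(y[i][o][u] for i in I for o in O for u in U) >= R * N
for m in M:
    # (2j) CPU capacity of each machine
    prob += pulp.lpSum(p[i, o] * x[i][o][m] for i in I for o in O) <= P[m]

prob.solve()
```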

5.3 Settings for Performance Evaluation

To find the exact solution of the optimization problem, we use the generic solver IBM ILOG CPLEX [10] on a set of instances. Unfortunately, this approach does not allow solving instances as large as the ones that live service providers face today. Thus, we have built problem instances based on the datasets introduced in Section 3, but of a smaller size.

Incoming Videos from Broadcasters. We restrict the size of the set of sources by picking only the 50 most popular channels from the Twitch dataset (see Section 3.1). More precisely, we take 66 snapshots from the dataset, corresponding to those extracted every 4 hours during 11 days, starting on April 10th, 2014 at 00:00. For each snapshot, we use the channel information (bit-rate and resolution), which we modify slightly to match the spatial resolutions and bit-rates from Table 6. Each channel is randomly assigned to one of the four video types given in Table 2.

QoE for Target Videos. We use the dataset presented in Section 3.2 to obtain the QoE (estimated as a PSNR score) f_io of a target video o obtained from transcoding a source i. The up-scaling penalties q_ou are fixed using PSNR measures from the situation shown at the bottom of Figure 7 (target resolution lower than the display one).

CPU for the Transcoding Tasks. Again to reduce the size of the instances, and thus the complexity of the problem, we fit an exponential function to the set of CPU measurements:

    p = a \cdot r^{b}    (3)

where p is the number of GHz required to transcode a source into a target, a and b are the parameters obtained by curve fitting, and r is the bit-rate in Mbps of the target video. The values of the parameters a and b depend on (i) the source video (content type, bit-rate and resolution), and (ii) the resolution of the target video. The fitted curves are identified by continuous lines in Figure 5. Table 5 gives the parameters a and b used in the curves shown in Figure 5.

    Target Resol.    a           b
    224p             0.673091    0.024642
    360p             0.827912    0.033306
    720p             1.341512    0.060222
    1080p            1.547002    0.080571

Table 5: Parameters of the fitting model of the transcoding CPU curves. Source stream: 1080p, 2,750 kbps, movie.

Viewers. The viewer set U is based on the dataset [4] presented in Section 3.3. However, the number of viewers is too large, so we apply the concept of viewer type. To build the types, we divide the range of bandwidth into bins, whose limits are selected so that each bin contains an equal number of viewers. A viewer type corresponds to a bin, with a display spatial resolution set according to the lower bandwidth in the bin, and the downloading rate of the viewer type equal to the lower bandwidth in the bin. The number of viewers d_iu watching a raw video i is set proportionally to the popularity of the channel in the Twitch dataset.
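A minimal sketch of this equal-count binning is shown below; the synthetic download rates and the number of types are placeholders, since the actual rates come from the dataset of Section 3.3.

```python
# Sketch of the viewer-type construction of Section 5.3 (equal-count bandwidth bins).
import numpy as np

def build_viewer_types(download_rates_kbps, n_types):
    """Return one representative download rate (kbps) per viewer type."""
    rates = np.sort(np.asarray(download_rates_kbps))
    bins = np.array_split(rates, n_types)          # bins with (almost) equal viewer counts
    return [float(b[0]) for b in bins if len(b)]   # lower bandwidth of each bin

rates = np.random.lognormal(mean=7.3, sigma=0.6, size=10_000)  # synthetic placeholder rates
print(build_viewer_types(rates, n_types=8))
```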

5.4 Numerical Results

We now show the results of our analysis. We use the previous settings and we also fix the CPU capacity P_m of all the machines to 2.8 GHz, the clock speed of the physical processors used by the Amazon cloud computing C3 instances.9 Our motivation is to determine how far from the optimal the current industry-standard strategies are. In Figure 9, we represent the average QoE, expressed as the PSNR in dB, as a function of the number of machines. The line represents the results obtained from solving the optimization problem with CPLEX. We show with grey pins the results for both industry-standard strategies. The results are the average over all the snapshots we took from the Twitch dataset.

We first emphasize that the amount of hardware resources in the data-center has a significant impact on the QoE for the viewers. The difference in PSNR reaches 4 dB between 10 and 100 machines. This remark matters because it highlights the need to be able to reserve the right amount of resources in the data-center. However, the ability to forecast the load and to reserve the resources is not trivial for elastic live streaming services such as Twitch.

Our second main observation is that, on our datasets, the Full-Cover strategy is more efficient than the Zencoder one in terms of the QoE-CPU trade-off. The Full-Cover strategy is close to the optimal, and thus represents an efficient implementation with respect to its simplicity. Note however that Full-Cover needs 48 machines, while there exists a solution with the same QoE but with only 35 machines. Therefore, a significant reduction of the resources to reserve can be obtained. The Zencoder strategy is outperformed by the Full-Cover one, as it consumes nearly twice the CPU cycles for a tiny increase of the QoE. For a similar amount of CPU, the QoE gap between the Zencoder strategy and the optimal is more than 0.9 dB, which is significant.

Figure 9: Optimal average QoE for the viewers (PSNR in dB) vs. the number of machines used in the data-center. The 50 most popular channels from several snapshots of the Twitch dataset are transcoded; the Full-Cover and Zencoder strategies appear as individual points.

9 http://aws.amazon.com/ec2/instance-types/

To complete this study, we provide another view of the choices to be made in Figure 10. Here, we show the ratio of served users and the amount of delivery bandwidth required to serve the users. In our ILP, we optimize the average QoE, so the solutions found by CPLEX are not optimal on other aspects. In Figure 10b, we see that the delivery bandwidth of the optimal solution is significantly higher than that of Full-Cover, which may cancel out the gains obtained by using fewer machines. Please note that both parameters of Figure 10 can also be the objective of the ILP. In the same vein, the ILP can also be re-written so that the parameter to be optimized is the amount of CPU needed, subject to a given QoE value.

Figure 10: Other views on the optimal solution of Figure 9, as a function of the number of machines: (a) ratio of satisfied users; (b) delivery bandwidth (in Mbps). Full-Cover and Zencoder appear as individual points.

6. A HEURISTIC ALGORITHM

We now present and evaluate an algorithm for massive live streaming preparation. Our goal is to design a fast, light, adaptive algorithm, which can be implemented in production environments. This algorithm should in particular be able to absorb the variations of the streaming service demand while using a fixed data-center infrastructure.

6.1 Algorithm Description

The purpose of the algorithm is to update the set of transcoded representations with respect to the characteristics and the popularity of the incoming raw videos. The algorithm is executed on a regular basis (for example every five minutes, to stick to the Twitch API) by the live streaming service provider in charge of the data-center. The pseudo-code of the algorithms and some additional details can be found in Appendix B.

Algorithm in a Nutshell. We process the channels iteratively in decreasing order of their popularity. For a given channel, the algorithm has two phases: first, we decide a CPU budget for this channel; second, we determine a set of representations with respect to the CPU budget computed during the first phase.

Set a CPU Budget Per Channel. We base our algorithm on the observations of the optimal solutions found by CPLEX. Four main observations are illustrated in Figure 14 in the Appendix: (i) the share of the overall CPU budget given to a channel is roughly proportional to the ratio of viewers watching this channel; (ii) the CPU budget per channel is less than 10 GHz; (iii) some video types (e.g., sport) require more CPU budget than others (e.g., cartoon); and (iv) the higher the resolution of the source, the bigger the CPU budget.

We derive from these four observations the algorithm shown in Algorithm 1. We start with the most popular channel. We first set a nominal CPU budget according to the ratio of viewers and the maximal allowable budget. Then we adjust this nominal budget based on a video type weight and a resolution weight (interested readers can find in the Appendix details about the values we chose for these weights, based on observations from the optimal solutions). We thus obtain a CPU budget for this channel.

Decide the Representations for a Given CPU Budget. The pseudo-code is detailed in Algorithm 2. This algorithm builds the set of representations by iteratively adding the best representation. At each step, the CPU needed to transcode the chosen representation must not exhaust the remaining channel budget. To decide among the possible representations, we need to estimate the QoE gain that every possible representation can provide if it is chosen. To do so, we estimate the assignment between the representations and the viewers with Algorithm 3. In short, this algorithm requires basic knowledge of the distribution of downloading rates in the population of viewers, which is usually available to service providers. (In this work, we study the worst-case scenario where service providers do not have this information and assume a uniform distribution in the range between 100 kbps and 3,000 kbps.) The idea is then to assign subsets of the population to representations and to evaluate the overall QoE. Details are given in the Appendix.
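The two phases can be summarized by the following sketch. It is not the authors' pseudo-code: the weights are taken from Table 7, but the way the nominal budget is adjusted and the QoE-gain estimator (standing in for Algorithm 3) are assumptions made for illustration.

```python
# High-level sketch of the heuristic of Section 6.1 (Algorithms 1 and 2); illustrative only.
TYPE_WEIGHT = {"cartoon": -0.176, "documentary": 0.072, "sport": 0.190, "video": -0.076}
RES_WEIGHT = {"224p": -0.917, "360p": -0.657, "720p": -0.108, "1080p": 0.432}
MAX_BUDGET_GHZ = 10.0   # observation (ii): per-channel budget below 10 GHz

def channel_budget(viewer_share, video_type, resolution):
    """Phase 1: nominal budget proportional to popularity, adjusted by the weights
    (how exactly the weights are applied is our assumption)."""
    nominal = viewer_share * MAX_BUDGET_GHZ
    return max(0.0, nominal + TYPE_WEIGHT[video_type] + RES_WEIGHT[resolution])

def choose_representations(candidates, budget_ghz, estimated_gain, cpu_cost):
    """Phase 2: greedily add the representation with the best estimated QoE gain
    as long as its CPU cost fits in the remaining channel budget."""
    chosen, remaining = [], budget_ghz
    pool = set(candidates)
    while pool:
        best = max(pool, key=lambda o: estimated_gain(o, chosen))
        if cpu_cost(best) > remaining or estimated_gain(best, chosen) <= 0:
            break
        chosen.append(best)
        remaining -= cpu_cost(best)
        pool.remove(best)
    return chosen
```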

6.2 Simulation Settings

Our simulator is based on the datasets presented in Section 3 and the extra settings given in Section 5.3. However, in contrast to the ILP, our heuristic is expected to scale. Therefore we evaluate the heuristic and the aforementioned industry-standard strategies on the complete dataset containing all online broadcasters at each snapshot. Regarding the viewers, we now consider each individual viewer, to whom we randomly assign a bandwidth value as presented in Section 3.3 and a display resolution accordingly. We use the actual number of viewers watching channels according to the Twitch dataset.

Please note also that we focus here on the decision about the representations (number of representations and transcoding parameters), and we neglect the assignment of transcoding jobs to machines. This does not impact the evaluation since all tested strategies (our heuristic and both industry-standard strategies) can be evaluated without regard to this assignment. We leave for future work the integration of the VM assignment policy into middleware such as OpenStack.10

6.3 Performance Evaluations

In the following, we present the same set of results from two different perspectives: first, we show how the performance evolves throughout the eleven days we consider; then, we present the results in a way that highlights the main features of the algorithms.

10 http://www.openstack.org/

In Figure 11 we show the three main metrics during our 11-day dataset. The combination of the three figures reveals the main characteristics of the strategies. The main point we would like to highlight is that our heuristic keeps a relatively constant, low CPU consumption regardless of the input traffic load. Our heuristic also succeeds in maintaining a high QoE. To achieve this excellent trade-off, our heuristic adjusts the ratio of served viewers. Yet, this ratio remains high since it is always greater than 95%. Our heuristic thus demonstrates the benefits of having different representation sets for the different channels according to their popularity. The industry-standard strategies are less capable of absorbing the changing demand. In particular, the CPU needs of the Zencoder strategy range from 1,000 GHz to 18,000 GHz, while the average QoE is always lower than for our heuristic.

To highlight the relationship between the CPU needs and the QoE for the population, we represent in Figure 12 a cloud of points, with one point per snapshot. Full-Cover has most points in the southwest area of the figure, which corresponds to a low CPU utilization but also a low QoE. We also note that the distance between two points can be large, which emphasizes an inability to absorb load variations. This inability is even stronger for the Zencoder strategy, for which the points are far from each other, covering all areas of the figure. On the contrary, our heuristic absorbs the elasticity of the service well, with points concentrated in the northwest part of the figure, which means low CPU and high QoE.

7. CONCLUSIONS

This paper studies the management of new live adaptive streaming services in the cloud from the point of view of streaming providers using cloud computing platforms. All the simulations conducted in the paper make use of real data from three datasets covering all the actors in the system. The study is focused on the interactions between the optimal video encoding parameters, the available CPU resources and the QoE perceived by the end-viewers. We use an ILP to model this system and we compare its optimal solution to current industry-standard solutions, highlighting the gap between the two. Due to the computational limitations of the ILP, we propose a practical algorithm able to solve problems of realistic size, thanks to key insights gathered from the optimal solution. This algorithm finds representations beating the industry-standard approaches in terms of the trade-off between viewer QoE and the CPU resources needed. Furthermore, it uses an almost constant amount of computing resources even in the presence of a time-varying demand.

8. REFERENCES

[1] Twitch.tv: 2013 retrospective, Jan. 2014. http://twitch.tv/year/2013.

[2] M. Adler, R. K. Sitaraman, and H. Venkataramani. Algorithms for optimizing the bandwidth cost of content delivery. Computer Networks, 55(18):4007-4020, 2011.

[3] L. A. Barroso, J. Clidaras, and U. Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan & Claypool, second edition, 2013.

Figure 11: Different metric results over time (days 1 to 11) for the distinct solutions (Full-Cover strategy, Zencoder encoding recommendations, and our Heuristic): (a) average viewer PSNR (dB); (b) ratio of satisfied viewers; (c) total CPU needs (GHz).

Figure 12: Total required CPU (in GHz) vs. the perceived QoE (average PSNR in dB) for the Full-Cover, Zencoder and Heuristic strategies. The marker shape indicates the percentage of satisfied viewers (95-96%, 97-98%, 99-100%). Each point corresponds to one of the 66 snapshots.

[4] S. Basso, A. Servetti, E. Masala, and J. C. De Martin. Measuring DASH streaming performance from the end users' perspective using Neubot. In ACM MMSys, 2014.

[5] N. Bouzakaria, C. Concolato, and J. Le Feuvre. Overhead and performance of low latency live streaming using MPEG-DASH. In IEEE IISA, 2014.

[6] R. Cheng, W. Wu, Y. Lou, and Y. Chen. A cloud-based transcoding framework for real-time mobile video conferencing system. In IEEE MobileCloud, 2014.

[7] J. Guo and L. Bhuyan. Load balancing in a cluster-based web server for multimedia applications. IEEE Transactions on Parallel and Distributed Systems, 17(11):1321-1334, Nov. 2006.

[8] Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu. Power-rate-distortion analysis for wireless video communication under energy constraints. IEEE Transactions on Circuits and Systems for Video Technology, 15(5):645-658, 2005.

[9] Z. Huang, C. Mei, L. Li, and T. Woo. CloudStream: Delivering high-quality streaming videos through a cloud-based SVC proxy. In IEEE INFOCOM, 2011.

[10] IBM. ILOG CPLEX Optimization Studio. http://is.gd/3GGOFp.

[11] F. Jokhio, A. Ashraf, S. Lafond, I. Porres, and J. Lilius. Prediction-based dynamic resource allocation for video transcoding in cloud computing. In Euromicro PDP, 2013.

[12] F. Lao, X. Zhang, and Z. Guo. Parallelizing video transcoding using map-reduce-based cloud computing. In IEEE ISCAS, 2012.

[13] S. Lin, X. Zhang, Q. Yu, H. Qi, and S. Ma. Parallelizing video transcoding with load balancing on cloud computing. In IEEE ISCAS, 2013.

[14] J. Liu, G. Simon, C. Rosenberg, and G. Texier. Optimal delivery of rate-adaptive streams in underprovisioned networks. IEEE Journal on Selected Areas in Communications, 2014.

[15] K. Pires and G. Simon. DASH in Twitch: Adaptive bitrate streaming in live game streaming platforms. In ACM VideoNext CoNEXT Workshop, 2014.

[16] C. Reano Gonzalez. CU2rCU: A CUDA-to-rCUDA converter. Master's thesis, Universitat Politecnica de Valencia, 2013.

[17] H. Reddy. Adapt or Die: Why Pay-TV Operators Must Evolve Their Video Architectures, July 2014. Videonet white paper.

[18] J. Reichel, H. Schwarz, and M. Wien. Joint Scalable Video Model 11 (JSVM 11). Joint Video Team, Doc. JVT-X202, 2007.

[19] R. Shea and J. Liu. On GPU pass-through performance for cloud gaming: Experiments and analysis. In ACM NetGames, 2013.

[20] H. Sohn, H. Yoo, W. De Neve, C. S. Kim, and Y.-M. Ro. Full-reference video quality metric for fully scalable and mobile SVC content. IEEE Transactions on Broadcasting, 56(3):269-280, Sept. 2010.

[21] L. Su, Y. Lu, F. Wu, S. Li, and W. Gao. Complexity-constrained H.264 video encoding. IEEE Transactions on Circuits and Systems for Video Technology, 19(4):477-490, Apr. 2009.

[22] L. Toni, R. Aparicio-Pardo, G. Simon, A. Blanc, and P. Frossard. Optimal set of video representations in adaptive streaming. In ACM MMSys, 2014.

[23] Z. Wang, L. Sun, C. Wu, W. Zhu, and S. Yang. Joint online transcoding and geo-distributed delivery for dynamic adaptive streaming. In IEEE INFOCOM, 2014.

[24] Xiph.org. Video test media.

[25] M. Yang, J. Cai, Y. Wen, and C. H. Foh. Complexity-rate-distortion evaluation of video encoding for cloud media computing. In IEEE ICON, 2011.

APPENDIX

A. TRANSCODING CPU USAGE

In this Appendix we give more details about the experiments that we have used to estimate the CPU usage of different transcoding operations.

A.1 Input and Output Rates

As discussed in Section 3.2, we consider only the rates covering 90% of the sources that we observe in the Twitch dataset, as detailed in Table 6. For the low resolutions (224p and 360p), the set of bit-rates ranges from 100 kbps up to 3,000 kbps with steps of 100 kbps, while for the high resolutions (720p and 1080p), the set of bit-rates ranges from 1,000 kbps up to 3,000 kbps with steps of 250 kbps. Thus, each video sequence from Table 2 can be encoded into 78 different combinations of rates and resolutions. To obtain these 78 sources, we took the original full-quality decoded videos from Table 2 and encoded them into each of the 78 videos that we consider as possible raw videos.

Resolution   Width x Height   Min–Max Rates     Rate Step
224p         400 x 224        100–3000 kbps     100 kbps
360p         640 x 360        100–3000 kbps     100 kbps
720p         1280 x 720       1000–3000 kbps    250 kbps
1080p        1920 x 1080      1000–3000 kbps    250 kbps

Table 6: Resolutions and ranges of rates for the raw videos.
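For concreteness, here is a minimal Python sketch (ours, not part of the paper) that enumerates the source combinations implied by Table 6 and confirms the count of 78:

# Candidate (resolution, bit-rate) pairs implied by Table 6; rates in kbps.
RATE_GRID = {
    "224p":  range(100, 3001, 100),    # 30 rates
    "360p":  range(100, 3001, 100),    # 30 rates
    "720p":  range(1000, 3001, 250),   # 9 rates
    "1080p": range(1000, 3001, 250),   # 9 rates
}

sources = [(resol, rate) for resol, rates in RATE_GRID.items() for rate in rates]
print(len(sources))  # 78 combinations, as stated above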

A.2 Transcoding

The transcoding operation that we have performed is summarized in Figure 13. This operation has been done 12,168 times in total, which corresponds to 4 (the number of video types) multiplied by 78 (the number of possible sources) multiplied by 39 (the average number of possible target videos). Recall that, for each input video, we produced only videos with resolutions and bit-rates lower than or equal to those of the input; that is, only a subset of the 78 possible representations is created from a given raw video.

We have used ffmpeg for the transcoding with the same parameters as in [5], which is a study conducted by the leading developers of the popular GPAC video encoder. The command is:

ffmpeg -i source_name -vcodec libx264 -preset ultrafast -tune zerolatency -s target_resolution -r 30 -b target_rate -an target_name

[Figure 13 is a diagram: an original YUV video is first encoded into a source representation (here 720p at 2.25 Mbps), which is then transcoded into a target representation (here 360p at 1.6 Mbps) while the CPU cycles of the transcoding are measured.]

Figure 13: Measuring the CPU cycles for the transcoding of any source to any target video. Here an example with a source at 720p and 2.25 Mbps and a target video at 360p and 1.6 Mbps.
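As an illustration only, the following Python sketch assembles the same ffmpeg invocation for the Figure 13 example; the helper name and file names are ours and purely hypothetical:

import subprocess

def transcode(source, target, width, height, rate_kbps):
    # Transcode `source` into `target` with the parameters of Appendix A.2.
    # `-b:v` is the modern spelling of the `-b` video bit-rate flag used above.
    cmd = [
        "ffmpeg", "-i", source,
        "-vcodec", "libx264",
        "-preset", "ultrafast",       # same preset as in [5]
        "-tune", "zerolatency",       # live-oriented tuning, as in [5]
        "-s", f"{width}x{height}",    # target resolution
        "-r", "30",                   # 30 fps output
        "-b:v", f"{rate_kbps}k",      # target video bit-rate
        "-an",                        # drop the audio track
        target,
    ]
    subprocess.run(cmd, check=True)

# Figure 13 example: a 720p source at 2.25 Mbps transcoded to 360p at 1.6 Mbps.
transcode("source_720p_2250k.mp4", "target_360p_1600k.mp4", 640, 360, 1600)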

Measuring CPU cycles. As discussed above and in Section 3.2, we have run many transcoding operations on an Intel Xeon CPU E5640 at 2.67 GHz with 24 GB of RAM, running Linux 3.2 with Ubuntu 12.04. To measure the number of CPU cycles used, we use the perf tool (https://perf.wiki.kernel.org), a profiler for Linux 2.6+ based systems, with the following command:

perf stat -x, -e cycles

This command provides access to the counter collecting the number of CPU cycles at the Performance Monitoring Unit (PMU) of the processor. This number is then divided by the duration (in seconds) of the video sequence to obtain the CPU frequency (in GHz) required to perform the transcoding within a running time equal to the play time of the video, that is, the frequency required for live transcoding.
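As a worked example with made-up numbers (the paper reports per-sequence measurements, not this particular value), the conversion from a measured cycle count to the CPU frequency required for live transcoding is:

def required_ghz(cpu_cycles, video_duration_s):
    # Frequency (GHz) needed to complete the transcoding within a running
    # time equal to the play time of the video, i.e., in real time.
    return cpu_cycles / video_duration_s / 1e9

print(required_ghz(1.2e11, 60.0))  # 1.2e11 cycles over a 60 s sequence -> 2.0 GHz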

    B. HEURISTIC

B.1 Settings

CPU Budget Per Channel. The values we used for both the video type weight and the resolution weight are given in Table 7. These values correspond to the average of the curves plotted in Figure 14, which we obtain from the optimal solutions computed by CPLEX on the 50 most popular channels and 100 machines. The x-axis is the rank of the channels according to their popularity. On the top, we show the average distribution of CPU budget per channel. On the middle and bottom, we show the average difference between the average CPU budget when a channel of a given type (respectively, of a given input resolution) is at a given rank and the average CPU budget for channels at this rank. This difference allows us to compute the adjustment of the CPU budget according to the video type and the resolution.


Video Type     Weight
Cartoon        -0.176
Documentary     0.072
Sport           0.190
Video          -0.076

Resolution     Weight
224p           -0.917
360p           -0.657
720p           -0.108
1080p           0.432

Table 7: Video type and resolution weights.
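To make the use of these weights concrete, here is a small sketch (ours; the helper name is hypothetical) of the budget adjustment performed in lines 5-7 of Algorithm 1 below, using the Table 7 values:

# Weights from Table 7.
VIDEO_TYPE_WEIGHT = {"Cartoon": -0.176, "Documentary": 0.072,
                     "Sport": 0.190, "Video": -0.076}
RESOLUTION_WEIGHT = {"224p": -0.917, "360p": -0.657,
                     "720p": -0.108, "1080p": 0.432}

def adjusted_cpu_budget(base_cpu_ghz, video_type, resolution):
    # Scale a channel's popularity-based CPU budget by its video type
    # and input resolution; never return a negative budget.
    weight = 1.0 + VIDEO_TYPE_WEIGHT[video_type] + RESOLUTION_WEIGHT[resolution]
    return max(base_cpu_ghz * weight, 0.0)

# Example: a Sport channel streamed at 1080p gets a 62% larger budget.
print(adjusted_cpu_budget(2.0, "Sport", "1080p"))  # 2.0 * (1 + 0.190 + 0.432) = 3.244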

[Figure 14 shows three stacked plots against the channel rank (1 to 50) by popularity: the average CPU budget share per channel (CPU %), the CPU budget difference per video type (Cartoon, Documentary, Sport, Video), and the CPU budget difference per input resolution (224p, 360p, 720p, 1080p).]

Figure 14: CPU information for a given channel popularity rank. Data comes from the average optimal solution with 100 machines. From the average total CPU % according to the rank (top figure), we derive the difference between distinct video types (middle figure) and between distinct channel input resolutions (bottom figure).

Estimation of Viewers' QoE. To estimate the QoE gain obtained when choosing one representation, we need to consider all the assignments of representations to viewers. In Algorithm 3, we evaluate the representations in the set in descending order of their bit-rates. At each iteration, we identify the fraction of viewers whose bandwidth lies between the rate of the considered representation and that of the closest representation with a higher bit-rate. We also have to take into account the display sizes of the viewers (this again depends on the knowledge the service provider has of its population; as before, we assume only a minimal knowledge). Therefore, this fraction of viewers is further split into one or more sub-fractions corresponding to their display resolutions. A different PSNR value is then computed for each sub-fraction and multiplied by the ratio of viewers belonging to that sub-fraction. When all the representations have been assessed, the sum of the PSNR contributions of all the sub-fractions of viewers is returned as the estimated QoE of the set.

B.2 Algorithms in Pseudo-code

This section includes the pseudo-code of the heuristic described in Section 6.

Algorithm 1: Main routine

Data: channelsSet: channels metadata (e.g., number of viewers, id) sorted by decreasing channel popularity.
Data: totalCPU: total CPU budget in GHz.
 1  representations ← emptySet()
 2  foreach channel ∈ channelsSet do
 3      cpu ← channel.viewersRatio × totalCPU
 4      cpu ← min(cpu, MAX_CPU)
 5      w_video ← getVideoTypeWeight(channel.video)
 6      w_resol ← getResolutionWeight(channel.resolution)
 7      cpu ← max(cpu × (1 + w_video + w_resol), 0)
 8      representations.append(findReps(channel, cpu))
 9  return representations

Algorithm 2: findReps – Find the channel's representation set meeting a budget

Data: channel: channel metadata (e.g., number of viewers, id).
Data: CPU: calculated channel CPU budget in GHz.
 1  representations ← emptySet()
 2  freeCPU ← CPU
 3  repeat
 4      newRep ← false
 5      foreach resolution ∈ resolutionsSet such that resolution ≤ channel.resolution do
 6          foreach bitRate ∈ bitRatesSet[resolution] do
 7              thisRep ← (resolution, bitRate)
 8              if bitRate ≤ channel.bitRate and thisRep ∉ representations then
 9                  thisRep.cpu ← getCPU(thisRep, channel)
10                  if thisRep.cpu ≤ freeCPU then
11                      reps ← representations + thisRep
12                      thisRep.qoe ← getQoE(reps, channel)
13                      if not newRep or newRep.qoe < thisRep.qoe then
14                          newRep ← thisRep
15      if newRep then
16          representations.append(newRep)
17          freeCPU ← freeCPU − newRep.cpu
18  until not newRep
19  return representations
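For readers who prefer executable form, a Python sketch of Algorithm 2 follows (ours; the helper functions and the channel object are assumptions, and resolutions are encoded as integer heights such as 224 or 1080 so they compare numerically):

def find_reps(channel, cpu_budget, resolutions, bitrates, get_cpu, get_qoe):
    # Greedy selection: while the CPU budget allows it, add the candidate
    # representation that maximizes the estimated QoE of the resulting set.
    reps, free_cpu = [], cpu_budget
    while True:
        best, best_qoe, best_cpu = None, None, None
        for resol in resolutions:
            if resol > channel.resolution:        # never exceed the source resolution
                continue
            for rate in bitrates[resol]:
                cand = (resol, rate)
                if rate > channel.bitrate or cand in reps:
                    continue
                cand_cpu = get_cpu(cand, channel)
                if cand_cpu > free_cpu:           # candidate does not fit in the budget
                    continue
                qoe = get_qoe(reps + [cand], channel)
                if best is None or qoe > best_qoe:
                    best, best_qoe, best_cpu = cand, qoe, cand_cpu
        if best is None:                          # no affordable candidate left
            return reps
        reps.append(best)
        free_cpu -= best_cpu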

Algorithm 3: getQoE – Obtain an estimation of PSNR for a given representation set of one channel

Data: channel: channel metadata (e.g., number of viewers, id).
Data: repSet: set of representations for a given channel.
 1  totalPSNR ← 0
 2  foreach rep ∈ repSet do
 3      ranges ← getResolutionsRanges(rep)
 4      foreach viewersRange ∈ ranges do
 5          v_ratio ← getViewersRatio(viewersRange)
 6          v_resol ← getViewersResolution(viewersRange)
 7          partialPSNR ← calcPSNR(rep, channel, v_resol)
 8          totalPSNR += v_ratio × partialPSNR
 9  return totalPSNR
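Similarly, a minimal Python sketch of Algorithm 3 (the helper functions passed as arguments are assumptions; the paper does not define their implementations):

def get_qoe(channel, rep_set, get_resolution_ranges, get_viewers_ratio,
            get_viewers_resolution, calc_psnr):
    # Estimate the PSNR-based QoE delivered by `rep_set` for `channel`:
    # each representation serves the bandwidth range it covers, split into
    # sub-fractions of viewers according to their display resolutions.
    total_psnr = 0.0
    for rep in rep_set:
        for viewers_range in get_resolution_ranges(rep):
            v_ratio = get_viewers_ratio(viewers_range)       # share of the audience
            v_resol = get_viewers_resolution(viewers_range)  # their display resolution
            total_psnr += v_ratio * calc_psnr(rep, channel, v_resol)
    return total_psnr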