Transcoding Live Adaptive Video Streams at a Massive Scale in the Cloud

Ramon Aparicio-Pardo, Karine Pires, Alberto Blanc, Gwendal Simon
Telecom Bretagne, France
ABSTRACT

More and more users are watching online videos produced by non-professional sources (e.g., gamers, teachers of online courses, witnesses of public events) using an increasingly diverse set of devices to access the videos (e.g., smartphones, tablets, HDTVs). Live streaming service providers can combine adaptive streaming technologies and cloud computing to satisfy this demand. In this paper, we study the problem of preparing live video streams for delivery using cloud computing infrastructure, e.g., how many representations to use and the corresponding parameters (resolution and bit-rate). We present an integer linear program (ILP) to maximize the average user quality of experience (QoE) and a heuristic algorithm that can scale to a large number of videos and users. We also introduce two new datasets: one characterizing a popular live streaming provider (Twitch) and another characterizing the computing resources needed to transcode a video. They are used to set up realistic test scenarios. We compare the performance of the optimal ILP solution with current industry standards, showing that the latter are sub-optimal. The solution of the ILP also shows the importance of the type of video on the optimal streaming preparation. By taking advantage of this, the proposed heuristic can efficiently satisfy a time-varying demand with an almost constant amount of computing resources.
Categories and Subject Descriptors
H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems - Video

General Terms
Algorithms, Design, Measurement

Keywords
Live streaming, cloud computing, video encoding
1. INTRODUCTION

The management of live video services is a complex task due to the demand for specialized resources and to real-time constraints. To guarantee the quality of experience (QoE) for end-users, live streaming service providers (e.g., TV operators and multimedia broadcasters) have traditionally relied on private data-centers (with dedicated hardware) and private networks. The widespread availability of cloud computing platforms, with ever decreasing prices, has changed the landscape [17]. Significant economies of scale can be obtained by using standard hardware, virtual machines (VMs), and shared resources in large data-centers. As illustrated in Figure 1, live streaming providers use these services in combination with widely available content delivery networks (CDNs) to build an elastic and scalable platform that can adapt itself to the dynamics of viewer demand. The only condition is to be able to use the standardized cloud computing platforms to prepare the video for delivery.
The emergence of cloud computing platforms has enabled some new trends, including: (i) the adoption of adaptive bit-rate (ABR) streaming technologies to address the heterogeneity of end-users. ABR streaming requires encoding multiple video representations, and thus increases the demand for hardware resources. Modern cloud computing platforms can meet this demand. And (ii) the growing diversity of live video streams to deliver. The popularity of services like Twitch [1] illustrates the emergence of new forms of live streaming services, where the video stream to be delivered comes from non-professional sources (e.g., gamers, teachers of online courses, witnesses of public events). Instead of a few high-quality well-defined video streams, live streaming providers now have to deal with many low-quality unreliable video streams.

Figure 1: Live streaming in the cloud (data-center and CDN)
In comparison to the significance of the shift, relatively few academic studies have been published. The scientific literature contains papers related to ABR streaming in CDNs (e.g., [2, 14]). However, to the best of our knowledge, the preparation of the streams in data-centers has not been addressed by the scientific community. The preparation of a given video channel includes deciding the number of representations to encode, setting the encoder parameters, allocating the transcoding jobs to machines, and transcoding each raw video stream into multiple video representations.

Existing works address some of these problems individually. For instance, some papers [7, 9, 11-13] present algorithms to schedule transcoding jobs on a group of computers, typically in order to maximize CPU utilization [7, 11] or to minimize the finishing time [12, 13]. Other researchers have analyzed the performance of video encoding and the relationship between power, rate and distortion [8, 21, 25] using analytical models and empirical studies. In each case the encoding parameters (i.e., resolution and rate) are input parameters of these algorithms, and are assumed to be known. Yet, they can have a significant impact on the QoE and on the total bandwidth used, as discussed in [22].
Even though solutions have already been proposed for these subproblems, it is non-trivial to combine them to form a single solution, and there is no guarantee that a combination of optimal solutions of each subproblem is an optimal and feasible solution of the global problem. For example, selecting the available representations (resolution and bit-rate) without considering the available computing resources is likely to lead to unfeasible solutions.
In this paper we are interested in maximizing the average user QoE by selecting the optimal encoding parameters under given computing and CDN capacity constraints. More specifically, we make three contributions:
1. We provide three datasets to help the community study the problem of preparing ABR video in data-centers. The first dataset is based on a measurement campaign we conducted on Twitch between January and April 2014. The second dataset, based on [4], gives bandwidth measurements on real clients downloading ABR streams. The third dataset is the result of a large number of transcoding operations that we performed on a standard server, which is typical of commoditized data-center hardware. Thanks to this dataset, it is possible to determine the computing resources needed to transcode a video and the QoE of the output video.
2. We formulate an optimization problem for the management of a data-center dealing with a large number of live video streams to prepare for delivery (Section 5). Our goal is to maximize the QoE for the end-users subject to the number of available machines in the data-center. With this problem, we highlight the complex interplay between the popularity of channels, the required computing resources for video transcoding, and the QoE of end-users. We formulate this problem as an integer linear program (ILP). We then use a generic solver to compare the performance of standard stream preparation strategies (where all the channels use the same encoding parameters for the transcoding operation) to the optimal. Our results highlight the gap between the standard preparation strategies and the optimal solution.
3. We propose a heuristic algorithm for the preparation of live ABR video streams (Section 6). This algorithm can decide the encoding parameters on the fly. Our results (Section 6) show that our proposal significantly improves the QoE of the end-users while using almost constant computing resources, even in the presence of a time-varying demand.
2. RELATED WORKS

Cloud-based transcoding has been the subject of several papers. Most of these works [7, 9, 11-13] take advantage of the fact that some modern video compression techniques divide the video stream into non-overlapping Groups of Pictures (GOPs) that can be treated independently of each other. The encoding time of each GOP depends on its duration and on the complexity of the corresponding scene. The algorithms exploit this fact to increase the utilization of each computing node, at the expense of an increased complexity, including the time and resources needed to split the input video into appropriately sized GOPs.
One downside of these solutions is that they need to know the transcoding time of each GOP in order to assign it to the most suitable computing node. Some authors [7, 11] propose fairly complicated systems to estimate the encoding time of each GOP based on real-time measurements, while others [9, 12, 13] assume that this information is directly available, for instance [9] by profiling the encoding of a few representative videos of different types, similarly to what we have done (see Section 3.2). Another downside of a GOP-based solution is that the encoding of each GOP can complete out of order, so the GOPs need to be reordered before being delivered to the users. This out-of-order problem is especially important when dealing with live content, which imposes real-time constraints. Only Huang et al. [9] explicitly consider real-time constraints in a GOP-based system.
Lao et al. [12] and Lin et al. [13] deal only with batches of videos to transcode and present different scheduling algorithms to minimize the overall encoding time.
Wang et al. [23] propose to leverage underused CDN computing resources to jointly transcode and deliver videos by having CDN servers transcode and store the most popular video segments. Such a solution can offer significant gains, especially for non-live popular streams, but it requires the cooperation of the CDN, which is not always owned and operated by the cloud provider.
Cheng et al. [6] present a framework for real-time cloud-based video transcoding in the context of mobile video conferencing. Depending on the number of participants and their locations, every video conference corresponds to one or more transcoding jobs, each one located in a potentially different data-center. They use a simple linear model to estimate the resources needed by each transcoding job; if the currently running VMs have enough spare capacity to handle the new job, they use them, otherwise they start new VMs, without a constraint on the total number of VMs used. They assume a linear relationship between the video encoding rate and CPU usage, based on some measurements, for which no details are given. As shown in Section 3.2, this is not consistent with our experiments using ffmpeg to encode H.264 videos.
The literature on video encoding is vast. A few papers have studied the relationship between power consumption, rate and distortion (often abbreviated as P-R-D). The first paper to investigate the P-R-D model, by He et al. [8], contains a detailed analysis and corresponding model of the video encoding process. The authors use this model to define an algorithm that, given rate and power constraints, minimizes the distortion of the compressed video. Su et al. [21] use a different definition for the distortion and propose a different algorithm to solve the same optimization problem. These works deal with a single video flow and take the rate as an input parameter; they do not address how to choose its value.
Yang et al. [25] present the results of an empirical study based on the H.264 Scalable Video Coding (SVC) reference software JSVM-9.19 [18]. While non-SVC H.264 can be considered as a special case consisting of only one layer, the authors emphasize the results related to the SVC part. Since the raw data is not publicly available, and since the figures in the paper do not correspond to the inputs we need, we ran similar experiments, leading to the dataset presented in Section 3.2.
3. PROBLEM DEFINITION BY DATASETS

In this section, we present the three datasets used throughout the paper. They will help us introduce the parameters influencing the stream preparation. The three datasets cover the chain of (directly or indirectly) involved actors: broadcasters, live service provider, and viewers.
3.1 The Broadcasters

We will interchangeably use the terms channel and broadcaster to indicate the people using the live streaming system to deliver a video. At any given time, a channel can be either online, when the broadcaster emits the video stream, or offline when the broadcaster is disconnected. Each online period is called a session. During a session, a broadcaster captures a video, encodes it, and uploads it to the service provider. We say that this video stream is the source or the raw video stream. The service provider is then in charge of transcoding this video into one or multiple video representations, and of delivering these representations to the viewers, or end-users. The number of viewers watching a session can change over time. Figure 2 shows the evolution of the popularity of a given channel over time, this channel containing two sessions.
Today's channels in cloud-based live streaming services are mostly non-professional. We focus here on the thousands of broadcasters who use live streaming services such as ustream,1 livestream,2 twitch,3 and dailymotion4 to broadcast live an event that they are capturing from their connected video device (e.g., camera, smartphone, or game console). As opposed to the traditional TV providers and the content owners from the entertainment industry, these broadcasters usually do not emit ultra-HD video streams (2160p, also known as 4K) and they tolerate a short lag in the delivery. However, these broadcasters are less reliable. First, a channel can switch from offline to online and vice versa at any time. Second, the emitted streams have various bit-rates and resolutions, as well as various encoding parameters. Third, the broadcasters do not give much information about their video streams.

Figure 2: A life in a channel (number of viewers over time, with two online sessions)

1 http://www.ustream.tv/
2 http://new.livestream.com/
3 http://www.twitch.tv/
4 https://www.dmcloud.net/features/live-streaming
In this paper, we use a dataset based on Twitch, a popular live streaming system. Twitch provides an Application Programming Interface (API) that allows anybody to fetch information. We used a set of synchronized computers to obtain a global state every five minutes (in compliance with API restrictions) between January 6th and April 6th, 2014. We fetched information about the total number of viewers, the total number of concurrent online channels, the number of viewers per session, and some channel metadata. We then filtered out the broadcasters having abnormal behavior (no viewer, or online for less than five minutes during the last three months). The dataset is publicly available.5 We summarize the main statistics in Table 1.
Data                                        Statistics
total nb. of channels                       1,536,492
total nb. of sessions                       6,242,609
channels online less than 5 min. overall    25%
channels with no viewer                     11%
filtered nb. of channels                    1,068,138 (69%)
filtered nb. of sessions                    5,221,208 (83%)

Table 1: The Twitch dataset
Figure 3 shows the average number of concurrent online channels, which is a useful metric to estimate the computing power needed and thus the data-center dimensions. At any point in time, between 4,000 and 8,000 concurrent sessions require data-center processing.
To illustrate the diversity of the raw videos, Figure 4 shows the cumulative density function (CDF) of the bit-rates of sessions for the three most popular resolutions. The key observation is the wide range of the bit-rates, even for a given resolution. For example, the bit-rates of 360p sources range from 200 kbps to more than 3 Mbps.

5 http://dash.ipv6.enstb.fr/dataset/twitch/
Figure 3: Number of concurrent sessions in Twitch (min and max per day)
Figure 4: CDF of the session source bit-rates (1080p, 720p, and 360p sources)
3.2 The Live Streaming Service Provider

One of the missions of the live streaming service is to transform the raw video into a multimedia object that can be delivered to a large number of users. We call this phase preparation. In this paper, we focus on the task of transcoding the raw video stream into a set of ABR video streams for delivery, and we neglect other tasks such as content sanity checks and the implementation of the digital rights management policy. For each session, our goal is twofold: (i) to define the set of video representations to be transcoded, which supposes deciding the number of representations and, for each representation, the bit-rate and the resolution; and (ii) to assign the transcoding jobs to the data-center machines.
A key piece of information for our study is the number of central processing unit (CPU) cycles required to transcode one raw video into a video stream in a different format. This quantity depends on various parameters, but mostly on (i) the bit-rate and the resolution of the source, (ii) the type of the source, and (iii) the bit-rate and the resolution of the target video stream. To obtain a realistic estimate, we have performed a set of transcoding operations from multiple types of sources, encoded at different resolutions and rates, to a wide range of target resolutions and rates. For each one of the transcoding operations, we have estimated the QoE of the transcoded video, and measured the CPU cycles required to perform it. This dataset as well is publicly available.6
Source Types. We consider four types of video content, corresponding to test sequences available at [24]. Each test sequence corresponds to a representative video type, as given in Table 2.

6 http://dash.ipv6.enstb.fr/dataset/transcoding/
Video Type     Video Name
Documentary    Aspen, Snow Mountain
Sport          Touchdown Pass, Rush Field Cuts
Cartoon        Big Buck Bunny, Sintel Trailer
Video          Old Town Cross

Table 2: Test videos and corresponding type.
Source Encoding. In current live streaming systems, the encoding of the source is done at the broadcaster side. As shown in Figure 4, the raw video emitted by the broadcaster can be encoded with different parameters. Based on our analysis of the Twitch dataset, we consider only four resolutions, from 224p to 1080p, and we leave the analysis of 4K raw videos for future work. We also restrict the video bit-rates to ranges covering 90% of the sources that we observe in the Twitch dataset. See Table 6 in the Appendix for more details.
Target Videos. The format of the target videos depends on the source video. For each input video we consider all the resolutions that are smaller than or equal to that of the input video, and for each resolution we consider all the rates that are smaller than or equal to the rate of the input video (see Table 6 in the Appendix for more details).
Transcoding. We perform the transcoding on a standard server, similar to what can be found in most public data-centers. The debate about whether graphics processing units (GPUs) can be used in a public cloud is still acute today. Those who do not believe in a wide availability of GPUs in the cloud emphasize the poor performance of standard virtualization tools on GPUs [19] and the preference of the main cloud providers for low-end servers (the so-called wimpy servers) in data-centers [3]. On the other hand, new middleware has been developed to improve GPU sharing and VM-GPU matching in data-centers [16], so it may be possible to envision a wider deployment of GPUs in the near future. Nevertheless, in this paper, we stick to a conservative position, which is the one adopted by today's live streaming service providers, and we consider only the availability of CPUs in the servers.
As for the physical aspect of the CPU cycle measurements, we consider that the virtualization has no impact on the performance, i.e., a transcoder running in a VM on a shared physical machine is as fast as if it ran directly on the physical machine. The server that we used is an Intel Xeon CPU E5640 at 2.67 GHz with 24 GB of RAM, running Linux 3.2 with Ubuntu 12.04.
Figure 5 shows the experimental results for all the target videos generated from a source of type movie, 1080p resolution and encoded at 2,750 kbps. The empirical CPU cycle measurements are depicted as marks. Section A.2 in the Appendix gives more details on how these curves have been generated. Overall, 588 curves similar to these ones were prepared to cover the 12,168 transcoding operations. For the sake of brevity, we show only these four. The interested reader can consult the full set of curves in the publicly available dataset, as mentioned above.
Estimating QoE. We evaluate the QoE by means of the Peak Signal to Noise Ratio (PSNR) score [20], which is a full-reference metric commonly used due to its simplicity. We apply the PSNR filter7 provided by ffmpeg in two different cases, illustrated in Figure 7.

The first case, depicted on top of Figure 7, corresponds to the scenario where a target (transcoded) video at a given spatial resolution is watched on a display of the same size. The PSNR filter compares the target video against a reference video. The reference is the source encoded at the same resolution as the target, but with the largest encoding bit-rate considered in this study (3,000 kbps). We repeat this measurement as many times as there are target videos, i.e., 12,168 times. As in the case of the live-transcoding CPU curves, we only depict in Figure 6 the PSNR curves corresponding to one example. We provide the remaining set of curves on the public site hosting the dataset.
The second scenario, shown on the bottom of Figure 7, refers to the situation where a target (transcoded) video at a given resolution needs to be upscaled to be watched on a display of a larger size. This up-scaling introduces a penalty on the final QoE for the viewer. To estimate these penalties, we carry out a new battery of transcoding operations, using the same ffmpeg command as before, but with the target and the upscaled video as input and output, respectively. The upscaled video is compared against a reference with an encoding rate of 3,000 kbps but with the same resolution as the upscaled target. The penalty, using the example of up-scaling from 360p to 720p in Figure 7, can simply be computed by subtracting the PSNR measurement on the bottom of the figure from the PSNR measurement on the top. In Figure 8, we depict the up-scaling penalties for a 224p source of type movie.
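For illustration, here is a minimal Python sketch of how such a PSNR comparison can be scripted around ffmpeg's psnr filter; the file names and the helper function are ours, and the exact filter options we used may differ.

    import subprocess

    def psnr_against_reference(target, reference, log="psnr.log"):
        """Run ffmpeg's psnr filter, comparing `target` (first input,
        the distorted video) against `reference` (second input). The
        per-frame stats go to `log`; the average PSNR is printed by
        ffmpeg on stderr."""
        cmd = [
            "ffmpeg", "-i", target, "-i", reference,
            "-lavfi", f"psnr=stats_file={log}",
            "-f", "null", "-",   # decode and compare, discard the output
        ]
        subprocess.run(cmd, check=True)

    # Example: a 360p target at 1.6 Mbps vs. the 360p, 3 Mbps reference
    # psnr_against_reference("target_360p_1600k.mp4", "ref_360p_3000k.mp4")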
Figure 5: Transcoding CPU curves (required CPU in GHz vs. target rate in kbps). Source: 1080p, 2,750 kbps, type movie. Target resolutions in the legend: 224p, 360p, 720p, 1080p.
Figure 6: Transcoding QoE curves (PSNR in dB vs. target rate in kbps). Source: 1080p, 2,750 kbps, type movie. Target resolutions in the legend: 1080p, 720p, 360p, 224p.
7 https://www.ffmpeg.org/ffmpeg-filters.html#psnr
Figure 7: Estimating the QoE for a target video. On top, a 360p, 1.6 Mbps target is compared against a 360p, 3 Mbps source. On the bottom, the same video is upscaled to 720p and compared against a 720p, 3 Mbps source.
Figure 8: Up-scaling penalty curves (penalty in dB vs. rate in kbps). Sources: rate on the x-axis, 224p, type movie. Target resolutions in the legend: 360p, 720p, 1080p. Target rates equal to source rates.
3.3 The Viewers

Finally, we need a dataset that captures the characteristics of a real population of viewers, and in particular its heterogeneity.

This dataset comes from [4]. Since the Twitch API does not provide any information about the viewers (neither their geographic positions, nor their devices and network connections), we need real data to set the download rates for the population of viewers. The dataset presented in [4] gives multiple measurements over a large number of 30s-long DASH sessions from thousands of geographically distributed IP addresses. From their measurements, we infer the download rate of each IP address for every chunk of the session and thus obtain 500,000 samples of realistic download rates. After filtering this set to remove outliers, we randomly associate one download rate to one of the viewers watching a channel from the Twitch dataset snapshot.
4. CURRENT INDUSTRIAL STRATEGIES

Today's live service providers have to implement a strategy for stream preparation. To the best of our knowledge, no provider has yet implemented an optimal strategy. Typically one of the following two options is implemented. In the first one, used by Twitch, ABR streaming is only offered to some premium broadcasters. That is, only a small subset of channels is transcoded into multiple representations. For the other broadcasters, the raw video is forwarded to the viewers without transcoding. The problem of this solution is that many viewers of standard broadcasters cannot watch the stream because their downloading rate is too low. This problem has been recently discussed in [15].

The second option consists in delivering all channels with ABR streaming. This is the option we study in this paper. To the best of our knowledge, the live streaming providers apply the same transcoder settings for all channels, although it has been shown in [22] that such a strategy is sub-optimal. In this paper, we consider two possible strategies.
Full-Cover Strategy. This corresponds to a strategy with one representation for each resolution smaller than or equal to the source resolution. The bit-rate is chosen to be the lowest possible for that resolution (100 kbps for the low resolutions and 1,000 kbps for the high ones). With this strategy, viewers with low bandwidth connections and display sizes smaller than or equal to the source resolution are guaranteed to find one representation. Moreover, since the CPU requirements are low for low bit-rates, this strategy is the least CPU-hungry possible strategy (among the strategies with at least one representation per resolution).
Zencoder Strategy. We follow here the recommendations of one of the main cloud transcoding providers, namely Zencoder. The recommendations are given on their public website.8 We give in Table 3 the characteristics of the set of representations. Again, only representations with a bit-rate and a resolution smaller than or equal to those of the video source are produced.
Video Resolution    Bit-rates (in kbps)
224p                200, 400, 600
360p                1000, 1500
720p                2000
1080p               2750

Table 3: Zencoder encoding recommendations for live streaming (adapted to our bit-rate ranges).
5. OPTIMIZING STREAM PREPARATION

We first address the problem of live video stream preparation with an optimization approach. As previously said, the preparation includes both the decision about the encoding parameters of the video representations and the assignment of transcoding jobs to the machines. Our goal is to maximize the QoE of viewers subject to the availability of hardware resources. In the following we first provide a formal formulation of the problem, and then we present the ILP model that we use to solve the optimization problem. Finally, we compare the performance of the industry-standard strategies with the optimal.
5.1 Notations

Let I be the set of raw video streams encoded at the broadcaster side. Each video stream i ∈ I is characterized by a type of video content v_i ∈ V, an encoding bit-rate r_i ∈ R and a spatial resolution s_i ∈ S, where V, R and S are the set of video types, the set of encoding bit-rates (in kbps) and the set of spatial resolutions, respectively. We have shown in Section 3.1 the diversity of raw videos.
Let O be the set of the possible video representations that are generated from the sources by transcoding jobs. Each representation o ∈ O corresponds to a triple (v_o, r_o, s_o), that is, to a video representation of content type v_o ∈ V encoded at the resolution s_o ∈ S and at the bit-rate r_o ∈ R.

8 http://zencoder.com/en/hls-guide

Name           Description
f_iou ∈ R+     QoE level for representation o transcoded from stream i, watched on a display of size s_u
f_io ∈ R+      QoE level for representation o transcoded from stream i if the display size s_u matches the resolution s_o
q_ou ∈ R+      Penalty due to up-scaling from resolution s_o to display size s_u
d_iu ∈ Z+      Number of viewers of type u watching a stream i
r_o ∈ R+       Value in kbps of the bit-rate of the representation o
c_u ∈ R+       Connection capacity in kbps of viewer type u
v_u ∈ V        Video type requested by viewer type u
s_u ∈ S        Display size (spatial resolution) of viewer type u
N ∈ Z+         Overall number of viewers
R ∈ [0, 1]     Minimum fraction of viewers that must be served
p_io ∈ R+      CPU requirement to perform the live transcoding from stream i to representation o, in GHz
P_m ∈ R+       CPU capacity of a machine m, in GHz

Table 4: ILP notation.
Let M be the set of physical machines where the transcoding tasks should be performed. Each machine m ∈ M can accommodate transcoding jobs up to a maximum CPU load of P_m GHz.
To reduce the size of the problem, and make it more tractable, we introduce the notion of viewer type. Let U be the set of viewer types. All viewers in a given viewer type u ∈ U have the same display resolution (i.e., the spatial resolution at which the video is displayed on the device) s_u ∈ S, request the same video type v_u ∈ V, and use an Internet connection with the same bandwidth of c_u kbps. However, viewers of the same type u watch different channels. We denote by d_iu the number of viewers of type u watching a given channel i. Note that a viewer of a given type u can play segments encoded at resolutions lower than its display size s_u by performing spatial up-sampling before rendering.
A viewer from viewer type u watching a video representation o transcoded from a stream i experiences a QoE level of f_iou, which is an increasing function of the bit-rate r_o. Based on the dataset presented in Section 3.2, we know that the QoE function depends on the video content type v_o, the resolution s_o and the original raw video stream i. As previously said, the QoE level f_iou also depends on whether the video should be up-scaled or not, since up-scaling introduces a penalty on the final QoE value. We incorporate this up-scaling penalty into the QoE computation by the following definition of f_iou:

    f_iou = { f_io          if s_o = s_u
            { f_io − q_ou   if s_o < s_u      ∀ i ∈ I, o ∈ O, u ∈ U    (1)

where f_io is the QoE level when the display resolution and the target video resolution match, and q_ou is the penalty of the up-scaling process from resolution s_o to the viewer display size s_u. Table 4 summarizes the notation used throughout the paper.
Integer Linear Programming formulation (2)

  max_{α,β}  Σ_{i∈I} Σ_{o∈O} Σ_{u∈U} f_iou · α_iou                                 (2a)

  s.t.
    α_iou ≤ N · Σ_{m∈M} β_iom                         ∀ i ∈ I, o ∈ O, u ∈ U       (2b)
    Σ_{m∈M} β_iom ≤ Σ_{u∈U} α_iou                     ∀ i ∈ I, o ∈ O              (2c)
    α_iou ≤ { N if s_u ≥ s_o; 0 otherwise }           ∀ i ∈ I, o ∈ O, u ∈ U       (2d)
    Σ_{o∈O} α_iou ≤ d_iu                              ∀ i ∈ I, u ∈ U              (2e)
    Σ_{i∈I} Σ_{o∈O} (r_o − c_u) · α_iou ≤ 0           ∀ u ∈ U                     (2f)
    Σ_{i∈I} Σ_{o∈O} Σ_{u∈U} α_iou ≥ R · N                                         (2g)
    β_iom ≤ { 1 if (v_i = v_o ∧ s_i = s_o ∧ r_i > r_o) ∨
                   (v_i = v_o ∧ s_i > s_o ∧ r_i ≥ r_o);
              0 otherwise }                           ∀ i ∈ I, o ∈ O, m ∈ M       (2h)
    Σ_{m∈M} β_iom ≤ 1                                 ∀ i ∈ I, o ∈ O              (2i)
    Σ_{i∈I} Σ_{o∈O} p_io · β_iom ≤ P_m                ∀ m ∈ M                     (2j)
    α_iou ∈ {0, 1, …, N}                              ∀ i ∈ I, o ∈ O, u ∈ U       (2k)
    β_iom ∈ {0, 1}                                    ∀ i ∈ I, o ∈ O, m ∈ M       (2l)
5.2 ILP Model

We now describe the ILP. The decision variables in the model are:

  α_iou ∈ Z≥0 : number of viewers of type u watching a representation o transcoded from a stream i;
  β_iom ∈ {0, 1} : equal to 1 if machine m transcodes stream i into representation o, and 0 otherwise.

With these definitions, the optimization problem can be formulated as shown in (2). The objective function (2a) maximizes the average viewer QoE. The constraints (2b) and (2c) set up a consistent relation between the decision variables α and β. The constraint (2d) establishes that a viewer of type u can only play the transcoded representations o whose spatial resolution is equal to or smaller than the viewer display size s_u, that is, those susceptible to undergo an up-sampling operation at rendering. The constraint (2e) ensures that the number of viewers of type u watching any representation o transcoded from a given stream i does not exceed the number of viewers of type u originally watching the stream i. The constraint (2f) limits the viewer link capacity. The constraint (2g) forces us to serve at least a certain fraction R of the viewers. The constraint (2h) allows only transcoding operations defined over the same video content type, and it forbids senseless transcoding operations, like transcoding to higher bit-rates or higher resolutions or transcoding to the same rate-resolution pair. The constraint (2i) guarantees that a given transcoding task (i, o) is performed on one unique machine m. Finally, (2j) sets the CPU capacity of each machine m.
5.3 Settings for Performance Evaluation

To find the exact solution of the optimization problem, we use the generic solver IBM ILOG CPLEX [10] on a set of instances. Unfortunately, this approach does not allow solving instances as large as the ones that live service providers face today. Thus, we have built problem instances based on the datasets introduced in Section 3, but of a smaller size.

Incoming Videos from Broadcasters. We restrict the size of the set of sources by picking only the 50 most popular channels from the Twitch dataset (see Section 3.1). More precisely, we take 66 snapshots from the dataset, corresponding to those extracted every 4 hours over the 11 days starting April 10th, 2014 at 00:00. For each snapshot, we use the channel information (bit-rate and resolution), which we modify slightly to match the spatial resolutions and bit-rates from Table 6. Each channel is randomly assigned to one of the four video types given in Table 2.
QoE for Target Videos. We use the dataset presented in Section 3.2 to obtain the QoE (estimated as a PSNR score) f_io of a target video o obtained from transcoding a source i. The up-scaling penalties q_ou are fixed using PSNR measures from the situation shown on the bottom of Figure 7 (target resolution lower than the display one).
CPU for the Transcoding Tasks. Still to reduce the size of the instances, and thus the complexity of the problem, we fit a power-law function to the set of CPU measurements:

    p = a · r^b    (3)

where p is the number of GHz required to transcode a source into a target, a and b are the parameters used in the curve fitting, and r is the bit-rate in Mbps of the target video. The values of the parameters a and b depend on (i) the source video (content type, bit-rate and resolution), and (ii) the resolution of the target video. The fitted curves are identified by continuous lines in Figure 5. Table 5 gives the parameters a and b used in the curves shown in Figure 5.
Target Resol.    a           b
224p             0.673091    0.024642
360p             0.827912    0.033306
720p             1.341512    0.060222
1080p            1.547002    0.080571

Table 5: Parameters of the fitting model of the transcoding CPU curves. Source stream: 1080p, 2,750 kbps, movie.
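As an illustration of this fitting step, a SciPy sketch on synthetic measurement points; only the model of Equation (3) and the Table 5 parameters come from the paper, the data points below are made up.

    import numpy as np
    from scipy.optimize import curve_fit

    def cpu_model(r_mbps, a, b):
        """Power-law CPU model of Equation (3): p = a * r^b (GHz)."""
        return a * np.power(r_mbps, b)

    # Hypothetical (rate, CPU) measurements for one source/target pair
    rates = np.array([0.5, 1.0, 1.5, 2.0, 2.5])      # target rate (Mbps)
    cpu = np.array([1.46, 1.55, 1.60, 1.63, 1.66])   # measured CPU (GHz)

    (a, b), _ = curve_fit(cpu_model, rates, cpu, p0=(1.0, 0.1))
    print(f"a = {a:.4f}, b = {b:.4f}")

    # With Table 5 (1080p targets: a = 1.547002, b = 0.080571),
    # a 2 Mbps target needs about 1.64 GHz:
    print(f"{cpu_model(2.0, 1.547002, 0.080571):.2f} GHz")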
Viewers. The set of viewer types U is based on the dataset [4] presented in Section 3.3. However, the number of viewers is too large, so we implement the concept of viewer type. To build the types, we divide the range of bandwidth into bins, whose limits are selected so that each bin contains an equal number of viewers. A viewer type corresponds to a bin, with a display spatial resolution set according to the lower bandwidth in the bin; the downloading rate of the viewer type is also equal to the lower bandwidth in the bin. The number of viewers d_iu watching a raw video i is set proportionally to the popularity of the channel in the Twitch dataset.
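The equal-population binning can be implemented with quantiles; a small sketch (names and the sample distribution ours):

    import numpy as np

    def viewer_types(bandwidths_kbps, n_types):
        """Split bandwidth samples into n_types equal-population bins and
        return, for each type, the lower bandwidth of its bin (used as the
        type's downloading rate and to pick its display resolution)."""
        quantiles = np.linspace(0.0, 1.0, n_types + 1)
        edges = np.quantile(bandwidths_kbps, quantiles)
        return edges[:-1]  # lower edge of each bin

    # e.g., 500,000 download-rate samples reduced to 20 viewer types
    samples = np.random.lognormal(mean=7.0, sigma=0.8, size=500_000)  # hypothetical
    print(viewer_types(samples, 20))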
5.4 Numerical Results

We now show the results of our analysis. We use the previous settings and we also fix the CPU capacity P_m of all the machines to 2.8 GHz, the clock speed of the physical processors used by the Amazon cloud computing C3 instances.9 Our motivation is to determine how far from the optimal the current industry-standard strategies are. In Figure 9, we represent the average QoE, expressed as the PSNR in dB, as a function of the number of machines. The line represents the results obtained from solving the optimization problem with CPLEX. We show with grey pins the results for both industry-standard strategies. The results are the average over all the snapshots we took from the Twitch dataset.

We first emphasize that the amount of hardware resources in the data-center has a significant impact on the QoE for the viewers. The difference of PSNR reaches 4 dB between 10 and 100 machines. This remark matters because it highlights the need to be able to reserve the right amount of resources in the data-center. However, the ability to forecast the load and to reserve the resources is not trivial for elastic live streaming services such as Twitch.
Our second main observation is that, on our datasets, the Full-Cover strategy is more efficient than the Zencoder one in terms of the QoE-CPU trade-off. The Full-Cover strategy is close to the optimal, and thus represents an efficient implementation with respect to its simplicity. Note however that Full-Cover needs 48 machines, while there exists a solution with the same QoE but with only 35 machines. Therefore, a significant reduction of the resources to reserve can be obtained. The Zencoder strategy is outperformed by the Full-Cover one, as it consumes nearly twice the CPU cycles for a tiny increase of the QoE. For a similar amount of CPU, the QoE gap between the Zencoder strategy and the optimal is more than 0.9 dB, which is significant.
Figure 9: Optimal average QoE for the viewers (avg. PSNR in dB) vs. the number of machines used in the data-center. The 50 most popular channels from several snapshots of the Twitch dataset are transcoded. The Full-Cover and Zencoder operating points are marked.
9 http://aws.amazon.com/ec2/instance-types/
To complete this study, we provide another view of the choices to be made in Figure 10. Here, we show the ratio of served users and the amount of delivery bandwidth required to serve the users. In our ILP, we optimize the average QoE, so the solutions found by CPLEX are not optimal on other aspects. In Figure 10b, we see that the delivery bandwidth of the optimal solution is significantly higher than that of Full-Cover, which may offset the gains obtained by using fewer machines. Please note that both parameters of Figure 10 can also be the objective of the ILP. In the same vein, the ILP can also be re-written so that the parameter to be optimized is the amount of CPU needed, subject to a given QoE value.
Figure 10: Other views on the optimal solution of Figure 9: (a) ratio of satisfied users and (b) delivery bandwidth (in Mbps), as functions of the number of machines, for the optimal, Full-Cover, and Zencoder solutions.
6. A HEURISTIC ALGORITHM

We now present and evaluate an algorithm for massive live streaming preparation. Our goal is to design a fast, light, adaptive algorithm, which can be implemented in production environments. This algorithm should in particular be able to absorb the variations of the streaming service demand while using a fixed data-center infrastructure.
6.1 Algorithm Description

The purpose of the algorithm is to update the set of transcoded representations with respect to the characteristics and the popularity of the incoming raw videos. The algorithm is executed on a regular basis (for example every five minutes, to stick to the Twitch API) by the live streaming service provider in charge of the data-center. The pseudo-code of the algorithms and some additional details can be found in Appendix B.
Algorithm in a Nutshell. We process each channel iteratively in decreasing order of popularity. For a given channel, the algorithm has two phases: first, we decide a CPU budget for this channel; second, we determine a set of representations with respect to the CPU budget computed during the first phase.
Set a CPU Budget Per Channel. We base our algorithm on observations of the optimal solutions found by CPLEX. Four main observations are illustrated in Figure 14 in the Appendix: (i) the ratio of the overall CPU budget allotted to a given channel is roughly proportional to the ratio of viewers watching this channel; (ii) the CPU budget per channel is less than 10 GHz; (iii) some video types (e.g., sport) require more CPU budget than others (e.g., cartoon); and (iv) the higher the resolution of the source, the bigger the CPU budget.
We derive from these four observations the algorithm shown in Algorithm 1. We start with the most popular channel. We first set a nominal CPU budget according to the ratio of viewers and the maximal allowable budget. Then we adjust this nominal budget based on a video type weight and a resolution weight (interested readers can find in the Appendix details about the values we chose for these weights, based on observations from the optimal solutions). We thus obtain a CPU budget for this channel, as sketched below.
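A minimal sketch of this per-channel budget computation (mirroring Algorithm 1 in Appendix B), using the weights of Table 7; the value of MAX_CPU and the function name are ours.

    # Per-channel CPU budget, following Algorithm 1 (names ours).
    MAX_CPU = 10.0  # GHz, per observation (ii) above
    TYPE_WEIGHT = {"cartoon": -0.176, "documentary": 0.072,
                   "sport": 0.190, "video": -0.076}        # Table 7
    RESOL_WEIGHT = {"224p": -0.917, "360p": -0.657,
                    "720p": -0.108, "1080p": 0.432}         # Table 7

    def channel_budget(viewers_ratio, video_type, resolution, total_cpu):
        """Nominal budget proportional to the viewer ratio, capped at
        MAX_CPU, then adjusted by the type and resolution weights."""
        cpu = min(viewers_ratio * total_cpu, MAX_CPU)
        return max(cpu * (1 + TYPE_WEIGHT[video_type]
                            + RESOL_WEIGHT[resolution]), 0.0)

    # e.g., a 1080p sport channel drawing 2% of viewers, 1,000 GHz total
    print(channel_budget(0.02, "sport", "1080p", 1000.0))  # -> 16.22 GHz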
Decide the Representations for a Given CPU Budget. The pseudo-code is detailed in Algorithm 2. This algorithm builds the set of representations by iteratively adding the best representation. At each step, the CPU needed to transcode the chosen representation must not exhaust the remaining channel budget. To decide among the possible representations, we need to estimate the QoE gain that every possible representation would provide if it were chosen. To do so, we estimate the assignment between the representations and the viewers in Algorithm 3. In short, this algorithm requires basic knowledge of the distribution of downloading rates in the population of viewers, which is usually available to service providers. (In this work, we study the worst case scenario where service providers do not have this information and assume a uniform distribution in the range between 100 kbps and 3,000 kbps.) The idea is then to assign subsets of the population to representations and to evaluate the overall QoE. Details are given in the Appendix.
6.2 Simulation Settings

Our simulator is based on the datasets presented in Section 3 and the extra settings given in Section 5.3. However, in contrast to the ILP, our heuristic is expected to scale. Therefore we evaluate the heuristic and the aforementioned industry-standard strategies on the complete dataset containing all online broadcasters at each snapshot. Regarding the viewers, we now consider each viewer individually, randomly assigning to each a bandwidth value as presented in Section 3.3 and a display resolution accordingly. We use the actual number of viewers watching channels according to the Twitch dataset.
Please note also that we focus here on the decision about the representations (number of representations and transcoding parameters), and we neglect the assignment of transcoding jobs to machines. This does not impact the evaluation, since all tested strategies (our heuristic and both industry-standard strategies) can be evaluated without regard to this assignment. We leave the integration of the VM assignment policy into middleware such as OpenStack10 for future work.
6.3 Performance Evaluations

In the following, we present the same set of results from two different perspectives: first, we show how the performance evolves throughout the eleven days we consider; then, we present the results in a way that highlights the main features of the algorithms.
10 http://www.openstack.org/
In Figure 11 we show the three main metrics over our 11-day dataset. The combination of the three figures reveals the main characteristics of the strategies. The main point we would like to highlight is that our heuristic keeps a relatively constant, low CPU consumption regardless of the input traffic load. Our heuristic also succeeds in maintaining a high QoE. To achieve this excellent trade-off, our heuristic adjusts the ratio of served viewers. Yet, this ratio remains high, since it is always greater than 95%. Our heuristic thus demonstrates the benefits of having different representation sets for the different channels according to their popularity. The industry-standard strategies are less capable of absorbing the changing demand. In particular, the CPU needs of the Zencoder strategy range from 1,000 GHz to 18,000 GHz, while its average QoE is always lower than that of our heuristic.
To highlight the relationship between the CPU needs and the QoE for the population, we represent in Figure 12 a cloud of points for each strategy, with one point per snapshot. The Full-Cover strategy has most points in the southwest area of the figure, which corresponds to a low CPU utilization but also a low QoE. We also note that the distance between two points can be large, which emphasizes an inability to absorb load variations. This inability is even stronger for the Zencoder strategy, for which the points are far from each other, covering all areas of the figure. On the contrary, our heuristic absorbs the elasticity of the service well, with points that are concentrated in the northwest part of the figure, which means low CPU and high QoE.
7. CONCLUSIONS

This paper studies the management of new live adaptive streaming services in the cloud from the point of view of streaming providers using cloud computing platforms. All the simulations conducted in the paper make use of real data from three datasets covering all the actors in the system. The study is focused on the interactions between the optimal video encoding parameters, the available CPU resources and the QoE perceived by the end-viewers. We use an ILP to model this system and we compare its optimal solution to current industry-standard solutions, highlighting the gap between the two. Due to the ILP computational limitations, we propose a practical algorithm to solve problems of real size, thanks to key insights gathered from the optimal solution. This algorithm finds representations beating the industry-standard approaches in terms of the trade-off between viewer QoE and the CPU resources needed. Furthermore, it uses an almost-constant amount of computing resources even in the presence of a time-varying demand.
8. REFERENCES

[1] Twitch.tv: 2013 retrospective, Jan. 2014. http://twitch.tv/year/2013.
[2] M. Adler, R. K. Sitaraman, and H. Venkataramani. Algorithms for optimizing the bandwidth cost of content delivery. Computer Networks, 55(18):4007–4020, 2011.
[3] L. A. Barroso, J. Clidaras, and U. Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan & Claypool, second edition, 2013.
Figure 11: Different metric results over time (days 1-11) for the distinct solutions (Full-Cover strategy, Zencoder encoding recommendations, and our heuristic): (a) average viewer PSNR (in dB), (b) ratio of satisfied viewers, and (c) total CPU needs (in GHz).
Figure 12: Total required CPU (in GHz) vs. the perceived QoE (average PSNR in dB) for the Full-Cover, Heuristic, and Zencoder strategies. The marker shape indicates the percentage of satisfied viewers (95-96%, 97-98%, 99-100%). Each point corresponds to one of the 66 snapshots.
[4] S. Basso, A. Servetti, E. Masala, and J. C. De Martin. Measuring DASH streaming performance from the end users perspective using Neubot. In ACM MMSys, 2014.
[5] N. Bouzakaria, C. Concolato, and J. Le Feuvre. Overhead and performance of low latency live streaming using MPEG-DASH. In IEEE IISA, 2014.
[6] R. Cheng, W. Wu, Y. Lou, and Y. Chen. A cloud-based transcoding framework for real-time mobile video conferencing system. In IEEE MobileCloud, 2014.
[7] J. Guo and L. Bhuyan. Load balancing in a cluster-based web server for multimedia applications. IEEE Transactions on Parallel and Distributed Systems, 17(11):1321–1334, Nov. 2006.
[8] Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu. Power-rate-distortion analysis for wireless video communication under energy constraints. IEEE Transactions on Circuits and Systems for Video Technology, 15(5):645–658, 2005.
[9] Z. Huang, C. Mei, L. Li, and T. Woo. CloudStream: Delivering high-quality streaming videos through a cloud-based SVC proxy. In IEEE INFOCOM, 2011.
[10] IBM. ILOG CPLEX Optimization Studio. http://is.gd/3GGOFp.
[11] F. Jokhio, A. Ashraf, S. Lafond, I. Porres, and J. Lilius. Prediction-based dynamic resource allocation for video transcoding in cloud computing. In Euromicro PDP, 2013.
[12] F. Lao, X. Zhang, and Z. Guo. Parallelizing video transcoding using map-reduce-based cloud computing. In IEEE ISCAS, 2012.
[13] S. Lin, X. Zhang, Q. Yu, H. Qi, and S. Ma. Parallelizing video transcoding with load balancing on cloud computing. In IEEE ISCAS, 2013.
[14] J. Liu, G. Simon, C. Rosenberg, and G. Texier. Optimal delivery of rate-adaptive streams in underprovisioned networks. IEEE Journal on Selected Areas in Communications, 2014.
[15] K. Pires and G. Simon. DASH in Twitch: Adaptive bitrate streaming in live game streaming platforms. In ACM VideoNext CoNEXT Workshop, 2014.
[16] C. Reano Gonzalez. CU2rCU: A CUDA-to-rCUDA converter. Master's thesis, Universitat Politecnica de Valencia, 2013.
[17] H. Reddy. Adapt Or Die: Why Pay-TV Operators Must Evolve Their Video Architectures, July 2014. Videonet white paper.
[18] J. Reichel, H. Schwarz, and M. Wien. Joint scalable video model 11 (JSVM 11). Joint Video Team, Doc. JVT-X202, 2007.
[19] R. Shea and J. Liu. On GPU pass-through performance for cloud gaming: Experiments and analysis. In ACM NetGames, 2013.
[20] H. Sohn, H. Yoo, W. De Neve, C. S. Kim, and Y.-M. Ro. Full-reference video quality metric for fully scalable and mobile SVC content. IEEE Transactions on Broadcasting, 56(3):269–280, Sept. 2010.
[21] L. Su, Y. Lu, F. Wu, S. Li, and W. Gao. Complexity-constrained H.264 video encoding. IEEE Transactions on Circuits and Systems for Video Technology, 19(4):477–490, Apr. 2009.
[22] L. Toni, R. Aparicio-Pardo, G. Simon, A. Blanc, and P. Frossard. Optimal set of video representations in adaptive streaming. In ACM MMSys, 2014.
[23] Z. Wang, L. Sun, C. Wu, W. Zhu, and S. Yang. Joint online transcoding and geo-distributed delivery for dynamic adaptive streaming. In IEEE INFOCOM, 2014.
[24] Xiph.org. Xiph.org video test media.
[25] M. Yang, J. Cai, Y. Wen, and C. H. Foh. Complexity-rate-distortion evaluation of video encoding for cloud media computing. In IEEE ICON, 2011.
APPENDIX

A. TRANSCODING CPU USAGE

In this Appendix we give more details about the experiments that we have used to estimate the CPU usage of the different transcoding operations.
A.1 Input and Output Rates

As discussed in Section 3.2, we consider only the rates covering 90% of the sources that we observe in the Twitch dataset, as detailed in Table 6. For low resolutions (224p and 360p), the set of bit-rates ranges from 100 kbps up to 3,000 kbps with steps of 100 kbps, while for high resolutions (720p and 1080p), the set of bit-rates ranges from 1,000 kbps up to 3,000 kbps with steps of 250 kbps. Thus, each video sequence from Table 2 can be encoded into 78 different combinations of rates and resolutions. To obtain these 78 sources, we took the original full-quality decoded videos from Table 2 and encoded them into each of the 78 videos that we consider as possible raw videos.
Resol.    Width x Height    Min-Max Rates      Rate Steps
224p      400 x 224         100-3000 kbps      100 kbps
360p      640 x 360         100-3000 kbps      100 kbps
720p      1280 x 720        1000-3000 kbps     250 kbps
1080p     1920 x 1080       1000-3000 kbps     250 kbps

Table 6: Resolutions and ranges of rates for the raw videos.
A.2 Transcoding

The transcoding operation that we have performed is summarized in Figure 13. This operation has been done 12,168 times in total. This corresponds to 4 (the number of video types) multiplied by 78 (the number of possible sources) multiplied by 39 (the average number of possible target videos). Recall that, for each input video, we produced only videos with resolutions and bit-rates lower than or equal to those of the input. That is, only a subset of the 78 possible representations is created from a given raw video.
We have used ffmpeg for the transcoding, with the same parameters as in [5], a study conducted by the leading developers of the popular GPAC video framework. The command is:

    ffmpeg -i <source_name> -vcodec libx264 -preset ultrafast -tune zerolatency -s <target_resolution> -r 30 -b <target_rate> -an <target_name>
Figure 13: Measuring the CPU cycles for the transcoding of any source to any target video. The original YUV video is first encoded into a source (here at 720p and 2.25 Mbps), which is then transcoded into a target video (here at 360p and 1.6 Mbps) while the CPU cycles are measured.
Measuring CPU cycles. As discussed above and in Section 3.2, we have run many transcoding operations using an Intel Xeon CPU E5640 at 2.67 GHz with 24 GB of RAM, running Linux 3.2 with Ubuntu 12.04. To measure the number of used CPU cycles, we use the perf tool,11 a profiler for Linux 2.6+ based systems, with the following command:

    perf stat -x, -e CYCLES <transcoding command>

This command provides access to the counter collecting the number of CPU cycles at the Performance Monitoring Unit (PMU) of the processor. This number is then divided by the duration in seconds of the video sequence to obtain the CPU frequency (in GHz) required to perform the transcoding in a running time equal to the play time of the video, that is, the frequency of CPU required to do a live transcoding.
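A sketch of this cycles-to-GHz conversion, wrapping perf from Python; the CSV parsing below assumes perf's "-x," output format, and the names are ours.

    import subprocess

    def live_transcoding_ghz(transcode_cmd, video_duration_s):
        """Run a transcoding command under `perf stat` and return the CPU
        frequency (in GHz) needed to transcode in real time, i.e., the
        measured cycle count divided by the video play time."""
        res = subprocess.run(["perf", "stat", "-x,", "-e", "cycles"] + transcode_cmd,
                             capture_output=True, text=True)
        # With -x, perf writes CSV lines like "123456789,,cycles,..." to stderr.
        cycles = int(res.stderr.strip().splitlines()[-1].split(",")[0])
        return cycles / video_duration_s / 1e9

    # e.g., live_transcoding_ghz(["ffmpeg", "-i", "src.mp4", "out.mp4"], 600.0)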
B. HEURISTIC

B.1 Settings

CPU Budget Per Channel. The values we used for both the video type weight and the resolution weight are given in Table 7. These values correspond to the average of the curves plotted in Figure 14, which we obtain from the optimal solutions computed by CPLEX on the 50 most popular channels and 100 machines. The x-axis is the rank of the channels according to their popularity. On the top we show the average distribution of the CPU budget per channel. On the bottom, we show the average difference between the average CPU budget when a channel of a given type (respectively resolution) is at a given rank and the average CPU budget for channels at this rank. This difference allows us to compute the adjustment of the CPU budget according to the video type and the resolution.

11 https://perf.wiki.kernel.org
Video Type     Weight
Cartoon        -0.176
Documentary    0.072
Sport          0.190
Video          -0.076

Resolution     Weight
224p           -0.917
360p           -0.657
720p           -0.108
1080p          0.432

Table 7: Video type and resolution weights
Figure 14: CPU information for a given channel popularity rank. Data comes from the average optimal solution with 100 machines. From the average total CPU % according to the rank (top), we derive the difference between distinct video types (middle) and between distinct channel input resolutions (bottom).
Estimation of Viewer QoE. To estimate the QoE gain when choosing one representation, we need to consider all the assignments between representations and viewers. In Algorithm 3, we evaluate the representations in the set in descending order of their bit-rates. At each iteration, we identify the fraction of viewers whose bandwidth is between the rate of the considered representation and that of the closest representation with a higher bit-rate. Then, we also have to take into account the display sizes of the viewers (this also depends on the knowledge of the service provider; we again assume a minimal knowledge of the population). Therefore, this fraction of viewers is again split into one or more sub-fractions corresponding to their display resolutions. A different value of PSNR is then computed for each sub-fraction. This value is multiplied by the ratio of viewers belonging to the sub-fraction. When all the representations have been assessed, the sum of the PSNR contributions of all the sub-fractions of viewers is returned as the estimated QoE of the set.
B.2 Algorithms in Pseudo-code

This section includes the pseudo-code of the heuristic described in Section 6.
Algorithm 1: Main routine
Data: channelsSet: channels metadata (e.g., number of viewers, id), sorted by decreasing channel popularity.
Data: totalCPU: total CPU budget in GHz.
 1  representations ← emptySet()
 2  foreach channel ∈ channelsSet do
 3      cpu ← channel.viewersRatio × totalCPU
 4      cpu ← min(cpu, MAX_CPU)
 5      w_video ← getVideoTypeWeight(channel.video)
 6      w_resol ← getResolutionWeight(channel.resolution)
 7      cpu ← max(cpu × (1 + w_video + w_resol), 0)
 8      representations.append(findReps(channel, cpu))
 9  return representations
Algorithm 2: findReps — find the channel's representation set meeting a budget
Data: channel: channel metadata (e.g., number of viewers, id).
Data: CPU: calculated channel CPU budget in GHz.
 1  representations ← emptySet()
 2  freeCPU ← CPU
 3  repeat
 4      newRep ← false
 5      foreach resolution ∈ resolutionsSet with resolution ≤ channel.resolution do
 6          foreach bitRate ∈ bitRatesSet[resolution] do
 7              thisRep ← (resolution, bitRate)
 8              if bitRate ≤ channel.bitRate and thisRep ∉ representations then
 9                  thisRep.cpu ← getCPU(thisRep, channel)
10                  if thisRep.cpu ≤ freeCPU then
11                      reps ← representations + thisRep
12                      thisRep.qoe ← getQoE(reps, channel)
13                      if not newRep or newRep.qoe < thisRep.qoe then
14                          newRep ← thisRep
15      if newRep then
16          representations.append(newRep)
17          freeCPU ← freeCPU − newRep.cpu
18  until not newRep
19  return representations
Algorithm 3: getQoE — obtain an estimation of PSNR for a given representation set of one channel
Data: channel: channel metadata (e.g., number of viewers, id).
Data: repSet: set of representations for a given channel.
 1  totalPSNR ← 0
 2  foreach rep ∈ repSet do
 3      ranges ← getResolutionsRanges(rep)
 4      foreach viewersRange ∈ ranges do
 5          v_ratio ← getViewersRatio(viewersRange)
 6          v_resol ← getViewersResolution(viewersRange)
 7          partialPSNR ← calcPSNR(rep, channel, v_resol)
 8          totalPSNR ← totalPSNR + v_ratio × partialPSNR
 9  return totalPSNR