Object Placement in Video Content Distribution Networks
Mohammad Faraji, Kianoosh Mokhtarian
Department of Electrical and Computer Engineering, University of Toronto
December 2011
Dec 22, 2014
Background
8 years of video content added to YouTube every day
Terabytes a day; Petabytes a year
The growth trend is accelerating further
Higher-quality video streams (currently only 10% are HD)
Content distribution infrastructure
Several datacentres around the world
User request sent to closest datacentre (DNS/HTTP redirect)
Motivation
Store video files across datacentres (DCs)
Generously replicate every video on every DC?
Not viable
Growth of data volume far outpaces the decline in storage cost
Good News from Measurement Studies
Popularity of video depends on geographical location
More than half of the time, only an initial fraction of the video is downloaded
=> Place (partial) video files in selected locations
Modeling
Input: history of user requests (video v for IP address i)
Distance from i to each of the datacentres?
Use an Internet Coordinate System (ICS)
Delay(i, j) = Euclidean_distance[ ICS(i), ICS(j) ]
Make tracking of requests scalable
Cluster user IPs into regions in the Euclidean space of the ICS
Popularity matrix P[region, video]
Distance matrix D[region, datacentre]
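The modeling steps above (ICS delay, clustering IPs into regions, building P and D) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: ICS coordinates are assumed to be already available per IP and per DC, the clustering is plain k-means (Lloyd iterations) from a caller-supplied initial guess, and the names `build_matrices`, `kmeans`, and `nearest` are hypothetical.

```python
import math

def ics_delay(a, b):
    """Delay(i, j) estimated as the Euclidean distance between ICS coordinates."""
    return math.dist(a, b)

def nearest(p, centers):
    """Index of the region centre closest to point p (hypothetical helper)."""
    return min(range(len(centers)), key=lambda k: ics_delay(p, centers[k]))

def kmeans(points, centers, iters=10):
    """Plain Lloyd iterations; `centers` is an assumed initial guess."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each centre to the mean of its group; keep empty groups' centres.
        centers = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[k]
                   for k, g in enumerate(groups)]
    return centers

def build_matrices(requests, user_coords, init_centers, dc_coords, num_videos):
    """requests: [(ip, v)], user_coords: {ip: coord}, dc_coords: [coord]."""
    centers = kmeans(list(user_coords.values()), init_centers)
    region_of = {ip: nearest(c, centers) for ip, c in user_coords.items()}
    P = [[0] * num_videos for _ in centers]                         # P[region, video]
    for ip, v in requests:
        P[region_of[ip]][v] += 1
    D = [[ics_delay(c, dc) for dc in dc_coords] for c in centers]   # D[region, DC]
    return P, D
```

Only the number of regions (the length of `init_centers`) grows P and D, not the number of IPs, which is what makes request tracking scalable.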
Partial Video Files
The first minute of a video is downloaded many more times than the rest
Store partial video files
More effective caching
Lower start-up delay
Partial popularity assumed independent of region
Download reports: (v, 1MB), (v, 2.3MB), (v, 0.5MB), ...
Compress into a few entries for each video (dynamic algorithm)
PP[v] = (0...1MB, 100 times), (1MB...end, 50 times)
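To illustrate how raw download reports can be turned into PP[v] entries, here is a minimal sketch. It is not the dynamic algorithm mentioned above: it assumes fixed-size chunks (`chunk_size`, in MB) and simply counts, for each chunk, the sessions that reached it, then merges neighbouring chunks with identical counts. The function name `partial_popularity` is hypothetical.

```python
def partial_popularity(downloaded, file_size, chunk_size):
    """Build (start, end, count) segments for one video.

    downloaded: amount fetched per session (e.g. MB), from download reports
    count: number of sessions that reached at least the start of the segment
    """
    segments = []
    start = 0.0
    while start < file_size:
        end = min(start + chunk_size, file_size)
        count = sum(1 for b in downloaded if b > start)  # sessions reaching this chunk
        if segments and segments[-1][2] == count:
            # Same popularity as the previous segment: extend it instead.
            segments[-1] = (segments[-1][0], end, count)
        else:
            segments.append((start, end, count))
        start = end
    return segments

# Reports (v, 1MB), (v, 2.3MB), (v, 0.5MB), (v, 4MB) for a 4MB video
print(partial_popularity([1.0, 2.3, 0.5, 4.0], 4.0, 1.0))
# [(0.0, 1.0, 4), (1.0, 3.0, 2), (3.0, 4.0, 1)]
```

With real data, counts rarely match exactly, so a practical version would merge segments whose counts are within some tolerance.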
Problem Statement
Assign (part of) each video to one or more DCs
Minimize distance of video to user (region), given:
The distance matrix D[region, datacentre]
The expected download pattern P[region, video]
Partial popularity PP[video]
The storage limitation of each DC
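One possible way to write the problem statement as an optimization, under assumptions not stated in the slides: x_{v,c,d} is a hypothetical 0/1 variable marking that chunk c of video v is stored on DC d; f_{v,c} is the share of v's downloads that reach chunk c (read off PP[v]); s_{v,c} is the chunk size; C_d is the storage limit of DC d; and D_0[r] is an assumed delay for fetching a chunk that no DC holds (the inner minimum over an empty set is taken as +infinity).

```latex
\min_{x}\; \sum_{r}\sum_{v} P[r,v] \sum_{c} f_{v,c}\,
  \min\Bigl( D_0[r],\; \min_{d:\,x_{v,c,d}=1} D[r,d] \Bigr)
\quad\text{s.t.}\quad
\sum_{v}\sum_{c} s_{v,c}\, x_{v,c,d} \le C_d \;\;\forall d,
\qquad x_{v,c,d} \in \{0,1\}
```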
Problem Hardness
Even simpler variants of the problem are hard
Store one video file on a few selected DCs
NP-Complete (min set cover, max coverage)
Store multiple video files on one DC
NP-Complete (knapsack)
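To make the knapsack connection concrete, consider the single-DC variant with whole videos, assuming (as a hypothetical simplification) that a video not stored on DC d is served at some origin delay D_0[r] \ge D[r,d]. Minimizing total delay then becomes exactly a 0/1 knapsack with value b_v and weight s_v (the video size) per item:

```latex
\text{TotalDelay}(x) = \sum_{r}\sum_{v} P[r,v]\, D_0[r] \;-\; \sum_{v} b_v\, x_v,
\qquad b_v = \sum_{r} P[r,v]\,\bigl(D_0[r] - D[r,d]\bigr)

\min_{x} \text{TotalDelay}(x) \iff
\max_{x} \sum_{v} b_v\, x_v \quad\text{s.t.}\quad \sum_{v} s_v\, x_v \le C_d,\;\; x_v \in \{0,1\}
```

Conversely, any knapsack instance maps to this variant by using one region with D_0[r] - D[r,d] = 1 and P[r,v] equal to the item value, so the placement problem is at least as hard.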
Solution
Maintain a utility matrix U[v, d]
Utility of replicating "the next chunk of" video v on DC d
Auxiliary priority queues
1. Find the highest-utility video v*
2. Place the next chunk of v* on the best DC d*
3. Update row v* of U, and what the next chunk is for v*
Complexity: O[ (total video replicas) x (log[# videos] + log[# DCs] + log[max chunks/video]) ]
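The three greedy steps above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation. Assumptions: U[v, d] is taken to be the delay saved per unit of storage by placing v's next chunk on d; each DC stores a prefix of each video (so "the next chunk" is per (v, d) pair); chunks held by no DC are served at an assumed `origin` delay; and a single heap with lazy invalidation stands in for the slide's auxiliary priority queues. All names (`greedy_placement`, `origin`, `chunks`, `capacity`) are hypothetical.

```python
import heapq

def greedy_placement(P, D, origin, chunks, capacity):
    """Hypothetical greedy sketch.

    P[r][v]:     requests for video v from region r
    D[r][d]:     delay from region r to datacentre d
    origin[r]:   assumed delay for region r when no DC holds a chunk
    chunks[v]:   [(size, frac), ...] in playback order; frac = share of
                 v's downloads that reach that chunk (from PP[v])
    capacity[d]: storage available on DC d (same unit as size)
    Returns stored[v][d] = number of leading chunks of v placed on DC d.
    """
    R, V, M = len(P), len(chunks), len(capacity)
    free = list(capacity)
    stored = [[0] * M for _ in range(V)]
    # cur[v][c][r]: best delay region r currently sees for chunk c of v
    cur = [[list(origin) for _ in chunks[v]] for v in range(V)]

    def utility(v, d):
        """U[v, d]: delay saved per unit storage by putting v's next chunk on d."""
        c = stored[v][d]
        if c == len(chunks[v]):
            return 0.0                      # v is already fully stored on d
        size, frac = chunks[v][c]
        if size > free[d]:
            return 0.0                      # chunk does not fit
        saved = sum(P[r][v] * frac * max(0.0, cur[v][c][r] - D[r][d])
                    for r in range(R))
        return saved / size

    version = [0] * V                       # bumped whenever row v of U changes
    heap = []                               # max-heap via negated utility

    def push_row(v):
        version[v] += 1
        for d in range(M):
            u = utility(v, d)
            if u > 0:
                heapq.heappush(heap, (-u, v, d, version[v]))

    for v in range(V):
        push_row(v)

    while heap:
        # Steps 1 and 2: pop the highest-utility (v*, d*) pair.
        _, v, d, ver = heapq.heappop(heap)
        if ver != version[v] or utility(v, d) <= 0:
            continue                        # stale entry, or d filled up meanwhile
        c = stored[v][d]
        size, _ = chunks[v][c]
        free[d] -= size
        stored[v][d] += 1
        for r in range(R):
            cur[v][c][r] = min(cur[v][c][r], D[r][d])
        push_row(v)                         # step 3: refresh row v* of U
    return stored

# One region, two DCs, one video split into two 1-unit chunks
print(greedy_placement([[10]], [[1, 5]], [10], [[(1, 1.0), (1, 0.5)]], [2, 2]))
# [[2, 0]]
```

Recomputing the whole row after each placement costs O(# DCs x log) per step, which is simpler but slower than the per-row queues the slide's complexity bound implies.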
Evaluation (in Progress): Data
File size and length of ~200K videos from [Cheng 2010]
Distances in Internet
Pairwise delay between 2500 nodes from [Wong 2005]
Video popularities
Global: Zipf-distributed (as repeatedly reported)
Local: synthetic
Partial video popularities
Generated according to [Qiu 2010]
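A Zipf-distributed global popularity, as listed above, can be generated with a few lines. This is a sketch only: the exponent `alpha` and the seeded request sampler are assumptions for illustration, not parameters reported in the slides, and the function names are hypothetical.

```python
import random

def zipf_popularity(num_videos, alpha=1.0):
    """Probability of requesting the video of rank i is proportional to 1 / i**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_requests(popularity, n, seed=0):
    """Draw n video indices according to the popularity vector (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.choices(range(len(popularity)), weights=popularity, k=n)
```

With alpha = 1, the top-ranked video is requested twice as often as the second and three times as often as the third.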
Evaluation (in Progress): Results
Total delay, given our placement
Delay w/ and w/o partial file storage
Comparison to simple threshold-based distributed caching
Running time
Estimated communication overhead
Take-Away
Benefits of storing partial video files on selected DCs
Future work
Several further details are needed for a complete working system:
Low-overhead collection of (sub-samples of) downloads
Estimate near-future download patterns
Carefully cluster users into a limited number of regions
Solving video placement in a distributed manner across multiple nodes
Incremental algorithm; can't shuffle everything every night
Appendix: Previous Works
Cooperative web caching
Hierarchical, distributed, hybrid
CDN design (various flavors)
Video caching
On a single cache
To optimize for VCR-like functions