TV and video on the Internet How we created a CDN to host video clips and live broadcasts.
Page 1: Tv and video on the Internet

TV and video on the Internet

How we created a CDN to host video clips and live broadcasts.

Page 2: Tv and video on the Internet

Presentation plan

•Our motivation

•A little bit about what we wanted to achieve

•How did we do it?

– the project and objectives

– edge and origin nodes

– redirector

– traffic modeling (sounds smart, doesn’t it?)

– replication and file management

– statistics

– live-streaming

•What already works,

•… what doesn’t yet, and what our plans are

Page 3: Tv and video on the Internet

Our motivation

•We’ve created a free CDN before - videosteam.pl

– which made us aware that creating such systems is not so simple

•We were given a chance to create a large video portal - TiVi.pl

•We simply wanted to create something interesting

Page 4: Tv and video on the Internet

What we wanted to achieve

•Distribution network (edge) and storage (origin) based on ordinary PCs - if possible - keeping prices low for customers

•Effective division of traffic, matching routes to the customer's network - as much as possible

+ use of multiple datacenters

•Data replication instead of backups

+ automatic shutdown of broken servers and replacing them with new ones

•Supporting not only static files but also live broadcasts

•We wanted the service to be consistent with the standards (e.g. HTTP, WebDAV, RTMP …) – making it easier to use

Why ordinary PCs? http://www.manageability.org/blog/stuff/economics-google-hardware-infrastructure

Page 5: Tv and video on the Internet

Assumptions

•To ensure redundancy wherever possible

•To use proven software and adapt it where needed

– Varnish, nginx, Apache, MogileFS

•Write only the missing components

– the more code, the more errors

– redirector, file management (WebDAV), statistics, billings

– Java, Python + WSGI – rewrite the key elements in C only when necessary

Page 6: Tv and video on the Internet

The project

The secure archive on the origin servers is composed of clusters (which may be in different datacenters) equipped with a replicated file system. The clusters are independent. The system stores a minimum of 2-3 copies of each uploaded file.

Edge nodes located in different networks (TPSA, PLIX, WIX…) serve the traffic going out to customers.

The redirector selects the best node according to load and the customer's network location.

Page 7: Tv and video on the Internet

Edge and origin nodes

•We decided to separate the nodes – there may be fewer storage nodes (origin) - traffic from/to the end customer reaches only the edge nodes, which are simpler, so there can be more of them

•Edge nodes act as a proxy, using an adapted Varnish and nginx to retrieve data from the origin nodes

– we needed software which redirects the user to a working node that is not overloaded

•On the origin nodes we use nginx and MogileFS, which replicates data and automatically restores missing copies - doing a lot of good work

– we needed software to manage the files for customers

•Initially, edge and origin nodes may be the same machine

Page 8: Tv and video on the Internet

Redirector

•Accepts all read requests (both files and live)

– HTTP redirection – updates faster than DNS, simpler than BGP – live broadcasts are handled separately in the player

•Knows the status of each edge node - state, load, bandwidth limit, bandwidth usage, system load and its placement (datacenter / ASN / network)

•Knows the customers' networks and the "distance" between them and each DC (ping, hops, the number of ASNs) - it identifies the customer's network from the IP address

– on this basis it can choose the most favorable server for the customer and redirect their request there

•It runs on a minimum of 2 nodes (+ hardware load balancer dividing traffic between redirectors), uses caching heavily and is written in Python + WSGI - we have achieved approx. 2000 req/s per server
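
To make the HTTP redirection concrete, here is a minimal sketch of a WSGI application that answers every read request with a 302 to an edge node. The node URLs and the pick_node policy are hypothetical placeholders, not the redirector described in the slides, which also weighs load and network distance.

```python
# Hypothetical edge nodes; the real redirector tracks their state and load.
EDGE_NODES = ["http://edge1.example.com", "http://edge2.example.com"]


def pick_node(client_ip):
    """Placeholder policy: a stable choice derived from the client IP."""
    return EDGE_NODES[sum(client_ip.encode()) % len(EDGE_NODES)]


def redirector_app(environ, start_response):
    """WSGI entry point: redirect the request to the chosen edge node."""
    client_ip = environ.get("REMOTE_ADDR", "0.0.0.0")
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    target = pick_node(client_ip) + path + ("?" + query if query else "")
    headers = [
        ("Location", target),
        ("Cache-Control", "no-store"),  # let every request be re-evaluated
        ("Content-Length", "0"),
    ]
    start_response("302 Found", headers)
    return [b""]
```

For a local trial the app could be served with wsgiref.simple_server.make_server from the standard library; a production setup would sit behind a proper WSGI server and the load balancer mentioned above.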

Page 9: Tv and video on the Internet

Traffic modeling

• Redirector’s main task

• For each request:

– it takes the customer's address and checks which network the customer is from

– it checks the weight/distance of that network from each DC and selects the best one - the weights are updated every 5 minutes by separate applications running on the nodes, and an additional weight allows manual traffic modeling according to network policies

– we take into account hops and the route / number of ASNs along the way (the more significant ones only); we initially measured distances between servers, but the distance from the DC is enough

– it selects a group of servers that support a particular request (livestreaming, pseudo-streaming, static files/buffered video)

– it selects the least loaded server from the group (random with weights) and provides it to the customer + caches it for less than a minute

Page 10: Tv and video on the Internet

Replication and management

• Replication is provided by MogileFS - each file must exist in 2-3 replicas (different replication classes) - if a node fails, the file is replicated to other servers

• File management - software written in Python that provides a WebDAV interface, the so-called Bridge. In the future, we would like to add support for the S3 API. The Bridge mediates between the customer and the MogileFS Tracker and the MogileFS HTTP interface (nginx in our case)
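
As an illustration of the replication policy only (MogileFS does this work itself, and this is not its code), the sketch below finds files that dropped under their class minimum after a node failure and proposes where to place new copies. The class names, device names and data layout are hypothetical.

```python
# Hypothetical replication classes: minimum number of copies per class.
CLASS_MIN_COPIES = {"default": 2, "important": 3}


def plan_rereplication(files, alive):
    """Return {file_key: [target_device, ...]} for under-replicated files.

    files maps a key to {"class": name, "devices": [device, ...]};
    alive is the set of devices that are currently reachable.
    """
    plan = {}
    for key, info in sorted(files.items()):
        live = [d for d in info["devices"] if d in alive]
        missing = CLASS_MIN_COPIES[info["class"]] - len(live)
        if missing <= 0:
            continue
        # Only devices that do not already hold a copy are candidates.
        candidates = sorted(alive - set(live))
        plan[key] = candidates[:missing]
    return plan
```

With device d1 down, a "default" file stored on d1 and d2 would get one new copy on another live device, which matches the behavior described in the first bullet.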

Page 11: Tv and video on the Internet

Replication and management

• The Bridge runs on two servers (+ hardware load balancer) and a MySQL database (master + several slaves)

• Alongside the redirector, the Bridge is a key element; we test it automatically with scripts that perform basic operations through curl immediately after each deployment (through SVN)

• The Bridge also protects files – it tells edge nodes whether a particular file is available (tokens, expired customer accounts, in future also access rights to files)
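
The slides do not describe the token format, so the following is only a sketch of how such an availability check could work, assuming an HMAC-signed token that carries an expiry time. The secret, the "expires:signature" layout and the function names are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical shared secret


def make_token(path, expires, secret=SECRET):
    """Build an 'expires:signature' token bound to one file path."""
    message = ("%s:%d" % (path, expires)).encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return "%d:%s" % (expires, signature)


def is_allowed(path, token, account_active, now=None, secret=SECRET):
    """Answer an edge node's question: may this file be served?"""
    now = time.time() if now is None else now
    if not account_active:  # expired customer account
        return False
    try:
        expires_text, signature = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:  # malformed token
        return False
    if expires < now:
        return False
    expected = make_token(path, expires, secret).split(":", 1)[1]
    # Constant-time comparison avoids leaking signature prefixes.
    return hmac.compare_digest(signature.encode(), expected.encode())
```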

Page 12: Tv and video on the Internet

Replication and management

Page 13: Tv and video on the Internet

Statistics

•There are really a lot of logs (for the time being approx. 50 req/s per server - everything goes to the access logs) - we process them with a simple map/reduce

•We collect statistics from the edge servers and the Bridge - usage per customer (transfer, storage, hits) and usage per stream (number of broadcasts, number of customers)

•We had to write several small programs:

– Statistics are initially processed through logalyzer (with rotating logs) and uploaded to SimpleStorage in CSV format,

– logcollector downloads log packs and uploads them to the database in bulk,

– statscalc aggregates data in subtotals, non-aggregated data is removed over time.

•As a result – a user sees changes in statistics on a page updated hourly
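
A minimal sketch of the "simple map/reduce" idea: each access-log line is mapped to a (customer, hour) key, the values are summed, and the subtotals are written as CSV. The log line format ("timestamp customer bytes") and the function names are assumptions; the actual logalyzer, logcollector and statscalc programs are not shown in the slides.

```python
import csv
import io
from collections import defaultdict


def map_line(line):
    """Map step: one log line -> ((customer, hour), (hits, bytes))."""
    timestamp, customer, size = line.split()
    hour = timestamp[:13]  # e.g. "2010-05-01T14"
    return (customer, hour), (1, int(size))


def reduce_pairs(pairs):
    """Reduce step: sum hits and bytes per (customer, hour) key."""
    totals = defaultdict(lambda: [0, 0])
    for key, (hits, size) in pairs:
        totals[key][0] += hits
        totals[key][1] += size
    return totals


def to_csv(totals):
    """Serialize the hourly subtotals as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["customer", "hour", "hits", "bytes"])
    for (customer, hour), (hits, size) in sorted(totals.items()):
        writer.writerow([customer, hour, hits, size])
    return out.getvalue()
```

Aggregating per hour fits the hourly page update mentioned above, and it lets the raw, non-aggregated rows be dropped later without losing the subtotals.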

Page 14: Tv and video on the Internet

Livestreaming

•The system is already running in production

•Compatible with RTMP/RTMPT/RTMPE – a ready-made Java server with our changes (statistics), installed on all the nodes

•Traffic division is done by a proxy at the level of the video application (Java code forwards the video from the server the customer is broadcasting to on to the target servers – we can dynamically choose servers for a given customer)

•We've added broadcaster authorization (tokens - supported by the Bridge)

Page 15: Tv and video on the Internet

Livestreaming

•… as well as watcher authorization (tokens can be handled individually by the webservice of a particular service, e.g. VoD)

•Redirector manages redirection to streaming (support was necessary in the player) - RTMP doesn’t support redirects

•It would be great to also wrap the stream in HTTP – buffering, among other things, would then work and there would be no problems with firewalls

– soon, we would like to add support for smooth-streaming and streaming on iPhones (mpeg-ts) - we can do it by transcoding on the fly (we’ve already conducted trials) or with dedicated servers such as Wowza or IIS - the disadvantage is a heterogeneous environment

Page 16: Tv and video on the Internet

Livestreaming

Page 17: Tv and video on the Internet

Plans for development?

• file authorization

• optimizing and improving the traffic modeling algorithm

• streaming through HTTP and smooth-streaming

• streaming for iPhones

• introducing more nodes – e.g. to PL-IX

• expansion and acceleration of statistics

• …

Page 18: Tv and video on the Internet

Contact Piotr Karwatka

[email protected]

THANK YOU FOR YOUR ATTENTION