THOMSON REUTERS WEBCASTING IN THE CLOUD MULTIMEDIA SOLUTIONS
SIMON BALL, THOMSON REUTERS ADRIAN ROE, ID3AS
14th JUNE 2012
Intro to Thomson Reuters
• Multimedia Solutions is part of Corporate Services
which is part of the Financial and Risk business
segment within Thomson Reuters.
• Provides multimedia communications solutions
which address the needs of professional
communicators, including content creation, vertical
and workflow specialization, distribution and reach,
and actionable analytics.
• As the only truly global provider in the industry, we
offer a unique single-vendor solution for multinational
firms.
The Business
• “Fair Disclosure” legislation demands that:
– Companies distribute quarterly results in a timely manner
– Releases to financial markets are made available to the
public at the same time
• Webcasting is a cost-effective way of doing this
• 25,000+ live events per year
– Very spiky - 4 high-volume periods, lots of quiet ones
– Average usage is less than 5% of peak
– Around 300 concurrent events on a busy day
• Audiences in thousands
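The figures above imply a large gap between peak and average demand. The following is a rough sketch of that arithmetic, not a statement of actual server counts: it assumes two encoder servers per event (as described under Architecture), treats "less than 5% of peak" as exactly 5%, and uses function names invented for illustration.

```python
HOURS_PER_YEAR = 365 * 24


def fixed_capacity_hours(peak_servers: int) -> int:
    """Server-hours per year if capacity sized for the peak runs all year."""
    return peak_servers * HOURS_PER_YEAR


def elastic_capacity_hours(peak_servers: int, average_fraction_of_peak: float) -> float:
    """Server-hours per year if servers only run while they are needed."""
    return peak_servers * average_fraction_of_peak * HOURS_PER_YEAR


# Assumption: ~300 concurrent events on a busy day, 2 encoders per event.
peak_servers = 300 * 2

fixed = fixed_capacity_hours(peak_servers)
elastic = elastic_capacity_hours(peak_servers, 0.05)

print(f"Provisioned for peak: {fixed:,} server-hours per year")
print(f"Run on demand:        {elastic:,.0f} server-hours per year")
print(f"Idle share if provisioned for peak: {1 - elastic / fixed:.0%}")
```

Under these assumptions, roughly 95% of a peak-sized fixed estate would sit idle, which is the gap that on-demand capacity is meant to close.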
How Did We Deliver to Customers Before?
• Service Vendors
– Conversion to web stream (encoder)
– Teleconference service
• Regional encoding centers
– Manual capture from telephone device
– Encoding Hardware
Motivation for change
• Previous Platform:
– Needed updated technology and improved quality
– Did not allow the business to scale
– Was expensive to run
• Motivators were (in order):
– Improve customer experience
– Global platform consolidation
– Allow business to scale
– Reduce cost
Evaluation Process
• Buy vs. Build
– No off-the-shelf product that offered required functionality
without significant customisation
• Private data center vs. cloud-based
– Private data center lacked flexibility
– Significant up-front capex was not attractive
• Tested multiple cloud vendors
– Explicitly wanted a multi-vendor strategy
• Resilience
• Avoid lock-in
System Schematic
How Do We Do It Now?
What do we do
• Webcast
Intro to id3as
• Elastic solutions for the broadcast, multimedia and
finance sectors
• Specialist in:
– Custom solutions
• Creation of lean, innovative, high-density solutions
– Large-scale
• Going beyond “simple” website clusters of a few machines to
systems needing highly distributed compute or data
requirements, where “traditional” tools are not necessarily
appropriate
– Highly available
• No single points of failure
• Zero downtime maintenance
Technical Challenges
• Delivering quality SLA from a commodity platform
• Scalability
– On-demand management of ~1000 servers
• Resilience
– No webcast to have a single point of failure
• Support
– Support of ~1000 servers distributed around the world
– Need for (simple) tools (web UI, scripts etc)
Architecture
• Lightweight Management Layer
– Distributed database, distributed application
– Across 2 or 3 servers
– Across multiple availability zones
• Encoders launched and destroyed on demand
– 2 encoders in different availability zones per webcast
– “crossed streams” for PSTN recovery
– System is self-healing
– Crashes detected almost instantly, and recovery initiated
– New encoders commissioned in < 70 seconds
– US-East Outage. We barely noticed.
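The placement and self-healing rules above can be sketched as a small in-memory model. This is an illustration only, not the production system (which the later slides suggest is written in Erlang): the class names, zone names and the `unavailable` parameter are all hypothetical, and no cloud API is called.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Encoder:
    encoder_id: int
    zone: str
    alive: bool = True


@dataclass
class Webcast:
    name: str
    encoders: list = field(default_factory=list)


class Manager:
    """Toy management layer: keeps two encoders per webcast in distinct zones."""

    def __init__(self, zones):
        if len(zones) < 2:
            raise ValueError("need at least two availability zones")
        self.zones = list(zones)
        self._ids = itertools.count(1)

    def _launch(self, zone):
        # Stand-in for commissioning a real encoder instance.
        return Encoder(next(self._ids), zone)

    def start_webcast(self, name):
        first, second = self.zones[0], self.zones[1]
        return Webcast(name, [self._launch(first), self._launch(second)])

    def heal(self, webcast, unavailable=()):
        """Replace dead encoders, avoiding occupied and unavailable zones."""
        survivors = [e for e in webcast.encoders if e.alive]
        used = {e.zone for e in survivors}
        free = [z for z in self.zones if z not in used and z not in unavailable]
        while len(survivors) < 2:
            if not free:
                raise RuntimeError("no healthy zone left for a replacement")
            survivors.append(self._launch(free.pop(0)))
        webcast.encoders = survivors
        return webcast


manager = Manager(["zone-a", "zone-b", "zone-c"])
webcast = manager.start_webcast("quarterly-results")
webcast.encoders[0].alive = False              # simulate a crash in zone-a
manager.heal(webcast, unavailable={"zone-a"})  # replacement goes elsewhere
print([(e.encoder_id, e.zone) for e in webcast.encoders])
```

Keeping the rule "two live encoders, never in the same zone" in one place means a zone outage is handled by the same path as a single crash.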
Architecture (2)
• Communication with TR internal services
through simple ReST API / file transfers
– Reduces coupling between systems
– Makes future changes easy to implement
– Keep things simple!
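As an illustration of how small such a contract can be, the sketch below builds (but does not send) an HTTP request to schedule a webcast. The URL, path and JSON field names are hypothetical and are not taken from the actual Thomson Reuters interface; only Python standard-library calls are used.

```python
import json
from urllib.request import Request


def build_schedule_request(base_url: str, event_id: str, start_time_utc: str) -> Request:
    """Build a POST request asking the webcast platform to schedule an event."""
    payload = {"event_id": event_id, "start_time_utc": start_time_utc}
    return Request(
        url=f"{base_url}/webcasts",  # hypothetical resource path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_schedule_request(
    "https://webcast.example.invalid/api",  # placeholder host
    event_id="evt-001",
    start_time_utc="2012-06-14T09:00:00Z",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Because the caller depends only on the URL and the JSON shape, the platform behind the interface can change without the calling system being modified.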
Architecture (3)
• Choice of language important
– “Simple” websites - Java, C#, Ruby etc. are fine
– When resilience / distributed computing is important, then
these are less appropriate
• We are big fans of Erlang. Happy to talk about this later...
• Initial deployment on Windows due to audio toolchain
• Recent port to Linux platform
– Reduced costs
• Removal of “overweight” 3rd party tools allowed smaller instance
size
– Improved performance (particularly boot-time)
– ReST interface meant zero changes to other systems
Why was Amazon on the short-list?
• Multiple globally-distributed locations
• They were the number one provider
• Great API capability
• Supported Windows VMs with Admin access
– Not some higher-level PaaS model
– Nothing wrong with that, but we needed custom device
driver support for the audio tool chain
• Cost was competitive
What we learnt about Cloud
• “Cloud” is an abused buzzword
• We’ve always considered Cloud to be about the
elasticity
• Some consider Cloud to be “just” virtualisation. We
don’t.
• Turned out that most vendors are not as focused on
elasticity
– And hence have significant issues if you use them in that
way
– Which was a surprise, and cost quite a lot
What we learnt about Cloud (2)
• Cost model is not as simple as we first thought
– It's not just compute hours
• Need to consider network traffic, EBS data and I/O charges,
long-term S3 storage etc. etc.
– And forgetting to turn off machines in the test stack gets
expensive!
• Get Lean
– Keep software stack as small as possible
• Smaller server instances => lower CPU and EBS costs
– When running many thousands of hours, this really adds up
– Therefore use of large third-party products can have hidden
costs
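A minimal sketch of a cost model that counts more than compute hours is shown below. The line items follow the list above; every rate and usage figure is a made-up placeholder rather than real Amazon pricing, and the names `Rates` and `monthly_cost` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Unit prices. All values used below are placeholders, not real pricing."""
    compute_per_hour: float
    network_per_gb: float
    ebs_per_gb_month: float
    ebs_per_million_io: float
    s3_per_gb_month: float


def monthly_cost(rates, *, compute_hours, network_gb, ebs_gb, ebs_million_io, s3_gb):
    """Return each line item and the total for one month of usage."""
    items = {
        "compute": compute_hours * rates.compute_per_hour,
        "network": network_gb * rates.network_per_gb,
        "ebs_storage": ebs_gb * rates.ebs_per_gb_month,
        "ebs_io": ebs_million_io * rates.ebs_per_million_io,
        "s3_storage": s3_gb * rates.s3_per_gb_month,
    }
    items["total"] = sum(items.values())
    return items


placeholder_rates = Rates(
    compute_per_hour=0.10,
    network_per_gb=0.10,
    ebs_per_gb_month=0.10,
    ebs_per_million_io=0.10,
    s3_per_gb_month=0.10,
)
bill = monthly_cost(
    placeholder_rates,
    compute_hours=1000,
    network_gb=500,
    ebs_gb=200,
    ebs_million_io=50,
    s3_gb=300,
)
for name, amount in bill.items():
    print(f"{name:12s} {amount:10.2f}")
```

Itemising the bill this way makes it easier to see when a non-compute charge, or a forgotten test stack, starts to dominate.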
What we learnt about Amazon
• They understand their business
– No scope for negotiation; it’s a commodity product
• Handle elasticity vastly better than other vendors
• Support model has evolved
– Premium model for enterprise customers
• Well thought through API
– And we’ve never (yet) been hit by API maintenance
windows
• Admin UI is good
– Some other vendors’ UIs are unusable for this scale
Elasticity Demo
Quick to Market
• Proof of concept – May 2010
• Funding approval – August 2010
• “Full” project start – October 2010
• Launch – September 2011
Outcomes
• Day one:
– Improved audio quality
– Improved resiliency
– Cost reduction
– Single biggest cause of customer issues (PSTN drops) now
resolved in ~20ms
• Ongoing:
– Ability to scale business has vastly improved
– Global flexibility, ability to control from anywhere in the
world
What would we like to see from Amazon?
• Ability to share AMIs across availability zones
• Commercial grade SLAs
• Support for all instance types in at least two
availability zones
• Improved usage reporting for invoice reconciliation
• More flexibility in reserved instances
• Not bothered about a common API
– Easy to adopt a new API (assuming it’s been thought
through)
– Common API restricts innovation