The Megasite: Infrastructure for Internet Scale

Come hear MySpace share its experiences using Microsoft technologies to run Web applications for the most visited site on the Web. MySpace discusses its best practices for a massively scalable, federated application environment, and how it matured its deployment processes. An open Q&A session lets you pick the brains of engineers from both MySpace and Microsoft.com
Transcript
Page 1: The Megasite: Infrastructure for Internet Scale
Page 2: The Megasite: Infrastructure for Internet Scale

MySpace.com MegaSite v2

Aber Whitcomb – Chief Technology Officer
Jim Benedetto – Vice President of Technology
Allen Hurff – Vice President of Engineering

Page 3: The Megasite: Infrastructure for Internet Scale

Previous MySpace Scaling Landmarks

First Megasite
64+ MM Registered Users
38 MM Unique Users
260,000 New Registered Users Per Day
23 Billion Page Views/Month
50.2% Female / 49.8% Male
Primary Age Demo: 14-34

[Growth chart: registered users at each scaling landmark: 100K, 1 M, 6 M, 70 M, 185 M]

Page 4: The Megasite: Infrastructure for Internet Scale

MySpace Company Overview: Today

As of April 2007:
185+ MM Registered Users
90 MM Unique Users

Demographics:
50.2% Female / 49.8% Male
Primary Age Demo: 14-34

Internet Rank by Page Views (MM):

#1 MySpace 43,723
#2 Yahoo 35,576
#3 MSN 13,672
#4 Google 12,476
#5 Facebook 12,179
#6 AOL 10,609

Source: comScore Media Metrix March - 2007

Page 5: The Megasite: Infrastructure for Internet Scale

Total Pages Viewed - Last 5 Months

Source: comScore Media Metrix April 2007

[Chart: total pages viewed per month (MM), Nov 2006 through Mar 2007, for MySpace, Yahoo, MSN, Google, eBay, and Facebook; y-axis 0 to 50,000 MM]

Page 6: The Megasite: Infrastructure for Internet Scale

Site Trends

350,000 new user registrations/day

1 Billion+ total images

Millions of new images/day

Millions of songs streamed/day

4.5 Million concurrent users

Localized and launched in 14 countries

Launched China and Latin America last week

Page 7: The Megasite: Infrastructure for Internet Scale

Technical Stats

7 Datacenters

6000 Web Servers

250 Cache Servers with 16 GB RAM

650 Ad servers

250 DB Servers

400 Media Processing servers

7000 disks in SAN architecture

70,000 Mb/s bandwidth

35,000 Mb/s on CDN

Page 8: The Megasite: Infrastructure for Internet Scale

MySpace Cache

Page 9: The Megasite: Infrastructure for Internet Scale

Relay System Deployment

Typically used for caching MySpace user data.

Online status, hit counters, profiles, mail.

Provides a transparent client API for caching C# objects.

Clustering
Servers divided into "Groups" of one or more "Clusters".

Clusters keep themselves up to date.

Multiple load balancing schemes based on expected load.

Heavy write environment
Must scale past 20k redundant writes per second on a 15-server redundant cluster.
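The group/cluster idea above can be sketched as follows. This is a minimal illustration, not MySpace's Relay implementation: the class names, the two load-balancing schemes, and the key format are all assumptions.

```python
import hashlib

class Cluster:
    """A set of cache servers that keep each other up to date."""
    def __init__(self, servers):
        self.servers = list(servers)

    def pick_round_robin(self, counter):
        # Spread reads evenly, regardless of key.
        return self.servers[counter % len(self.servers)]

    def pick_hashed(self, key):
        # Stable hash so the same key always lands on the same server.
        digest = hashlib.md5(key.encode()).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]

class Group:
    """One or more clusters, with a load-balancing scheme chosen per group."""
    def __init__(self, clusters, scheme="hashed"):
        self.clusters = clusters
        self.scheme = scheme
        self.counter = 0

    def route(self, key):
        # Reads use the configured scheme within a cluster (first cluster
        # here for brevity; writes would fan out to every cluster).
        cluster = self.clusters[0]
        if self.scheme == "round_robin":
            self.counter += 1
            return cluster.pick_round_robin(self.counter)
        return cluster.pick_hashed(key)

group = Group([Cluster(["cache01", "cache02", "cache03"])])
assert group.route("user:42") == group.route("user:42")  # stable routing
```

Hashed routing maximizes cache hit rate for keyed data like profiles; round-robin suits uniformly replicated data under heavy read load.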

Page 10: The Megasite: Infrastructure for Internet Scale

Relay System
Platform for middle-tier messaging.

Up to 100k request messages per second per server in prod.

Purely asynchronous, no thread blocking, built on the Concurrency and Coordination Runtime (CCR).

Bulk message processing.

Custom unidirectional connection pooling.

Custom wire format.

Gzip compression for larger messages.

Data center aware.
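A custom wire format with compression only for larger messages might look like the sketch below. The header layout, the 4 KB threshold, and the flag values are assumptions for illustration; the actual Relay wire format is not described in the deck.

```python
import gzip
import struct

GZIP_THRESHOLD = 4096  # assumed cutoff: don't pay gzip cost on small messages
FLAG_GZIP = 0x01

def encode(payload: bytes) -> bytes:
    flags = 0
    if len(payload) > GZIP_THRESHOLD:
        payload = gzip.compress(payload)
        flags |= FLAG_GZIP
    # Header: 1-byte flags + 4-byte big-endian payload length.
    return struct.pack(">BI", flags, len(payload)) + payload

def decode(frame: bytes) -> bytes:
    flags, length = struct.unpack(">BI", frame[:5])
    payload = frame[5:5 + length]
    if flags & FLAG_GZIP:
        payload = gzip.decompress(payload)
    return payload

assert decode(encode(b"x" * 10_000)) == b"x" * 10_000
```

The length prefix lets a receiver frame messages off a pooled unidirectional connection without delimiters, which is what makes bulk processing of many messages per read practical.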

Configurable components

[Diagram: Relay Service architecture. IRelayComponents: Berkeley DB, non-locking memory buckets, fixed-alloc shared memory, interlocked int storage for hit counters. Message forwarding and message orchestration run on the CCR. A socket server fronts the service; RelayClients connect to it.]

Page 11: The Megasite: Infrastructure for Internet Scale

Code Management: Team Foundation Server, Team System, Team Plain, and Team Test Edition

Page 12: The Megasite: Infrastructure for Internet Scale

Code Management

MySpace embraced Team Foundation Server and Team System during Beta 3.
MySpace was also one of the early beta testers of BizDev's Team Plain (now owned by Microsoft).
Team Foundation initially supported 32 MySpace developers; it now supports 110 and is on its way to over 230.
MySpace is able to branch and shelve more effectively with TFS and Team System.

Page 13: The Megasite: Infrastructure for Internet Scale

Code Management (continued)

MySpace uses Team Foundation Server as a source repository for its .NET, C++, Flash, and ColdFusion codebases.
MySpace uses Team Plain for Product Managers and other non-development roles.

Page 14: The Megasite: Infrastructure for Internet Scale

Code Management: Team Test Edition

MySpace is a member of the Strategic Design Review committee for the Team System suite.
MySpace chose Team Test Edition, which reduced cost and kept its Quality Assurance staff on the same suite as the development teams.
Using MSSCCI providers and customization of Team Foundation Server (including the upcoming K2 Blackperl), MySpace was able to extend TFS with better workflow and defect tracking tailored to its specific needs.

Page 15: The Megasite: Infrastructure for Internet Scale

Server Farm Management: CodeSpew

Page 16: The Megasite: Infrastructure for Internet Scale

CodeSpew

Maintaining a consistent, always-changing code base and configs across thousands of servers proved very difficult.
Code rolls began to take a very long time.
CodeSpew: a code deployment and maintenance utility.

Two-tier application:
Central management server (C#)
Light agent on every production server (C#)

Tightly integrated with Windows PowerShell.

Page 17: The Megasite: Infrastructure for Internet Scale

CodeSpew

UDP out, TCP/IP in.
Massively parallel: able to update hundreds of servers at a time.
File modifications are determined on a per-server basis using CRCs.
Security model for code deployment authorization.
Able to execute remote PowerShell scripts across the server farm.
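The per-server CRC comparison can be sketched like this: the management server holds a manifest of file checksums, each agent reports its own, and only the differing files are shipped. File names and the manifest shape are illustrative, not CodeSpew's actual protocol.

```python
import zlib

def crc_manifest(files):
    """Map each path to the CRC-32 of its contents."""
    return {path: zlib.crc32(data) for path, data in files.items()}

def files_to_update(master, agent):
    # A file needs deployment if the agent lacks it or its CRC differs.
    return sorted(p for p, crc in master.items() if agent.get(p) != crc)

master = crc_manifest({"web.config": b"v2", "app.dll": b"build-1083"})
agent = crc_manifest({"web.config": b"v1", "app.dll": b"build-1083"})
assert files_to_update(master, agent) == ["web.config"]
```

Comparing checksums instead of timestamps or versions is what lets each of hundreds of servers receive exactly the delta it needs, even if some drifted mid-roll.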

Page 18: The Megasite: Infrastructure for Internet Scale

Media Encoding/Delivery

Page 19: The Megasite: Infrastructure for Internet Scale

Media Statistics

Videos
60 TB storage
15,000 concurrent streams
60,000 new videos/day

Music
25 Million songs
142 TB of space
250,000 concurrent streams

Images
1 Billion+ images
80 TB of space
150,000 req/s
8 Gigabits/sec

Page 20: The Megasite: Infrastructure for Internet Scale

4th Generation Media Encoding
Millions of MP3, Video, and Image Uploads Every Day

Ability to design custom encoding profiles (bitrate, width, height, letterbox, etc.) for a variety of deployment scenarios.

Job broker engine to maximize encoding resources and provide a level of QoS.

Abandonment of database connectivity in favor of a web service layer

XML based workflow definition to provide extensibility to the encoding engine.

Coded entirely in C#
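A job broker that maximizes encoder utilization while providing QoS can be sketched as a priority queue: urgent work is always pulled first, and jobs at the same priority are served in order. The priority scale and job fields are assumptions; the deck only names the broker.

```python
import heapq
import itertools

class JobBroker:
    """Hand out encoding jobs in priority order, FIFO within a priority."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-break so equal priorities stay FIFO

    def submit(self, job, priority=10):
        # Lower number = more urgent (e.g. user-facing thumbnails before bulk work).
        heapq.heappush(self._heap, (priority, next(self._seq), job))

    def next_job(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

broker = JobBroker()
broker.submit("encode video_123.flv", priority=5)
broker.submit("thumbnail img_9.jpg", priority=1)
assert broker.next_job() == "thumbnail img_9.jpg"
```

Idle MediaProcessors pulling from a single broker keeps every encoder busy while still letting latency-sensitive jobs jump the bulk queue.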

Page 21: The Megasite: Infrastructure for Internet Scale

4th Generation Encoding Workflow

[Diagram: 4th generation encoding workflow. User content is uploaded from any application through the web service communication layer to the Job Broker, which dispatches work to MediaProcessors; output goes to DFS 2.0 and to a CDN FTP server, with a filmstrip generated for image review and thumbnails for categorization.]

Page 22: The Megasite: Infrastructure for Internet Scale

MySpace Distributed File System

Page 23: The Megasite: Infrastructure for Internet Scale

MySpace Distributed File System

Provides an object-oriented file store

Scales linearly to near-infinite capacity on commodity hardware

High-throughput distribution architecture

Simple cross-platform storage API

Designed exclusively for long-tail content
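The deck doesn't say how DFS places objects across nodes; consistent hashing is one common way to get near-linear scaling on commodity hardware, sketched here purely as an illustration (node names, replica count, and hash choice are all assumptions).

```python
import bisect
import hashlib

def _hash(key):
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent hash ring: adding a node moves only ~1/N of the keys."""
    def __init__(self, nodes, replicas=100):
        # Each node gets many virtual points on the ring to smooth the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["store01", "store02", "store03"])
assert ring.node_for("file:abc") == ring.node_for("file:abc")  # stable placement
```

Because growing the cluster relocates only a small fraction of objects, capacity can be added node by node, which is what "scales at any granularity" requires.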

[Chart: long-tail distribution of content demand (accesses)]

Page 24: The Megasite: Infrastructure for Internet Scale

Sledgehammer

Custom high-performance event-driven web server core.
Written in C++ as a shared library.
Integrated content cache engine.
Integrates with storage layer over HTTP.
Capable of more than 1 Gbit/s throughput on a dual-processor host.
Capable of tens of thousands of concurrent streams.
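The event-driven pattern behind a core like this is a single thread multiplexing many connections via readiness notification instead of a thread per connection. Sledgehammer itself is C++; the toy sketch below just shows the pattern, with an in-process socket pair standing in for a client.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def serve_ready(sel):
    # One pass of the event loop: handle whichever sockets are readable.
    for key, _ in sel.select(timeout=1.0):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            # Echo the request body back behind a minimal response header.
            conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n" + data)
        else:
            sel.unregister(conn)
            conn.close()

client, server_side = socket.socketpair()
sel.register(server_side, selectors.EVENT_READ)
client.sendall(b"ping")
serve_ready(sel)
reply = b""
while not reply.endswith(b"ping"):
    reply += client.recv(4096)
```

With tens of thousands of streams, avoiding a thread per connection is what keeps memory and context-switch overhead flat as concurrency grows.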

Page 25: The Megasite: Infrastructure for Internet Scale

DFS Interesting Facts

DFS uses a generic “file pointer” data type for identifying files, allowing us to change URL formats and distribution mechanisms without altering data.

Compatible with traditional CDNs like Akamai

Can be scaled at any granularity, from single nodes to complete clusters

Provides a uniform method for developers to access any media content on MySpace
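The "file pointer" idea above can be illustrated as storing an opaque (store, object-id) pair with each file and generating the public URL only at request time, so URL formats and distribution mechanisms can change without touching stored data. All names and URL formats below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FilePointer:
    store: str      # logical storage cluster
    object_id: str  # immutable object identifier

# URL formats live in one place; swapping or adding one never rewrites pointers.
URL_FORMATS = {
    "dfs": "http://{store}.dfs.example.com/get/{object_id}",
    "cdn": "http://cdn.example.com/{store}/{object_id}",
}

def resolve(ptr, scheme="dfs"):
    """Turn a stored pointer into a servable URL for the chosen scheme."""
    return URL_FORMATS[scheme].format(store=ptr.store, object_id=ptr.object_id)

ptr = FilePointer("media7", "a1b2c3")
assert resolve(ptr, "cdn") == "http://cdn.example.com/media7/a1b2c3"
```

This is also what makes CDN compatibility cheap: pointing a class of content at Akamai or back at DFS is a resolver change, not a data migration.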

Page 26: The Megasite: Infrastructure for Internet Scale

Appendix

Page 27: The Megasite: Infrastructure for Internet Scale

Operational Wins

[Chart: pages/sec, scale 0 to 300, measured on a 2005 server]

Page 28: The Megasite: Infrastructure for Internet Scale

MySpace Disaster Recovery Overview

Distribute MySpace servers over 3 geographically dispersed co-location sites

Maintain presence in Los Angeles

Add a Phoenix site for active/active configuration

Add a Seattle site for active/active/active with Site Failover capability

Page 29: The Megasite: Infrastructure for Internet Scale

Distributed File System Architecture

[Diagram: DFS architecture. Users are served by Sledgehammer (cache engine plus server accelerator engine), which sits in front of the DFS cache daemon, business logic, and the storage cluster.]