ARM & Disaggregated Rack: Facebook’s approach to smaller processors
Jason Taylor, PhD
Director, Capacity Engineering & Analysis
Agenda
1 Facebook Scale & Infrastructure
2 Mobile Processors
3 Disaggregated Rack
82% of users are outside of the U.S.
4 domestic regions today. Europe region will come online later this year.
Facebook Scale
Facebook Stats
• 1 billion users
• 350+ million photos added per day
• 4.2 billion likes, posts and comments per day
• 140+ billion friend connections
• 240+ billion photos
• 17 billion check-ins
Cost and Efficiency
• From our 10-Q filed with the SEC in October 2012:
  • “The first nine months of 2012 ... $1.0 billion for capital expenditures related to the purchase of servers, networking equipment, storage infrastructure, and the construction of data centers.”
• At this size, we spend a lot of time thinking about efficiency and costs.
Architecture
• Front-End Cluster: Web (250 racks), Ads (30 racks), Cache (~144TB), Multifeed (9 racks), other small services
• Service Cluster: Search, Photos, Msg, Others
• Back-End Cluster: UDB, ADS-DB, Tao Leader
• Lots of “vanity free” servers.
Multifeed rack
• The rack is our unit of capacity
• All 40 servers work together
• Leaf + agg code runs on all servers
• Leaf has most of the RAM
• Aggregator uses most of the CPU
• Lots of network BW within the rack
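The leaf/aggregator split can be pictured as a scatter/gather pattern. The sketch below is only an illustration, not Facebook's implementation: the `Leaf` and `Aggregator` classes, the in-memory story store, the sharding by author id, and ranking by timestamp are all hypothetical.

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class Leaf:
    """Hypothetical leaf: keeps recent stories in RAM, keyed by author id."""
    stories: dict = field(default_factory=dict)  # author_id -> [(timestamp, text)]

    def add(self, author_id, timestamp, text):
        self.stories.setdefault(author_id, []).append((timestamp, text))

    def query(self, author_ids):
        # Return every story this leaf holds for the requested authors.
        return [s for a in author_ids for s in self.stories.get(a, [])]


class Aggregator:
    """Hypothetical aggregator: fans out to all leaves, then ranks (CPU work)."""

    def __init__(self, leaves):
        self.leaves = leaves

    def feed(self, friend_ids, limit=3):
        candidates = []
        for leaf in self.leaves:  # scatter: ask every leaf in the rack
            candidates.extend(leaf.query(friend_ids))
        # gather: keep the `limit` newest stories
        return heapq.nlargest(limit, candidates)


# 40 leaves, one per server in the rack; stories are sharded by author id.
leaves = [Leaf() for _ in range(40)]
for i in range(100):
    author = i % 10
    leaves[author % len(leaves)].add(author, timestamp=i, text=f"story {i}")

agg = Aggregator(leaves)
top = agg.feed(friend_ids=[1, 2, 3], limit=3)
print(top)
```

Because every aggregator talks to every leaf, the pattern depends on the high in-rack network bandwidth noted above.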
[Diagram: a rack of servers, each running an aggregator (A) and a leaf (L). Labels: Front end, Back end]
Life of a “hit”
[Diagram: timeline of one request, from “request starts” to “request completes”, across Front-End and Back-End. Components shown: Web, MC, Ads, Database, Feed agg, leaves (L)]
Standard Systems
Type      | I Web             | III Database            | IV Hadoop           | V Haystack          | VI Feed
CPU       | High: 2 x E5-2670 |                         | Med: 2 x X5650      | Low: 1 x L5630      | High: 2 x E5-2660
Memory    | Low: 16GB         | High: 144GB             | Medium: 48GB        | Low: 18GB           | High: 144GB
Disk      | Low: 250GB        | High IOPS: 3.2 TB Flash | High: 12 x 3TB SATA | High: 12 x 3TB SATA | Medium: 2TB SATA
Services  | Web, Chat         | Database                | Hadoop              | Photos, Video       | Multifeed, Search, Ads
Five Standard Servers
Five Server Types
Advantages:
• Volume pricing
• Re-purposing
• Easier operations - simpler repairs, drivers, DC headcount
• New servers allocated in hours rather than months
Drawbacks:
• 40 major services; 200 minor ones - not all fit perfectly
• Service needs change over time.
Server Processors
• Servers in datacenters use processors that were designed for desktop
computers.
• Intel and AMD have dominated this market with big x86 processors.
Mobile Processors
• Smaller processors for smart phones will pass two criteria by 2014:
  • 64-bit instructions
  • High clock speed - ~2.4 GHz
• It is now reasonable to consider ARM, Atom and even MIPS processors
for big compute jobs.
Compute Power
[Charts: Cores Required; Watts Required]
The Problem
• Big processors provide a cost advantage by amortizing fixed costs in the
servers.
• If all other costs remain the same, then wimpy cores (ARM, MIPS, Atom) will effectively triple the price of fixed resources:
  • Rack, chassis, disk, RAM, NIC, etc.
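The amortization argument can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that a wimpy-core server delivers one third of the throughput of a big-processor server and that both carry the same fixed cost; the dollar figure and the throughput target are hypothetical, not numbers from the talk.

```python
import math


def servers_needed(target_throughput, per_server_throughput):
    """Whole servers required to reach the target throughput."""
    return math.ceil(target_throughput / per_server_throughput)


def fixed_spend(target_throughput, per_server_throughput, fixed_cost_per_server):
    """Total spend on fixed resources (rack, chassis, disk, RAM, NIC, ...)."""
    return servers_needed(target_throughput, per_server_throughput) * fixed_cost_per_server


BIG_THROUGHPUT = 3.0     # relative units per big-processor server (assumption)
WIMPY_THROUGHPUT = 1.0   # relative units per wimpy-core server (assumption)
FIXED_COST = 1000.0      # hypothetical fixed cost per server, same for both
TARGET = 300.0           # hypothetical total throughput required

big_fixed = fixed_spend(TARGET, BIG_THROUGHPUT, FIXED_COST)
wimpy_fixed = fixed_spend(TARGET, WIMPY_THROUGHPUT, FIXED_COST)
ratio = wimpy_fixed / big_fixed
print(f"fixed spend: big={big_fixed:.0f} wimpy={wimpy_fixed:.0f} ratio={ratio:.1f}x")
```

Under these assumptions the same workload needs three times as many servers, so the spend on fixed resources triples even if the processors themselves are cheaper.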
Our Solution: Group Hug
• Facebook is driving a solution through the Open Compute initiative:
• Group Hug server board:
  • Allows up to 10 individual compute boards
  • Single-processor PCIe-like cards
  • 1 Gb interfaces mux’ed up to a 10 Gb NIC
  • No drives, flash, or peripherals
• ==> 3 to 5x the processors compared to a dual-socket system
• ==> About the same throughput and power.
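Two of the Group Hug numbers can be sanity-checked directly. The sketch below is a rough illustration: reading "3 to 5x" as 6 to 10 populated card slots compared with a 2-socket server is an assumption, and both function names are hypothetical.

```python
def processor_ratio(cards_per_board, sockets_per_server=2):
    """Processors on one Group Hug board relative to a dual-socket server."""
    return cards_per_board / sockets_per_server


def oversubscription(cards, card_gbps, uplink_gbps):
    """Aggregate card bandwidth divided by the shared uplink bandwidth."""
    return cards * card_gbps / uplink_gbps


print(processor_ratio(6))           # 6 cards populated (assumption)
print(processor_ratio(10))          # all 10 slots populated
print(oversubscription(10, 1, 10))  # ten 1 Gb links into one 10 Gb NIC
```

With all ten slots filled, the ten 1 Gb interfaces exactly match the 10 Gb uplink, so the shared NIC is not oversubscribed.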
Disaggregated Rack Challenge
• Can we build hardware that will fit more services and still do well in terms of serviceability and cost?
• Can we build hardware that will grow with services over time?
• What might it look like to support Group Hug?
Server/Service Fit - across services
[Diagram: CPU and RAM usage of “Other Service A” and MultiFeed on a TYPE-6 server; one case is marked WASTED CPU RESOURCE]
Server/Service Fit - over time
[Diagram: CPU and RAM usage of a service on a TYPE-6 server in Year 1 and Year 2; in Year 2 more CPU is needed, marked NOT ENOUGH CPU]
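The fit problem can be made concrete with a small calculation. The sketch below is illustrative only: the server shape (16 cores, 144GB) follows the Type-6 figures elsewhere in the deck, but the service demands are hypothetical numbers chosen to show the two failure modes.

```python
import math


def fit(cpu_demand_cores, ram_demand_gb, cores_per_server=16, ram_per_server_gb=144):
    """Servers needed for a service on a fixed server shape, and what sits idle."""
    servers = max(
        math.ceil(cpu_demand_cores / cores_per_server),
        math.ceil(ram_demand_gb / ram_per_server_gb),
    )
    return {
        "servers": servers,
        "idle_cores": servers * cores_per_server - cpu_demand_cores,
        "idle_ram_gb": servers * ram_per_server_gb - ram_demand_gb,
    }


# Hypothetical RAM-heavy service: RAM dictates the server count, CPU is wasted.
year1 = fit(cpu_demand_cores=160, ram_demand_gb=5760)

# Same service a year later with more CPU needed: now RAM is stranded instead.
year2 = fit(cpu_demand_cores=960, ram_demand_gb=5760)

print(year1)
print(year2)
```

In both cases the scarcer resource sets the server count, and the other resource is bought but left idle.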
In-Rack Resources
Building blocks:
• CPU
• RAM (key/value pairs)
• Disk IOPS
• Disk space
• Flash IOPS
• Flash space
Common resource pairs:
• CPU vs RAM
• RAM vs Disk IOPS
• RAM vs Flash IOPS
Growth resources:
• RAM
• Disk space
• Flash space
Disaggregated Rack
How can we build hardware that is highly configurable
and re-configurable but still cost effective?
A rack of multifeed servers...
40 Feed (Type-6) servers per rack plus a network switch, each server with:
• 2 x E5-2660
• 144GB RAM
• 2TB hard drives
• 760GB of flash
* We assume full line-rate network within the rack.
=> Rack totals:
• COMPUTE: 80 processors, 640 cores
• RAM: 5.8 TB
• STORAGE: 80 TB
• FLASH: 30 TB
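The rack totals follow from multiplying the per-server figures by 40. The short check below assumes 8 cores per E5-2660, which is what the 80-processor, 640-core total implies; the variable names are illustrative.

```python
SERVERS_PER_RACK = 40

# Per-server resources for a Type-6 feed server.
per_server = {
    "processors": 2,
    "cores": 2 * 8,   # assumption: 8 cores per E5-2660
    "ram_gb": 144,
    "disk_tb": 2,
    "flash_gb": 760,
}

rack_totals = {name: amount * SERVERS_PER_RACK for name, amount in per_server.items()}

print(rack_totals)
print(f"RAM:   {rack_totals['ram_gb'] / 1000:.1f} TB")
print(f"Flash: {rack_totals['flash_gb'] / 1000:.1f} TB")
```

The RAM and flash totals come to 5.76 TB and 30.4 TB, which the slide rounds to 5.8 TB and 30 TB.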
Compute
• Standard Server
  • 2 processors
  • 8 or 16 DIMM slots
  • no hard drive - small flash boot partition
  • big NIC - 10 Gbps or more
• Group Hug
  • 10 individual single-proc servers
  • A few DIMMs
  • no hard drive - small flash boot partition
  • smaller NICs to 10 Gbps
RAM Sled
• Hardware
  • 128GB to 512GB
  • compute: FPGA, ASIC, mobile processor or desktop processor
• Performance
  • 450k to 1 million key/value gets/sec
• Cost
  • Excluding RAM cost: $500 to $700 or a few dollars per GB
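The "few dollars per GB" figure can be reproduced from the stated ranges. The sketch below simply divides the sled cost (excluding RAM) by the RAM capacity at both extremes; the function name is illustrative.

```python
def overhead_per_gb(sled_cost_usd, capacity_gb):
    """Non-RAM sled cost spread over the RAM capacity it hosts."""
    return sled_cost_usd / capacity_gb


best_case = overhead_per_gb(500, 512)    # cheapest sled, largest capacity
worst_case = overhead_per_gb(700, 128)   # priciest sled, smallest capacity

print(f"${best_case:.2f} to ${worst_case:.2f} per GB")
```

The overhead ranges from about $1 to about $5.50 per GB, depending on how densely the sled is populated.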
Storage Sled (Knox)
• Hardware
  • 15 drives
  • Replace SAS expander w/ small server
• Performance
  • 3k IOPS
• Cost
  • Excluding drives: $500 to $700 or less than $0.01 per GB
Flash Sled
• Hardware
  • 175GB to 18TB of flash
• Performance
  • 600k IOPS
• Cost
  • Excluding flash cost: $500 to $700

NIC at 70% utilization | IOPS  | Capacity
1 Gbps                 | 21k   | 175 GB
10 Gbps                | 210k  | 1.75 TB
25 Gbps                | 525k  | 4.4 TB
40 Gbps                | 840k  | 7.7 TB
50 Gbps                | 1.05M | 8.8 TB
100 Gbps               | 2.1M  | 17.5 TB
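The table appears to scale IOPS and capacity linearly with NIC speed. The sketch below reproduces it under one assumption that the slides do not state: each I/O is 4 KiB, which makes 70% of a 1 Gbps link come out at roughly 21k IOPS. The 175 GB per Gbps capacity factor is taken from the 1 Gbps row.

```python
UTILIZATION = 0.70      # from the table header
IO_SIZE_BYTES = 4096    # assumption: 4 KiB per I/O (not stated in the slides)
GB_PER_GBPS = 175       # capacity per Gbps, taken from the 1 Gbps row


def nic_limited_iops(link_gbps):
    """I/Os per second that fit through the NIC at the given utilization."""
    usable_bytes_per_sec = link_gbps * 1e9 * UTILIZATION / 8
    return usable_bytes_per_sec / IO_SIZE_BYTES


def table_row(link_gbps):
    # Round the 1 Gbps figure to the nearest thousand, then scale linearly.
    base_iops = round(nic_limited_iops(1), -3)
    return link_gbps, base_iops * link_gbps, GB_PER_GBPS * link_gbps


rows = [table_row(g) for g in (1, 10, 25, 40, 50, 100)]
for gbps, iops, cap_gb in rows:
    print(f"{gbps:>3} Gbps  {iops / 1000:>6.0f}k IOPS  {cap_gb / 1000:>5.2f} TB")
```

Linear scaling matches every IOPS entry and most capacities after rounding; for the 40 Gbps row it gives 7.0 TB, whereas the table lists 7.7 TB.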
A disaggregated rack for graph search...
Rack contents (plus a network switch):
• 20 Compute Servers
• 8 Flash Sleds
• 2 RAM Sleds
• 1 Storage Sled
=> Rack totals:
• COMPUTE: 40 processors, 320 cores
• RAM: 3.1 TB
• STORAGE: 60 TB
• FLASH: 30 TB
=> 1:10 RAM:Flash ratio
* Add 4 more flash sleds in 2014 to get to a 1:15 RAM:Flash ratio
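The RAM:Flash ratios can be checked from the rack totals. The sketch below assumes the 30 TB of flash is split evenly across the 8 flash sleds (3.75 TB each) and that added sleds have the same capacity; neither detail is stated in the slides.

```python
RAM_TB = 3.1
FLASH_TB = 30.0
FLASH_SLEDS = 8

# Assumption: flash is spread evenly, so each sled holds 3.75 TB.
TB_PER_FLASH_SLED = FLASH_TB / FLASH_SLEDS


def flash_per_unit_ram(flash_sleds):
    """TB of flash per TB of RAM for a given number of flash sleds."""
    return flash_sleds * TB_PER_FLASH_SLED / RAM_TB


today = flash_per_unit_ram(FLASH_SLEDS)
after_upgrade = flash_per_unit_ram(FLASH_SLEDS + 4)

print(f"today:       1:{today:.1f}")
print(f"+4 sleds:    1:{after_upgrade:.1f}")
```

The ratio moves from roughly 1:10 to roughly 1:15 by adding sleds alone, without touching the compute or RAM already in the rack.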
Disaggregated Rack
Strengths:
• Volume pricing, serviceability, etc.
• Custom Configurations
• Hardware evolves with service
• Smarter Technology Refreshes
• Speed of Innovation
Potential issues:
• Physical changes required
• Interface overhead
Questions?