
JEDEC Jason Taylor 2014 October 27

Mar 15, 2022

Transcript
Page 1: JEDEC Jason Taylor 2014 October 27
Page 2: JEDEC Jason Taylor 2014 October 27

DIMM-Based Interconnect Disaggregated Rack & Ideas for an Interconnect.

Jason Taylor, PhD, VP, Infrastructure Foundation

JEDEC, October 27th, 2014

Vijay Rao, Director, Technology Strategy

Page 3: JEDEC Jason Taylor 2014 October 27

Data Centers in 5 regions.

Facebook Scale

84% of monthly active users are outside of the U.S. (as of Q2’13)

Page 4: JEDEC Jason Taylor 2014 October 27

Facebook Stats

• 1.32 billion users (6/2014)

• 829 million people use Facebook daily

• 350+ million photos added per day (1/2013)

• 4.5 billion likes, posts and comments per day (5/2013)

Page 5: JEDEC Jason Taylor 2014 October 27

Lots of “vanity free” servers.

Page 6: JEDEC Jason Taylor 2014 October 27

Architecture

Front-End Cluster

• Web: 250 racks

• Ads: 30 racks

• Cache (~144TB)

• Multifeed: 9 racks

• Other small services

Service Cluster

• Search, Photos, Msg, Others

Back-End Cluster

• UDB, ADS-DB, Tao Leader

Page 7: JEDEC Jason Taylor 2014 October 27

News Feed rack

• The rack is our unit of capacity

• All 40 servers work together

• Leaf + agg code runs on all servers

• Leaf has most of the RAM

• Aggregator uses most of the CPU

• Lots of network BW within the rack

[Diagram: Leaf (L) and Aggregator (A) running on every server in the rack]
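The leaf/aggregator split on this slide can be pictured as a scatter-gather query. The sketch below is only an illustration under stated assumptions: the `Leaf` and `Aggregator` classes, the `(score, story_id)` tuples, and ranking by score are hypothetical and not taken from the talk, which says only that leaves hold most of the RAM and aggregators use most of the CPU.

```python
import heapq


class Leaf:
    """RAM-heavy role: keeps a shard of recent stories in memory (hypothetical)."""

    def __init__(self):
        self.stories = {}  # author_id -> list of (score, story_id)

    def add(self, author_id, score, story_id):
        self.stories.setdefault(author_id, []).append((score, story_id))

    def fetch(self, author_ids):
        # Return every stored story written by the requested authors.
        found = []
        for author_id in author_ids:
            found.extend(self.stories.get(author_id, []))
        return found


class Aggregator:
    """CPU-heavy role: fans a query out to all leaves and ranks the results."""

    def __init__(self, leaves):
        self.leaves = leaves

    def top_stories(self, friend_ids, k):
        candidates = []
        for leaf in self.leaves:  # this fan-out is the in-rack network traffic
            candidates.extend(leaf.fetch(friend_ids))
        return heapq.nlargest(k, candidates)  # highest score first


# One leaf per server, 40 servers per rack as on the slide.
leaves = [Leaf() for _ in range(40)]
leaves[0].add(author_id=1, score=0.9, story_id="s1")
leaves[1].add(author_id=2, score=0.5, story_id="s2")
leaves[2].add(author_id=3, score=0.7, story_id="s3")

aggregator = Aggregator(leaves)
result = aggregator.top_stories(friend_ids=[1, 2], k=2)
print(result)
```

Because every query touches every leaf, the rack behaves as a single unit, which matches the slide's points about in-rack bandwidth and the rack being the unit of capacity.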

Page 8: JEDEC Jason Taylor 2014 October 27

Standard Systems

Type     | I Web              | III Database             | IV Hadoop           | V Photos            | VI Feed
CPU      | High (2 x E5-2670) | High (2 x E5-2660)       | High (2 x E5-2660)  | Low                 | High (2 x E5-2660)
Memory   | Low                | High (144GB)             | Medium (64GB)       | Low                 | High (144GB)
Disk     | Low                | High IOPS (3.2 TB Flash) | High (15 x 4TB SAS) | High (15 x 4TB SAS) | Medium
Services | Web, Chat          | Database                 | Hadoop (big data)   | Photos, Video       | Multifeed, Search, Ads

Five Standard Servers/Rack

Page 9: JEDEC Jason Taylor 2014 October 27

Five Server Types

• Advantages:

• Volume pricing

• Re-purposing

• Easier operations - simpler repairs, drivers, DC headcount

• New servers allocated in hours rather than months


• Drawbacks:

• 40 major services; 200 minor ones - not all fit perfectly

• The needs of the service change over time.

Page 10: JEDEC Jason Taylor 2014 October 27

@FB the Rack is the Computer.

The application lives on a rack of equipment--not a single server.


Most of the data required to produce a response comes from off-host.

— Application servers are not common.

— A tier of servers has persistent state.

Page 11: JEDEC Jason Taylor 2014 October 27

Disaggregated Rack

• Better component/service fit

• Extending component useful life


Developing New Components

• CPU, RAM, Disk & Flash

@FB the Rack is the Computer.

Page 12: JEDEC Jason Taylor 2014 October 27

A rack of news feed servers...

[Diagram: a rack of Type-6 servers plus a network switch; every server runs a Leaf (L) and an Aggregator (A)]

=> Rack totals:

• COMPUTE: 80 processors, 640 cores

• RAM: 5.8 TB

• STORAGE: 80 TB

• FLASH: 30 TB

The application lives on a rack of equipment--not a single server.
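The compute and RAM totals for the news feed rack follow from the per-server figures given earlier in the deck. The short calculation below is a sketch of that arithmetic; the 8 cores per processor is an assumption based on the E5-2660 being an 8-core part, and the variable names are made up for the example.

```python
SERVERS_PER_RACK = 40       # "All 40 servers work together"
PROCESSORS_PER_SERVER = 2   # Type VI (Feed): 2 x E5-2660
CORES_PER_PROCESSOR = 8     # assumption: the E5-2660 has 8 cores
RAM_GB_PER_SERVER = 144     # Type VI (Feed) memory

processors = SERVERS_PER_RACK * PROCESSORS_PER_SERVER
cores = processors * CORES_PER_PROCESSOR
ram_tb = SERVERS_PER_RACK * RAM_GB_PER_SERVER / 1000  # decimal terabytes

print(f"{processors} processors, {cores} cores, {ram_tb:.1f} TB RAM")
```

These values line up with the 80 processors, 640 cores, and 5.8 TB shown on the slide.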

Page 13: JEDEC Jason Taylor 2014 October 27

Compute

• Standard Server

• 2 processors (or many)

• 8 or 16 DIMM slots

• no hard drive - small flash boot partition.

• big NIC - 10 Gbps or more

Page 14: JEDEC Jason Taylor 2014 October 27

RAM Sled

• Hardware:

• 128GB to 512GB

• compute: FPGA, ASIC, mobile processor or desktop processor

• Performance:

• 450k to 1 million key/value gets/sec

• Cost:

• Excluding RAM cost: $500 to $700 or a few dollars per GB

Page 15: JEDEC Jason Taylor 2014 October 27

Storage Sled (Knox)

• Hardware:

• 15 - 30 drives

• Replace SAS expander w/ small server

• Performance:

• 3k IOPS

• Cost:

• Excluding drives: $500 to $700 or less than $0.01 per GB
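The per-GB figures quoted for the RAM sled and the storage sled can be sanity-checked by dividing the sled cost by the capacity it hosts. This is a rough sketch: the 4 TB drive size is an assumption borrowed from the "15 x 4TB SAS" entry in the standard systems table, and `overhead_per_gb` is a hypothetical helper.

```python
def overhead_per_gb(sled_cost_usd, capacity_gb):
    """Sled cost (excluding the RAM or drives themselves) per GB hosted."""
    return sled_cost_usd / capacity_gb


# RAM sled: $500 to $700 excluding RAM, hosting 128 GB to 512 GB.
ram_best = overhead_per_gb(500, 512)   # cheapest sled, largest capacity
ram_worst = overhead_per_gb(700, 128)  # priciest sled, smallest capacity

# Storage sled: $500 to $700 excluding drives, hosting 15 to 30 drives.
DRIVE_GB = 4000  # assumption: 4 TB drives, as in the Type IV/V servers
disk_best = overhead_per_gb(500, 30 * DRIVE_GB)
disk_worst = overhead_per_gb(700, 15 * DRIVE_GB)

print(f"RAM sled:     ${ram_best:.2f} to ${ram_worst:.2f} per GB")
print(f"Storage sled: ${disk_best:.4f} to ${disk_worst:.4f} per GB")
```

Under these assumptions the RAM sled adds roughly one to five dollars per GB, and the storage sled adds about a cent per GB or less, with only the priciest 15-drive case landing slightly above one cent.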

Page 16: JEDEC Jason Taylor 2014 October 27

Flash Sled

• Hardware:

• 30TB to 512TB of flash

• Performance:

• 5k - 10k IOPS/TB

• 500 write/erase cycles (roughly TLC-class endurance)

• Interface:

• SAS, Ethernet, DIMM, or other.

NIC at 70% utilization | IOPS (7.5k/TB) | Capacity
10 Gbps                | 225k           | 30 TB
15 Gbps                | 300k           | 40 TB
25 Gbps                | 480k           | 64 TB
30 Gbps                | 600k           | 80 TB
90 Gbps                | 1.9M           | 256 TB
180 Gbps               | 3.8M           | 512 TB
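The IOPS column of the flash sled table is the capacity multiplied by 7.5k IOPS per TB. The sketch below reproduces that column and also estimates the transfer size per I/O implied by pairing each IOPS figure with a NIC at 70% utilization. The I/O size does not appear in the slides; it is an inference from the table, and the function names are hypothetical.

```python
IOPS_PER_TB = 7_500  # midpoint of the 5k - 10k IOPS/TB range on the slide


def sled_iops(capacity_tb):
    """IOPS a flash sled of this capacity is expected to serve."""
    return capacity_tb * IOPS_PER_TB


def implied_bytes_per_io(nic_gbps, capacity_tb, utilization=0.7):
    """Bytes per I/O if the sled's IOPS exactly fill the NIC at this utilization."""
    usable_bytes_per_s = nic_gbps * 1e9 * utilization / 8
    return usable_bytes_per_s / sled_iops(capacity_tb)


# (NIC Gbps, capacity TB) pairs copied from the table.
rows = [(10, 30), (15, 40), (25, 64), (30, 80), (90, 256), (180, 512)]
for nic_gbps, capacity_tb in rows:
    size = implied_bytes_per_io(nic_gbps, capacity_tb)
    print(f"{capacity_tb:>4} TB  {sled_iops(capacity_tb):>9,} IOPS  "
          f"{nic_gbps:>3} Gbps  ~{size:,.0f} bytes per I/O")
```

Every row works out to roughly 4 KB per I/O, which suggests the NIC sizes were chosen for small random reads, though the talk does not state this.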

Page 17: JEDEC Jason Taylor 2014 October 27

Three Disaggregated Rack Wins!

• Server/Service Fit - across services

• Server/Service Fit - over time

• Longer useful life through smarter hardware refreshes.

Page 18: JEDEC Jason Taylor 2014 October 27

Server/Service Fit - across services

[Figure: CPU and RAM utilization on a TYPE-6 server for MultiFeed (news feed) versus Other Service A; the mismatch is labeled WASTED CPU RESOURCE]

Page 19: JEDEC Jason Taylor 2014 October 27

Server/Service Fit - over time

[Figure: CPU and RAM utilization on a TYPE-6 server in Year 1 versus Year 2 (more RAM needed); the Year 2 shortfall is labeled NOT ENOUGH RAM]

Page 20: JEDEC Jason Taylor 2014 October 27

Longer Useful Life

Today servers are typically kept in production for about 3 years.

With disaggregated rack:

• Compute - 3 to 6 years

• RAM sled - 5 years or more

• Disk sled - 4 to 5 years depending on usage

• Flash sled - 6 years depending on write volume

Page 21: JEDEC Jason Taylor 2014 October 27

A Disaggregated Rack for Graph Search...

[Diagram: a rack holding a network switch, compute servers, flash sleds, RAM sleds, and a storage sled]

Rack contents:

• 20 Compute Servers

• 8 Flash Sleds

• 2 RAM Sleds

• 1 Storage Sled

=> 1:10 RAM:Flash ratio

Rack totals:

• COMPUTE: 40 processors, 320 cores

• RAM: 3.1 TB

• STORAGE: 60 TB

• FLASH: 30 TB

* Add 4 more flash sleds in 2014 to get to a 1:15 RAM:Flash ratio *
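The RAM:Flash ratios quoted for the Graph Search rack can be checked from the rack totals. This sketch assumes that the 30 TB of flash is spread evenly over the 8 flash sleds and that any added sleds have the same capacity; neither assumption is stated in the slides.

```python
RAM_TB = 3.1      # rack total from the slide
FLASH_TB = 30     # rack total from the slide
FLASH_SLEDS = 8   # flash sleds in the initial rack

# Assumption: flash is split evenly, and added sleds match the existing ones.
flash_per_sled_tb = FLASH_TB / FLASH_SLEDS


def flash_per_ram(num_flash_sleds):
    """TB of flash per TB of RAM for a given number of flash sleds."""
    return num_flash_sleds * flash_per_sled_tb / RAM_TB


initial = flash_per_ram(FLASH_SLEDS)
expanded = flash_per_ram(FLASH_SLEDS + 4)  # "Add 4 more flash sleds"

print(f"initial:  1:{initial:.1f}")
print(f"expanded: 1:{expanded:.1f}")
```

Under these assumptions the ratios come out near 1:9.7 and 1:14.5, consistent with the rounded 1:10 and 1:15 on the slide. The ratio changes by adding sleds alone, without touching the compute servers.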

Page 22: JEDEC Jason Taylor 2014 October 27

Disaggregated Rack

• Strengths:

• Volume pricing, serviceability, etc.

• Custom Configurations

• Hardware evolves with service

• Smarter Technology Refreshes

• Speed of Innovation

• Potential issues:

• Physical changes required

• Interface overhead

Page 23: JEDEC Jason Taylor 2014 October 27

Approximate Win Estimates!

Conservative assumptions show a 12% to 20% opex savings.

More aggressive assumptions promise between 14% and 30% opex savings.

* These are reasonable savings estimates of what may be possible across several use cases.

Page 24: JEDEC Jason Taylor 2014 October 27

DIMM-Based Interconnect

[Diagram: a CPU with DIMM slots CH1 DIMM1, CH1 DIMM2, and CH2 DIMM1; one slot carries a specialized DIMM connector]

Connect a DIMM slot to a flash sled or to another server, bypassing the traditional network stack of NIC, TOR, and cluster switch.

Page 25: JEDEC Jason Taylor 2014 October 27

DIMM-Based Interconnect

[Diagram: Server 1 (CPU1, CH1 DIMM1) and Server 2 (CPU2, CH1 DIMM1) linked through a specialized DIMM connector]

RAM-to-RAM Copy?

Page 26: JEDEC Jason Taylor 2014 October 27

DIMM-Based Interconnect

[Diagram: Server 1 (CPU1, CH1 DIMM1) linked through a specialized DIMM connector to a NIC and a network switch]

Network-to-RAM Copy?

Page 27: JEDEC Jason Taylor 2014 October 27

Interrupt Driven DIMM Mechanism

[Diagram: a CPU with DIMM slots CH1 DIMM1, CH1 DIMM2, and CH2 DIMM1, plus a PCH; labels: "Pin 230 on DDR4 connector" and "GPIO 1 Interrupt to CPU"]

Interrupt Mechanism = Kernel modifications + Repurposed interrupt pin = No polling
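The difference between polling and the interrupt-driven approach proposed here can be illustrated with a user-space analogy. This is not the kernel mechanism from the slide: `PolledDevice`, `InterruptDevice`, and the use of `threading.Event` as a stand-in for the repurposed interrupt pin are all assumptions made for illustration.

```python
import threading
import time


class PolledDevice:
    """Stand-in for a DIMM-attached device that the host must poll."""

    def __init__(self):
        self.ready = False
        self.data = None

    def complete(self, data):
        self.data = data   # publish the data first,
        self.ready = True  # then flip the flag the host is polling


def poll_for_data(device, interval_s=0.001):
    """Check the ready flag repeatedly; count the checks that found nothing."""
    wasted_polls = 0
    while not device.ready:
        wasted_polls += 1
        time.sleep(interval_s)
    return device.data, wasted_polls


class InterruptDevice:
    """Stand-in for a device that signals completion; the Event plays the pin."""

    def __init__(self):
        self._irq = threading.Event()
        self.data = None

    def raise_interrupt(self, data):
        self.data = data
        self._irq.set()  # analogous to asserting the interrupt line

    def wait_for_data(self, timeout_s=1.0):
        # The waiter blocks here and does no work until the signal arrives.
        if not self._irq.wait(timeout_s):
            raise TimeoutError("no interrupt received")
        return self.data


polled = PolledDevice()
threading.Timer(0.02, polled.complete, args=(b"page",)).start()
polled_data, wasted_polls = poll_for_data(polled)

signalled = InterruptDevice()
threading.Timer(0.02, signalled.raise_interrupt, args=(b"page",)).start()
signalled_data = signalled.wait_for_data()

print(f"polling: {wasted_polls} empty checks; interrupt: 0 empty checks")
```

Both paths deliver the same data, but the polling path spends cycles on checks that find nothing, which is the overhead the "No polling" equation on the slide aims to remove.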

Page 28: JEDEC Jason Taylor 2014 October 27