The x86 Server Platform .. Resistance is futile…. Dec 6, 2004

Mar 26, 2015

Amber Kerr

The x86 Server Platform

.. Resistance is futile….

Dec 6, 2004

Server shipments – Total vs x86

Market Share: Servers, United States, 2Q04

United States: Vendor Revenue by Operating System (Millions of Dollars)

OS        2Q03     3Q03     4Q03     1Q04     2Q04     Market Share 2Q03  Market Share 2Q04  Growth 2Q03-2Q04  Growth 1Q04-2Q04
Windows   1,534.1  1,692.3  1,671.6  1,645.6  1,665.5  34.79%             36.18%             8.6%              1.2%
Unix      1,622.6  1,474.6  1,554.1  1,374.2  1,471.9  36.79%             31.98%             -9.3%             7.1%
Others    820.2    823.7    1,142.4  897.2    852.6    18.60%             18.52%             3.9%              -5.0%
Linux     433.2    497.3    552.5    555.0    613.2    9.82%              13.32%             41.5%             10.5%
Total     4,410.2  4,487.9  4,920.7  4,472.1  4,603.1  100.00%            100.00%            4.4%              2.9%

Source: Michael McLaughlin, “Market Share: Servers, United States, 2Q04,” Gartner, 7 October 2004
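The share and growth columns follow from the revenue columns. The Python sketch below recomputes them, assuming share = OS revenue ÷ that quarter's published total and growth = (later ÷ earlier) − 1. The figures are copied from the table; the names `REVENUE`, `TOTAL`, `market_share` and `growth` are illustrative only. Because the published inputs are already rounded, a recomputed value can differ from the table in the last digit (Others, 2Q03–2Q04, comes out near 3.95% rather than 3.9%).

```python
# Revenue in millions of US dollars, copied from the Gartner table above.
REVENUE = {
    "Windows": {"2Q03": 1534.1, "1Q04": 1645.6, "2Q04": 1665.5},
    "Unix":    {"2Q03": 1622.6, "1Q04": 1374.2, "2Q04": 1471.9},
    "Others":  {"2Q03": 820.2,  "1Q04": 897.2,  "2Q04": 852.6},
    "Linux":   {"2Q03": 433.2,  "1Q04": 555.0,  "2Q04": 613.2},
}
# Published totals (the rows sum to these only up to rounding).
TOTAL = {"2Q03": 4410.2, "1Q04": 4472.1, "2Q04": 4603.1}


def market_share(os_name: str, quarter: str) -> float:
    """Share of total server revenue in a quarter, in percent."""
    return 100.0 * REVENUE[os_name][quarter] / TOTAL[quarter]


def growth(os_name: str, start: str, end: str) -> float:
    """Revenue growth between two quarters, in percent."""
    return 100.0 * (REVENUE[os_name][end] / REVENUE[os_name][start] - 1.0)


if __name__ == "__main__":
    for name in REVENUE:
        print(
            f"{name:8} share 2Q04 {market_share(name, '2Q04'):6.2f}%  "
            f"growth 2Q03-2Q04 {growth(name, '2Q03', '2Q04'):6.1f}%"
        )
```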

x86 Platform CPUs

Intel

• Xeon MP – Gallatin (future is Potomac)

• Xeon SP/DP – EM64T – Nocona

• Itanium II MP – Madison (future is Montecito)

AMD

• Opteron

Gallatin – MP

130 nm

3 GHz

4 MB L3 Cache

FSB - 400 MHz

ES7000 – 32 Gallatins

Nocona – Single Processor with EM64T

90 nm

Clock Speed – 3.2-3.6 GHz

L3 – 4 MB

FSB – 800 MHz

Itanium II – Madison

130 nm

9 MB L3 cache

1.6 GHz

FSB – 400 MHz

Why Multi-Core? And while we’re at it, why Multi-Threading?

It’s all about balancing

• Silicon real estate

• Compiler technology

• Cost

• Power

… against the constant pressure to double performance every 18 months
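Doubling every 18 months compounds quickly. A minimal sketch, assuming a smooth exponential with an 18-month doubling period; `performance_factor` is a hypothetical helper, not something from the slides.

```python
def performance_factor(years: float, doubling_months: float = 18.0) -> float:
    """Relative performance after `years` if it doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    for years in (1.5, 3, 5, 10):
        print(f"{years:>4} years -> {performance_factor(years):7.1f}x")
```

Under that assumption, three years means 4x and ten years roughly 100x, which is the target the balance of real estate, compilers, cost and power has to meet.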

Memory Latency vs CPU Speed

[Chart: microprocessor on-chip clock frequency (GHz) and commodity DRAM access frequency (1/ns), log scale 0.01–10, by production year 1990–2010]
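The gap in the chart is easiest to read in CPU clock cycles: a clock of f GHz ticks f times per nanosecond, so a DRAM access of t ns costs t × f cycles. The sketch below holds a hypothetical 100 ns access time fixed while the clock rises; `stall_cycles` is an illustrative name and the numbers are not taken from the chart.

```python
def stall_cycles(dram_latency_ns: float, clock_ghz: float) -> float:
    """CPU clock cycles that elapse while one DRAM access completes.

    A clock of f GHz ticks f times per nanosecond, so cycles = ns * GHz.
    """
    return dram_latency_ns * clock_ghz


if __name__ == "__main__":
    latency_ns = 100.0  # hypothetical DRAM access time, held fixed
    for ghz in (0.1, 1.0, 3.0):
        print(f"{ghz:4.1f} GHz: {stall_cycles(latency_ns, ghz):6.0f} cycles per DRAM access")
```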

Processor Architecture

When latency → 0 and bandwidth → ∞, we will have the perfect CPU

A great deal of innovation has centered on approximating this perfect world

• CISC

• CPU Cache

• RISC

• EPIC

• Multi-Threading

• Multiple Cores

Complex Instruction Set Computer (CISC)

Hardware implements assembler instructions directly

MULT A, B

• The hardware loads the registers, multiplies, and stores the result

• Multiple clock cycles are needed per instruction

RAM requirements are relatively small

Compilers translate high-level languages down to assembler instructions – von Neumann

http://www.hardwarecentral.com/hardwarecentral/tutorials/2427

CPU Cache

When CPU speeds started to increase, memory latency emerged as a bottleneck

CPU caches were used to keep local references “close” to the CPU

For SMP systems, memory banks were more than one clock cycle away

• It is not uncommon today to find 3 orders of magnitude between the fastest and slowest memory latency
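One standard way to quantify why caches help is the average memory access time (AMAT), applied level by level: AMAT = hit time + miss rate × (cost of the next level). A sketch with hypothetical latencies and miss rates (nothing here comes from the slides); `average_access_time` is an illustrative name.

```python
def average_access_time(levels, memory_latency):
    """Average memory access time (AMAT), in cycles.

    levels: (hit_time, miss_rate) pairs, ordered from the cache closest to
    the CPU outward. memory_latency: cost of going all the way to DRAM.
    AMAT = h1 + m1 * (h2 + m2 * (... + memory_latency))
    """
    time = memory_latency
    # Work from the outermost level inward.
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


if __name__ == "__main__":
    # Hypothetical: 1-cycle L1 (5% misses), 10-cycle L2 (20% misses), 200-cycle DRAM.
    print(average_access_time([(1, 0.05), (10, 0.20)], 200))
    # With no caches, every reference pays the full DRAM latency.
    print(average_access_time([], 200))
```

With these made-up numbers, two cache levels bring the average reference from 200 cycles down to 3.5 cycles.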

Reduced Instruction Set Computer (RISC)

Hardware is simplified – fewer transistors are needed for the full instruction set

RAM requirements are higher to store intermediate results and more code

Compilers are more complex

Clock speeds increase because instructions are simpler

Deterministic, simple instructions allow pipelining
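The `MULT A, B` example from the CISC slide shows the trade-off. The toy interpreter below is a sketch, not a real ISA: it assumes `MULT A, B` means A ← A × B, and the opcodes `LOAD`, `MUL`, `STORE` and registers `R1`, `R2` are hypothetical. The CISC form is one instruction; the load/store form needs four simpler ones, which is the "more code" cost paid for simpler hardware.

```python
def run_cisc_mult(memory, a, b):
    """CISC-style memory-to-memory MULT A, B: one instruction reads A and B,
    multiplies, and writes the product back to A. Returns instructions executed."""
    memory[a] = memory[a] * memory[b]
    return 1


def run_risc(memory, program):
    """Load/store machine: only LOAD and STORE touch memory; MUL works on
    registers. Returns the number of instructions executed."""
    registers = {}
    for op, *args in program:
        if op == "LOAD":
            reg, addr = args
            registers[reg] = memory[addr]
        elif op == "MUL":
            dst, x, y = args
            registers[dst] = registers[x] * registers[y]
        elif op == "STORE":
            addr, reg = args
            memory[addr] = registers[reg]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return len(program)


# MULT A, B rewritten as four simple register-based instructions.
RISC_MULT = [
    ("LOAD", "R1", "A"),
    ("LOAD", "R2", "B"),
    ("MUL", "R1", "R1", "R2"),
    ("STORE", "A", "R1"),
]

if __name__ == "__main__":
    cisc_mem = {"A": 6, "B": 7}
    risc_mem = {"A": 6, "B": 7}
    n_cisc = run_cisc_mult(cisc_mem, "A", "B")
    n_risc = run_risc(risc_mem, RISC_MULT)
    print(cisc_mem["A"], n_cisc, risc_mem["A"], n_risc)
```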

Pipelining

Higher Clock Speeds!

[Diagram labels: 25% busy; 100% busy, 80% busy, 60% busy, 40% busy]
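A rough model of why pipelining raises throughput, assuming an ideal k-stage pipeline with one cycle per stage and no stalls. Without pipelining, each of the k stages is busy one cycle in k (25% for four stages, which may be what the diagram's "25% busy" label refers to); with pipelining, n instructions finish in k + n − 1 cycles instead of n × k. Function names are illustrative.

```python
def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def pipelined_cycles(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: `stages` cycles to fill, then one instruction completes per cycle."""
    return stages + n_instructions - 1 if n_instructions > 0 else 0


def unpipelined_stage_utilization(stages: int) -> float:
    """Without pipelining, each stage is busy for one cycle out of every `stages`."""
    return 1.0 / stages


if __name__ == "__main__":
    n, k = 100, 4
    print(unpipelined_cycles(n, k), pipelined_cycles(n, k))
    # The speedup approaches k as n grows.
    print(unpipelined_cycles(n, k) / pipelined_cycles(n, k))
    print(unpipelined_stage_utilization(k))
```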

Branch Prediction

While instructions are processed in parallel, branches occur

Branch prediction guesses which path a branch will take so the pipeline can keep fetching instructions

If the guess is incorrect, the pipeline is “dead”: its contents are discarded and the CPU stalls

Statistics:

• 10%–20% of instructions are branches

• Predictions are incorrect about 10% of the time

As the pipeline deepens, the probability of a misprediction increases and more cycles are discarded

• 80-deep pipeline / 20% branches / 10% miss rate => roughly an 80% chance that one of the ~16 branches in flight is mispredicted, and a penalty of 80 cycles
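The 80% figure can be reproduced under a simple model: an 80-deep pipeline with 20% branches holds about 16 branches, and if each is mispredicted independently with probability 0.1, the chance that at least one is wrong is 1 − 0.9^16 ≈ 0.81. A sketch, with the independence assumption stated in the code; `prob_mispredict_in_flight` is an illustrative name.

```python
def prob_mispredict_in_flight(depth: int, branch_fraction: float, miss_rate: float) -> float:
    """Probability that at least one branch currently in the pipeline is mispredicted.

    Assumes every pipeline slot holds one instruction and that branches are
    mispredicted independently of one another.
    """
    branches_in_flight = depth * branch_fraction
    return 1.0 - (1.0 - miss_rate) ** branches_in_flight


if __name__ == "__main__":
    for depth in (10, 20, 40, 80):
        p = prob_mispredict_in_flight(depth, 0.20, 0.10)
        print(f"depth {depth:2}: {p:.0%}")
```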

Itanium II EPIC Instruction Set

Explicitly Parallel Instruction Computing

Compiler can indicate code that can be executed in parallel

Both paths of a branch are pipelined

• No cycles are lost to misprediction

Pipeline can be deeper

Complexity continues to move into the compiler
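EPIC pipelines both paths of a branch through predication (if-conversion): a compare sets a predicate, both candidate results are computed, and the predicate decides which one takes effect. The Python below only models that idea; Python does not compile to predicated instructions, and both function names are hypothetical.

```python
def max_with_branch(a: int, b: int) -> int:
    # Conventional code: the CPU must predict which way this branch goes.
    if a > b:
        return a
    return b


def max_predicated(a: int, b: int) -> int:
    # If-converted code: a compare sets a predicate (1 or 0), both candidate
    # results are already available, and arithmetic on the predicate selects
    # one of them. There is no branch left to mispredict.
    p = int(a > b)
    return p * a + (1 - p) * b


if __name__ == "__main__":
    pairs = [(3, 5), (9, 2), (4, 4), (-1, -7)]
    print([max_predicated(a, b) for a, b in pairs])
```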

Multi-Threading

Multiple Cores

Fabrication sizes continue to shrink

The additional real estate has been used to put more and more cache memory on the die

Multi-core technology provides a new way to exploit the additional space

Clock rates cannot continue to climb because of excessive heat

• $P = C \cdot V^2 \cdot f$, where C is the switched capacitance, V the supply voltage, and f the clock frequency

Multiple cores are the next step toward faster execution times for applications
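A worked example of why P = C·V²·f favours more, slower cores. It assumes (hypothetically) that halving the clock lets the supply voltage drop to 80% and that the workload splits perfectly across two cores, so aggregate clock cycles per second stay the same. Units are relative; `dynamic_power` is an illustrative name.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Dynamic switching power, P = C * V^2 * f, in relative units."""
    return capacitance * voltage ** 2 * frequency


if __name__ == "__main__":
    # One core at full clock and full voltage (normalized to 1).
    single = dynamic_power(1.0, 1.0, 1.0)
    # Hypothetical: two cores at half the clock each, assumed to allow the
    # supply voltage to drop to 80%. Total cycles per second are unchanged.
    dual = 2 * dynamic_power(1.0, 0.8, 0.5)
    print(f"single core: {single:.2f}, dual core: {dual:.2f}")
```

Under those assumptions the two-core design delivers the same cycles per second for about 64% of the power. Without the voltage reduction the saving disappears, because P is linear in f but quadratic in V.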

(End of 2005?)

AMD Opteron 800 Series

130 nm

Clock Speed – 1.4-2.4 GHz

L2 – 1 MB

6.4 GB/s HyperTransport

Architectural Comparison

[Diagram: a four-processor Opteron system (144-bit DDR memory; PCI-X bridges, I/O hub, and other bridge; HyperTransport™ links at 6.4 GB/s) compared with a four-processor Xeon system (four Xeons and an SNC; four memory address buffers; three PCI-X bridges; I/O hub; 6.4 GB/s)]
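Both 6.4 GB/s figures are peak bandwidths: transfers per second × bytes per transfer. The sketch assumes a 64-bit front-side bus (reading the slides' "400 MHz" and "800 MHz" FSB speeds as million transfers per second) and a 16-bit HyperTransport link at 800 MHz double data rate, counted in both directions; `peak_bandwidth_gb_s` is a hypothetical helper.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, width_bits: int, directions: int = 1) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s): transfers/s * bytes per transfer * directions."""
    bytes_per_s = mega_transfers_per_s * 1e6 * (width_bits / 8) * directions
    return bytes_per_s / 1e9


if __name__ == "__main__":
    # Assumed 64-bit front-side bus.
    print("FSB at 400 MT/s:", peak_bandwidth_gb_s(400, 64))
    print("FSB at 800 MT/s:", peak_bandwidth_gb_s(800, 64))
    # Assumed 16-bit HyperTransport link, 800 MHz DDR = 1600 MT/s, both directions.
    print("HyperTransport link:", peak_bandwidth_gb_s(1600, 16, directions=2))
```

The peak rates match, so the contrast the diagram draws is one of topology: on the Xeon side the processors reach memory through the SNC and its memory address buffers, while on the Opteron side memory attaches to the processors and HyperTransport links connect them.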

Mapping Workloads onto Architecture

Consider a dichotomy of workloads:

• Large Memory Model – This needs a large, single system image and a large amount of coherent memory

- Database apps - SQL Server / Oracle

- Business Intelligence – Data Warehousing + Analytics

- Memory-resident databases

- 64-bit architectures allow memory addressability above 1 TB

• Small/Medium Memory Model – This can be cost-effective in workloads that do not require extensive shared memory/state

- Stateless Applications and Web Services

- Web Servers

- Clusters of systems for parallelized applications and grids
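The 1 TB threshold is 2^40 bytes, while a flat 32-bit address space stops at 4 GiB. A short sketch of the arithmetic; the helper names are illustrative. Actual 64-bit processors implement fewer physical address bits than 64, so the usable limit is set by the implementation rather than by the full 2^64.

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a flat, byte-addressed space of the given width."""
    return 2 ** address_bits


def human(n_bytes: int) -> str:
    """Format a byte count with binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value, unit = float(n_bytes), 0
    while value >= 1024 and unit < len(units) - 1:
        value /= 1024
        unit += 1
    return f"{value:g} {units[unit]}"


if __name__ == "__main__":
    for bits in (32, 40, 64):
        print(f"{bits}-bit addresses: {human(addressable_bytes(bits))}")
```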

Large Server Vendors

Intel announcement (Nov 19): Otellini said product development, marketing and software efforts (for Itanium) will all now be aimed at "greater than four-way systems". He also said, "The mainframe isn't dead. That's where I'd like to push Itanium over time."

The size of the SMP is affected by Intel’s chip set support for coherent memory

OEM Vendors (Unisys, HP, SGI, Fujitsu, IBM)

• Each has a unique “chip set” to build basic four-ways into large SMP systems

• IBM has Power5, which is a direct competitor

Intel 32-bit and EM64T

• This could emerge as the flagship product

Where Are We Going?

Since the early CISC computers, we have moved more and more of the complexity out to the compiler to achieve parallelism and fully exploit the silicon “real estate”

The power requirements, along with the smaller fabrication sizes, have pushed the CPU vendors to exploit multiple cores

The key to performance for these future machines will be the application’s ability to exploit parallelism
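One standard way to quantify that dependence, not taken from the slides, is Amdahl's law: if a fraction p of the run time parallelizes perfectly over n cores, speedup = 1 / ((1 − p) + p / n). A minimal sketch; `amdahl_speedup` is an illustrative name.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup when `parallel_fraction` of the run time scales
    perfectly across `cores` and the remainder stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.5, 0.9, 0.99):
        speedups = ", ".join(f"{amdahl_speedup(fraction, n):.2f}" for n in (2, 4, 32))
        print(f"{fraction:.0%} parallel -> 2/4/32 cores: {speedups}")
```

Even code that is 90% parallel stays below a 10x speedup no matter how many cores are added, so the application's parallelism, not the core count, sets the ceiling.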