Power Roadmap POWER8

© 2013 IBM Corporation


POWER7 Systems Announcements

Power 780 / Power 770 B Models

2010

Power 795 (9119-FHB)

Power 750 (8233-E8B)

Power 710 / 730 B Models

Power 720 / 740 B Models

Power Blades

Power 775 (9119-F2C)

Power 710 / 730 C Models

Power 720 / 740 C Models

Power 780 / Power 770 C Models

2011

p260+ (7895-22X)

p460 (7895-42X)

7R1 / 7R2

p24L

p260 (7895-22X)

2012

Power 780 / Power 770 D Models

PureSystems

2013

Power 760 / Power 750 D Models

Power 710 / 730 D Models

Power 720 / 740 D Models

7R1 / 7R2

7R4

p260+ (7895-23A)

p460 (7895-43X)

p270+ (7895-24X)


POWER7 Portfolio

POWER7+ servers: Power 710+/730+, Power 720+/740+, Power 750+, Power 760+, Power 770+, Power 780+

Power 795

PureSystems: PureData, PureApps, PureFlex; compute nodes p24L, p260+, p270+, p460+

PowerLinux 7R1 / 7R2 / 7R4

Virtualization & Mgmt.


Power Processor Technology Roadmap

2004, POWER5/5+ (130/90 nm)
• Dual Core • Enhanced Scaling • SMT • Distributed Switch + • Core Parallelism + • FP Performance + • Memory Bandwidth + • Virtualization

2007, POWER6/6+ (65/65 nm)
• Dual Core • High Frequencies • Virtualization + • Memory Subsystem + • Altivec • Instruction Retry • Dynamic Energy Mgmt • SMT + • Protection Keys

2010, POWER7/7+ (45/32 nm)
• Eight Cores • On-Chip eDRAM • Power-Optimized Cores • Memory Subsystem ++ • SMT++ • Reliability + • VSM & VSX • Protection Keys+

2014-2015, POWER8
• More Cores • SMT+++ • Reliability ++ • CAPI Support • Transactional Memory
• Operating System booted

Future


Processor Directions

POWER4/4+ (180 / 130 nm), POWER5/5+ (130 / 90 nm), POWER6/6+ (65 nm), POWER7/7+ (45/32 nm), POWER8 (22 nm)

• Earlier generations: Dual Cores, Dual Threads, External L3
• POWER7/7+: 8 Cores, 3rd Gen SMT, L3+ On Chip
• POWER8: More Cores, 4th Gen SMT, Enhanced Caches, Encryption Logic, CAPI, PCIe Acceleration, Transactional Memory




Processor Roadmap

                       POWER5      POWER6      POWER7      POWER7+     POWER8
                       2004        2007        2010        2012        2014
Technology             130nm SOI   65nm SOI    45nm SOI    32nm SOI    22nm SOI
                                               eDRAM       eDRAM       eDRAM
Compute    Cores       2           2           8           8           12
           Threads     SMT2        SMT2        SMT4        SMT4        SMT8
Caching    On-chip     1.9MB       8MB         2 + 32MB    2 + 80MB    6 + 96MB
           Off-chip    36MB        32MB        None        None        up to 128MB
Bandwidth  Sust. Mem.  15GB/s      30GB/s      100GB/s     100GB/s     230GB/s
           Peak I/O    6GB/s       20GB/s      40GB/s      40GB/s      not given
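One way to read the Compute rows is as hardware threads per chip (cores times SMT ways). The sketch below does that arithmetic; the POWER5 to POWER7+ values are copied from the table, while the POWER8 entry (12 cores, SMT8) is taken from the POWER8 chip slide. The names `ROADMAP` and `threads_per_chip` are illustrative only.

```python
# Cores and SMT mode per chip, as listed in the roadmap table.
# POWER8 values come from the POWER8 chip slide (12 cores, SMT8).
ROADMAP = {
    "POWER5": (2, "SMT2"),
    "POWER6": (2, "SMT2"),
    "POWER7": (8, "SMT4"),
    "POWER7+": (8, "SMT4"),
    "POWER8": (12, "SMT8"),
}


def threads_per_chip(cores: int, smt_mode: str) -> int:
    """Hardware threads per chip = cores x SMT ways (e.g. "SMT4" -> 4 ways)."""
    ways = int(smt_mode[len("SMT"):])
    return cores * ways


if __name__ == "__main__":
    for name, (cores, smt) in ROADMAP.items():
        print(f"{name}: {threads_per_chip(cores, smt)} threads")
```

Under these numbers the thread count per chip goes 4, 4, 32, 32, 96, so POWER8 triples the POWER7/7+ figure.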


POWER8 Vision

Leadership Performance
• Increase core throughput at the single-thread, SMT2, SMT4, and SMT8 levels
• Large step in per-socket performance
• Enable more robust multi-socket scaling

System Innovation
• Higher-capacity cache hierarchy and highly threaded processor
• Enhanced memory bandwidth, capacity, and expansion
• Dynamic code optimization
• Hardware-accelerated virtual memory management

Open System Innovation
• Coherent Accelerator Processor Interface (CAPI)
• Agnostic memory interface
• Open system software


POWER8 Architecture


POWER8 Core

[Die photo labels: VSU, FXU, IFU, DFU, ISU, LSU]

Larger Caching Structures vs. POWER7
• 2x L1 data cache (64 KB)
• 2x outstanding data cache misses
• 4x translation cache

Wider Load/Store
• 32B → 64B L2 to L1 data bus
• 2x data cache to execution dataflow

Enhanced Prefetch
• Instruction speculation awareness
• Data prefetch depth awareness
• Adaptive bandwidth awareness
• Topology awareness

Execution Improvement vs. POWER7
• SMT4 → SMT8
• 8 dispatch
• 10 issue
• 16 execution pipes:
  – 2 FXU, 2 LSU, 2 LU, 4 FPU, 2 VMX, 1 Crypto, 1 DFU, 1 CR, 1 BR
• Larger issue queues (4 x 16-entry)
• Larger global completion, load/store reorder
• Improved branch prediction
• Improved unaligned storage access

Core Performance vs. POWER7
• ~1.6x single thread
• ~2x max SMT
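The slide gives both a total ("16 execution pipes") and a per-unit breakdown. A quick tally, using only the counts listed above, confirms they agree and also totals the issue-queue entries; the dictionary layout is just an illustrative way to hold the numbers.

```python
# Execution pipes per POWER8 core, as listed on the core slide.
EXECUTION_PIPES = {
    "FXU": 2,     # fixed-point units
    "LSU": 2,     # load/store units
    "LU": 2,      # load units
    "FPU": 4,     # floating-point pipes
    "VMX": 2,     # vector (VMX) pipes
    "Crypto": 1,
    "DFU": 1,     # decimal floating-point unit
    "CR": 1,      # condition register unit
    "BR": 1,      # branch unit
}

ISSUE_QUEUES = 4        # "4 x 16-entry" issue queues
ENTRIES_PER_QUEUE = 16

total_pipes = sum(EXECUTION_PIPES.values())
total_issue_entries = ISSUE_QUEUES * ENTRIES_PER_QUEUE

print(total_pipes, total_issue_entries)  # 16 64
```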


POWER8 Chip Packaging

Caches
• 512 KB SRAM L2 / core
• 96 MB eDRAM shared L3
• Up to 128 MB eDRAM L4 (off-chip)

Memory
• Up to 230 GB/s sustained bandwidth

Bus Interfaces
• Durable open memory attach interface
• Integrated PCIe Gen3
• SMP interconnect
• CAPI (Coherent Accelerator Processor Interface)

Cores
• 12 cores (SMT8)
• 8 dispatch, 10 issue, 16 execution pipes
• 2x internal data flows/queues
• Enhanced prefetching
• 64K data cache, 32K instruction cache

Accelerators
• Crypto & memory expansion
• Transactional Memory
• VMM assist
• Data Move / VM Mobility

Energy Management
• On-chip Power Management Micro-controller
• Integrated per-core VRM
• Critical Path Monitors

Technology
• 22nm SOI, eDRAM, 15 metal levels, 650 mm²

[Die photo labels: cores, L3 cache & chip interconnect (8M L3 regions), memory controllers, SMP links, accelerators, PCIe]
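The per-core figures on this slide can be rolled up into per-chip totals. This sketch uses only the slide's numbers (12 cores, SMT8, 512 KB L2 per core, 96 MB L3) and assumes binary units (1 MB = 1024 KB); it ignores the off-chip L4.

```python
CORES = 12
SMT_WAYS = 8
L2_KB_PER_CORE = 512
L3_MB = 96

threads = CORES * SMT_WAYS                      # hardware threads per chip
l2_total_mb = CORES * L2_KB_PER_CORE / 1024     # aggregate private L2 (assumes 1 MB = 1024 KB)
on_chip_cache_mb = l2_total_mb + L3_MB          # L2 + shared L3, excluding off-chip L4

print(threads, l2_total_mb, on_chip_cache_mb)   # 96 6.0 102.0
```

The 6 MB of L2 plus 96 MB of L3 is the same "6 + 96MB" split one would expect in an on-chip caching row for POWER8.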


POWER8 On-Chip Caches

• L2: 512 KB, 8-way, per core
• L3: 96 MB (12 x 8 MB 8-way banks)
• "NUCA" cache policy (Non-Uniform Cache Architecture)
  – Scalable bandwidth and latency
  – Migrate "hot" lines to local L2, then local L3 (replicate L2-contained footprint)
• Chip interconnect: 150 GB/sec x 12 segments per direction (x 2 directions) = 3.6 TB/sec

[Diagram: 12 cores, each with its own L2, around the chip interconnect linking 12 L3 banks, memory controllers, SMP links, accelerators (Acc), and PCIe]
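The interconnect total only works out if both directions are counted: 150 GB/s x 12 segments is 1.8 TB/s per direction. The sketch below makes that explicit. The factor of 2 for directions is an assumption inferred from the slide's arithmetic, and decimal units (1 TB = 1000 GB) are assumed.

```python
SEGMENT_BANDWIDTH_GBPS = 150   # GB/s per segment, from the slide
SEGMENTS_PER_DIRECTION = 12
DIRECTIONS = 2                 # assumption: the 3.6 TB/s total counts both directions

L3_BANKS = 12
L3_BANK_MB = 8

per_direction_gbps = SEGMENT_BANDWIDTH_GBPS * SEGMENTS_PER_DIRECTION  # GB/s each way
total_tbps = per_direction_gbps * DIRECTIONS / 1000                    # decimal TB/s
l3_total_mb = L3_BANKS * L3_BANK_MB                                    # shared L3 capacity

print(per_direction_gbps, total_tbps, l3_total_mb)  # 1800 3.6 96
```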


POWER8 Memory Buffer Chip

…with 16MB of Cache…

[Diagram: POWER8 connects over a link to the Memory Buffer (scheduler & management, 16MB memory cache), which drives the DDR interfaces to the DRAM chips]

Intelligence Moved into Memory
• Scheduling logic, caching structures
• Energy Mgmt, RAS decision point
  – Formerly on processor
  – Moved to memory buffer

Processor Interface
• 9.6 GB/s high-speed interface
• More robust RAS
• "On-the-fly" lane isolation/repair
• Extensible for innovation build-out

Performance Value
• End-to-end fastpath and data retry (latency)
• Cache → latency/bandwidth, partial updates
• Cache → write scheduling, prefetch, energy
• 22nm SOI for optimal performance / energy
• 15 metal levels (latency, bandwidth)
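The 16 MB per-buffer cache relates to the chip slide's "up to 128 MB eDRAM L4" if a socket carries eight memory buffer chips. That buffer count is an assumption, not something these slides state; the even split of the 230 GB/s sustained figure across buffers is likewise only an illustrative simplification.

```python
BUFFER_CACHE_MB = 16          # per memory buffer chip, from the slide
BUFFERS_PER_SOCKET = 8        # assumption: not stated on these slides
SUSTAINED_BW_GBPS = 230       # chip-level sustained bandwidth from the POWER8 chip slide

l4_total_mb = BUFFER_CACHE_MB * BUFFERS_PER_SOCKET
bw_per_buffer_gbps = SUSTAINED_BW_GBPS / BUFFERS_PER_SOCKET  # assumes an even split

print(l4_total_mb, bw_per_buffer_gbps)  # 128 28.75
```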


Transactional Memory

POWER8 Support
• New instructions mark the beginning and end of a transaction
• Hardware ensures the region is performed atomically using speculation
• Speculation recovery is performed in hardware, for both registers and memory
• "Flattened" nesting
  – Hardware tracks the nesting of transactions
  – Treats them all as a single large transaction

Application-level instruction interface
• Transaction Begin/End instructions
• Explicit abort
• Diagnostic register: Transaction Exception and Summary Register (TEXASR)
  – Indicates the cause of transaction failure

Definition
• A technique that allows a group of instructions, including updates to the memory image, to execute speculatively and atomically. This group of instructions is called a transaction.

Value
• Reduced program development effort
• Reduced customer cost (higher SLA, fewer images, and higher scalability)
• Improved performance of legacy software with large sequential components
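"Flattened" nesting can be pictured with a small software model: the hardware only counts nesting depth, nothing becomes visible until the outermost end, and an abort at any depth discards the whole transaction. The sketch below is a hypothetical toy (`FlattenedTM` is an illustrative name), not the POWER8 instruction interface, and it does not model conflicts with other threads or register rollback.

```python
class FlattenedTM:
    """Toy software model of flattened transaction nesting (not the POWER8 ISA)."""

    def __init__(self):
        self.memory = {}      # committed state, visible to everyone
        self._pending = {}    # speculative writes of the current transaction
        self.depth = 0        # nesting depth tracked by "hardware"

    def begin(self):
        # A nested begin only increments the depth; no new checkpoint is taken.
        self.depth += 1

    def write(self, addr, value):
        if self.depth == 0:
            raise RuntimeError("write outside a transaction")
        self._pending[addr] = value

    def read(self, addr):
        # Reads see the transaction's own speculative writes first.
        if addr in self._pending:
            return self._pending[addr]
        return self.memory.get(addr)

    def end(self):
        if self.depth == 0:
            raise RuntimeError("end without matching begin")
        self.depth -= 1
        if self.depth == 0:
            # Only the outermost end commits, and it commits everything at once.
            self.memory.update(self._pending)
            self._pending.clear()

    def abort(self):
        # An abort at any depth rolls back the whole flattened transaction.
        self._pending.clear()
        self.depth = 0


tm = FlattenedTM()
tm.begin()
tm.write("x", 1)
tm.begin()              # nested transaction
tm.write("y", 2)
tm.end()                # inner end: nothing is visible yet
print(tm.memory)        # {}
tm.end()                # outer end: both writes commit together
print(tm.memory)        # {'x': 1, 'y': 2}

tm.begin()
tm.write("x", 10)
tm.begin()
tm.write("z", 3)
tm.abort()              # inner abort also discards the outer write to x
print(tm.memory)        # {'x': 1, 'y': 2}
```

In real hardware an abort also restores register state and resumes at the failure path of the transaction-begin instruction, where software can inspect the diagnostic register.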


POWER8 Integrated PCIe Gen 3

[Diagram: POWER7 reaches PCIe Gen2 devices through the GX bus and an I/O bridge; POWER8 connects PCIe Gen3 devices directly]

Native PCIe Gen 3 Support
• Direct processor integration
• Replaces proprietary GX/Bridge
• Low latency
• Gen3 x16 bandwidth (about 16 GB/s per direction)

Transport Layer for CAPI Protocol
• Coherently attached devices connect to the processor via PCIe
• Protocol encapsulated in PCIe
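The x16 figure follows from the PCIe Gen3 signaling rate: 8 GT/s per lane with 128b/130b encoding. The helper below (a hypothetical name) computes raw per-direction bandwidth after line encoding only; real throughput is lower once packet headers and flow-control traffic are counted.

```python
GEN3_TRANSFER_RATE_GT_S = 8.0      # PCIe Gen3: 8 GT/s per lane
GEN3_ENCODING = 128 / 130          # 128b/130b line encoding efficiency


def pcie_gen3_gbytes_per_s(lanes: int) -> float:
    """Per-direction bandwidth after line encoding, before packet overhead."""
    gbits_per_s = GEN3_TRANSFER_RATE_GT_S * GEN3_ENCODING * lanes
    return gbits_per_s / 8  # 8 bits per byte


for lanes in (8, 16):
    print(lanes, round(pcie_gen3_gbytes_per_s(lanes), 2))
# 8 7.88
# 16 15.75
```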


POWER8 CAPI (Coherent Accelerator Processor Interface)

[Diagram: a custom hardware application on an FPGA or ASIC attaches through the PSL and PCIe Gen 3 to the CAPP on the POWER8 coherence bus]

Customizable Hardware Application Accelerator
• Specific system SW, middleware, or user application
• Written to the durable interface provided by the PSL

PCIe Gen 3
• Transport for encapsulated messages

Processor Service Layer (PSL)
• Presents robust, durable interfaces to applications
• Offloads complexity / content from the CAPP

Virtual Addressing
• The accelerator can work with the same memory addresses that the processors use
• Pointers are de-referenced the same way as in the host application
• Removes OS & device driver overhead

Hardware Managed Cache Coherence
• Enables the accelerator to participate in "locks" as a normal thread
• Lowers latency compared with the I/O communication model
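The virtual-addressing bullets can be illustrated by analogy. In this hypothetical toy a Python dict stands in for the shared virtual address space; `Node`, `host_build_list`, and `accelerator_sum` are illustrative names, and no real CAPI or PSL interface is used. The point is only that the accelerator is handed a pointer and follows the host's own addresses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    value: int
    next_addr: Optional[int]  # a "pointer": an address in the shared space


# Hypothetical shared virtual address space: address -> object at that address.
SharedMemory = Dict[int, Node]


def host_build_list(mem: SharedMemory, values: List[int],
                    base: int = 0x1000, stride: int = 0x40) -> Optional[int]:
    """Host side: build a linked list in shared memory, return the head address."""
    head: Optional[int] = None
    for i in reversed(range(len(values))):
        addr = base + i * stride
        mem[addr] = Node(values[i], head)
        head = addr
    return head


def accelerator_sum(mem: SharedMemory, head: Optional[int]) -> int:
    """Accelerator side: receives only the head pointer and follows the same
    addresses the host used; nothing is copied or re-encoded first."""
    total = 0
    addr = head
    while addr is not None:
        node = mem[addr]
        total += node.value
        addr = node.next_addr
    return total


shared: SharedMemory = {}
head = host_build_list(shared, [1, 2, 3])
print(hex(head), accelerator_sum(shared, head))  # 0x1000 6
```

In an I/O-style model the host would instead flatten the list into a buffer and hand it to a device driver, and the accelerator would have to rebuild any links from offsets.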


Socket Performance


Beta Program

Client Experience
• Hands-on testing with POWER8 hardware
• Advocate/ESP support team
• Extended team will monitor client testing progress against the test matrix & collect feedback/experience

ESP Execution
• Weekly interlock meeting for the extended ESP team

Program to include support for...
• AIX
• IBM i
• Linux / PowerLinux

Simplify PowerVM Client Requirements
• Perform meaningful testing
• Weekly calls
• Some minimal education

Contact: Marianne Golden, Austin TX [email protected]

512-296-4264

