Power Roadmap: POWER8
© 2013 IBM Corporation
POWER7 Systems Announcements
2010
• Power 780 / Power 770 B Models
• Power 795 (9119-FHB)
• Power 750 (8233-E8B)
• Power 710 / 730 B Models
• Power 720 / 740 B Models
• Power Blades
2011
• Power 775 (9119-F2C)
• Power 710 / 730 C Models
• Power 720 / 740 C Models
• Power 780 / Power 770 C Models
2012
• p260+ (7895-22X)
• p460 (7895-42X)
• 7R1 / 7R2
• p24L
• p260 (7895-22X)
• Power 780 / Power 770 D Models
• PureSystems
2013
• Power 760 / Power 750 D Models
• Power 710 / 730 D Models
• Power 720 / 740 D Models
• 7R1 / 7R2
• 7R4
• p260+ (7895-23A)
• p460 (7895-43X)
• p270+ (7895-24X)
POWER7 Portfolio (POWER7+)
• Power 710+/730+
• Power 720+/740+
• Power 750+
• Power 760+
• Power 770+
• Power 780+
• Power 795
• PowerLinux: 7R1 / 7R2 / 7R4
• PureSystems: PureData, PureApps, PureFlex
• p260+, p460+, p270+, p24L
• Virtualization & Mgmt.
Power Processor Technology Roadmap
2004: POWER5/5+ (130/90 nm)
• Dual Core
• Enhanced Scaling
• SMT
• Distributed Switch +
• Core Parallelism +
• FP Performance +
• Memory Bandwidth +
• Virtualization
2007: POWER6/6+ (65/65 nm)
• Dual Core
• High Frequencies
• Virtualization +
• Memory Subsystem +
• AltiVec
• Instruction Retry
• Dynamic Energy Mgmt
• SMT +
• Protection Keys
2010: POWER7/7+ (45/32 nm)
• Eight Cores
• On-Chip eDRAM
• Power-Optimized Cores
• Memory Subsystem ++
• SMT++
• Reliability +
• VMX & VSX
• Protection Keys+
2014-2015: POWER8
• More Cores
• SMT+++
• Reliability ++
• CAPI Support
• Transactional Memory
• Status: Operating System booted
Future
Processor Directions
POWER4/4+ (180/130 nm), POWER5/5+ (130/90 nm), POWER6/6+ (65 nm), POWER7/7+ (45/32 nm), POWER8 (22 nm)
• Dual Cores, Dual Threads, External L3
• Enhanced Caches
• 8 Cores, 3rd Gen SMT, L3+ On Chip
• More Cores, 4th Gen SMT, Encryption Logic, CAPI, PCIe Acceleration, Transactional Memory
Processor Roadmap
                    POWER5      POWER6      POWER7            POWER7+
                    2004        2007        2010              2012
Technology          130nm SOI   65nm SOI    45nm SOI, eDRAM   32nm SOI, eDRAM
Compute
  Cores             2           2           8                 8
  Threads           SMT2        SMT2        SMT4              SMT4
Caching
  On-chip           1.9MB       8MB         2 + 32MB          2 + 80MB
  Off-chip          36MB        32MB        None              None
Bandwidth
  Sust. Mem.        15GB/s      30GB/s      100GB/s           100GB/s
  Peak I/O          6GB/s       20GB/s      40GB/s            40GB/s
Processor Roadmap (continued)
POWER8 (2014) is added as the next column after POWER7+ (2012); the POWER5, POWER6, POWER7, and POWER7+ columns are unchanged.
POWER8 Vision
Leadership Performance
• Increase core throughput at single thread, SMT2, SMT4, and SMT8 level
• Large step in per socket performance
• Enable more robust multi-socket scaling
System Innovation
• Higher capacity cache hierarchy and highly threaded processor
• Enhanced memory bandwidth, capacity, and expansion
• Dynamic code optimization
• Hardware-accelerated virtual memory management
Open System Innovation
• Coherent Accelerator Processor Interface (CAPI)
• Agnostic Memory interface
• Open system software
POWER8 Architecture
POWER8 Core
[Figure: core floorplan with VSU, FXU, IFU, DFU, ISU, and LSU regions]
Larger Caching Structures vs. POWER7
• 2x L1 data cache (64 KB)
• 2x outstanding data cache misses
• 4x translation cache
Wider Load/Store
• 32B → 64B L2 to L1 data bus
• 2x data cache to execution dataflow
Enhanced Prefetch
• Instruction speculation awareness
• Data prefetch depth awareness
• Adaptive bandwidth awareness
• Topology awareness
Execution Improvement vs. POWER7
• SMT4 → SMT8
• 8 dispatch
• 10 issue
• 16 execution pipes: 2 FXU, 2 LSU, 2 LU, 4 FPU, 2 VMX, 1 Crypto, 1 DFU, 1 CR, 1 BR
• Larger issue queues (4 x 16-entry)
• Larger global completion, Load/Store reorder
• Improved branch prediction
• Improved unaligned storage access
Core Performance vs. POWER7
• ~1.6x Single Thread
• ~2x Max SMT
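As a quick consistency check, the per-unit pipe counts quoted for the core can be summed and compared with the "16 execution pipes" headline. The sketch below uses only figures from the slide; the names `pipes` and `issue_queue_entries` are illustrative choices, not IBM terminology.

```python
# Execution pipes per unit, exactly as quoted on the POWER8 core slide.
pipes = {
    "FXU": 2, "LSU": 2, "LU": 2, "FPU": 4,
    "VMX": 2, "Crypto": 1, "DFU": 1, "CR": 1, "BR": 1,
}

total_pipes = sum(pipes.values())

# Issue queues are quoted as 4 queues of 16 entries each.
issue_queue_entries = 4 * 16

print(total_pipes)          # 16
print(issue_queue_entries)  # 64
```

The total matches the headline figure, so the unit breakdown and the summary line agree.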
POWER8 Chip Packaging
Technology
• 22nm SOI, eDRAM, 15 metal levels (ML), 650 mm²
Cores
• 12 cores (SMT8)
• 8 dispatch, 10 issue, 16 execution pipes
• 2X internal data flows/queues
• Enhanced prefetching
• 64K data cache, 32K instruction cache
Accelerators
• Crypto & memory expansion
• Transactional Memory
• VMM assist
• Data Move / VM Mobility
Energy Management
• On-chip Power Management Micro-controller
• Integrated Per-core VRM
• Critical Path Monitors
Caches
• 512 KB SRAM L2 / core
• 96 MB eDRAM shared L3
• Up to 128 MB eDRAM L4 (off-chip)
Memory
• Up to 230 GB/s sustained bandwidth
Bus Interfaces
• Durable open memory attach interface
• Integrated PCIe Gen3
• SMP Interconnect
• CAPI (Coherent Accelerator Processor Interface)
[Figure: die floorplan showing L3 Cache & Chip Interconnect (8M L3 regions), two memory controllers, SMP links, accelerators, and PCIe]
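The per-core figures on this slide scale to per-chip totals by simple multiplication. The sketch below is only arithmetic on the slide's numbers, assuming SMT8 means eight hardware threads per core; the variable names are illustrative.

```python
cores = 12
threads_per_core = 8       # SMT8: eight hardware threads per core
l1d_kb_per_core = 64       # "64K data cache"
l2_kb_per_core = 512       # "512 KB SRAM L2 / core"

hw_threads = cores * threads_per_core        # hardware threads per chip
l1d_total_kb = cores * l1d_kb_per_core       # aggregate L1 data cache, KB
l2_total_mb = cores * l2_kb_per_core / 1024  # aggregate L2, MB

print(hw_threads, l1d_total_kb, l2_total_mb)  # 96 768 6.0
```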
POWER8 On-Chip Caches
• L2: 512 KB 8-way per core
• L3: 96 MB (12 x 8 MB 8-way banks)
• “NUCA” cache policy (Non-Uniform Cache Architecture)
  – Scalable bandwidth and latency
  – Migrate “hot” lines to local L2, then local L3 (replicate L2-contained footprint)
• Chip Interconnect: 150 GB/sec x 12 segments x 2 directions = 3.6 TB/sec
[Figure: twelve cores, each paired with an L2 and an L3 bank, arranged around the chip interconnect alongside memory, SMP, accelerator (Acc), and PCIe interfaces]
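The L3 and interconnect totals can be reproduced from the per-bank and per-segment figures. One assumption is needed: 150 GB/s x 12 segments gives 1.8 TB/s, so the quoted 3.6 TB/s is read here as counting both directions. Decimal units (1 TB = 1000 GB) are also assumed, and the variable names are illustrative.

```python
l3_banks = 12
l3_mb_per_bank = 8
l3_total_mb = l3_banks * l3_mb_per_bank   # shared L3 capacity, MB

segment_gb_s = 150   # per segment, per direction (as quoted)
segments = 12
directions = 2       # assumption: the 3.6 TB/s total counts both directions

one_way_tb_s = segment_gb_s * segments / 1000
both_ways_tb_s = one_way_tb_s * directions

print(l3_total_mb)      # 96
print(one_way_tb_s)     # 1.8
print(both_ways_tb_s)   # 3.6
```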
POWER8 Memory Buffer Chip
[Figure: DRAM chips attach through DDR interfaces to the memory buffer, which holds the scheduler & management logic and a 16MB memory cache and connects to the processor over the POWER8 link]
Intelligence Moved into Memory
• Scheduling logic, caching structures
• Energy Mgmt, RAS decision point
  – Formerly on Processor
  – Moved to Memory Buffer
Processor Interface
• 9.6 GB/s high speed interface
• More robust RAS
• “On-the-fly” lane isolation/repair
• Extensible for innovation build-out
Performance Value
• End-to-end fastpath and data retry (latency)
• Cache → latency/bandwidth, partial updates
• Cache → write scheduling, prefetch, energy
• 22nm SOI for optimal performance / energy
• 15 metal levels (latency, bandwidth)
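The 16MB cache per buffer chip can be related to the "up to 128 MB eDRAM L4 (off-chip)" figure on the chip slide. The sketch below assumes the L4 total is simply the sum of the buffer caches; the resulting buffer count is an inference from those two numbers, not a figure stated in the deck.

```python
l4_max_mb = 128          # "Up to 128 MB eDRAM L4 (off-chip)"
cache_mb_per_buffer = 16  # "16MB Memory Cache" per memory buffer chip

# Assumption: L4 capacity is the sum of the per-buffer caches.
implied_buffers = l4_max_mb // cache_mb_per_buffer

print(implied_buffers)  # 8
```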
Transactional Memory
Definition
• Technique that allows a group of instructions, including updates to the memory image, to execute speculatively and atomically. This group of instructions is called a transaction.
Value
• Reducing programming development effort
• Reducing customer cost (higher SLA / fewer images and higher scalability)
• Improving performance of legacy software with large sequential components
POWER8 Support
• New instructions mark beginning and end of transaction
  – Hardware ensures region is performed atomically using speculation
• Speculation recovery performed in hardware, both registers and memory
• “Flattened” nesting
  – Hardware tracks nesting of transactions
  – Treats them all as a single large transaction
Application-level instruction interface
• Transaction Begin/End instructions
• Explicit abort
• Diagnostic register: Transaction Exception and Summary Register
  – Indicates cause of transaction failure
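The "flattened" nesting behaviour described for transactions can be illustrated with a small software model. This is a toy sketch only: the class `ToyTM` and its methods are hypothetical, it models memory as a Python dict, and it does not represent the POWER8 instructions, register recovery, or conflict detection. It shows just two ideas from the slide: nested begins collapse into one outer transaction, and an abort rolls everything back and records a cause.

```python
class ToyTM:
    """Toy model of flattened-nesting transactions over a dict 'memory'."""

    def __init__(self, memory):
        self.memory = memory
        self.depth = 0             # 0 means no transaction is active
        self.checkpoint = None     # snapshot taken at the outermost begin
        self.failure_cause = None  # stands in for a diagnostic register

    def begin(self):
        if self.depth == 0:
            # Only the outermost begin takes a checkpoint.
            self.checkpoint = dict(self.memory)
            self.failure_cause = None
        self.depth += 1

    def end(self):
        if self.depth == 0:
            raise RuntimeError("end without matching begin")
        self.depth -= 1
        if self.depth == 0:
            self.checkpoint = None  # outermost end commits all updates

    def abort(self, cause):
        if self.depth == 0:
            raise RuntimeError("abort outside a transaction")
        # Roll back every update made since the outermost begin.
        self.memory.clear()
        self.memory.update(self.checkpoint)
        self.depth = 0
        self.checkpoint = None
        self.failure_cause = cause


mem = {"a": 100, "b": 0}
tm = ToyTM(mem)

tm.begin()            # outer transaction
mem["a"] -= 30
tm.begin()            # nested begin, flattened into the outer one
mem["b"] += 30
tm.end()
tm.end()              # outermost end: both updates commit together
committed = dict(mem)

tm.begin()
mem["a"] -= 50
tm.begin()
tm.abort("explicit abort")   # undoes the whole flattened transaction
after_abort = dict(mem)

print(committed)         # {'a': 70, 'b': 30}
print(after_abort)       # {'a': 70, 'b': 30}
print(tm.failure_cause)  # explicit abort
```

Taking a single checkpoint at the outermost begin is what makes the nesting "flat": an inner abort cannot undo only the inner region.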
POWER8 Integrated PCIe Gen 3
[Figure: POWER7 reaches PCI devices through the GX bus, an I/O bridge, and PCIe G2; POWER8 attaches a PCI device directly over PCIe G3]
Native PCIe Gen 3 Support
• Direct processor integration
• Replaces proprietary GX/Bridge
• Low latency
• Gen3 x16 bandwidth (about 16 GB/s per direction)
Transport Layer for CAPI Protocol
• Coherently attached devices connect to processor via PCIe
• Protocol encapsulated in PCIe
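The x16 bandwidth figure can be related to the PCIe Gen3 per-lane rate. The sketch below uses the standard Gen3 signalling rate (8 GT/s per lane) and 128b/130b encoding, neither of which appears on the slide, and it ignores packet and protocol overhead, so it gives an upper bound per direction rather than achievable throughput.

```python
GT_PER_S_PER_LANE = 8.0   # PCIe Gen3 signalling rate per lane
ENCODING = 128 / 130      # 128b/130b line-encoding efficiency
LANES = 16

gbit_per_lane = GT_PER_S_PER_LANE * ENCODING     # usable Gb/s per lane
gbyte_per_direction = gbit_per_lane * LANES / 8  # GB/s, one direction

print(round(gbit_per_lane, 2))        # 7.88
print(round(gbyte_per_direction, 2))  # 15.75
```

This is why a Gen3 x16 link is commonly rounded to 16 GB/s in each direction.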
POWER8 CAPI (Coherent Accelerator Processor Interface)
[Figure: a POWER8 chip containing the CAPP and coherence bus connects over PCIe Gen 3 to an FPGA or ASIC containing the PSL and the custom hardware application]
PCIe Gen 3
• Transport for encapsulated messages
Processor Service Layer (PSL)
• Present robust, durable interfaces to applications
• Offload complexity / content from CAPP
Customizable Hardware Application Accelerator (FPGA or ASIC)
• Specific system SW, middleware, or user application
• Written to durable interface provided by PSL
Virtual Addressing
• Accelerator can work with same memory addresses that the processors use
• Pointers de-referenced same as the host application
• Removes OS & device driver overhead
Hardware Managed Cache Coherence
• Enables the accelerator to participate in “Locks” as a normal thread
• Lowers latency over IO communication model
Socket Performance
Beta Program
Client Experience
• Hands-on testing with POWER8 hardware
Advocate/ESP support team
• Extended team will monitor client testing progress against test matrix & collect feedback/experience
ESP Execution
• Weekly Interlock Mtg for extended ESP team
Program to include support for
• AIX
• IBM i
• Linux / PowerLinux
• Simplify PowerVM
Client Requirements
• Perform meaningful testing
• Weekly calls
• Some minimal education
Contact: Marianne Golden, Austin TX, [email protected], 512-296-4264