Sima Dezső: Multicore/manycore processors, Part 2. December 2012, Version 1.2
Jan 20, 2016
Overview
1. The necessity of the emergence of multicore processors
2. Homogeneous multicore processors
2.1 Conventional multicore processors
2.2 Manycore processors
3. Heterogeneous multicore processors
3.1 Master/slave multicore processors
3.2 Add-on multicore processors
4. Outlook
3. Heterogeneous multicore processors
3.1 Heterogeneous master/slave multicore processors (1)
Figure 3.1: Main classes of multicore processors
[Figure: Multicore processors are divided into homogeneous and heterogeneous multicores. Homogeneous multicores: conventional MC processors (2 ≤ n ≤ 8 cores) and manycore processors (with >8 cores). Heterogeneous multicores: master/slave architectures and add-on architectures. Further labels in the figure: Desktops, Servers, MPC, CPU, GPU, General purpose computing, Prototypes/experimental systems, MM/3D/HPC (production stage), HPC (near future).]
3.1 Heterogeneous master/slave multicore processors
The Cell processor
3.1 Heterogeneous master/slave multicore processors - The Cell (1)
Cell BE
• Background:
  Summer 2000: basics of the architecture defined
  02/2006: Cell Blade QS20
  08/2007: Cell Blade QS21
  05/2008: Cell Blade QS22
• Joint product of Sony, IBM and Toshiba
• Target: games/multimedia, HPC applications
  PlayStation 3 (PS3)
  QS2x Blade Server family (2 Cell BE/blade)
Figure 3.2: Block diagram of the Cell BE
SPE: Synergistic Processing Element
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
LS: Local Store of 256 KB
SMF: Synergistic Memory Flow Unit
PPE: Power Processing Element
PPU: Power Processing Unit
PXU: POWER Execution Unit
EIB: Element Interface Bus
MIC: Memory Interface Controller
BIC: Bus Interface Controller
XDR: Rambus DRAM
3.1 Heterogeneous master/slave multicore processors - The Cell (2)
Figure 3.3: The Cell BE die (221 mm2, 234 million transistors)
3.1 Heterogeneous master/slave multicore processors - The Cell (3)
Figure 3.10: The Cell BE die - EIB
3.1 Heterogeneous master/slave multicore processors - The Cell (4)
Figure 3.11: Operating principle of the EIB
3.1 Heterogeneous master/slave multicore processors - The Cell (5)
Figure 3.12: Concurrent transfers on the EIB
3.1 Heterogeneous master/slave multicore processors - The Cell (6)
• Performance @ 3.2 GHz:
  QS21 peak SP FP: 409.6 GFlops (3.2 GHz x 2x8 SPE x 2x4 SP FP/cycle)
• Cell BE - NIK
  2007: Faculty Award (Cell 3D app./Teaching)
  2008: IBM – NIK Research Cooperation Agreement: performance evaluations
  • IBM Böblingen Lab
  • IBM Austin Lab
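The QS21 peak figure quoted above is a plain product of the factors in parentheses. The sketch below only redoes that arithmetic. The reading of "2x8 SPE" as 2 Cell BE chips per blade with 8 SPEs each follows the "2 Cell BE/blade" remark; the reading of "2x4 SP FP/cycle" as 4 SIMD lanes with a multiply-add counted as 2 operations is an assumption, and all names are hypothetical.

```python
# Peak single-precision throughput of one QS21 blade, from the slide's factors.
CLOCK_GHZ = 3.2        # clock frequency given on the slide
CELLS_PER_BLADE = 2    # "2 Cell BE/blade"
SPES_PER_CELL = 8      # 8 SPEs per Cell BE
SIMD_LANES = 4         # assumption: 4 SP values per SIMD operation
FLOPS_PER_LANE = 2     # assumption: multiply-add counted as 2 flops


def peak_gflops(clock_ghz=CLOCK_GHZ, cells=CELLS_PER_BLADE,
                spes=SPES_PER_CELL, lanes=SIMD_LANES,
                flops_per_lane=FLOPS_PER_LANE):
    """Return the peak GFlops as the product of all factors."""
    return clock_ghz * cells * spes * lanes * flops_per_lane


if __name__ == "__main__":
    print(f"QS21 peak SP FP: {peak_gflops():.1f} GFlops")
```

The product reproduces the 409.6 GFlops on the slide; the PPEs are not counted in this figure.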
3.1 Heterogeneous master/slave multicore processors - The Cell (7)
The Roadrunner
6/2008: International Supercomputing Conference, Dresden
The 500 fastest computers in the world
1. Roadrunner: 1 Petaflops (10^15) sustained performance (Linpack)
3.1 Heterogeneous master/slave multicore processors - The Cell (8)
Figure 3.13: The world's fastest computer: IBM Roadrunner (Los Alamos, 2008)
3.1 Heterogeneous master/slave multicore processors - The Cell (9)
Figure 3.14: Main features of the Roadrunner
3.1 Heterogeneous master/slave multicore processors - The Cell (10)
3.2 Heterogeneous add-on multicore processors
Figure 3.15: Main features of multicore processors
[Figure: Multicore processors are divided into homogeneous and heterogeneous multicores. Homogeneous multicores: conventional MC processors (2 ≤ n ≤ 8 cores) and manycore processors (with >8 cores). Heterogeneous multicores: master/slave architectures and add-on architectures. Further labels in the figure: Desktops, Servers, MPC, CPU, GPU, General purpose computing, Prototypes/experimental systems, MM/3D/HPC (production stage), HPC (near future).]
3.2 Heterogeneous add-on multicore processors (1)
[Figure: Principle of add-on execution on GPGPUs (assuming the simplest organization). Labels: Host, Device, kernel0<<<>>>(), kernel1<<<>>>(), CUDA (data-parallel programming).]
• Heterogeneous add-on multicore processors are processing accelerators.
Remark on the operating principle
• In terms of the operating principle, their predecessors are heterogeneous add-on multiprocessor systems.
  Examples: early personal computers with floating-point coprocessors: Intel 286 + 287, 386 + 387.
  The Intel 486 already had its own on-chip floating-point unit (FPU) (except for the SX and SL models).
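The add-on execution principle (serial code on the host, data-parallel kernels handed to the device one after the other) can be mimicked in plain Python. This is only a toy model: `launch` is a hypothetical stand-in for a kernel launch, a thread pool plays the role of the device, and the bodies of `kernel0` and `kernel1` are invented for illustration. It is not CUDA and says nothing about real GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor


def launch(kernel, data, workers=4):
    """Hypothetical kernel launch: apply `kernel` to every element on the
    'device' (here a thread pool) and wait until all results are back."""
    with ThreadPoolExecutor(max_workers=workers) as device:
        return list(device.map(kernel, data))  # map() keeps element order


def kernel0(x):
    # Illustrative data-parallel work item: scale one element.
    return x * 2.0


def kernel1(x):
    # Illustrative second kernel: offset one element.
    return x + 1.0


def host_program(values):
    data = [float(v) for v in values]  # serial host code: prepare input
    data = launch(kernel0, data)       # host hands kernel0 to the device
    data = launch(kernel1, data)       # then kernel1, after kernel0 finished
    return sum(data)                   # serial host code: reduce the result


if __name__ == "__main__":
    print(host_program([1, 2, 3]))
```

In this simplest organization the host waits for each kernel to finish before continuing, which is why the two launches run strictly one after the other.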
3.2 Heterogeneous add-on multicore processors (2)
The most important implementations of heterogeneous add-on multicore processors
Heterogeneous add-on multicore processors:
• Integrated graphics
• Smartphones
3.2 Heterogeneous add-on multicore processors (3)
3.2.1 Integrated graphics
Integrated graphics (1)
Switching to English-language slides
Implementation of integrated graphics
• In the north bridge
  Implementations about 1999 - 2009
• In a multi-chip processor package on a separate die
  Both the CPU and the GPU are on separate dies and are mounted into a single package
  Intel's Havendale (DT) and Auburndale (M) (scheduled for 1H/2009 but cancelled)
  Clarkdale (DT, 1/2010) and Arrandale (M, 1/2010)
• On the processor die
  Intel's Sandy Bridge (1/2011) and Ivy Bridge (4/2012)
  AMD's Swift (scheduled for 2009)
  AMD's Bobcat-based APUs (M, 1/2011)
  Llano APUs (DT, 6/2011)
  Trinity APUs (DT, Q4/2012)
[Three block diagrams; labels: P, NB with IG, Mem., South Bridge | P (GPU, CPU), NB, Mem., South Bridge | P (CPU, GPU), Mem., Periph. Contr.]
Integrated graphics (2)
Example 1: Intel's Havendale (DT) and Auburndale (M) multi-chip CPU/GPU processor plans (scheduled for 1H/2009 but cancelled about 1/2009) []
• Revealed in 9/2007.
• Both parts were based on the 2nd gen. Nehalem (Lynnfield) architecture (45 nm), as shown below.
RS – Intel 2009 Desktop Platform Overview, Sept. 2007
[Figure: Lynnfield processor (monolithic die, no integrated graphics) and Havendale processor (multi-chip package, MCP) on the same LGA 1160 platform.
 Lynnfield labels: 4 cores, 8 threads, 8M, DDR3 IMC, PCI-E (graphics), Power, DDR3, DMI to the Ibexpeak PCH.
 Havendale labels: 2 cores, 4 threads, 4M, GPU, DDR3 IMC, PCI-E (graphics), Power, DDR3, DMI and display link to the Ibexpeak PCH; display outputs SDVO, HDMI, DisplayPort, DVI (digital) and VGA (analog).
 Ibexpeak PCH labels: PCIe, SATA, NVRAM, etc.; I/O control processors; I/O functions.
 Schedule: 2H '08 first samples, 1H '09 production, TDP < 95 W.]
http://pic.xfastest.com/z/INTEL%202009%20%20Overview/2009Overview.ppt
Integrated graphics (3)
Example 2: Intel’s Westmere-EP based multi-chip CPU/GPU processors (2010)-1 []
IDF 2009
Integrated graphics (4)
Clarkdale (desktop), Arrandale (mobile)
Positioning of Clarkdale (DT) and Arrandale (M) in Intel’s roadmap []
Integrated graphics (5)
Single PCH for Intel’s Westmere-EP based multi-chip CPU/GPU processors (2010) []
Integrated graphics (6)
PCH (Platform Controller Hub)
Moving integrated graphics (IGFX) from the north bridge to the processor []
(Dedicated graphics via graphics card)
Integrated graphics (7)
Implementation of integrated graphics
• In the north bridge
  Implementations around 1999 - 2009
• In a multi-chip processor package on a separate die
  Both the CPU and the GPU are on separate dies and are mounted into a single package
  Intel's Havendale (DT) and Auburndale (M) (scheduled for 1H/2009 but cancelled)
  Clarkdale (DT, 1/2010) and Arrandale (M, 1/2010)
• On the processor die
  Intel's Sandy Bridge (1/2011) and Ivy Bridge (4/2012)
  AMD's Swift (scheduled for 2009)
  AMD's Bobcat-based APUs (M, 1/2011), Llano APUs (DT, 6/2011) and Trinity APUs (DT, Q4/2012)
[Three block diagrams; labels: P, NB with IG, Mem., South Bridge | P (GPU, CPU), NB, Mem., South Bridge | P (CPU, GPU), Mem., Periph. Contr.]
Implementation of commercial graphics on the processor die
Integrated graphics (8)
Key microarchitecture features of the Sandy Bridge vs the Nehalem
Example 1: Intel’s Sandy Bridge with 6 Series PCH-1 []
Integrated graphics (9)
[]: Kahn O., Piazza T., Valentine B.: Technology Insight: Intel Next Generation Microarchitecture Codename Sandy Bridge, IDF 2010, extreme.pcgameshardware.de/.../281270d1288260884-bonusmaterial-pc-games-hardware-12-2010-sf10_spcs001_100.pdf
[Figure labels: 32K L1D (3 clk); AVX 256 bit, 4 operands; 256 KB L2 (9 clk); Hyperthreading; AES instr.; VMX unrestrict.; 20 mm2 / core; PCIe 2.0; @ 1.0 - 1.4 GHz (connected to L3); 256 b/cycle ring architecture; (25 clk); DDR3-1600]
Die plot of the 4C Sandy Bridge processor []
Sandy Bridge 4C: 32 nm, 995 mtrs / 216 mm2, ¼ MB L2 per core, 8 MB L3
[]: Intel Sandy Bridge Review, Bit-tech, Jan. 3 2011, http://www.bit-tech.net/hardware/cpus/2011/01/03/intel-sandy-bridge-review/1
Integrated graphics (10)
Sandy Bridge desktop datasheet
Core i3-21xx, 2C, 2/2011
Core i5-23xx/24xx/25xx, 4C, 1/2011
Core i7-26xx, 4C, 1/2011
Intel 6 Series PCH (1)
(1) Except the P67, which does not provide a display controller in the PCH
Block diagram of Intel's Sandy Bridge with 6 Series PCH-2 []
Integrated graphics (11)
Key microarchitecture features of the Ivy Bridge vs the Sandy Bridge
Example 2: Intel’s Ivy Bridge with 6 Series PCH-1 []
Integrated graphics (12)
http://www.itproportal.com/2012/04/24/picture-ivy-bridge-vs-sandy-bridge-gpu-die-sizes-compared/
Ivy Bridge-DT: 22 nm, 1480 mtrs, 160 mm2
Sandy Bridge-DT: 32 nm, 995 mtrs, 216 mm2
Contrasting the die plots of Ivy Bridge vs. Sandy Bridge (at the same feature size)-1 []
Integrated graphics (13)
Note
In Ivy Bridge, Intel placed much more emphasis on graphics processing than in Sandy Bridge, in order to compete with AMD's superior graphics.
Contrasting the die plots of Ivy Bridge vs Sandy Bridge (at the same feature size)-2 []
Integrated graphics (14)
Example 3: AMD’s “Swift” Fusion APU plan (2009)
Preliminaries
In 10/2006 AMD acquired the graphics firm ATI, and on the same day announced that
“AMD plans to create a new class of x86 processors that integrate the central processing unit (CPU) and graphics processing unit (GPU) at the silicon level, codenamed ‘Fusion’.” []
AMD Completes ATI Acquisition and Creates Processing Powerhouse
SUNNYVALE, CALIF. -- October 25, 2006 --AMD
Remark
Although the above statement uses the term Fusion for the silicon-level integration of the CPU and GPU, some other AMD publications apply the term Fusion technology to both package-level and silicon-level integration of the CPU and GPU, as shown in the next figure [b].
Integrated graphics (15)
Extended interpretation of the term Fusion technology in some AMD publications []
Despite this ambiguity, AMD subsequently used the term Fusion mostly for the silicon-level integration of the CPU and the GPU.
AMD Torrenza and Fusion together, 22 March 2007
Integrated graphics (16)
• In 12/2007, at their Financial Analyst Day, AMD coined a new term by designating their processors that implement the Fusion concept as APUs (Accelerated Processing Units).
• At the same time AMD also announced their first APU family, called the Swift family [].
Integrated graphics (17)
• In 11/2008, again at their Financial Analyst Day, AMD postponed the introduction of Fusion-based APU processors until the company transitions to the 32 nm technology [].
AMD Fusion now pushed back to 2011
By Joel Hruska | Published: November 14, 2008
Integrated graphics (18)
Remark
This move is similar to what Intel did with their 45 nm Havendale (DT) and Auburndale (M) in-package integrated multi-chip CPU+GPU projects.
As leaked by industry sources in 1/2009, Intel canceled their 45 nm multi-chip processor plans in favor of 32 nm multi-chip processors to be introduced in Q1/2010 [].
Intel cans 45nm “Auburndale” and “Havendale” Fusion CPUs!
Posted by: theovalich | January 31, 2009
Integrated graphics (19)
Example 4: AMD’s Piledriver-based Trinity desktop APU line (2012)
Announced in 6/2012
Introduced in 9/2012
The Trinity APU is based on the Piledriver Compute Module, which is a redesign of the ill-fated Bulldozer Compute Module.
Integrated graphics (20)
http://www.pcper.com/reviews/Editorial/AMD-Vishera-and-Beyond-New-Design-Philosophy-Dictates-Faster-Pace/How-Does-Vishera
The Piledriver Compute Module of Trinity []
Integrated graphics (21)
http://techreport.com/articles.x/22932
The Trinity APU die with the Piledriver cores []
Integrated graphics (22)
Processor                  Manufacturing process   Die size   Transistor count
AMD Llano                  32 nm                   228 mm2    1.178 B
AMD Trinity                32 nm                   246 mm2    1.303 B
Intel Sandy Bridge (4C)    32 nm                   216 mm2    1.16 B
Intel Ivy Bridge (4C)      22 nm                   160 mm2    1.4 B
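The die data above can be turned into a transistor density (million transistors per mm2), which makes the four dies easier to compare. The sketch below uses only the numbers in the table; the dictionary layout and function name are hypothetical, and the result is a rough average over the whole die that ignores differences in how vendors count transistors.

```python
# Die size (mm2) and transistor count, copied from the table above.
DIES = {
    "AMD Llano": (228, 1.178e9),
    "AMD Trinity": (246, 1.303e9),
    "Intel Sandy Bridge (4C)": (216, 1.16e9),
    "Intel Ivy Bridge (4C)": (160, 1.4e9),
}


def density_mtr_per_mm2(area_mm2, transistors):
    """Average density in million transistors per mm2."""
    return transistors / area_mm2 / 1e6


DENSITY = {name: density_mtr_per_mm2(area, count)
           for name, (area, count) in DIES.items()}

if __name__ == "__main__":
    for name, value in DENSITY.items():
        print(f"{name:26s} {value:5.2f} Mtr/mm2")
```

With these figures the three 32 nm dies come out at roughly 5.2 to 5.4 Mtr/mm2, while the 22 nm Ivy Bridge reaches about 8.75 Mtr/mm2.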
Integrated graphics (23)
Die features
http://www.anandtech.com/show/6332/amd-trinity-a10-5800k-a8-5600k-review-part-1
http://technewspedia.com/meet-the-new-amd-apus-series-a-2-nd-generation-trinity/
The Comal platform that incorporates the (Piledriver-based) Trinity APU and the A70M PCH []
Integrated graphics (24)
3.2.2 Smartphones
3.2.2 Smartphones (1)
3.2.2 Smartphone platforms
Example: Texas Instruments OMAP 5 (OMAP 5430)
3.2.2 Smartphones (2)
4. Outlook
4. Outlook (4)
Outlook
Figure 4.3: Expected evolution of heterogeneous multicore processors
[Figure labels: Heterogeneous multicores; Master/slave architectures; Add-on architectures; Multiple CPUs; Multiple accelerators]
Thank you for your attention!