Joseph B. Manzano Spring 2009 Features that you (most probably) didn’t know your Microprocessor had

Dec 14, 2015

Features that you (most probably) didn’t know your Microprocessor had
Joseph B. Manzano
Spring 2009

Outline
• The Powerful and the Fallen
• The Mutualists
• The Just Passing
• The Olympic Sprinters
• The Threads’ Commune
• Breaking the Despotic Rule of the Lock

The Powerful and The Fallen

Common Name | Issue Structure | Hazard Detection | Scheduling | Distinguishing Characteristics | Examples
Superscalar (static) | Dynamic | Hardware | Static | In-order execution | Sun UltraSPARC II and III
Superscalar (dynamic) | Dynamic | Hardware | Dynamic | Some out-of-order execution | IBM Power2
Superscalar (speculative) | Dynamic | Hardware | Dynamic with speculation | Speculative out-of-order execution | Pentium 3 and 4
VLIW / LIW | Static | Software | Static | No hazards between issue packets | Trimedia, i860
EPIC | Mostly static | Mostly software | Mostly static | Explicit dependences marked by compiler | Itanium

Multiple-issue architectures: increase your IPC / take advantage of ILP

• Register Renaming
• Tomasulo Algorithm
• Reorder Buffer
• Scoreboarding
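The techniques listed above all deal with data hazards between instructions: read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW). As a minimal sketch, assuming a toy instruction format with one destination register and a tuple of source registers (the `Instr` type and `hazards` helper are illustrative names, not part of any real tool), the three hazards can be told apart like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """A toy instruction: one destination register and its source registers."""
    dest: str
    srcs: tuple


def hazards(first, second):
    """Return the data hazards that `second` has with the earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")   # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")   # second overwrites what first still has to read
    if second.dest == first.dest:
        found.add("WAW")   # both write the same register
    return found


mul = Instr(dest="f0", srcs=("f2", "f4"))   # f0 = f2 * f4
add = Instr(dest="f2", srcs=("f0", "f6"))   # f2 = f0 + f6
print(sorted(hazards(mul, add)))            # prints ['RAW', 'WAR']
```

Only RAW is a true dependence. WAR and WAW come from reusing register names, which is why register renaming can remove them.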

The Powerful and The Fallen

Register Renaming

Reorder Buffer

Scoreboarding
• Based on the CDC 6000 architecture
• Important feature: the scoreboard
• Hazards checked per stage: issue checks WAW, decode checks RAW, execute and write results check WAR

Tomasulo Algorithm
• Implemented in the IBM 360/91’s floating-point unit
• Important features: reservation stations and the common data bus (CDB)
• Issue: tag the operands that are not available, copy those that are
• Execute: stall on RAW hazards while monitoring the CDB
• Write results: send results to the CDB and dump the store buffer contents
• Exception handling: no instructions can be issued until a branch is resolved
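The issue and write-results steps of the Tomasulo algorithm can be sketched in a few lines. This is a simplified software model, not the IBM 360/91 design: the `ReservationStation` class, the `issue` helper, the `producer` table and the register names are all assumptions made for illustration.

```python
class ReservationStation:
    """One waiting operation; each operand is a value or the tag of its producer."""

    def __init__(self, tag, op, operands):
        self.tag = tag
        self.op = op
        self.values = {}    # operand position -> value already copied
        self.waiting = {}   # operand position -> tag of the producing station
        for pos, (kind, payload) in enumerate(operands):
            if kind == "value":
                self.values[pos] = payload
            else:
                self.waiting[pos] = payload

    def ready(self):
        return not self.waiting

    def snoop(self, tag, value):
        """Capture a result broadcast on the CDB if this station waits for it."""
        for pos, wanted in list(self.waiting.items()):
            if wanted == tag:
                del self.waiting[pos]
                self.values[pos] = value

    def execute(self):
        a, b = self.values[0], self.values[1]
        return a * b if self.op == "mul" else a + b


def issue(tag, op, srcs, regs, producer):
    """Issue step: copy operands that are available, tag those that are not."""
    operands = []
    for r in srcs:
        if r in producer:                       # a station will produce r later
            operands.append(("tag", producer[r]))
        else:
            operands.append(("value", regs[r]))
    return ReservationStation(tag, op, operands)


regs = {"f2": 3.0, "f4": 4.0, "f6": 1.0}
producer = {}                                   # register -> tag of pending writer

rs1 = issue("RS1", "mul", ["f2", "f4"], regs, producer)   # f0 = f2 * f4
producer["f0"] = "RS1"
rs2 = issue("RS2", "add", ["f0", "f6"], regs, producer)   # f8 = f0 + f6 (RAW on f0)

print(rs2.ready())                  # prints False: RS2 stalls on the RAW hazard
rs2.snoop("RS1", rs1.execute())     # write results: RS1 broadcasts on the CDB
print(rs2.ready(), rs2.execute())   # prints True 13.0
```

Because RS2 holds the tag "RS1" instead of the register name f0, later writers of f0 cannot disturb it, which is how tagging doubles as register renaming.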

The Powerful and The Fallen

Power5
• Dual core
• Two-way SMT
• IBM PowerPC superscalar architecture

Picture courtesy of IBM from “Power5 Microarchitecture”

The Powerful and The Fallen

Intel Xeon Out of Order Engine Pipeline

Picture courtesy of Intel from “Hyper-Threading Technology Architecture and Microarchitecture”


The Mutualists
Vector Processing
• Supercomputers of the past
• SIMD type of design
• Elements of the data stream are worked on by a single type of instruction
• Simplifies hardware design
• Moving toward more “general”-purpose vector processing

The Mutualists
The Cell Broadband Engine
• Created by STI (Sony, Toshiba, IBM)
• Composed of nine computing elements

PPE
• The brain of the system
• Organizer
• Runs Linux
• PowerPC dual-issue architecture

SPE
• A modified vector architecture
• Limited memory: 256 KiB
• All accesses are to and from this local memory
• Main memory accesses are DMA transfers

MFC
• Each SPE has an MFC unit
• Issues and receives DMA to and from main memory
• Gatekeeper of the bus

Bus (EIB)
• Four rings
• Has QoS in a limited fashion (RAM)

Coherency
• Coherency and consistency are maintained between all memory units (the MFC, main memory and PPE caches, but not across the local memory of SPEs)

[Block diagram labels: BEI, Flex IO, Memory Interface, SPE, PPSS, PPE, MFC]
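Since an SPE can only touch its 256 KiB local memory, work on a large main-memory array has to be streamed through it in pieces. The sketch below models that pattern in plain Python. List slices stand in for the DMA get and put that the MFC would perform, and the function name, the 4-byte element size and the half-of-local-store budget are assumptions made for illustration, not Cell SDK calls.

```python
LOCAL_STORE_BYTES = 256 * 1024      # SPE local memory size from the slide
ELEMENT_BYTES = 4                   # assumption: 32-bit elements


def process_in_tiles(main_memory, kernel, budget_bytes=LOCAL_STORE_BYTES // 2):
    """Stream `main_memory` through a local buffer of at most `budget_bytes`.

    Only half of the local store is budgeted by default, on the assumption
    that code and stack have to live there as well.
    """
    tile_elems = budget_bytes // ELEMENT_BYTES
    transfers = 0
    for start in range(0, len(main_memory), tile_elems):
        local = main_memory[start:start + tile_elems]    # stands in for a DMA get
        local = [kernel(x) for x in local]               # compute on local data only
        main_memory[start:start + len(local)] = local    # stands in for a DMA put
        transfers += 2
    return transfers


data = list(range(100_000))
n = process_in_tiles(data, lambda x: 2 * x)
print(n, data[:3], data[-1])        # prints 8 [0, 2, 4] 199998
```

Real SPE code would typically overlap the transfer of one tile with the computation on another (double buffering); this sketch keeps the two strictly sequential for clarity.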


The Just Passing
Cache
• “Invisible” architecture component
  • Not so much in recent years
• PowerPC and other architectures provide instructions to control it
  • dcbf[e], dcbst[e], dcbz[e], icbi[e], isync
  • Instructions are available to touch, to zero out, to reserve, or to lock a line in place
• But for some interesting designs look no further than …

The Just Passing
XBOX 360 Xenon Architecture

Picture courtesy of IBM from “XBOX 360 System Architecture”


The Olympic Sprinters
The Hertz race is over; however …
• Some processors are still at it …
  • Power 6 and 7 running at 4 and 5 GHz
  • Intel Polaris: 3.6 to 6 GHz
• Many hardware redesigns are in order
  • Make pipelines shorter, simpler
  • Get rid of “extra” hardware features

The Olympic Sprinters

Power6
• 13 FO4 versus 23 FO4 pipeline
• Running at frequencies from 4 to 5 GHz

Pictures courtesy of IBM from “IBM Power6 Microarchitecture”


The Threads’ Commune
• Large shared memory systems are becoming scarce
  • Scalability issues due to synchronization
  • Contention
  • Coherency and consistency
• Novel solutions have emerged
  • Explicit memory hierarchies with very weak memory models
  • Massive multithreading on chip
  • Synchronization in memory

The Threads’ Commune
Cray XMT
• 128 hardware streams
  • A stream is 31 64-bit registers, 8 target registers, and a control register
• Three functional units: M, A and C
• 500 MHz
• Full and empty bits per word (2 bits)
• An example of a very high SMT design

The Threads’ Commune
SMT / HT designs

[Figure: issue-slot usage over time for superscalar, coarse-grained MT, fine-grained MT and SMT designs]

http://www.intel.com/technology/computing/dual-core/demo/popup/demo.htm

The Threads’ Commune

[Figure: programs running in parallel (serial code plus sub-problems A and B, each with iterations i = 1 … n) are broken into concurrent threads of computation. These are mapped onto the 128 hardware streams, some of which stay unused, and feed the instruction ready pool and the pipeline of executing instructions.]

Cray MTA2 picture from John Feo’s “Can programmers and Machines ever be friends”

The Threads’ Commune
Data Race or Race Condition
• “There is an anomaly of concurrent accesses by two or more threads to a shared memory and at least one of the accesses is a write”

Synchronization
• The orchestration of two or more threads (or processes) to complete a task in a correct manner and to avoid any data races
• Problems
  • Separation of lock and guarded data
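A data race of this kind can be reproduced deterministically by splitting an unsynchronized increment into its read and its write and choosing the interleaving by hand. The sketch below uses Python generators as stand-ins for threads; the `increment` and `run` helpers and the shared dictionary are illustrative assumptions.

```python
def increment(shared):
    """Unsynchronized read-modify-write, split into its two memory accesses."""
    value = shared["counter"]        # read
    yield                            # another thread may run here
    shared["counter"] = value + 1    # write


def run(schedule, shared):
    """Run two increment threads, advancing one step per thread id in `schedule`."""
    threads = {0: increment(shared), 1: increment(shared)}
    for tid in schedule:
        next(threads[tid], None)     # a finished thread simply returns None
    return shared["counter"]


print(run([0, 0, 1, 1], {"counter": 0}))   # prints 2: one thread after the other
print(run([0, 1, 0, 1], {"counter": 0}))   # prints 1: both read 0, one update is lost
```

Both schedules execute the same four memory accesses. Only their order differs, which is exactly what synchronization has to constrain.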

The Threads’ Commune
Coherency and Consistency
• Caching elements while making sure that everyone sees the latest copy
• If an element is written by processor A, how will processors B and C know whether they have the latest copy?
• Very difficult problem!
• One of the scalability problems of shared memory

The Threads’ Commune
How does the Cray XMT solve these problems?
• For synchronization: join the lock with each data word and put the synchronization requirement on the memory instead of on the processor
• For coherence and consistency: DO NOT cache remote data (outside the local 8 GiB)
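Joining the lock with the data word means that every word carries its own full/empty state, and a load or store can wait on that state. The sketch below imitates the idea in software with a condition variable. On the XMT the waiting happens in the memory system, so this is only an analogy, and the `FEWord` class and its method names are assumptions made for illustration.

```python
import threading


class FEWord:
    """A memory word with a full/empty bit, modeled with a condition variable."""

    def __init__(self):
        self._value = None
        self._full = False
        self._cond = threading.Condition()

    def write_when_empty(self, value):
        """Wait until the word is empty, store the value, and mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value, self._full = value, True
            self._cond.notify_all()

    def read_when_full(self):
        """Wait until the word is full, load the value, and mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value


def producer_consumer(n):
    """Hand n values from a producer to a consumer through a single word."""
    word, received = FEWord(), []
    consumer = threading.Thread(
        target=lambda: received.extend(word.read_when_full() for _ in range(n)))
    consumer.start()
    for i in range(n):
        word.write_when_empty(i)
    consumer.join()
    return received


print(producer_consumer(5))   # prints [0, 1, 2, 3, 4]
```

There is no separate lock object to keep associated with the data: the guarded word and its synchronization state travel together, which addresses the separation problem noted earlier.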


Breaking the Despotic Rule of the Lock
Synchronization
• Atomicity and serializability
• Locks and barriers
  • Cost from hundreds to tens of thousands of cycles, growing linearly (in the best cases) or polynomially (in the worst cases) with the number of processors
• The lock
  • The most used synch primitive!
  • Alternatives: lock-free data structures

Breaking the Despotic Rule of the Lock
Lock-Free Data Structures
• Used to implement non-blocking and/or wait-free algorithms
• Prevent deadlocks, livelocks and priority inversions
• Potential problems: the ABA problem
  • A successful check tells us that no one is working on this now, but not whether someone has done it before
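The ABA problem is easiest to see with a compare-and-swap (CAS) that checks only the value. The sketch below simulates CAS in single-threaded Python, so the interleaving is written out by hand. The `Cell` and `VersionedCell` classes are illustrative, and the version counter is a common mitigation brought in here for contrast, not something the slides describe.

```python
class Cell:
    """A shared word with a software-simulated compare-and-swap."""

    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def cas(self, expected, new):
        if self.read() != expected:
            return False
        self.value = new
        return True


class VersionedCell(Cell):
    """Same word, but every successful CAS also bumps a version counter."""

    def __init__(self, value):
        super().__init__(value)
        self.version = 0

    def read(self):
        return (self.value, self.version)

    def cas(self, expected, new):
        if self.read() != expected:
            return False
        self.value, self.version = new, self.version + 1
        return True


def aba_goes_unnoticed(cell):
    """Thread 1 snapshots the cell, thread 2 changes A -> B -> A, thread 1 resumes."""
    snapshot = cell.read()            # thread 1 reads and is then delayed
    cell.cas(cell.read(), "B")        # thread 2: A -> B
    cell.cas(cell.read(), "A")        # thread 2: B -> A
    return cell.cas(snapshot, "C")    # thread 1: does its stale CAS still succeed?


print(aba_goes_unnoticed(Cell("A")))            # prints True: the change is invisible
print(aba_goes_unnoticed(VersionedCell("A")))   # prints False: the version moved on
```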

Transactional Memory
• Based on transactions (an atomic bundle of operations)
• If two transactions conflict, then one is bound to fail
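A minimal sketch of that behavior, assuming an optimistic software scheme in which each variable carries a version number and a transaction validates its reads at commit time. This is one of several possible designs rather than how any particular hardware implements it, and the `TVar` and `Transaction` names are hypothetical. The commit shown is not itself thread-safe; a real implementation would have to make the validate-and-publish step atomic.

```python
class TVar:
    """A transactional variable with a version number."""

    def __init__(self, value):
        self.value, self.version = value, 0


class Transaction:
    def __init__(self):
        self.reads = {}    # TVar -> version seen at first read
        self.writes = {}   # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value      # buffered, invisible until commit

    def commit(self):
        """Publish the buffered writes only if nothing that was read has changed."""
        if any(tvar.version != seen for tvar, seen in self.reads.items()):
            return False               # conflict: the transaction's work is discarded
        for tvar, value in self.writes.items():
            tvar.value, tvar.version = value, tvar.version + 1
        return True


balance = TVar(100)
t1, t2 = Transaction(), Transaction()
t1.write(balance, t1.read(balance) + 10)      # both transactions read balance = 100
t2.write(balance, t2.read(balance) * 2)
print(t1.commit(), t2.commit(), balance.value)   # prints True False 110
```

The failed transaction left no trace in memory, so the caller is free to retry it against the new value.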

Side Note
A Review of LL (load-linked) and SC (store-conditional)
• Instructions provided by PowerPC and many other architectures
• Provide a way to optimistically execute a piece of code
• If a “violation” has taken place, discard your results
• Many implementations
  • PowerPC: lwarx and stwcx

Side Note
The LL and SC behavior
• The lwarx instruction
  • Loads a word from a word-aligned location
  • Side effects:
    • A reservation is created
    • The storage coherence mechanism is notified that a reservation exists
• The stwcx instruction
  • Conditionally stores a word to a given memory location
  • “Conditionally” depends on the reservation
    • On success, all changes are committed to memory
    • Otherwise, the changes are discarded

Side Note
Reservations
• At most one per processor
• A reservation is lost when
  • The processor holding the reservation executes
    • A lwarx or ldarx
    • A stwcx or stdcx (no matter whether the reservation matches or not)
  • Other processors execute
    • A store or a dcbz to the granule
  • Some other mechanism modifies a storage location in the same reservation granule
• Interrupts do not clear reservations
  • But interrupt handlers might
• Granularity
  • The length of the memory block to keep under surveillance
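The reservation rules can be modeled with a small toy memory. The sketch below is a simplification under stated assumptions: the 128-byte granule is an arbitrary choice (the real size is implementation-specific), a store-conditional to a non-matching granule is simply made to fail, and the `Storage` class and its method names only borrow the instruction mnemonics.

```python
GRANULE = 128   # assumption: reservation granule size in bytes


class Storage:
    """Toy memory that tracks at most one reservation per processor."""

    def __init__(self):
        self.mem = {}
        self.reservation = {}          # processor id -> reserved granule

    def lwarx(self, cpu, addr):
        """Load and reserve; any earlier reservation of this cpu is replaced."""
        self.reservation[cpu] = addr // GRANULE
        return self.mem.get(addr, 0)

    def store(self, cpu, addr, value):
        """Plain store: other processors lose reservations on the same granule."""
        self.mem[addr] = value
        for other, granule in list(self.reservation.items()):
            if other != cpu and granule == addr // GRANULE:
                del self.reservation[other]

    def stwcx(self, cpu, addr, value):
        """Store only if `cpu` still holds a reservation for this granule.

        The cpu's reservation is cleared whether or not the store happens.
        """
        ok = self.reservation.pop(cpu, None) == addr // GRANULE
        if ok:
            self.store(cpu, addr, value)
        return ok


s = Storage()
s.lwarx(0, 0x100)
s.store(1, 0x104, 7)            # a different word in the same 128-byte granule
print(s.stwcx(0, 0x100, 1))     # prints False: the reservation was lost

s.lwarx(0, 0x100)
s.store(1, 0x200, 7)            # a different granule
print(s.stwcx(0, 0x100, 1))     # prints True
```

The first case shows why granularity matters: a neighbor's store to unrelated data in the same granule is enough to make the store-conditional fail.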

Side Note
Examples

[Animated figure, slides 30 to 39: processors run LL/SC retry loops on a shared word a. The storage mechanism tracks each reservation (a = reservation held, X = reservation lost) and memory holds the current value of a.]

Processor 1 loop: LL a; a *= 100; SC a; brnz
Processor 2 loop: LL a; a += 100; SC a; brnz

• Slide 30: processor 1 has loaded a = ? and holds a reservation. Memory: a = ?
• Slide 31: processor 2 has also loaded a = ? and holds a reservation. Memory: a = ?
• Slide 32: a store a = 100; is performed and both reservations are lost (X). Memory: a = 100
• Slide 33: both reservations are still lost. Memory: a = 100
• Slide 34: processor 2 reloads a = 100 and holds a reservation again; processor 1 is still at X. Memory: a = 100
• Slide 35: processor 1 reloads a = 100; both processors hold reservations. Memory: a = 100
• Slide 36: processor 2’s store succeeds and processor 1’s reservation is lost (X). Memory: a = 200
• Slide 37: only processor 1 remains, with its reservation lost. Memory: a = 200
• Slide 38: processor 1 reloads a = 200 and holds a reservation. Memory: a = 200
• Slide 39: processor 1’s store succeeds. Memory: a = 20000
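The sequence on the example slides can be replayed in software. The sketch below is a hand-scheduled simulation, not PowerPC code: generators stand in for the two processors, the `Word` class is a one-variable model of the storage mechanism, and the unknown initial value "a = ?" is represented by 0, which is an assumption.

```python
class Word:
    """One shared word `a` plus the set of processors holding a reservation on it."""

    def __init__(self, value):
        self.value = value
        self.reserved = set()

    def ll(self, cpu):
        self.reserved.add(cpu)
        return self.value

    def store(self, value):
        self.value = value
        self.reserved.clear()            # every reservation on the word is lost

    def sc(self, cpu, value):
        if cpu not in self.reserved:
            return False
        self.store(value)
        return True


def atomic_update(word, cpu, fn, log):
    """LL / compute / SC / brnz: retry until the store-conditional succeeds."""
    while True:
        old = word.ll(cpu)
        yield                            # other processors may run here
        ok = word.sc(cpu, fn(old))
        log.append((cpu, old, ok))
        if ok:
            return
        yield                            # SC failed: branch back and retry


def replay():
    a, log = Word(0), []                 # 0 stands in for the unknown "a = ?"
    p1 = atomic_update(a, 1, lambda v: v * 100, log)
    p2 = atomic_update(a, 2, lambda v: v + 100, log)
    next(p1)              # processor 1 load-links a
    next(p2)              # processor 2 load-links a
    a.store(100)          # plain store: both reservations are lost
    next(p1)              # processor 1's SC fails
    next(p2)              # processor 2's SC fails
    next(p2)              # processor 2 retries and loads a = 100
    next(p1)              # processor 1 retries and loads a = 100
    next(p2, None)        # processor 2's SC succeeds: a = 200
    next(p1)              # processor 1's SC fails
    next(p1)              # processor 1 retries and loads a = 200
    next(p1, None)        # processor 1's SC succeeds: a = 20000
    return a.value, log


value, log = replay()
print(value)              # prints 20000
```

Every update that reached memory was computed from the value actually in memory at the time, which is the guarantee the retry loop buys without taking a lock.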

Breaking the Despotic Rule of the Lock

Sun Rock Processor
• Execute ahead
• Scouting threads
• Simultaneous multithreading
• Transactional memory
• Checkpoint
• Cache memory with extra bits for tracking speculative execution
• 32 logical threads and 16 physical cores

Pictures courtesy of “Rock: A SPARC CMT Processor”

Breaking the Despotic Rule of the Lock
Take a “RISC”-y Approach
• Small transaction HW
• Best effort
• Use the checkpoint mechanism!
• Transactions == software construct
  • Checkpoint in case of failure
  • Commit on successful transaction
  • Executed speculatively by a strand
  • Use the cache store buffers and lock cache lines until commit (tracking lines with the “s-bits”)

Multi-core Trends in this Decade

[Figure: timeline from 2001 to 2011 of multi-core processors, grouped by vendor]

IBM
• Power 4: 64-bit PowerPC, 2 cores
• Power5: 64-bit PowerPC, 2 cores with SMT
• Power 6: 64-bit PowerPC, 2 cores with SMT
• Power7
• Xenon: 64-bit PowerPC, 3-core chip
• CBE: PowerPC, 9-core chip

Intel
• Pentium D: IA32 x86, 2-core chip
• Xeon Dual Core: IA32 x86, 2-core chip
• Intel Core Duo: IA32 x86, dual-core chip
• Intel Core 2 Duo: IA32 x86, 2-core chip
• Intel Core 2 (codenames Penryn, Wolfdale): IA32 x86, dual- and quad-core chip
• Codename Nehalem: 1- to 8-core chip
• Codename Sandy Bridge

AMD
• AMD Opteron (code name Denmark): IA32 x86, 2-core chip
• AMD Turion64 X2: IA32 x86, dual-core chip
• AMD code name Barcelona: IA32 x86, native 4-core chip

SUN
• UltraSparc T1 (codename Niagara): 8-core processor, 32 logical threads
• UltraSparc T2 (codename Niagara): 8-core processor, 64 logical threads
• Codename Rock: 16-core processor, 32 logical threads

Sources

The Powerful and the Fallen
• Sinharoy, B. et al., “Power5 System Microarchitecture”, IBM Journal of Research and Development, Vol. 49, June/September 2005
• Marr, D. et al., “Hyper-Threading Technology Architecture and Microarchitecture”, Intel Technology Journal, Vol. 6, Issue 1, 2002

The Mutualists

The Just Passing
• Andrews, Jeff and Baker, Nick, “XBOX 360 System Architecture”, IEEE Micro, Volume 26, Issue 2, March 2006

The Olympic Sprinters
• Le, H.Q. et al., “Power6 System Microarchitecture”, IBM Journal of Research and Development, Vol. 51, November 2007

The Threads’ Commune
• Konecny, P., “Introducing the Cray XMT”, May 5th, 2007
• Feo, J., “Can programmers and machines ever be friends?”

Breaking the Despotic Rule of the Lock
• Chaudhry, S., “Rock: A SPARC CMT Processor”, August 26, 2008