8. Microarchitecture of Superscalars (6) Register renaming Dezső Sima Fall 2006 D. Sima, 2006
Jan 12, 2016
Overview
1 The principle of register renaming
2 Design space
2.1 Overview
2.2 Types of rename buffers
3 Operation of register renaming
4 Design parameters of register renaming
5 Implementation of renaming in superscalars
5.1 The chronology of introducing register renaming
5.2 Basic implementation schemes of register renaming
6 Examples
1. Principle of register renaming (1)
Aim:
• Eliminating false data dependencies to relieve the issue bottleneck
False data dependencies:
• WAR: Write After Read (anti-dependency)
Example:
I1: mul r1, r2, r3
I2: add r2, r4, r5
• WAW: Write After Write (output dependency)
Example:
I1: mul r1, r2, r3
I2: add r1, r4, r5
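The two examples can be checked mechanically. The following is a minimal sketch, assuming the first operand of each instruction is the destination register and the remaining operands are sources; the names `Instr` and `false_dependencies` are illustrative and not part of the slides.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str      # destination register (assumed to be the first operand)
    srcs: tuple    # source registers


def false_dependencies(first: Instr, second: Instr) -> set:
    """Return the false dependencies of `second` on the earlier `first`."""
    deps = set()
    if second.dest in first.srcs:
        deps.add("WAR")  # second overwrites a register that first still reads
    if second.dest == first.dest:
        deps.add("WAW")  # both instructions write the same register
    return deps


i1 = Instr("mul", "r1", ("r2", "r3"))
war_pair = false_dependencies(i1, Instr("add", "r2", ("r4", "r5")))
waw_pair = false_dependencies(i1, Instr("add", "r1", ("r4", "r5")))
print(war_pair, waw_pair)
```

In both cases the second instruction does not consume a value produced by the first, which is why the dependency can be removed by writing to a different storage location.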
1. Principle of register renaming (2)
Figure 1.1: The principle of register renaming
[Figure labels: source register numbers, AR, RB, operands (Ops.), execution units (EU), results, retirement]
Basic principle to eliminate false data dependencies:
False data dependencies are eliminated by writing generated results temporarily to buffers, called rename buffers (RB), instead of to the referenced architectural registers (AR).
Then
- referenced source operands need to be fetched from the RB file if they are actually renamed, else from the AR file,
- during dispatching a new rename buffer needs to be allocated to each instruction whose destination register causes a false data dependency [1],
- during retirement buffered results need to be transferred from the RB file to the AR file.
[1] Usually, processors allocate a rename buffer to each dispatched instruction without checking for the existence of false data dependencies, to reduce logic complexity.
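The three rules (fetch from RB or AR, allocate on dispatch, transfer on retirement) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of any particular processor: the class `RenameSketch`, the use of Python dictionaries as rename buffers, and in-order retirement from the head of a list are all hypothetical choices.

```python
class RenameSketch:
    def __init__(self, num_arch_regs):
        self.ar = [0] * num_arch_regs   # architectural register file (AR)
        self.rb = []                    # rename buffers (RB), in program order
        self.latest = {}                # arch. register number -> newest RB entry

    def dispatch(self, dest):
        """Allocate a new rename buffer for the destination register."""
        entry = {"dest": dest, "value": None}
        self.rb.append(entry)
        self.latest[dest] = entry
        return entry

    def read(self, reg):
        """Fetch from the RB if the register is renamed, else from the AR.

        Returns None if the renamed value has not been produced yet.
        """
        entry = self.latest.get(reg)
        return entry["value"] if entry is not None else self.ar[reg]

    def retire_oldest(self):
        """Transfer the oldest buffered result from the RB to the AR."""
        entry = self.rb.pop(0)
        self.ar[entry["dest"]] = entry["value"]
        if self.latest.get(entry["dest"]) is entry:
            del self.latest[entry["dest"]]


m = RenameSketch(8)
b1 = m.dispatch(1)       # I1: mul r1, ...
b2 = m.dispatch(1)       # I2: add r1, ... gets its own buffer (WAW removed)
b2["value"] = 7          # I2 may finish before I1
b1["value"] = 42
value_seen = m.read(1)   # newest renamed value of r1
m.retire_oldest()
m.retire_oldest()
print(value_seen, m.ar[1])
```

Because each write to r1 has its own buffer, the two instructions can finish in either order; retirement in program order leaves the value of the younger instruction in the AR.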
2. Design space of register renaming
2.1 Overview
Register renaming:
• Scope of register renaming
• Layout of the rename buffers
  - Type of rename buffers
• Layout of the register mapping
• Rename rate
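2.2 Types of rename buffers
Rename register file (RR), kept separately from the architectural register file (AR)
[Figure labels: register numbers (Reg. nrs.), operands (Ops.), results (Res.), retirement (Ret.)]
States of a rename register: Available; Allocated, not valid; Allocated, valid.
• Initialized: Available
• Allocate, if instruction is dispatched: Available to Allocated, not valid
• Update, if instruction is finished: Allocated, not valid to Allocated, valid
• Reclaim, if instruction is retired: Allocated, valid to Available
• Reclaim, if instruction is canceled: Allocated to Available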
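The life cycle of a rename register can be written as a small state machine. The sketch below is illustrative only; the event names (`dispatch`, `finish`, `retire`, `cancel`) are chosen here, and it assumes that cancellation is possible from both allocated states.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    ALLOCATED_NOT_VALID = "allocated, not valid"
    ALLOCATED_VALID = "allocated, valid"


# (current state, event) -> next state
TRANSITIONS = {
    (State.AVAILABLE, "dispatch"): State.ALLOCATED_NOT_VALID,       # allocate
    (State.ALLOCATED_NOT_VALID, "finish"): State.ALLOCATED_VALID,   # update
    (State.ALLOCATED_VALID, "retire"): State.AVAILABLE,             # reclaim
    (State.ALLOCATED_NOT_VALID, "cancel"): State.AVAILABLE,         # reclaim (assumed)
    (State.ALLOCATED_VALID, "cancel"): State.AVAILABLE,             # reclaim (assumed)
}


def step(state, event):
    """Apply one event; reject transitions that the diagram does not contain."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value!r}")


state = State.AVAILABLE          # initialized
trace = []
for event in ("dispatch", "finish", "retire"):
    state = step(state, event)
    trace.append(state)
print([s.value for s in trace])
```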
Future file (FF), used together with the architectural register file (AR)
[Figure labels: register numbers (Reg. nrs.), operands (Ops.), results (Res.), retirement (Ret.)]
States of an FF entry: Valid; Not valid.
• Initialized: Valid
• Invalidate, when an instruction refers to the same register as its destination: Valid to Not valid
• Update, if instruction is finished: Not valid to Valid
The FF has as many entries as the AR and holds the most recent register values.
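A future file entry can be sketched with a value, a valid bit and an identifier of the instruction whose result is awaited. The tag comparison in `finish` is an assumption made here so that only the newest writer updates the entry; the slides do not specify this detail, and the names are hypothetical.

```python
class FutureFile:
    def __init__(self, num_arch_regs):
        self.value = [0] * num_arch_regs
        self.valid = [True] * num_arch_regs   # initialized: valid
        self.tag = [None] * num_arch_regs     # newest producer of each entry

    def dispatch(self, dest, tag):
        """A new instruction names `dest` as destination: invalidate the entry."""
        self.valid[dest] = False
        self.tag[dest] = tag

    def finish(self, dest, tag, result):
        """Update the entry when the awaited instruction finishes.

        Assumption: results of older writers to the same register are ignored.
        """
        if self.tag[dest] == tag:
            self.value[dest] = result
            self.valid[dest] = True


ff = FutureFile(4)
ff.dispatch(2, "I1")
ff.dispatch(2, "I2")          # r2 is renamed again before I1 finishes
ff.finish(2, "I1", 10)        # stale result, entry stays not valid
still_waiting = not ff.valid[2]
ff.finish(2, "I2", 20)
print(still_waiting, ff.valid[2], ff.value[2])
```

The architectural register file, which is updated at retirement, is left out of the sketch.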
Merged architectural and rename register file (AR, RR)
[Figure labels: register numbers (Reg. nrs.), operands (Ops.), results (Res.); state diagram with the states Initialized, Available, RB not valid, RB valid, AR]
State transitions of a physical register:
• Entry is allocated to a dispatched instruction: Available to RB, not valid
• Instruction is finished: RB, not valid to RB, valid
• Instruction is completed: RB, valid to AR
• Architectural register is reclaimed, if this architectural register becomes renamed anew: AR to Available
• Instruction is canceled: RB to Available
It needs a large number of physical registers.
During completion no physical transfer is needed from the rename buffer to the referenced architectural register; instead the former rename buffer changes its state and becomes the referenced architectural register.
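The point that completion changes a state instead of copying a value can be illustrated as follows. This is a minimal sketch with hypothetical names; it assumes that the previous architectural copy of a register is reclaimed at the moment the renaming instruction completes, which is one possible reading of "renamed anew".

```python
class MergedRegisterFile:
    def __init__(self, num_arch, num_phys):
        assert num_phys > num_arch
        self.value = [0] * num_phys
        # Initially the first num_arch physical registers act as the AR.
        self.state = ["AR"] * num_arch + ["available"] * (num_phys - num_arch)
        self.newest = list(range(num_arch))      # arch. reg -> newest physical reg
        self.committed = list(range(num_arch))   # arch. reg -> physical reg in state AR

    def dispatch(self, arch_dest):
        phys = self.state.index("available")     # ValueError if no register is free
        self.state[phys] = "RB, not valid"
        self.newest[arch_dest] = phys
        return phys

    def finish(self, phys, result):
        self.value[phys] = result
        self.state[phys] = "RB, valid"

    def complete(self, arch_dest, phys):
        """No data is moved: the rename buffer becomes the architectural register."""
        old = self.committed[arch_dest]
        self.state[phys] = "AR"
        self.committed[arch_dest] = phys
        self.state[old] = "available"            # reclaim the previous copy (assumed timing)


rf = MergedRegisterFile(num_arch=4, num_phys=8)
p = rf.dispatch(1)
rf.finish(p, 99)
rf.complete(1, p)
print(p, rf.state[p], rf.state[1], rf.value[rf.committed[1]])
```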
Holding renamed values in the ROB (ROB, AR)
ROB entries are extended to hold results as well.
During dispatching a new ROB entry with its result field is allocated to each dispatched instruction. (The result field serves as the allocated rename buffer.)
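A ROB whose entries carry a result field might look like the sketch below. The class and field names are illustrative assumptions; retirement is modeled as strictly in program order.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self, num_arch_regs):
        self.ar = [0] * num_arch_regs
        self.entries = deque()   # oldest instruction at the left

    def dispatch(self, dest):
        """Allocate a ROB entry; its result field serves as the rename buffer."""
        entry = {"dest": dest, "result": None, "finished": False}
        self.entries.append(entry)
        return entry

    def finish(self, entry, result):
        entry["result"] = result
        entry["finished"] = True

    def retire(self):
        """Retire finished entries in program order; return how many retired."""
        retired = 0
        while self.entries and self.entries[0]["finished"]:
            entry = self.entries.popleft()
            self.ar[entry["dest"]] = entry["result"]   # transfer to the AR
            retired += 1
        return retired


rob = ReorderBuffer(4)
e1 = rob.dispatch(1)
e2 = rob.dispatch(2)
rob.finish(e2, 5)            # the younger instruction finishes first
blocked = rob.retire()       # nothing retires, e1 is still pending
rob.finish(e1, 3)
retired = rob.retire()
print(blocked, retired, rob.ar)
```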
Types of rename buffers and example processors:
• Rename register file (AR, RR): PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), Power3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
• Future file (AR, FF): UltraSPARC III (1999), K7 (FX) (1999), K8 (FX) (2003)
• Merged architectural and rename register file (AR, RR): Power1 (1990), Power2 (1993), R10000 (1996), R12000 (1999), Alpha 21264 (1998), Pentium 4 (FP) (2000), K7 (FP) (1999), K8 (FP) (2003)
• Holding renamed values in the ROB (ROB, AR): K5 (1995), K6 (1997), Pentium Pro (1995), Pentium II (1997), Pentium III (1999), Pentium 4 (FX) (2000), Pentium M (2003), Core (2006)
3. Operation of register renaming (1)
The actual rename process depends on both the rename technique implemented and the underlying microarchitecture.
Assumptions:
Rename technique: using rename registers and mapping tables
Rename registers:
Provide buffer space to temporarily hold instruction results. Each entry of the rename register file (RR) has a Valid bit (V).
During dispatching the Valid bit of the allocated rename register is reset (V = 0).
When the instruction finishes, its result is written to the allocated rename register and the Valid bit is set (V = 1) to indicate that the corresponding value is available.
Mapping table:
It includes an entry for each architectural register.
Each entry has an "Entry valid" bit that indicates whether or not the corresponding architectural register is renamed and, in case of renaming, holds the index of the associated rename buffer (RB index).
A new entry is created while an instruction is dispatched
• by setting the "Entry valid" bit and
• by writing the index of the allocated rename buffer ("RB index") to the entry that corresponds to the destination register of the dispatched instruction.
A valid mapping is updated by writing a new "RB index" into it when the architectural register belonging to that entry is renamed again.
An entry is invalidated when the instruction that actually belongs to that entry is retired.
In this way the mapping table continuously holds the latest allocations.
Example (table entries 0 to n-1, look-up for r7):
Entry   Entry valid   RB index
6       0
7       1             12
8       1             14
The look-up for r7 returns "12" (RB index = 12).
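The mapping table rules can be sketched as a small class. The names are illustrative; the check in `retire` that the retiring instruction still owns the entry is an interpretation of "the instruction that actually belongs to that entry".

```python
class MappingTable:
    def __init__(self, num_arch_regs):
        self.entry_valid = [False] * num_arch_regs
        self.rb_index = [None] * num_arch_regs

    def rename_dest(self, arch_reg, rb_index):
        """Create or update a mapping when an instruction is dispatched."""
        self.entry_valid[arch_reg] = True
        self.rb_index[arch_reg] = rb_index

    def lookup(self, arch_reg):
        """Return the RB index if the register is renamed, else None (read the AR)."""
        return self.rb_index[arch_reg] if self.entry_valid[arch_reg] else None

    def retire(self, arch_reg, rb_index):
        """Invalidate the entry only if the retiring instruction still owns it."""
        if self.entry_valid[arch_reg] and self.rb_index[arch_reg] == rb_index:
            self.entry_valid[arch_reg] = False


table = MappingTable(16)
table.rename_dest(7, 12)
table.rename_dest(8, 14)
first_lookup = table.lookup(7)     # 12, as in the look-up example for r7
not_renamed = table.lookup(6)      # None: r6 is read from the AR
table.rename_dest(7, 15)           # r7 is renamed again
table.retire(7, 12)                # the older writer retires, mapping is kept
after_old_retire = table.lookup(7)
table.retire(7, 15)
after_new_retire = table.lookup(7)
print(first_lookup, not_renamed, after_old_retire, after_new_retire)
```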
Underlying microarchitecture:
• in-order dispatching
• dynamic instruction issue
• split FX and FP register files
• operand fetch policy: both alternatives (dispatch bound and issue bound) are discussed
3. Operation of register renaming (2)
Considered part of the microarchitecture for both dispatch bound and issue bound operand fetching:
• it executes only FX instructions,
• it consists of an architectural register file (AR) and a single execution unit (EU).
3. Operation of register renaming (3)
Figure 3.1: An FX-core assuming buffered issue and dispatch bound operand fetching
[Figure components: decoded instructions (OC, Rd, Rs1, Rs2); mapping table; architectural register file (AR); rename register file (RR) with valid bits (V); reservation station (RS) with the fields OC, Rd', Op1/Rs1', V1, Op2/Rs2', V2; execution unit (EU); bypassing; dispatch and issue stages]
Steps shown in the figure:
• Renaming destination and source registers (Rd, Rs1, Rs2 become Rd', Rs1', Rs2').
• Fetching the operands if they are valid, else their tags (Op1/Rs1', Op2/Rs2').
• Issuing the instruction when its operands are ready (the valid bits V1, V2 are checked).
• After the instruction is executed, updating the RS and the RR with (Result, Rd').
• When the instruction is retired, updating the AR.
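Dispatch bound operand fetching can be sketched as follows: at dispatch each source field of the RS entry receives either the operand value or, if the value is not yet available, the renamed register number as a tag. The function names and the plain lists and dictionaries standing in for the mapping table, RR and AR are assumptions for illustration.

```python
def dispatch_bound_fetch(src_regs, mapping, rr_value, rr_valid, ar):
    """Build the source fields of an RS entry as (operand_or_tag, valid) pairs."""
    fields = []
    for reg in src_regs:
        rb = mapping.get(reg)                 # None: register is not renamed
        if rb is None:
            fields.append((ar[reg], True))    # operand from the AR
        elif rr_valid[rb]:
            fields.append((rr_value[rb], True))   # operand from the RR
        else:
            fields.append((rb, False))        # tag Rs' instead of the operand
    return fields


def update_rs(fields, tag, result):
    """When (Result, Rd') is produced, replace matching tags by the result."""
    return [(result, True) if (not valid and item == tag) else (item, valid)
            for item, valid in fields]


def ready(fields):
    """The instruction may issue when all valid bits are set."""
    return all(valid for _, valid in fields)


ar = [0, 10, 20, 30]
mapping = {2: 0}                 # r2 is renamed to rename register 0
rr_value, rr_valid = [None], [False]
fields = dispatch_bound_fetch((1, 2), mapping, rr_value, rr_valid, ar)
ready_before = ready(fields)
fields = update_rs(fields, tag=0, result=55)
print(ready_before, fields, ready(fields))
```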
3. Operation of register renaming (4)
Figure 3.2: An FX-core assuming buffered issue and issue bound operand fetching
[Figure components: decoded instructions (OC, Rd, Rs1, Rs2); mapping table; reservation station (RS) with the fields OC, Rd', Rs1', Rs2'; rename register file (RR) with valid bits (V); architectural register file (AR); execution unit (EU); bypassing; dispatch and issue stages]
Steps shown in the figure:
• Renaming destination and source registers.
• Dispatching instructions into the RS.
• Issuing the instruction when its operands are valid (checking for the availability of (Rs1'), (Rs2')), then fetching the operands.
• Executing the instruction and updating the RR when the instruction is finished.
• Updating the AR when the instruction retires.
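With issue bound operand fetching the RS holds only renamed register identifiers, and operands are read at issue time. A minimal sketch under the same illustrative assumptions as before (plain lists and a dictionary stand in for the register files and the mapping table; function names are hypothetical):

```python
def rename_sources(src_regs, mapping):
    """Return ('RR', index) for renamed sources, ('AR', reg) otherwise."""
    return [("RR", mapping[reg]) if reg in mapping else ("AR", reg)
            for reg in src_regs]


def can_issue(renamed_srcs, rr_valid):
    """Check the availability of all source operands before issue."""
    return all(kind == "AR" or rr_valid[index] for kind, index in renamed_srcs)


def fetch_operands(renamed_srcs, rr_value, ar):
    """Read the operands at issue time from the RR or the AR."""
    return [rr_value[index] if kind == "RR" else ar[index]
            for kind, index in renamed_srcs]


ar = [0, 10, 20, 30]
mapping = {2: 0}                       # r2 is renamed to rename register 0
rr_value, rr_valid = [None], [False]
srcs = rename_sources((1, 2), mapping)
issue_before = can_issue(srcs, rr_valid)
rr_value[0], rr_valid[0] = 55, True    # the producing instruction finishes
issue_after = can_issue(srcs, rr_valid)
operands = fetch_operands(srcs, rr_value, ar)
print(srcs, issue_before, issue_after, operands)
```

Compared with the dispatch bound sketch, the RS entry here never stores operand values, so no result needs to be written into waiting RS entries; only the valid bits are consulted.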
4. Design parameters of register renaming (1)
RISC processors
Processor (year of      Type of          Rename buffers  Dispatch   Issue window   Total rename   Reorder width
volume shipment)        rename buffer    FX      FP      rate       width (wdw)    buffers (nr)   (nROB)
PowerPC 603 (1993)      ren. reg. file   n.a.    4       3          3              n.a.           5
PowerPC 604 (1995)      ren. reg. file   12      8       4          12             20             16
PowerPC 620 (1996)      ren. reg. file   8       8       4          15             16             16
POWER3 (1998)           ren. reg. file   16      24      4          23             40             32
POWER4 (2001)           merged           80      72      5          78             152            20*5
POWER5 (2004)           merged           120     120     5          82             240            20*5
R10000 (1996)           merged           32      32      4          48             64             32
R12000 (1998)           merged           32      32      4          48             64             48
Alpha 21264 (1998)      merged           48      41      4          35             89             80
PA 8000 (1996)          ren. reg. file   56      56      4          56             112            56
PA 8200 (1997)          ren. reg. file   56      56      4          56             112            56
PA 8500 (1999)          ren. reg. file   56      56      4          56             112            56
PM1 (1996)              merged           38      24      4          36             62             62
Source: Sima, D. "Register Renaming Techniques", Computer Engineering Handbook, CRC Press 2006
4. Design parameters of register renaming (2)
CISC (x86) processors
Processor (year of              Type of            Rename buffers  Dispatch   Issue window   Total rename   Reorder width
volume shipment)                rename buffer      (FX, FP)        rate       width (wdw)    buffers (nr)   (nROB)
Pentium Pro (1995)              in the ROB         40              3 [2]      20             40             40
Pentium II (1997)               in the ROB         40              3 [2]      20             40             40
Pentium III (1999)              in the ROB         40              3 [2]      20             40             40
Pentium 4 (2000) (Willamette)   merged             128             3 [2]      n.a.           128            126
Pentium 4 (2002) (Northwood)    merged             128             3          n.a.           256?           2*126?
Pentium 4 (2004) (Prescott)     merged             256             3          n.a.           512?           4*128?
Pentium M (2003)                in the ROB         40              3          24             40             40
Core (2006)                     in the ROB         96              4          32             96             96
K5 (1995)                       in the ROB         16              4 [2]      11(?)          16             16
K6 (1996)                       in the ROB         24              3 [2]      24             24             24
K7 (1999)                       in the ROB/merged  72, n.a.        3 [2]      54             88             24*3
K8 (2003)                       in the ROB/merged  72, 120         3 [2]      60             192            24*3
Source: Sima, D. "Register Renaming Techniques", Computer Engineering Handbook, CRC Press 2006
5. Implementation of renaming in superscalars
5.1 The chronology of introducing register renaming
Figure 5.1: Chronology of introducing register renaming
[Timeline figure covering 1990 to 2000. It shows the processor lines of each vendor, with the issue rate of each processor in parentheses, and marks processors with partial renaming and with full renaming.]
RISC processors:
• POWER (IBM): POWER1 (RS/6000) (4), POWER2 (6/4)***, P2SC (6/4)***, POWER3 (4)
• PowerPC (PowerPC Alliance): PPC 601 (3)*, PPC 602 (2)*, PPC 603 (3)*, PPC 604 (4)*, PPC 620 (4)*
• PA (HP): PA7100 (2), PA7200 (2), PA8000 (4), PA8200 (4), PA 8500 (4)
• R (MIPS): R 8000 (4), R 10000 (4), R 12000 (4)
• SPARC (Sun/Hal): SuperSPARC (3), UltraSPARC (4), UltraSPARC-2 (4), UltraSPARC-3 (4), PM1 (SPARC64) (4)
• Alpha (Compaq): Alpha 21064 (2), Alpha 21164 (4), Alpha 21264 (4)
• MC 88000 (Motorola): MC88110 (2)
CISC processors:
• 80x86 (Intel): Pentium (2), Pentium/MMX (2), PentiumPro (3), Pentium II (3), Pentium III (3), Pentium 4 (3)
• Nx/K (AMD): Nx586 (1/3)**, K5 (4), K6 (3), K7 (3)
• M (Cyrix): M1 (2), MII (2)
• MC 68000 (Motorola): MC 68060 (3)
• ES (IBM): ES/9000 (2)
• Gmicro (TRON): Gmicro/500 (2)
* PPC designates PowerPC.
** The Nx586 has scalar issue for CISC instructions but a 3-way superscalar core for converted RISC instructions.
*** The dispatch rate of the POWER2 and P2SC is 6 along the sequential path but only 4 immediately after a branch.
Source: Sima, D. "Register Renaming Techniques", Computer Engineering Handbook, CRC Press 2006
5.2 The basic implementation schemes of register renaming
[Table figure: the basic implementation schemes result from combining the type of rename buffers (merged architectural and rename register file, holding renamed values in the ROB, future file, rename register file) with the operand fetch policy (dispatch bound or issue bound). For each combination the figure lists proposals and examples.]
Proposals: Keller (75); Smith, Pleszkun (85); Sohi, Vajapeyam (87); Johnson (87)
Examples, in the order given in the figure:
• PM1 (SPARC64) (95)
• ES/9000 (92), POWER1 (90), POWER2 (93)
• Nx586 (94), R10000 (96)
• P2SC (96)
• R12000 (99), Pentium 4 (00)
• POWER4 (01), POWER5 (04)
• K7 (FP) (99), K8 (FP) (03)
• PowerPC 603 (93), PowerPC 604 (95), PowerPC 620 (96)
• POWER3 (98), PA 8000 (96), PA 8200 (97), PA 8500 (99)
• Pentium Pro (95), Pentium II (97), Pentium III (99), Pentium M (03)
• Core (06)
• K7 (FX) (99), K8 (FX) (03)
• Am29000 (95), K5 (95)
• Lightning* (91), K6* (97)
• UltraSPARC III (99)
6. Examples (1)
Rename register file
Source: Song, P. "IBM's Power3 to Replace P2SC", Microprocessor Report, Nov. 17, 1997
Figure 6.1: The microarchitecture of the POWER3
6. Examples (2)
Future file
Source: Horel, T. "UltraSPARC-III", IEEE MICRO, May-June 1999, pp. 73-95
WARF: Working and Architectural Register File (Future file)
Figure 6.2: The microarchitecture of the UltraSPARC-III
6. Examples (3)
Merged architectural and rename register file
Figure 6.3: The microarchitecture of the Alpha 21264
Source: Kessler, R.E. et al. "The Alpha 21264 Microprocessor Architecture", h18002.www1.hp.com/alphaserver
6. Examples (4)
Holding renamed values in the ROB
Figure 6.4: The microarchitecture of the Core processor
Source: Kanter, D. "Intel's next Generation Microarchitecture Unveiled", Real World Tech., March 9, 2006