EE141-Spring 2006 Digital Integrated Circuits
Design of an Execution Unit
Luke Tsai, AMD
Outline
– Introduction
– What is the Execution Unit?
– High Level Design Considerations
– Circuit Design of a Barrel Shifter
– “Real Life” Designs
Introduction
If you love EE141… consider a career in microprocessor design:
– All aspects and variety of circuit design
– Maximum complexity
– Leading-edge technology
What is an Execution Unit (EX)?
A Classical Processor Block Diagram
(Figure) Blocks: Instruction Fetch (IF), Decode (DE), Scheduler (SC), Execution Unit (EX), Load-Store (LS), Floating Point (FPU), Memory (L2 Cache)
The EX Unit Implements the Integer Instruction Set
Add* R1, R2
Sub R1, R2
Mult R1, R2
Div R1, R2
ROL R1, R2
SAR R1, R2
CLZ R1
*x86 notation: the first register is both a source and the destination.
Interface to the SC
The SC issues instructions to the EX. An out-of-order SC needs to check for source dependencies:
Add R1, R2
Sub R3, R1    (Dependency: reads R1, which the Add writes)
Mult R4, R2   (No dependency, can issue in parallel)
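The dependency example above can be checked mechanically. The sketch below is a minimal illustration, assuming the x86 two-operand form noted earlier (the destination register is also read); the names Instr, depends_on, and can_issue_in_parallel are hypothetical, and only the source (read-after-write) check from the slide is modeled.

```python
# Hypothetical sketch of the SC's source-dependency check.
# Assumes x86-style two-operand form "OP dst, src": dst is read AND written.
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dst: str
    src: Optional[str] = None  # e.g. CLZ R1 has only one operand

    def reads(self) -> set:
        """Registers this instruction reads (the destination is also a source)."""
        return {r for r in (self.dst, self.src) if r is not None}


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` reads the register that `earlier` writes (RAW)."""
    return earlier.dst in later.reads()


def can_issue_in_parallel(earlier: Instr, later: Instr) -> bool:
    # Only the source check from the slide; WAR/WAW hazards are assumed
    # to be removed elsewhere (e.g. by register renaming).
    return not depends_on(later, earlier)


add = Instr("Add", "R1", "R2")
sub = Instr("Sub", "R3", "R1")
mult = Instr("Mult", "R4", "R2")
print(depends_on(sub, add))              # True: Sub reads R1
print(can_issue_in_parallel(add, mult))  # True: Mult reads only R4, R2
```

Note that Add and Mult both read R2; a read-read overlap is not a dependency, which is why Mult can issue alongside Add.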
Interface to the LS
For load/store ops, the EX generates the address for the LS, which in turn sends data to or receives data from the EX.
The loop from address generation to load data return is a classic critical path in processor design.
Examples (memory operands in brackets):
Add R1, [R2]
Sub [R3], R1
Mult [R4], [R2]
Operation types: Load, Store, Load-Op-Store
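The slides do not give the address formula, so the sketch below assumes the x86 effective-address form (base + index × scale + displacement, wrapping at 64 bits) to show the EX → LS → EX sequence for a load-op such as Add R1, [R2]. The functions agen and add_reg_mem and the dictionary-based register file and memory are hypothetical stand-ins, not a model of any real AGU.

```python
# Hypothetical AGU sketch. The slide does not give the address formula; this
# assumes the x86 form EA = base + index * scale + displacement (mod 2**64).
MASK64 = (1 << 64) - 1


def agen(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Effective address computed by the EX for the LS unit."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64


def add_reg_mem(regs: dict, mem: dict, dst: str, addr_reg: str) -> None:
    """Model of 'Add dst, [addr_reg]' (a load followed by an op)."""
    ea = agen(regs[addr_reg])                  # 1. EX: address generation
    data = mem[ea]                             # 2. LS: load data returns
    regs[dst] = (regs[dst] + data) & MASK64    # 3. EX: the add itself


regs = {"R1": 5, "R2": 0x2000}
mem = {0x2000: 7}
add_reg_mem(regs, mem, "R1", "R2")
print(regs["R1"])  # 12
```

Steps 1 and 2 together form the AGen-to-data loop the slide calls a classic critical path; the op in step 3 cannot start until the LS returns the data.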
A Typical Block Diagram of EX
(Figure: Execution Unit containing a multi-ported register file; ALU0 and ALU 1..N; AGen 1..N; Adder, Shifter, Mult, and Div/CLZ/Popcnt units; Operand Bus, Result Bus, and Bypass.)
High Level Design Considerations
Meeting the Performance Target
– IPC: how each instruction is executed; which EX units to build and how many of each
– Frequency: what type of circuit style to use
– Power: how much energy per operation
– Area: silicon real estate is expensive
The design point is based on trade-offs among the above criteria.
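To make the trade-off concrete, the sketch below compares two hypothetical design points using the simplified relations throughput = IPC × frequency and power = energy per operation × operations per second. Every number and the DesignPoint class are made up for illustration; real power also includes leakage, which this model folds into energy per operation.

```python
# Hypothetical design-point comparison; every number below is made up.
from dataclasses import dataclass


@dataclass
class DesignPoint:
    name: str
    ipc: float        # instructions per cycle
    freq_ghz: float   # clock frequency in GHz
    energy_pj: float  # energy per instruction in picojoules
    area_mm2: float   # EX unit area in mm^2

    def perf_gips(self) -> float:
        """Throughput = IPC x frequency, in billions of instructions/s."""
        return self.ipc * self.freq_ghz

    def power_w(self) -> float:
        """Power = energy/op x ops/s; pJ x 1e9/s = 1e-3 W."""
        return self.energy_pj * self.perf_gips() * 1e-3


candidates = [
    DesignPoint("wide, slower clock", ipc=2.0, freq_ghz=2.0, energy_pj=30.0, area_mm2=4.0),
    DesignPoint("narrow, faster clock", ipc=1.5, freq_ghz=3.0, energy_pj=40.0, area_mm2=3.0),
]
for d in candidates:
    print(f"{d.name}: {d.perf_gips():.2f} GIPS, {d.power_w():.3f} W, "
          f"{d.perf_gips() / d.area_mm2:.2f} GIPS/mm^2")
```

With these made-up inputs, the faster-clocked point wins on throughput and throughput per area but burns 50% more power; which point is better depends on how the criteria are weighted.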
Micro-Architecture Considerations
– Pipeline
– Interface with the Scheduler: how to handle out-of-order execution
– Interface with the LS unit: how many cycles for the AGen-to-data loop? How to suppress speculative execution when load data is invalid?
Physical Design Considerations: Operand Bypass
A bypass condition occurs when an operand of an instruction scheduled to execute in cycle n is generated in the immediately preceding cycle (n-1). The data for this operand does not yet reside in the register file and must be bypassed from one of the result buses.
Bypass condition (execution sequence*):
Add R1, R2
Sub R3, R1    (R1 comes from the Add in the preceding cycle, so it is bypassed)
Mult R4, R2
* Actual execution sequence (not program order)
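The bypass rule above reduces to a simple check per operand. The sketch below follows the slide's definition literally (only results produced in cycle n-1 are missing from the register file); the Issued record, operand_source function, and the cycle numbers assigned to the example sequence are hypothetical.

```python
# Hypothetical sketch of the bypass check, following the slide's definition:
# only results produced in cycle n-1 are not yet in the register file.
from typing import NamedTuple


class Issued(NamedTuple):
    cycle: int      # cycle in which the instruction executes (assumed values)
    dst: str        # register it writes
    srcs: tuple     # registers it reads


def operand_source(instr: Issued, reg: str, executed: list) -> str:
    """Where the EX should take operand `reg` of `instr` from."""
    for other in executed:
        if other.cycle == instr.cycle - 1 and other.dst == reg:
            return "bypass (result bus)"
    return "register file"


# Actual execution sequence (not program order), x86 two-operand form
add = Issued(0, "R1", ("R1", "R2"))
sub = Issued(1, "R3", ("R3", "R1"))
mult = Issued(2, "R4", ("R4", "R2"))
executed = [add, sub, mult]
for reg in sub.srcs:
    print("Sub", reg, "->", operand_source(sub, reg, executed))
```

In hardware this comparison of source tags against the previous cycle's destination tags drives the bypass mux select, which is why the bypass scheme shows up again as a floorplan concern.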
Physical Design Considerations: Floorplan
The floorplan of an EX unit is a crucial design decision. It impacts:
– Bus length (frequency, power)
– Datapath pitch (frequency, power, area)
– Bypass scheme (area, power)
Circuit Design of a Barrel Shifter
What is a Barrel Shifter?
Performs a shift or rotate on the full or partial data.
Example: 8-bit shifter (each row lists which input bit lands in output positions 7..0; L = Low (zero))
Input Bit Position         7 6 5 4 3 2 1 0
Rot Left 1                 6 5 4 3 2 1 0 7
Rot Right 1                0 7 6 5 4 3 2 1
Logical Shift Left 2       5 4 3 2 1 0 L L   (= mult by 4)
Arithmetic Shift Left 2    5 4 3 2 1 0 L L   (same as above)
Logical Shift Right 3      L L L 7 6 5 4 3
Arithmetic Shift Right 3   7 7 7 7 6 5 4 3
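The table above can be regenerated symbolically: treat a word as a list of input-bit labels (MSB first) and apply each operation to the labels. This is a minimal sketch with hypothetical function names; it assumes shift amounts in 0..7.

```python
# Symbolic 8-bit shifter that regenerates the table above.
# A word is a list of input-bit labels, MSB first; "L" marks a low (zero) fill.
# Shift amounts are assumed to be in 0..len(word)-1.
INPUT = list(range(7, -1, -1))  # [7, 6, 5, 4, 3, 2, 1, 0]


def rotate_left(word, n):
    n %= len(word)
    return word[n:] + word[:n]


def rotate_right(word, n):
    return rotate_left(word, -n)  # -n % len(word) turns it into a left rotate


def shift_left(word, n):
    # Logical and arithmetic left shifts are identical: zeros enter at the LSB.
    return word[n:] + ["L"] * n


def logical_shift_right(word, n):
    return ["L"] * n + word[: len(word) - n]


def arithmetic_shift_right(word, n):
    # The sign bit (word[0], the MSB) is replicated into the vacated positions.
    return [word[0]] * n + word[: len(word) - n]


rows = {
    "Rot Left 1": rotate_left(INPUT, 1),
    "Rot Right 1": rotate_right(INPUT, 1),
    "Logical Shift Left 2": shift_left(INPUT, 2),
    "Logical Shift Right 3": logical_shift_right(INPUT, 3),
    "Arithmetic Shift Right 3": arithmetic_shift_right(INPUT, 3),
}
for name, word in rows.items():
    print(f"{name:<26}", *word)
```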
Barrel Shifter Design
Observe: any input bit could be passed to ALL output bit positions.
Therefore, the shifter is essentially a giant NxN mux, where N is the data width. The mux select is the one-hot decode of the shift amount.
(Figure: input bit positions 7..0, with input bit 3 shown reaching every output position.)
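A bit-level model of the single-stage NxN mux described above: the shift amount is decoded into N one-hot select lines, and every output bit is an AND-OR over all N input bits. This sketch models rotate only, and the helper names (one_hot, to_bits, from_bits, mux_rotate_left) are hypothetical.

```python
# Hypothetical model of the single-stage N x N mux rotator.
def one_hot(shift: int, n: int) -> list:
    """Decode the shift amount into N select lines, exactly one of them high."""
    return [1 if k == shift % n else 0 for k in range(n)]


def to_bits(x: int, n: int) -> list:
    return [(x >> i) & 1 for i in range(n)]  # index i holds bit i (LSB first)


def from_bits(bits: list) -> int:
    return sum(b << i for i, b in enumerate(bits))


def mux_rotate_left(bits: list, shift: int) -> list:
    """Each output bit is an N-input AND-OR mux over ALL input bits:
    out[i] = OR over k of (sel[k] AND in[(i - k) mod N])."""
    n = len(bits)
    sel = one_hot(shift, n)  # N select lines drive all N output muxes
    return [
        int(any(sel[k] and bits[(i - k) % n] for k in range(n)))
        for i in range(n)
    ]


x = 0b10010110
print(f"{from_bits(mux_rotate_left(to_bits(x, 8), 3)):08b}")  # 10110100
```

The N select lines each fan out to all N output muxes, which is the "largest load for shift amount" cost noted for the single-stage design.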
Barrel Shifter Implementations
1. Single-stage NxN mux
– Fewest gates between input and output
– Largest number of select signals (largest load for the shift amount)
2. Multi-stage mux
– More stages = more gates between input and output
– The reduction in select signals gives diminishing returns
– For 64-bit shifts: 1 stage = 64 selects; 2 stages (8x8) = 16 selects (75% reduction); 3 stages (4x4x4) = 12 selects (a further 25% reduction)
3. Mux implementation
– Low-swing pass gate
– Full-swing domino
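The select counts above follow from one observation: each stage decodes one digit of the shift amount one-hot, so it needs as many select lines as its radix. The sketch below (hypothetical names, 64-bit rotate only) computes the counts and checks that an 8x8 two-stage organization, a coarse rotate by multiples of 8 followed by a fine rotate by 0..7, matches a single-stage rotate.

```python
# Select-line count for a 64-bit shifter built from stages of smaller muxes.
from math import prod

WIDTH = 64
MASK = (1 << WIDTH) - 1


def select_lines(radices, width=WIDTH):
    """Each stage decodes one digit of the shift amount one-hot, so it needs
    as many select lines as its radix; the radices must multiply to width."""
    assert prod(radices) == width, "stages must cover every shift amount"
    return sum(radices)


def rol(x, n):
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK


def two_stage_rol(x, shift):
    """8x8 organization: a coarse stage (0, 8, ..., 56) then a fine stage (0..7)."""
    coarse = rol(x, (shift // 8) % 8 * 8)  # stage 1: 8 select lines
    return rol(coarse, shift % 8)          # stage 2: 8 more select lines


for radices in ([64], [8, 8], [4, 4, 4], [2] * 6):
    print(radices, select_lines(radices))   # 64, 16, 12, 12
```

The last case shows the diminishing return in its extreme form: six radix-2 stages need the same 12 selects as three radix-4 stages, while doubling the number of gates between input and output.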
Barrel Shifter Array
(Figure: one-stage mux vs. two-stage mux arrays, showing the inputs (which turn 90° to run across the array), intermediate signals in the two-stage case, outputs, select lines, and connection points.)
Barrel Shifter Additional Complexity
1. Partial shifts/rotates
– The x86 instruction set supports 8 (L/H)/16/32/64-bit shifts
2. Shift differs from rotate
– Shifts fill in zeros or the sign bit => How do you build a barrel shifter that does both shift and rotate?
3. Rotate could include the carry bit
– x86 supports RCL/RCR (Rotate through Carry Left/Right) => A 64-bit RCL requires a 65-bit barrel shifter!
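One common answer to the shift-versus-rotate question (not necessarily the approach used in this lecture) is to build only a rotator and then apply a fill mask: positions vacated by the shift are forced to zero or to the sign bit. The sketch below shows that idea for 64-bit SHL/SHR/SAR, plus RCL modeled as a 65-bit rotate of {CF, x}, which is why RCL needs the wider shifter. Function names are hypothetical, and shift counts are assumed to be already reduced to 0..63.

```python
# One common approach (not necessarily the lecture's): a single rotator plus
# a fill mask implements shifts as well as rotates.
W = 64
MASK = (1 << W) - 1


def rol(x, n, w=W):
    n %= w
    m = (1 << w) - 1
    return ((x << n) | (x >> (w - n))) & m


def shift_via_rotator(x, n, kind):
    """SHL/SHR/SAR built from one rotator; n is assumed to be 0..W-1."""
    if kind == "shl":
        keep = (MASK << n) & MASK          # upper W-n bits come from the rotator
        return rol(x, n) & keep            # vacated low bits are zero-filled
    if kind not in ("shr", "sar"):
        raise ValueError(f"unknown shift kind: {kind}")
    r = rol(x, W - n)                      # rotate right by n
    keep = MASK >> n                       # lower W-n bits come from the rotator
    sign = (x >> (W - 1)) & 1
    fill = (MASK & ~keep) if (kind == "sar" and sign) else 0  # sign or zero fill
    return (r & keep) | fill


def rcl(x, cf, n):
    """RCL rotates the 65-bit value {CF, x}: hence a 65-bit rotator."""
    wide = (cf << W) | x                   # CF sits in bit 64
    r = rol(wide, n, W + 1)
    return r & MASK, r >> W                # (new x, new CF)
```

The fill-mask logic sits after the rotator, so the rotator array itself is shared by all the operations; the cost is extra mask generation from the shift amount.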
“Real Life” Designs
Robustness and Reliability
Robustness: higher yield = higher profit margin
– Circuits need to function across PVT (process, voltage, temperature) variation
– A chip target yield of 70% could require an EX yield of 99%
– What works in SPICE (without PVT variation) may not work in real life
Reliability: in addition to simulating for speed, real designs also check:
– Noise
– IR drop
– Electromigration
– Inductive effects
– …
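Why a 70% chip target can demand 99% from a single unit: if block failures are independent, chip yield is the product of block yields, so many 99% blocks quickly multiply down to 70%. The sketch below is a back-of-the-envelope check under that independence assumption with equal-yield blocks; the slide does not state which yield model it used, and the function names are hypothetical.

```python
import math


def chip_yield(block_yields):
    """Chip works only if every block works (independent failures assumed)."""
    return math.prod(block_yields)


def max_blocks(block_yield, target):
    """How many equal-yield blocks fit under a chip-level yield target."""
    return math.floor(math.log(target) / math.log(block_yield))


n = max_blocks(0.99, 0.70)
print(n, round(chip_yield([0.99] * n), 3))  # 35 0.703
```

Under these assumptions, roughly 35 blocks each at 99% yield already bring the chip down to about 70%.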
Process Variation
Major culprits: threshold voltage, channel length, channel width
– In 45nm, ΔVth ~ ±150 mV, ΔL ~ ±15%, ΔW ~ ±10% (for minimum-size devices). (The dependence of Idsat/Idoff on these variations is non-linear. Try it in SPICE.)
– Hurts matching of devices/paths: sense amps, analog circuits, memory cell stability, clock tree
– Increases leakage: 80% of chip leakage is caused by 20% of devices, which limits the use of dynamic circuits
– Slows down critical paths
– Worsens hold-time requirements
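The non-linearity and the skewed leakage distribution both follow from subthreshold leakage being exponential in Vth. The sketch below assumes Ioff ∝ 10^(−ΔVth/S) with a subthreshold swing S of 100 mV/decade and a Gaussian ΔVth with σ = 50 mV (treating ±150 mV as about ±3σ); both values are assumptions, not slide data, and the exact top-20% share depends strongly on them. The function names are hypothetical.

```python
# Hypothetical leakage model; S and sigma below are assumptions, not slide data.
import random

S_MV_PER_DEC = 100.0   # assumed subthreshold swing, mV/decade
SIGMA_MV = 50.0        # assumed Vth sigma, so +-150 mV is about +-3 sigma


def leakage_multiplier(dvth_mv: float) -> float:
    """Subthreshold leakage relative to nominal: Ioff ~ 10**(-dVth / S)."""
    return 10 ** (-dvth_mv / S_MV_PER_DEC)


def top_share(frac: float = 0.2, n: int = 50_000, seed: int = 1) -> float:
    """Fraction of total leakage contributed by the leakiest `frac` of devices."""
    rng = random.Random(seed)
    leak = sorted(
        (leakage_multiplier(rng.gauss(0.0, SIGMA_MV)) for _ in range(n)),
        reverse=True,
    )
    return sum(leak[: int(frac * n)]) / sum(leak)


# Symmetric +-150 mV Vth shifts give very asymmetric leakage:
print(leakage_multiplier(-150), leakage_multiplier(150))
print(round(top_share(), 2))
```

A −150 mV shift raises leakage about 32x while a +150 mV shift lowers it by the same factor, so the low-Vth tail dominates the total. Under these particular assumptions the leakiest 20% of devices contribute well over half of all leakage.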
Voltage/Temperature Variations
– Introduce more timing variation
– Increase noise
– Worsen cross-chip matching (e.g. clock tree)
– Degrade reliability
(Figure: voltage plot; legend values 1.072 V, 1.103 V, 1.134 V, 1.164 V, 1.194 V, 1.224 V.)