www-inst.eecs.berkeley.edu/~cs152/. CS 152 Computer Architecture and Engineering. Lecture 19 -- Dynamic Scheduling II. 2014-4-3 John Lazzaro (not a prof - “John” is always OK). TA: Eric Love. Case studies of dynamic execution.
Issue stage close-up:
(1) Newly issued instructions placed in top of queue.
(2) Instructions check scoreboard: are 2 sources ready?
(3) Arbiter selects 4 oldest “ready” instructions.
(4) Update removes these 4 from queue.
Input: 4 just-issued instructions, renamed to use physical registers.
Output: The 4 oldest instructions whose 2 source registers are ready for use.
Scoreboard: Tracks writes to physical registers.
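The four issue-stage steps can be sketched in a few lines of Python. This is a minimal software model, not the hardware: the names `Instr`, `IssueQueue`, `select` and `writeback` are made up for illustration, the queue is a plain list with the oldest entry first, and the scoreboard is modeled as a set of physical registers that have already been written.

```python
from dataclasses import dataclass

ISSUE_WIDTH = 4  # from the slide: the arbiter picks the 4 oldest ready instructions


@dataclass
class Instr:
    name: str
    srcs: tuple  # two physical source registers
    dest: int    # physical destination register


class IssueQueue:
    """Hypothetical model of the issue queue plus scoreboard."""

    def __init__(self, ready_regs):
        self.ready = set(ready_regs)  # scoreboard: registers already written
        self.queue = []               # index 0 is the oldest entry

    def insert(self, instrs):
        # Step (1): newly issued (renamed) instructions enter the queue.
        self.queue.extend(instrs)

    def select(self):
        # Step (2): check the scoreboard for both sources of every entry.
        ready = [i for i in self.queue if all(s in self.ready for s in i.srcs)]
        # Step (3): the arbiter keeps the oldest ISSUE_WIDTH ready entries.
        picked = ready[:ISSUE_WIDTH]
        # Step (4): the update removes the picked entries from the queue.
        picked_ids = {id(i) for i in picked}
        self.queue = [i for i in self.queue if id(i) not in picked_ids]
        return picked

    def writeback(self, instr):
        # The scoreboard records the write to the destination register.
        self.ready.add(instr.dest)


q = IssueQueue(ready_regs={1, 2, 3})
q.insert([
    Instr("a", (1, 2), 10),
    Instr("b", (10, 3), 11),  # must wait for the result of "a"
    Instr("c", (2, 3), 12),
    Instr("d", (1, 3), 13),
    Instr("e", (1, 1), 14),
    Instr("f", (2, 2), 15),
])
first = q.select()
for instr in first:
    q.writeback(instr)
second = q.select()
print([i.name for i in first], [i.name for i in second])
```

In this example "b" is older than "c" through "f" but is skipped in the first cycle because one of its sources has not been written yet; "f" is ready but loses to four older ready instructions.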
Execution close-up:
(1) Two copies of register files, to reduce port pressure.
(2) Forwarding buses are low-latency paths through CPU.
Relies on speculation.
Latencies, from issue to retirement.
8 retirements per cycle can be sustained over short time periods.
Peak rate is 11 retirements in a single cycle.
Retirement managed here.
Short latencies keep buffers to a reasonable size.
Execution unit close-up:
(1) Two arbiters: one for top pipes, one for bottom pipes.
(2) Instructions statically assigned to top or bottom.
(3) Arbiter dynamically selects left or right.
Thus, 2 dual-issue dynamic machines, not a 4-issue machine. Why? Simplifies arbiter. Performance penalty? A few %.
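To see where the few-percent penalty comes from, here is a small Python sketch that compares two independent dual-issue arbiters with a single 4-issue arbiter. The function names and the `assign` dictionary are hypothetical, and the model assumes every ready instruction can use either pipe on its assigned side, which a real design need not guarantee.

```python
def arbitrate_split(ready, assign):
    """Two dual-issue arbiters. `ready` lists ready instructions oldest first;
    `assign` maps each one to its static side, "top" or "bottom"."""
    issued = {"top": [], "bottom": []}
    for name in ready:
        side = assign[name]
        if len(issued[side]) < 2:  # each arbiter fills at most 2 pipes
            issued[side].append(name)
    return issued


def arbitrate_unified(ready):
    """A single 4-issue arbiter, shown only for comparison."""
    return ready[:4]


ready = ["a", "b", "c", "d"]
assign = {"a": "top", "b": "top", "c": "top", "d": "bottom"}

split = arbitrate_split(ready, assign)
unified = arbitrate_unified(ready)
print(split, unified)
```

With three of the four ready instructions statically assigned to the top pipes, the split design issues only three of them this cycle ("c" waits), while the unified arbiter would issue all four. Each split arbiter only ever chooses 2 instructions from its own half of the candidates, which is the simplification the slide refers to.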
Memory stages close-up:
Input: Loads and stores from execution unit appear as “Cluster 0/1 memory unit” in the diagram below.
1st stop: TLB, to convert virtual memory addresses.
2nd stop: Load Queue (LDQ) and Store Queue (STQ) each hold 32 instructions, until retirement ... So we can roll back!
3rd stop: Flush STQ to the data cache ... on a miss, place in Miss Address File. (MAF == MSHR)
“Double-pumped”, 1 GHz
LDQ/STQ close-up:
Hazards we are trying to prevent:
To do so, the LDQ and STQ keep lists of up to 32 loads and stores, in issued order. When a new load or store arrives, addresses are compared to detect/fix hazards.
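The address comparison can be sketched as follows. This is a simplified model under stated assumptions, not the actual hardware: `MemOp`, `LoadStoreQueues` and the `seq` age field are invented for illustration, the 32-entry limit and retirement are left out, and the store-side check is conservative (it flags every younger load to the same address that has already run).

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    seq: int       # age of the instruction; smaller means older
    addr: int
    value: int = 0


class LoadStoreQueues:
    """Hypothetical LDQ/STQ model: compare addresses as each op arrives."""

    def __init__(self, memory):
        self.memory = memory  # dict standing in for the data cache
        self.ldq = []
        self.stq = []

    def execute_load(self, seq, addr):
        # Compare against older stores still waiting in the STQ.
        older = [s for s in self.stq if s.seq < seq and s.addr == addr]
        if older:
            # Take the data from the youngest of the older matching stores.
            value = max(older, key=lambda s: s.seq).value
        else:
            value = self.memory.get(addr, 0)
        self.ldq.append(MemOp(seq, addr, value))
        return value

    def execute_store(self, seq, addr, value):
        self.stq.append(MemOp(seq, addr, value))
        # Younger loads to the same address that already ran may hold stale data.
        return [l.seq for l in self.ldq if l.seq > seq and l.addr == addr]


q = LoadStoreQueues(memory={0x40: 7})
early = q.execute_load(seq=2, addr=0x40)               # runs before the older store
violations = q.execute_store(seq=1, addr=0x40, value=9)
later = q.execute_load(seq=3, addr=0x40)               # sees the store in the STQ
print(early, violations, later)
```

The load with `seq=2` ran ahead of the older store and read the old value, so the store's arrival reports it as a hazard that has to be fixed by re-executing the load. The load with `seq=3` arrives after the store and gets the store's data directly.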
LDQ/STQ speculation
It also marks the load instruction in a predictor, so that future invocations are not speculatively executed.
Figure panels: First execution, Subsequent execution.
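The first-execution versus subsequent-execution behavior can be sketched as a small predictor keyed by the load's program counter. The class and method names here are hypothetical, and the unbounded set is an assumption made for brevity; a hardware predictor would be a fixed-size table.

```python
class LoadWaitPredictor:
    """Hypothetical predictor: remembers loads that were caught running too early."""

    def __init__(self):
        self.wait = set()  # PCs of loads that should no longer be speculated

    def may_speculate(self, pc):
        # A load may run ahead of older stores unless it has been marked.
        return pc not in self.wait

    def record_violation(self, pc):
        # Called when a hazard is detected for the load at this PC.
        self.wait.add(pc)


predictor = LoadWaitPredictor()
load_pc = 0x100

first_execution = predictor.may_speculate(load_pc)       # speculates
predictor.record_violation(load_pc)                       # hazard was detected
subsequent_execution = predictor.may_speculate(load_pc)  # now waits
print(first_execution, subsequent_execution)
```

Only the load that caused trouble is marked; other loads, such as one at a different PC, still execute speculatively.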
Designing a microprocessor is a team sport. Below are the author and acknowledgement lists for the papers whose figures I use.