8/6/2019 The Micro Architecture of Intel Pentium 4
The Microarchitecture of Intel
Pentium 4
Sudipta Mahapatra
Introduction
The Intel Pentium 4 was introduced in November 2000, targeting a high clock rate of 1.5 GHz.
The Netburst microarchitecture formed the basis
for a new family of Intel processors starting from
the Pentium 4.
It was developed to deliver a high level of
performance for many important applications
such as multimedia.
Targeted application areas
Internet audio and streaming video
Image processing
Video content creation
Speech recognition
3D applications and games
Video editing and video conferencing
Overview of the
Netburst Microarchitecture
Uses a deeply pipelined architecture to ensure a
high clock rate.
Uses a high-performance, quad-pumped bus
interface to the 100 MHz system bus to transfer
data at an effective rate of 400 MHz (four transfers per bus clock).
Uses a high-speed execution engine to reduce the
latency of basic integer instructions.
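The quad-pumped figure above is simple arithmetic, sketched here in Python. The bus clock and the factor of four come from the slide; the 8-byte (64-bit) data bus width is an assumption that the slides do not state, and the function names are illustrative only.

```python
# Effective transfer rate of a quad-pumped bus (illustrative sketch).
BUS_CLOCK_MHZ = 100       # from the slide
TRANSFERS_PER_CLOCK = 4   # "quad-pumped": four data transfers per bus clock
BUS_WIDTH_BYTES = 8       # ASSUMPTION: 64-bit data bus, not stated in the slides


def effective_rate(clock_mhz, transfers_per_clock):
    """Millions of transfers per second on the bus."""
    return clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, width_bytes):
    """Peak bandwidth in MB/s under the assumed bus width."""
    return effective_rate(clock_mhz, transfers_per_clock) * width_bytes


print(effective_rate(BUS_CLOCK_MHZ, TRANSFERS_PER_CLOCK))
print(peak_bandwidth_mb_s(BUS_CLOCK_MHZ, TRANSFERS_PER_CLOCK, BUS_WIDTH_BYTES))
```

Under the assumed 64-bit width, the peak works out to 3200 MB/s; a different width would scale that figure proportionally.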
Overview (Contd.)
Out-of-order speculative execution to enable
parallelism
Superscalar issue to exploit maximal parallelism
Main Features
Hardware register renaming to avoid register
name space limitations (WAR and WAW hazards)
Cache line sizes of 64 bytes
Optimization for the common case of frequently
executed instructions
Improved branch handling techniques.
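As a rough illustration of the register renaming feature listed above, the following Python sketch gives every register write a fresh physical register. It is a simplification, not the Pentium 4's actual renamer: the pool of physical registers is assumed to be unlimited, and the `rename` function, the `p0, p1, ...` naming, and the three-instruction program are all hypothetical.

```python
from itertools import count


def rename(instructions, arch_regs):
    """Map architectural registers to fresh physical registers.

    instructions: list of (dest, src1, src2) architectural register names,
    in program order. Returns the same list expressed in physical registers.
    """
    fresh = count()
    # Each architectural register starts out mapped to its own physical register.
    mapping = {reg: f"p{next(fresh)}" for reg in arch_regs}
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the current mapping, so true (RAW) dependences survive.
        s1, s2 = mapping[src1], mapping[src2]
        # Every write gets a new physical register, which removes WAW and WAR.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], s1, s2))
    return renamed


program = [
    ("eax", "ebx", "ecx"),  # writes eax
    ("edx", "eax", "ebx"),  # reads eax (RAW on the first instruction)
    ("eax", "ecx", "ecx"),  # writes eax again (WAW with 1st, WAR with 2nd)
]
renamed = rename(program, ["eax", "ebx", "ecx", "edx"])
for line in renamed:
    print(line)
```

After renaming, the first and third instructions write different physical registers, so they no longer need to complete in program order, while the second instruction still reads the register written by the first.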
Basic Block Diagram
[Figure: basic block diagram of the Pentium 4; the only surviving label is "Branch-history update".]
[Glenn Hinton et al., Intel Technology Journal, Q1 2001]
Main sections
1. In-order front end (FE)
2. Out-of-order Execution logic (OOE)
3. Integer and Floating-point Execution Units (EX)
4. Memory Subsystem (M)
In order front end
Fetches the instructions to be executed next.
Supplies a set of decoded instructions to the
execution pipeline.
Uses accurate branch prediction logic to
determine the branch target.
The instructions from the branch target are
decoded to generate a set of micro-operations
or uops that may be executed in the execution
core.
Uses the trace cache to store the uops
corresponding to the most recently executed
instructions.
Front end
[Figure: front end block diagram. Input: from L2 cache. Output: to allocator/register renamer.]
[Glenn Hinton et al., Intel Technology Journal, Q1 2001]
Front end components
Trace cache (TC): Serves as the L1 instruction
cache. Unlike a conventional instruction cache, it
holds the uops corresponding to the most recently
decoded instructions.
Delivers up to three uops per clock cycle to the
OOE. Capacity = 12K uops.
Only in case of TC miss, the L2 cache is accessed.
The trace cache has its own branch predictor that
indicates where to go next in the trace cache.
This is smaller than the front-end BTB as it is
concerned only with the subset of instructions that
are currently in the trace cache.
Also includes a 16-entry return address stack.
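The hit and miss behaviour described above can be pictured with a toy model in Python. This is only a sketch under several assumptions: it caches uops per instruction address rather than as traces along a predicted path, it uses LRU replacement (the slides do not state a policy), and `TraceCache`, `toy_decode`, and the addresses are hypothetical.

```python
from collections import OrderedDict


class TraceCache:
    """Toy model: maps an instruction address to its already-decoded uops."""

    def __init__(self, capacity_uops, decode):
        self.capacity_uops = capacity_uops
        self.decode = decode        # slow path: stands in for L2 fetch + decode
        self.lines = OrderedDict()  # address -> list of uops, oldest first
        self.hits = 0
        self.misses = 0

    def _used(self):
        return sum(len(uops) for uops in self.lines.values())

    def fetch(self, address):
        if address in self.lines:
            # Hit: uops are delivered without touching the decoder.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Miss: only now is the L2 cache / decoder path used.
        self.misses += 1
        uops = self.decode(address)
        # ASSUMPTION: evict least recently used entries until the new uops fit.
        while self.lines and self._used() + len(uops) > self.capacity_uops:
            self.lines.popitem(last=False)
        self.lines[address] = uops
        return uops


def toy_decode(address):
    """Hypothetical decoder: every instruction becomes two placeholder uops."""
    return [f"uop@{address:#x}.{i}" for i in range(2)]


tc = TraceCache(capacity_uops=12 * 1024, decode=toy_decode)
for addr in [0x100, 0x104, 0x100, 0x104]:  # a small loop executed twice
    tc.fetch(addr)
print(tc.hits, tc.misses)
```

The second pass through the loop is served entirely from the cache, which is the case the trace cache is designed to make fast.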
Front end components (Contd.)
Instruction decoder: Receives two IA-32 instructions at a
time from the L2 cache and decodes them into uops.
Can decode at a maximum rate of one IA-32 instruction per
clock cycle.
Most of the instructions are converted into single uops.
If the instruction needs more than 4 uops, control is
transferred to the microcode ROM.
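The four-uop rule above amounts to a simple threshold check, sketched here in Python. The threshold is from the slide; the instruction names and their uop counts are hypothetical placeholders, not real IA-32 data.

```python
MAX_DECODER_UOPS = 4  # from the slide: more than 4 uops goes to the microcode ROM

# HYPOTHETICAL instruction names and uop counts, for illustration only.
HYPOTHETICAL_UOP_COUNTS = {
    "simple_add": 1,
    "load_and_add": 2,
    "four_uop_op": 4,
    "string_copy": 20,
}


def uop_source(instruction):
    """Return which front-end unit supplies the uops for an instruction."""
    uops = HYPOTHETICAL_UOP_COUNTS[instruction]
    if uops > MAX_DECODER_UOPS:
        return "microcode ROM"
    return "decoder"


for name in HYPOTHETICAL_UOP_COUNTS:
    print(name, "->", uop_source(name))
```

An instruction that needs exactly four uops still stays in the decoder; only instructions beyond that limit are handed to the microcode ROM.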
Out-of-order Execution logic
Prepares the instructions for out-of-order
execution.
Uses aggressive reordering to execute the
instructions as soon as they are ready to execute.
This maximizes utilization of the execution resources.
Has retirement logic to reorder the instructions so
that they commit in order.
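The interplay between out-of-order execution and in-order commit can be sketched in a few lines of Python. This is a simplified model, not the actual retirement logic: it assumes unlimited retirement bandwidth, and the completion cycles are made-up values.

```python
def retire_in_order(completion_cycle):
    """Compute when each uop can retire.

    completion_cycle: for each uop, in program order, the cycle in which it
    finishes executing. A uop retires only once it has finished AND every
    earlier uop has retired. ASSUMPTION: any number of uops may retire per cycle.
    """
    retire = []
    last = 0
    for done in completion_cycle:
        last = max(last, done)
        retire.append(last)
    return retire


# Hypothetical completion cycles for five uops in program order.
completion = [5, 2, 3, 9, 4]

# Order in which the uops actually finish executing (out of order).
execution_order = sorted(range(len(completion)), key=lambda i: completion[i])
retirement = retire_in_order(completion)

print("execution order:", execution_order)
print("retire cycles:  ", retirement)
```

In this example the second and third uops finish before the first, yet none of them retires before the first one does, so the architectural state is always updated in program order.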
Out-of-order Execution logic (Contd.)
[Figure: out-of-order execution logic block diagram. Input: from uop queue. Output: to execution units.]
[Glenn Hinton et al., Intel Technology Journal, Q1 2001]
Execution Units
The execution units include several integer and
floating point units for result computation.
The execution section also includes the L1 data
cache used for most of the load/store operations.
Execution Units (Contd.)
[Figure: execution units block diagram. Input: from out-of-order execution logic. Connection: from/to memory subsystem.]
[Glenn Hinton et al., Intel Technology Journal, Q1 2001]
Memory Subsystem
The memory section contains the L2 cache and the
system bus.
The system bus is used to access the main memory
when the L2 cache has a cache miss.
It is also used to access the I/O resources.
Memory Subsystem (Contd.)
[Figure: memory subsystem block diagram. Input: from execution units. Output: to ITLB/prefetcher.]
[Glenn Hinton et al., Intel Technology Journal, Q1 2001]
Pentium 4 pipeline
The P6 microarchitecture (Pentium II, Pentium III, Celeron)
has twice the pipeline depth of the Pentium processor.
The Netburst microarchitecture has almost
doubled the pipeline depth of P6.
- It allows for a higher frequency of operation.
- Different parts of the Pentium 4 operate at different
clock frequencies.
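Why a deeper pipeline allows a higher clock can be seen with a common textbook model, sketched in Python below. The model and all numbers are assumptions for illustration, not figures from the slides or from Intel: the total logic delay is split evenly across the stages, and each stage adds a fixed latch overhead.

```python
def clock_mhz(total_logic_ns, stages, overhead_ns):
    """Clock frequency under a simple even-split pipeline model.

    cycle time = (total logic delay / number of stages) + per-stage overhead
    """
    cycle_ns = total_logic_ns / stages + overhead_ns
    return 1000.0 / cycle_ns


# HYPOTHETICAL values chosen only to show the trend.
TOTAL_LOGIC_NS = 10.0
OVERHEAD_NS = 0.1

shallow = clock_mhz(TOTAL_LOGIC_NS, stages=10, overhead_ns=OVERHEAD_NS)
deep = clock_mhz(TOTAL_LOGIC_NS, stages=20, overhead_ns=OVERHEAD_NS)

print(f"10 stages: {shallow:.0f} MHz")
print(f"20 stages: {deep:.0f} MHz")
print(f"speedup in clock rate: {deep / shallow:.2f}x")
```

In this model, doubling the depth raises the clock rate by somewhat less than a factor of two, because the per-stage overhead does not shrink as the stages get shorter.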