Evolution of Personal Computing by
Microprocessors and SoCs
For Credit Seminar: EEC7203 (Internal Assessment)
Submitted To
Dr. T. Shanmuganantham
Associate Professor,
Department of Electronics Engineering
Azmath Moosa
Reg No: 13304006 M. Tech 1st Yr
Department of Electronics Engineering, School of Engg & Tech,
Pondicherry University
Abstract
Throughout history, new and improved technologies have transformed the human
experience. In the 20th century, the pace of change sped up radically as we entered the
computing age. For nearly 40 years the microprocessor, driven by the innovations of
companies like Intel, has continuously created new possibilities in the lives of people around
the world. In this paper, I hope to capture the evolution of this amazing device, which has
raised computing to a whole new level and made it relevant in all fields – engineering,
research, medicine, academia, business, manufacturing, commuting, and more. I will highlight
the significant strides made in each generation of processors and the remarkable ways in which
engineers overcame seemingly insurmountable challenges and continued to push the evolution
to where it is today.
Table of Contents
1. Abstract
2. Table of Contents
3. List of Figures
4. Introduction
5. x86 and the Birth of the PC
6. The Pentium
7. Pipelined Design
8. The Pentium 4
9. The Core Microarchitecture
10. Tick-Tock Cadence
11. The Nehalem Microarchitecture
12. The Sandy Bridge Microarchitecture
13. The Haswell Microarchitecture
14. Performance Comparison
15. Shift in Computing Trends
16. Advanced RISC Machines
17. System on Chip (SoC)
18. Conclusion
19. References
List of Figures
Figure 1: 4004 Layout
Figure 2: Pentium Chip
Figure 3: Pentium CPU based PC architecture
Figure 4: Pentium 2 logo
Figure 5: Pentium 3 logo
Figure 6: Pentium 4 HT technology illustration
Figure 7: NetBurst architecture feature presentation at Intel Developer Forum
Figure 8: The NetBurst Pipeline
Figure 9: The Core architecture feature presentation at Intel Developer Forum
Figure 10: The Core architecture pipeline
Figure 11: Macro fusion explained at IDF
Figure 12: Power Management capabilities of Core architecture
Figure 13: Intel's new tick-tock strategy revealed at IDF
Figure 14: Nehalem pipeline backend
Figure 15: Nehalem pipeline frontend
Figure 16: Improved Loop Stream Detector
Figure 17: Nehalem CPU based PC architecture
Figure 18: Sandy Bridge architecture overview at IDF
Figure 19: Sandy Bridge pipeline frontend
Figure 20: Sandy Bridge pipeline backend
Figure 21: Video transcoding capabilities of Nehalem
Figure 22: Typical planar transistor
Figure 23: FinFET Tri-Gate transistor
Figure 24: FinFET Delay vs Power
Figure 25: SEM photograph of fabricated FinFET tri-gate transistors
Figure 26: Haswell pipeline frontend
Figure 27: Haswell pipeline backend
Figure 28: Performance comparisons of 5 generations of Intel processors
Figure 29: Market share of personal computing devices
Figure 30: A smartphone SoC; Qualcomm's OMAP
Figure 31: A SoC for tablet; Nvidia TEGRA
Introduction
Intel was founded in 1968 with the aim of manufacturing memory devices; its first
product was a Schottky TTL bipolar SRAM chip. In 1969, a Japanese company, the Nippon
Calculating Machine Corporation, approached Intel to design 12 custom chips for its new
calculator. Intel engineers suggested a family of just four chips, including one that could be
programmed for use in a variety of products. Intel designed a set of four chips known as the
MCS-4. It included a central processing unit (CPU) chip—the 4004—as well as a supporting
read-only memory (ROM) chip for the custom applications programs, a random-access
memory (RAM) chip for processing data, and a shift-register chip for the input/output (I/O)
port. MCS-4 was a "building block" that engineers could purchase and then customize with
software to perform different functions in a wide variety of electronic devices.
And thus, the microprocessor industry was born. The 4004 had 2,300 PMOS
transistors at 10 um and was clocked at 740 kHz. Four pins were multiplexed for both address
and data (it was a 16-pin IC). The very next year, the 8008 was introduced: an 8-bit processor
clocked at 500 kHz with 3,500 PMOS transistors at the same 10 um. It was actually slower,
at 0.05 MIPS (millions of instructions per second), than the 4004 at 0.07. In 1974, the 8080
was launched with ten times the performance of the 8008, thanks to a different transistor
technology: 4,500 NMOS transistors at 6 um. It was clocked at 2 MHz with a whopping
0.29 MIPS. Finally, in March 1976, the 8085 was launched, clocked at 3 MHz and built on
yet another transistor technology, depletion-load NMOS at 3 um. It was capable of 0.37
MIPS. The 8085 was a popular device of its time and is still
used in universities across the globe to introduce students to microprocessors.
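To put these MIPS figures in perspective, they can be normalized by clock rate to get a rough instructions-per-cycle estimate. The short Python sketch below uses only the numbers quoted above; the calculation itself is simple division and is offered purely as an illustration:

```python
# Rough instructions-per-clock (IPC) for the early Intel chips,
# using the MIPS and clock figures quoted in the text.
chips = {
    "4004": (0.07, 0.740),   # (MIPS, clock in MHz)
    "8008": (0.05, 0.500),
    "8080": (0.29, 2.0),
    "8085": (0.37, 3.0),
}

for name, (mips, mhz) in chips.items():
    ipc = mips / mhz  # both counts are in millions, so units cancel
    print(f"{name}: ~{ipc:.3f} instructions per clock")
```

The ratio shows that the 8080 and 8085 gained not just clock speed but also efficiency per cycle over the 4004 and 8008.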
Figure 1: 4004 Layout
x86 and the Birth of the PC
The 8086, a 16-bit processor, made its debut in 1978. It introduced new techniques
such as memory segmentation, to extend addressable memory, and pipelining, to speed up
execution. It was designed to be compatible with 8085 assembly mnemonics. It had 29,000
transistors of 3 um channel length and was clocked at 5, 8 and 10 MHz, delivering a full 0.75
MIPS at the maximum clock. It was the father of what is now known as the x86 architecture,
which eventually turned out to be Intel's most successful line of processors and still powers
many computing devices today. Introduced soon after was the processor that powered the first
PC, the 8088: clocked at 5-8 MHz with 0.33-0.66 MIPS, it was an 8086 with an external 8-bit
bus.
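The segmentation scheme can be made concrete. In 8086 real mode, a 16-bit segment register is shifted left four bits and added to a 16-bit offset, yielding a 20-bit (1 MB) physical address. A minimal Python sketch of that calculation (the function name is my own, chosen for illustration):

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: a 16-bit segment shifted left 4 bits
    plus a 16-bit offset, giving a 20-bit (1 MB) address space."""
    return ((segment << 4) + offset) & 0xFFFFF  # wraps at 1 MB

# Different segment:offset pairs can name the same physical byte.
assert physical_address(0x1000, 0x0000) == 0x10000
assert physical_address(0x0FFF, 0x0010) == 0x10000
```

Because segment and offset overlap, many segment:offset pairs alias the same byte, a quirk that PC programmers lived with for years.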
In 1981, a revolution stirred by the IBM PC seized the computer industry. By the late
'70s, personal computers were available from many vendors, such as Tandy, Commodore, TI
and Apple. Computers from different vendors were not compatible. Each vendor had their own
architecture, their own operating system, their own bus interface, and their own software.
Backed by IBM's marketing might and name recognition, the IBM PC quickly captured the
bulk of the market. Other vendors either left the PC market (TI), pursued niche markets
(Commodore, Apple) or abandoned their own architecture in favor of IBM's (Tandy). With a
market share approaching 90%, the PC became a de-facto standard. Software houses wrote
operating systems (Microsoft DOS, Digital Research DOS), spreadsheets (Lotus 1-2-3), word
processors (WordPerfect, WordStar) and compilers (Microsoft C, Borland C) that ran on the
PC. Hardware vendors built disk drives, printers and data acquisition systems that connected
to the PC's external bus. Although IBM initially captured the PC market, it subsequently lost
it to clone vendors. Accustomed to being a monopoly supplier of mainframe computers, IBM
was unprepared for the fierce competition that arose as Compaq, Leading Edge, AT&T, Dell,
ALR, AST, Ampro, Diversified Technologies and others all vied for a share of the PC market.
Besides low prices and high performance, the clone vendors provided one other very important
thing to the PC market: an absolute hardware standard. In order to sell a PC clone, the
manufacturer had to be able to guarantee that it would run all of the customer's existing PC
software, and work with all of the customer's existing peripheral hardware. The only way to do
this was to design the clone to be identical to the original IBM PC at the register level. Thus,
the standard that the IBM PC defined became graven in stone as dozens of clone vendors
shipped millions of machines that conformed to it in every detail. This standardization has been
an important factor in the low cost and wide availability of PC systems.
The 8086 and 80186/88 were limited to addressing 1 MB of memory, so the PC was
also limited to this range. The limit was raised to 16 MB by the 80286, released in 1982. It
had a maximum clock of 16 MHz at more than 2 MIPS, with 134,000 transistors at 1.5 um.
The processors and the PC up to this point were all 16 bit. The 80386 range of processors,
released in 1985, were the first 32-bit processors to be used in the PC. The first of these had
275,000 transistors at 1 um and was clocked at 33 MHz with 5.1 MIPS. It could address 4 GB
of physical memory, and far more through virtual memory. Over the next few years, Intel
modified the architecture and improved the memory addressing range and clock speed. The
80486 range of processors, released in 1989, brought a significant advance in computing
capability: a whopping 41 MIPS for a processor clocked at 50 MHz, with 1.2 million
transistors at 0.8 um, or 800 nm. It also introduced a new technique to speed up RAM reads
and writes: cache memory. The cache was integrated onto the CPU die and was referred to as
level 1 or L1 cache (as opposed to the L2 cache on the motherboard). As with the previous
series, Intel slightly modified the architecture and released higher-clocked versions over the
next few years.
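The idea behind the 486's on-die cache can be sketched in a few lines. Below is a toy direct-mapped cache in Python, where each memory block maps to exactly one cache line; the line count and block size are arbitrary illustrative values, not the 486's actual geometry:

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each memory block maps to exactly
    one cache line, selected by the low-order index bits."""
    def __init__(self, lines=8, block=16):
        self.lines = lines
        self.block = block
        self.tags = [None] * lines  # one tag stored per line

    def access(self, addr: int) -> bool:
        """Return True on a hit, False on a miss (filling the line)."""
        block_no = addr // self.block
        index = block_no % self.lines   # which line to check
        tag = block_no // self.lines    # which block occupies it
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag          # miss: fetch block from RAM
        return False

cache = DirectMappedCache()
assert cache.access(0x40) is False   # cold miss
assert cache.access(0x44) is True    # same 16-byte block: hit
```

The hit on the second access is exactly why caches pay off: programs tend to touch nearby addresses repeatedly.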
The Pentium
The Intel Pentium microprocessor was introduced in
1993. Its microarchitecture, dubbed P5, was Intel's fifth
generation and its first superscalar microarchitecture.
A superscalar architecture is one in which multiple execution
units, or functional units (such as adders, shifters and
multipliers), are provided and operate in parallel. As a direct
extension of the 80486 architecture, it included dual integer
pipelines, a faster floating-point unit, a wider data bus, separate
code and data caches, and features that further reduced address-
calculation latency. In 1996, the Pentium with MMX Technology (often simply referred to as
the Pentium MMX) was introduced with the same basic microarchitecture, complemented with
the MMX instruction set, larger caches, and some other enhancements. The Pentium was based
on 0.8 um process technology, comprised 3.1 million transistors, and was clocked at 60 MHz
with about 100 MIPS. The Pentium was capable of addressing a full 4 GB of RAM without
any operating-system-based virtualization.
Figure 2: Pentium Chip
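The benefit of a superscalar design can be illustrated with a toy model. The Python sketch below greedily pairs adjacent instructions into two pipes when the second does not read or overwrite the first's result; this is a drastic simplification of the P5's actual U/V-pipe pairing rules, offered only to show the idea:

```python
def dual_issue(instrs):
    """Greedily pair instructions into two pipes, issuing two per
    cycle only when the second doesn't read or overwrite the first's
    result (a much simplified version of P5 U/V pairing)."""
    cycles = []
    i = 0
    while i < len(instrs):
        dest1, _ = instrs[i]
        if i + 1 < len(instrs):
            dest2, srcs2 = instrs[i + 1]
            if dest1 not in srcs2 and dest1 != dest2:
                cycles.append((instrs[i], instrs[i + 1]))  # both pipes busy
                i += 2
                continue
        cycles.append((instrs[i],))  # dependent: second pipe sits idle
        i += 1
    return cycles

# C = A + B; E = C + D depends on C, but F = A + B does not.
prog = [("C", {"A", "B"}), ("E", {"C", "D"}), ("F", {"A", "B"})]
assert len(dual_issue(prog)) == 2   # C alone, then E and F together
```

Three instructions complete in two cycles instead of three, which is the whole point of providing parallel execution units.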
The next microarchitecture was the P6, or the
Pentium Pro, released in 1995. It had an integrated
L2 cache. One major change Intel brought to the PC
architecture was the FSB (Front Side Bus), which managed
the CPU's communications with the RAM and other IO.
The RAM and the graphics card were high-speed peripherals
and were interfaced through the Northbridge; other IO
devices, like the keyboard and speakers, were interfaced
through the Southbridge.
The Pentium II followed soon after, in 1997. It had MMX, improved 16-bit
performance, and double the L2 cache. The Pentium II had 7.5 million transistors,
starting with 0.35 um process technology, though later revisions used
0.25 um transistors.
The Pentium III followed in 1999 with 9.5 million 0.25 um transistors and a
new instruction set, SSE (Streaming SIMD Extensions), which assisted DSP and
graphics processing. Intel was able to push the clock speed higher and higher
with the Pentium III, with some variants clocked as high as 1 GHz.
Pipelined Design
At a high level the goal of a CPU is to grab instructions from memory and execute those
instructions. All of the tricks and improvements we see from one generation to the next just
help to accomplish that goal faster.
The assembly-line analogy for a pipelined microprocessor is overused, but that's because it is
quite accurate. Rather than working on one instruction at a time, modern processors
Figure 3: Pentium CPU based PC architecture
Figure 4: Pentium 2 logo
Figure 5: Pentium 3 logo
feature an assembly line of steps that breaks up the grab/execute process to allow for higher
throughput.
The basic pipeline is as follows: fetch, decode, execute, and commit to memory. One would
first fetch the next instruction from memory (there's a counter and pointer that tells the CPU
where to find the next instruction). One would then decode that instruction into an internally
understood format (this is key to enabling backwards compatibility). Next one would execute
the instruction (this stage, like most here, is split up into fetching data needed by the instruction
among other things). Finally one would commit the results of that instruction to memory and
start the process over again. Modern CPU pipelines feature many more stages than those
outlined above.
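The throughput benefit of pipelining follows directly from the stage count. A minimal Python sketch of the arithmetic, assuming an idealized four-stage pipeline with no stalls:

```python
# Idealized 4-stage pipeline: fetch, decode, execute, commit.
STAGES = ["fetch", "decode", "execute", "commit"]

def run_pipeline(n_instructions: int) -> int:
    """Cycles to run n instructions with no stalls: the first
    instruction takes len(STAGES) cycles to drain through, then
    one instruction completes every cycle after that."""
    return len(STAGES) + (n_instructions - 1)

assert run_pipeline(1) == 4       # single-instruction latency
assert run_pipeline(100) == 103   # throughput approaches 1 per cycle
```

Each instruction still takes four cycles end to end, but once the pipeline is full, one instruction retires every cycle.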
Pipelines are divided into two halves: the frontend and the backend. The frontend is responsible
for fetching and decoding instructions, while the backend deals with executing them. The division
between the two halves of the CPU pipeline also separates the part of the pipeline that must
execute in order from the part that can execute out of order. Instructions have to be fetched and
completed in program order (can't click Print until you click File first), but they can be executed
in any order possible so long as the result is correct.
Many instructions are either dependent on one another (e.g. C=A+B followed by E=C+D) or
they need data that's not immediately available and has to be fetched from main memory (a
process that can take hundreds of cycles, or an eternity in the eyes of the processor). Being able
to reorder instructions before they're executed allows the processor to keep doing work rather
than just sitting around waiting.
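That reordering idea can be sketched as a toy scheduler. In the Python model below, an instruction issues as soon as its inputs are ready rather than in program order; the register names and latencies are illustrative only, not taken from any real CPU:

```python
def schedule(instrs, latency):
    """Issue one instruction per cycle, picking any whose inputs are
    ready (out of order); a result becomes usable latency[dest]
    cycles after issue (default 1)."""
    ready_at = {"A": 0, "B": 0, "D": 0}   # input registers valid at cycle 0
    pending = list(instrs.items())        # (dest, source set), program order
    order = []
    cycle = 0
    while pending:
        for item in pending:
            dest, srcs = item
            if all(s in ready_at and ready_at[s] <= cycle for s in srcs):
                order.append(dest)
                ready_at[dest] = cycle + latency.get(dest, 1)
                pending.remove(item)
                break
        cycle += 1                        # no issue this cycle: stall
    return order

# C = A + B produces its result only after 3 cycles (a slow op);
# E = C + D must wait for C, but F = A + B is independent.
prog = {"C": {"A", "B"}, "E": {"C", "D"}, "F": {"A", "B"}}
assert schedule(prog, latency={"C": 3}) == ["C", "F", "E"]
```

F issues ahead of E even though it comes later in program order, keeping the execution unit busy while E waits for C's result.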
This document aims to highlight changes to the x86 pipeline with each generation of
processors.
The Pentium 4
The NetBurst microarchitecture started with the Pentium 4. This line of processors
debuted in 2000, clocked at 1.4 GHz, with 42 million transistors at a 0.18 um process size and
the SSE2 instruction set. The early variants were codenamed Willamette (up to 2.0 GHz), and
the later ones Northwood (up to 3.0 GHz) and Prescott.
The diagram is from Intel's feature presentation
of the NetBurst architecture. The Willamette
was an early variant with SSE2, the Rapid
Execution Engine (in which the ALUs operate at
twice the core clock frequency) and the
Instruction Trace Cache (which cached
decoded instructions for faster loop execution).
HT (Hyper-Threading) Technology prevents
CPU cycles from going to waste by assigning the
CPU to execute one thread or application while
another waits for data to arrive from RAM. To software, this essentially acts like a dual-processor system.
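The effect can be modeled with a toy two-thread interleaver. In the Python sketch below, each cycle the core issues one instruction from whichever thread is not stalled, so one thread's memory wait is hidden by the other's work; the three-cycle stall is an arbitrary illustrative value, not NetBurst's real memory latency:

```python
def smt_interleave(thread_a, thread_b):
    """Toy two-thread SMT core: each cycle, issue one instruction from
    a thread that isn't stalled. An 'M' instruction models a memory
    access that stalls its thread for the next three cycles."""
    threads = [list(thread_a), list(thread_b)]
    stalled_until = [0, 0]
    timeline = []                    # (cycle, thread id, instruction)
    cycle = 0
    while any(threads):
        for t in (0, 1):
            if threads[t] and stalled_until[t] <= cycle:
                op = threads[t].pop(0)
                timeline.append((cycle, t, op))
                if op == "M":        # waiting on RAM: thread goes idle
                    stalled_until[t] = cycle + 3
                break
        cycle += 1
    return timeline

# Thread 0 stalls on memory at cycle 1; thread 1 fills cycles 2-3,
# so no issue slot is wasted while the memory access is outstanding.
timeline = smt_interleave(["x", "M", "y"], ["p", "q", "r"])
assert len(timeline) == 6            # six instructions in six cycles
```

A single thread running alone would have idled for the stall cycles; with two threads sharing the core, those slots do useful work.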
The NetBurst pipeline was 20 stages long. As illustrated in the figure to the right, the BTB
(Branch Target Buffer) determines the address of the next micro-op in the trace cache (TC
Nxt IP). Micro-ops are then fetched out of the trace cache (TC Fetch) and transferred
(Drive) to the RAT (Register Alias Table). After that, the necessary resources are allocated,
such as load queues and store buffers (Alloc), and the logical registers are renamed
(Rename). Micro-ops wait in the Queue until free slots appear in the Schedulers, where
their dependencies are resolved; they are then transferred to the register files of the
corresponding Dispatch Units. There each micro-op is executed, and its Flags are
calculated. For a jump instruction, the real branch address and the predicted
Figure 7: NetBurst architecture feature presentation at Intel Developer Forum
Figure 6: Pentium 4 HT technology illustration
one are compared (Branch Check). After that, the new
address is recorded in the BTB (Drive).
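The Branch Check step amounts to a lookup-and-update on the BTB. A toy Python model is below; the fall-back "next sequential address" policy and the 4-byte instruction size are simplifying assumptions of mine, not NetBurst's actual behavior:

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch's address to its last-seen target.
    The frontend uses the prediction to keep fetching; the Branch
    Check stage compares it with the real target and updates the
    BTB on a mispredict."""
    def __init__(self):
        self.table = {}

    def predict(self, branch_addr):
        # No entry yet: assume fall-through to the next instruction.
        return self.table.get(branch_addr, branch_addr + 4)

    def branch_check(self, branch_addr, actual_target):
        predicted = self.predict(branch_addr)
        if predicted != actual_target:
            self.table[branch_addr] = actual_target  # record real target
            return "mispredict: flush pipeline"
        return "predicted correctly"

btb = BranchTargetBuffer()
assert btb.branch_check(0x100, 0x200) == "mispredict: flush pipeline"
assert btb.branch_check(0x100, 0x200) == "predicted correctly"
```

On a 20-stage pipeline, each mispredict flushes a lot of in-flight work, which is why accurate branch prediction mattered so much to NetBurst.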
Northwood and Prescott were later variants with further
enhancements, as illustrated in the diagram above;
processor-specific details are beyond the scope of this paper.
The next major advancement was the 64-bit
NetBurst, released in 2005. The Prescott lineup continued
with maximum clock speeds of 3.8 GHz, transistor sizes of