Evolution of Personal Computing Microprocessors
Azmath Moosa, M. Tech 1st Year, 13304006
Credit Seminar
Dec 04, 2014
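[Figure: "2002 Architecture by Azmath!". An Instruction Decoder drives the select input (sel; values 01, 10, 11) of a 4:2 MUX that produces the ALU output (Out). The ALU contains a Full Adder (inputs A, B; outputs Sum, Carry), a Full Subtractor (inputs A, B; outputs D, B) and two Shifter and Logic blocks (A and B). Example instruction: SUB 2, 3]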
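The ALU figure above (a decoder selecting among an adder, a subtractor and shifter/logic units) can be sketched in a few lines of Python. This is only an illustration: the 4-bit width, the operation names in `OPS` and the helper names `ripple` and `execute` are assumptions, not details taken from the slide.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def full_subtractor(a, b, borrow_in):
    """One-bit full subtractor: returns (difference, borrow_out)."""
    d = a ^ b ^ borrow_in
    borrow_out = ((a ^ 1) & b) | (((a ^ b) ^ 1) & borrow_in)
    return d, borrow_out


def ripple(cell, x, y, width=4):
    """Chain one-bit cells from LSB to MSB; returns (result, final carry/borrow)."""
    result, link = 0, 0
    for i in range(width):
        bit, link = cell((x >> i) & 1, (y >> i) & 1, link)
        result |= bit << i
    return result, link


# Hypothetical select encoding: the opcode plays the role of the MUX select.
OPS = {
    "ADD": lambda x, y: ripple(full_adder, x, y),
    "SUB": lambda x, y: ripple(full_subtractor, x, y),
    "AND": lambda x, y: (x & y, 0),
    "SHL": lambda x, y: ((x << 1) & 0xF, (x >> 3) & 1),
}


def execute(instruction):
    """Decode 'OP a, b' and route the operands to the selected unit."""
    op, operands = instruction.split(maxsplit=1)
    x, y = (int(v) for v in operands.split(","))
    return OPS[op](x, y)


print(execute("SUB 2, 3"))  # 2 - 3 wraps around in 4 bits and sets the borrow
print(execute("ADD 2, 3"))
```

Each result is a pair: the 4-bit value and the final carry or borrow bit.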
The Invention
• Intel was established as a memory device manufacturer
• Nippon Calculating Machine Corporation approached Intel to design 12 custom chips for its new calculator
• Intel suggested a family of just 4 chips; the 4004 was one of them
1969
4004 to 8085
4004
• 4 bit
• 2,300 transistors
• 10 µm PMOS
• Clocked at 740 kHz
8008
• 8 bit
• 3,500 transistors
• 500 kHz
8080
• 4,500 transistors
• 6 µm NMOS
• 2 MHz
8085
• 3 µm depletion-type NMOS
• 3 MHz
1969 - 76
8086/80186/88
• 16 bit
• Pipelined
• 29,000 transistors
• 3 µm process
• 5, 8, 10 MHz
• Chosen by
1978
[Figure: pipeline stages: Fetch, Execute]
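The benefit of overlapping Fetch and Execute can be shown with a small cycle count. The sketch below assumes an idealised pipeline in which every stage takes one cycle and nothing stalls; the function names are illustrative only.

```python
def sequential_cycles(n_instructions, n_stages=2):
    """Cycles needed when each instruction finishes before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=2):
    """Cycles needed when stages overlap and no stalls occur."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def timeline(n_instructions, stages=("Fetch", "Execute")):
    """Return, per cycle, which instruction index occupies each stage."""
    total = pipelined_cycles(n_instructions, len(stages))
    rows = []
    for cycle in range(total):
        row = {}
        for s, name in enumerate(stages):
            instr = cycle - s
            if 0 <= instr < n_instructions:
                row[name] = instr
        rows.append(row)
    return rows


for cycle, row in enumerate(timeline(3)):
    print(cycle, row)
```

With three instructions the sequential model needs six cycles and the overlapped one needs four, because instruction 1 is fetched while instruction 0 executes.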
The PC 1981
80286 1984
• 16 bit
• Pipelined
• 134,000 transistors
• 1.5 µm process
• Up to 16 MHz
80386
• 32 bit
• 275,000 transistors
• 1 µm process technology
• 33 MHz
1985
x86 (8086, 80186, 80286, 80386, 80486)
• Has become the standard CPU architecture for the PC platform
• All vendors must adhere to this standard to make compatible CPUs for the PC
Instruction Set
• Includes a specification of the set of opcodes (machine language) and the native commands implemented by a particular processor
• Extensions include MMX, 3DNow!, SSE, AVX, AES, etc.
• Instructions are implemented either as hardwired logic or as microcode routines
[Figure: an instruction (instruction, op1, op2) passes through the Frontend (Operand Fetch, Micro-op table) to the Backend (Integer ALU, FP ALU, Load/Store)]
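A rough sketch of the frontend/backend split in the figure above: the frontend looks each instruction up in a micro-op table, and the backend queues the resulting micro-ops at the unit that executes them. The table contents, opcode names and unit names are hypothetical and chosen only for illustration.

```python
# Hypothetical micro-op table: each instruction expands to (unit, micro-op) pairs.
MICRO_OP_TABLE = {
    "ADD_REG": [("int_alu", "add")],
    "ADD_MEM": [("load_store", "load"), ("int_alu", "add")],
    "FMUL_MEM": [("load_store", "load"), ("fp_alu", "fmul")],
    "STORE": [("load_store", "store")],
}


def decode(instruction):
    """Frontend step: split 'OPCODE op1 op2' and look up its micro-ops."""
    opcode, op1, op2 = instruction.split()
    return [(unit, uop, op1, op2) for unit, uop in MICRO_OP_TABLE[opcode]]


def dispatch(program):
    """Backend step: queue every micro-op at the unit that executes it."""
    queues = {"int_alu": [], "fp_alu": [], "load_store": []}
    for instruction in program:
        for unit, uop, op1, op2 in decode(instruction):
            queues[unit].append((uop, op1, op2))
    return queues


queues = dispatch(["ADD_MEM r1 [r2]", "FMUL_MEM f0 [r3]", "ADD_REG r1 r4"])
for unit, pending in queues.items():
    print(unit, pending)
```

An instruction with a memory operand expands into a load micro-op plus an arithmetic micro-op, so it appears in two queues.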
Pipeline
Fetch, Decode, Execute, Write to Memory
Frontend
• Fetches instructions and operands from memory
• Any techniques to optimize fetching can be implemented here
• Converts instructions to internal micro-op codes
• Has to process instructions in order
Backend
• Executes instructions
• Parallel units that perform the same operation can be present
• Instructions can be processed out of order
• Any techniques to optimize write-back can be implemented here
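The point that the backend can process instructions out of order can be illustrated with a toy scheduler. The sketch below assumes unlimited execution units, no register reuse and made-up latencies; the function and register names are hypothetical.

```python
def out_of_order_schedule(program, latency):
    """Start each instruction as soon as its source registers are ready.

    program: list of (name, dest, sources) in program order.
    latency: dict mapping instruction name to the cycles it takes.
    Returns the names ordered by finish cycle (ties keep program order).
    """
    ready_at = {}  # register -> cycle at which its value becomes available
    finish = []
    for index, (name, dest, sources) in enumerate(program):
        start = max((ready_at.get(r, 0) for r in sources), default=0)
        done = start + latency[name]
        ready_at[dest] = done
        finish.append((done, index, name))
    return [name for done, index, name in sorted(finish)]


program = [
    ("load", "r1", ["r0"]),  # slow memory access
    ("add", "r2", ["r1"]),   # must wait for the load
    ("mul", "r3", ["r4"]),   # independent of both
]
print(out_of_order_schedule(program, {"load": 3, "add": 1, "mul": 2}))
```

The independent multiply finishes before the load and the dependent add, although it comes last in program order.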
The Pentium
• Codenamed P5
• Superscalar architecture
• Longer pipeline
• 3.1 million transistors
• 800 nm process technology
• Up to 233 MHz
• Included the MMX instruction set (added in later Pentium MMX models)
1993
[Figure: pipeline with Prefetch, two parallel Decode and Execute paths, and Writeback]
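With two Decode/Execute paths, two neighbouring instructions can issue in the same cycle when they are independent. The sketch below uses a deliberately simplified pairing rule (the second instruction must not read or write the register written by the first); the real Pentium pairing rules are more restrictive, and all names here are illustrative.

```python
def dual_issue(program):
    """Group instructions into issue cycles of at most two.

    program: list of (dest, sources) tuples in program order.
    Two neighbours share a cycle only if the second neither reads nor
    writes the register written by the first.
    """
    cycles = []
    i = 0
    while i < len(program):
        group = [program[i]]
        if i + 1 < len(program):
            first_dest = program[i][0]
            next_dest, next_sources = program[i + 1]
            if first_dest not in next_sources and first_dest != next_dest:
                group.append(program[i + 1])
        cycles.append(group)
        i += len(group)
    return cycles


program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r5",)),   # independent of the previous instruction: pairs with it
    ("r6", ("r4",)),
    ("r7", ("r6",)),   # depends on the previous instruction: issues alone
]
for cycle, group in enumerate(dual_issue(program)):
    print(cycle, group)
```

Four instructions issue in three cycles here instead of four.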
Pentium Pro
• Codenamed P6
• Integrated L2 cache
• Chipset + memory controller = Northbridge
• Interface to ATA, PCI, ISA, BIOS, SuperIO = Southbridge
1995
P6 Architecture
• 10-stage pipeline
• Branch predictor: predicts branch outcomes so that instructions can be prefetched along the predicted path
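To show how a branch predictor can learn from past outcomes, here is a minimal sketch of the textbook 2-bit saturating counter scheme. It is not claimed to be the P6 design; the class name, the starting counter value and the branch address are assumptions for illustration.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 taken."""

    def __init__(self):
        self.counters = {}

    def predict(self, address):
        # Unknown branches start at 1 (weakly not taken), an arbitrary choice.
        return self.counters.get(address, 1) >= 2

    def update(self, address, taken):
        value = self.counters.get(address, 1)
        value = min(3, value + 1) if taken else max(0, value - 1)
        self.counters[address] = value


def accuracy(outcomes, address=0x400):
    """Fraction of outcomes predicted correctly for one branch."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct / len(outcomes)


# A loop branch: taken nine times, then falls through once.
loop = [True] * 9 + [False]
print(accuracy(loop))
```

The predictor misses only the first iteration and the loop exit, which is why loop branches are generally easy to predict.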
Pentium II/III 1997-99
• 250 nm process
• 7.5/9.5 million transistors
• AGP for faster graphics
• SSE instruction set
• 1 GHz
Performance Comparison
Pentium 4
• NetBurst microarchitecture
• 42 million transistors
• 180 nm process technology
• SSE instruction set
• 1.4 to 3.0 GHz
2000
NetBurst Architecture
• 20-stage pipeline
• Trace cache
• Load (operand) and store queues
• OoO (out-of-order) execution
• ALU clocked at double the core frequency
• Hyper-Threading
• Drawbacks: pipeline too long, high power dissipation
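One common textbook way to see why a very long pipeline can hurt is to add the cost of branch mispredictions to the average cycles per instruction (CPI). The sketch below assumes the flush penalty is roughly equal to the pipeline depth; the workload numbers (20% branches, 5% mispredicted) are invented for illustration and are not measurements.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction including branch misprediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Assumed workload: 20% branches, 5% of them mispredicted.
for depth in (10, 20):
    print(depth, effective_cpi(1.0, 0.20, 0.05, depth))
```

Under these assumptions, doubling the depth doubles the misprediction overhead, so the deeper design needs a correspondingly higher clock rate just to break even.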
Performance Comparison
[Figure: bar chart, Content Creation Benchmark, Pentium 4 vs. Pentium 3; axis ticks from 320 to 365]
Core Architecture
• 65 nm process
• 291 million transistors
• Shorter, efficient pipeline
• Wide Dynamic Execution: superscalar, macro-fusion
• Advanced Digital Media Boost: 128-bit ALU
• Advanced Smart Cache
• Smart Memory Access
• Execute Disable Bit
• Hyper-Threading (HT) disabled
2006
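Macro-fusion merges a compare instruction and the conditional jump that follows it into a single internal operation. The sketch below shows the idea on a list of assembly-like strings. The opcode sets and the function name are illustrative, and the conditions under which real hardware fuses instructions are more restrictive than this.

```python
CONDITIONAL_JUMPS = {"JE", "JNE", "JL", "JGE", "JG", "JLE"}
FUSIBLE_FIRST = {"CMP", "TEST"}


def macro_fuse(instructions):
    """Merge a CMP/TEST directly followed by a conditional jump into one op."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (following is not None
                and current.split()[0] in FUSIBLE_FIRST
                and following.split()[0] in CONDITIONAL_JUMPS):
            fused.append(current + " + " + following)
            i += 2
        else:
            fused.append(current)
            i += 1
    return fused


code = ["MOV eax, 1", "CMP eax, ebx", "JNE loop", "ADD eax, 2"]
print(macro_fuse(code))
```

Four instructions become three operations, so the later pipeline stages have less work for the same program.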
Performance Comparison
Tick - Tock 2007
Nehalem Architecture
• 45 nm process
• 700 million transistors
• Shared L3 cache
• Integrated memory controller
• Improved loop stream detector
• Improved branch prediction
• SSE4+ instruction set
• Turbo Boost
• HT reintroduced
Performance Comparison
Sandy Bridge Architecture
• 32 nm process
• 1.2 billion transistors
• On-die GPU
• Ring-style on-die interconnect
• Aggressive Turbo
• AVX instruction set
• Improved BPU (branch prediction unit)
• Micro-op cache
• Wider ALU
2010
Performance Comparison
Ivy Bridge
• Tick
• 22 nm FinFET transistors
2012
FinFET Structure
Performance Comparison
Haswell Architecture
• 1.4 billion transistors
• AVX2 instruction set
• Improved cache bandwidth
• Improved GPU and QuickSync
• Improved BPU
• Unified decoder queue
• Wider reorder buffer
• Wider EU with 2 additional ports
• FMA (fused multiply-add)
2013
[Figure: Backend]
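A fused multiply-add computes a*b + c with a single rounding at the end, instead of rounding after the multiply and again after the add. The sketch below only models that rounding behaviour using exact rational arithmetic; it does not call the hardware instruction, and the function names and operand values are chosen for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round once to the nearest float."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a, b, c):
    """Round after the multiply and again after the add."""
    return a * b + c


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
print(separate_multiply_add(a, b, c))  # the product is rounded to 1.0 first
print(fused_multiply_add(a, b, c))     # the small exact remainder survives
```

The exact product is 1 - 2^-60, which is too close to 1 to be represented as a double, so the two-step version loses the remainder entirely while the fused version keeps it.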
Performance Comparison
Thank You