CPSC 321 Computer Architecture Fall 2006 Lecture 1 Introduction and Five Components of a Computer Adapted from CS 152 Spring 2002 UC Berkeley Copyright (C) 2001 UCB
Dec 18, 2015
Course Instructor
Rabi Mahapatra
E-mail: ([email protected]), Sections: 501-503: MWF 12:40 – 1:30 PM
• 520B, HRBB tel: 845-5787
• Office Hours: After the Class
TA Information
• Suman K Mandal
Email:
Office:
Office Hours:
• Lei Wu
Phone:
E-mail: ([email protected])
Office: 526, HRBB tel: 571-2640
Office Hour: TBD
Course Information [contd…]
• Grading: Projects, Assignments, Exams
– Assignments 20%
– Mid Term 25%
– Finals 25%
– Projects 30%
• Labs
– MIPS (Assembly Programming), Verilog (HDL)
• Projects
– Project 1: MIPS
– Projects 2 & 3: Verilog (Datapath Design)
Course Information [contd…]
• Book (Required)
– Computer Organization and Design: The Hardware/Software Interface, Third Edition, David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers. Do not get the second edition.
• References
– Digital Design, M. Morris Mano, 3rd Edition, Prentice Hall
– The Verilog Hardware Description Language, Thomas & Moorby, 5th Edition, Kluwer Academic Publishers
– Check the course webpage for other materials and links
Course Information [contd…]
• Course Webpage
– http://courses.cs.tamu.edu/rabi/cpsc321/
• CS Accounts
– Use your CS accounts to turnin and check any email regarding the course
Course Overview
[Figure: multiplier datapath with a 34-bit ALU, HI and LO registers (16x2 bits each), multiplicand register, Booth encoder, and control logic]
• Computer Arithmetic
• Single/multicycle Datapaths
Course Overview [contd…]
[Figure: four overlapped instructions in a pipeline, each passing through the stages IFetch, Dcd, Exec, Mem, WB]
• Pipelining
• Memory Systems
• Performance
What’s In It For Me ?
• In-depth understanding of the inner workings of modern computers, their evolution, and the trade-offs present at the hardware/software boundary.
– Insight into fast/slow operations that are easy/hard to implement in hardware
• Experience with the design process in the context of a large, complex (hardware) design.
– Functional Spec --> Control & Datapath --> Physical implementation
– Modern CAD tools
Computer Architecture - Definition
• Computer Architecture = ISA + MO
• Instruction Set Architecture
– What the executable can “see” as underlying hardware
– Logical View
• Machine Organization
– How the hardware implements the ISA
– Physical View
Computer Architecture – Changing Definition
• 1950s to 1960s: Computer Architecture Course:
– Computer Arithmetic
• 1970s to mid 1980s: Computer Architecture Course:
– Instruction Set Design, especially ISA appropriate for compilers
• 1990s: Computer Architecture Course:
– Design of CPU, memory system, I/O system, Multiprocessors, Networks
• 2000s: Computer Architecture Course:
– Non Von-Neumann architectures, Reconfiguration
• DNA Computing, Quantum Computing ????
Some Examples …
° Digital Alpha (v1, v3) 1992-97 RIP soon
° HP PA-RISC (v1.1, v2.0) 1986-96 RIP soon
° Sun SPARC (v8, v9) 1987-95
° SGI MIPS (MIPS I, II, III, IV, V) 1986-96
° IA-16/32 (8086, 286, 386, 486, Pentium, MMX, SSE, …) 1978-1999
° IA-64 (Itanium) 1996-now
° AMD64/EM64T 2002-now
° IBM POWER (PowerPC, …) 1990-now
° Many dead processor architectures live on in microcontrollers
The MIPS R3000 ISA (Summary)
• Instruction Categories
– Load/Store
– Computational
– Jump and Branch
– Floating Point (coprocessor)
– Memory Management
– Special
• Registers: R0 - R31, PC, HI, LO
• 3 Instruction Formats: all 32 bits wide
OP | rs | rt | rd | sa | funct
OP | rs | rt | immediate
OP | jump target
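As a rough illustration of the three formats above, the sketch below packs fields into 32-bit words. The slide does not give field widths or opcode values; the widths used here (6-bit OP, 5-bit rs/rt/rd/sa, 6-bit funct, 16-bit immediate, 26-bit jump target) and the opcodes for `add` and `lw` come from the standard MIPS encoding, and the function names are hypothetical.

```python
# Sketch only: field widths and opcode values follow the standard MIPS
# encoding; they are not stated on the slide. Function names are made up.

def encode_r(op, rs, rt, rd, sa, funct):
    """Format 1: OP | rs | rt | rd | sa | funct (6/5/5/5/5/6 bits)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct

def encode_i(op, rs, rt, immediate):
    """Format 2: OP | rs | rt | immediate (6/5/5/16 bits)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

def encode_j(op, target):
    """Format 3: OP | jump target (6/26 bits)."""
    return (op << 26) | (target & 0x03FFFFFF)

# add $3, $1, $2  (OP = 0, funct = 0x20)
add_word = encode_r(0, 1, 2, 3, 0, 0x20)
# lw $15, 0($2)   (OP = 0x23, base register rs = 2, destination rt = 15)
lw_word = encode_i(0x23, 2, 15, 0)

print(f"{add_word:08x} {lw_word:08x}")
```

Every word is exactly 32 bits regardless of format, which is what keeps fetch and decode simple.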
“What” is Computer Architecture ?
[Figure: levels of abstraction, from software down to hardware: Application; Operating System; Compiler; Firmware; Instruction Set Architecture; Instr. Set Proc. and I/O system; Datapath & Control; Digital Design; Circuit Design; Layout]
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
Impact of changing ISA
• Early 1990s: Apple switched the instruction set architecture of the Macintosh
– From Motorola 68000-based machines
– To PowerPC architecture
• Intel 80x86 Family: many implementations of the same architecture
– A program written in 1978 for the 8086 can be run on the latest Pentium chip
Factors affecting ISA ???
[Figure: factors acting on Computer Architecture: Technology, Programming Languages, Operating Systems, History, Applications, Cleverness]
ISA: Critical Interface
[Figure: the instruction set as the interface between software and hardware]
Examples: 80x86 50,000,000 vs. MIPS 5500,000 ???
The Big Picture
• Processor
– Control
– Datapath
• Memory
• Input
• Output
Since 1946 all computers have had 5 components!!!
Example Organization
• TI SuperSPARC(tm) TMS390Z50 in Sun SPARCstation20
[Figure: block diagram of the system]
– SuperSPARC: Floating-point Unit, Integer Unit, Inst Cache, Ref MMU, Data Cache, Store Buffer, Bus Interface
– MBus Module: SuperSPARC, L2$, CC
– MBus: L64852 MBus control, M-S Adapter, DRAM Controller
– SBus: SBus DMA (SCSI, Ethernet), STDIO (serial, kbd, mouse, audio, RTC, Floppy), SBus Cards
Technology Trends
• Processor
– Logic capacity: about 30% per year
– Clock rate: about 20% per year
• Memory
– DRAM capacity: about 60% per year (4x every 3 years)
– Memory speed: about 10% per year
– Cost per bit: improves about 25% per year
• Disk
– Capacity: about 60% per year
– Total use of data: 100% per 9 months!
• Network Bandwidth
– Bandwidth increasing more than 100% per year!
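The annual rates above compound, which is why 60% per year works out to roughly 4x every 3 years. The sketch below checks that arithmetic and compares it with the 10% per year quoted for memory speed; the helper names are hypothetical.

```python
import math

def growth_factor(rate_per_year, years):
    """Total multiplicative growth after compounding an annual rate."""
    return (1.0 + rate_per_year) ** years

def doubling_time_years(rate_per_year):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)

dram_capacity_3yr = growth_factor(0.60, 3)   # 1.6 ** 3, about 4.1x
memory_speed_3yr = growth_factor(0.10, 3)    # 1.1 ** 3, about 1.33x

print(f"DRAM capacity over 3 years: {dram_capacity_3yr:.2f}x")
print(f"Memory speed over 3 years:  {memory_speed_3yr:.2f}x")
print(f"Doubling time at 60%/year:  {doubling_time_years(0.60):.2f} years")
```

The gap between the two factors is the point: capacity grows far faster than speed.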
[Figure: Microprocessor Logic Density. Transistors per chip (log scale, 1,000 to 100,000,000) versus year (1965 to 2005) for the i80x86, M68K, MIPS, and Alpha families; labeled points include i4004, i8086, i80286, i80386, i80486, Pentium, SU MIPS, R3010, R4400, and R10000]
° In ~1985 the single-chip processor (32-bit) and the single-board computer emerged
° In the 2002+ timeframe, these may well look like mainframes compared to a single-chip computer (maybe 2 chips)
DRAM chip capacity
Year   Size
1980   64 Kb
1983   256 Kb
1986   1 Mb
1989   4 Mb
1992   16 Mb
1996   64 Mb
1999   256 Mb
2002   1 Gb
Technology Trends
Smaller feature sizes – higher speed, density
ECE/CS 752; copyright J. E. Smith, 2002 (Univ. of Wisconsin)
Technology Trends
Number of transistors doubles every 18 months
(amended to 24 months)
ECE/CS 752; copyright J. E. Smith, 2002 (Univ. of Wisconsin)
Levels of Representation
• High Level Language Program
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
(Compiler)
• Assembly Language Program
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
(Assembler)
• Machine Language Program
0000 1001 1100 0110 1010 1111 0101 1000
1010 1111 0101 1000 0000 1001 1100 0110
1100 0110 1010 1111 0101 1000 0000 1001
0101 1000 0000 1001 1100 0110 1010 1111
(Machine Interpretation)
• Control Signal Specification
ALUOP[0:3] <= InstReg[9:11] & MASK
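To see that the four assembly instructions really do perform the swap written in the high-level code, the sketch below simulates them on a toy memory. It assumes that register $2 holds the byte address of v[k] and that array elements are 4-byte words; the addresses, values, and the function name are hypothetical.

```python
# Sketch only: assumes $2 holds the address of v[k] and that each array
# element is a 4-byte word. Addresses and data values are made up.

def run_swap(memory, regs):
    """Simulate the lw/lw/sw/sw sequence on a dict-based memory."""
    base = regs[2]
    regs[15] = memory[base + 0]   # lw $15, 0($2)   temp1 = v[k]
    regs[16] = memory[base + 4]   # lw $16, 4($2)   temp2 = v[k+1]
    memory[base + 0] = regs[16]   # sw $16, 0($2)   v[k]   = temp2
    memory[base + 4] = regs[15]   # sw $15, 4($2)   v[k+1] = temp1

memory = {0x1000: 11, 0x1004: 22}   # v[k] = 11, v[k+1] = 22
regs = {2: 0x1000, 15: 0, 16: 0}
run_swap(memory, regs)
print(memory)
```

The compiler uses two registers where the source used one `temp`, because MIPS cannot copy memory to memory directly: every value passes through a register via a load and a store.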
Execution Cycle
• Instruction Fetch: Obtain instruction from program storage
• Instruction Decode: Determine required actions and instruction size
• Operand Fetch: Locate and obtain operand data
• Execute: Compute result value or status
• Result Store: Deposit results in storage for later use
• Next Instruction: Determine successor instruction
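The six steps of the execution cycle can be sketched as a loop over a toy instruction list. The two-operation instruction set below (`add`, `addi`, plus `halt`) and its tuple encoding are hypothetical and chosen only to make each step visible; this is not MIPS.

```python
# Sketch only: a made-up three-instruction machine used to label the steps
# of the execution cycle. Instructions are tuples: (op, dst, src, src_or_imm).

def run(program, regs):
    pc = 0
    while True:
        instr = program[pc]              # Instruction Fetch
        op, *args = instr                # Instruction Decode
        if op == "halt":
            break
        dst, a, b = args
        if op == "add":
            x, y = regs[a], regs[b]      # Operand Fetch (two registers)
        elif op == "addi":
            x, y = regs[a], b            # Operand Fetch (register + immediate)
        else:
            raise ValueError(f"unknown operation: {op}")
        result = x + y                   # Execute
        regs[dst] = result               # Result Store
        pc += 1                          # Next Instruction
    return regs

program = [
    ("addi", 1, 0, 5),   # r1 = r0 + 5
    ("addi", 2, 0, 7),   # r2 = r0 + 7
    ("add", 3, 1, 2),    # r3 = r1 + r2
    ("halt",),
]
final = run(program, [0, 0, 0, 0])
print(final)
```

A branch instruction would change only the last step, by setting `pc` to something other than `pc + 1`.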
The Role of Performance
Example of Performance Measure
Performance Metrics
• Response Time
– Delay between the start and end time of a task
• Throughput
– Number of tasks completed per unit time
• New: Power/Energy
– Energy per task, power
Examples (Throughput/Performance)
• Replace the processor with a faster version?
– 3.8 GHz instead of 3.2 GHz
• Add an additional processor to a system?
– Core Duo instead of P4
Measuring Performance
• Wall-clock time, or Total Execution Time
• CPU Time
– User Time
– System Time
Try using the time command on a UNIX system
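The same three numbers that the time command reports can be collected from inside a program. The sketch below is one way to do it with Python's standard library, using `time.perf_counter` for wall-clock time and `os.times` for user and system CPU time; the `busy` workload and the `measure` helper are hypothetical.

```python
import os
import time

def measure(task):
    """Return (wall, user, system) seconds spent running task()."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall = time.perf_counter() - wall_start
    user = cpu_end.user - cpu_start.user        # CPU time in user code
    system = cpu_end.system - cpu_start.system  # CPU time in the OS on our behalf
    return wall, user, system

def busy():
    """Made-up CPU-bound workload."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total

wall, user, system = measure(busy)
print(f"wall-clock {wall:.4f} s, user {user:.4f} s, system {system:.4f} s")
```

For a task that waits on I/O or shares the machine with other programs, wall-clock time can be much larger than user plus system time.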
Relating the Metrics
• Performance = 1/Execution Time
• CPU Execution Time = CPU clock cycles for program x Clock cycle time
• CPU clock cycles = Instructions for a program x Average clock cycles per Instruction
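A small worked example of the relations above, using the 3.2 GHz and 3.8 GHz clock rates from the earlier slide. The instruction count and CPI are made-up values chosen for illustration, and clock cycle time is taken as 1 / clock rate.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instructions x cycles per instruction x clock cycle time."""
    clock_cycles = instruction_count * cpi
    clock_cycle_time = 1.0 / clock_rate_hz
    return clock_cycles * clock_cycle_time

# Hypothetical program: 1 billion instructions at an average CPI of 2.0.
instructions = 1_000_000_000
cpi = 2.0

time_slow = cpu_time_seconds(instructions, cpi, 3.2e9)
time_fast = cpu_time_seconds(instructions, cpi, 3.8e9)

# Performance = 1 / Execution Time, so the ratio of times is the speedup.
speedup = time_slow / time_fast

print(f"3.2 GHz: {time_slow:.4f} s, 3.8 GHz: {time_fast:.4f} s, speedup {speedup:.4f}")
```

With instruction count and CPI unchanged, the speedup equals the ratio of the clock rates; a change that also alters CPI would break that shortcut.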
Amdahl’s Law
• Pitfall: Expecting the improvement of one aspect of a machine to increase performance by an amount proportional to the size of improvement
Amdahl’s Law [contd…]
• A program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. How much do I have to improve the speed of multiplication if I want my program to run five times faster ?
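• Execution time after improvement = (exec time affected by improvement / amount of improvement) + exec time unaffected
exec time after improvement = (80 seconds / n) + (100 – 80 seconds)
We want performance to be 5 times faster, so the target is 100 / 5 = 20 seconds =>
20 seconds = 80 seconds / n + 20 seconds
0 = 80 seconds / n !!!!
No finite n satisfies this: the 20 seconds unaffected by multiplication already use up the entire target time.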
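The same calculation can be sketched in code, which makes it easy to try other targets. The function names are hypothetical; the numbers are the ones from the example (100 seconds total, 80 seconds of multiplication).

```python
def time_after_improvement(total, affected, n):
    """Execution time when the affected part runs n times faster."""
    return affected / n + (total - affected)

def required_improvement(total, affected, target):
    """Factor n needed to reach the target time, or None if impossible."""
    unaffected = total - affected
    if target <= unaffected:
        return None          # the unaffected part alone exceeds the budget
    return affected / (target - unaffected)

five_times = required_improvement(100.0, 80.0, 100.0 / 5)   # target 20 s
four_times = required_improvement(100.0, 80.0, 100.0 / 4)   # target 25 s

print("5x faster:", five_times)
print("4x faster:", four_times)
```

A 4x overall speedup is reachable, but only by making multiplication 16 times faster; the last step to 5x cannot be bought at any price.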
Amdahl’s Law [contd…]
• Opportunity for improvement is affected by how much time the event consumes
• Make the common case fast
• Very high speedup requires making nearly every case fast
• Focus on overall performance, not one aspect
Summary
• Computer Architecture = Instruction Set Architecture + Machine Organization
• All computers consist of five components
– Processor: (1) datapath and (2) control
– (3) Memory
– (4) Input devices and (5) Output devices
• Not all “memory” is created equal
– Cache: fast (expensive) memory is placed closer to the processor
– Main memory: less expensive memory, so we can have more
• Interfaces are where the problems are - between functional units and between the computer and the outside world
• Need to design against constraints of performance, power, area and cost
Summary
• Performance is in the “eye of the beholder”
Seconds/Program = (Instructions/Program) x (Clock Cycles/Instruction) x (Seconds/Clock Cycle)
• Amdahl’s Law “Make the Common Case Faster”