CPE 408340 Computer Organization
Chapter 1: Computer Abstractions and Technology
Sa’ed R. Abed [Computer Engineering Department,
Hashemite University] [Adapted from Otmane Ait Mohamed Slides &
Principles of computer architecture: CPU datapath and control unit design (single-issue pipelined, superscalar, VLIW), memory hierarchies and design, I/O organization and design, advanced processor design (multiprocessors).
Course goals
To learn the organizational paradigms that determine the capabilities and performance of computer systems. To understand the interactions between the computer’s architecture and its software so that future software designers (compiler writers, operating system designers, database programmers, …) can achieve the best cost-performance trade-offs and so that future architects understand the effects of their design choices on software applications.
Course prerequisites
CPE 408330: Assembly Language and Microprocessor Systems.
What You Should Know
Basic logic design & machine organization:
To learn the organizational paradigms that determine the capabilities and performance of computer systems.
Create, assemble, run, debug programs in an assembly language:
MIPS preferred.
To explore the memory hierarchy system and how to interface it to a computer.
Course Structure
Design focused class:
• Various homework assignments throughout the semester
Lectures:
Computer Abstractions and Technology      Chapter 1 (2 Weeks) (Sec. 1.1 to 1.4)
Instructions: Language of the Computer    Chapter 2 (2 1/2 Weeks) (Sec. 2.1 to 2.7 & 2.10)
Arithmetic for Computers                  Chapter 3 (1 1/2 Weeks) (Sec. 3.1 to 3.4)
Review and First Exam                     (1/2 Week)
The Processor                             Chapter 4 (4 Weeks) (Sec. 4.1 to 4.9)
Review and Second Exam                    (1/2 Week)
Exploiting Memory Hierarchy               Chapter 5 (3 Weeks) (Sec. 5.1 to 5.3 & 5.5)
Review and Final Exam                     (1/2 Week)
Grading Information
Grade determinants
• First Exam ~20%, Monday, March 12th.
• Second Exam ~25%, Monday, April 16th.
• Final Exam ~50%, TBD
• Class participation & pop quizzes ~5%
Let me know about any exam conflicts ASAP
Ethics and Professionalism
Ethics
Disciplined dealing with moral duty.
Moral principles or practice.
System of right behavior.
Professionalism
The conduct, aims or qualities that characterize a professional person.
What characterizes a “professional”?
a professional accepts responsibility fully – does not blame others for failure.
a professional is reliable - gets the job done on time.
a professional is competent - gets the correct answer.
a professional works independently – finds out what he/she does not know.
a professional follows up on all the details.
a professional has high standards of ethical behavior – does not lie or cheat.
a professional does not steal the work of others and present it as his own.
What characterizes a “professional”?
a professional is respectful to others.
a professional does not offer excuses in lieu of completed work.
a professional is resourceful.
a professional has initiative.
a professional succeeds in spite of obstacles and road blocks.
a professional has justifiable self-confidence.
The Student is the Product of our Engineering School
We are an accredited engineering school: our product is engineering professionals.
Employers expect our graduates to behave like professionals.
Employers seek the qualities of a professional in job interviews.
Professionalism must start in the first semester and be part of every course over four years.
The Student is the Product of our Engineering School
Every student must learn to “think like an engineer”:
o accept responsibility for his/her own learning
o follow up on lecture material and homework
o learn problem-solving skills, not just how to solve each specific homework problem
o build a body of knowledge integrated over four years of courses
We all want HU’s excellent reputation to be reinforced so that employers will hire our graduates!
By the architecture of a system, I mean the complete and detailed specification of the user interface. … As Blaauw has said, “Where architecture tells what happens, implementation tells how it is made to happen.”
The Mythical Man-Month, Brooks, pg 45
Moore’s Law
In 1965, Gordon Moore predicted that the number of transistors that can be integrated on a die would double every 18 to 24 months (i.e., grow exponentially with time).
Amazingly visionary – million transistor/chip barrier was crossed in the 1980’s.
2300 transistors, 1 MHz clock (Intel 4004) - 1971
16 Million transistors (Ultra Sparc III)
42 Million transistors, 2 GHz clock (Intel Xeon) - 2001
55 Million transistors, 3 GHz, 130 nm technology, 250 mm² die (Intel Pentium 4) - 2004
140 Million transistors (HP PA-8500)
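As a rough check of the doubling claim, the sketch below estimates the average doubling period implied by two of the data points listed above (the Intel 4004 in 1971 and the Intel Xeon in 2001). The function name `doubling_period_years` is made up for this illustration, and picking only two points is a simplification, not a fit to the full trend.

```python
import math


def doubling_period_years(count_start, year_start, count_end, year_end):
    """Average number of years per doubling between two (year, count) data points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


# Data points from the list above: Intel 4004 (1971) and Intel Xeon (2001).
period = doubling_period_years(2300, 1971, 42_000_000, 2001)
print(f"Implied doubling period: {period:.2f} years ({period * 12:.0f} months)")
```

For these two points the implied period comes out at a little over two years, close to the upper end of the 18 to 24 month range quoted above.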
Where is the Market?
[Figure: bar chart of computers sold per year, 1998 to 2002, in three categories: Embedded, Desktop, Servers. Vertical axis: Millions of Computers, 0 to 1200.]
Processor Performance Increase
[Figure: processor performance (SPECint, log scale from 1 to 10000) by year, 1987 to 2003. Labeled machines: SUN-4/260, MIPS M/120, MIPS M2000, IBM RS6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600, DEC Alpha 21264A/667, Intel Xeon/2000, Intel Pentium 4/3000.]
Growth Capacity of DRAM Chips
K = 1024 (2^10)
In recent years the growth rate has slowed to 2x every 2 years.
The Evolution of Computer Hardware
When was the first transistor invented?
Modern-day electronics began with the invention in 1947 of the transfer resistor (the bipolar transistor) by Bardeen et al. at Bell Laboratories.
The Evolution of Computer Hardware
When was the first IC (integrated circuit) invented?
In 1958 the IC was “born” when Jack Kilby at Texas Instruments successfully interconnected, by hand, several transistors, resistors and capacitors on a single substrate
The Underlying Technologies
Year   Technology                    Relative Perform/Unit Cost
1951   Vacuum Tube                   1
1965   Transistor                    35
1975   Integrated Circuit (IC)       900
1995   Very Large Scale IC (VLSI)    2,400,000
2005   Submicron VLSI                6,200,000,000
What if technology in the transportation industry advanced at the same rate?
The PowerPC 750
Introduced in 1999
3.65M transistors
366 MHz clock rate
40 mm² die size
250 nm (0.25 micron) technology
Technology Outlook
High Volume Manufacturing: 2004 2006 2008 2010 2012 2014 2016 2018
Technology Node (nm): 90 65 45 32 22 16 11 8
Integration Capacity (BT): 2 4 8 16 32 64 128 256
Delay = CV/I scaling: 0.7, ~0.7, >0.7 (delay scaling will slow down)
Energy/Logic Op scaling: >0.35, >0.5, >0.5 (energy scaling will slow down)
Bulk Planar CMOS: High Probability, Low Probability
Alternate, 3G etc: Low Probability, High Probability
Variability: Medium, High, Very High
ILD (K): ~3, <3, reduce slowly towards 2 to 2.5
RC Delay: 1 1 1 1 1 1 1 1
Metal Layers: 6-7, 7-8, 8-9, 0.5 to 1 layer per generation
Impacts of Advancing Technology
Processor
logic capacity: increases about 30% per year
performance: 2x every 1.5 years
Memory
DRAM capacity: 4x every 3 years, now 2x every 2 years
memory speed: 1.5x every 10 years
cost per bit: decreases about 25% per year
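To see what these rates mean when compounded, the sketch below applies each of them over a 10-year horizon. The horizon and the helper name `growth_over` are arbitrary choices for illustration; the rates themselves are the ones quoted above.

```python
def growth_over(years, factor, period_years):
    """Total growth after `years` for a quantity that grows by `factor` every `period_years`."""
    return factor ** (years / period_years)


horizon = 10  # years; an arbitrary horizon chosen for illustration

processor_perf = growth_over(horizon, 2.0, 1.5)   # 2x every 1.5 years
memory_speed = growth_over(horizon, 1.5, 10.0)    # 1.5x every 10 years
dram_capacity = growth_over(horizon, 2.0, 2.0)    # 2x every 2 years (the recent rate)
logic_capacity = growth_over(horizon, 1.30, 1.0)  # +30% per year
cost_per_bit = growth_over(horizon, 0.75, 1.0)    # -25% per year

print(f"processor performance: x{processor_perf:.0f}")
print(f"memory speed:          x{memory_speed:.1f}")
print(f"DRAM capacity:         x{dram_capacity:.0f}")
print(f"logic capacity:        x{logic_capacity:.1f}")
print(f"cost per bit:          x{cost_per_bit:.3f}")
```

Under these assumptions processor performance grows by roughly two orders of magnitude while memory speed grows by only 1.5x, which is the gap that memory hierarchies are meant to hide.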
Advantages of Higher-Level Languages?
Higher-level languages
Allow the programmer to think in a more natural language and for their intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, …)
Improve programmer productivity - more understandable code that is easier to debug and validate
Improve program maintainability
Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)
Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine
As a result, very little programming is done today at the assembler level
Machine Organization
Capabilities and performance characteristics of the principal Functional Units (FUs)
Logic and means by which information flow between FUs is controlled
The machine’s Instruction Set Architecture (ISA)
Register Transfer Level (RTL) machine description
Instruction Set Architecture (ISA)
ISA: An abstract interface between the hardware and the lowest level software of a machine that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.
“... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.”
– Amdahl, Blaauw, and Brooks, 1964
Enables implementations of varying cost and performance to run identical software
ABI (application binary interface): The user portion of the instruction set plus the operating system interfaces used by application programmers. Defines a standard for binary portability across computers.
Processor fetches the next instruction from memory
How does it know which location in memory to fetch from next?
Processor Organization
Control needs to have circuitry to
Decide which is the next instruction and input it from memory
Decode the instruction
Issue signals that control the way information flows between datapath components
Control what operations the datapath’s functional units perform
Datapath needs to have circuitry to
Execute instructions - functional units (e.g., adder) and storage locations (e.g., register file)
Interconnect the functional units so that the instructions can be executed as required
Load data from and store data to memory
What location does it load from and store to?
The interface description separating the software and hardware
The MIPS ISA
Instruction Categories
Load/Store
Computational
Jump and Branch
Floating Point
- coprocessor
Memory Management
Special
Registers: R0 - R31, PC, HI, LO
3 Instruction Formats: all 32 bits wide
OP | rs | rt | rd | sa | funct
OP | rs | rt | immediate
OP | jump target
Q: How many already familiar with MIPS ISA?
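The first of the three formats (OP, rs, rt, rd, sa, funct) can be sketched as plain bit packing. The field widths used here (6, 5, 5, 5, 5, 6 bits) and the register numbers for `$t0`, `$s1` and `$s2` are the standard MIPS32 values rather than something stated on the slide, and the function names are made up for illustration.

```python
def encode_r_type(op, rs, rt, rd, sa, funct):
    """Pack the six R-format fields into one 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct


def decode_r_type(word):
    """Split a 32-bit R-format instruction word back into its fields."""
    return {
        "op": (word >> 26) & 0x3F,    # 6 bits
        "rs": (word >> 21) & 0x1F,    # 5 bits
        "rt": (word >> 16) & 0x1F,    # 5 bits
        "rd": (word >> 11) & 0x1F,    # 5 bits
        "sa": (word >> 6) & 0x1F,     # 5 bits
        "funct": word & 0x3F,         # 6 bits
    }


# add $t0, $s1, $s2  ->  op=0, rs=$s1 (17), rt=$s2 (18), rd=$t0 (8), sa=0, funct=0x20
word = encode_r_type(0, 17, 18, 8, 0, 0x20)
fields = decode_r_type(word)
print(f"0x{word:08x}", fields)
```

The six field widths add up to 32, which is why every instruction in this format fits in a single 32-bit word.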
How Do the Pieces Fit Together?
[Figure: levels of abstraction, with labels Applications, Operating System, Compiler, Firmware, Instruction Set Architecture, Processor, Memory system, I/O system, network, Datapath & Control, Digital Design, Circuit Design.]
Coordination of many levels of abstraction
Under a rapidly changing set of forces
Design, measurement, and evaluation
Performance Metrics
Purchasing perspective
given a collection of machines, which has the
- best performance?
- least cost?
- best cost/performance?
Design perspective
faced with design options, which has the
- best performance improvement?
- least cost?
- best cost/performance?
Both require
basis for comparison
metric for evaluation
Our goal is to understand what factors in the architecture contribute to overall system performance and the relative importance (and cost) of these factors
Which of these airplanes has the best performance?
How much faster is the Concorde compared to the 747? How much bigger is the 747 than the Douglas DC-8?
Computer Performance: TIME, TIME, TIME
Response Time (latency)
- How long does it take for my job to run?
- How long does it take to execute a job?
- How long must I wait for the database query?
Throughput
- How many jobs can the machine run at once?
- What is the average execution rate?
- How much work is getting done?
If we upgrade a machine with a new processor what do we increase?
If we add a new machine to the lab what do we increase?
Execution Time
Elapsed Time
counts everything (disk and memory accesses, I/O, etc.)
a useful number, but often not good for comparison purposes
CPU time
doesn't count I/O or time spent running other programs
can be broken up into system time and user time
Our focus: user CPU time
time spent executing the lines of code that are "in" our program
Book's Definition of Performance
For some program running on machine X,
$$\text{Performance}_X = 1 / \text{Execution time}_X$$
"X is n times faster than Y"
$$\text{Performance}_X / \text{Performance}_Y = n$$
Problem:
machine A runs a program in 20 seconds
machine B runs the same program in 25 seconds
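The problem can be worked directly from the definition. The sketch below is one way to do it; the function names are chosen for illustration only.

```python
def performance(execution_time_s):
    """Performance is defined as 1 / execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """n such that X is n times faster than Y: Performance_X / Performance_Y."""
    return performance(time_x_s) / performance(time_y_s)


# Machine A takes 20 seconds, machine B takes 25 seconds.
n = times_faster(20.0, 25.0)
print(f"Machine A is {n:.2f} times faster than machine B")
```

Because performance is the reciprocal of time, the ratio reduces to 25/20, so A is 1.25 times faster than B.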
Defining (Speed) Performance
Normally interested in reducing
Response time (aka execution time) - the time between the start and the completion of a task
- Important to individual users
Thus, to maximize performance, need to minimize execution time
Throughput - the total amount of work done in a given time
- Important to data center managers
Decreasing response time almost always improves throughput
Performance Factors
Want to distinguish elapsed time and the time spent on our task
CPU execution time (CPU time) - time the CPU spends working on a task
Does not include time waiting for I/O or running other programs
$$\text{CPU execution time for a program} = \text{\# CPU clock cycles for a program} \times \text{clock cycle time}$$
or
$$\text{CPU execution time for a program} = \frac{\text{\# CPU clock cycles for a program}}{\text{clock rate}}$$
Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program
Review: Machine Clock Rate
Clock rate (MHz, GHz) is inverse of clock cycle time (clock period)
CC = 1 / CR
10 nsec clock cycle => 100 MHz clock rate
5 nsec clock cycle => 200 MHz clock rate
2 nsec clock cycle => 500 MHz clock rate
1 nsec clock cycle => 1 GHz clock rate
500 psec clock cycle => 2 GHz clock rate
250 psec clock cycle => 4 GHz clock rate
200 psec clock cycle => 5 GHz clock rate
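The list above is just CC = 1 / CR applied to different cycle times. A minimal sketch that regenerates it (the function name is chosen for illustration):

```python
def clock_rate_hz(cycle_time_s):
    """Clock rate is the inverse of the clock cycle time: CR = 1 / CC."""
    return 1.0 / cycle_time_s


# Cycle times from the list above, in seconds.
cycle_times_s = [10e-9, 5e-9, 2e-9, 1e-9, 500e-12, 250e-12, 200e-12]

for cc in cycle_times_s:
    rate_mhz = clock_rate_hz(cc) / 1e6
    print(f"{cc * 1e12:6.0f} psec clock cycle => {rate_mhz:5.0f} MHz clock rate")
```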
Clock Cycles
Instead of reporting execution time in seconds, we often use cycles
Clock “ticks” indicate when to start activities (one abstraction):
cycle time = time between ticks = seconds per cycle
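clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
A 4 GHz clock has a cycle time of
$$\frac{1}{4 \times 10^{9}} \times 10^{12} = 250 \text{ picoseconds (ps)}$$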
How to Improve Performance
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
So, to improve performance (everything else being equal) you can either (increase or decrease?)
________ the # of required cycles for a program, or
________ the clock cycle time or, said another way,
________ the clock rate.
How many cycles are required for a program?
Could assume that number of cycles equals number of instructions
[Figure: timeline showing the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions along a time axis.]
This assumption is incorrect; different instructions take different amounts of time on different machines.
Why? Hint: remember that these are machine instructions, not lines of C code
Different numbers of cycles for different instructions
Multiplication takes more time than addition
Floating point operations take longer than integer ones
Accessing memory takes more time than accessing registers
Important point: changing the cycle time often changes the number of cycles required for various instructions
CSE431 L01 Introduction.71 Irwin, PSU, 2005
Clock Cycles per Instruction
Not all instructions take the same amount of time to execute
One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction
Clock cycles per instruction (CPI) - the average number of clock cycles each instruction takes to execute
A way to compare two different implementations of the same ISA
$$\text{\# CPU clock cycles for a program} = \text{\# Instructions for a program} \times \text{Average clock cycles per instruction}$$
CPI for this instruction class
       A   B   C
CPI    1   2   3
Effective CPI
Computing the overall effective CPI is done by looking at the different types of instructions and their individual cycle counts and averaging
$$\text{Overall effective CPI} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{IC}_i)$$
Where $\text{IC}_i$ is the count (percentage) of the number of instructions of class i executed
$\text{CPI}_i$ is the (average) number of clock cycles per instruction for that instruction class
n is the number of instruction classes
The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs
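The weighted sum can be sketched as follows. The per-class CPI values 1, 2 and 3 are the ones given earlier for classes A, B and C; the instruction mix (50%, 30%, 20%) is hypothetical and chosen only to make the arithmetic easy to follow.

```python
def effective_cpi(mix):
    """Sum of IC_i x CPI_i, with IC_i expressed as a fraction of all instructions."""
    total_fraction = sum(fraction for fraction, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cpi for fraction, cpi in mix.values())


# class -> (IC_i as a fraction, CPI_i); the fractions are a hypothetical mix.
mix = {"A": (0.5, 1), "B": (0.3, 2), "C": (0.2, 3)}
overall = effective_cpi(mix)
print(f"Overall effective CPI: {overall:.2f}")
```

With this mix the result is 0.5 x 1 + 0.3 x 2 + 0.2 x 3 = 1.7. Shifting the mix toward class C would raise it, which is why the same machine shows different CPIs on different programs.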
THE Performance Equation
Our basic performance equation is then
$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$$
or
$$\text{CPU time} = \frac{\text{Instruction\_count} \times \text{CPI}}{\text{clock\_rate}}$$
These equations separate the three key factors that affect performance
Can measure the CPU execution time by running the program
The clock rate is usually given
Can measure overall instruction count by using profilers/simulators without knowing all of the implementation details
CPI varies by instruction type and ISA implementation for which we must know the implementation details
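A small sketch of the equation in both directions: computing CPU time from the three factors, and recovering CPI from a measured time when the instruction count and clock rate are known. The program size, CPI and clock rate below are hypothetical numbers, and the function names are chosen for illustration.

```python
def cpu_time_s(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instruction_count x CPI / clock_rate."""
    return instruction_count * cpi / clock_rate_hz


def cpi_from_measurement(measured_cpu_time_s, instruction_count, clock_rate_hz):
    """The same equation solved for CPI."""
    return measured_cpu_time_s * clock_rate_hz / instruction_count


# Hypothetical program: 1 billion instructions, CPI of 2.2, 2 GHz clock.
t = cpu_time_s(1_000_000_000, 2.2, 2e9)
print(f"CPU time: {t:.2f} s")
print(f"Recovered CPI: {cpi_from_measurement(t, 1_000_000_000, 2e9):.2f}")
```

The second function mirrors the measurement approach described above: time and instruction count are observed, the clock rate is given, and CPI is what remains.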
Determinants of CPU Performance
$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$$
                          Instruction_count   CPI   clock_cycle
Algorithm                 X                   X
Programming language      X                   X
Compiler                  X                   X
ISA                       X                   X     X
Processor organization                        X     X
Technology                                          X
A Simple Example
Op       Freq   CPIi   Freq x CPIi   (load = 2 cycles)   (branch = 1 cycle)   (two ALU at once)
ALU      50%    1      .5            .5                  .5                   .25
Load     20%    5      1.0           .4                  1.0                  1.0
Store    10%    3      .3            .3                  .3                   .3
Branch   20%    2      .4            .4                  .2                   .4
Σ =                    2.2           1.6                 2.0                  1.95
How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
CPU time new = 1.6 x IC x CC so 2.2/1.6 means 37.5% faster
How does this compare with using branch prediction to save a cycle off the branch time?
CPU time new = 2.0 x IC x CC so 2.2/2.0 means 10% faster
What if two ALU instructions could be executed at once?
CPU time new = 1.95 x IC x CC so 2.2/1.95 means 12.8% faster
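The three what-if questions can be reproduced with a few lines. This is a sketch using the frequencies and CPIs from the table; modelling "two ALU instructions at once" as an ALU CPI of 0.5 follows the .25 entry in the table, and the helper names are made up for illustration.

```python
def effective_cpi(mix):
    """Sum of Freq x CPI over all instruction classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def with_cpi(mix, op, new_cpi):
    """Return a copy of the mix with the CPI of one instruction class replaced."""
    changed = dict(mix)
    changed[op] = (mix[op][0], new_cpi)
    return changed


# op -> (frequency, CPI), taken from the table above.
base = {"ALU": (0.50, 1), "Load": (0.20, 5), "Store": (0.10, 3), "Branch": (0.20, 2)}

scenarios = {
    "load takes 2 cycles": with_cpi(base, "Load", 2),
    "branch takes 1 cycle": with_cpi(base, "Branch", 1),
    "two ALU ops at once": with_cpi(base, "ALU", 0.5),
}

base_cpi = effective_cpi(base)
# IC and CC are unchanged, so the speedup is just the ratio of effective CPIs.
speedups = {name: base_cpi / effective_cpi(mix) for name, mix in scenarios.items()}

for name, s in speedups.items():
    print(f"{name}: new CPI {base_cpi / s:.2f}, {100 * (s - 1):.1f}% faster")
```

Improving loads helps most here, even though loads are only 20% of the instructions, because their CPI of 5 gives them the largest share of the total cycles.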
Comparing and Summarizing Performance
Guiding principle in reporting performance measurements is reproducibility – list everything another experimenter would need to duplicate the experiment (version of the operating system, compiler settings, input set used, specific computer configuration (clock rate, cache sizes and speed, memory size and speed, etc.))
How do we summarize the performance for a benchmark set with a single number?
The average of execution times that is directly proportional to total execution time is the arithmetic mean (AM)
$$AM = \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$
Where $\text{Time}_i$ is the execution time for the ith program of a total of n programs in the workload
A smaller mean indicates a smaller average execution time and thus improved performance
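As a small illustration of the arithmetic mean as a summary, the sketch below compares two machines on a three-program workload. All execution times are hypothetical, invented only to show the calculation.

```python
def arithmetic_mean(times_s):
    """AM = (1/n) x sum of Time_i over the n programs in the workload."""
    return sum(times_s) / len(times_s)


# Hypothetical execution times (seconds) for the same three programs on two machines.
machine_a = [2.0, 4.0, 9.0]
machine_b = [3.0, 3.0, 6.0]

am_a = arithmetic_mean(machine_a)
am_b = arithmetic_mean(machine_b)
print(f"AM for A: {am_a:.1f} s, AM for B: {am_b:.1f} s")
```

Machine B has the smaller mean (4.0 s against 5.0 s) and also the smaller total time (12 s against 15 s), even though A wins on the first program. The mean tracks total execution time, not any single program.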
Remember
Performance is specific to a particular program/s
Total execution time is a consistent summary of performance
For a given architecture performance increases come from:
increases in clock rate (without adverse CPI effects)
improvements in processor organization that lower CPI
compiler enhancements that lower CPI and/or instruction count
algorithm/language choices that affect instruction count
Pitfall: expecting improvement in one aspect of a machine’s performance to affect the total performance
Summary: Evaluating ISAs
Design-time metrics:
Can it be implemented, in how long, at what cost?
Can it be programmed? Ease of compilation?
Static Metrics:
How many bytes does the program occupy in memory?
Dynamic Metrics:
How many instructions are executed? How many bytes does the processor fetch to execute the program?
How many clocks are required per instruction?
How "lean" a clock is practical?
Best Metric: Time to execute the program!
[Figure: diagram relating Inst. Count, CPI, and Cycle Time.]
This depends on the instruction set, the processor organization, and compilation techniques.
Next Lecture and Reminders
Next lecture
Instructions: Language of the Computer
- Reading assignment - Chapter 2