CS 252 Graduate Computer Architecture
Lecture 2 - Metrics and Pipelining
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs252
8/30 CS252-Fall’07 2
Review from Last Time
• Computer Architecture >> instruction sets
• Computer Architecture skill sets are different
– 5 Quantitative principles of design
– Quantitative approach to design
– Solid interfaces that really work
– Technology tracking and anticipation
• CS 252 to learn new skills, transition to research
• Computer Science at the crossroads from sequential to parallel computing
– Salvation requires innovation in many fields, including computer architecture
• Opportunity for interesting and timely CS 252 projects exploring CS at the crossroads
– RAMP as experimental platform
Review (continued)
• Other fields often borrow ideas from architecture
• Quantitative Principles of Design
1. Take Advantage of Parallelism
2. Principle of Locality
3. Focus on the Common Case
4. Amdahl’s Law
5. The Processor Performance Equation
• Careful, quantitative comparisons
– Define, quantify, and summarize relative performance
– Define and quantify relative cost
– Define and quantify dependability
– Define and quantify power
• Culture of anticipating and exploiting advances in technology
• Culture of well-defined interfaces that are carefully implemented and thoroughly checked
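Principles 4 and 5 above are both short formulas, so a small numeric sketch may help. The function names and sample values below are illustrative assumptions, not taken from the slides; the formulas are the standard textbook forms of Amdahl's Law and the processor performance equation.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: overall speedup when only a fraction of time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """Processor performance equation: time = instructions x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_s


if __name__ == "__main__":
    # Hypothetical case: half of the run time is made 10x faster.
    print(amdahl_speedup(0.5, 10.0))
    # Hypothetical case: 1e9 instructions, CPI of 1.5, 1 ns clock cycle.
    print(cpu_time(1e9, 1.5, 1e-9))
```

Even a 10x improvement to half of the execution time yields an overall speedup below 2x, which is why the common case matters.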
Metrics used to Compare Designs
• Cost
– Die cost and system cost
• Execution Time
– Average and worst-case
– Latency vs. Throughput
• Energy and Power
– Also peak power and peak switching current
• Reliability
– Resiliency to electrical noise, part failure
– Robustness to bad software, operator error
• Maintainability
– System administration costs
• Compatibility
– Software costs dominate
Cost of Processor
• Design cost (Non-recurring Engineering Costs, NRE)
– dominated by engineer-years (~$200K per engineer year)
– also mask costs (exceeding $1M per spin)
• Cost of die
– die area
– die yield (maturity of manufacturing process, redundancy features)
– cost/size of wafers
– die cost ~= f(die area^4) with no redundancy
• Cost of packaging
– number of pins (signal + power/ground pins)
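The dependence of die cost on die area can be sketched with the die-cost model from the course text (Hennessy and Patterson, 4th edition). The choice of that model, the exponent alpha = 4, and every numeric value below are assumptions made for illustration; none of them come from the slide.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Approximate die count: wafer area over die area, minus edge losses."""
    wafer_area = math.pi * (wafer_diameter_cm / 2.0) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2: float, die_area_cm2: float,
              alpha: float = 4.0, wafer_yield: float = 1.0) -> float:
    """Fraction of good dies; falls steeply as die area grows."""
    return wafer_yield * (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost: float, wafer_diameter_cm: float,
             die_area_cm2: float, defects_per_cm2: float) -> float:
    """Cost per good die = wafer cost / (dies per wafer x die yield)."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2))
    return wafer_cost / good_dies


if __name__ == "__main__":
    # Hypothetical numbers: $5000 wafer, 30 cm diameter, 0.4 defects per cm^2.
    for area_cm2 in (1.0, 2.0):
        print(area_cm2, die_cost(5000.0, 30.0, area_cm2, 0.4))
```

With these made-up inputs, doubling the die area far more than doubles the cost per good die, because fewer dies fit on the wafer and a smaller fraction of them work.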
• Select pieces of workload that work well on your design, ignore others
• Use unrealistic data set sizes for application (too big or too small)
• Report throughput numbers for a latency benchmark
• Report latency numbers for a throughput benchmark
• Report performance on a kernel and claim it represents an entire application
• Use 16-bit fixed-point arithmetic (because it’s fastest on your system) even though application requires 64-bit floating-point arithmetic
• Use a less efficient algorithm on the competing machine
• Report speedup for an inefficient algorithm (bubblesort)
• Compare hand-optimized assembly code with unoptimized C code
• Compare your design using next year’s technology against competitor’s year-old design (1% performance improvement per week)
• Ignore the relative cost of the systems being compared
• Report averages and not individual results
• Report speedup over unspecified base system, not absolute times
• Report efficiency not absolute times
• Report MFLOPS not absolute times (use inefficient algorithm)
[ David Bailey, “Twelve ways to fool the masses when giving performance results for parallel supercomputers” ]
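Two items in the list above, reporting averages instead of individual results and reporting speedup over a base system instead of absolute times, can be illustrated with a small sketch. The two machines and their run times are invented for the example.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical run times in seconds for two programs on machines A and B.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Time ratios with A as the base system, then with B as the base system.
b_rel_a = [b / a for a, b in zip(times_a, times_b)]
a_rel_b = [a / b for a, b in zip(times_a, times_b)]

if __name__ == "__main__":
    # The arithmetic mean of ratios makes whichever machine is NOT the base look slower.
    print(arithmetic_mean(b_rel_a), arithmetic_mean(a_rel_b))
    # The geometric mean is consistent under a change of base system.
    print(geometric_mean(b_rel_a), geometric_mean(a_rel_b))
    # The absolute totals tell a different story from either average.
    print(sum(times_a), sum(times_b))
```

Neither average reveals that machine B finishes the whole workload roughly nine times sooner, which is the argument for reporting individual absolute times.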
Benchmarking for Future Machines
• Variance in performance for parallel architectures is going to be much worse than for serial processors
– SPECcpu means really only work across very similar machine configurations
• What is a good benchmarking methodology?
• Possible CS252 project
– Berkeley View Techreport has “Dwarves” as major types of code that must run well (http://view.eecs.berkeley.edu)
– Can you construct a parallel benchmark methodology from Dwarves?
Power and Energy
• Energy to complete operation (Joules)
– Corresponds approximately to battery life
– (Battery energy capacity actually depends on rate of discharge)
• Peak power dissipation (Watts = Joules/second)
– Affects packaging (power and ground pins, thermal design)
• di/dt, peak change in supply current (Amps/second)
– Affects power supply noise (power and ground pins, decoupling capacitors)
Peak Power versus Lower Energy
• System A has higher peak power, but lower total energy
• System B has lower peak power, but higher total energy
[Figure: power versus time for systems A and B, with Peak A and Peak B marked. Integrate the power curve to get energy.]
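Integrating the power curve can be sketched with the trapezoidal rule over sampled power readings. The two traces below are invented so that system A has the higher peak but the lower total energy, matching the slide's claim; they are not data from the lecture.

```python
def energy_joules(times_s, power_w):
    """Trapezoidal integration of a sampled power trace (watts over seconds)."""
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical traces: A is a short, tall burst; B is a long, low plateau.
times_a, power_a = [0.0, 1.0, 2.0], [0.0, 10.0, 0.0]
times_b, power_b = [0.0, 1.0, 5.0, 6.0], [0.0, 4.0, 4.0, 0.0]

energy_a = energy_joules(times_a, power_a)
energy_b = energy_joules(times_b, power_b)

if __name__ == "__main__":
    print("A: peak", max(power_a), "W, energy", energy_a, "J")
    print("B: peak", max(power_b), "W, energy", energy_b, "J")
```

Peak power drives the packaging and cooling design, while the integral drives battery life, so the two systems would rank differently depending on which metric matters.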
CS252 Administrivia
Instructor: Prof. Krste Asanovic
Office: 645 Soda Hall, krste@eecs
Office Hours: M 1-3PM, 645 Soda Hall
T. A.: Rose Liu, rfl@eecs
Office Hours: W 4-5PM, 751 Soda Hall
Lectures: Tu/Th, 9:30-11:00AM 203 McLaughlin
Text: Computer Architecture: A Quantitative Approach,
4th Edition (Oct, 2006)
Web page: http://inst.eecs.berkeley.edu/~cs252
Lectures available online <6:00 AM day of lecture
CS252 Updates
• Prereq quiz will cover:
– Finite state machines
– ISA designs & MIPS assembly code programming (today’s lecture)
• Exception: An unusual event happens to an instruction during its execution
– Examples: divide by zero, undefined opcode
• Interrupt: Hardware signal to switch the processor to a new instruction stream
– Example: a sound card interrupts when it needs more audio output samples (an audio “click” happens if it is left waiting)
• Problem: It must appear that the exception or interrupt occurs between 2 instructions (Ii and Ii+1)
– The effect of all instructions up to and including Ii is totally complete
– No effect of any instruction after Ii can take place
• The interrupt (exception) handler either aborts the program or restarts at instruction Ii+1
Precise Exceptions In-Order Pipelines
Key observation: architected state changes only in the memory and register write stages.
Summary: Metrics and Pipelining
• Machines compared over many metrics
– Cost, performance, power, reliability, compatibility, …
• Difficult to compare widely differing machines on a benchmark suite
• Control via state machines and microprogramming
• Just overlap tasks; easy if tasks are independent
• Speedup ≤ Pipeline Depth; if ideal CPI is 1, then:
Speedup = Pipeline depth / (1 + Pipeline stall CPI) × (Cycle time unpipelined / Cycle time pipelined)
• Hazards limit performance on computers:
– Structural: need more HW resources
– Data (RAW, WAR, WAW): need forwarding, compiler scheduling
– Control: delayed branch, prediction
• Exceptions, interrupts add complexity
• Next time: Read Appendix C!
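The relation between speedup and pipeline depth in the summary can be checked numerically. The sketch below assumes the standard textbook form for an ideal CPI of 1, speedup = depth / (1 + stall CPI) × (unpipelined cycle time / pipelined cycle time). The function name, parameter names, and sample values are illustrative.

```python
def pipeline_speedup(depth: int, stall_cpi: float,
                     cycle_unpipelined: float = 1.0,
                     cycle_pipelined: float = 1.0) -> float:
    """Speedup over an unpipelined design, assuming an ideal CPI of 1."""
    return depth / (1.0 + stall_cpi) * (cycle_unpipelined / cycle_pipelined)


if __name__ == "__main__":
    # With no stalls and equal cycle times, speedup equals the pipeline depth.
    print(pipeline_speedup(5, 0.0))
    # Hypothetical hazards adding 0.25 stall cycles per instruction.
    print(pipeline_speedup(5, 0.25))
```

Any stall cycles caused by structural, data, or control hazards pull the speedup below the pipeline depth.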