Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili

Dec 24, 2015


Karen Allison

Performance

Lecture notes from MKP, H. H. Lee and S. Yalamanchili


Reading

• Section 1.6

• Practice Problems: Module 5 – 20, 21, 27


Understanding Performance

• Algorithm: determines number of operations executed

• Programming language, compiler, architecture: determine number of machine instructions executed per operation

• Processor and memory system: determine how fast instructions are executed

• I/O system (including OS): determines how fast I/O operations are executed



Defining Performance

• Which airplane has the best performance?


Metrics

• Response time (latency): how long it takes to do a task

• Throughput: total work done per unit time

o e.g., tasks/transactions/… per hour

o Trading throughput vs. latency

• Energy/Power

o Measure of work being performed

o Increases with clock frequency/voltage

o Determines temperature

o Is affected by temperature


Relative Performance

• “X is n times faster than Y”

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

Example: time taken to run a program is 10s on A, 15s on B

Execution Time_B / Execution Time_A = 15s / 10s = 1.5

So A is 1.5 times faster than B
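The ratio in the example can be checked with a few lines of Python. This is a minimal sketch; the function name `relative_performance` and its parameter names are chosen here for illustration and do not come from the lecture notes.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is the inverse of execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# The example above: the program takes 10 s on A and 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")  # prints "A is 1.5 times faster than B"
```

Note that the slower machine's time goes in the numerator, which is why the result is greater than 1 when X is the faster machine.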


CPU Clocking

• Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram. Within each clock cycle, data transfer and computation take place and then state is updated; the clock period spans one cycle.]

Clock period (our cycle time): duration of a clock cycle, e.g., 250ps = 0.25ns = 250×10^-12 s

Clock frequency (rate): cycles per second, e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz
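Clock period and clock frequency are reciprocals of each other, so the two examples above describe the same clock. A small sketch (the helper names are assumptions made for this illustration):

```python
def period_to_frequency(period_seconds: float) -> float:
    """Clock frequency in Hz for a given clock period in seconds."""
    return 1.0 / period_seconds


def frequency_to_period(frequency_hz: float) -> float:
    """Clock period in seconds for a given clock frequency in Hz."""
    return 1.0 / frequency_hz


# 250 ps period <-> 4.0 GHz clock, as in the examples above.
freq = period_to_frequency(250e-12)
period = frequency_to_period(4.0e9)
print(f"{freq / 1e9:.1f} GHz")    # prints "4.0 GHz"
print(f"{period * 1e12:.0f} ps")  # prints "250 ps"
```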


CPU Time

• Performance improved by

o Reducing number of clock cycles

o Increasing clock rate

o Hardware designer must often trade off clock rate against cycle count

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$


CPU Time Example

• Computer A: 2GHz clock, 10s CPU time

• Designing Computer B

o Aim for 6s CPU time

o Can do faster clock, but causes 1.2 × clock cycles

• How fast must Computer B clock be?
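The transcript does not carry the worked answer, so the following is a sketch of one way to reach it, using only the relation CPU time = clock cycles / clock rate. The variable and function names are assumptions made for this illustration.

```python
def cpu_time(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


# Computer A: 2 GHz clock, 10 s CPU time, so recover its cycle count.
clock_rate_a = 2.0e9
cpu_time_a = 10.0
cycles_a = cpu_time_a * clock_rate_a  # 20 x 10^9 cycles

# Computer B: the faster clock costs 1.2x as many cycles; target is 6 s.
cycles_b = 1.2 * cycles_a
target_time_b = 6.0
clock_rate_b = cycles_b / target_time_b

print(f"Computer B needs a {clock_rate_b / 1e9:.1f} GHz clock")
# prints "Computer B needs a 4.0 GHz clock"
```

Under these numbers, Computer B needs twice the clock rate of A to get a 10s/6s improvement, because 20% more cycles eat into the gain.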


Instruction Count and CPI

• Instruction Count for a program

o Determined by program, ISA and compiler

• Average cycles per instruction

o Determined by CPU hardware

o If different instructions have different CPI, average CPI is affected by instruction mix
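Instruction count and average CPI together give the clock cycle count, which connects this slide to the CPU time relation. As a sketch in the notation of these notes (IC for instruction count; the layout below is an illustration, not a copy of the original slide):

```latex
\begin{align*}
\text{Clock Cycles} &= \text{IC} \times \text{CPI} \\
\text{CPU Time} &= \text{IC} \times \text{CPI} \times \text{Clock Cycle Time}
                 = \frac{\text{IC} \times \text{CPI}}{\text{Clock Rate}}
\end{align*}
```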


Cycles and Instructions

• Multiplication takes more time than addition

• Floating point operations take longer than integer ones

• Accessing memory takes (in general) more time than accessing registers

• Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)


Program Execution time

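Execution time ~= Instruction_count * CPI_avg * clock_cycle_time

Each factor is determined mainly by a different layer: Instruction_count by algorithms/compiler, CPI_avg by architecture, clock_cycle_time by technology.

$$\text{CPI}_{avg} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$$

where F_i is the relative frequency of instruction class i and n is the number of instruction classes.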


CPI Example

• Computer A: Cycle Time = 250ps, CPI = 2.0

• Computer B: Cycle Time = 500ps, CPI = 1.2

• Same ISA

• Which is faster, and by how much?

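$$\text{CPU Time}_A = \text{IC} \times \text{CPI}_A \times \text{Cycle Time}_A = \text{IC} \times 2.0 \times 250\,\text{ps} = \text{IC} \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{IC} \times \text{CPI}_B \times \text{Cycle Time}_B = \text{IC} \times 1.2 \times 500\,\text{ps} = \text{IC} \times 600\,\text{ps}$$

A is faster…

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{\text{IC} \times 600\,\text{ps}}{\text{IC} \times 500\,\text{ps}} = 1.2$$

…by this much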


CPI Example

• Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

Sequence 1: IC = 5

Clock Cycles = 2×1 + 1×2 + 2×3 = 10

Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6

Clock Cycles = 4×1 + 1×2 + 1×3 = 9

Avg. CPI = 9/6 = 1.5
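The two code sequences can be recomputed mechanically from the table. The sketch below assumes the per-class CPI and instruction counts given above; the dictionary layout and function names are illustrative choices, not part of the notes.

```python
# CPI of each instruction class, taken from the table above.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(instruction_counts: dict) -> int:
    """Total cycles: sum over classes of (class CPI x instruction count)."""
    return sum(CPI_BY_CLASS[cls] * count
               for cls, count in instruction_counts.items())


def average_cpi(instruction_counts: dict) -> float:
    """Average CPI: total clock cycles divided by total instruction count."""
    return clock_cycles(instruction_counts) / sum(instruction_counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "IC =", sum(seq.values()),
          "cycles =", clock_cycles(seq),
          "avg CPI =", average_cpi(seq))
# prints "Sequence 1 IC = 5 cycles = 10 avg CPI = 2.0"
# prints "Sequence 2 IC = 6 cycles = 9 avg CPI = 1.5"
```

Sequence 2 executes more instructions but needs fewer cycles, which is why instruction count alone is not a reliable performance measure.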


Performance Summary

• Performance depends on

o Algorithm: affects IC, possibly CPI

o Programming language: affects IC, CPI

o Compiler: affects IC, CPI

o Instruction set architecture: affects IC, CPI, Tc (clock cycle time)

The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$


SPEC CPU Benchmark

• Programs used to measure performance

o Supposedly typical of actual workload

• Standard Performance Evaluation Corp (SPEC)

o Develops benchmarks for CPU, I/O, Web, …

• SPEC CPU2006

o Elapsed time to execute a selection of programs (negligible I/O, so focuses on CPU performance)

o Normalize relative to reference machine

o Summarize as geometric mean of performance ratios

o CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
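The geometric mean of n ratios is the n-th root of their product. A minimal sketch follows; the ratios in the usage lines are made-up values for illustration, not SPEC results, and the function name is an assumption.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n positive ratios.

    Computed through logarithms so that a long list of ratios does not
    overflow or underflow the intermediate product.
    """
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution time ratios for three benchmark programs.
example_ratios = [2.0, 8.0, 4.0]
print(f"{geometric_mean(example_ratios):.2f}")  # prints "4.00"
```

One property that makes the geometric mean a natural choice for normalized ratios is that the comparison between two machines comes out the same regardless of which reference machine is used for normalization.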


Other Benchmark Suites

• Report performance metrics for execution on target platforms

o Designed to assess how well the platforms function in specific domains

• Examples

o MediaBench: multimedia

o EEMBC: embedded systems

o Rodinia, Parboil: GPU systems

o SPECweb, SPECjbb: enterprise systems

o Many more…


Pitfall: Amdahl’s Law

• Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

Example: multiply accounts for 80s/100s

o How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

Can’t be done!

Corollary: make the common case fast
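The multiply example can be turned around to ask what improvement factor a given overall speedup would require. The sketch below rearranges the relation T_improved = T_affected / factor + T_unaffected; the function names and the `None` return for an unreachable target are assumptions made for this illustration.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected: float, t_unaffected: float, target_speedup: float):
    """Improvement factor needed on the affected part to reach the target
    overall speedup, or None if no finite factor can reach it."""
    t_target = (t_affected + t_unaffected) / target_speedup
    remaining = t_target - t_unaffected  # time left for the affected part
    if remaining <= 0:
        return None
    return t_affected / remaining


# Multiply accounts for 80 s of a 100 s run.
print(required_factor(80.0, 20.0, 5.0))  # prints "None" (5x overall can't be done)
print(required_factor(80.0, 20.0, 4.0))  # prints "16.0"
```

Even a 4× overall speedup needs multiply to become 16× faster, because the unaffected 20 s sets a floor on the total time.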


Amdahl’s Law

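• Speed-up = Exec_time_old / Exec_time_new =

$$\frac{1}{(1 - f) + \dfrac{f}{P}}$$

where f is the affected fraction of the original execution time and P is the speed-up of that fraction.

[Figure: T_old split into f and (1 - f); T_new split into (1 - f) and f / P.]

• Performance improvement from using the faster mode is limited by the fraction of time the faster mode can be applied.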
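A short sketch of the speed-up formula shows how the limit appears as P grows. Here f = 0.8 is taken from the earlier multiply example (80s out of 100s); the function name and the chosen values of P are assumptions for illustration.

```python
def amdahl_speedup(f: float, p: float) -> float:
    """Overall speed-up 1 / ((1 - f) + f / p), where f is the fraction of
    the original time that is sped up and p is the speed-up of that part."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if p <= 0:
        raise ValueError("p must be positive")
    return 1.0 / ((1.0 - f) + f / p)


# With f = 0.8 the overall speed-up approaches 1 / (1 - f) = 5
# but never reaches it, however large p becomes.
for p in (2, 10, 100, 1000):
    print(f"P = {p:>4}: speed-up = {amdahl_speedup(0.8, p):.3f}")
```

The bound 1 / (1 - f) depends only on the unaffected fraction, which is the quantitative form of "make the common case fast".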


Concluding Remarks

• Cost/performance is improving

o Due to underlying technology development

• Hierarchical layers of abstraction

o In both hardware and software

• Instruction set architecture

o The hardware/software interface

• Execution time: the best performance measure

• Power is a limiting factor

o Use parallelism to improve performance


Study Guide

• Practice problems provided on the class website


Glossary

• Amdahl’s Law

• Benchmarks

• CPI (cycles per instruction)

• CPU Time

• Latency

• SPEC CPU

• Throughput