1 The Difference Engine, Charles Babbage Images from Wikipedia (Joe D and Andrew Dunn) Slides courtesy Anselmo Lastra.

Jan 05, 2016


Joella Pierce
Transcript
Page 1:

The Difference Engine, Charles Babbage
Images from Wikipedia (Joe D and Andrew Dunn)

Slides courtesy Anselmo Lastra

Page 2:

COMP 740: Computer Architecture and Implementation

Montek Singh

Wed, Jan 12, 2011

Lecture 2: Fundamentals and Trends

Page 3:

Quantitative Principles of Computer Design

$$P = \frac{1}{T}$$

T: execution time, response time, latency
  Units: time / program, time / instruction, time / bit, time / result, time / work

P: performance, rate of producing results, throughput, bandwidth
  Units: programs / time, instructions / time, bits / time, results / time, work / time

Page 4:

Topics
Performance
Chips
Trends in
  “Bandwidth” (or Throughput) vs. Latency
  Power
  Cost
  Dependability
Measuring Performance

Page 5:

Trends: Moore’s Law

Era of the microprocessor. Increases due to transistors and architectural improvements

Page 6:

Performance
By 2002, performance was 7X higher than it would have been due to technology alone
What has slowed the trend?
  Note what is really being built
    A commodity device!
    So cost is very important
  Problems
    Amount of heat that can be removed economically
    Limits to instruction level parallelism
    Memory latency

Page 7:

Moore’s Law
Number of transistors on a chip at the lowest cost/component
It’s not quite clear what it is
  Moore’s original paper: doubling yearly
    Didn’t make it in 1975
  Often quoted as doubling every 18 months
  Sometimes as doubling every two years
Moore’s article is worth reading if you haven’t yet

Page 8:

Quick Look: Classes of Computers
Used to be
  mainframe, mini and micro
Now
  Desktop (portable?)
    Price/performance, single app, graphics
  Server
    Reliability, scalability, throughput
  Embedded
    Not only “toasters”, but also cell phones, etc.
    Cost, power, real-time performance

Page 9:

Chip Performance
Based on a number of factors
  Feature size (or “technology” or “process”)
    Determines transistor & wire density
    Used to be measured in microns, now nanometers
    Currently: 90 nm, 65 nm, even 45 nm
  Die size
  Device speed
Note section on wires in HP4
  Thin wires -> more resistance and capacitance
  Wire delay scales poorly

Page 10:

Wafer, Die, Yield

Page 11:

Packaging

Page 12:

ITRS
International Technology Roadmap for Semiconductors
  http://www.itrs.net/
  An industry consortium
  Predicts trends
  Take a look at the yearly report on their website

Page 13:

ITRS Predictions (2006 update)

Page 14:

Aside: Ray Kurzweil
Kurzweil: futurist, author
  Book in 2005: “The Singularity is Near”
  Movie in 2010

Page 15:

Trends
Now let’s look at trends in
  “Bandwidth” (Throughput) vs. Latency
  Power
  Cost
  Dependability
  Performance

Page 16:

Bandwidth over Latency
Very important to understand section in HP4 on page 15
What they mean by bandwidth is also processor performance (throughput), maybe memory size, etc.
Let’s look at charts

Page 17:

Disk

[Chart: relative bandwidth improvement (log axis, 1 to 10000) vs. relative latency improvement (log axis, 1 to 100). Series: Disk. Reference line: latency improvement = bandwidth improvement.]

Page 18:

RAM

[Chart: relative bandwidth improvement (log axis, 1 to 10000) vs. relative latency improvement (log axis, 1 to 100). Series: Memory, Disk. Reference line: latency improvement = bandwidth improvement.]

Page 19:

LAN

[Chart: relative bandwidth improvement (log axis, 1 to 10000) vs. relative latency improvement (log axis, 1 to 100). Series: Memory, Network, Disk. Reference line: latency improvement = bandwidth improvement.]

Page 20:

Processor

[Chart: relative bandwidth improvement (log axis, 1 to 10000) vs. relative latency improvement (log axis, 1 to 100). Series: Processor, Memory, Network, Disk. Reference line: latency improvement = bandwidth improvement.]

CPU high, Memory low (“Memory Wall”)

Page 21:

Summary
In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4 (and capacity improves faster than bandwidth)
Stated alternatively:
  Bandwidth improves by more than the square of the improvement in latency
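The two statements on this slide can be checked against each other with a few lines of arithmetic. The sketch below is only an illustration; the function name is made up for this example and does not come from the slides.

```python
def bandwidth_beats_latency_squared(bandwidth_gain: float, latency_gain: float) -> bool:
    """True if the bandwidth improvement exceeds the square of the latency improvement."""
    return bandwidth_gain > latency_gain ** 2


# Bandwidth doubles (2x) while latency improves by 1.2x to 1.4x.
for latency_gain in (1.2, 1.4):
    squared = latency_gain ** 2
    holds = bandwidth_beats_latency_squared(2.0, latency_gain)
    print(f"latency gain {latency_gain}: squared = {squared:.2f}, 2x bandwidth is larger: {holds}")
```

Since 1.4 squared is still a little under 2, a doubling of bandwidth exceeds the square of the latency improvement across the whole quoted range.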

Page 22:

Why Less Improvement?
Moore’s Law helps bandwidth
  Longer distance for signal to travel, so longer latency
  Which offsets faster transistors
Distance limits latency
  Speed of light lower bound
Bandwidth sells
  Capacity, processor “speed” and benchmark scores
Latency can help bandwidth
  Often bandwidth is increased by adding latency
OS introduces latency

Page 23:

Techniques to Ameliorate
Caching
  Use capacity (“bandwidth”) to reduce average latency
Replication
  Again, leverage capacity
Prediction
  Use extra processing transistors to pre-fetch
  Maybe also to recompute instead of fetch
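To make the caching point concrete, here is a minimal sketch of how a cache lowers average latency. It uses the usual weighted-average model (hits served at the fast latency, misses at the slow one); the latencies and hit rates are assumed values chosen for illustration, not figures from the lecture.

```python
def average_latency(hit_rate: float, hit_latency_ns: float, miss_latency_ns: float) -> float:
    """Expected latency of one access when a fraction hit_rate is served by the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_latency_ns + (1.0 - hit_rate) * miss_latency_ns


# Assumed numbers: 1 ns for a cache hit, 100 ns for an access to the slower level.
for hit_rate in (0.0, 0.9, 0.99):
    print(f"hit rate {hit_rate:.2f}: {average_latency(hit_rate, 1.0, 100.0):.2f} ns on average")
```

The slow level is no faster than before; only the average improves, which is why the slide describes caching as spending capacity to reduce average latency.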

Page 24:

Trends
Now let’s look at trends in
  “Bandwidth” vs. Latency
  Power
  Cost
  Dependability
  Performance

Page 25:

Power
For CMOS chips, the traditionally dominant energy consumption has been in switching transistors, called dynamic power

$$\text{Power}_{\text{dynamic}} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$

For mobile devices, energy is the better metric:

$$\text{Energy}_{\text{dynamic}} = \text{Capacitive load} \times \text{Voltage}^2$$

For a fixed task, slowing clock rate reduces power, not energy
Capacitive load is a function of the number of transistors connected to an output and of technology, which determines capacitance of wires and transistors
Dropping voltage helps both, moved from 5V to 1V
Clock gating
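The two formulas on this slide can be tried out directly. The sketch below assumes SI units (farads, volts, hertz) and uses made-up component values; only the formulas themselves come from the slide.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency_switched: float) -> float:
    """Power_dynamic = 1/2 * capacitive load * voltage^2 * frequency switched."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency_switched


def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Energy_dynamic = capacitive load * voltage^2 (no frequency term)."""
    return capacitive_load * voltage ** 2


# Assumed values for illustration only.
load_farads = 1e-9
clock_hz = 1e9

# Halving the clock rate halves power but leaves the energy formula unchanged.
print(dynamic_power(load_farads, 1.0, clock_hz), dynamic_power(load_farads, 1.0, clock_hz / 2))
print(dynamic_energy(load_farads, 1.0))

# Dropping the supply from 5 V to 1 V cuts both power and energy by 5^2 = 25x.
print(dynamic_energy(load_farads, 5.0) / dynamic_energy(load_farads, 1.0))
```

Because frequency appears only in the power formula, slowing the clock stretches the same energy over a longer time, while lowering voltage reduces both quantities quadratically.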

Page 26:

Example
Suppose 15% reduction in voltage results in a 15% reduction in frequency. What is impact on dynamic power?

$$\begin{aligned}
\text{Power}_{\text{dynamic}} &= \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched} \\
&= \frac{1}{2} \times 0.85 \times \text{Capacitive load} \times (0.85 \times \text{Voltage})^2 \times \text{Frequency switched} \\
&= (0.85)^3 \times \text{OldPower}_{\text{dynamic}} \\
&\approx 0.6 \times \text{OldPower}_{\text{dynamic}}
\end{aligned}$$
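The arithmetic of this example can be reproduced in a couple of lines. This is a sketch, with a hypothetical helper name; it relies only on dynamic power scaling with voltage squared times frequency, with capacitive load held constant.

```python
def dynamic_power_ratio(voltage_scale: float, frequency_scale: float) -> float:
    """New dynamic power divided by old, for a fixed capacitive load."""
    return voltage_scale ** 2 * frequency_scale


# A 15% reduction in both voltage and frequency means both scale by 0.85.
ratio = dynamic_power_ratio(0.85, 0.85)
print(f"new power is about {ratio:.2f} of the old power")
```

The result is 0.85 cubed, roughly 0.61, which the slide rounds to 0.6.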

Page 27:

Trends in Power
Because leakage current flows even when a transistor is off, static power is now important too
Leakage current increases in processors with smaller transistor sizes
Increasing the number of transistors increases power even if they are turned off
In 2006, goal for leakage is 25% of total power consumption; high performance designs at 40%
Very low power systems even gate voltage to inactive modules to control loss due to leakage

$$\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}$$

Page 28:

Trends
Now let’s look at trends in
  “Bandwidth” vs. Latency
  Power
  Cost
  Dependability
  Performance

Page 29:

Cost of Integrated Circuits

$$\text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging}}{\text{Final test yield}}$$

$$\text{Cost of testing die} = \frac{\text{Cost of testing per hour} \times \text{Average die test time}}{\text{Die yield}}$$

$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}} - \text{Test dies per wafer}$$

$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$

Dingwall’s Equation
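As a rough sketch of how the cost formulas on this slide fit together, the Python below computes dies per wafer, die yield, and cost per good die. It assumes the standard textbook form of the formulas (gross dies from wafer area, minus an edge-loss term, with yield governed by a process-complexity parameter alpha). The wafer cost, die areas, and defect density are made-up inputs, and all function and parameter names are illustrative.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float, test_dies: float = 0.0) -> float:
    """Gross dies from wafer area, minus the edge-loss term and any test dies."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss - test_dies


def die_yield(wafer_yield: float, defects_per_cm2: float, die_area_cm2: float, alpha: float = 3.0) -> float:
    """Empirical yield model; alpha reflects the complexity of the fabrication process."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
             defects_per_cm2: float, wafer_yield: float = 1.0, alpha: float = 3.0) -> float:
    """Cost of wafer divided by the number of good dies it produces."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha))
    return wafer_cost / good_dies


# Assumed inputs: 30 cm wafer costing $5000, 0.4 defects per square cm.
for area in (1.0, 2.25):
    print(f"die area {area} cm^2: cost per good die = ${die_cost(5000.0, 30.0, area, 0.4):.2f}")
```

Larger dies are penalized twice, since fewer of them fit on a wafer and a smaller fraction of them work, so cost per good die grows much faster than linearly with die area.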

Page 30:

Explanations
Second term in “Dies per wafer” corrects for the rectangular dies near the periphery of round wafers
“Die yield” assumes a simple empirical model: defects are randomly distributed over the wafer, and yield is inversely proportional to the complexity of the fabrication process (indicated by $\alpha$)
$\alpha = 3$ for modern processes implies that cost of die is proportional to $(\text{Die area})^4$

Page 31:

Real World Examples

| | Intel Pentium | AMD 5K86 | Cyrix 6x86 | MIPS R5000 | PowerPC 603e | PowerPC 604 | Pentium Pro | Sun UltraSparc | Hitachi SH7604 |
|---|---|---|---|---|---|---|---|---|---|
| Process | BiCMOS | CMOS | CMOS | CMOS | CMOS | CMOS | BiCMOS | CMOS | CMOS |
| Line width (microns) | 0.35 | 0.35 | 0.44 | 0.35 | 0.64 | 0.44 | 0.35 | 0.47 | 0.8 |
| Metal layers | 4 | 3 | 5 | 3 | 4 | 4 | 4 | 4 | 2 |
| Wafer size (mm) | 200 | 200 | 200 | 200 | 200 | 200 | 200 | 200 | 150 |
| Wafer cost | $2,700 | $2,200 | $2,400 | $2,600 | $2,500 | $2,300 | $2,700 | $2,200 | $500 |
| Die area (sq mm) | 91 | 181 | 204 | 84 | 98 | 196 | 196 | 315 | 82 |
| Effective area | 85% | 75% | 85% | 48% | 65% | 72% | 85% | 68% | 75% |
| Dice/wafer | 297 | 159 | 122 | 325 | 275 | 128 | 128 | 74 | 177 |
| Defects/sq cm | 0.6 | 0.8 | 0.7 | 0.8 | 0.5 | 0.8 | 0.6 | 0.8 | 0.5 |
| Yield | 65% | 40% | 36% | 74% | 74% | 38% | 42% | 26% | 75% |
| Die cost | $14 | $40 | $55 | $11 | $9 | $47 | $50 | $116 | $4 |
| Package size (pins) | 296 | 296 | 296 | 272 | 240 | 304 | 387 | 521 | 144 |
| Package type | PGA | PGA | PGA | PBGA | CQFP | CQFP | MCM | PGA | PQFP |
| Package cost | $18 | $21 | $21 | $11 | $14 | $21 | $40 | $45 | $3 |
| Test & assembly cost | $8 | $10 | $10 | $6 | $6 | $12 | $21 | $28 | $1 |
| Total mfg cost | $40 | $71 | $86 | $28 | $29 | $80 | $144 | $189 | $8 |

Source: “Revised Model Reduces Cost Estimates”, Linley Gwennap, Microprocessor Report 10(4), 25 Mar 1996

Page 32:

Trends
Now let’s look at trends in
  “Bandwidth” vs. Latency
  Power
  Cost
  Dependability
  Performance

Page 33:

Dependability
When is a system operating properly?
Infrastructure providers now offer Service Level Agreements (SLA) to guarantee that their networking or power service would be dependable
Systems alternate between 2 states of service with respect to an SLA:
  1. Service accomplishment, where the service is delivered as specified in SLA
  2. Service interruption, where the delivered service is different from the SLA
Failure = transition from state 1 to state 2
Restoration = transition from state 2 to state 1

Page 34:

Definitions
Module reliability = measure of continuous service accomplishment (or time to failure)
Two key metrics:
  Mean Time To Failure (MTTF) measures Reliability
  Failures In Time (FIT) = 1/MTTF, the rate of failures
    Traditionally reported as failures per billion hours of operation
Derived metrics:
  Mean Time To Repair (MTTR) measures Service Interruption
  Mean Time Between Failures (MTBF) = MTTF + MTTR
Module availability measures service as it alternates between the 2 states of accomplishment and interruption (number between 0 and 1, e.g. 0.9)
  Module availability = MTTF / (MTTF + MTTR)
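The definitions on this slide translate directly into code. The sketch below is illustrative: the function names are hypothetical, and the 24-hour repair time is an assumed value rather than a figure from the lecture.

```python
def fit_from_mttf(mttf_hours: float) -> float:
    """Failures In Time: failures per billion hours of operation."""
    return 1e9 / mttf_hours


def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean Time Between Failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module is in the service-accomplishment state."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Assumed example: a disk with a 1,000,000 hour MTTF that takes 24 hours to repair.
print(fit_from_mttf(1_000_000))
print(mtbf(1_000_000, 24))
print(f"{availability(1_000_000, 24):.6f}")
```

Availability depends on both reliability and repair speed, so shortening MTTR raises it even when MTTF is unchanged.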

Page 35:

Example -- Calculating Reliability
If modules have exponentially distributed lifetimes (age of module does not affect probability of failure), overall failure rate is the sum of failure rates of the modules
Calculate FIT and MTTF for 10 disks (1M hour MTTF per disk), 1 disk controller (0.5M hour MTTF), and 1 power supply (0.2M hour MTTF):

FailureRate =

MTTF =

Solution next

Page 36:

Solution
If modules have exponentially distributed lifetimes (age of module does not affect probability of failure), overall failure rate is the sum of failure rates of the modules
Calculate FIT and MTTF for 10 disks (1M hour MTTF per disk), 1 disk controller (0.5M hour MTTF), and 1 power supply (0.2M hour MTTF):

$$\begin{aligned}
\text{FailureRate} &= 10 \times (1/1{,}000{,}000) + 1/500{,}000 + 1/200{,}000 \\
&= (10 + 2 + 5)/1{,}000{,}000 \\
&= 17/1{,}000{,}000 \\
&= 17{,}000 \text{ FIT} \\
\text{MTTF} &= 1{,}000{,}000{,}000/17{,}000 \\
&\approx 59{,}000 \text{ hours}
\end{aligned}$$
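The worked example can be reproduced with a short script. This is a sketch under the slide's stated assumption of exponentially distributed lifetimes, so that component failure rates simply add; the helper name is illustrative.

```python
def system_failure_rate(component_mttfs_hours):
    """Sum of per-component failure rates (failures per hour)."""
    return sum(1.0 / mttf for mttf in component_mttfs_hours)


# 10 disks at 1M hours, 1 disk controller at 0.5M hours, 1 power supply at 0.2M hours.
components = [1_000_000] * 10 + [500_000] + [200_000]

rate_per_hour = system_failure_rate(components)
fit = rate_per_hour * 1e9            # failures per billion hours
system_mttf_hours = 1.0 / rate_per_hour

print(f"FIT  = {fit:,.0f}")
print(f"MTTF = {system_mttf_hours:,.0f} hours")
```

The system MTTF of roughly 59,000 hours is far below that of any single component, because every added module contributes its own failure rate.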

Page 37:

Trends
Now let’s look at trends in
  “Bandwidth” vs. Latency
  Power
  Cost
  Dependability
  Performance

Page 38:

First, What is Performance?
The starting point is universally accepted
  “The time required to perform a specified amount of computation is the ultimate measure of computer performance”
How should we summarize (reduce to a single number) the measured execution times (or measured performance values) of several benchmark programs?
Two properties
  A single-number performance measure for a set of benchmarks expressed in units of time should be directly proportional to the total (weighted) time consumed by the benchmarks.
  A single-number performance measure for a set of benchmarks expressed as a rate should be inversely proportional to the total (weighted) time consumed by the benchmarks.
from “Characterizing Computer Performance with a Single Number”, J. E. Smith, CACM, October 1988, pp. 1202-1206

Page 39:

Quantitative Principles of Computer Design
Performance is in units of things per sec
  So bigger is better
What if we are primarily concerned with response time?

$$P = \frac{1}{T}$$

T: execution time, response time, latency
  Units: time / program, time / instruction, time / bit, time / result, time / work

P: performance, rate of producing results, throughput, bandwidth
  Units: programs / time, instructions / time, bits / time, results / time, work / time

Page 40:

Performance: What to measure?
What about just MIPS and MFLOPS?
Usually rely on benchmarks vs. real workloads
Older measures were
  Kernels or
  Small programs designed to mimic real workloads
    Whetstone, Dhrystone
    http://www.netlib.org/benchmark
Note LINPACK and Top500

Page 41:

MIPS

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{CPU time} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$

Machines with different instruction sets?
Programs with different instruction mixes?
Uncorrelated with performance
Marketing metric
  “Meaningless Indicator of Processor Speed”
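To see why MIPS can mislead across instruction sets, the sketch below compares two hypothetical machines running the same program. The instruction counts, CPI values, and clock rate are invented for illustration; only the MIPS and CPU time formulas come from the slide.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count: float, cpu_time_s: float) -> float:
    """MIPS = instruction count / (CPU time * 10^6)."""
    return instruction_count / (cpu_time_s * 1e6)


# Hypothetical machines at the same 1 GHz clock, running the same program.
# Machine A needs many simple instructions; machine B needs fewer, slower ones.
machines = {
    "A": {"instruction_count": 100e6, "cpi": 1.0},
    "B": {"instruction_count": 50e6, "cpi": 1.5},
}

for name, m in machines.items():
    t = cpu_time_seconds(m["instruction_count"], m["cpi"], 1e9)
    print(f"machine {name}: CPU time = {t:.3f} s, MIPS = {mips(m['instruction_count'], t):.1f}")
```

With these assumed numbers, machine B finishes the program sooner yet reports the lower MIPS rating, which is the sense in which the metric is uncorrelated with performance.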

Page 42:

MFLOP/s

$$\text{MFLOP/s} = \frac{\text{Number of FP operations}}{\text{CPU time} \times 10^6}$$

Popular in supercomputing community
Often not where time is spent
Not all FP operations are equal
  “Normalized” MFLOP/s
Can magnify performance differences
  A better algorithm (e.g., with better data reuse) can run faster even with higher FLOP count

Page 43:

Peak Performance?

Page 44:

Benchmarks
To increase predictability, collections of benchmark applications, called benchmark suites, are popular
SPECCPU: popular desktop benchmark suite
  CPU only, split between integer and floating point programs
  SPECint2000 has 12 integer, SPECfp2000 has 14 floating-point pgms
  SPECCPU2006 was announced Spring 2006
  SPECSFS (NFS file server) and SPECWeb (WebServer) added as server benchmarks
  www.spec.org
Transaction Processing Performance Council (TPC) measures server performance and cost-performance for databases
  TPC-C Complex query for Online Transaction Processing
  TPC-H models ad hoc decision support
  TPC-W a transactional web benchmark
  TPC-App application server and web services benchmark

Page 45:

SPEC2006 Programs

Page 46:

How to Summarize Performance?
Arithmetic average of execution times??
  But they vary in basic speed, so some would be more important than others in arithmetic average
Could add weights per program, but how to pick weight?
  Different companies want different weights for their products
SPECRatio: Normalize execution times to reference computer, yielding a ratio proportional to performance =
  time on reference computer / time on computer being rated
SPEC uses an older Sun machine as reference

Page 47:

Ratios
If a program’s SPECRatio on Computer A is 1.25 times bigger than on Computer B, then

$$1.25 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B} = \frac{\text{ExecutionTime}_{\text{reference}} / \text{ExecutionTime}_A}{\text{ExecutionTime}_{\text{reference}} / \text{ExecutionTime}_B} = \frac{\text{ExecutionTime}_B}{\text{ExecutionTime}_A} = \frac{\text{Performance}_A}{\text{Performance}_B}$$

Note that when comparing 2 computers as a ratio, execution times on the reference computer drop out, so choice of reference computer is irrelevant
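The cancellation of the reference time is easy to confirm numerically. In this sketch the execution times (80 s and 100 s) and the two reference times are assumed values, picked so that the ratio comes out to the slide's 1.25.

```python
def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """SPECRatio = time on reference computer / time on computer being rated."""
    return reference_time_s / measured_time_s


# Hypothetical execution times of one program on computers A and B.
time_a_s, time_b_s = 80.0, 100.0

# Try two very different reference machines.
for reference_time_s in (500.0, 2000.0):
    relative = spec_ratio(reference_time_s, time_a_s) / spec_ratio(reference_time_s, time_b_s)
    print(f"reference {reference_time_s:.0f} s: SPECRatio_A / SPECRatio_B = {relative:.2f}")

print(f"ExecutionTime_B / ExecutionTime_A = {time_b_s / time_a_s:.2f}")
```

Both reference machines give the same relative result, equal to the plain ratio of execution times.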

Page 48:

Means

Let $r = (r_1, \ldots, r_n)$ be an n-tuple of positive numbers, $n \ge 1$.

$$\text{Harmonic mean: } H(r) = \frac{n}{\sum_{i=1}^{n} \frac{1}{r_i}}$$

$$\text{Geometric mean: } G(r) = \left(\prod_{i=1}^{n} r_i\right)^{1/n}$$

$$\text{Arithmetic mean: } A(r) = \frac{1}{n}\sum_{i=1}^{n} r_i$$

$$\text{Quadratic mean: } Q(r) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} r_i^2}$$
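The four means defined on this slide (harmonic, geometric, arithmetic, quadratic) can be written out as follows. This is a plain-Python sketch with illustrative function names; the sample tuple is arbitrary.

```python
import math


def harmonic_mean(r):
    """n divided by the sum of reciprocals."""
    return len(r) / sum(1.0 / x for x in r)


def geometric_mean(r):
    """n-th root of the product, computed through logarithms for numerical stability."""
    return math.exp(sum(math.log(x) for x in r) / len(r))


def arithmetic_mean(r):
    """Sum divided by n."""
    return sum(r) / len(r)


def quadratic_mean(r):
    """Square root of the mean of the squares."""
    return math.sqrt(sum(x * x for x in r) / len(r))


sample = (1.0, 2.0, 4.0)  # arbitrary positive numbers
for mean in (harmonic_mean, geometric_mean, arithmetic_mean, quadratic_mean):
    print(f"{mean.__name__}: {mean(sample):.3f}")
```

For any tuple of positive numbers the results are ordered H ≤ G ≤ A ≤ Q, with equality only when all entries are the same, so the choice of mean systematically shifts the summary number.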

Page 49:

Geometric Mean
Since ratios, proper mean is geometric mean (SPECRatio unitless, so arithmetic mean meaningless)

$$\text{GeometricMean} = \sqrt[n]{\prod_{i=1}^{n} \text{SPECRatio}_i}$$

1. Geometric mean of the ratios is the same as the ratio of the geometric means
2. Ratio of geometric means = Geometric mean of performance ratios, so choice of reference computer is irrelevant!

These two points make geometric mean of ratios attractive to summarize performance
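Both properties can be checked numerically. The sketch below uses invented execution times for two machines on three benchmarks and two invented reference machines; none of the numbers come from SPEC.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def mean_spec_ratio(reference_times, machine_times):
    """Geometric mean of per-benchmark SPECRatios (reference time / machine time)."""
    return geometric_mean([ref / t for ref, t in zip(reference_times, machine_times)])


# Hypothetical execution times in seconds on three benchmarks.
times_a = [10.0, 40.0, 5.0]
times_b = [20.0, 20.0, 10.0]
references = {
    "reference 1": [100.0, 100.0, 100.0],
    "reference 2": [300.0, 50.0, 800.0],
}

for name, ref in references.items():
    relative = mean_spec_ratio(ref, times_a) / mean_spec_ratio(ref, times_b)
    print(f"{name}: A relative to B = {relative:.4f}")

# Same number without any reference machine at all.
print(f"GM(times_b) / GM(times_a) = {geometric_mean(times_b) / geometric_mean(times_a):.4f}")
```

All three printed values agree, which is the reference-independence the slide describes.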

Page 50:

Different Take
Smith (CACM 1988, see references) takes a different view on means
First let’s look at example

Page 51:

Rates
Change to MFLOPS and also look at different means

Page 52: 1 The Difference Engine, Charles Babbage Images from Wikipedia (Joe D and Andrew Dunn) Slides courtesy Anselmo Lastra.

52

Avoid the Geometric Mean?
If benchmark execution times are normalized to some reference machine, and means of normalized execution times are computed, only the geometric mean gives consistent results no matter what the reference machine is
  This has led to declaring the geometric mean as the preferred method of summarizing execution time (e.g., SPEC)
Smith’s comments
  “The geometric mean does provide a consistent measure in this context, but it is consistently wrong.”
  “If performance is to be normalized with respect to a specific machine, an aggregate performance measure such as total time or harmonic mean rate should be calculated before any normalizing is done. That is, benchmarks should not be individually normalized first.”
He advocates using time, or normalizing after taking the mean
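Smith's argument can be illustrated with a small numeric sketch. The data below are invented for illustration (two programs of equal work on two machines, X and Y) and are not the example from his paper; the point is only to contrast "aggregate first" with "normalize first", under the assumption that total time is the quantity of interest.

```python
import math

# Hypothetical data (not from the slides): two programs, each performing
# 100 million floating-point operations, timed on machines X and Y.
MFLOP_PER_PROGRAM = 100.0
times_x = [1.0, 10.0]  # seconds on machine X
times_y = [5.0, 2.0]   # seconds on machine Y

def harmonic_mean_rate(times):
    """Harmonic mean of the per-program MFLOPS rates."""
    rates = [MFLOP_PER_PROGRAM / t for t in times]
    return len(rates) / sum(1.0 / r for r in rates)

def mean_of_normalized(times, ref, kind):
    """Normalize each program's time to `ref` first, then take a mean."""
    ratios = [t / r for t, r in zip(times, ref)]
    if kind == "arithmetic":
        return sum(ratios) / len(ratios)
    return math.exp(sum(math.log(x) for x in ratios) / len(ratios))

# Aggregate first: total time and harmonic mean rate agree that Y is faster.
total_x, total_y = sum(times_x), sum(times_y)
hm_x, hm_y = harmonic_mean_rate(times_x), harmonic_mean_rate(times_y)

# Normalize first: the arithmetic mean flips with the reference machine,
# while the geometric mean is consistent but reports a tie.
am_y_ref_x = mean_of_normalized(times_y, times_x, "arithmetic")
am_x_ref_y = mean_of_normalized(times_x, times_y, "arithmetic")
gm_y_ref_x = mean_of_normalized(times_y, times_x, "geometric")

print(total_x, total_y)
print(hm_x, hm_y)
print(am_y_ref_x, am_x_ref_y, gm_y_ref_x)
```

With these numbers, machine Y finishes the workload in 7 s against 11 s for X, and the harmonic mean rate equals total work divided by total time. The arithmetic mean of normalized times makes whichever machine is not the reference look slower, and the geometric mean of normalized times rates the two machines as equal.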


Variability
Does a single mean summarize the performance of the programs in a benchmark suite?
Can decide whether it is a good predictor by characterizing the variability of the distribution using the standard deviation
Like the geometric mean, the geometric standard deviation is multiplicative rather than arithmetic
Can simply take the logarithm of the SPECRatios, compute the standard mean and standard deviation, and then take the exponent to convert back:

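\[ \mathit{GeometricMean} = \exp\left( \frac{1}{n} \sum_{i=1}^{n} \ln(\mathit{SPECRatio}_i) \right) \]

\[ \mathit{GeometricStDev} = \exp\left( \mathit{StDev}\left( \ln(\mathit{SPECRatio}_i) \right) \right) \]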
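The log, summarize, exponentiate recipe translates directly into code. This is a minimal sketch: the function names are illustrative, the input ratios are made up, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the slides do not say whether the sample or the population form (`statistics.pstdev`) is intended.

```python
import math
import statistics

def geometric_mean(ratios):
    """exp of the arithmetic mean of the logarithms."""
    logs = [math.log(r) for r in ratios]
    return math.exp(statistics.mean(logs))

def geometric_stdev(ratios):
    """exp of the standard deviation of the logarithms.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    logs = [math.log(r) for r in ratios]
    return math.exp(statistics.stdev(logs))

# Hypothetical SPECRatios (not from the slides), each a factor of 2 apart.
ratios = [1000.0, 2000.0, 4000.0]
gm = geometric_mean(ratios)
gsd = geometric_stdev(ratios)
print(gm, gsd)
```

Because the geometric standard deviation is a multiplicative factor, it is dimensionless and is never smaller than 1.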


Form of Standard Deviation
Standard deviation is more informative if we know the distribution has a standard form
  bell-shaped normal distribution, whose data are symmetric around the mean
  lognormal distribution, where the logarithms of the data, not the data itself, are normally distributed (symmetric) on a logarithmic scale
For a lognormal distribution, we expect that
  68% of samples fall in the range $[\,mean / gstdev,\ mean \times gstdev\,]$
  95% of samples fall in the range $[\,mean / gstdev^2,\ mean \times gstdev^2\,]$
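The two ranges are obtained by dividing and multiplying the mean by the geometric standard deviation, once for 68% and twice for 95%. A small sketch, with a hypothetical function name and made-up inputs:

```python
def lognormal_ranges(gmean, gstdev):
    """Return the expected 68% and 95% ranges for a lognormal distribution."""
    one_sigma = (gmean / gstdev, gmean * gstdev)
    two_sigma = (gmean / gstdev ** 2, gmean * gstdev ** 2)
    return one_sigma, two_sigma

# Hypothetical inputs (not from the slides): mean 2000, gstdev 2.
one_sigma, two_sigma = lognormal_ranges(2000.0, 2.0)
print(one_sigma)
print(two_sigma)
```

Note that the ranges are symmetric around the mean only on a logarithmic scale: the upper bound lies farther from the mean than the lower bound in absolute terms.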


Example (1/2)
GM and multiplicative StDev of SPECfp2000 for Itanium 2
[Figure: bar chart of SPECfpRatio (axis 0 to 14000) for wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi; values marked on the plot: 1372, 2712, 5362. GM = 2712, GSTDEV = 1.98]


Example (2/2)
GM and multiplicative StDev of SPECfp2000 for AMD Athlon
[Figure: bar chart of SPECfpRatio (axis 0 to 14000) for wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi; values marked on the plot: 1494, 2086, 2911. GM = 2086, GSTDEV = 1.40]


Comments
The geometric standard deviation of 1.98 for Itanium 2 is much higher than the 1.40 for Athlon, so its results will differ more widely from the mean and are therefore likely less predictable
Falling within one standard deviation:
  10 of 14 benchmarks (71%) for Itanium 2
  11 of 14 benchmarks (79%) for Athlon
Thus, the results are quite compatible with a lognormal distribution (expect 68%)
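The compatibility check amounts to counting how many ratios fall inside [mean / gstdev, mean × gstdev] and comparing the fraction with 68%. The sketch below shows the counting step on invented ratios; the per-benchmark SPECfp2000 values are not reproduced in these slides, so the data, the function name, and the choice of sample standard deviation are all assumptions.

```python
import math
import statistics

def count_within_one_gstdev(ratios):
    """Count ratios inside [gmean / gstdev, gmean * gstdev]."""
    logs = [math.log(r) for r in ratios]
    gmean = math.exp(statistics.mean(logs))
    gstdev = math.exp(statistics.stdev(logs))  # assumption: sample form
    low, high = gmean / gstdev, gmean * gstdev
    inside = sum(1 for r in ratios if low <= r <= high)
    return inside, len(ratios)

# Hypothetical ratios (not the SPECfp2000 measurements).
ratios = [500.0, 1000.0, 2000.0, 4000.0, 8000.0]
inside, total = count_within_one_gstdev(ratios)
print(f"{inside} of {total} ({inside / total:.0%}) within one geometric standard deviation")
```

With only a handful of benchmarks the observed fraction moves in coarse steps, so agreement with 68% is a rough plausibility check rather than a formal test of lognormality.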


Next Time
Principles of Computer Design
Amdahl’s Law
Then on to Instruction Set Architecture


Readings/References
Gordon Moore’s paper

http://www.intel.com/pressroom/kits/events/moores_law_40th/index.htm

http://download.intel.com/museum/Moores_Law/Articles-Press_Releases/Gordon_Moore_1965_Article.pdf

Paper on which latency section is based
  Patterson, D. A. 2004. Latency lags bandwidth. Commun. ACM 47, 10 (Oct. 2004), 71-75.
“Characterizing Computer Performance with a Single Number”, J. E. Smith, CACM, October 1988, pp. 1202-1206