1 Excess-k notation A method for representing signed integers — add excess amount k to every number to obtain a positive integer (smallest number that can be represented is k) — most significant bit represents sign (1 = positive, 0 = negative) Examples (excess-128): 1 0 1 0 0 0 0 1 = +33 0 1 0 1 1 1 1 1 = 33 1 0 0 0 0 0 0 0 = 0 unique representation of 0! If bias k = 2 n-1 , then excess-k is almost identical to two’s complement — sign-bit reversed (1 means positive) — comparing integers identical to the unsigned case Big endian (MSB first) vs. little endian (LSB first) 28 64 32 16 8 4 2 1
Excess- k notation. A method for representing signed integers add excess amount k to every number to obtain a positive integer (smallest number that can be represented is k ) most significant bit represents sign (1 = positive, 0 = negative) Examples (excess-128): - PowerPoint PPT Presentation
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
1
Excess-k notation
A method for representing signed integers— add excess amount k to every number to obtain a positive integer
(smallest number that can be represented is k)— most significant bit represents sign (1 = positive, 0 = negative)
If bias k = 2n-1, then excess-k is almost identical to two’s complement— sign-bit reversed (1 means positive)— comparing integers identical to the unsigned case
Big endian (MSB first) vs. little endian (LSB first)
128 64 32 16 8 4 2 1
2
Floating Point Numbers
How can we represent 3.14 ?— What’s wrong with: (int_part, frac_part)— 3.14 and 3.014 have the same representation!
The leading-zeroes problem can be solved if numbers are normalized— write the number in the form d.f 10e , d is a single non-zero digit— normalized(3.14) = 3.14 100, normalized(0.314) = 3.14 101
In binary, the “d” part will always be 1 (zero is a special case)— this implicit 1 can be ignored
Ideal representation scheme has these features:— can represent positive and negative, low and high magnitude— it is easy to compare two numbers— it is easy to do basic math
3
IEEE 754 standard
Format for single-precision (32-bit) and double-precision (64-bit) reals
The normalized (non-zero) binary number 1.f 2e is stored as
Comparison of floats almost identical to comparison of ints!
MIPS has separate floating point registers and instructions
23-bit fraction f8-bit exponent eexcess-127
notation
1 sign bit1 =
negative0 =
positive
single precision float
52-bit fraction f11-bit exponent eexcess-1023
notation
1 sign bit1 =
negative0 =
positive
double precision double
4
Two notions of performance
Which has higher performance?
From a passenger’s viewpoint: latency (time to do the task)— hours per flight, execution time, response time
From an airline’s viewpoint: throughput (tasks per unit time)— passengers per hour, bandwidth
Latency and throughput are often in opposition
Aircraft DC to Paris Passengers
747 6 hours 500
Concorde 3 hours 125
5
Some Definitions
Latency is time per task (e.g. hours per flight)
If we are primarily concerned with latency,
Performance(x) = 1 execution_time(x)
Throughput is number of tasks per unit time (e.g. passengers per hour)
Performance(x) = throughput(x) Again, bigger is better
Relative performance: “x is N times faster than y”
N = Performance(x) Performance(y)
Bigger is better
6
CPU performance
The obvious metric: how long does it take to run a test program?— Aircraft analogy: how long does it take to transport 1000
passengers?
Our vocabulary Aircraft analogy
N instructions N passengers
c cycles per instruction (1/c) passengers per flight
Instructions executed:—the dynamic instruction count (#instructions actually executed)—not the (static) number of lines of code
Cycles per instruction:—average number of clock cycles per instruction—function of the machine and program
• CPI(floating-point operations) CPI(integer operations)• Pentium executes same instructions as an 80486, but faster
—Single-cycle machine: each instruction takes 1 cycle (CPI = 1)• CPI can be 1 due to memory stalls and slow instructions• CPI can be 1 on superscalar machines
Clock cycle time: 1 cycle = minimum time it takes the CPU to do any work—clock cycle time = 1/ clock frequency—500MHz processor has a cycle time of 2ns (nanoseconds)—2GHz (2000MHz) CPU has a cycle time of just 0.5ns—higher frequency is usually better
The easiest way to remember this is match up the units:
Make things faster by making any component smaller!
Often easy to reduce one component by increasing another
Execution time, again
Seconds
=
Instructions
*
Clock cycles
*
Seconds
Program
Program Instructions Clock cycle
Program Compiler ISA Organization
Technology
InstructionExecuted
CPI
Clock Cycle Time
9
Let’s compare the performances two x86-based processors—An 800MHz AMD Duron, with a CPI of 1.2 for an MP3
compressor—A 1GHz Pentium III with a CPI of 1.5 for the same program
Compatible processors implement identical instruction sets and will use the same executable files, with the same number of instructions
But they implement the ISA differently, which leads to different CPIs
CPU timeAMD,P= InstructionsP * CPIAMD,P * Cycle timeAMD
=
CPU timeP3,P = InstructionsP * CPIP3,P * Cycle timeP3
=
Example: ISA-compatible processors
10
Another Example: Comparing across ISAs
Intel’s Itanium (IA-64) ISA is designed facilitate executing multiple instructions per cycle. If it achieves an average of 3 instructions per cycle, how much faster is it than a Pentium4 (which uses the x86 ISA) with an average CPI of 1?
a) Itanium is three times fasterb) Itanium is one third as fastc) Not enough information
11
Improving CPI
Some processor design techniques improve CPI— Often they only improve CPI for certain types of instructions
where Fi = fraction of instructions of type i
First Law of Performance:
Make the common case fast
i
n
ii FCPICPI
1
Second Law of Performance:
Make the fast case common
12
Example: CPI improvements
Base Machine:
How much faster would the machine be if:— we added a cache to reduce average load time to 3 cycles?— we added a branch predictor to reduce branch time by 1
cycle?— we could do two ALU operations in parallel?
Op Type Freq (Fi) CPIi contribution to CPI
ALU 50% 3
Load 20% 6
Store 20% 3
Branch 10% 2
13
Amdahl’s Law states that optimizations are limited in their effectiveness
Example: Suppose we double the speed of floating-point operations—If only 10% of the program execution time T involves floating-
point code, then the overall performance improves by just 5%