Computer Architecture ELEC3441
2nd Semester, 2017-18
Dr. Hayden Kwok-Hay So
Department of Electrical and Electronic Engineering
University of Hong Kong
Computer Performance
2nd sem. '17-18 ENGG3441 - HS 2
How do you measure performance of a computer?
How do you make a computer fast?
Ways to measure Performance
■ Execution time (response time) ≠ Throughput
  • Execution time: time to finish a task
  • Throughput: number of tasks finished per unit time
■ We will focus on execution time in this course
Relative Performance
■ Define the performance of a computer as
$$\text{Performance} = \frac{1}{\text{ExecutionTime}}$$
■ Computer B is n times faster than Computer A if:
$$n = \frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{ExecutionTime}_A}{\text{ExecutionTime}_B}$$
Quick Check
■ Computer A finishes a task in 5 s, Computer B finishes the same task in 4 s. Which one is faster, and by how much?
$$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{ExecutionTime}_A}{\text{ExecutionTime}_B} = \frac{5}{4} = 1.25$$
Computer B is 1.25 times faster than Computer A
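The quick check can be reproduced with a few lines of Python. This is only a sketch of the relative-performance definition; the function name `relative_performance` is made up for illustration and is not part of the course material.

```python
def relative_performance(exec_time_a, exec_time_b):
    """Return n such that computer B is n times faster than computer A.

    n = Performance_B / Performance_A = ExecutionTime_A / ExecutionTime_B
    """
    if exec_time_a <= 0 or exec_time_b <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_a / exec_time_b


# Numbers from the quick check: A takes 5 s, B takes 4 s.
n = relative_performance(5.0, 4.0)
print(f"Computer B is {n:.2f} times faster than Computer A")
```

A result below 1 would mean that B is the slower machine.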
Ways to Measure Execution Time
■ Wall Clock Time (Elapsed Time)
  • The total time a user experiences that a computer takes to finish a task
  • Includes OS overhead, I/O, idle time, time shared with other users
■ CPU Time
  • The time spent on a user task in the CPU
  • User CPU time + OS CPU time
  • Does not include I/O, time spent by other users, etc.
■ Focus on CPU Time in this course
$ time shasum afile
132ecc0e19eec19d5dc775752efeac280cecebdc  afile
real    0m20.177s
user    0m12.835s
sys     0m1.786s
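As a rough illustration of how the three numbers printed by `time` relate to the two definitions, the sketch below adds `user` and `sys` to get CPU time and subtracts that from `real`. The helper `to_seconds` is a hypothetical name, and the code assumes the `XmY.YYYs` duration format shown in the listing. It also assumes a single-threaded task; on a multi-core machine, user + sys can exceed real.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style duration such as '0m20.177s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the `time shasum afile` listing.
real = to_seconds("0m20.177s")      # wall clock (elapsed) time
user = to_seconds("0m12.835s")      # user CPU time
sys_time = to_seconds("0m1.786s")   # OS CPU time spent on behalf of the task

cpu_time = user + sys_time          # CPU time = user CPU + OS CPU
not_on_cpu = real - cpu_time        # I/O waits, other users, idle time, ...

print(f"CPU time:        {cpu_time:.3f} s")
print(f"Time not on CPU: {not_on_cpu:.3f} s")
```

For this run, roughly 14.6 s of the 20.2 s elapsed was CPU time; the rest was spent outside the CPU.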
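How can we determine the CPU time needed to execute a program?
$$\text{CPUTime} = \frac{\text{\# of instructions}}{\text{program}} \times \frac{\text{\# of cycles}}{\text{instruction}} \times \frac{\text{time}}{\text{cycle}}$$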
The Iron Law
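A minimal sketch of the Iron Law as a function follows. The instruction count, CPI, and clock frequency below are made-up numbers chosen for easy arithmetic, not measurements from any real machine.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """Iron Law: (instructions / program) x (cycles / instruction) x (time / cycle)."""
    cycle_time = 1.0 / clock_hz  # seconds per cycle
    return instruction_count * cpi * cycle_time


# Hypothetical program: 1 billion instructions, 2 cycles each on average, 1 GHz clock.
t = cpu_time(1_000_000_000, 2.0, 1e9)
print(f"CPU time: {t:.2f} s")
```

Each of the three factors can be improved separately, but they are rarely independent: a change that helps one factor often hurts another.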
CPU Time – Step 1
■ Most modern CPUs are synchronous digital systems
■ The time needed to finish executing a task is the number of cycles needed for that task multiplied by the cycle time.
$$\text{CPUTime} = \text{CycleCount} \times \text{CycleTime} = \frac{\text{CycleCount}}{\text{ClockFrequency}}$$
Digital system design review…
Synchronous Sequential Circuits
■ A synchronous sequential circuit contains exactly 1 clock signal
■ All state elements are connected to the same clock signal
  • ⇒ the state of the entire circuit is updated at the same time
■ Common form of synchronous sequential circuits:
[Figure: blocks of combinational logic (Comb Logic) between state elements driven by clk, from input to output]
Clock Signal
■ A clock signal is a particularly important signal in a synchronous sequential circuit
  • It controls the action of all DFFs
■ A clock signal toggles between ‘0’ and ‘1’ periodically
■ The frequency of the toggling determines the maximum speed of the circuit
  • E.g.: in the accumulator example earlier, the output S cannot change faster than the clock frequency
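The accumulator behaviour mentioned above can be mimicked in software. This is a behavioural sketch only: one loop iteration stands for one clock edge, and the function name `accumulate` and the input values are invented for illustration.

```python
def accumulate(inputs):
    """Return the value of S in each clock period, starting from S = 0.

    S is updated once per clock edge, so it takes exactly one new value
    per clock period, however quickly the input X changes.
    """
    s = 0
    trace = [s]           # S before the first clock edge
    for x in inputs:      # one iteration = one clock edge
        s += x            # register captures S + X
        trace.append(s)
    return trace


# Hypothetical inputs x0, x1, x2.
print(accumulate([3, 5, 7]))  # [0, 3, 8, 15]
```

The trace follows the pattern 0, x0, x0 + x1, x0 + x1 + x2, with one new value per clock period.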
[Figure: timing diagram of the accumulator example, one column per clock period]
clk
X    x0    x1    x2
S    0     x0    x0 + x1    x0 + x1 + x2
$$\frac{1}{\text{clock period}} = \text{clock frequency}$$
e.g. Intel CPU runs at 3 GHz, mobile phone processors at 1 GHz, lab FPGA board at 50 MHz
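To get a feel for the numbers, the relation between clock period and clock frequency can be evaluated for the three example frequencies. The helper name `clock_period_ns` is an assumption made for this sketch.

```python
def clock_period_ns(frequency_hz):
    """Clock period in nanoseconds: period = 1 / frequency."""
    return 1e9 / frequency_hz


examples = [
    ("Intel CPU", 3e9),
    ("Mobile phone processor", 1e9),
    ("Lab FPGA board", 50e6),
]
for name, freq in examples:
    print(f"{name}: {clock_period_ns(freq):.3f} ns per cycle")
```

At 50 MHz each cycle lasts 20 ns, while at 3 GHz a cycle lasts about a third of a nanosecond.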
Timing in Synchronous Circuits
■ In a synchronous sequential circuit, signal changes occur only at a clock edge
■ All signals are therefore synchronized to change values right after a clock edge
■ In the example circuit, we need to make sure the correct value of y is available BEFORE the next clock edge
  • Avoid glitches
[Figure: example circuit with signals a, b, c, d and output y, with state elements driven by clk]
Timing in Synchronous Circuits
■ In general, the propagation delay through the combinational logic between any two registers must be shorter than the clock period
■ The longest such path is called the critical path of the circuit
■ The critical path determines the maximum clock speed
[Figure: combinational logic (Comb Logic) between registers driven by clk, with signals a, b, x, y; as in the glitch example, values must be stable before the clock edge]
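The link between the critical path and the maximum clock speed can be sketched numerically. The path names and delays below are hypothetical, and the model is deliberately simplified: it considers only combinational propagation delay and ignores register setup time and clock-to-output delay.

```python
def max_clock_mhz(path_delays_ns):
    """Upper bound on clock frequency (MHz) set by the slowest path.

    The clock period must be longer than the critical-path delay, so
    f_max = 1 / t_critical. With t in ns, that is 1000 / t in MHz.
    """
    critical_path_ns = max(path_delays_ns)
    return 1000.0 / critical_path_ns


# Hypothetical register-to-register delays in nanoseconds.
delays_ns = {"a -> x": 2.0, "b -> x": 3.5, "b -> y": 5.0}
critical = max(delays_ns, key=delays_ns.get)

print(f"Critical path: {critical} ({delays_ns[critical]} ns)")
print(f"Clock must run below {max_clock_mhz(delays_ns.values()):.0f} MHz")
```

Shortening any path other than the critical one does not raise this bound.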
CPU Time – Step 1 – Summary
■ To improve performance:
  • Increase clock frequency ⇒ shorter critical path ⇒ less work accomplished in 1 cycle ⇒ more cycles needed
  • Engineers need to trade off clock frequency against cycle count
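The trade-off can be made concrete with invented numbers. In the sketch below, a hypothetical redesign doubles the clock frequency but needs 50% more cycles for the same task. Neither figure comes from a real design.

```python
def cpu_time(cycle_count, clock_hz):
    """CPUTime = CycleCount / ClockFrequency."""
    return cycle_count / clock_hz


# Hypothetical baseline: 1,000,000 cycles at 1 GHz.
baseline = cpu_time(1_000_000, 1e9)

# Hypothetical redesign: 2 GHz clock, but each cycle does less work,
# so the same task now needs 1,500,000 cycles.
redesign = cpu_time(1_500_000, 2e9)

speedup = baseline / redesign
print(f"Baseline: {baseline * 1e3:.2f} ms, redesign: {redesign * 1e3:.2f} ms")
print(f"Speedup: {speedup:.2f}x")
```

Doubling the clock here yields only about a 1.33x speedup. If the cycle count had doubled as well, there would be no gain at all.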
■ Program A has 2000 instructions, and each instruction takes 2 cycles to finish. How many cycles does it take to complete Program A?
■ Program B has 3000 instructions: 2000 of them take 2 cycles and 1000 of them take 1 cycle. How many cycles does the program take to finish?
$$\text{CycleCount} = \text{InstructionCount} \times \text{CyclesPerInstruction}$$
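One way to work the two questions is to apply the cycle-count formula to each group of instructions that shares a cycle cost and then add the results:

$$\text{CycleCount}_A = 2000 \times 2 = 4000$$

$$\text{CycleCount}_B = 2000 \times 2 + 1000 \times 1 = 5000$$

Program B executes 50% more instructions than Program A but needs only 25% more cycles, because a third of its instructions finish in a single cycle.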
Average CPI
■ In general, different machine instructions may take different amounts of time to complete.
■ Assuming n classes of instructions, the total clock cycle count is:
$$\text{CycleCount} = \sum_{i=1}^{n} \text{CPI}_i \times \text{InstructionCount}_i$$
■ Weighted average CPI:
$$\text{CPI} = \frac{\text{CycleCount}}{\text{InstructionCount}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{InstructionCount}_i}{\text{InstructionCount}}$$
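The weighted average can be written as a short function. This sketch uses the instruction mix of Program B from the earlier question (2000 two-cycle and 1000 one-cycle instructions); the function name and the `(cpi, count)` pair layout are choices made here for illustration.

```python
def average_cpi(classes):
    """Weighted average CPI for a list of (cpi, instruction_count) pairs."""
    total_cycles = sum(cpi * count for cpi, count in classes)
    total_instructions = sum(count for _, count in classes)
    return total_cycles / total_instructions


# Program B: 2000 instructions at 2 cycles, 1000 instructions at 1 cycle.
program_b = [(2, 2000), (1, 1000)]
print(f"Average CPI of Program B: {average_cpi(program_b):.3f}")
```

The average is weighted by how often each class is executed, so it lies closer to the CPI of the most frequently used class.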
CPI Example (1)
Class         C1    C2    C3
Cycles         1     4     8
Compiler J   100    50   100

■ The ISA of computer A includes 3 classes of instructions that take different numbers of cycles to complete. A program P is compiled using compiler J, resulting in the instruction mix above.
■ What is the average CPI of the compiled program?
CPI Example (2)
Class         C1    C2    C3
Cycles         1     4     8
Compiler J   100    50   100
Compiler K   350   100    50

■ A newer compiler K was developed to compile the same program P, resulting in the instruction mix above.
■ What is the average CPI of the compiled program using compiler K?
Ans: 2.3
Which compiler was better…?
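One way to approach the question is to compute the instruction count, total cycle count, and average CPI for both compilers from the table. The sketch below treats the table entries as instruction counts per class and assumes both versions run on the same computer at the same clock frequency, so that total cycles decide execution time. The dictionary and function names are invented for illustration.

```python
# Cycles per instruction for each class, from the table.
CYCLES = {"C1": 1, "C2": 4, "C3": 8}

# Instruction counts per class produced by each compiler, from the table.
MIX = {
    "J": {"C1": 100, "C2": 50, "C3": 100},
    "K": {"C1": 350, "C2": 100, "C3": 50},
}


def summarize(counts):
    """Return (instruction_count, cycle_count, average_cpi) for one mix."""
    instructions = sum(counts.values())
    cycles = sum(CYCLES[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for compiler, counts in MIX.items():
    instructions, cycles, cpi = summarize(counts)
    print(f"Compiler {compiler}: {instructions} instructions, "
          f"{cycles} cycles, CPI = {cpi:.1f}")
```

Under these assumptions, compiler K has the lower CPI (2.3 versus 4.4) but needs more total cycles (1150 versus 1100), so CPI alone does not settle which compiler produces the faster program.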
CPI Example (3)
■ Observation:
  • Compiler J results in higher CPI
  • Compiler K uses more instructions
■ Q2: Which instruction class, when its cycle count is reduced by half, will result in the most performance improvement?
  • Largest CPI?
  • Most used?
  • Most cycles used?
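Q2 can be explored by halving the cycle count of one class at a time and recomputing the total. The sketch below assumes the instruction mix produced by compiler J and an unchanged clock frequency. Halving a 1-cycle class to 0.5 cycles is treated purely as arithmetic. All names are hypothetical.

```python
# Cycles per instruction and compiler J's instruction counts, from the tables.
CYCLES = {"C1": 1, "C2": 4, "C3": 8}
MIX_J = {"C1": 100, "C2": 50, "C3": 100}


def total_cycles(counts, cycles):
    """Total cycle count for an instruction mix and a cycles-per-class table."""
    return sum(cycles[cls] * n for cls, n in counts.items())


def speedup_if_halved(counts, cls):
    """Speedup obtained when class `cls` takes half as many cycles."""
    improved = dict(CYCLES)
    improved[cls] = CYCLES[cls] / 2
    return total_cycles(counts, CYCLES) / total_cycles(counts, improved)


speedups = {cls: speedup_if_halved(MIX_J, cls) for cls in CYCLES}
for cls, s in speedups.items():
    print(f"Halving {cls}: {s:.3f}x")

best = max(speedups, key=speedups.get)
print(f"Largest improvement for this mix: {best}")
```

For this mix, the class that accounts for the most total cycles gives the largest gain. C1 and C3 are used equally often, yet halving C3 helps far more than halving C1.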