Computer Architecture IN 2320
Lesson 02 – Introduction
Computer architecture:
Deals with the functional behavior of a computer system as viewed by a programmer
Ex: the size of a data type, e.g. 32 bits for an integer
Computer organization:
Deals with structural relationships that are not visible to the programmer
Ex: clock frequency or the size of the physical memory
Levels of a computer:
1. User Level: Application Programs (HIGH LEVEL)
2. High level languages
3. Assembly Language/ Machine Code
4. Microprogrammed/ Hardwired Control
5. Functional Units (Memory, ALU, etc)
6. Logic Gates
7. Transistors and Wires (LOW LEVEL)
Computer Architecture-Definition:
The attributes of the computer system that are visible to programmers i.e. the attributes of the
computer system that have a direct impact on the logical execution of a program
Ex: the instruction set, the size of a data type, techniques of addressing the memory
EX: Architectural issue is whether a computer will have a multiply instruction
Computer Organization-Definition:
The operational units and their interconnection that realize the architectural specifications
Ex: control signals, interface between computer and peripherals, memory technology used
Ex: An organizational issue is whether the multiply instruction is implemented using a separate circuit
or by repeated use of the adder circuit.
The organizational decision may be based on several parameters, such as the anticipated frequency
of use of the multiply instruction.
Forces on Computer Architecture:
Technology
Programming Languages
Applications
OS
History
The Computer Architect’s view:
Architect is concerned with design & performance
Designs the ISA for optimum programming utility and optimum performance of implementation
Designs the hardware for best implementation of the instructions
Uses performance measurement tools, such as benchmark programs, to see that goals are met
Balances performance of building blocks such as CPU, memory, I/O devices, and
interconnections
Meets performance goals at lowest cost
Factors to consider when selecting a better computer:
1. COST factors
a. Cost of hardware design
b. Cost of software design (OS, applications)
c. Cost of manufacture
d. Cost to the end purchaser
2. PERFORMANCE factors
a. What programs will be run?
b. How frequently will they be run?
c. How big are the programs?
d. How many users?
e. How sophisticated are the users (User level)?
f. What I/O devices are necessary?
g. There are two ways to make computers go faster.
i. Wait some time (e.g. a year) and implement in a faster/better/newer technology.
1. More transistors will fit on a single chip.
2. More pins can be placed around the IC.
3. The process used will have electronic devices (transistors) that switch
faster.
ii. New/innovative architectures and architectural features, and clever
implementations of existing architectures.
Higher Computer performance may involve one or more of the following:
Short response time for a given piece of work
o The total time taken by a functional unit to respond to a request for service
o A functional unit / execution unit is a part of the CPU that performs the operations and
calculations as instructed by a computer program.
High throughput (rate of processing work)
o Rate at which something can be processed
Low utilization of computing resources
o System resources (practical): physical or virtual entities of limited availability
Ex: memory, processing capacity, network speed
o Computational resources (abstract): resources used for solving a computational problem
Ex: computational time, memory space
Fast data compression and decompression
High bandwidth
Short data transmission time
*note: the red-coloured performance factors (in the original slides) are the main area of interest.
Throughput:
if (no overlap, i.e. no parallelism)
    throughput = 1 / average response time
else
    throughput > 1 / average response time
// the number of parallel processing units also matters
Elapsed time/response time:
Elapsed time = Response time = CPU time + I/O wait time
CPU time = time spent running a program
Performance= 1/response time
Since we are more concerned about CPU time,
Performance = 1/CPU time
*note Improve Performance
1. Faster the CPU
Helps to improve both response time and throughput
2. Add more CPUs
Helps to improve throughput and perhaps response time due to less queuing
*Note: Selection depends on what is important to whom, i.e. cost factors and performance factors
Ex 01: Computer system user
Goal: Minimize elapsed time for program = time_end - time_start
Called response time (counted in ms)
Ex 02: Computer Center Manager
Goal: Maximize completion rate = no. of jobs per second
Called throughput (counted per sec)
Factors driving architecture:
Effective use of new technology
Achieving a desired performance improvement
Performance Metrics
Values derived from some fundamental measurements:
Count of how many times an event occurs
Duration of a time interval
Size of some parameter
Some basic metrics include:
Response time
o Elapsed time from request to response
o Elapsed time = Response time = CPU time + I/O wait time
CPU time = time spent running a program
Performance = 1/response time
Since we are more concerned about CPU time,
Performance = 1/CPU time
o CPU time is affected by;
Number of instructions in the program
Average number of clock cycles to complete one instruction
Clock cycle time
Throughput
o Jobs or operations completed per unit of time
Bandwidth
o Bits per second
Resource utilization
Standard benchmark metrics
SPEC
TPC
Characteristics of good metrics:
Linear
o Proportional to the actual system performance
Reliable
o Larger value -> better performance
Repeatable
o Deterministic when measured
Consistent
o Units and definition constant across systems
Independent
o Independent from influence of vendors
Easy to measure
Some examples of Standard Metrics:
MIPS
MFLOPS, GFLOPS, TFLOPS, PFLOPS
SPEC metrics
TPC metrics
Parameters of Performance Metrics:
Clock rate (=1/Clock cycle time)
Instructions per program (I/P)
Average clock cycles per instruction (CPI)
Service time
Interarrival time (time between arrivals of successive requests)
Number of users
Think time
*note Execution time (CPU time, runtime) = I/P * CPI * clock cycle time <= the "Iron Law"
All three factors combine to affect the execution-time metric.
I/P -> depend on compiler
CPI -> depend on CPU design/organization
Clock cycle time -> processor architecture
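As a quick sketch, the Iron Law above can be evaluated directly. The instruction count, CPI, and clock rate below are made-up illustrative values, not from the notes:

```python
# Iron Law: execution time = (instructions/program) * (cycles/instruction) * (seconds/cycle)
def execution_time(instructions, cpi, clock_rate_hz):
    """Return CPU time in seconds from the three Iron Law factors."""
    cycle_time = 1.0 / clock_rate_hz
    return instructions * cpi * cycle_time

# Illustrative (made-up) values: 10^9 instructions, average CPI of 2, 500 MHz clock
t = execution_time(1_000_000_000, 2.0, 500_000_000)  # about 4.0 seconds
```

Improving any one factor (better compiler, better CPU organization, faster clock) lowers the product.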
Ex01:
Our program takes 10s to run on computer A, which has 400 MHz clock. We want it to run in 6s. The
designer says that the clock rate can be increased, but it will cause the total number of clock cycles for
the program to increase to 1.2 times the previous value. What is the minimum clock rate required to get
the desired speedup?
Answer:
              Old Machine A    New Machine A
Runtime       10 s             6 s
Clock Rate    400 MHz          CR
Let the total number of clock cycles per program on old machine A = x
Since clock cycles per program = Clock Rate * Runtime,
x = 400 MHz * 10 s = 4 * 10^9 cycles
Total number of clock cycles per program on new machine A = 1.2x = 4.8 * 10^9 cycles
This must also equal 6 s * CR, so:
6 * CR = 4.8 * 10^9
CR = 800 MHz
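The same arithmetic can be checked with a short script (a sketch of Ex01's calculation, with the clock rate kept in Hz):

```python
# Ex01 as arithmetic: find the minimum new clock rate CR.
old_rate = 400e6     # 400 MHz, in Hz
old_time = 10.0      # seconds
new_time = 6.0       # target runtime, seconds
cycle_growth = 1.2   # the new design needs 1.2x the clock cycles

old_cycles = old_rate * old_time        # 4 x 10^9 cycles
new_cycles = cycle_growth * old_cycles  # 4.8 x 10^9 cycles
new_rate = new_cycles / new_time        # minimum clock rate, in Hz
print(new_rate / 1e6)                   # 800.0 (MHz)
```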
Workload:
A test case for the system
Benchmark:
A set of workloads which together are representative of 'my program'; a benchmark should be reproducible.
Ex02:
Which is faster? A or B?
Test Case   Machine A   Machine B
1           1 s         10 s
2           100 s       10 s
Assume Test Case 1 type processes happen 99% of the time.
Answer:
We have to obtain the weighted average of runtime.
Weighted average for A = (1 * 99 + 100 * 1) / 100 = 1.99 s <= answer is A
Weighted average for B = (10 * 99 + 10 * 1) / 100 = 10 s
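The weighted-average calculation above can be sketched as a small helper (`weighted_runtime` is a made-up name introduced here, not a standard function):

```python
# Weighted average runtime: weight each test case's runtime by how often it occurs.
def weighted_runtime(times, weights):
    """times and weights are parallel lists; weights should sum to 1."""
    return sum(t * w for t, w in zip(times, weights))

print(weighted_runtime([1, 100], [0.99, 0.01]))  # machine A: about 1.99 s
print(weighted_runtime([10, 10], [0.99, 0.01]))  # machine B: about 10 s
```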
*note
The cost of improving the whole processor is high. But if you find that a particular circuit is
needed 99% of the time (ex: the multiplication circuit), then you can improve just that circuit by a
factor of 2 or 3, and improve the performance of the system as a whole that way.
Performance comparison
Performance = 1 / time
There are 2 machines, A and B.
Performance(A) = 1 / time(A)
Performance(B) = 1 / time(B)
Therefore:
Performance(A) / Performance(B) = time(B) / time(A) = 1 + x/100   iff A is x% faster than B
Ex03:
time(A) = 10 s, time(B) = 15 s
Performance(A) / Performance(B) = time(B) / time(A) = 15 / 10 = 1.5 = 1 + 50/100
i.e. A is 50% faster than B.
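The "x% faster" relation can be sketched as a one-line helper (`percent_faster` is a hypothetical name for this sketch):

```python
# If performance(A)/performance(B) = time(B)/time(A) = 1 + x/100, then A is x% faster than B.
def percent_faster(time_a, time_b):
    return (time_b / time_a - 1.0) * 100.0

print(percent_faster(10.0, 15.0))  # 50.0 -> A is 50% faster than B
```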
Breaking down performance:
A program is broken into instructions.
o Hardware is aware of instructions, not programs.
At a lower level, hardware breaks instructions into cycles.
o Lower-level state machines change state every cycle
For example, a 500 MHz P-III runs 500M cycles/sec, so 1 cycle = 2 ns.
Since 2-way set-associative mapping is considered, a set contains 2 lines (2 cache blocks).
Number of sets in the cache = v = 16K / 2 = 8K = 2^13
Now it is in the correct format of v = 2^d.
Therefore we need 13 bits to represent the set number to which a main memory address belongs.
The remaining 9 bits of the main memory block number are taken as the tag that identifies a
particular main memory block uniquely within the set.
Figure 18: Three main components of Main memory address
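As a sketch of the address breakdown described above, assuming (per the notes) a 13-bit set index and a 9-bit tag; `split_block_number` is a hypothetical helper, not part of any real API:

```python
# Decode a main-memory block number into (tag, set index) for a
# 2-way set-associative cache with 2^13 sets, as in the notes.
SET_BITS = 13
TAG_BITS = 9

def split_block_number(block_number):
    set_index = block_number & ((1 << SET_BITS) - 1)  # low 13 bits select the set
    tag = block_number >> SET_BITS                    # remaining 9 bits identify the block within the set
    return tag, set_index

tag, s = split_block_number((5 << SET_BITS) | 3)
print(tag, s)  # 5 3
```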
Cache Replacement Algorithms
There is the possibility of a mapped cache memory becoming fully occupied. At such an instance,
an existing block is removed from the cache and the new block is loaded into the cache.
Replacement depends on the mapping mechanism.
Mapping mechanism        When replacement is needed      How replacement is done
Direct Mapping           the mapped cache line is full   No choice: that particular line has to be replaced
Associative Mapping      all the cache lines are full    Hardware-implemented algorithm (fast):
                                                         Least Recently Used (LRU), First In First Out (FIFO),
                                                         Least Frequently Used (LFU), Random
Set Associative Mapping  the mapped set is full          Same hardware algorithms, applied within the set
Least Recently Used (LRU)
Replace that block in the set that has been in the cache longest with no reference to it. For two-way
set associative, this is easily implemented. Each cache line includes a USE bit. When a line is referenced,
its USE bit is set to 1 and the USE bit of the other line in that set is set to 0. When a block is to be read
into the set, the line whose USE bit is 0 is used. Because we are assuming that more recently used
memory locations are more likely to be referenced, LRU should give the best hit ratio.
LRU is also relatively easy to implement for a fully associative cache. The cache mechanism maintains a
separate list of indexes to all the lines in the cache. When a line is referenced, it moves to the front of
the list. For replacement, the line at the back of the list is used. Because of its simplicity of
implementation, LRU is the most popular replacement algorithm.
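The list-based LRU scheme described above can be sketched as follows (`LRUList` is a made-up illustration of the idea, not a real hardware mechanism):

```python
# List-based LRU: a referenced line moves to the front of the list;
# the line at the back of the list is chosen for replacement.
class LRUList:
    def __init__(self, lines):
        self.order = list(lines)      # front = most recently used

    def reference(self, line):
        self.order.remove(line)       # move the referenced line...
        self.order.insert(0, line)    # ...to the front

    def victim(self):
        return self.order[-1]         # back of the list = least recently used

lru = LRUList([0, 1, 2, 3])
lru.reference(2)
lru.reference(0)
print(lru.victim())  # 3
```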
First In First Out (FIFO)
Replace that block in the set that has been in the cache the longest. FIFO is easily implemented as a
round-robin or circular buffer technique.
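The round-robin idea can be sketched as a simple advancing pointer (a minimal made-up illustration):

```python
# FIFO replacement as a round-robin pointer over the lines of a set:
# each eviction takes the next line in circular order.
class FIFOPointer:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.next = 0

    def victim(self):
        v = self.next
        self.next = (self.next + 1) % self.num_lines  # advance circularly
        return v

fifo = FIFOPointer(4)
print([fifo.victim() for _ in range(5)])  # [0, 1, 2, 3, 0]
```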
Least Frequently Used (LFU)
Replace that block in the set that has experienced the fewest references. LFU could be implemented by
associating a counter with each line.
Random
A technique not based on usage (i.e., not LRU, LFU, FIFO, or some variant) is to pick a line at random
from among the candidate lines. Simulation studies have shown that random replacement provides only
slightly inferior performance to an algorithm based on usage.
Write Policy
When a block that is in the cache is to be replaced, there are 2 cases to consider:
1. If the old block in the cache has not been modified, then overwriting can be done without any
issue.
2. If the old block in the cache has been modified, then main memory must be updated by writing
the line of cache out to the block of main memory before bringing the new block to that place.
There are 2 problems related to writing back to main memory:
1. More than one device has access to main memory.
Ex: An I/O module may be able to read-write directly to memory. If a word has been
altered only in the cache, then the corresponding memory word is invalid. If the I/O
device has altered main memory, then the cache word is invalid.
2. Multiple processors are attached to the same bus and each processor has its own local cache.
If a word is altered in one cache, it could conceivably invalidate a word in other
caches.
There are 2 techniques for Write Policy:
1. Write through policy
2. Write back policy
Write through policy
All write operations are made to main memory as well as to the cache, ensuring that main
memory is always valid.
Any other processor-cache module can monitor traffic to main memory to maintain consistency
within its own cache.
The main disadvantage of this technique is that it generates substantial memory traffic and may
create a bottleneck, which brings overall performance down.
Write back policy
In this technique updates are made only in the cache.
When an update occurs, a dirty bit, or use bit, associated with the line is set. Then, when a block
is replaced, it is written back to main memory if and only if the dirty bit is set.
The problem with write back policy is that portions of main memory are invalid, and hence
accesses by I/O modules can be allowed only through the cache. This makes for complex
circuitry and a potential bottleneck.
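The two write policies can be contrasted in a minimal sketch (a single cache line, made-up helper names, no mapping logic):

```python
# Simplified contrast of write-through vs write-back on one cache line.
class Line:
    def __init__(self):
        self.data = None
        self.dirty = False

def write_through(line, memory, addr, value):
    line.data = value
    memory[addr] = value    # every write also goes to main memory

def write_back(line, addr, value):
    line.data = value
    line.dirty = True       # main memory is updated only on eventual replacement

def evict_write_back(line, memory, addr):
    if line.dirty:          # write back if and only if the dirty bit is set
        memory[addr] = line.data
        line.dirty = False

memory = {}
wt = Line(); write_through(wt, memory, 0x10, 7)
wb = Line(); write_back(wb, 0x20, 9)
print(memory.get(0x10), memory.get(0x20))  # 7 None  (write-back value not yet in memory)
evict_write_back(wb, memory, 0x20)
print(memory.get(0x20))  # 9
```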
Sir did not talk about cache coherency
Line Size (Block Size)
As the block size increases from very small to larger sizes, the hit ratio will at first increase because of
the principle of locality. The hit ratio will begin to decrease as the block becomes even bigger.
Two specific effects come into play when block sizes are getting larger:
Reduces the number of blocks that fit into the cache
Some additional words are farther from the requested word and therefore less likely to be
needed in near future
Number of Caches
When caches were originally introduced, systems used only one cache. More recently, the use of
multiple caches has become the norm.
Two aspects of this design issue are:
1. The number of cache levels
2. The use of unified vs split caches
Cache Performance
Cache has an important effect on the overall system performance.