EC 413 Computer Organization · Prof. Michel A. Kinsy · 2019-12-03

Transcript
Department of Electrical & Computer Engineering
EC 413 Computer Organization
Prof. Michel A. Kinsy
Summary
Computing Devices Then…

Computing Devices Now
The Von Neumann Architecture
§ Stored Program Computer
§ The modern computer system has three major functional hardware units: CPU, Main Memory, and Input/Output devices
[Diagram: Von Neumann organization: the Processor and Memory communicate with I/O Devices (Device#1 … Device#n) and the External World over a Control Bus, an Address Bus, and a Data Bus. A second diagram shows a single-cycle MIPS datapath: PC, instruction memory (Read Address, Instruction[31-0]), register file (read/write addresses and data, RegWrite), ALU with zero and Overflow outputs, data memory (MemRead, MemWrite), sign-extend unit (16 to 32 bits), shift-left-2 and branch adder, and a Control Unit decoding Instr[31-26] into RegDst, ALUSrc, MemtoReg, PCSrc, Branch, and ALUOp, with ALUControl driven by Instr[5-0].]
The Von Neumann Architecture
§ In the most basic sense, a computer is a device consisting of three units performing three distinctive functions
• A processor to interpret and execute programs
• A memory to store both data and programs
• A mechanism for transferring data to and from the outside world
Amdahl's Law Revisited
§ This law answers the critical question:
§ How much of a speedup can one get for a given parallelized task?
§ If s is the fraction of a calculation that is sequential, and (1 − s) is the fraction that can be parallelized, then the maximum speedup that can be achieved by using n processors is
§ Speedup = 1 / (s + (1 − s)/n)
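As a quick check, the formula can be coded directly (a minimal sketch; the function name is ours):

```python
def amdahl_speedup(s, n):
    """Maximum speedup with n processors when a fraction s of the
    work is inherently sequential (Amdahl's law)."""
    return 1.0 / (s + (1.0 - s) / n)

# As n grows, the speedup saturates at the 1/s ceiling:
print(amdahl_speedup(0.2, 8))      # 8 processors, 20% sequential
print(amdahl_speedup(0.2, 10**6))  # approaches 1/0.2 = 5
```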
Amdahl's Law Revisited
§ If 80% of a calculation can be parallelized, i.e. 20% is sequential, then what is the maximum speedup which can be achieved on 8 processors?
§ What if we double the number of processors (n = 16)?
§ What if we double the number of processors again (n = 32)?
§ What if the number of processors is 1000?
Amdahl's Law Revisited
§ If 50% of a calculation can be parallelized, i.e. 50% is sequential, then what is the maximum speedup which can be achieved on 8 processors?
§ What if we double the number of processors (n = 16)?
§ What if we double the number of processors again (n = 32)?
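The exercises above (the s = 0.2 and s = 0.5 cases) can be checked numerically with a short sketch:

```python
# Evaluate Speedup = 1 / (s + (1 - s)/n) for the two exercise scenarios.
for s in (0.2, 0.5):
    for n in (8, 16, 32, 1000):
        speedup = 1.0 / (s + (1.0 - s) / n)
        print(f"s={s}, n={n}: speedup = {speedup:.2f}")
```

With s = 0.2 the speedups are about 3.33, 4.00, 4.44, and 4.98, creeping toward the 1/s = 5 ceiling; with s = 0.5 they stay below 2 (about 1.78, 1.88, 1.94, and 2.00), showing the diminishing returns of adding processors.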
Time/Program = Instructions/Program × Cycles/Instruction × Time/Cycle
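The CPU time equation (time per program = instruction count × CPI × cycle time) can be evaluated with made-up numbers; the instruction count, CPI, and clock rate below are all illustrative:

```python
# CPU time = instructions * (cycles/instruction) * (time/cycle)
instructions = 1_000_000_000   # dynamic instruction count (assumed)
cpi = 1.5                      # average clock cycles per instruction (assumed)
cycle_time = 0.5e-9            # seconds per cycle, i.e. a 2 GHz clock (assumed)

cpu_time = instructions * cpi * cycle_time
print(cpu_time)                # execution time in seconds
```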
Amdahl's Law
§ By Gene Amdahl
§ This law answers the critical question:
§ How much of a speedup can one get for a given architectural improvement/enhancement?
§ The performance enhancement possible due to a given design improvement is limited by the amount that the improved feature is used
§ Performance improvement or speedup due to enhancement E:
Speedup(E) = Execution Time without E / Execution Time with E = Performance with E / Performance without E
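The point that the benefit is limited by how much the improved feature is used can be sketched as follows; the fraction f and the local speedup of the enhancement are illustrative assumptions:

```python
def overall_speedup(f, local_speedup):
    """Overall speedup when an enhancement that is local_speedup times
    faster applies only to a fraction f of the execution time."""
    return 1.0 / ((1.0 - f) + f / local_speedup)

# An enhancement that is 10x faster but covers only 40% of execution time
# yields far less than 10x overall:
print(overall_speedup(0.4, 10))
```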
Processor-Memory Gap
§ Performance gap: CPU (55% each year) vs. DRAM (7% each year)
§ Processor operations take on the order of 1 ns
§ Memory access requires 10s or even 100s of ns
§ Each instruction executed involves at least one memory access

[Figure: relative performance versus calendar year (1980 to 2010) for Processor and Memory, illustrating the widening gap.]
Memory Technology
§ The single-transistor DRAM cell is considerably simpler than the SRAM cell
§ This leads to dense, high-capacity DRAM memory chips

[Figure: (a) DRAM cell: one pass transistor and a capacitor per bit, with word line and bit line; (b) typical SRAM cell: word line, bit line, complementary bit line, and Vcc.]
A Typical Memory Hierarchy

[Figure: memory hierarchy from the register file, instruction cache, and data cache inside the processor (with a bypass network), through the L2 and L3 caches, to main memory and disk; speed increases and capacity decreases toward the processor.]
Memory Organization
§ A memory cannot be both large and fast
§ Increasing sizes of cache at each level
§ A hit at a level occurs if that level of the memory contains the data needed by the CPU
§ A miss occurs if the level does not contain the requested data

CPU → L1 → L2 → DRAM
A Typical Memory Hierarchy

[Figure: CPU with a multi-ported register file (part of the CPU); split L1 instruction and data caches (on-chip SRAM); a large unified L2 cache (on-chip SRAM); and multiple interleaved memory banks (off-chip DRAM).]
Multilevel Caches
§ The cache is transparent to the user (caching happens automatically)
§ Data is in the cache a fraction h of the time; the access goes to main memory 1 − h of the time

[Figure: CPU and register file backed by a cache memory, which is backed by main memory.]
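The hit fraction h turns into an effective (average) access time, as the weighted average of cache and main-memory latency; the latencies below are assumed for illustration:

```python
# Effective access time as a weighted average of cache and memory latency.
h = 0.95            # fraction of accesses served by the cache (assumed)
t_cache = 1.0       # cache access time in ns (assumed)
t_main = 100.0      # main-memory access time in ns (assumed)

t_eff = h * t_cache + (1.0 - h) * t_main
print(t_eff)        # average access time in ns
```

Even a 5% miss fraction pulls the average far above the 1 ns cache latency, which is why hit rates matter so much.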
Caches
§ Local miss rate = misses in cache / accesses to the cache
§ Global miss rate = misses in cache / CPU memory accesses
§ Misses per instruction = misses in cache / number of instructions

CPU → L1 → L2 → DRAM
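A sketch with made-up access counts shows how the local and global definitions differ for an L2 cache:

```python
# Illustrative counts for a two-level hierarchy (all numbers assumed).
cpu_accesses = 1000          # memory references issued by the CPU (= L1 accesses)
l1_misses = 40               # L1 misses: these become accesses to L2
l2_misses = 10               # L2 misses: these go to DRAM

l1_miss_rate = l1_misses / cpu_accesses          # local = global for L1
l2_local_miss_rate = l2_misses / l1_misses       # misses / accesses to L2
l2_global_miss_rate = l2_misses / cpu_accesses   # misses / CPU memory accesses

print(l1_miss_rate, l2_local_miss_rate, l2_global_miss_rate)
```

Note how the L2 local miss rate (25%) looks alarming while its global miss rate (1%) shows that only a tiny fraction of CPU references actually reach DRAM.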
Address Bit-Field Partitioning
§ The address (e.g., 32-bit) issued by the CPU is generally divided into 3 fields
§ Tag
  § Serves as the unique identifier for a group of data
  § Different regions of memory may be mapped to the same cache location/block
  § The tag is used to differentiate between them
§ Index
  § It is used to index into the cache structure
§ Block Offset
  § The least significant bits are used to determine the exact data word
  § If the block size is B, then b = log2(B) bits will be needed in the address to specify the data word

Address = | Tag (t bits) | Index (k bits) | Block Offset (b bits) |
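The three fields can be extracted with shifts and masks; the block size and line count below are assumptions chosen for illustration:

```python
# Split a 32-bit address into tag / index / block offset.
# Assumed geometry: 64-byte blocks (b = 6) and 1024 cache lines (k = 10).
b, k = 6, 10
t = 32 - k - b                  # remaining high-order bits form the tag

def split_address(addr):
    offset = addr & ((1 << b) - 1)          # lowest b bits
    index = (addr >> b) & ((1 << k) - 1)    # next k bits
    tag = addr >> (b + k)                   # remaining t bits
    return tag, index, offset

print(split_address(0x1234ABCD))
```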
Direct-Mapped Cache

[Figure: direct-mapped cache with 2^k lines, each holding a valid bit (V), a tag, and a data block; the address splits into tag (t bits), index (k bits), and block offset (b bits); the stored tag is compared against the address tag to signal a HIT and select the data word or byte.]
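The direct-mapped lookup described above (index into the line, check the valid bit, compare tags) can be sketched in a few lines; the cache geometry here is assumed:

```python
# Minimal direct-mapped cache lookup (valid bit + tag compare).
K, B = 4, 4                      # assumed: 2**4 = 16 lines, 16-byte blocks
lines = [{"valid": False, "tag": None} for _ in range(2 ** K)]

def access(addr):
    """Return True on a hit; on a miss, fill the line and return False."""
    index = (addr >> B) & ((1 << K) - 1)
    tag = addr >> (B + K)
    line = lines[index]
    if line["valid"] and line["tag"] == tag:
        return True                            # hit: valid line, matching tag
    line["valid"], line["tag"] = True, tag     # miss: fetch block into line
    return False

print(access(0x100))   # cold (compulsory) miss
print(access(0x104))   # same block -> hit
print(access(0x1100))  # same index, different tag -> conflict miss
```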
Caching Principles
§ Cache size (in bytes or words)
  § Total cache capacity
  § A larger cache can hold more of the program's useful data, but is more costly and likely to be slower
§ Block or cache-line size
  § Unit of data transfer between cache and main memory
  § With a larger cache line, more data is brought into the cache with each miss; this can improve the hit rate but may also bring low-utility data into the cache
Caching Principles
§ Placement policy
  § Determines where an incoming cache line is stored
  § More flexible policies imply higher hardware cost and may or may not have performance benefits (due to more complex data location)
§ Replacement policy
  § Determines which of several existing cache blocks (into which a new cache line can be mapped) should be overwritten
  § Typical policies: choosing a random or the least recently used block
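Least-recently-used replacement for a single cache set can be sketched with an ordered dictionary; the class name and associativity are ours:

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement (associativity is assumed)."""
    def __init__(self, ways=4):
        self.ways = ways
        self.blocks = OrderedDict()        # tag -> block; order tracks recency

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)   # hit: mark as most recently used
            return True
        if len(self.blocks) >= self.ways:  # set full: evict LRU block
            self.blocks.popitem(last=False)
        self.blocks[tag] = None            # fill with the new block
        return False

s = LRUSet(ways=2)
# The access to tag 3 evicts tag 2 (the least recently used at that point).
print([s.access(t) for t in (1, 2, 1, 3, 2)])
```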
Caching Principles
§ Compulsory misses
  § With on-demand fetching, the first access to any item is a miss
§ Capacity misses
  § We have to evict some items to make room for others
  § This leads to misses that would not occur with an infinitely large cache
§ Conflict misses
  § The placement scheme may force us to displace useful items to bring in other items
  § This may lead to misses in the future
Caching Principles
§ Line width (2^W)
  § Too small a value for W causes a lot of main memory accesses
  § Too large a value increases the miss penalty and may tie up cache space with low-utility items that are replaced before being used
§ Set size or associativity (2^S)
  § Direct mapping (S = 0) is simple and fast
  § Greater associativity leads to more complexity, and thus slower access, but tends to reduce conflict misses
Caching Principles
§ The cache contains copies of some of Main Memory
  § Those storage locations recently used
§ When Main Memory address A is referenced in the CPU, the cache is checked for a copy of the contents of A
§ If found: cache hit
  § The copy is used; no need to access Main Memory
§ If not found: cache miss
  § Main Memory is accessed to get the contents of A
  § A copy of the contents is also loaded into the cache
Cache Performance Metrics
§ Cache miss rate
  § Number of cache misses divided by the number of accesses
§ Cache hit time
  § Time between sending the address and data returning from the cache
§ Cache miss latency
  § Time between sending the address and data returning from the next-level cache/memory
§ Cache miss penalty
  § Extra processor stall caused by the next-level cache/memory access
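These metrics combine into the usual average memory access time (AMAT) formula, AMAT = hit time + miss rate × miss penalty; the numbers below are illustrative:

```python
# AMAT = hit time + miss rate * miss penalty (all numbers assumed).
hit_time = 1.0        # ns to return data on a hit
miss_rate = 0.05      # fraction of accesses that miss
miss_penalty = 20.0   # extra ns of stall on a miss

amat = hit_time + miss_rate * miss_penalty
print(amat)           # average access time in ns
```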
I/O Interface
§ Basic I/O hardware
  § Ports, buses, devices, and controllers
§ I/O Software