8/7/2019 Ch04_The_memory_system
MEMORY ORGANIZATION
Memory Hierarchy
Main Memory
Associative Memory
Cache Memory
Virtual Memory
Memory Management Hardware
Ideally, memory should be:
1. Fast
2. Large
3. Inexpensive
Is it possible to meet all three requirements simultaneously?
Some Basic Concepts
What is the maximum size of memory?
Address space:
16-bit : 2^16 = 64K memory locations
32-bit : 2^32 = 4G memory locations
40-bit : 2^40 = 1T memory locations
What is byte addressable?
Introduction
Even a sophisticated processor may perform well below an ordinary processor unless it is supported by matching performance from the memory system.
The focus of this module: study how memory system performance has been enhanced through various innovations and optimizations.
MEMORY HIERARCHY
[Figure: memory hierarchy — CPU, cache memory, main memory, I/O processor, and auxiliary memory (magnetic disks, magnetic tapes)]
The goal of a memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.
Memory hierarchy, from top to bottom: Register, Cache, Main Memory, Magnetic Disk, Magnetic Tape.
Moving down the hierarchy, size increases; moving up, speed and cost per bit increase.
Basic Concepts of Memory
[Figure: connection of the memory to the processor — the processor's MAR drives a k-bit address bus and the MDR an n-bit data bus (word length = n bits); control lines (R/W, MFC, etc.) link the control unit to a memory of up to 2^k addressable locations]
Basic Concepts of Memory
Data transfer between memory and processor takes place through MAR and MDR. If MAR is k bits wide, the memory unit contains 2^k addressable locations. [k address lines]
If MDR is n bits wide, then n bits of data are transferred between memory and processor in one memory cycle. [n data lines]
The bus also includes control lines Read/Write and MFC for coordinating data transfer.
Processor read operation: MARin, Read/Write line = 1, READ, WMFC, MDRin
Processor write operation: MDRin, MARin, MDRout, Read/Write line = 0, WRITE, WMFC
Memory access is synchronized using a clock.
Memory access time: the time between the start of a Read and the MFC signal. [speed of memory]
Memory cycle time: the minimum time delay between the initiation of two successive memory operations.
Basic Concepts of Memory
[Figure: memory hierarchy detail — processor registers, L1 cache (SRAM), main memory (DRAM), secondary storage; size increases downward while speed and cost per bit increase upward]
Internal Organization of Memory Chips
Memory cells are organized in an array [row and column format], where each cell is capable of storing one bit of information.
Each row of cells holds one memory word, and all cells of a row are connected to a common word line, which is driven by the on-chip address decoder.
The cells in each column are connected to a sense/write circuit by two bit lines; the sense/write circuits are connected to the data I/O lines of the chip.
READ operation: the sense/write circuit senses (reads) the information stored in the cells selected by a word line and transmits it to the output data lines.
WRITE operation: the sense/write circuit receives input information and stores it in the selected cells.
If a memory chip consists of 16 memory words of 8 bits each, it is referred to as a 16 x 8 organization, i.e., a 128-bit memory.
The data I/O of each sense/write circuit is connected to a single bidirectional data line that can be connected to the data bus of a computer.
Two control lines: Read/Write [specifies the required operation] and Chip Select (CS) [selects a chip in a multichip memory].
Such a chip stores 128 bits and needs 14 external connections for its address, data and control lines.
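The pin count stated above can be checked with a short calculation. This is a sketch based on the text's assumptions: the 14 connections are address lines, data lines, Read/Write and Chip Select (power and ground pins are not counted here).

```python
# External connections for a 16 x 8 memory chip, as described above:
# address + data + Read/Write + Chip Select.
words, bits_per_word = 16, 8

address_lines = (words - 1).bit_length()   # 4 lines address 16 words
data_lines = bits_per_word                 # 8 bidirectional data lines
control_lines = 2                          # Read/Write and CS

total_bits = words * bits_per_word
total_pins = address_lines + data_lines + control_lines
print(total_bits, total_pins)  # 128 bits stored, 14 external connections
```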
Internal Organization of a Memory Chip: An Example
[Figure: 1K x 1-bit chip — a 32-to-1 output MUX and input DMUX connect the cell array to the data input/output line]
Internal Organization of a Memory Chip: An Example
Design a memory of 1K [1024] memory cells.
For 1K locations, we require a 10-bit address.
So 5 bits each are used for the row address and the column address of a cell in the array.
The row address selects a row of cells, all of which are accessed in parallel.
According to the column address, only one of these cells is connected to the external data line by the output MUX and input DMUX.
Static Memories
Circuits capable of retaining their state as long as power is applied: static RAM (SRAM) (volatile).
Two inverters are cross-connected to form a latch. The latch is connected to two bit lines by transistors T1 and T2.
Transistors T1 and T2 act as switches that can be opened and closed under control of the word line.
When the word line is at ground level, the transistors are turned off (assume the cell is initially in state 1: X = 1 and Y = 0).
Read operation:
1. The word line is activated to close switches T1 and T2.
2. Whether the cell state is 1 or 0, the signals on bit lines b and b' are always complements of each other.
3. The sense/write circuit sets the end value of the bit line as output.
Write operation:
1. The state of the cell is set by placing the desired value on bit line b and its complement on b', and then activating the word line. [Sense/write circuit]
[Figure: a static RAM cell — a latch of cross-coupled inverters (nodes X, Y) connected through T1 and T2 to the bit lines, gated by the word line]
Asynchronous DRAM
SRAMs are fast but very costly because each cell requires several transistors.
Less expensive cells, which cannot retain their state indefinitely, form dynamic RAM [DRAM].
Data is stored in a DRAM cell in the form of charge on a capacitor, but only for a period of tens of milliseconds.
An example of DRAM:
A DRAM cell consists of a capacitor C and a transistor T.
To store information in the cell, transistor T is turned on and the correct amount of voltage is applied to the bit line.
After the transistor turns off, the capacitor begins to discharge.
So a read operation must be completed before the capacitor voltage drops below some threshold value [sensed by a sense amplifier connected to the bit line].
[Figure: single-transistor dynamic memory cell]
Design of a 16-Megabit DRAM Chip
A 2M x 8 memory chip.
Cells are organized as a 4K x 4K array. The 4096 cells in each row are divided into 512 groups of 8, so 512 bytes of data can be stored in each row.
A 12-bit address (2^12 = 4096) selects a row, and 9 bits (2^9 = 512) specify a group of 8 bits in the selected row.
The RAS [Row Address Strobe] and CAS [Column Address Strobe] signals latch the row and column addresses used to find the proper byte to read or write.
The information on lines D7-0 is transferred to the selected circuit for a write operation.
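The address-bit budget above can be verified with a quick calculation (a sketch using only the numbers in the text):

```python
# Address-bit budget for the 2M x 8 chip organized as a 4K x 4K array.
rows = 4096                  # 4K rows in the cell array
byte_groups_per_row = 512    # 4096 cells per row / 8 bits per byte

row_bits = (rows - 1).bit_length()                    # 12 bits select a row
column_bits = (byte_groups_per_row - 1).bit_length()  # 9 bits select a byte group
print(row_bits, column_bits, row_bits + column_bits)  # 12 9 21

# 21 address bits cover 2^21 = 2M byte locations, matching the 2M x 8 organization
assert 2 ** (row_bits + column_bits) == 2 * 1024 * 1024
```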
Synchronous DRAM
DRAMs whose operation is directly synchronized with a clock signal are called SDRAMs.
The address and data connections are buffered by means of registers.
During a read operation, all cells in the selected row are loaded into latches and then into the output register. A refresh counter refreshes the contents of the cells internally.
SDRAMs can operate in different modes, such as burst transfers, selected through a mode register.
Memory Controller
Row/column address:
Memory addresses are divided into two parts.
The high-order address bits, which select a row in the cell array, are provided first and latched into the memory chip under control of the RAS signal.
The low-order address bits, which select a column, are then provided on the same address lines and latched under control of the CAS signal.
The controller accepts a complete address and an R/W signal from the processor under control of a REQUEST signal, which indicates that a memory access operation is needed.
The controller forwards the row and column addresses with the proper timing, performing the address-multiplexing function.
Then R/W and CS are sent to the memory.
The data lines are connected directly between the processor and the memory.
[Figure: memory controller between processor and memory — the processor supplies the address, R/W, Request and clock; the controller drives RAS, CAS, R/W and CS; the data lines connect processor and memory directly]
Associative Memory
Reduces the search time efficiently.
The address is replaced by the content of the data, hence the name Content Addressable Memory (CAM); access is content-based.
Hardware requirements:
It contains a memory array and logic for m words with n bits per word.
The argument register (A) and key register (K) each have n bits.
The match register (M) has m bits, one for each word in memory.
Each word in memory is compared in parallel with the content of the argument register, masked by the key register.
If a word matches the bits of the argument register, its corresponding bit in the match register is set, and the search for the data word is over.
Cache Memory
A relatively small SRAM [having low access time] memory located physically closer to the processor.
Locality of reference:
The references to memory at any given time interval tend to be confined within a localized area. This area contains a set of information, and its membership changes gradually as time goes by.
Temporal locality: information that will be used in the near future is likely to be in use already (e.g., reuse of information in loops).
Spatial locality: if a word is accessed, adjacent (nearby) words are likely to be accessed soon (e.g., related data items (arrays) are usually stored together; instructions are executed sequentially).
Cache is a fast, small-capacity memory that should hold the information most likely to be accessed.
[Figure: CPU — cache memory — main memory]
Performance of Cache Memory
All memory accesses are directed first to the cache.
If the word is in the cache, the cache provides it to the CPU: a CACHE HIT.
If the word is not in the cache, a block (or line) including that word is brought in to replace a block now in the cache: a CACHE MISS.
Hit ratio: the percentage of memory accesses that are hits.
Te: effective memory access time in a cache memory system
Tc: cache access time
Tm: main memory access time
Te = h*Tc + (1 - h)*(Tc + Tm)
Example: Tc = 0.4 us, Tm = 1.2 us, h = 85%
Te = 0.85*0.4 + (1 - 0.85)*1.6 = 0.58 us
Cache write policies:
Write-through:
On a hit, both cache and memory are written in parallel.
On a miss, memory is written; for a read miss, the missing block may be loaded into a cache block.
Write-back (copy-back):
On a hit, only the cache is written; the cache location is marked as updated with an associated flag bit called the dirty/modified bit.
On a miss, the missing block is brought into the cache and the write is performed in the cache.
For a read miss, a dirty candidate block must first be written back to memory.
With write-back, memory is not always up to date, i.e., the same item in cache and memory may have different values — the cache coherence problem.
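The effective-access-time formula can be checked with a short calculation, using the example values given above:

```python
def effective_access_time(h, tc, tm):
    # hit: cache access only; miss: cache probe plus main-memory access
    return h * tc + (1 - h) * (tc + tm)

te = effective_access_time(h=0.85, tc=0.4, tm=1.2)
print(round(te, 2))  # 0.58
```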
MEMORY AND CACHE MAPPING — ASSOCIATIVE MAPPING
Mapping function: the specification of the correspondence between main memory blocks and cache blocks. Three schemes:
Associative mapping
Direct mapping
Set-associative mapping
Associative mapping:
- Any block location in the cache can store any block from memory: the most flexible scheme
- Fast, but very expensive
- The mapping table stores both the address and the content of the memory word
- The table is implemented as CAM; the argument register holds the address (15 bits) being searched
[Figure: associative-mapping example — the CAM stores (address, data) pairs, e.g. address 01000 with data 3450 and address 02777 with data 6710]
— DIRECT MAPPING —
- Each memory block has only one place it can be loaded in the cache
- The mapping table is made of RAM instead of CAM
- An n-bit memory address consists of two parts: k bits of Index field and n-k bits of Tag field
- The n-bit address is used to access main memory, and the k-bit Index is used to access the cache
Addressing relationships: Tag (6 bits), Index (9 bits); main memory is 32K x 12 (15-bit address, 12-bit data); cache memory is 512 x 12 (9-bit address, 12-bit data).
[Figure: direct-mapping cache organization — e.g. main-memory address 00000 holds data 1220; in the cache, index 000 holds tag 00 with data 1220, and index 777 holds tag 02 with data 6710]
DIRECT MAPPING — Operation
1. The CPU generates a memory request with a memory address (TAG; INDEX)
2. Access the cache using INDEX to get (tag; data); compare the TAG from the memory address with the tag in the cache
3. If the two tags match, it is a cache hit
4. Provide the corresponding Cache[INDEX] data to the CPU
5. If they do not match, it is a cache miss: read the required data from main memory
6. Store the new data from memory in the cache, together with its tag
7. Cache[INDEX] <- (TAG; M[TAG; INDEX])
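A minimal sketch of this lookup in Python; the 9-bit index matches the example above, while the dictionary-based cache and the sample memory contents are illustrative assumptions, not the slide's hardware:

```python
INDEX_BITS = 9                 # 512-entry cache, as in the example above
cache = {}                     # index -> (tag, data); stands in for the cache RAM

def access(address, memory):
    """Direct-mapped lookup: returns (data, hit_flag)."""
    index = address & ((1 << INDEX_BITS) - 1)   # low 9 bits of the address
    tag = address >> INDEX_BITS                 # remaining high bits
    entry = cache.get(index)
    if entry is not None and entry[0] == tag:   # step 3: tags match -> hit
        return entry[1], True
    data = memory[address]                      # step 5: miss, read main memory
    cache[index] = (tag, data)                  # step 7: Cache[INDEX] <- (TAG, data)
    return data, False

memory = {0o01000: 0o3450}     # one word of 'main memory' for illustration
print(access(0o01000, memory)) # first access: miss
print(access(0o01000, memory)) # second access: hit
```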
Cache Memory
Direct mapping with a block size of 8 words:
The 15-bit address is divided into Tag (6 bits), Block (6 bits) and Word (3 bits); the Block and Word fields together form the INDEX.
[Figure: cache organized as 64 blocks of 8 words — e.g. indexes 000-007 hold block 0 with tag 01 (data 3450 ... 6578), and indexes 770-777 hold block 63 with tag 02 (... 6710)]
— SET-ASSOCIATIVE MAPPING —
Set-associative mapping: each cache word stores two or more words of memory under the same index address.
Each memory block can be loaded into any location of a set in the cache.
[Figure: two-way set-associative cache — e.g. index 000 holds (tag 01, data 3450) and (tag 02, data 5670); index 777 holds (tag 02, data 6710) and (tag 00, data 2340)]
Operation:
1. The CPU generates a memory address (TAG; INDEX)
2. Access the cache with INDEX; a cache word = ((tag 0 : data 0); (tag 1 : data 1))
3. Compare TAG with tag 0 and then tag 1
4. If tag i = TAG, it is a cache hit, and the CPU gets data i
5. If tag i != TAG, it is a cache miss (and the set is full): replace either (tag 0, data 0) or (tag 1, data 1)
Assume (tag 0, data 0) is selected for replacement
(Why (tag 0, data 0) instead of (tag 1, data 1)?)
Memory[tag 0, INDEX] <- Cache[INDEX](data 0)
Cache[INDEX](tag 0, data 0) <- Memory(TAG, M[TAG, INDEX])
CPU <- Cache[INDEX](data 0)
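One possible answer to the "why" question above is to evict the set's least recently used entry. The following sketch illustrates a two-way lookup with that policy; the field widths and memory contents are assumptions for illustration:

```python
INDEX_BITS = 9
sets = {}   # index -> list of [tag, data] entries, most recently used last

def access(address, memory, ways=2):
    """Two-way set-associative lookup with LRU victim selection."""
    index = address & ((1 << INDEX_BITS) - 1)
    tag = address >> INDEX_BITS
    entries = sets.setdefault(index, [])
    for entry in entries:                  # steps 3-4: compare TAG with each tag
        if entry[0] == tag:
            entries.remove(entry)
            entries.append(entry)          # hit: mark as most recently used
            return entry[1], True
    if len(entries) == ways:               # step 5: miss and the set is full
        entries.pop(0)                     # evict the least recently used entry
    data = memory[address]
    entries.append([tag, data])
    return data, False

mem = {1: 10, 513: 20, 1025: 30}   # three addresses that share index 1
print(access(1, mem))              # miss
print(access(513, mem))            # miss; the set is now full
print(access(1, mem))              # hit: both blocks coexist in the set
```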
Paging
The external fragmentation problem can be treated by PAGING.
The logical address space of a process can be noncontiguous; the process is allocated physical memory wherever such memory is available.
Address mapping in the paging scheme:
Divide physical memory into fixed-size blocks called frames (the size is a power of 2, typically between 512 and 8192 bytes)
Divide logical memory into blocks of the same size, called pages
Keep track of all free frames
To run a program of size n pages, find n free frames and load the program
Set up a page table to translate logical to physical addresses
Paging still suffers from internal fragmentation
Address Translation Scheme
The address generated by the CPU (logical address) is divided into:
Page number (p): used as an index into a page table, which contains the base address of each page in physical memory
Page offset (d): combined with the base address to define the physical memory address that is sent to the memory unit
Base address + page offset = physical address
Paging Model of Logical and Physical Memory
[Figure: the page number indexes the page table, which gives the base address of each page in physical memory]
Paging Example
[Figure: pages 0-3 of logical memory mapped through the page table into a 32-byte physical memory with 4-byte pages]
Physical address = (frame no. x page size) + page offset
Logical Address -> Address Mapping -> Physical Address
Physical address = (frame no. x page size) + page offset
Example:
Page size is 4 bytes; physical memory size is 32 bytes.
Hence the total number of frames = 32 bytes / 4 bytes = 8.
If the logical address is 0, what is its corresponding physical address?
Page number = 0, which is in frame 5 as given in the page table; page offset (d) = 0 - 0 = 0.
So physical address = (5 x 4) + 0 = 20.
If the logical address is 3, what is its corresponding physical address?
Page offset (d) = displacement within the page = 3 - 0 = 3; page number = 0, which is in frame 5 as given in the page table.
So physical address = (5 x 4) + 3 = 23.
Paging Example
If the logical address is 4, what is its corresponding physical address?
Page offset (d) = 4 - 4 = 0; page number = 1, which is in frame 6 as given in the page table.
So physical address = (6 x 4) + 0 = 24.
If the logical address is 10, what is its corresponding physical address?
Page offset (d) = 10 - 8 = 2; page number = 2, which is in frame 1 as given in the page table.
So physical address = (1 x 4) + 2 = 6.
If the logical address is 13, what is its corresponding physical address?
Page offset (d) = 13 - 12 = 1; page number = 3, which is in frame 2 as given in the page table.
So physical address = (2 x 4) + 1 = 9.
If the logical address is 15, what is its corresponding physical address?
Page offset (d) = 15 - 12 = 3; page number = 3, which is in frame 2 as given in the page table.
So physical address = (2 x 4) + 3 = 11.
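The worked examples above follow one formula, so they can all be checked with a short translation function. The page table below is the one implied by the examples (page 0 in frame 5, page 1 in frame 6, page 2 in frame 1, page 3 in frame 2):

```python
PAGE_SIZE = 4
page_table = {0: 5, 1: 6, 2: 1, 3: 2}    # page -> frame, from the examples above

def translate(logical_address):
    page = logical_address // PAGE_SIZE   # page number p
    offset = logical_address % PAGE_SIZE  # page offset d
    frame = page_table[page]
    return frame * PAGE_SIZE + offset     # (frame no. x page size) + offset

for la in (0, 3, 4, 10, 13, 15):
    print(la, '->', translate(la))        # 20, 23, 24, 6, 9, 11
```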
Paging Hardware with TLB
[Figure: paging hardware with a TLB — the TLB acts as a set of associative registers holding recent (page number, frame number) pairs]
Virtual Memory
Virtual memory is the separation of user logical memory from physical memory.
Only part of the program needs to be in memory for execution.
The logical address space can therefore be much larger than the physical address space.
Address spaces can be shared by several processes.
Allows for more efficient process creation.
Virtual memory can be implemented via:
Demand paging
Demand segmentation
Virtual Memory That Is Larger Than Physical Memory
[Figure: virtual pages map through a page table to physical frames; the remaining pages reside on backing store]
Page Replacement
Page Replacement Algorithms
We want the lowest page-fault rate.
Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults on that string.
First-In-First-Out (FIFO) Algorithm
Replacement depends on the arrival time of a page in memory.
The page replaced is the oldest one (pages are considered in ascending order of their arrival time in memory).
As a FIFO queue is used, there is no need to record the arrival time of a page: the page at the head of the queue is replaced.
Performance is not always good:
When an active page is replaced to bring in a new page, a page fault occurs almost immediately to retrieve the active page.
To get the active page back, some other page has to be replaced. Hence the number of page faults increases.
FIFO Page Replacement
[Figure: FIFO page replacement on the reference string, with the hits marked; 15 page faults in total]
Problem with the FIFO Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
With 3 frames (3 pages can be in memory at a time per process): 9 page faults.
With 4 frames: 10 page faults.
Belady's anomaly: more frames can mean more page faults — the number of page faults unexpectedly increases.
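The anomaly above can be reproduced with a short FIFO simulation (a sketch; the page counts follow directly from the reference string in the example):

```python
from collections import deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement."""
    queue = deque()          # arrival order; head = oldest page
    resident = set()
    faults = 0
    for page in refs:
        if page in resident:
            continue         # hit: FIFO state is unchanged
        faults += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())  # evict the oldest page
        queue.append(page)
        resident.add(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10 -- Belady's anomaly
```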
Optimal Algorithm
To avoid Belady's anomaly, use the optimal page replacement algorithm.
Replace the page that will not be used for the longest period of time.
This guarantees the lowest possible page-fault rate for a fixed number of frames.
Example:
First we incur 3 page faults to fill the frames.
Then page 7 is replaced with page 2, because page 7 will not be needed until the 18th place in the reference string.
Finally there are 9 page faults.
Hence it is better than the FIFO algorithm (15 page faults).
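The optimal policy can be sketched as follows. The reference string here is an assumption: the classic 20-element example 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1, which is consistent with the page-7/page-2 replacement and the fault counts (15/9/12) quoted on these slides:

```python
def opt_faults(refs, frames):
    """Count page faults under optimal (farthest-future-use) replacement."""
    resident = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = refs[i + 1:]
            # Evict the resident page whose next use is farthest away;
            # pages never referenced again sort last.
            victim = max(resident,
                         key=lambda p: future.index(p) if p in future else len(future))
            resident.remove(victim)
        resident.add(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(opt_faults(refs, 3))  # 9, matching the slide's count
```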
Optimal Page Replacement
[Figure: optimal page replacement on the reference string, with the hits marked; 9 page faults in total]
Difficulty with the Optimal Algorithm
Replace the page that will not be used for the longest period of time.
4-frames example: on the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 it gives 6 page faults.
The optimal algorithm is used for measuring how well other algorithms perform, because it always needs future knowledge of the reference string.
Least Recently Used (LRU) Algorithm
The LRU algorithm lies between FIFO and the optimal algorithm (in terms of page faults).
FIFO uses the time when a page was brought into memory.
OPTIMAL uses the time when a page will next be used.
LRU uses the recent past as an approximation of the near future: a recently used page is likely to be used again (so it should not be replaced), and we replace the page that has not been used for the longest period of time. Hence the name "least recently used" algorithm.
Example:
Up to the 5th page fault it behaves the same as the optimal algorithm.
When page 4 occurs, LRU chooses page 2 for replacement.
Here we find only 12 page faults.
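LRU can be sketched with a "last use" timestamp per resident page, which is exactly the counter implementation described later. As above, the 20-element reference string is an assumption consistent with the slides' fault counts:

```python
def lru_faults(refs, frames):
    """Count page faults under least-recently-used replacement."""
    last_use = {}            # resident page -> index of its most recent reference
    faults = 0
    for i, page in enumerate(refs):
        if page not in last_use:
            faults += 1
            if len(last_use) == frames:
                victim = min(last_use, key=last_use.get)  # least recently used
                del last_use[victim]
        last_use[page] = i   # counter implementation: copy the 'clock' on each use
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(lru_faults(refs, 3))  # 12, matching the slide's count
```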
LRU Page Replacement
[Figure: LRU page replacement on the reference string, with the hits marked; 12 page faults in total]
Least Recently Used (LRU) Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
[Figure: frame contents after each reference under LRU replacement]
Counter implementation:
Every page-table entry has a counter; every time the page is referenced through this entry, the clock is copied into the counter.
When a page needs to be replaced, look at the counters to determine which page to change: the one with the oldest counter value is the least recently used.