Ch04: The Memory System


MEMORY ORGANIZATION

Memory Hierarchy
Main Memory
Associative Memory
Cache Memory
Virtual Memory
Memory Management Hardware


Some Basic Concepts

Ideally, memory should be:
1. Fast
2. Large
3. Inexpensive

Is it possible to meet all three requirements simultaneously?

What is the maximum size of memory? It is determined by the address space (see the sketch below):
16-bit address: 2^16 = 64K memory locations
32-bit address: 2^32 = 4G memory locations
40-bit address: 2^40 = 1T memory locations

What is byte addressable?
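A minimal sketch (Python, illustrative names) of the address-space arithmetic above: the number of addressable locations is 2 raised to the number of address bits.

# Number of addressable locations for a given number of address bits.
def addressable_locations(address_bits: int) -> int:
    return 2 ** address_bits

for bits in (16, 32, 40):
    # 16 bits -> 65,536 (64K); 32 bits -> ~4G; 40 bits -> ~1T locations
    print(bits, addressable_locations(bits))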


    Introduction

Even a sophisticated processor may perform well below an ordinary one unless it is supported by a memory system of matching performance.

The focus of this module: study how memory system performance has been enhanced through various innovations and optimizations.


MEMORY HIERARCHY

[Figure: the CPU is connected to cache memory and main memory; an I/O processor links the main memory to auxiliary memory (magnetic disks and magnetic tapes).]

The purpose of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.

Going down the hierarchy (Register, Cache, Main Memory, Magnetic Disk, Magnetic Tape), size increases while speed and cost per bit decrease.


Basic Concepts of Memory

[Figure: connection of the memory to the processor. The processor (containing the MAR and the control unit) is connected to the memory by a k-bit address bus, an n-bit data bus, and control lines (R/W, MFC, etc.). The memory has up to 2^k addressable locations and a word length of n bits.]


    Basic Concepts of Memory

Data transfer between memory and processor takes place through the MAR and MDR. If the MAR is k bits wide, the memory unit contains 2^k addressable locations [k address lines].

If the MDR is n bits wide, then n bits of data are transferred between memory and processor in each memory cycle [n data lines].

The bus also includes control lines Read/Write and MFC for coordinating data transfers.

Processor read operation: MAR_in, Read/Write line = 1, READ, WMFC, MDR_in.

Processor write operation: MDR_in, MAR_in, MDR_out, Read/Write line = 0, WRITE, WMFC.

Memory access is synchronized using a clock.

Memory Access Time: the time between the start of a Read operation and the MFC signal [a measure of the speed of the memory].

Memory Cycle Time: the minimum time delay between the initiation of two successive memory operations.


    Basic Concepts of Memory

[Figure: the memory hierarchy seen by the processor: processor registers, L1 cache (SRAM), main memory (DRAM), and secondary storage memory, with size increasing while speed and cost per bit decrease as we move away from the processor.]


    Internal Organization of Memory Chips

Memory cells are organized in an array [row and column format], where each cell is capable of storing one bit of information.

Each row of cells holds a memory word, and all cells of a row are connected to a common word line, which is driven by the address decoder on the chip.

The cells in each column are connected to a sense/write circuit by two bit lines, and the sense/write circuits are connected to the data I/O lines of the chip.

READ operation: the sense/write circuits sense (read) the information stored in the cells selected by a word line and transmit it to the output data lines.

WRITE operation: the sense/write circuits receive input information and store it in the cells of the selected word.

If a memory chip consists of 16 memory words of 8 bits each, it is referred to as a 16 x 8 organization (128 bits in total).

The data I/O of each sense/write circuit is connected to a single bidirectional data line that can be connected to the data bus of a computer.

Two control lines are provided: Read/Write [specifies the required operation] and Chip Select (CS) [selects a chip in a multichip memory].

Such a chip stores 128 bits and requires 14 external connections for address, data, and control lines (as the sketch below illustrates).
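A small sketch (Python, illustrative names, not from the slides) of the external-connection count above: a 16 x 8 chip needs 4 address lines, 8 data lines, and 2 control lines (R/W and CS), i.e., 14 connections.

import math

def external_connections(words: int, bits_per_word: int) -> int:
    # address lines + data lines + two control lines (R/W and CS)
    address_lines = int(math.log2(words))
    data_lines = bits_per_word
    control_lines = 2
    return address_lines + data_lines + control_lines

print(external_connections(16, 8))   # 14 external connections
print(16 * 8)                        # 128 bits stored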


Internal Organization of Memory Chip: An Example

[Figure: an example organization of a 1K (1024-cell) memory chip as a 32 x 32 cell array, with a 32-to-1 output MUX and an input DMUX connecting the selected cell to the data input/output line.]


Internal Organization of Memory Chip: An Example

Design a memory of 1K (1024) memory cells.

For 1K cells, we require a 10-bit address.

So 5 bits each are used for the row and column addresses of a memory cell in the array.

The row address selects a row of cells, all of which are accessed in parallel.

According to the column address, only one of these cells is connected to the external data line by the output MUX and input DMUX.


Static Memories

Circuits capable of retaining their state as long as power is applied: Static RAM (SRAM). It is volatile.

Two inverters are cross-connected to form a latch. The latch is connected to two bit lines by transistors T1 and T2.

Transistors T1 and T2 act as switches that can be opened and closed under control of the word line.

When the word line is at ground level, the transistors are turned off (assume the cell is initially in state 1, with X = 1 and Y = 0).

Read operation:
1. The word line is activated to close switches T1 and T2.
2. Whether the cell is in state 1 or 0, the signals on bit lines b and b' are always complements of each other.
3. The sense/write circuit at the end of the bit lines sets the output value accordingly.

Write operation:
1. The state of the cell is set by placing the desired value on bit line b and its complement on b', and then activating the word line. [This is done by the sense/write circuit.]

[Figure: a static RAM cell: a latch (nodes X and Y) connected through transistors T1 and T2 to the bit lines, controlled by the word line.]


Asynchronous DRAM

SRAMs are fast but very costly because of the number of transistors in their cells.

Less expensive cells, which cannot retain their state indefinitely, are used to build dynamic RAM [DRAM].

Data is stored in a DRAM cell in the form of a charge on a capacitor, but only for a period of tens of milliseconds.

An example of DRAM:

A DRAM cell consists of a capacitor and a transistor.

To store information in the cell, transistor T is turned on and the correct amount of voltage is applied to the bit line.

After the transistor turns off, the capacitor begins to discharge.

So a read operation must be completed before the capacitor voltage drops below some threshold value [sensed by a sense amplifier connected to the bit line].

[Figure: a single-transistor dynamic memory cell.]


Design of a 16-Megabit DRAM Chip

A 2M x 8 memory chip.

Cells are organized in the form of a 4K x 4K array. The 4096 cells in each row are divided into 512 groups of 8, so 512 bytes of data can be stored in each row.

A 12-bit address selects a row [4096 = 2^12], and 9 bits specify a group of 8 bits in the selected row [512 = 2^9].

The RAS [Row Address Strobe] and CAS [Column Address Strobe] signals latch the row and column addresses used to locate the proper bits to read or write (see the sketch below).

The information on lines D7-0 is transferred to the selected circuits for a write operation.
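A minimal sketch (Python, illustrative, not from the slides) of how the 21-bit address of the 2M x 8 chip above splits into a 12-bit row address and a 9-bit column (group) address:

def split_dram_address(addr: int, row_bits: int = 12, col_bits: int = 9):
    # High-order bits select the row (latched with RAS);
    # low-order bits select a group of 8 bits in that row (latched with CAS).
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return row, col

# 2M locations -> 21 address bits (2**21 = 2M); e.g. an arbitrary address:
print(split_dram_address(0b101010101010_110011001))   # (row, column group)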


Synchronous DRAM

A DRAM whose operation is directly synchronized with a clock signal is called an SDRAM.

The address and data connections are buffered by means of registers.

During a read operation, all cells in the selected row are loaded into latches and then into the output register. A refresh counter is used to refresh the contents of the cells.

SDRAMs can operate in different modes, selected through a mode register, such as burst operation with self (internally generated) CAS activation.


Memory Controller

Memory addresses are divided into two parts:

The high-order address bits, which select a row in the cell array, are provided first and latched into the memory chip under control of the RAS signal.

The low-order address bits, which select a column, are then provided on the same address lines and latched using the CAS signal.

The controller accepts a complete address and the R/W signal from the processor under control of a REQUEST signal, which indicates that a memory access operation is needed.

The controller forwards the row and column portions of the address with the proper timing, providing the address multiplexing function.

The R/W and CS signals are then sent to the memory.

The data lines are connected directly between the processor and the memory.

[Figure: the processor sends the address, R/W, Request, and clock signals to the memory controller; the controller drives the memory's row/column address lines, RAS, CAS, R/W, CS, and clock; the data lines connect the processor and the memory directly.]


Associative Memory

Reduces the search time efficiently.

The address is replaced by the content of the data, so it is called Content Addressable Memory (CAM); data are accessed by content.

Hardware requirement:

It consists of a memory array and matching logic for m words of n bits each.

The argument register (A) and key register (K) each have n bits.

The match register (M) has m bits, one for each word in memory.

Each word in memory is compared in parallel with the content of the argument register; the key register selects which bits take part in the comparison.

If a word matches the bits of the argument register, its corresponding bit in the match register is set, and the search for the data word is over (see the sketch below).
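A minimal sketch (Python, illustrative names) of the associative search described above: every stored word is compared against the argument register (here in a loop rather than in parallel hardware), with the key register masking the bits that participate, and the match register collects one bit per word.

def cam_search(words, argument, key):
    """Return the match register: one bit per stored word."""
    match = []
    for w in words:
        # A word matches if it agrees with the argument on all unmasked (key = 1) bits.
        match.append(int((w & key) == (argument & key)))
    return match

words = [0b10101100, 0b01000001, 0b10100111]
argument = 0b10100000   # pattern we are looking for
key      = 0b11110000   # only the high-order 4 bits take part in the comparison
print(cam_search(words, argument, key))   # [1, 0, 1]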


Cache Memory

A relatively small SRAM memory [having low access time] located physically closer to the processor.

Locality of Reference:

The references to memory at any given time interval tend to be confined within localized areas.

Such an area contains a set of information, and its membership changes gradually as time goes by.

Temporal locality: information that will be used in the near future is likely to be in use already (e.g., reuse of information in loops).

Spatial locality: if a word is accessed, adjacent (nearby) words are likely to be accessed soon (e.g., related data items such as arrays are usually stored together; instructions are executed sequentially).

The cache is a fast, small-capacity memory that should hold the information that is most likely to be accessed.

[Figure: CPU - cache memory - main memory.]


Performance of Cache Memory

All memory accesses are directed first to the cache.

If the word is in the cache, the cache provides it to the CPU: a CACHE HIT.

If the word is not in the cache, a block (or line) including that word is brought in to replace a block now in the cache: a CACHE MISS.

Hit ratio: the percentage of memory accesses that are satisfied by the cache.

Effective memory access time in a cache memory system:

Te = h*Tc + (1 - h)*(Tc + Tm)

where Te is the effective memory access time, Tc the cache access time, Tm the main memory access time, and h the hit ratio.

Example: Tc = 0.4 and Tm = 1.2 (in the same time unit), h = 85%
Te = 0.85*0.4 + (1 - 0.85)*1.6 = 0.58
(A small sketch of this calculation follows below.)

Cache write policies:

Write-Through
If Hit, both the cache and main memory are written in parallel.
If Miss, only main memory is written.
For a read miss, the missing block may be loaded onto a cache block.

Write-Back (Copy-Back)
If Hit, only the cache is written: update only the cache location and mark it as updated with an associated flag bit, called the dirty / modified bit.
If Miss, the missing block is brought into the cache and then written there.
For a read miss, the candidate block (if dirty) must first be written back to memory before being replaced.
Memory is not always up to date, i.e., the same item in the cache and in memory may have different values; this is called the cache coherence problem.
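A minimal sketch (Python) of the effective-access-time formula above, reproducing the 0.58 result:

def effective_access_time(h, tc, tm):
    # Hits cost Tc; misses cost Tc (the failed cache lookup) plus Tm.
    return h * tc + (1 - h) * (tc + tm)

print(round(effective_access_time(0.85, 0.4, 1.2), 2))   # 0.58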


MEMORY AND CACHE MAPPING - ASSOCIATIVE MAPPING -

Mapping function: the specification of the correspondence between main memory blocks and cache blocks. Three mapping functions are used:
Associative mapping
Direct mapping
Set-associative mapping

Associative Mapping

Any block location in the cache can store any block of memory: the most flexible scheme.

It is fast but very expensive, because the mapping table stores both the address and the content of the memory word, i.e., it is a CAM.

The CPU address (15 bits) is placed in the argument register and compared with all stored addresses. Mapping-table example (addresses and data in octal):

Address    Data
01000      3450
02777      6710
22235      1234


- DIRECT MAPPING -

Each memory block has only one place where it can be loaded in the cache.

The mapping table is made of RAM instead of CAM.

The n-bit memory address consists of two parts: k bits of Index field and n-k bits of Tag field.

The n-bit address is used to access main memory, and the k-bit Index is used to access the cache.

Addressing relationships: main memory is 32K x 12, so the address is 15 bits (00000 to 77777 octal), split into a 6-bit Tag and a 9-bit Index; the cache uses a 9-bit address with 12-bit data words.

Direct mapping cache organization (addresses and data in octal):

Main memory (address = 15 bits, data = 12 bits):
Address    Data
00000      1220
00777      2340
01000      3450
01777      4560
02000      5670
02777      6710

Cache memory (address = 9 bits, data = 12 bits):
Index    Tag    Data
000      00     1220
777      02     6710


DIRECT MAPPING: Operation

1. The CPU generates a memory request with a memory address (TAG ; INDEX).
2. The cache is accessed using INDEX, yielding an entry (tag ; data); the TAG of the memory address is compared with the tag read from the cache.
3. If the two tags match, it is a cache Hit.
4. The corresponding data in Cache[INDEX] is sent to the CPU.
5. If the tags do not match, it is a cache Miss, and the required data is read from main memory.
6. The new data from memory, M[TAG ; INDEX], is stored in the cache together with its tag.
7. Cache[INDEX] <- (TAG ; M[TAG ; INDEX])

(A minimal code sketch of this lookup follows below.)
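A minimal sketch (Python, illustrative names and sizes) of the direct-mapping operation above: the address is split into TAG and INDEX, the line at INDEX is checked, and on a miss the line is refilled from `memory` (here just a dict standing in for main memory).

class DirectMappedCache:
    def __init__(self, index_bits):
        self.index_bits = index_bits
        self.lines = {}                                  # index -> (tag, data)

    def access(self, address, memory):
        index = address & ((1 << self.index_bits) - 1)   # low-order k bits
        tag = address >> self.index_bits                 # remaining n-k bits
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            return "hit", line[1]                        # cache hit
        data = memory[address]                           # cache miss: read main memory
        self.lines[index] = (tag, data)                  # Cache[INDEX] <- (TAG, data)
        return "miss", data

# Usage with octal values from the example table (02777 -> index 777, tag 02):
memory = {0o00000: 0o1220, 0o02777: 0o6710}
cache = DirectMappedCache(index_bits=9)
print(cache.access(0o02777, memory))   # ('miss', 3528); 3528 decimal = 6710 octal
print(cache.access(0o02777, memory))   # ('hit', 3528)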

Direct mapping with a block size of 8 words:

[Cache organization: each cache entry holds a block of 8 words. The memory address is divided into a 6-bit Tag, a 6-bit Block field, and a 3-bit Word field; Block and Word together form the INDEX. Example contents (octal):

Index    Tag    Data
000      01     3450    (Block 0)
007      01     6578
010                     (Block 1)
017
770      02             (Block 63)
777      02     6710]


- SET-ASSOCIATIVE MAPPING -

Set-associative mapping: each set of the cache holds two or more words of memory under the same index address.

Each memory block can be loaded into any location of its set in the cache.

Example cache contents (octal), with two (tag, data) pairs per index:

Index    Tag   Data     Tag   Data
000      01    3450     02    5670
777      02    6710     00    2340

Operation:
1. The CPU generates a memory address (TAG ; INDEX).
2. The cache is accessed with INDEX, giving a cache word = ((tag 0 : data 0) ; (tag 1 : data 1)).
3. TAG is compared with tag 0 and then with tag 1.
4. If tag i = TAG, it is a cache Hit and the CPU gets data i.
5. If tag i != TAG for every i, it is a cache Miss (and the set is full), so either (tag 0, data 0) or (tag 1, data 1) is replaced.
   Assume (tag 0, data 0) is selected for replacement.
   (Why (tag 0, data 0) instead of (tag 1, data 1)?)
   Memory[tag 0, INDEX] <- Cache[INDEX](data 0)
   Cache[INDEX](tag 0, data 0) <- (TAG, M[TAG, INDEX])
   CPU <- Cache[INDEX](data 0)

(A sketch of a two-way set lookup follows below.)
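A minimal sketch (Python, illustrative) of the two-way set-associative lookup above. The replacement choice is the open question on the slide, so this sketch simply replaces the first entry of a full set, as the slide's example assumes.

class TwoWaySetAssociativeCache:
    def __init__(self, index_bits):
        self.index_bits = index_bits
        self.sets = {}                       # index -> list of (tag, data), at most 2

    def access(self, address, memory):
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        entries = self.sets.setdefault(index, [])
        for t, d in entries:                 # compare TAG with tag 0, then tag 1
            if t == tag:
                return "hit", d
        data = memory[address]               # miss: fetch from main memory
        if len(entries) < 2:
            entries.append((tag, data))
        else:
            entries[0] = (tag, data)         # replace (tag 0, data 0), as assumed above
        return "miss", data

# Both 01000 and 02000 (octal) map to index 000 with tags 01 and 02, as in the table:
memory = {0o01000: 0o3450, 0o02000: 0o5670}
cache = TwoWaySetAssociativeCache(index_bits=9)
print(cache.access(0o01000, memory), cache.access(0o02000, memory))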


Paging

The external fragmentation problem can be treated by PAGING.

The logical address space of a process can be noncontiguous; the process is allocated physical memory wherever space is available.

Address mapping in the paging scheme:

Divide physical memory into fixed-sized blocks called frames (whose size is a power of 2, in bytes).

Divide logical memory into blocks of the same size, called pages.

Keep track of all free frames.

To run a program of size n pages, find n free frames and load the program.

Set up a page table to translate logical to physical addresses.

Paging may still suffer from internal fragmentation.


Address Translation Scheme

The address generated by the CPU (the logical address) is divided into:

Page number (p): used as an index into a page table, which contains the base address of each page in physical memory.

Page offset (d): combined with the base address to define the physical memory address that is sent to the memory unit.

Base address + page offset = physical address


Paging Model of Logical and Physical Memory

[Figure: the page number is used as an index into the page table, which holds the base address of each page in physical memory.]


Paging Example

[Figure: a 32-byte memory with 4-byte pages. Pages 0-3 of logical memory are mapped through the page table, indexed by page number, to frames of physical memory.]

Physical Address = (frame no. x page size) + page offset


Logical Address -> Address Mapping -> Physical Address

Physical Address = (frame no. x page size) + page offset

Example:
Page size is 4 bytes.
Physical memory size is 32 bytes.
Hence the total number of frames is 32 bytes / 4 bytes = 8.

If the logical address is 0, what is its corresponding physical address?
Page number = 0, which is in frame 5 as given in the page table.
So physical address = (5 x 4) + 0 = 20.

If the logical address is 3, what is its corresponding physical address?
Page offset (d) = displacement within the page = 3 - 0 = 3. Page number = 0, which is in frame 5 as given in the page table.
So physical address = (5 x 4) + 3 = 23.


Paging Example

If the logical address is 4, what is its corresponding physical address?
Page offset (d) = displacement within the page = 4 - 4 = 0. Page number = 1, which is in frame 6 as given in the page table.
So physical address = (6 x 4) + 0 = 24.

If the logical address is 10, what is its corresponding physical address?
Page offset (d) = displacement within the page = 10 - 8 = 2. Page number = 2, which is in frame 1 as given in the page table.
So physical address = (1 x 4) + 2 = 06.

If the logical address is 13, what is its corresponding physical address?
Page offset (d) = displacement within the page = 13 - 12 = 1. Page number = 3, which is in frame 2 as given in the page table.
So physical address = (2 x 4) + 1 = 09.

If the logical address is 15, what is its corresponding physical address?
Page offset (d) = displacement within the page = 15 - 12 = 3. Page number = 3, which is in frame 2 as given in the page table.
So physical address = (2 x 4) + 3 = 11.

(A small sketch reproducing these calculations follows below.)
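A minimal sketch (Python) reproducing the worked examples above; the page table (page -> frame) is taken from the examples: page 0 -> frame 5, 1 -> 6, 2 -> 1, 3 -> 2, with a page size of 4 bytes.

PAGE_SIZE = 4
PAGE_TABLE = {0: 5, 1: 6, 2: 1, 3: 2}              # page number -> frame number

def translate(logical_address):
    page = logical_address // PAGE_SIZE            # page number
    offset = logical_address % PAGE_SIZE           # displacement within the page
    frame = PAGE_TABLE[page]
    return frame * PAGE_SIZE + offset              # (frame no. x page size) + offset

for la in (0, 3, 4, 10, 13, 15):
    print(la, "->", translate(la))                 # 20, 23, 24, 6, 9, 11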


Paging Hardware with TLB

[Figure: paging hardware with a TLB (a set of associative registers) that is consulted before the page table.]


Virtual Memory

Virtual memory: separation of user logical memory from physical memory.

Only part of the program needs to be in memory for execution.

The logical address space can therefore be much larger than the physical address space.

Allows address spaces to be shared by several processes.

Allows for more efficient process creation.

Virtual memory can be implemented via:
Demand paging
Demand segmentation


Virtual Memory That Is Larger Than Physical Memory


    Page Replacement


Page Replacement Algorithms

We want the lowest page-fault rate.

An algorithm is evaluated by running it on a particular string of memory references (a reference string) and computing the number of page faults on that string.


First-In-First-Out (FIFO) Algorithm

Replacement depends upon the arrival time of a page in memory.

A page is replaced when it is the oldest (in ascending order of page arrival time in memory).

Since a FIFO queue is used, there is no need to record the arrival time of a page: the page at the head of the queue is replaced.

Performance is not always good:

When an active page is replaced to bring in a new page, a page fault occurs almost immediately to retrieve the active page.

To get the active page back, some other page has to be replaced. Hence the number of page faults increases.


FIFO Page Replacement

[Figure: FIFO page replacement traced on a reference string with 3 frames; a few references hit, and there are 15 page faults in total.]


Problem with FIFO Algorithm

Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

With 3 frames (3 pages can be in memory at a time per process): 9 page faults.

With 4 frames: 10 page faults, an unexpected increase.

Belady's Anomaly: more frames can lead to more page faults (see the FIFO sketch below).
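A minimal sketch (Python) of FIFO replacement that reproduces the counts above (9 faults with 3 frames, 10 with 4), illustrating Belady's anomaly:

from collections import deque

def fifo_page_faults(reference_string, num_frames):
    frames = deque()                 # oldest page at the left (head of the FIFO queue)
    faults = 0
    for page in reference_string:
        if page not in frames:       # page fault
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()     # replace the oldest page
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_page_faults(refs, 3))   # 9 page faults
print(fifo_page_faults(refs, 4))   # 10 page faults (Belady's anomaly)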


Optimal Algorithm

To avoid Belady's anomaly, use the optimal page replacement algorithm.

Replace the page that will not be used for the longest period of time.

This guarantees the lowest possible page-fault rate for a fixed number of frames.

Example:
First, 3 page faults occur to fill the frames.
Then page 7 is replaced with page 2, because page 7 will not be needed until the 18th place in the reference string.
Finally, there are 9 page faults in total.
Hence it is better than the FIFO algorithm (15 page faults).


Optimal Page Replacement

[Figure: optimal page replacement traced on the same reference string with 3 frames; several references hit, and there are 9 page faults in total.]


Difficulty with Optimal Algorithm

Replace the page that will not be used for the longest period of time.

4-frames example, reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults.

The optimal algorithm is mainly used for measuring how well other algorithms perform, because it always needs future knowledge of the reference string (see the sketch below).
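A minimal sketch (Python) of optimal replacement that reproduces the 6 page faults above with 4 frames; on a fault with full frames, it evicts the resident page whose next use lies farthest in the future (or that is never used again).

def optimal_page_faults(reference_string, num_frames):
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue
        # Evict the page whose next use is farthest away (never used again -> infinity).
        def next_use(p):
            future = reference_string[i + 1:]
            return future.index(p) if p in future else float("inf")
        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(optimal_page_faults(refs, 4))   # 6 page faults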


Least Recently Used (LRU) Algorithm

The LRU algorithm lies between FIFO and the optimal algorithm (in terms of page faults).

FIFO uses the time when a page was brought into memory.

OPTIMAL uses the time when a page will next be used.

LRU uses the recent past as an approximation of the near future: a recently used page is likely to be used again soon (so it should not be replaced), and we replace the page that has not been used for the longest period of time. Hence it is the least recently used algorithm.

Example:
Up to the 5th page fault it behaves the same as the optimal algorithm.
When page 4 occurs, LRU chooses page 2 for replacement.
Here we find only 12 page faults.


LRU Page Replacement

[Figure: LRU page replacement traced on the same reference string with 3 frames; several references hit, and there are 12 page faults in total.]


Least Recently Used (LRU) Algorithm

Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

[Figure: LRU with 4 frames on this reference string; the frame contents after successive faults are (1, 2, 3, 4), (1, 2, 5, 4), (1, 2, 5, 3), (1, 2, 4, 3), and (5, 2, 4, 3).]

    Counter implementation

    Every page entry has a counter; every time page is

    referenced through this entry, copy the clock into thecounter

    When a page needs to be changed, look at thecounters to determine which are to change