Operating Systems Lecture 6: Memory System Design and Implementation William M. Mongan Max Shevertalov Jay Kothari *This lecture was derived from material in the Operating System Concepts, 8 th Edition textbook and accompanying slides. Contains Copyrighted Material Some slides adapted from the text and are copyright: 2009 Silbershatz, Galvin and Gagne, All Rights Reserved Lec 6 Operating Systems 1
89
Embed
Operating Systems Lecture 6: Memory System Design and Implementation William M. Mongan Max Shevertalov Jay Kothari *This lecture was derived from material.
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Operating Systems
Lecture 6: Memory System Design and Implementation
William M. MonganMax Shevertalov
Jay Kothari
*This lecture was derived from material in the Operating System Concepts, 8 th Edition textbook and accompanying slides. Contains Copyrighted Material
Some slides adapted from the text and are copyright: 2009 Silbershatz, Galvin and Gagne, All Rights Reserved
Lec 6 Operating Systems 1
Chapter 8: Memory Management• Background• Swapping • Contiguous Memory Allocation• Paging• Structure of the Page Table• Segmentation• Example: The Intel Pentium
Lec 6 Operating Systems 2
Objectives• To provide a detailed description of various ways
of organizing memory hardware• To discuss various memory-management
techniques, including paging and segmentation• To provide a detailed description of the Intel
Pentium, which supports both pure segmentation and segmentation with paging
Lec 6 Operating Systems 3
Virtualizing Resources
• Physical Reality– Different Processes/Threads share the same hardware
• Need to multiplex CPU
• Need to multiplex use of memory
• Need to muliplex disk and services
• The complete working state of a process and/or kernel is defined by its data in memory (and registers)
– Consequently, cannot just let different threads of control use the same memory
• Physics: two different pieces of data cannot occupy the same locations in memory
– Moreover, probably don’t want different processes to even have access to each other’s memory (protection)
Lec 6 Operating Systems 4
Important Aspects of Memory Multiplexing
• Controlled Overlap:– Separate state of threads should not collide in physical memory.– Would like to overlap when desired for communication
• Protection– Prevent access to private memory of other processes
• Different pages of memory can be given special behavior (read only, invisible to user programs, etc.)
• Kernel data protected from User programs
• Programs protected from themselves
Lec 6 Operating Systems 5
Example of General Address Translation
Lec 6 Operating Systems 6
Two Views of Memory
• Address Space:– All the addresses and state a process can touch– Each process and kernel as a different address space
• Consequently, two views of memory:– View from the CPU (what program sees – virtual memory)– View from memory (physical memory)– Translation “box” converts between the two views; only the kernel can modify
• Translation helps to implement protection– If task A cannot even gain access to task B’s data, no way for A to adversely affect B
• With translation, every program can be linked/loaded into same region of user address space
– Overlap avoided through translation, not relocation– Every main() thinks it is starting at the same address!
Lec 6 Operating Systems 7
Memory-Management Unit (MMU)• Hardware device that maps virtual to physical address
• In MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory
• The user program deals with logical addresses; it never sees the real physical addresses
Lec 6 Operating Systems 8
User Kernel (System Call)• We can’t let the inmate (user) get out of this padded cell on
its own– Would defeat the purpose of protection!– So, how does the user program get back into the kernel?
• System call: voluntary procedure call into the kernel– Hardware for controlled User Kernel transition– Can any kernel routine be called? No! only specific ones– System call ID is encoded into the system call instruction
• This indexing forces a well defined interface with the kernel
Lec 6 Operating Systems 9
Background
• Program must be brought (from disk) into memory and placed within a process for it to be run
• Main memory and registers are only storage CPU can access directly
• Register access in one CPU clock (or less)• Main memory can take many cycles• Cache sits between main memory and CPU registers• Protection of memory required to ensure correct
operation
Lec 6 Operating Systems 10
Binding of Instructions and Data to Memory
• Address binding of instructions and data to memory addresses can happen at three different stages
– Compile time: If memory location known a priori, absolute code can be generated; must recompile code if starting location changes
– Load time: Must generate relocatable code if memory location is not known at compile time
– Execution time: Binding delayed until run time if the process can be moved during its execution from one memory segment to another. Need hardware support for address maps (e.g., base and limit registers)
Lec 6 Operating Systems 11
Multistep Processing of a User Program
Lec 6 Operating Systems 12
Dynamic relocation using a relocation register
Lec 6 Operating Systems 13
Base and Limit Registers• A pair of base and limit
registers define the logical address space (segment)
• Translate virtual address to physical address by adding “base” – so long as it is within the range specified by the “limit.”
• All addresses relative to the segment, so no relocation needed when the base or limit changes.
Lec 6 Operating Systems 14
Base and Limit Registers• This gives the illusion
that the program is running on its own dedicated machine, with memory starting at 0
– Program gets continuous region of memory
– Addresses within the program do not have to be relocated when the program is placed in a different region of DRAM, or when the program is moved to another segment.
Lec 6 Operating Systems 15
Contiguous Allocation• Main memory usually into two partitions:
– Resident operating system, usually held in low memory with interrupt vector
– User processes then held in high memory
• Relocation registers used to protect user processes from each other, and from changing operating-system code and data
– Base register contains value of smallest physical address– Limit register contains range of logical addresses – each
logical address must be less than the limit register – MMU maps logical address dynamically
Lec 6 Operating Systems 16
Issues with Contiguous Allocation Method
• Though it is simple, it suffers from some problems:– Fragmentation problem
• Not every process is the same size
• Over time, memory space becomes fragmented
• It is difficult for the process memory space to grow dynamically (e.g. heap)
– Hard to do interprocess sharing• Want to share code segments when possible (libraries)
• Want to share memory between processes
• How to do this without sharing the entire segment?– Helped by providing multiple segments per process
addresses; each table entry has:– base – contains the starting physical address where the
segments reside in memory– limit – specifies the length of the segment
• Segment-table base register (STBR) points to the segment table’s location in memory
• Segment-table length register (STLR) indicates number of segments used by a program;
segment number s is legal if s < STLR
Lec 6 Operating Systems 26
Segmentation Architecture• Protection
– With each entry in segment table associate:• validation bit = 0 illegal segment
• read/write/execute privileges
• Protection bits associated with segments; code sharing occurs at segment level
• Since segments vary in length, memory allocation is a dynamic storage-allocation problem
• A segmentation example is shown in the following diagram
Lec 6 Operating Systems 27
Segmentation Hardware
Lec 6 Operating Systems 28
Implementing the Multi-Segment Model
• Segment map resides in processor• Segment number mapped into base/limit pair• Base added to offset to generate physical address (and checked
against range limit)
• As many chunks of physical memory as entries• Segment addressed by portion of virtual address
• What is V/N?• Can mark segments as invalid – why not just clear entries?
Lec 6 Operating Systems 29
Paging• Logical address space of a process can be
noncontiguous; process is allocated physical memory whenever the latter is available
• Divide physical memory into fixed-sized blocks called frames (size is power of 2, between 512 bytes and 8,192 bytes)
• Divide logical memory into blocks of same size called pages
• Keep track of all free frames
• To run a program of size n pages, need to find n free frames and load program
• Set up a page table to translate logical to physical addresses
• Internal fragmentation
Lec 6 Operating Systems 30
Implementing Paging
• Page Table (one per process)– Resides in physical memory– Contains physical page and permission for each virtual page (valid, read/write, etc.)
• Virtual address mapping– Offset from virtual address copied to physical address
• 10 bit offset 1024-byte pages
• Addresses are relative to a page
– Virtual page # is all remaining bits• Example for 32 bits: 32-10=22 bits, i.e. 4 million entries
• Physical page # copied from table into physical address
• Check page table bounds and permissions
Lec 6 Operating Systems 31
Address Translation Scheme• Address generated by CPU is divided into:
– Page number (p) – used as an index into a page table which contains base address of each page in physical memory
– Page offset (d) – combined with base address to define the physical memory address that is sent to the memory unit
– For given logical address space 2m and page size 2n
page number page offset
p d
m - n n
Lec 6 Operating Systems 32
Paging Hardware
Lec 6 Operating Systems 33
Paging Model of Logical and Physical Memory
Lec 6 Operating Systems 34
Paging Example
32-byte memory and 4-byte pages
Lec 6 Operating Systems 35
Free Frames
Before allocation After allocation
Lec 6 Operating Systems 36
Implementation of Page Table• Page table is kept in main memory• Page-table base register (PTBR) points to the page
table• Page-table length register (PRLR) indicates size of the
page table• In this scheme every data/instruction access requires
two memory accesses. One for the page table and one for the data/instruction.
• The two memory access problem can be solved by the use of a special fast-lookup hardware cache called associative memory or translation look-aside buffers (TLBs)
• Some TLBs store address-space identifiers (ASIDs) in each TLB entry – uniquely identifies each process to provide address-space protection for that process
Lec 6 Operating Systems 37
Memory Protection• Memory protection implemented by associating
protection bit with each frame
• Valid-invalid bit attached to each entry in the page table:
– “valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page
– “invalid” indicates that the page is not in the process’ logical address space
Lec 6 Operating Systems 38
Valid (v) or Invalid (i) Bit In A Page Table
Lec 6 Operating Systems 39
Shared Pages• Shared code
– One copy of read-only (reentrant) code shared among processes (i.e., text editors, compilers, window systems).
– Shared code must appear in same location in the logical address space of all processes
• Private code and data – Each process keeps a separate copy of the code and data– The pages for the private code and data can appear
anywhere in the logical address space
Lec 6 Operating Systems 40
Shared Pages Example
Lec 6 Operating Systems 41
Structure of the Page Table• Hierarchical Paging (Forward Page Tables)
• Inverted Page Tables
• Hashed Page Tables
Lec 6 Operating Systems 42
Hierarchical Page Tables• Break up the logical address space into multiple page
tables
• A simple technique is a two-level page table
Lec 6 Operating Systems 43
Multi-Level Translation
• What must be saved/restored on context switch?
• Sharing– Complete segments– Individual pages
Lec 6 Operating Systems 44
Multi-Level Translation
• What must be saved/restored on context switch?– Contents of top level segment registers (for this example)– Pointer to top level table (page table)
• Sharing– Complete segments– Individual pages
Lec 6 Operating Systems 45
Two-Level Page-Table Scheme
Lec 6 Operating Systems 46
Two-Level Paging Example• A logical address (on 32-bit machine with 1K page size) is divided into:
– a page number consisting of 22 bits– a page offset consisting of 10 bits
• Since the page table is paged, the page number is further divided into:– a 12-bit page number – a 10-bit page offset
• Thus, a logical address is as follows:
where pi is an index into the outer page table, and p2 is the displacement within the page of the outer page table
page number page offset
pi p2 d
12 10 10
Lec 6 Operating Systems 47
Address-Translation Scheme
Lec 6 Operating Systems 48
Three-level Paging Scheme
Lec 6 Operating Systems 49
Inverted Page Table
• The size of these page tables is at least as large as the amount of virtual memory allocated to each process.
• Physical memory may be much less– Much of process space may be out on disk or not in use
• Use a hash table instead– Inverted page table is independent of virtual address space– Directly related to amount of physical memory instead– Very attractive option for 64-bit address spaces
• Cons: complexity in managing hash changes– Often in hardware!
Lec 6 Operating Systems 50
Inverted Page Table
• One entry for each real page of memory• Entry consists of the virtual address of the page
stored in that real memory location, with information about the process that owns that page
• Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs
• Use hash table to limit the search to one — or at most a few — page-table entries
Lec 6 Operating Systems 51
Inverted Page Table Architecture
Lec 6 Operating Systems 52
Hashed Page Tables• Common in address spaces > 32 bits
• The virtual page number is hashed into a page table– This page table contains a chain of elements hashing to the same
location
• Virtual page numbers are compared in this chain searching for a match
– If a match is found, the corresponding physical frame is extracted
Lec 6 Operating Systems 53
Hashed Page Table
Lec 6 Operating Systems 54
Example: The Intel Pentium
• Supports both segmentation and segmentation with paging• CPU generates logical address
– Given to segmentation unit• Which produces linear addresses
– Linear address given to paging unit• Which generates physical address in main memory
• Paging units form equivalent of MMU
Lec 6 Operating Systems 55
Logical to Physical Address Translation in Pentium
Lec 6 Operating Systems 56
What is in a Page Table Entry (PTE)
• Pointer to the next-level page table or the actual page• Permission bits: valid, read-only, read-write, write-only• Example: Intel x86 PTE
– Address format shown above– Intermediate page tables called directories
Lec 6 Operating Systems 57
Intel Pentium Segmentation
Lec 6 Operating Systems 58
Pentium Paging Architecture
Lec 6 Operating Systems 59
Linear Address in LinuxBroken into four parts:
Lec 6 Operating Systems 60
Three-level Paging in Linux
Lec 6 Operating Systems 61
Dynamic Loading• Routine is not loaded until it is called• Better memory-space utilization; unused routine is
never loaded• Useful when large amounts of code are needed to
handle infrequently occurring cases• No special support from the operating system is
required implemented through program design
Lec 6 Operating Systems 62
Dynamic Linking• Linking postponed until execution time• Small piece of code, stub, used to locate the
appropriate memory-resident library routine• Stub replaces itself with the address of the routine,
and executes the routine• Operating system needed to check if routine is in
processes’ memory address• Dynamic linking is particularly useful for libraries• System also known as shared libraries
Lec 6 Operating Systems 63
Swapping• A process can be swapped temporarily out of memory to a backing
store, and then brought back into memory for continued execution
• Backing store – fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images
• Roll out, roll in – swapping variant used for priority-based scheduling algorithms; lower-priority process is swapped out so higher-priority process can be loaded and executed
• Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped
• Modified versions of swapping are found on many systems (i.e., UNIX, Linux, and Windows)
• System maintains a ready queue of ready-to-run processes which have memory images on disk
Lec 6 Operating Systems 64
Schematic View of Swapping
Lec 6 Operating Systems 65
• Extreme form of context switch• In order to make room for the next process, some or all of the previous
process is moved to disk• Likely need to send out complete segments
• This greatly increases the cost of context switching
• Desirable alternative• Some way to keep only active portions of a process in memory at any
one time• Need finer granularity control over physical memory
Summary of Memory Allocation• Multi-Segment Model enables each process to think it has access to
all the memory it needs• Pages finer granularity of data so that not everything needs to be
swapped during a context switch• Swapping means moving some of the process’ memory from
physical memory to the backing store– At an extreme, we could bring in every page for every process just-in-time,
maximizing use of memory by only holding pages that are currently in demand.– This is called demand paging
• Simple paging page table may get very large for a sparse address space
– Combine segmentation and paging multi-level translation– May be slow if lots of levels
• Inverted page table store a hash table of physical addresses to save space
Lec 6 Operating Systems 66
The Memory Hierarchy: Cache
• Cache: a repository for copies that can be accessed more quickly than the original
– Can’t make all memory available in small cache on-demand– Make frequent case (hit) fast and infrequent case (miss) less dominant
• Caching underlies many of the techniques that are used today to make computers fast
– Can cache: memory locations, address translations (page tables TLB), pages, file blocks, file names, network routes, etc.
• Only good if:– Frequent case frequent enough and– Infrequent case not too expensive
• Average Memory Access Time = (Hit Rate * Hit Time) + (Miss Rate * Miss Time)
Lec 6 Operating Systems 67
Why Cache?• Processor-DRAM latency gap• Processor: ~60% per year (doubles every 1.5 years)• Memory: ~9% per year (doubles every 10 years)• Gap grows 50% every year
Lec 6 Operating Systems 68
Why Cache?
• Take advantage of temporal and spatial locality– Keep things that were accessed recently closer to the processor for
temporal locality– Move contiguous blocks to upper levels for spatial locality
Lec 6 Operating Systems 69
Why Cache?
• Take advantage of temporal and spatial locality– Keep things that were accessed recently closer to the processor for
temporal locality– Move contiguous blocks to upper levels for spatial locality
Lec 6 Operating Systems 70
Where does data go into a cache?
Lec 6 Operating Systems 71
Where does data go into a cache?
Lec 6 Operating Systems 72
Sources of Cache Misses
• Compulsory (cold start or process migration, first reference)
• Capacity
• Conflict (collision)
• Coherence (invalidation)
Lec 6 Operating Systems 73
Sources of Cache Misses
• Compulsory (cold start or process migration, first reference)
– Cold fact of life: not a whole lot you can do about it– If you are going to run “billions” of instructions, these are insignificant
• Capacity– Cache cannot contain all blocks accessed by the program– Solution: increase cache size
• Conflict (collision)– Multiple memory locations mapped to the same location– Solution: increase cache size and/or associativity
• Coherence (invalidation)– Other processes (e.g. I/O) updates memory
Lec 6 Operating Systems 74
Accessing Data in a Cache
• Index Used to Lookup Candidates in Cache– Index identifies the set
• Tag used to identify actual copy (otherwise, cache miss)
• Block is minimum quantum of caching– Data select field used to select data within block– Many caching applications don’t have data select field
Lec 6 Operating Systems 75
Lec 6 Operating Systems 76
Locating data in the Cache
• Compare cache index (mapping) to Tag (high-order bits) to see if element is currently in cache.
• Valid bit used to indicate whether data in cache is valid.
• A hit occurs when the data requested is in cache, otherwise it is a miss.
• The extra time required when a cache miss occurs is called the miss penalty.
Address (showing bit positions)
20 10
Byteoffset
Valid Tag DataIndex
0
1
2
1021
1022
1023
Tag
Index
Hit Data
20 32
31 30 13 12 11 2 1 0
Lec 6 Operating Systems 77
Mapping an Address to a Multiword Cache Block
• (Block address) mod (Number of cache blocks)• Range: l = Byte address/Bytes per block Bytes per block
r = l + (Bytes per block - 1)Address (showing bit positions)
16 12 Byteoffset
V Tag Data
Hit Data
16 32
4Kentries
16 bits 128 bits
Mux
32 32 32
2
32
Block offsetIndex
Tag
31 16 15 4 32 1 0
Lec 6 Operating Systems 78
Locating a Block in Cache
• Check the tag of every cache block in the appropriate set
• Address consists of 3 parts
• Replacement strategy:
E.G. Least Recently Used (LRU)
tag index block offset
Program Assoc. I miss rate D miss rate Combined rategcc 1 2.0% 1.7% 1.9%
2 1.6% 1.4% 1.5%4 1.6% 1.4% 1.5%
Address
22 8
V TagIndex
0
1
2
253
254255
Data V Tag Data V Tag Data V Tag Data
3222
4-to-1 multiplexor
Hit Data
123891011123031 0
What happens on a write?
• Write-through: Write to both the cache and the block in lower level memory
• Write-back: Only write to the cache for speed– Modified cache block is written to main memory only when it is replaced– Dirty bit indicates whether this write is required
• Pros and Cons– Write-Through
• Read misses cannot result in writes (+)• Processor held up on writes unless writes buffered (-)
– Write-Back• Repeated writes not sent to DRAM (+)• Processor not held up on writes (+)• More complex (-)• Read miss may require writeback of dirty data (-)
Lec 6 Operating Systems 79
Caching Address Translation (TLB)
• Can’t afford to touch the page table and translate on every access
– At least 3 DRAM access per actual DRAM access– Or possibly disk access!
• Worse, what if we are using caching to make memory access faster than DRAM access? Then we touch DRAM to access cache.
• Cache translations with the TLB
Lec 6 Operating Systems 80
Caching Address Translation (TLB)
• Does page locality exist?– Instruction accesses spend a lot of time on the same page (since
accesses sequential)– Stack addresses have definite locality of reference– Data accesses have less page locality, but still some
• TLB hierarchy possible, with multiple levels at different sizes and speeds (just like traditional data cache!)
– If p is in associative register, get frame # out– Otherwise get frame # from page table in memory
Lec 6 Operating Systems 82
Paging Hardware With TLB
Lec 6 Operating Systems 83
Effective Access Time• Associative Lookup = time unit• Assume memory cycle time is 1 microsecond• Hit ratio – percentage of times that a page number is
found in the associative registers; ratio related to number of associative registers
• Hit ratio = • Effective Access Time (EAT)
EAT = (1 + ) + (2 + )(1 – )
= 2 + –
Lec 6 Operating Systems 84
What Happens on a TLB Miss?
• Hardware traversed page tables:– On TLB miss, hardware in MMU looks at current page table to fill TLB
(may walk multiple levels)• If PTE valid, hardware fills TLB and processor never knows
• If PTE marked as invalid, page fault, and the kernel decides what to do
• Processor fault handler typically handles this fault first for speed
– Kernel traverses page table to find PTE• If PTE valid, fill TLB and return from fault
• If PTE invalid, internally call page fault handler
• Most chipsets provide hardware traversal– Modern OS tend to have more TLB faults since they use translation for
many things, like shared segments and user-level portions of the OS
Lec 6 Operating Systems 85
What Happens on a Context Switch?
• TLB maps virtual addresses to physical addresses• Since the address space just changed, TLB entries are no
longer valid• Options
– Invalid TLB (simple but expensive – what if there are frequent context switches?)
– Include PID in the TLB (requires hardware support)
• What if translation tables change?– What if a page is swapped in or out, or replaced?– MUST invalidate TLB entry
• Otherwise, might think the page is still in memory!
Lec 6 Operating Systems 86
What TLB Organization Makes Sense?• Sits between CPU and cache• As such, needs to be very fast
– Critical path of memory access: this reduces effective cache speed since TLB lookup is at a minimum on accesses to cache
– Hence, L1 cache is often virtually addressed
• So, TLB should be direct mapped or be low associativity?– But we can’t afford conflict misses!– TLB miss time is very high– So the cost of a conflict miss is much higher than a slightly increased hit time
• What if we use low order bits of page as index into TLB?– First page of code, data, stack may map to same entry– Need at least 3-way associativity
• What if we use high order bits as index?– TLB mostly unused for small programs
Lec 6 Operating Systems 87
TLB Organization
• TLB is usually small: 128-512 entries• Small size supports high associativity: often fully
associative• Lookup is by virtual address, PTE is returned• What if fully-associative is too slow for me?
Lec 6 Operating Systems 88
TLB Organization
• TLB is usually small: 128-512 entries• Small size supports high associativity: often fully
associative• Lookup is by virtual address, PTE is returned• What if fully-associative is too slow for me?
– Put a small (4-16) entry direct mapped cache in front– Called a “TLB slice”
• Also, overlap TLB lookup with cache access, since the offset is already available