Operating Systems & Memory Systems: Address Translation CPS 220 Professor Alvin R. Lebeck Fall 2001

Jan 06, 2018


Susan Haynes

Transcript
Page 1: Operating Systems & Memory Systems: Address Translation CPS 220 Professor Alvin R. Lebeck Fall 2001.

Operating Systems & Memory Systems: Address Translation

CPS 220, Professor Alvin R. Lebeck

Fall 2001

Page 2

CPS 220 2© Alvin R. Lebeck 2001

Outline

• Address Translation
  – basics
  – 64-bit Address Space
• Managing memory
• OS Performance

Throughout
• Review Computer Architecture
• Interaction with Architectural Decisions

Page 3

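System Organization

[Figure: system organization. Components shown: Processor, Cache, Core Chip Set, Main Memory, and an I/O Bus carrying a Disk Controller (two Disks), a Graphics Controller (Graphics), and a Network Interface (Network). Devices signal the processor with interrupts.]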

Page 4

Computer Architecture

• Interface Between Hardware and Software

[Figure: layered view. Software: Applications, Compiler, Operating System. Hardware: CPU, Memory, I/O, Multiprocessor, Networks. A label on the figure reads "This is IT".]

Page 5

Memory Hierarchy 101

[Figure: processor (P), cache ($), memory, and magnetic storage]

• P: very fast, 1 ns clock, multiple instructions per cycle
• $: SRAM, fast, small, expensive
• Memory: DRAM, slow, big, cheap (called physical or main)
• Magnetic: really slow, really big, really cheap

=> Cost Effective Memory System (Price/Performance)

Page 6

Virtual Memory: Motivation

• Process = Address Space + thread(s) of control
• Address space = PA
  – programmer controls movement from disk
  – protection?
  – relocation?
• Linear Address space
  – larger than physical address space
    » 32, 64 bits vs. 28-bit physical (256MB)
• Automatic management

[Figure: Virtual address space mapped onto Physical memory]

Page 7

Virtual Memory

• Process = virtual address space + thread(s) of control
• Translation
  – VA -> PA
  – What physical address does virtual address A map to?
  – Is VA in physical memory?
• Protection (access control)
  – Do you have permission to access it?

Page 8

Virtual Memory: Questions

• How is data found if it is in physical memory?

• Where can data be placed in physical memory? Fully Associative, Set Associative, Direct Mapped

• What data should be replaced on a miss? (Take CPS210 …)

Page 9

Segmented Virtual Memory

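• Virtual address (2^32, 2^64) to Physical Address mapping (2^30)
• Variable size, base + offset, contiguous in both VA and PA

[Figure: variable-size segments of the Virtual address space mapped to contiguous regions of Physical memory. Address labels shown: 0x1000, 0x6000, 0x9000, 0x0000, 0x1000, 0x2000, 0x11000.]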
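The base + offset rule above can be sketched in a few lines. This is a minimal illustration, not the mechanism of any particular machine: the `Segment` class, the `SegmentFault` exception, and the example base and limit values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base: int   # physical address where the segment starts (assumed field)
    limit: int  # segment length in bytes (assumed field)


class SegmentFault(Exception):
    """Raised when an offset falls outside the segment."""


def translate_segment(seg: Segment, offset: int) -> int:
    # A segment is contiguous in both VA and PA, so translation is
    # a bounds check followed by one addition.
    if not 0 <= offset < seg.limit:
        raise SegmentFault(f"offset {offset:#x} outside limit {seg.limit:#x}")
    return seg.base + offset


heap = Segment(base=0x6000, limit=0x3000)  # illustrative values only
print(hex(translate_segment(heap, 0x123)))
```

Because the limit is variable, the check cannot be reduced to masking off low-order bits, which is one practical difference from the fixed-size pages discussed later.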

Page 10

Intel Pentium Segmentation

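[Figure: a Logical Address consists of a Seg Selector and an Offset. The selector picks a Segment Descriptor from the Global Descriptor Table (GDT); the descriptor supplies the Segment Base Address, which is combined with the Offset to address the Physical Address Space.]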

Page 11

Pentium Segmentation (Continued)

• Segment Descriptors
  – Local and Global
  – base, limit, access rights
  – Can define many
• Segment Registers
  – contain segment descriptors (faster than load from mem)
  – Only 6
• Must load segment register with a valid entry before segment can be accessed
  – generally managed by compiler, linker, not programmer

Page 12

Paged Virtual Memory

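• Virtual address (2^32, 2^64) to Physical Address mapping (2^28)
  – virtual page to physical page frame
• Fixed Size units for access control & translation

[Figure: fixed-size Virtual pages mapped to Physical page frames; the virtual address splits into Virtual page number and Offset. Address labels shown: 0x1000, 0x6000, 0x9000, 0x0000, 0x1000, 0x2000, 0x11000.]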
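As a small sketch of the virtual page number / offset split, assuming 8 KB pages (the page size, constants, and function names here are illustrative choices, not taken from the slides):

```python
PAGE_SHIFT = 13                 # assumed 8 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1


def split_va(va: int) -> tuple[int, int]:
    """Return (virtual page number, offset within the page)."""
    return va >> PAGE_SHIFT, va & OFFSET_MASK


def join_pa(pfn: int, offset: int) -> int:
    """Rebuild a physical address from a page frame number and offset."""
    return (pfn << PAGE_SHIFT) | offset


vpn, offset = split_va(0x6123)
print(vpn, hex(offset))
```

Only the page number is translated; the offset is copied unchanged, which is why pages must be a power-of-two size.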

Page 13

Page Table

• Kernel data structure (per process)
• Page Table Entry (PTE)
  – VA -> PA translations (if none, page fault)
  – access rights (Read, Write, Execute, User/Kernel, cached/uncached)
  – reference, dirty bits
• Many designs
  – Linear, Forward mapped, Inverted, Hashed, Clustered
• Design Issues
  – support for aliasing (multiple VA to single PA)
  – large virtual address space
  – time to obtain translation
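A minimal sketch of a per-process page table lookup with the PTE fields listed above. The `PTE` layout, the 4 KB page size, and the use of a Python dict keyed by virtual page number (standing in for a linear array) are assumptions made for illustration.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumed 4 KB pages


@dataclass
class PTE:
    pfn: int                  # physical page frame number
    valid: bool = True
    writable: bool = False
    user: bool = True         # accessible from user mode
    referenced: bool = False
    dirty: bool = False


class PageFault(Exception):
    pass


class ProtectionFault(Exception):
    pass


def translate(page_table: dict, va: int, write=False, user_mode=True) -> int:
    vpn = va >> PAGE_SHIFT
    offset = va & ((1 << PAGE_SHIFT) - 1)
    pte = page_table.get(vpn)
    if pte is None or not pte.valid:
        raise PageFault(hex(va))           # no translation: page fault
    if (write and not pte.writable) or (user_mode and not pte.user):
        raise ProtectionFault(hex(va))     # access rights check
    pte.referenced = True                  # reference bit
    pte.dirty = pte.dirty or write         # dirty bit set on writes
    return (pte.pfn << PAGE_SHIFT) | offset
```

Aliasing falls out naturally in this design: two different virtual page numbers can hold PTEs with the same `pfn`.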

Page 14

Alpha VM Mapping (Forward Mapped)

• “64-bit” address divided into 3 segments
  – seg0 (bit 63 = 0) user code/heap
  – seg1 (bit 63 = 1, 62 = 1) user stack
  – kseg (bit 63 = 1, 62 = 0) kernel segment for OS
• Three level page table, each one page
  – Alpha 21064 only 43 unique bits of VA
  – (future min page size up to 64KB => 55 bits of VA)
• PTE bits: valid, kernel & user read & write enable (No reference, use, or dirty bit)
  – What do you do for replacement?

[Figure: virtual address fields seg 0/1 (21 bits), L1 (10), L2 (10), L3 (10), PO (13). Starting from the page table base, each level field is added to the base of that level's table to select an entry; the last level supplies the phys page frame number.]
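The three-level walk can be sketched as follows, using the 10/10/10/13 field widths from the figure. Nested Python dicts stand in for the one-page tables, and the function names are hypothetical; real hardware or PALcode would read PTEs from physical memory instead.

```python
PAGE_SHIFT = 13                       # 8 KB pages, 13-bit page offset
LEVEL_BITS = 10                       # 10 index bits per level
LEVEL_MASK = (1 << LEVEL_BITS) - 1


def alpha_segment(va: int) -> str:
    """Classify a 64-bit address by bits 63 and 62, as on the slide."""
    if (va >> 63) & 1 == 0:
        return "seg0"
    return "seg1" if (va >> 62) & 1 else "kseg"


def split_fields(va: int):
    offset = va & ((1 << PAGE_SHIFT) - 1)
    l3 = (va >> PAGE_SHIFT) & LEVEL_MASK
    l2 = (va >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK
    l1 = (va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK
    return l1, l2, l3, offset


def walk(root: dict, va: int):
    """Return the physical address, or None if any level has no entry."""
    l1, l2, l3, offset = split_fields(va)
    table = root
    for index in (l1, l2):            # one memory reference per level
        table = table.get(index)
        if table is None:
            return None
    pfn = table.get(l3)
    if pfn is None:
        return None
    return (pfn << PAGE_SHIFT) | offset
```

Each level costs one memory reference, which is the TLB miss cost the later slides attribute to forward-mapped tables.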

Page 15

Inverted Page Table (HP, IBM)

• One PTE per page frame

– only one VA per physical frame

• Must search for virtual address

• More difficult to support aliasing

• Force all sharing to use the same VA

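[Figure: the Virtual page number is hashed (Hash) to index the Hash Anchor Table (HAT), which points into the Inverted Page Table (IPT); IPT entries are labeled VA and PA,ST. The Offset is not translated.]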
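A rough sketch of the HAT plus IPT lookup described above. The class, the modulo hash, and the chaining through a `next` field are illustrative assumptions; unmapping and the status bits are omitted.

```python
class InvertedPageTable:
    def __init__(self, num_frames: int, num_buckets: int = 8):
        # One entry per physical frame: (vpn, next_frame) or None.
        self.ipt = [None] * num_frames
        # Hash anchor table: first frame of each hash chain.
        self.hat = [None] * num_buckets

    def _bucket(self, vpn: int) -> int:
        return vpn % len(self.hat)      # placeholder hash function

    def map(self, vpn: int, frame: int) -> None:
        # Assumes the frame is currently unmapped.
        b = self._bucket(vpn)
        self.ipt[frame] = (vpn, self.hat[b])   # push onto the chain
        self.hat[b] = frame

    def lookup(self, vpn: int):
        frame = self.hat[self._bucket(vpn)]    # memory reference to HAT
        while frame is not None:               # list search through IPT
            tag, nxt = self.ipt[frame]
            if tag == vpn:
                return frame                   # frame index is the PFN
            frame = nxt
        return None
```

Since each frame has exactly one entry and one stored virtual page number, a second virtual address for the same frame has nowhere to go, which is the aliasing difficulty noted on the slide.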

Page 16

Intel Pentium Segmentation + Paging

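[Figure: a Logical Address (Seg Selector, Offset) selects a Segment Descriptor from the Global Descriptor Table (GDT); the Segment Base Address plus the Offset gives an address in the Linear Address Space. The linear address splits into Dir, Table, and Offset fields: Dir indexes the Page Dir, Table indexes the Page Table, and the result addresses the Physical Address Space.]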

Page 17

The Memory Management Unit (MMU)

• Input
  – virtual address
• Output
  – physical address
  – access violation (exception, interrupts the processor)
• Access Violations
  – not present
  – user vs. kernel
  – write
  – read
  – execute

Page 18

Translation Lookaside Buffers (TLB)

• Need to perform address translation on every memory reference
  – 30% of instructions are memory references
  – 4-way superscalar processor
  – at least one memory reference per cycle (0.30 × 4 = 1.2 at peak issue)
• Make Common Case Fast, others correct
• Throw HW at the problem
• Cache PTEs

Page 19

Fast Translation: Translation Buffer

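• Cache of translated addresses
• Alpha 21164 TLB: 48 entry fully associative

[Figure: the address splits into Page Number and Page offset. The page number is compared with the tag of each of the 48 entries (fields: v, r, w, tag, phys frame); a 48:1 mux selects the phys frame of the matching entry, which is joined with the page offset. The figure numbers its steps 1 to 4.]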

Page 20

TLB Design

• Must be fast, not increase critical path
• Must achieve high hit ratio
• Generally small, highly associative
• Mapping change
  – page removed from physical memory
  – processor must invalidate the TLB entry
• PTE is per process entity
  – Multiple processes with same virtual addresses
  – Context Switches?
• Flush TLB
• Add ASID (PID)
  – part of processor state, must be set on context switch
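The ASID idea can be illustrated with a small software model of a fully associative TLB. The class and method names are hypothetical, and LRU replacement is an assumption; the slides do not say which replacement policy the hardware uses.

```python
from collections import OrderedDict


class TLB:
    def __init__(self, entries: int = 48):
        self.capacity = entries
        # (asid, vpn) -> pfn, least recently used first.
        self.entries = OrderedDict()

    def lookup(self, asid: int, vpn: int):
        key = (asid, vpn)
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            return self.entries[key]
        return None                          # TLB miss

    def insert(self, asid: int, vpn: int, pfn: int) -> None:
        key = (asid, vpn)
        if key not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict LRU entry (assumed policy)
        self.entries[key] = pfn
        self.entries.move_to_end(key)

    def invalidate(self, asid: int, vpn: int) -> None:
        # Needed when a mapping changes, e.g. the page leaves memory.
        self.entries.pop((asid, vpn), None)

    def flush(self) -> None:
        # The alternative to ASIDs: drop everything on a context switch.
        self.entries.clear()
```

Tagging entries with the ASID lets two processes keep translations for the same virtual page resident at once, so a context switch only has to change the current ASID rather than flush.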

Page 21

Hardware Managed TLBs

• Hardware Handles TLB miss
• Dictates page table organization
• Complicated state machine to “walk page table”
  – Multiple levels for forward mapped
  – Linked list for inverted
• Exception only if access violation

[Figure: CPU, TLB, Control, and Memory]

Page 22

Software Managed TLBs

• Software Handles TLB miss

• Flexible page table organization

• Simple Hardware to detect Hit or Miss

• Exception if TLB miss or access violation

• Should you check for access violation on TLB miss?

[Figure: CPU, TLB, Control, and Memory]

Page 23

Mapping the Kernel

• Digital Unix Kseg
  – kseg (bit 63 = 1, 62 = 0)
• Kernel has direct access to physical memory
• One VA->PA mapping for entire Kernel
• Lock (pin) TLB entry
  – or special HW detection

[Figure: virtual address space from 0 to 2^64-1 containing User Code/Data, Kernel, and User Stack; the Kernel region maps onto Physical Memory.]

Page 24

Considerations for Address Translation

Large virtual address space
• Can map more things
  – files
  – frame buffers
  – network interfaces
  – memory from another workstation
• Sparse use of address space
• Page Table Design
  – space
  – less locality => TLB misses

OS structure
• microkernel => more TLB misses

Page 25

Address Translation for Large Address Spaces

• Forward Mapped Page Table
  – grows with virtual address space
    » worst case 100% overhead not likely
  – TLB miss time: memory reference for each level
• Inverted Page Table
  – grows with physical address space
    » independent of virtual address space usage
  – TLB miss time: memory reference to HAT, IPT, list search

Page 26

Hashed Page Table (HP)

• Combine Hash Table and IPT [Huck96]

– can have more entries than physical page frames

• Must search for virtual address

• Easier to support aliasing than IPT

• Space
  – grows with physical space
• TLB miss
  – one less memory ref than IPT

[Figure: the Virtual page number is hashed (Hash) directly into the Hashed Page Table (HPT); entries are labeled VA and PA,ST. The Offset is not translated.]

Page 27

Clustered Page Table (SUN)

• Combine benefits of HPT and Linear [Talluri95]

• Store one base VPN (TAG) and several PPN values

– virtual page block number (VPBN)

– block offset

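[Figure: the virtual address splits into VPBN, Boff (block offset), and Offset. The VPBN is hashed (Hash) to select a chain of nodes. Each node holds a VPBN tag, a next pointer, and one or more entries (PA0 attrib, PA1 attrib, PA2 attrib, PA3 attrib); Boff selects the entry within the matching node.]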
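A clustered page table can be sketched as a hash table whose nodes each cover a small block of consecutive virtual pages. The block factor of 4 follows the PA0 to PA3 entries in the figure; the class, the modulo hash, and the use of Python lists in place of explicit next pointers are assumptions for illustration.

```python
BLOCK_FACTOR = 4  # pages per node, matching PA0..PA3 in the figure


class ClusteredPageTable:
    def __init__(self, num_buckets: int = 16):
        # Each bucket is a chain of nodes: [vpbn_tag, [ppn or None] * 4].
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, vpbn: int) -> list:
        return self.buckets[vpbn % len(self.buckets)]  # placeholder hash

    def _find(self, vpbn: int):
        for node in self._chain(vpbn):
            if node[0] == vpbn:        # one tag compare covers 4 pages
                return node
        return None

    def map(self, vpn: int, ppn: int) -> None:
        vpbn, boff = divmod(vpn, BLOCK_FACTOR)
        node = self._find(vpbn)
        if node is None:
            node = [vpbn, [None] * BLOCK_FACTOR]
            self._chain(vpbn).append(node)
        node[1][boff] = ppn

    def lookup(self, vpn: int):
        vpbn, boff = divmod(vpn, BLOCK_FACTOR)
        node = self._find(vpbn)
        return None if node is None else node[1][boff]
```

Storing one tag per block amortizes the tag and pointer overhead over several mappings, which is where the space benefit over a plain hashed table would come from when neighboring pages are mapped together.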

Page 28

Reducing TLB Miss Handling Time

• Problem
  – must walk Page Table on TLB miss
  – usually incur cache misses
  – big problem for IPC in microkernels
• Solution
  – build a small second-level cache in SW
  – on TLB miss, first check SW cache
    » use simple shift and mask index to hash table
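The software second-level cache might look like the sketch below. The table size, the page size, and the names `stlb` and `tlb_miss` are hypothetical; `walk_page_table` is whatever slower lookup the OS already has, passed in as a callable.

```python
PAGE_SHIFT = 13                      # assumed 8 KB pages
STLB_BITS = 6                        # assumed 64-entry software cache
STLB_MASK = (1 << STLB_BITS) - 1

# Direct-mapped software cache: each slot holds (vpn, pfn) or None.
stlb = [None] * (1 << STLB_BITS)


def stlb_index(va: int) -> int:
    # "Simple shift and mask": drop the page offset, keep the low VPN bits.
    return (va >> PAGE_SHIFT) & STLB_MASK


def tlb_miss(va: int, walk_page_table):
    vpn = va >> PAGE_SHIFT
    slot = stlb_index(va)
    entry = stlb[slot]
    if entry is not None and entry[0] == vpn:
        return entry[1]              # hit in the software cache
    pfn = walk_page_table(vpn)       # slow path: full page table walk
    if pfn is not None:
        stlb[slot] = (vpn, pfn)      # refill, replacing any conflicting entry
    return pfn
```

The index computation is two instructions' worth of work, so the fast path of the miss handler stays short even when the underlying page table walk is expensive.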

Page 29

Next Time

• More TLB issues
• Virtual Memory & Caches
• Multiprocessor Issues

Page 30

Operating Systems & Memory Systems: Managing the Memory System

CPS 220, Professor Alvin R. Lebeck

Page 31

Review: Address Translation

• Map from virtual address to physical address
• Page Tables, PTE
  – va->pa, attributes
  – forward mapped, inverted, hashed, clustered
• Translation Lookaside Buffer
  – hardware cache of most recent va->pa translation
  – misses handled in hardware or software
• Implications of larger address space
  – page table size
  – possibly more TLB misses
• OS Structure
  – microkernels -> lots of IPC -> more TLB misses

Page 32

Cache Memory 102

• Block 7 placed in 4 block cache:
  – Fully associative, direct mapped, 2-way set associative
  – S.A. Mapping = Block Number Modulo Number Sets
  – DM = 1-way Set Assoc
• Cache Frame
  – location in cache
• Bit-selection

[Figure: Main Memory block 7 placed in a cache with frames 0 to 3. FA: any frame. DM: 7 mod 4. SA (Set 0, Set 1): 7 mod 2.]
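The placement rule "block number modulo number of sets" can be checked directly. The helper below is an illustrative sketch with a hypothetical name; it returns the cache frames a block may occupy, assuming frames are numbered so that each set is a contiguous group.

```python
def candidate_frames(block: int, num_frames: int, ways: int) -> list[int]:
    """Frames where `block` may be placed in a cache of `num_frames` frames."""
    num_sets = num_frames // ways
    s = block % num_sets                     # set selected by modulo
    return list(range(s * ways, (s + 1) * ways))


print(candidate_frames(7, 4, 1))  # direct mapped (1-way)
print(candidate_frames(7, 4, 2))  # 2-way set associative
print(candidate_frames(7, 4, 4))  # fully associative (one set)
```

When the number of sets is a power of two, the modulo is just the low-order bits of the block number, which is the "bit-selection" the slide refers to.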

Page 33

Cache Indexing

• Tag on each block
  – No need to check index or block offset
• Increasing associativity shrinks index, expands tag

Fully Associative: No index
Direct-Mapped: Large index

[Figure: address = Block Address (TAG, Index) followed by Block offset]
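To see how associativity trades index bits for tag bits, the field widths can be computed for a fixed cache size. This is a sketch that assumes all sizes are powers of two; the function name and the 32-bit address, 8 KB cache, and 32-byte block used in the example are illustrative choices.

```python
def address_fields(addr_bits: int, cache_bytes: int,
                   block_bytes: int, ways: int) -> tuple[int, int, int]:
    """Return (tag_bits, index_bits, offset_bits); sizes must be powers of two."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


for ways in (1, 2, 256):   # 256 ways = fully associative for this geometry
    print(ways, address_fields(32, 8 * 1024, 32, ways))
```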

Page 34

Address Translation and Caches

• Where is the TLB wrt the cache?
• What are the consequences?
• Most of today’s systems have more than 1 cache
  – Digital 21164 has 3 levels
  – 2 levels on chip (8KB-data, 8KB-inst, 96KB-unified)
  – one level off chip (2-4MB)
• Does the OS need to worry about this?

Definition: page coloring = careful selection of va->pa mapping

Page 35

TLBs and Caches

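[Figure: three organizations of CPU, TLB, cache ($), and MEM]

• Conventional Organization: CPU sends VA to the TLB; the TLB sends PA to the $; the $ sends PA to MEM
• Virtually Addressed Cache: CPU sends VA to the $ (VA Tags); on a miss the VA goes to the TLB, which sends PA to MEM
  – Translate only on miss
  – Alias (Synonym) Problem
• Overlapped access: CPU sends VA to the $ (PA Tags) and the TLB together; PA goes to the L2 $ and then MEM
  – Overlap $ access with VA translation: requires $ index to remain invariant across translation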

Page 36

Virtual Caches

• Send virtual address to cache. Called Virtually Addressed Cache or just Virtual Cache vs. Physical Cache or Real Cache
• Avoid address translation before accessing cache
  – faster hit time to cache
• Context Switches?
  – Just like the TLB (flush or pid)
  – Cost is time to flush + “compulsory” misses from empty cache
  – Add process identifier tag that identifies process as well as address within process: can’t get a hit if wrong process
• I/O must interact with cache

Page 37

I/O and Virtual Caches

[Figure: Processor with a Virtual Cache and Main Memory on the Memory Bus; an I/O Bridge connects the Memory Bus to the I/O Bus, which carries a Disk Controller (two Disks), a Graphics Controller (Graphics), and a Network Interface (Network). Devices raise interrupts. The I/O side is labeled Physical Addresses.]

I/O is accomplished with physical addresses

DMA
• flush pages from cache
• need pa->va reverse translation
• coherent DMA

Page 38

Aliases and Virtual Caches

• Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address
• But, but... the virtual address is used to index the cache
• Could have data in two different locations in the cache

[Figure: virtual address space from 0 to 2^64-1 containing User Code/Data, Kernel, and User Stack; the Kernel region maps onto Physical Memory.]
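A short numeric sketch of the alias problem, assuming a direct-mapped, virtually indexed cache with 32-byte blocks and 8 KB pages. The two virtual addresses and the cache sizes are made-up values; both addresses are assumed to be mapped to the same physical page.

```python
BLOCK_SHIFT = 5          # assumed 32-byte blocks
PAGE_BYTES = 8 * 1024    # assumed 8 KB pages


def cache_index(addr: int, cache_bytes: int) -> int:
    """Frame selected in a direct-mapped cache of `cache_bytes` bytes."""
    return (addr % cache_bytes) >> BLOCK_SHIFT


# Two virtual addresses with the same page offset (0x40) but different
# virtual page numbers; assume the OS maps both to one physical page.
va_a = 0x0000_2040
va_b = 0x0001_4040

big = 32 * 1024          # cache larger than a page: index uses VPN bits
small = 8 * 1024         # cache equal to the page size: index is offset only

print(cache_index(va_a, big), cache_index(va_b, big))      # differ
print(cache_index(va_a, small), cache_index(va_b, small))  # same
```

With the larger cache the two aliases select different frames, so the same physical data could be cached twice and the copies could diverge after a write.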

Page 39

Index with Physical Portion of Address

• If the index comes from the physical (untranslated) part of the address, the tag access can start in parallel with translation, and the result is then compared with the physical tag
• Limits cache to page size: what if we want bigger caches and still use the same trick?
  – Higher associativity
  – Page coloring

[Figure: the address viewed as Page Address and Page Offset, aligned against Address Tag, Index, and Block Offset.]

Page 40

Page Coloring for Aliases

• HW that guarantees that every cache frame holds unique physical address

• OS guarantee: lower n bits of virtual & physical page numbers must have same value; if direct-mapped, then aliases map to same cache frame

– one form of page coloring

[Figure: the address viewed as Page Address and Page Offset, aligned against Address Tag, Index, and Block Offset.]
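The number of page-number bits the OS has to keep equal (the n in the guarantee above) follows from the cache geometry. The sketch below assumes power-of-two sizes; `color_bits` and `same_color` are hypothetical helper names.

```python
def color_bits(cache_bytes: int, ways: int, page_bytes: int) -> int:
    """Index bits that extend above the page offset (0 if none)."""
    span = cache_bytes // ways        # bytes covered by index + block offset
    if span <= page_bytes:
        return 0                      # index fits inside the page offset
    return (span // page_bytes).bit_length() - 1


def same_color(vpn: int, pfn: int, n: int) -> bool:
    """True if the low n bits of the virtual and physical page numbers match."""
    mask = (1 << n) - 1
    return (vpn & mask) == (pfn & mask)


print(color_bits(32 * 1024, 1, 8 * 1024))   # direct mapped 32 KB, 8 KB pages
print(color_bits(32 * 1024, 4, 8 * 1024))   # 4-way: no coloring needed
```

The second call shows the "higher associativity" alternative: with enough ways, the per-way span shrinks to a page and the OS constraint disappears.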

Page 41

Virtual Memory and Physically Indexed Caches

• Notion of bin
  – region of cache that may contain cache blocks from a page
• Random vs careful mapping
• Selection of physical page frame dictates cache index
• Overall goal is to minimize cache misses

[Figure: Page frames mapped onto bins of the Cache]

Page 42

Careful Page Mapping

[Kessler92, Bershad94]

• Select a page frame such that cache conflict misses are reduced
  – only choose from available pages (no replacement induced)
• static
  – “smart” selection of page frame at page fault time
• dynamic
  – move pages around

Page 43

Page Coloring

• Make physical index match virtual index
• Behaves like virtual index cache
  – no conflicts for sequential pages
• Possibly many conflicts between processes
  – address spaces all have same structure (stack, code, heap)
  – modify to xor PID with address (MIPS used variant of this)
• Simple implementation
• Pick arbitrary page if necessary
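A page-coloring allocator along these lines could look like the following sketch. The number of colors, the flat free list, and the exact way the PID is XORed into the color are assumptions; the slides only say that some variant of the XOR was used.

```python
NUM_COLORS = 4   # assumed: cache span per way divided by the page size


def alloc_colored(vpn: int, free_frames: list, pid: int = 0):
    """Pick a free frame whose color matches the virtual page's color.

    `free_frames` is a list of free physical frame numbers and is mutated.
    Returns the chosen frame, or None if no frame is free.
    """
    want = (vpn ^ pid) % NUM_COLORS          # pid=0 gives plain coloring
    for i, pfn in enumerate(free_frames):
        if pfn % NUM_COLORS == want:
            return free_frames.pop(i)        # matching color found
    # No frame of the right color: fall back to an arbitrary free frame.
    return free_frames.pop(0) if free_frames else None
```

With `pid=0` every address space colors its pages identically, which is the source of the inter-process conflicts noted above; mixing in the PID spreads the same virtual pages of different processes across colors.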

Page 44

Bin Hopping

• Allocate sequentially mapped pages (time) to sequential bins (space)

• Can exploit temporal locality– pages mapped close in time will be accessed close in time

• Search from last allocated bin until bin with available page frame

• Separate search list per process
• Simple implementation
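Bin hopping can be sketched as a round-robin search over bins that resumes from the last bin used. The class below is illustrative: `free_by_bin` is a hypothetical per-bin list of free frames that several processes could share, while each process would own its own `BinHopper` (and thus its own search position).

```python
class BinHopper:
    def __init__(self, free_by_bin: list):
        self.free_by_bin = free_by_bin   # free frames per bin (may be shared)
        self.last = -1                   # bin used by the previous allocation

    def alloc(self):
        """Return a frame from the next bin that has one, or None."""
        n = len(self.free_by_bin)
        for step in range(1, n + 1):
            b = (self.last + step) % n   # search forward from the last bin
            if self.free_by_bin[b]:
                self.last = b
                return self.free_by_bin[b].pop()
        return None
```

Pages faulted in one after another land in consecutive bins, so pages that are mapped close together in time, and are therefore likely to be used together, do not compete for the same region of the cache.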

Page 45

Best Bin

• Keep track of two counters per bin
  – used: # of pages allocated to this bin for this address space
  – free: # of available pages in the system for this bin
• Bin selection is based on low values of used and high values of free
• Low used value
  – reduce conflicts within the address space
• High free value
  – reduce conflicts between address spaces
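One possible reading of best-bin selection is sketched below. The slide does not say how the two counters are combined, so ordering by lowest `used` first and breaking ties by highest `free` is purely an assumption, as is the function name.

```python
def best_bin(used: list, free: list):
    """Choose a bin index given per-bin `used` and `free` counters.

    Assumed rule: among bins with at least one free frame, prefer the
    lowest `used`, then the highest `free`. Returns None if nothing is free.
    """
    candidates = [b for b in range(len(free)) if free[b] > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda b: (used[b], -free[b]))
```

This scan is linear in the number of bins, which is the cost the hierarchical variant on the next slide is meant to reduce.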

Page 46

Hierarchical

• Best bin could be linear in # of bins
• Build a tree
  – internal nodes contain sum of child <used,free> values
• Independent of cache size
  – simply stop at a particular level in the tree

Page 47

Benefit of Static Page Coloring

• Reduces cache misses by 10% to 20%
• Multiprogramming
  – want to distribute mapping to avoid inter-address space conflicts

Page 48

Dynamic Page Coloring

• Cache Miss Lookaside (CML) buffer [Bershad94]
  – proposed hardware device
• Monitor # of misses per page
• If # of misses >> # of cache blocks in page
  – must be conflict misses
  – interrupt processor
  – move a page (recolor)
• Cost of moving page << benefit

Page 49

Outline

• Page Coloring
• Page Size

Page 50

A Case for Large Pages

• Page table size is inversely proportional to the page size
  – memory saved
• Fast cache hit time easy when cache <= page size (VA caches)
  – bigger page makes it feasible as cache size grows
• Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
• Number of TLB entries is restricted by clock cycle time
  – larger page size maps more memory
  – reduces TLB misses
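Two of these points are simple arithmetic and can be checked with a short sketch. The function names, the 48-entry TLB, the 32-bit address space, and the 4-byte PTE are illustrative assumptions.

```python
def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of memory covered by a full TLB."""
    return entries * page_bytes


def linear_page_table_bytes(va_bits: int, page_bytes: int,
                            pte_bytes: int = 4) -> int:
    """Size of a linear page table covering the whole virtual address space."""
    return (1 << va_bits) // page_bytes * pte_bytes


for page in (8 * 1024, 64 * 1024):
    print(page, tlb_reach(48, page), linear_page_table_bytes(32, page))
```

Growing the page from 8 KB to 64 KB multiplies the TLB reach by 8 and divides the linear table size by 8, without adding any TLB entries.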

Page 51

A Case for Small Pages

• Fragmentation
  – large pages can waste storage
  – data must be contiguous within page
• Quicker process start for small processes (??)

Page 52

Superpages

• Hybrid solution: multiple page sizes
  – 8KB, 16KB, 32KB, 64KB pages
  – 4KB, 64KB, 256KB, 1MB, 4MB, 16MB pages
• Need to identify candidate superpages
  – Kernel
  – Frame buffers
  – Database buffer pools
• Application/compiler hints
• Detecting superpages
  – static, at page fault time
  – dynamically create superpages
• Page Table & TLB modifications