Design Tradeoffs for Software-Managed TLBs
Authors: Nagle, Uhlig, Stanley, Sechrest, Mudge & Brown

Dec 30, 2015

Definition

The virtual-to-physical address translation operation sits on the critical path between the CPU and the cache.

If every memory request from the processor required one or more accesses to main memory to read page table entries, the processor would be very slow. The TLB is a cache for page table entries (PTEs). It works in much the same way as the data cache, storing recently accessed page table entries.
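To make the idea concrete, here is a minimal Python sketch of a TLB acting as a cache for page table entries. It assumes 4 KB pages and models the page table as a hypothetical dictionary from virtual page number (VPN) to physical frame number (PFN). The FIFO eviction is an arbitrary choice for illustration, not the R2000's replacement policy.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


class TLB:
    """Fully associative TLB caching VPN -> PFN translations (illustrative sketch)."""

    def __init__(self, page_table, num_entries=64):
        self.page_table = page_table  # hypothetical in-memory page table: dict VPN -> PFN
        self.num_entries = num_entries
        self.entries = {}             # cached translations, in insertion order
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
        else:
            # Miss: the PTE must be read from the page table in main memory,
            # which is exactly the slow path the TLB exists to avoid.
            self.misses += 1
            if len(self.entries) >= self.num_entries:
                # Evict the oldest inserted entry (FIFO, for simplicity).
                self.entries.pop(next(iter(self.entries)))
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset
```

Only the first access to each page pays the main-memory cost; later accesses to the same page hit in the TLB.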

Operations on an address request by the CPU

Each TLB entry covers a whole page of physical memory.

A relatively small number of TLB entries therefore covers a large amount of memory.

The large coverage of main memory by each TLB entry gives TLBs a high hit rate.
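The coverage argument is simple arithmetic: TLB reach is the number of entries times the page size. The sketch below assumes the R2000's 64 entries and 4 KB pages; the comparison uses a hypothetical cache with 64 entries of 32-byte lines to show how much less memory the same number of entries covers when each maps only a line.

```python
def reach_bytes(num_entries, bytes_per_entry):
    """Memory covered when each of `num_entries` entries maps `bytes_per_entry` bytes."""
    return num_entries * bytes_per_entry


# Assumption: 64 TLB entries, each mapping one 4 KB page.
tlb_reach = reach_bytes(64, 4096)
# Hypothetical comparison: 64 cache entries with 32-byte lines.
cache_reach = reach_bytes(64, 32)

print(tlb_reach // 1024, "KB")    # 256 KB
print(cache_reach // 1024, "KB")  # 2 KB
```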

TLB types

Fully associative: used in early TLB designs.

Set associative: more common in newer designs.
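The practical difference between the two organizations is which slots a lookup may search. A minimal sketch, assuming the set index is taken from the low-order bits of the VPN (a common choice, assumed here rather than taken from the paper): with a single set the TLB is fully associative and any slot may hold a mapping; with more sets, each VPN is confined to the slots of its own set.

```python
def candidate_slots(vpn, num_entries, ways):
    """Return the TLB slot numbers that may hold a mapping for `vpn`.

    ways == num_entries gives one set, i.e. a fully associative TLB.
    Smaller `ways` gives a set-associative TLB indexed by vpn % num_sets.
    """
    if num_entries % ways:
        raise ValueError("num_entries must be a multiple of ways")
    num_sets = num_entries // ways
    set_index = vpn % num_sets  # assumption: index by low-order VPN bits
    first = set_index * ways
    return list(range(first, first + ways))
```

With 64 entries and 2 ways, VPNs 5 and 37 both map to slots 10 and 11, so they can evict each other; a fully associative TLB of the same size has no such conflict.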

The Problem.

This paper discusses software-managed TLB design tradeoffs and their interaction with a range of operating systems. However, software management can impose considerable penalties, which are highly dependent on the operating system's structure and its use of virtual memory.

Namely, memory references that require mappings not in the TLB result in misses that must be serviced either by hardware or by software.
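In a software-managed TLB, the hardware only searches the TLB; on a miss it raises an exception, and an operating system handler fetches the PTE and writes it into the TLB before the access is retried. The Python sketch below models that division of labor. The class and function names are hypothetical, pages are assumed to be 4 KB, and round-robin replacement is used only for simplicity.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


class TLBMiss(Exception):
    """Raised by the simulated hardware when no TLB entry matches."""

    def __init__(self, vpn):
        super().__init__(vpn)
        self.vpn = vpn


class SoftwareManagedTLB:
    def __init__(self, num_entries=64):
        self.slots = [None] * num_entries  # each slot holds (vpn, pfn) or None
        self.next_victim = 0               # round-robin replacement (simplification)

    def lookup(self, vpn):
        # "Hardware" side: search only; it never walks the page table itself.
        for entry in self.slots:
            if entry is not None and entry[0] == vpn:
                return entry[1]
        raise TLBMiss(vpn)

    def write_entry(self, vpn, pfn):
        # Analogous to the OS issuing a TLB-write instruction.
        self.slots[self.next_victim] = (vpn, pfn)
        self.next_victim = (self.next_victim + 1) % len(self.slots)


def miss_handler(tlb, page_table, vpn):
    """OS side: read the PTE from the page table and refill the TLB."""
    if vpn not in page_table:
        raise LookupError(f"page fault on VPN {vpn:#x}")  # would enter the page-fault path
    tlb.write_entry(vpn, page_table[vpn])


def translate(tlb, page_table, vaddr, counts):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    try:
        pfn = tlb.lookup(vpn)
    except TLBMiss:
        counts["misses"] += 1
        miss_handler(tlb, page_table, vpn)
        pfn = tlb.lookup(vpn)  # retry the access after the handler returns
    return pfn * PAGE_SIZE + offset
```

Every cycle spent inside `miss_handler` is software overhead, which is why the structure of the operating system shows up directly in TLB miss costs.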

Test Environment

DECstation 3100 with a MIPS R2000 processor.

The R2000 contains a 64-entry fully associative TLB.

The R2000 TLB hardware supports partitioning into two sets, an upper set and a lower set.

The lower set consists of entries 0-7 and is used for page table entries that are slow to retrieve.

The upper set consists of entries 8-63 and contains the more frequently used level 1 user PTEs.
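The partition matters because ordinary replacement never touches the lower set. On the R2000, as we understand it, random replacement only selects among entries 8-63, while the lower entries are written at an explicit index. The sketch below models that arrangement; the class, its methods, and the round-robin policy for the lower slots are hypothetical.

```python
import random

NUM_ENTRIES = 64
LOWER = range(0, 8)    # entries 0-7: PTEs that are slow to retrieve
UPPER = range(8, 64)   # entries 8-63: level 1 user PTEs


class PartitionedTLB:
    def __init__(self, seed=0):
        self.slots = [None] * NUM_ENTRIES  # each slot holds (vpn, pfn) or None
        self.rng = random.Random(seed)
        self.next_lower = 0

    def write_upper(self, vpn, pfn):
        # Random replacement restricted to the upper set, so refills of
        # user PTEs can never displace the protected lower entries.
        slot = self.rng.choice(UPPER)
        self.slots[slot] = (vpn, pfn)
        return slot

    def write_lower(self, vpn, pfn):
        # Lower slots are written at an explicit index; cycling through
        # them is a hypothetical policy chosen for this sketch.
        slot = LOWER[self.next_lower]
        self.next_lower = (self.next_lower + 1) % len(LOWER)
        self.slots[slot] = (vpn, pfn)
        return slot
```

However many user refills occur, the eight lower entries survive until the OS explicitly overwrites them, which is what makes their size a tunable design parameter.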

Test Tools.

A system analysis tool called Monster, which enables us to monitor actual miss handling costs in CPU cycles.

A TLB simulator called Tapeworm, which is compiled directly into the kernel so that it can intercept all of the actual TLB misses caused by both user processes and OS kernel memory references.

The TLB information that Tapeworm extracts from the running system is used to obtain TLB miss counts and to simulate different TLB configurations.

System Monitoring with Monster

Monster is a hardware monitoring system comprising a monitored DECstation 3100, an attached logic analyzer, and a controlling workstation.

It measures the amount of time taken to handle each TLB miss.
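Monster's measurements come from hardware events captured by the logic analyzer. A minimal sketch of reducing such a capture to per-miss costs, assuming a hypothetical event format of `(cycle, "enter", miss_type)` at handler entry and `(cycle, "exit", None)` at handler exit; this is not Monster's actual interface. A stack pairs each exit with the most recent entry, so a nested miss (for example an L2 miss taken inside an L1 user miss handler) is charged to its own type while also counting toward the enclosing handler's time.

```python
from collections import defaultdict


def miss_costs(events):
    """Return {miss_type: (count, mean_cycles)} from handler entry/exit events.

    Hypothetical event format: (cycle, "enter", miss_type) or (cycle, "exit", None).
    Times are inclusive: an enclosing handler's time includes nested handlers.
    """
    durations = defaultdict(list)
    open_handlers = []  # stack of (entry_cycle, miss_type)
    for cycle, kind, miss_type in events:
        if kind == "enter":
            open_handlers.append((cycle, miss_type))
        elif kind == "exit":
            start, entered_type = open_handlers.pop()
            durations[entered_type].append(cycle - start)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return {t: (len(d), sum(d) / len(d)) for t, d in durations.items()}
```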

TLB Simulation with Tapeworm

The Tapeworm simulator is built into the operating system and is invoked whenever there is a TLB miss.

The simulator uses the real TLB misses to simulate its own TLB configuration.
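Because a simulator invoked from the miss handler only sees references that miss in the real TLB, the simulated TLB must be driven by misses alone. A minimal sketch of one way to do that, under stated assumptions: the hook name is hypothetical, the simulated TLB is assumed to be at least as large as the real one, and it uses FIFO replacement because FIFO never needs to observe hits. Returning the evicted VPN lets the caller invalidate that mapping in the real TLB, keeping the real TLB's contents a subset of the simulated one so that every simulated miss also traps. This illustrates the idea; it is not Tapeworm's code.

```python
from collections import deque


class SimulatedTLB:
    """Fully associative simulated TLB with FIFO replacement.

    FIFO is used because it does not depend on TLB hits, which a
    miss-driven simulator never observes.
    """

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.resident = set()
        self.order = deque()  # insertion order, oldest on the left
        self.misses = 0

    def on_real_miss(self, vpn):
        """Hypothetical hook called from the kernel's real TLB miss handler.

        Returns the VPN evicted from the simulated TLB (or None) so the
        caller can invalidate it in the real TLB.
        """
        if vpn in self.resident:
            return None  # missed in the real TLB but hits in the simulated one
        self.misses += 1
        evicted = None
        if len(self.order) >= self.num_entries:
            evicted = self.order.popleft()
            self.resident.discard(evicted)
        self.order.append(vpn)
        self.resident.add(vpn)
        return evicted
```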

Trace-Driven Simulation

Trace-driven simulation has traditionally been used because it is well suited to studying the components of computer memory systems, such as TLBs.

It feeds a sequence of memory references to the simulation model to mimic the way that a real processor might exercise the design.
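In its simplest form, trace-driven simulation replays a recorded list of addresses through a model of the TLB. A minimal sketch, assuming a fully associative LRU TLB and 4 KB pages, run on a hypothetical trace that cycles over three pages: the same trace produces very different miss counts depending on the simulated TLB size.

```python
from collections import OrderedDict


def count_misses(trace, num_entries, page_size=4096):
    """Replay an address trace through a fully associative LRU TLB; return the miss count."""
    tlb = OrderedDict()  # VPN -> None, ordered from least to most recently used
    misses = 0
    for vaddr in trace:
        vpn = vaddr // page_size
        if vpn in tlb:
            tlb.move_to_end(vpn)  # mark as most recently used
        else:
            misses += 1
            if len(tlb) >= num_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[vpn] = None
    return misses


# Hypothetical trace: four passes over pages 0, 1, 2.
trace = [p * 4096 for p in (0, 1, 2)] * 4
for size in (2, 4):
    print(size, count_misses(trace, size))  # prints "2 12" then "4 3"
```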

Problems with Trace-Driven Simulation

Accurate traces are difficult to obtain.

It consumes considerable processing and storage resources.

It assumes that address traces are invariant to changes in the structural parameters of the simulated TLB.

Solution

Compile the TLB simulator, Tapeworm, directly into the operating system kernel. This allows us to account for all system activity, including interactions among multiple processes and the kernel.

It does not require address traces.

It considers all TLB misses, whether caused by user-level tasks or by the kernel.

Benchmarks

Operating Systems

Test Results

OS Impact on Software-Managed TLBs

Different operating systems gave different results, even though the same applications were run on each system.

The systems differ in both the number of TLB misses and the total TLB service time.
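Total TLB service time weights each miss by its cost, so two systems running the same application can rank differently on miss count and on service time. A minimal sketch of the computation; the miss types, counts, and cycle costs below are hypothetical illustrations, not measurements from the paper.

```python
def total_service_time(miss_counts, cycles_per_miss):
    """Sum over miss types of (number of misses * cycles to service one)."""
    return sum(n * cycles_per_miss[kind] for kind, n in miss_counts.items())


# Hypothetical per-miss costs in cycles.
costs = {"L1U": 20, "L2": 300}

os_a = {"L1U": 1000}           # more misses, all cheap
os_b = {"L1U": 500, "L2": 50}  # fewer misses, some expensive

print(sum(os_a.values()), total_service_time(os_a, costs))  # 1000 20000
print(sum(os_b.values()), total_service_time(os_b, costs))  # 550 25000
```

In this made-up example the second system takes fewer misses yet spends more time servicing them, because a small number of expensive misses dominates.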

Increasing TLB Performance

• Additional TLB miss vectors
• Increase lower slots in the TLB partition
• Increase TLB size
• Modify TLB associativity

TLB Miss Vectors

• L1 User: miss on a level 1 user PTE
• L1 Kernel: miss on a level 1 kernel PTE
• L2: miss on a level 2 PTE, after a level 1 user miss
• L3: miss on a level 3 PTE, after a level 1 kernel miss
• Modify: miss on a protection violation
• Invalid: page fault
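Additional miss vectors let more miss types jump directly to a dedicated handler instead of first being decoded by the general exception handler. A minimal sketch of that routing decision; the enum names and the `vector_for` function are hypothetical. The R2000-like configuration assumes only L1 user misses have a dedicated (uTLB) vector, and the extended configuration is a hypothetical design giving each of the four PTE-level miss types its own vector.

```python
from enum import Enum


class Miss(Enum):
    L1_USER = "miss on level 1 user PTE"
    L1_KERNEL = "miss on level 1 kernel PTE"
    L2 = "miss on level 2 PTE, after level 1 user miss"
    L3 = "miss on level 3 PTE, after level 1 kernel miss"
    MODIFY = "miss on protection violation"
    INVALID = "page fault"


def vector_for(miss, dedicated):
    """Name of the exception vector a miss enters.

    Miss types in `dedicated` have their own vector; all others share the
    general exception vector, where software must first decode the cause.
    """
    return miss.name if miss in dedicated else "GENERAL"


# Assumed R2000-like setup: only L1 user misses have a fast dedicated vector.
r2000_like = {Miss.L1_USER}
# Hypothetical extended design: one vector per PTE-level miss type.
extended = {Miss.L1_USER, Miss.L1_KERNEL, Miss.L2, Miss.L3}

print(vector_for(Miss.L2, r2000_like))  # GENERAL
print(vector_for(Miss.L2, extended))    # L2
```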

TLB Miss Vector Results

Modifying the Lower TLB Partition

• OSF/1: increasing the lower partition from 4 to 5 slots decreases miss handling time by 50%.
• Mach 3.0: performance increases as the lower partition grows, up to 8 slots.
• Microkernels benefit from a larger lower partition because many system calls (e.g., those handled by the Unix server on Mach 3.0) are mapped to L2 PTEs.

Increasing TLB size

• Building TLBs with additional upper slots.

• The most significant component is L1K misses, due to the large number of mapped data structures in the kernel.

• Allowing the uTLB handler to service L1K misses reduces the TLB service time.

• In each system, TLB service time improves noticeably as the TLB size increases.

Conclusion.

Software management of TLBs magnifies the importance of the interactions between TLBs and operating systems because of the large variation in TLB miss service times that can exist. TLB behavior depends on the kernel’s use of virtual memory to map its own data structures, including the page tables themselves. TLB behavior also depends on the division of service functionality between the kernel and separate user tasks.