Lecture notes for CS 333 - Chapter 5, part 2, 11/6/18
Sarita Adve

Chapter 2: Memory Hierarchy Design (Part 3)
Introduction
Caches
Main Memory (Section 2.2)
Virtual Memory (Section 2.4, Appendix B.4, B.5)

Memory Technologies
Dynamic Random Access Memory (DRAM)
Optimized for density, not speed
One-transistor cells
Multiplexed address pins: Row Address Strobe (RAS), Column Address Strobe (CAS)
Cycle time > access time
Destructive reads: must refresh every few ms, accessing every row
Sold as dual inline memory modules (DIMMs)
Memory Technologies, cont.
Static Random Access Memory (SRAM)
Optimized for speed, then density
Typically 6 transistors per cell
Separate address pins
Static ⇒ no refresh
Greater power dissipation than DRAM
Access time = cycle time
DRAM Organization
DIMM
Rank
Bank
Array
Row buffer
DRAM Organization
Rank: chips needed to respond to a single request
Assume a 64-bit data bus: for 8-bit DRAM chips, need 8 chips in a rank; for 4-bit chips, need 16 chips in a rank
Can have multiple ranks per DIMM
Bank: a chip is divided into multiple independent banks for pipelined access
Array: a bank consists of many arrays, 1 array per bit of output, for parallel access
Row buffer: a “cache” that preserves the last row read from a bank
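The chips-per-rank arithmetic above can be sketched in a few lines; the function name is illustrative, not from the notes.

```python
# Hypothetical helper illustrating the rank arithmetic above: a rank is
# the set of chips whose outputs together fill the data bus.
def chips_per_rank(bus_width_bits: int, chip_width_bits: int) -> int:
    """Number of DRAM chips needed so their combined width fills the bus."""
    assert bus_width_bits % chip_width_bits == 0
    return bus_width_bits // chip_width_bits

print(chips_per_rank(64, 8))   # 8-bit-wide chips -> 8 chips per rank
print(chips_per_rank(64, 4))   # 4-bit-wide chips -> 16 chips per rank
```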
DRAM Organization
See Figure 1.5 in The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It, by Bruce Jacob (Synthesis Lectures on Computer Architecture, Morgan & Claypool; series editor: Mark Hill)
Downloadable from U of I accounts
Internals of a DRAM Array
See Figure 1.6 of the synthesis lecture
Steps to access a bit:
Precharge bit lines
Activate row: turn on the word line for the row, bringing data to the sense amps
Column read: send subset of data (columns)
(Restore data)
DRAM Optimizations – Page Mode
Unoptimized DRAM:
First read entire row
Then select column from row
Stores entire row in a buffer
Page Mode:
Row buffer acts like an SRAM
By changing the column address, random bits can be accessed within a row
DRAM Optimizations – Synchronous DRAM
Previously, DRAM had an asynchronous interface: each transfer involved handshaking with the controller
Synchronous DRAM (SDRAM):
Clock added to interface
Register to hold number of bytes requested
Send multiple bytes per request
Double Data Rate (DDR):
Send data on rising and falling edges of the clock
Simple Main Memory
Consider a memory with these parameters:
1 cycle to send address
6 cycles to access each word
1 cycle to send word back to CPU/cache
What's the miss penalty for a 4-word block?
(1 cycle + 6 cycles + 1 cycle) × 4 words = 32 cycles
How can we speed this up?
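The miss-penalty arithmetic above can be written out directly; the constant and function names are illustrative.

```python
# Sketch of the simple-memory miss penalty above: every word of the
# block pays the full address/access/transfer sequence.
ADDRESS_CYCLES = 1   # send address to memory
ACCESS_CYCLES = 6    # access one word
TRANSFER_CYCLES = 1  # send one word back to the CPU/cache

def miss_penalty_simple(words_per_block: int) -> int:
    return (ADDRESS_CYCLES + ACCESS_CYCLES + TRANSFER_CYCLES) * words_per_block

print(miss_penalty_simple(4))  # 32 cycles for a 4-word block
```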
Wider Main Memory
Make the memory wider
Read out 2 (or more) words in parallel
Memory parameters:
1 cycle to send address
6 cycles to access each doubleword
1 cycle to send doubleword back to CPU/cache
Miss penalty for a 4-word block:
(1 cycle + 6 cycles + 1 cycle) × 2 doublewords = 16 cycles
Cost: wider bus, larger expansion size
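The wider-memory case only changes the number of accesses per block; a minimal sketch (names illustrative):

```python
# Each access now moves words_per_access words, so a 4-word block with
# a doubleword-wide memory needs only 2 accesses of (1 + 6 + 1) cycles.
def miss_penalty_wide(words_per_block: int, words_per_access: int) -> int:
    accesses = words_per_block // words_per_access
    return (1 + 6 + 1) * accesses  # address + access + transfer per access

print(miss_penalty_wide(4, 2))  # 16 cycles with a doubleword-wide memory
```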
Interleaved Main Memory
Organize memory in banks
Subsequent words map to different banks
With M banks: word A is in bank (A mod M); within a bank, word A is in location (A div M)
[Figure: the word address splits into a word-in-bank field (high-order bits) and a bank field (low-order bits).]
How many banks to include?
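The bank-mapping rule above, sketched for a hypothetical M = 4 banks:

```python
# Word A lives in bank (A mod M), at location (A div M) within that bank,
# so consecutive word addresses cycle through the banks.
M = 4  # number of banks (illustrative)

def bank_of(word_addr: int) -> int:
    return word_addr % M

def location_in_bank(word_addr: int) -> int:
    return word_addr // M

for a in range(8):
    print(a, bank_of(a), location_in_bank(a))  # words 0..3 hit banks 0..3
```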
Figure 2.4 Capacity and access times for DDR SDRAMs by year of production. Access time is for a random memory word and assumes a new row must be opened. If the row is in a different bank, we assume the bank is precharged; if the row is not open, then a precharge is required, and the access time is longer. As the number of banks has increased, the ability to hide the precharge time has also increased. DDR4 SDRAMs were initially expected in 2014, but did not begin production until early 2016.
Figure 2.5 Clock rates, bandwidth, and names of DDR DRAMs and DIMMs in 2016. Note the numerical relationship between the columns. The third column is twice the second, and the fourth uses the number from the third column in the name of the DRAM chip. The fifth column is eight times the third column, and a rounded version of this number is used in the name of the DIMM. DDR4 saw significant first use in 2016.
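The column relationships in the Figure 2.5 caption can be worked through for one illustrative DDR3 part (the specific numbers below are an example, not taken from the figure):

```python
# DDR transfers data on both clock edges, and a DIMM has an 8-byte
# (64-bit) data bus, so:
clock_mhz = 533                         # bus clock (illustrative DDR3 part)
transfers_per_sec = 2 * clock_mhz       # 1066 MT/s -> chip named "DDR3-1066"
bandwidth_mb_s = 8 * transfers_per_sec  # 8528 MB/s; DIMM name rounds this
print(f"DDR3-{transfers_per_sec}, about {bandwidth_mb_s} MB/s per DIMM")
```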
Figure 2.6 Power consumption for a DDR3 SDRAM operating under three conditions: low-power (shutdown) mode, typical system mode (DRAM is active 30% of the time for reads and 15% for writes), and fully active mode, where the DRAM is continuously reading or writing. Reads and writes assume bursts of eight transfers. These data are based on a Micron 1.5V 2GB DDR3-1066, although similar savings occur in DDR4 SDRAMs.
Other Technologies
Graphics Data RAMs (GDDR): wider (32 bits), higher clock, connect directly to GPUs (soldered to board vs. DIMMs)
Die-stacked DRAMs / 3D / High Bandwidth Memory (HBM)
Nonvolatile memory (later): Flash, phase change
Reliability: Parity, ECC, chipkill
Figure 2.7 Two forms of die stacking. The 2.5D form is available now. 3D stacking is under development and faces heat management challenges due to the CPU.
Virtual Memory
User operates in a virtual address space; the mapping between virtual space and main memory is determined at runtime
Original motivation:
Avoid overlays
Use main memory as a cache for disk
Current motivation:
Relocation
Protection
Sharing
Fast startup
Engineered differently than CPU caches:
Miss access time O(1,000,000)
Miss access time >> miss transfer time
Virtual Memory, cont.
Blocks, called pages, are 512 bytes to 16 KB
Page identification:
Address translation -- virtual to physical address
Indirection through one or two page tables
Translation cached in a translation buffer
Page replacement: approximate LRU
Write strategy: writeback (with a page dirty bit)
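The notes name only "approximate LRU" for page replacement. One standard approximation (not necessarily the one intended here) is the clock / second-chance algorithm, sketched below assuming one reference bit per frame:

```python
# Clock algorithm: sweep a "hand" over the frames; a page whose
# reference bit is set gets a second chance (bit cleared), and the
# first page found with a clear bit is the victim.
def clock_replace(frames, ref_bits, hand):
    while True:
        if ref_bits[hand]:
            ref_bits[hand] = 0              # second chance
            hand = (hand + 1) % len(frames)
        else:
            return hand, (hand + 1) % len(frames)  # victim, next hand

frames = ["A", "B", "C", "D"]
ref_bits = [1, 0, 1, 0]
victim, hand = clock_replace(frames, ref_bits, 0)
print(frames[victim])  # "B": first page found with its reference bit clear
```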
Address Translation
Logical path:
Two memory operations
Often two or three levels of page tables
TOO SLOW!
[Figure: the virtual address splits into a virtual page number and a page offset. The page-table-base register plus the virtual page number indexes the page table; each entry holds protection, dirty, reference, and in-memory bits along with the page frame number, which is combined with the unchanged page offset to form the physical address.]
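The address split described above can be sketched directly, assuming an illustrative 4 KB page size (the notes allow 512 bytes to 16 KB):

```python
# With 4 KB pages the low 12 bits are the page offset and the rest is
# the virtual page number; translation swaps the VPN for a page frame
# number and keeps the offset unchanged.
PAGE_SIZE = 4096
OFFSET_BITS = 12

def split(vaddr: int):
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

def physical(pfn: int, offset: int) -> int:
    return (pfn << OFFSET_BITS) | offset

vpn, offset = split(0x12345)
print(hex(vpn), hex(offset))  # 0x12 0x345
```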
Address Translation
Fast path: Translation Lookaside Buffer (TLB, TB)
A cache with PTEs for data
Number of entries: 32 to 1024
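The TLB's role as a cache of page-table entries can be sketched with a dict standing in for a fully associative TLB (a real TLB is bounded to the 32–1024 entries above; this sketch omits capacity and replacement):

```python
# On a hit the translation avoids the page-table walk entirely; on a
# miss we walk the (here, single-level) page table and fill the TLB.
tlb = {}  # virtual page number -> page frame number

def translate(vpn, page_table):
    if vpn in tlb:            # fast path: TLB hit
        return tlb[vpn]
    pfn = page_table[vpn]     # slow path: walk the page table
    tlb[vpn] = pfn            # fill the TLB
    return pfn

page_table = {0x12: 0x7A}
print(hex(translate(0x12, page_table)))  # 0x7a
```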