Cache Physical Implementation
Panayiotis Charalambous
Xi Research Group
Contents
Cache Logical View
Physical View
Case Study - Power 4 L2 Cache
Logical Cache Structure
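n-way associative cache
n elements per set
2^m sets
Address (32 bits): Tag (32 - m - k bits), Index (m bits), Offset (k bits)
[Figure: the index selects one set; the stored tags of that set are compared (=) with the address tag, and the comparator outputs are combined (or) to produce Hit and to select Data.]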
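The address split above can be sketched in a few lines of Python. This is only an illustration: the function name `split_address` and the example geometry (128 sets, 32-byte blocks) are assumptions, not values from the slides.

```python
def split_address(addr, m, k, addr_bits=32):
    """Split an address into (tag, index, offset).

    m: number of index bits (the cache has 2**m sets)
    k: number of offset bits (each block holds 2**k bytes)
    """
    assert 0 <= addr < (1 << addr_bits)
    offset = addr & ((1 << k) - 1)           # lowest k bits
    index = (addr >> k) & ((1 << m) - 1)     # next m bits
    tag = addr >> (k + m)                    # remaining addr_bits - m - k bits
    return tag, index, offset


# Hypothetical geometry: 2**7 = 128 sets, 2**5 = 32-byte blocks.
tag, index, offset = split_address(0x12345678, m=7, k=5)
print(hex(tag), index, offset)
```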
Cache Structure
Cache Access
Steps
1. Decode address
2. Enable the word line
3. Raise the bit lines to high
4. Get the tag value from the tag array
5. Check for tag match
6. Select data output
Conventional Cache Organization
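The access steps can be mirrored by a small behavioural model. This is a sketch, not a circuit description: the class name `SetAssociativeCache`, the `fill` helper, and the valid flag stored with each tag are assumptions made for illustration, and the electrical steps (word line, bit lines) are reduced to reading one row of a Python list.

```python
class SetAssociativeCache:
    """Behavioural model of a read access to a set-associative cache."""

    def __init__(self, m, k, ways):
        self.m, self.k = m, k
        # One row per set; each row holds `ways` (valid, tag, block) entries.
        self.rows = [[(False, 0, None)] * ways for _ in range(1 << m)]

    def _decode(self, addr):
        offset = addr & ((1 << self.k) - 1)
        index = (addr >> self.k) & ((1 << self.m) - 1)
        tag = addr >> (self.k + self.m)
        return tag, index, offset

    def fill(self, addr, block, way):
        """Hypothetical helper: place a block in a chosen way."""
        tag, index, _ = self._decode(addr)
        self.rows[index][way] = (True, tag, block)

    def read(self, addr):
        tag, index, offset = self._decode(addr)   # step 1: decode address
        row = self.rows[index]                    # steps 2-4: select the row and read it out
        for valid, stored_tag, block in row:      # step 5: check for tag match
            if valid and stored_tag == tag:
                return True, block[offset]        # step 6: select data output
        return False, None


cache = SetAssociativeCache(m=2, k=2, ways=2)
cache.fill(0x40, bytes([10, 11, 12, 13]), way=1)
print(cache.read(0x42))   # hit on the block filled above
print(cache.read(0x80))   # same set, different tag: miss
```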
Memory Cell
[Figure: memory cell (latch) connected to the bit lines bit' and bit]
Read: Set bit and bit' high. If the value in the cell is 1, then bit' is discharged. If the value is 0, then bit is discharged.
Write: Set bit' to 0. This forces 1 in the latch.
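A minimal behavioural sketch of the read and write rules above. The class name `MemoryCell` is hypothetical, and the write of a 0 (pulling bit low instead of bit') is an assumption made by symmetry; the slide only describes writing a 1.

```python
class MemoryCell:
    """Behavioural model of one latch cell with bit lines bit and bit'."""

    def __init__(self, value=0):
        self.value = value

    def read(self):
        bit, bit_n = 1, 1          # both bit lines are set high first
        if self.value == 1:
            bit_n = 0              # a stored 1 discharges bit'
        else:
            bit = 0                # a stored 0 discharges bit
        return bit, bit_n

    def write(self, bit, bit_n):
        if bit_n == 0 and bit == 1:
            self.value = 1         # from the slide: bit' = 0 forces 1
        elif bit == 0 and bit_n == 1:
            self.value = 0         # assumed symmetric case
        # any other combination leaves the stored value unchanged


cell = MemoryCell()
cell.write(bit=1, bit_n=0)
print(cell.read())   # bit stays high, bit' is discharged
```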
Decoder with Driver
Various Components
Comparator is XOR logic
Multiplexer hierarchy for offset: first get the block (from the output driver), then the word, then the byte
Output Driver
  At most one input bit is high
  If no input is high, then the output is high impedance
[Figure: output driver with inputs I0, I1, … I7]
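The three components can be modelled in software as follows. This is a sketch under assumptions: the function names are hypothetical, the driver inputs are read as one-hot select signals, `None` stands in for the high-impedance state, and the 4-byte word size is chosen only for the example.

```python
def tags_match(stored_tag, addr_tag):
    """XOR comparator: the XOR is 0 exactly when every bit agrees."""
    return (stored_tag ^ addr_tag) == 0


def output_driver(selects, values):
    """Drive the value whose select bit is high; None models high impedance."""
    assert sum(selects) <= 1, "at most one input may be high"
    for select, value in zip(selects, values):
        if select:
            return value
    return None


def select_byte(block, offset, word_bytes=4):
    """Multiplexer hierarchy: block -> word -> byte."""
    word_index, byte_index = divmod(offset, word_bytes)
    word = block[word_index * word_bytes:(word_index + 1) * word_bytes]
    return word[byte_index]


print(tags_match(0b1011, 0b1011))
print(output_driver([0, 1, 0], ["way0", "way1", "way2"]))
print(select_byte(bytes(range(16)), offset=6))
```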
Banking
Idea: Support multiple cache accesses
Solutions:
  Use multiporting on bit cells (cost is high)
  Divide the cache into independent banks
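To illustrate why independent banks allow multiple accesses, the sketch below groups same-cycle accesses by bank. It assumes single-ported banks that each serve one access per cycle; the function names and the choice of which address bits select the bank (`bank_shift`) are hypothetical.

```python
from collections import defaultdict


def group_by_bank(addresses, num_banks, bank_shift):
    """Map each bank number to the addresses that fall into it."""
    by_bank = defaultdict(list)
    for addr in addresses:
        by_bank[(addr >> bank_shift) % num_banks].append(addr)
    return dict(by_bank)


def cycles_needed(addresses, num_banks, bank_shift):
    """Accesses to different banks proceed in parallel; same-bank accesses serialize."""
    groups = group_by_bank(addresses, num_banks, bank_shift)
    return max((len(group) for group in groups.values()), default=0)


# Example with 8 banks selected by the bits just above a 7-bit block offset.
print(group_by_bank([0x000, 0x080, 0x400], num_banks=8, bank_shift=7))
print(cycles_needed([0x000, 0x080, 0x400], num_banks=8, bank_shift=7))
```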
Cache Search
Steps:
1. Find bank (bank index)
2. Find set in bank (index)
3. Check if data is valid and in the cache (tag match)
4. If all OK, return data (block and byte offset); else check lower-level memory
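The search steps can be sketched as a banked lookup. Several details here are assumptions for illustration only: the class name `BankedCache`, the placement of the bank index above the set index, the use of `None` for an invalid entry, the trivial replacement choice (first free way, otherwise way 0), and the `lower_level` callback standing in for the next memory level.

```python
class BankedCache:
    def __init__(self, bank_bits, set_bits, offset_bits, ways, lower_level):
        self.bank_bits = bank_bits
        self.set_bits = set_bits
        self.offset_bits = offset_bits
        self.lower_level = lower_level   # callable: block address -> block bytes
        # banks[bank][set] is a list of ways; None marks an invalid entry.
        self.banks = [
            [[None] * ways for _ in range(1 << set_bits)]
            for _ in range(1 << bank_bits)
        ]

    def _fields(self, addr):
        offset = addr & ((1 << self.offset_bits) - 1)
        rest = addr >> self.offset_bits
        set_index = rest & ((1 << self.set_bits) - 1)
        rest >>= self.set_bits
        bank = rest & ((1 << self.bank_bits) - 1)
        tag = rest >> self.bank_bits
        return tag, bank, set_index, offset

    def load(self, addr):
        """Return (byte, hit)."""
        tag, bank, set_index, offset = self._fields(addr)
        ways = self.banks[bank][set_index]            # steps 1-2: bank, then set
        for entry in ways:                            # step 3: valid and tag match
            if entry is not None and entry[0] == tag:
                return entry[1][offset], True         # step 4: return data
        block = self.lower_level(addr - offset)       # miss: ask the lower level
        victim = next((i for i, e in enumerate(ways) if e is None), 0)
        ways[victim] = (tag, block)
        return block[offset], False


def memory(block_addr):
    """Hypothetical lower level: each byte equals its own address (mod 256)."""
    return bytes((block_addr + i) & 0xFF for i in range(4))


cache = BankedCache(bank_bits=1, set_bits=1, offset_bits=2, ways=2, lower_level=memory)
print(cache.load(0x13))   # first access misses and fills the block
print(cache.load(0x12))   # same block: hit
```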
Case Study - Power 4
Dual-core 64-bit processors
32 KB L1 D-cache (per processor)
  2-way associative
  128-byte line
64 KB L1 I-cache (per processor)
  Direct mapped
  128-byte line (4 sectors x 32 B)
~1.5 MB L2 cache
  8-way set associative
  128-byte line
Power4 Floorplan
Power4 L2 Logical View
Cache split into 3 parts, 0.5 MB each
Controlled by 4 coherency processors
One 64 B store queue per processor
Power4 L2U
~512 KB
8 banks
128 B block size
8-way associative
[Figure labels: word lines, bit lines, decoders, address bus]
Power4
L2 cache block (one of the 3 parts) size C = 512 KB = 2^19 B
Block (line) size = 128 B = 2^7 B
8-way associative
8 banks per cache block
Therefore:
  Set size is 2^3 * 2^7 B = 2^10 B
  Sets in cache are 2^19 / 2^10 = 2^9 sets
  Sets per bank are 2^9 / 2^3 = 2^6 sets
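[Figure: 64-bit address divided into tag, index, and offset; the index is subdivided into a bank index and a set index. The field widths follow from the sizes above: offset 7 bits, index 9 bits (set index 6 bits, bank index 3 bits).]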
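The arithmetic for this slide can be checked with a short script. The function name `cache_geometry` is hypothetical, and the tag width assumes that all 64 address bits take part in the lookup, which the slides do not state.

```python
def cache_geometry(capacity, block_size, ways, banks, addr_bits=64):
    """Derive set counts and address field widths; all sizes must be powers of two."""
    def log2_exact(x):
        assert x > 0 and x & (x - 1) == 0, "expected a power of two"
        return x.bit_length() - 1

    set_size = ways * block_size
    sets = capacity // set_size
    sets_per_bank = sets // banks
    offset_bits = log2_exact(block_size)
    index_bits = log2_exact(sets)
    bank_bits = log2_exact(banks)
    return {
        "set_size": set_size,
        "sets": sets,
        "sets_per_bank": sets_per_bank,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "bank_bits": bank_bits,
        "set_bits": index_bits - bank_bits,
        "tag_bits": addr_bits - index_bits - offset_bits,  # assumes a full 64-bit address
    }


for name, value in cache_geometry(512 * 1024, 128, ways=8, banks=8).items():
    print(f"{name}: {value}")
```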
Power4: CACTI Results
cacti 524288 128 8 0.8um 8
---------- CACTI version 3.2 ----------
Cache Parameters:
  Number of Subbanks: 8
  Total Cache Size: 524288
  Size in bytes of Subbank: 65536
  Number of sets: 64
  Associativity: 8
  Block Size (bytes): 128
  Read/Write Ports: 1
  Read Ports: 0
  Write Ports: 0
  Technology Size: 0.80um
  Vdd: 4.5V
Access Time (ns): 12.3473
Cycle Time (wave pipelined) (ns): 4.97337
Total Power all Banks (nJ): 418.337
Total Power Without Routing (nJ): 198.563
Total Routing Power (nJ): 219.774
Maximum Bank Power (nJ): 63.5175
Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2
cacti 524288 128 8 0.8um 16
---------- CACTI version 3.2 ----------
Cache Parameters:
  Number of Subbanks: 16
  Total Cache Size: 524288
  Size in bytes of Subbank: 32768
  Number of sets: 32
  Associativity: 8
  Block Size (bytes): 128
  Read/Write Ports: 1
  Read Ports: 0
  Write Ports: 0
  Technology Size: 0.80um
  Vdd: 4.5V
Access Time (ns): 12.434
Cycle Time (wave pipelined) (ns): 4.85483
Total Power all Banks (nJ): 793.381
Total Power Without Routing (nJ): 341.424
Total Routing Power (nJ): 451.957
Maximum Bank Power (nJ): 63.1382
Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2
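The two runs can be compared with a few lines of arithmetic. The numbers below are copied from the CACTI output above; the dictionary layout and function names are only illustrative. In these figures the access time barely changes between 8 and 16 subbanks, while the reported total for all banks grows by roughly 1.9 times, with routing taking a larger share.

```python
# Values copied from the two CACTI 3.2 runs (keyed by number of subbanks).
runs = {
    8: {"access_ns": 12.3473, "cycle_ns": 4.97337, "total_nj": 418.337,
        "no_routing_nj": 198.563, "routing_nj": 219.774},
    16: {"access_ns": 12.434, "cycle_ns": 4.85483, "total_nj": 793.381,
         "no_routing_nj": 341.424, "routing_nj": 451.957},
}


def routing_share(run):
    """Fraction of the reported total that is attributed to routing."""
    return run["routing_nj"] / run["total_nj"]


def total_ratio(a, b):
    """How many times larger the reported total of run b is than that of run a."""
    return b["total_nj"] / a["total_nj"]


for subbanks, run in runs.items():
    print(f"{subbanks} subbanks: access {run['access_ns']} ns, "
          f"routing share {routing_share(run):.1%}")
print(f"total, 16 vs 8 subbanks: x{total_ratio(runs[8], runs[16]):.2f}")
```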
CACTI
Data Array
  Ndwl: Word line split factor
  Ndbl: Bit line split factor
  Nspd: Number of sets mapped to a single word line (sectors)
Tag Array
  Ntwl: Word line split factor
  Ntbl: Bit line split factor
  Ntspd: Number of sets mapped to a single word line (sectors)
Increasing Ndbl, Nspd, Ntbl, or Ntspd increases the number of sense amplifiers
Increasing Ndwl or Ntwl increases the number of word line drivers
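As a rough illustration of how these parameters shape the data array, the sketch below computes the subarray dimensions. The formulas (rows = C / (B * A * Ndbl * Nspd), columns = 8 * B * A * Nspd / Ndwl, with Ndwl * Ndbl subarrays) are recalled from the published CACTI model description and are an assumption here; they should be checked against the tool's own documentation. The function name is hypothetical.

```python
def data_subarray_shape(capacity, block_size, ways, ndwl, ndbl, nspd):
    """Return (rows, columns, number of subarrays) for the data array.

    capacity and block_size are in bytes; columns are counted in bits.
    Formulas are assumed from the CACTI model description.
    """
    rows = capacity // (block_size * ways * ndbl * nspd)
    columns = 8 * block_size * ways * nspd // ndwl
    return rows, columns, ndwl * ndbl


# One 64 KB subbank from the 8-subbank run: Ndwl = 16, Ndbl = 1, Nspd = 1.
rows, columns, subarrays = data_subarray_shape(65536, 128, 8, ndwl=16, ndbl=1, nspd=1)
print(rows, columns, subarrays)
# Sanity check: the subarrays together hold every bit of the subbank.
print(rows * columns * subarrays == 65536 * 8)
```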
Thank You