By : Ido Shayevitz and Yoav Shargil Supervisor: Zvika Guz

“NAHALAL : Cache Organization for Chip Multiprocessors”

New LSU Policy


NAHALAL Architecture

The NAHALAL architecture defines the organization of the memory banks of the L2 cache.

Each processor has a private backyard bank, and all processors share a small public bank.

The architecture is based on the hot shared line phenomenon.

LSU Improvement

Placement Policy: NAHALAL

Replacement Policy from Private Bank: LRU

Replacement Policy from Public Bank: LSU (replacing LRU)

The LSU policy selects the Least Shared Used line to evict from the public bank.

LSU Implementation

A shift register with N cells is kept for each line.

Each cell in the shift register holds a CPU number.

On an eviction by CPUi: for each shift register, XOR each cell with the ID of CPUi. The line whose shift register produces 0 is the chosen victim. If none produces 0, fall back to regular LRU.

In order to reduce memory overhead, define N = 4. Therefore 2 *4*3 = 0.1875 MB, i.e. 18.75% memory overhead.


The algorithm is simple and takes little time in hardware.
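The victim selection described above can be sketched in software as follows. This is only an illustrative model, not the simulated hardware: the names `Line`, `access` and `choose_victim` are hypothetical, "produce 0" is read as every cell of the shift register matching the evicting CPU, and registers that are not yet full are assumed never to match.

```python
from collections import deque

N_CELLS = 4  # N = 4, as chosen in the text


class Line:
    """One public-bank line with its access-history shift register."""

    def __init__(self, tag):
        self.tag = tag
        # Shift register: the last N_CELLS CPU ids that touched this line.
        self.history = deque(maxlen=N_CELLS)
        self.last_used = 0  # timestamp used only by the LRU fallback

    def access(self, cpu_id, now):
        self.history.append(cpu_id)  # the oldest id is shifted out
        self.last_used = now


def choose_victim(lines, cpu_id):
    """Pick the line to evict when cpu_id needs room in the public bank."""
    for line in lines:
        full = len(line.history) == N_CELLS  # assumption: partial registers never match
        # XOR of a cell with cpu_id is 0 exactly when the cell holds cpu_id.
        if full and all((cell ^ cpu_id) == 0 for cell in line.history):
            return line
    # No register produced 0: regular LRU.
    return min(lines, key=lambda l: l.last_used)


# Small demonstration with three lines and made-up access patterns.
a, b, c = Line("A"), Line("B"), Line("C")
now = 0
for line, cpus in ((a, (0, 1, 0, 2)), (b, (1, 1, 1, 1)), (c, (2, 3, 2, 3))):
    for cpu in cpus:
        now += 1
        line.access(cpu, now)

victim_for_cpu1 = choose_victim([a, b, c], 1).tag  # B: used only by CPU 1
victim_for_cpu0 = choose_victim([a, b, c], 0).tag  # no match, LRU picks A
```

In this demonstration line B was touched only by CPU 1, so it is the one CPU 1 evicts, while CPU 0 finds no match and falls back to the least recently used line.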

Simulation Structure in Simics

Using a Python script we defined:

Writing Benchmarks

Benchmarks are written in the simulated target console, using threads with the pthread library.

Each thread is bound to a CPU using the sched library.

Parallel code is written in the benchmark itself.

OS code and pthread code also contribute parallel code.

Each benchmark is run twice: first without LSU and then with LSU.

Collecting Statistics

Cache statistics: l2c
-----------------
Total number of transactions: 610349
Total memory stall time: 31402835
Total memory hit stall time: 28251635

Device data reads (DMA): 0
Device data writes (DMA): 0

Uncacheable data reads: 17
Uncacheable data writes: 30738
Uncacheable instruction fetches: 0

Data read transactions: 403488
Total read stall time: 17488735
Total read hit stall time: 14383135
Data read remote hits: 0
Data read misses: 10352
Data read hit ratio: 97.43%
Instruction fetch transactions: 0
Instruction fetch misses: 0

Data write transactions: 176106
Total write stall time: 4687600
Total write hit stall time: 4687600
Data write remote hits: 0
Data write misses: 0
Data write hit ratio: 100.00%

Copy back transactions: 0

Number of replacments in the middle (NAHALAL): 557
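The two metrics reported in the results (average stall time per transaction and the share of transactions that cause a replacement in the middle) can be derived from a statistics dump like the one above. The sketch below is a hedged illustration: `parse_stats` is a hypothetical helper, not part of Simics, and the sample text simply copies a few counters from the listing.

```python
import re

# A few counters copied from the statistics listing (spelling as printed).
SAMPLE = """\
Total number of transactions: 610349
Total memory stall time: 31402835
Data read transactions: 403488
Data read misses: 10352
Number of replacments in the middle (NAHALAL): 557
"""


def parse_stats(text):
    """Collect 'name: integer' lines into a dict; other lines are ignored."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


stats = parse_stats(SAMPLE)
transactions = stats["Total number of transactions"]

# Average stall time per transaction, in the simulator's stall-time units.
avg_stall = stats["Total memory stall time"] / transactions

# Percentage of transactions that caused a replacement in the middle bank.
replacement_pct = 100.0 * stats["Number of replacments in the middle (NAHALAL)"] / transactions

# Read hit ratio, recomputed from the read counters as a sanity check.
reads = stats["Data read transactions"]
read_hit_pct = 100.0 * (reads - stats["Data read misses"]) / reads

print(f"average stall per transaction: {avg_stall:.2f}")
print(f"replacements in the middle:    {replacement_pct:.2f}%")
print(f"data read hit ratio:           {read_hit_pct:.2f}%")
```

Running the same computation on the dump from a run without LSU and on the dump from a run with LSU gives the pair of numbers that each result compares.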

Results

1. Improvement of 54% in average stall time per transaction.

2. Improvement of 61% in average stall time per transaction.

3. Without LSU, 8.375% of the transactions cause a replacement in the middle; with LSU, only 0.09% do. Improvement of ∆ = 8.28 percentage points.

4. Without LSU, 8.75% of the transactions cause a replacement in the middle; with LSU, only 0.02% do. Improvement of ∆ = 8.73 percentage points.


Conclusions

The LSU policy significantly improves the average stall time per transaction. Therefore:

The LSU policy implemented in the NAHALAL architecture significantly reduces the number of cycles a benchmark takes.

The LSU policy significantly reduces the number of replacements in the middle. Therefore:

The LSU policy implemented in the NAHALAL architecture is better at keeping the hot shared lines in the public bank.

In our implementation, LRU is activated whenever LSU does not find a line. Therefore:

The LSU policy as we implemented it is always preferable to LRU.
