“NAHALAL : Cache Organization for Chip Multiprocessors” New LSU Policy By : Ido Shayevitz and Yoav Shargil Supervisor: Zvika Guz

Jan 02, 2016


Page 1:

“NAHALAL: Cache Organization for Chip Multiprocessors”

New LSU Policy

By: Ido Shayevitz and Yoav Shargil

Supervisor: Zvika Guz

Page 2:

NAHALAL Architecture

The NAHALAL architecture defines how the L2 cache is organized into memory banks.

Each processor has a private "backyard" bank, and all processors share a small public bank.

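The architecture is based on the hot shared line phenomenon: a small number of shared lines are accessed frequently by many processors, and these lines are kept in the small public bank.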

Page 3:

LSU Improvement

Placement policy: NAHALAL

Replacement policy for the private bank: LRU

Replacement policy for the public bank: LSU (instead of LRU)

The LSU policy selects the Least Shared Used line to evict from the public bank.

Page 4:

LSU Implementation

Each line has a shift register with N cells.

Each cell in the shift register holds a CPU number, so the register records the CPUs that accessed the line most recently.

On an eviction by CPUi: for each line's shift register, XOR every cell with the ID of CPUi. The line whose shift register produces 0 (every cell equals the ID of CPUi) is chosen for eviction. If none produces 0, regular LRU is used.
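The eviction steps above can be sketched in Python. This is a minimal software model, not the authors' hardware: the names PublicBankLine, lsu_matches and choose_victim are hypothetical, and several details are assumptions, namely that each access shifts the accessing CPU's ID into the register, that a register that is not yet full never matches, that "produces 0" means every cell XORs to 0 (computed here as an OR-reduction), and that the first matching way wins if several match.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List

N_CELLS = 4  # N = 4, as chosen on the slide to limit memory overhead


@dataclass
class PublicBankLine:
    tag: int
    last_use: int = 0  # timestamp used only by the LRU fallback
    # Shift register: IDs of the CPUs that made the last N accesses (assumed semantics).
    history: Deque[int] = field(default_factory=lambda: deque(maxlen=N_CELLS))

    def record_access(self, cpu_id: int, now: int) -> None:
        self.history.append(cpu_id)  # the oldest cell is shifted out when full
        self.last_use = now


def lsu_matches(line: PublicBankLine, cpu_id: int) -> bool:
    """XOR every cell with cpu_id; the register 'produces 0' only if all results are 0."""
    if len(line.history) < N_CELLS:
        return False  # assumption: a register with empty cells never matches
    combined = 0
    for cell in line.history:
        combined |= cell ^ cpu_id  # OR-reduce the per-cell XOR results
    return combined == 0


def choose_victim(lines: List[PublicBankLine], cpu_id: int) -> int:
    """Return the way to evict from one public-bank set on behalf of cpu_id."""
    for way, line in enumerate(lines):
        if lsu_matches(line, cpu_id):
            return way  # assumption: the first matching way wins
    # No register produced 0: fall back to regular LRU.
    return min(range(len(lines)), key=lambda way: lines[way].last_use)


if __name__ == "__main__":
    ways = [PublicBankLine(tag=t) for t in range(3)]
    clock = 0
    for cpu in (0, 1, 2, 3):  # way 0: touched by four different CPUs
        clock += 1
        ways[0].record_access(cpu, clock)
    for _ in range(N_CELLS):  # way 1: touched only by CPU 2
        clock += 1
        ways[1].record_access(2, clock)
    clock += 1
    ways[2].record_access(5, clock)  # way 2: a single, most recent access by CPU 5
    print(choose_victim(ways, cpu_id=2))  # 1 (LSU match)
    print(choose_victim(ways, cpu_id=7))  # 0 (no match, LRU fallback)
```

A hardware version would presumably evaluate all ways of a set at once with XOR gates and an OR tree per register; the loop here only models the outcome.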

To reduce the memory overhead, we set N = 4. Therefore $2^{17} \times 4 \times 3$ bits (lines × cells × bits per CPU number) $= 0.1875$ MB, i.e. an 18.75% memory overhead.
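The overhead arithmetic can be checked with a few lines of Python. The inputs are assumptions read off the formula: $2^{17}$ is taken as the number of lines, 3 bits per cell as enough to encode 8 CPU IDs, and the 1 MB baseline is simply the size for which 0.1875 MB is 18.75%, since the slide does not say what the percentage is measured against.

```python
import math

NUM_CPUS = 8         # assumption: 3 bits per cell implies at most 8 CPU IDs
N_CELLS = 4          # N = 4 cells per shift register (from the slide)
NUM_LINES = 2 ** 17  # assumption: exponent reconstructed from the 0.1875 MB result
BASELINE_MB = 1.0    # assumption: the size that makes 0.1875 MB an 18.75% overhead


def bits_per_cell(num_cpus: int) -> int:
    """Bits needed to store one CPU ID."""
    return math.ceil(math.log2(num_cpus))


def lsu_overhead_mb(num_lines: int, n_cells: int, num_cpus: int) -> float:
    """Total shift-register storage in MB (1 MB = 2**20 bytes)."""
    total_bits = num_lines * n_cells * bits_per_cell(num_cpus)
    return total_bits / 8 / 2 ** 20


overhead = lsu_overhead_mb(NUM_LINES, N_CELLS, NUM_CPUS)
print(overhead)                        # 0.1875
print(f"{overhead / BASELINE_MB:.2%}")  # 18.75%
```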


A simple algorithm with a short execution time in hardware.

Page 5:

Simulation Structure in Simics

Using a Python script, we defined:

Page 6:

Writing Benchmarks

Benchmarks are written in the console of the simulated target:

Page 7:

Writing Benchmarks

Threads are created with the pthread library.

Each thread is bound to a CPU using the sched library.

Parallel code is written in the benchmark itself.

OS code and pthread library code also contribute parallel code.

Each benchmark is run twice: first without LSU, then with LSU.
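The thread setup described above can be sketched as follows. This swaps in Python for illustration: on Linux, the threading module creates POSIX threads, and os.sched_setaffinity wraps the same sched_setaffinity call a C benchmark would reach through sched.h. Because of Python's global interpreter lock the threads do not execute bytecode in parallel, so this only shows the create, pin and join structure. ITERATIONS, worker and run_benchmark are hypothetical names, the shared counter is an illustrative stand-in for data several CPUs access, and the code is Linux only.

```python
import os
import threading

ITERATIONS = 10_000  # hypothetical amount of work per thread


def pin_to_cpu(cpu: int) -> None:
    """Restrict the calling thread to one CPU (Linux only)."""
    # On Linux, a thread's native ID (its TID) is accepted by sched_setaffinity,
    # so only this thread is pinned, not the whole process.
    os.sched_setaffinity(threading.get_native_id(), {cpu})


def worker(index: int, cpu: int, counter: list, lock: threading.Lock, masks: dict) -> None:
    pin_to_cpu(cpu)
    masks[index] = os.sched_getaffinity(threading.get_native_id())
    for _ in range(ITERATIONS):
        with lock:  # the shared counter stands in for data several CPUs touch
            counter[0] += 1


def run_benchmark(num_threads: int) -> tuple:
    allowed = sorted(os.sched_getaffinity(0))  # CPUs the calling thread may use
    counter, lock, masks = [0], threading.Lock(), {}
    threads = [
        threading.Thread(
            target=worker,
            args=(i, allowed[i % len(allowed)], counter, lock, masks),
        )
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], masks, allowed


if __name__ == "__main__":
    total, masks, allowed = run_benchmark(4)
    print(total)  # 40000
    print(masks)
```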

Page 8:

Collecting Statistics

Cache statistics: l2c
-----------------
Total number of transactions: 610349
Total memory stall time: 31402835
Total memory hit stall time: 28251635

Device data reads (DMA): 0
Device data writes (DMA): 0

Uncacheable data reads: 17
Uncacheable data writes: 30738
Uncacheable instruction fetches: 0

Data read transactions: 403488
Total read stall time: 17488735
Total read hit stall time: 14383135
Data read remote hits: 0
Data read misses: 10352
Data read hit ratio: 97.43%
Instruction fetch transactions: 0
Instruction fetch misses: 0

Data write transactions: 176106
Total write stall time: 4687600
Total write hit stall time: 4687600
Data write remote hits: 0
Data write misses: 0
Data write hit ratio: 100.00%

Copy back transactions: 0

Number of replacments in the middle (NAHALAL): 557
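The raw counters can be turned into per-transaction metrics. In this sketch, "average stall time per transaction" is assumed to be total memory stall time divided by total transactions, and the fraction of replacements in the middle is assumed to use total transactions as its denominator; the read hit ratio formula reproduces the 97.43% printed in the dump. The names stats and summarize are hypothetical.

```python
# Counters copied from the l2c statistics dump above.
stats = {
    "transactions": 610349,
    "stall_time": 31402835,
    "read_transactions": 403488,
    "read_misses": 10352,
    "middle_replacements": 557,
}


def summarize(s: dict) -> dict:
    """Derive per-transaction metrics (formulas are assumptions, see above)."""
    return {
        "avg_stall_per_transaction": s["stall_time"] / s["transactions"],
        "read_hit_ratio": 1 - s["read_misses"] / s["read_transactions"],
        "middle_replacement_fraction": s["middle_replacements"] / s["transactions"],
    }


summary = summarize(stats)
print(f"{summary['avg_stall_per_transaction']:.2f}")    # 51.45
print(f"{summary['read_hit_ratio']:.2%}")               # 97.43%
print(f"{summary['middle_replacement_fraction']:.2%}")  # 0.09%
```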

Page 9:

Results

1. 54% improvement in average stall time per transaction.

2. 61% improvement in average stall time per transaction.

3. Without LSU, 8.375% of the transactions cause a replacement in the middle (the public bank); with LSU, only 0.09%. Improvement of Δ = 8.28 percentage points.

4. Without LSU, 8.75% of the transactions cause a replacement in the middle; with LSU, only 0.02%. Improvement of Δ = 8.73 percentage points.

[Charts 1 to 4, one for each result above]
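The Δ values are absolute differences between two percentages (percentage points), while results 1 and 2 are relative improvements. A small helper makes the distinction explicit; the function names are hypothetical, and only the percentages from this slide are used, since the underlying before and after stall times are not given.

```python
def relative_improvement(before: float, after: float) -> float:
    """Fractional reduction, e.g. of average stall time per transaction."""
    return (before - after) / before


def delta_points(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return before_pct - after_pct


print(round(delta_points(8.375, 0.09), 3))           # 8.285
print(round(delta_points(8.75, 0.02), 3))            # 8.73
print(f"{relative_improvement(8.375, 0.09):.2%}")   # 98.93%
```

For result 3, 8.375 - 0.09 = 8.285, which the slide shows as 8.28.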

Page 10:

Conclusions

The LSU policy significantly improves the average stall time per transaction. Therefore:

The LSU policy implemented in the NAHALAL architecture significantly reduces the number of cycles a benchmark takes.

The LSU policy significantly reduces the number of replacements in the middle. Therefore:

The LSU policy implemented in the NAHALAL architecture does a better job of keeping the hot shared lines in the public bank.

In our implementation, LRU is applied whenever LSU does not find a line. Therefore:

The LSU policy, as we implemented it, is always preferable to LRU.