P-GAS: Parallelizing a Many-Core Processor Simulator Using PDES Huiwei Lv, Yuan Cheng, Lu Bai, Mingyu Chen, Dongrui Fan, Ninghui Sun Institute of Computing Technology [email protected] PADS 2010, May 18, 2010

Dec 14, 2015

Page 1:

P-GAS: Parallelizing a Many-Core Processor Simulator Using PDES

Huiwei Lv, Yuan Cheng, Lu Bai, Mingyu Chen, Dongrui Fan, Ninghui Sun

Institute of Computing Technology

[email protected]

PADS 2010, May 18, 2010

Page 2:

Motivation

• Multi-core platforms are common now

Courtesy: Sun® UltraSPARC T2

Courtesy: AMD® Opteron 6000

Courtesy: Intel® Nehalem

• System simulators are still sequential

Page 3:

Motivation

• Multi-core platforms are common now

Courtesy: Sun® UltraSPARC T2

Courtesy: AMD® Phenom

Courtesy: Intel® Nehalem

• System simulators are still sequential

Multi-core is wasted

Simulation speed is limited by single-core performance

Page 4:

Poor Scalability of Single-threaded Simulator

• Slowdown grows exponentially

• Not able to simulate future many-core systems

1000+ cores

Too slow to simulate future many-cores

Page 5:

Goal: fast and accurate computer system simulation

[Figure: simulators plotted by accuracy (functional to cycle-accurate) against speed (slowdown). Labels in the plot: COTSon, (HPCA'10), (SIGOPS Oper. Syst. Rev.'09), (MICRO'06), (J. Comput.'09)]

Speed up 10x without loss of accuracy

Page 6:

Outline

• Motivation
• Implementation
    • Background
    • From DES to PDES
    • Optimization
• Evaluation
• Conclusion

Page 7:

Godson-T Architecture Simulator

• Discrete Event Simulation (DES)
    • one global event queue
    • events are assigned to sinkers
    • new events are inserted back into the event queue
• Fine-grained

[Figure: EVENT A, EVENT B]
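
The sequential DES loop described on this slide can be sketched as follows. This is a minimal illustration, not GAS code: the `Simulator` class, the `core` and `cache` handlers, and the one-cycle delay are all hypothetical names and values chosen for the example.

```python
import heapq
from itertools import count


class Simulator:
    """Sequential DES kernel with one global, time-ordered event queue."""

    def __init__(self):
        self.queue = []       # heap of (time, seq, sinker_name, payload)
        self.sinkers = {}     # sinker name -> handler function
        self.now = 0
        self._seq = count()   # tie-breaker: same-time events stay in FIFO order

    def register(self, name, handler):
        self.sinkers[name] = handler

    def schedule(self, time, sinker, payload=None):
        heapq.heappush(self.queue, (time, next(self._seq), sinker, payload))

    def run(self):
        trace = []
        while self.queue:
            # Pop the earliest event and hand it to the sinker it is assigned to.
            self.now, _, sinker, payload = heapq.heappop(self.queue)
            trace.append((self.now, sinker, payload))
            # The handler may call schedule(), inserting new events into the queue.
            self.sinkers[sinker](self, payload)
        return trace


def core(sim, payload):
    # Hypothetical core model: the request reaches the cache one cycle later.
    sim.schedule(sim.now + 1, "cache", payload)


def cache(sim, payload):
    pass  # end of the chain in this toy model


sim = Simulator()
sim.register("core", core)
sim.register("cache", cache)
sim.schedule(0, "core", "load A")
trace = sim.run()
```

Because every event passes through the single queue, only one thread can make progress at a time, which is the limitation the following slides address.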

Page 8:

SimK: PDES Framework

• Open source
• Conservative PDES
• Highly optimized
    • pthreads
    • lock-free user-level thread scheduling
• Modularized
    • use the SimK API to implement an LP

[Figure: SimK architecture. Labels: LP, LP, LP, LP, LP ...; API; SimK (schedule, exec; commu, sync, buffer, deploy); Host (core, core, core, core)]

Page 9:

From DES to PDES

• Separate the global queue

• Group sinkers into logical processes (LPs), one queue per LP

• Events that cross LPs are wrapped with PDES time

[Figure: two LPs, each containing a router, a core, and a cache, connected through the PDES time wrapper]
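
A rough sketch of the partitioning on this slide, assuming a simple message object as the "PDES time wrapper". The names `TimedMessage`, `LogicalProcess`, `send`, and `drain_inbox` are invented for illustration and are not the SimK API.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class TimedMessage:
    """Hypothetical wrapper for an event that crosses an LP boundary."""
    timestamp: int       # simulation time at which the event takes effect
    dest_sinker: str
    payload: object


@dataclass
class LogicalProcess:
    name: str
    sinkers: set
    queue: list = field(default_factory=list)   # one private event queue per LP
    inbox: list = field(default_factory=list)   # wrapped events from other LPs
    _seq: count = field(default_factory=count)  # FIFO tie-breaker for the heap

    def schedule(self, time, sinker, payload):
        heapq.heappush(self.queue, (time, next(self._seq), sinker, payload))

    def drain_inbox(self):
        # Unwrap messages from other LPs into the local, time-ordered queue.
        while self.inbox:
            msg = self.inbox.pop()
            self.schedule(msg.timestamp, msg.dest_sinker, msg.payload)


def send(lps, src, time, dest_sinker, payload):
    """Insert locally if the sinker is in the same LP, otherwise wrap and post."""
    dest = next(lp for lp in lps if dest_sinker in lp.sinkers)
    if dest is src:
        src.schedule(time, dest_sinker, payload)
    else:
        dest.inbox.append(TimedMessage(time, dest_sinker, payload))


lp0 = LogicalProcess("LP 0", {"router0", "core0", "cache0"})
lp1 = LogicalProcess("LP 1", {"router1", "core1", "cache1"})
lps = [lp0, lp1]
send(lps, lp0, 5, "cache0", "local fill")   # stays inside LP 0
send(lps, lp0, 7, "router1", "flit")        # crosses to LP 1, gets wrapped
lp1.drain_inbox()
```

Only the cross-LP path needs the timestamped wrapper; events that stay inside one LP go straight into that LP's own queue.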

Page 10:

E.g. Router Event

Router 0 sends an event to router 1

• before

[Figure: router 0, core 0, cache 0, router 1, core 1, cache 1 all share one Event Queue]

• after

[Figure: LP 0 (router 0, core 0, cache 0) and LP 1 (router 1, core 1, cache 1), connected through the PDES time wrapper]

Page 11:

Events from DES to PDES

• Single-thread → multi-threads
• Conservative PDES

[Figure: events on Thread 1 to Thread 4 plotted along Simulation Time in 1-cycle steps; legend: event, dependence]
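
The slides do not say which conservative protocol SimK uses, so the following is only a sketch of one common conservative scheme: a synchronous window bounded by the lookahead. It assumes a lookahead of 1 cycle (taken from the "1 cycle" label in the figure) and that every handler schedules new events at least one lookahead into the future. `run_conservative` and `handler` are hypothetical names, and the per-LP loop is run sequentially here where a real implementation would use one thread per LP.

```python
import heapq

LOOKAHEAD = 1  # cycles; assumption based on the "1 cycle" label in the figure


def run_conservative(queues, handler):
    """queues: one heap of (time, dest_lp, payload) per LP.

    handler(lp, time, payload) returns new (time, dest_lp, payload) events,
    each with time >= the current event time + LOOKAHEAD.
    """
    processed = []
    while any(queues):
        # No unprocessed event anywhere is earlier than `floor`, and no new
        # event can be created before `floor + LOOKAHEAD`.
        floor = min(q[0][0] for q in queues if q)
        horizon = floor + LOOKAHEAD
        outbox = []
        # Each LP could run this inner loop on its own thread: every event
        # below the horizon is safe to execute without waiting for other LPs.
        for lp, q in enumerate(queues):
            while q and q[0][0] < horizon:
                time, _, payload = heapq.heappop(q)
                processed.append((time, lp, payload))
                outbox.extend(handler(lp, time, payload))
        # Barrier: deliver new events before the next window opens.
        for event in outbox:
            heapq.heappush(queues[event[1]], event)
    return processed


def handler(lp, time, payload):
    # Toy model: pass a token to the other LP one cycle later, until cycle 3.
    if time < 3:
        return [(time + LOOKAHEAD, 1 - lp, payload)]
    return []


queues = [[(0, 0, "token")], []]
processed = run_conservative(queues, handler)
```

With a lookahead this small, each window covers a single cycle, so the LPs meet at the barrier once per simulated cycle. That cost is what the grouping optimization on the next slide reduces.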

Page 12:

Grouping Into Big LPs

• Problem
    • avg. speedup is 1.8x with 16 threads (prototype with 16 one-core LPs)

• Cause of problem
    • too many LPs + extremely small lookahead → high sync cost

• Solution
    • group adjacent LPs into one big LP

[Figure: adjacent LPs merged into one big LP]
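
To illustrate why grouping helps, the sketch below counts how many nearest-neighbour links cross an LP boundary before and after grouping. The 4x4 mesh and the 2x2 tiles are assumptions made for this example only; the slide does not specify the topology or the group size.

```python
def group_of(row, col, tile):
    """Map the core at (row, col) to the big LP covering its tile x tile block."""
    return (row // tile, col // tile)


def cross_lp_links(n, tile):
    """Count mesh links in an n x n mesh whose endpoints fall in different LPs."""
    crossing = 0
    for r in range(n):
        for c in range(n):
            for dr, dc in ((0, 1), (1, 0)):   # right and down neighbours
                r2, c2 = r + dr, c + dc
                if r2 < n and c2 < n and group_of(r, c, tile) != group_of(r2, c2, tile):
                    crossing += 1
    return crossing


one_core_lps = cross_lp_links(4, 1)   # 16 LPs, one core each
big_lps = cross_lp_links(4, 2)        # 4 big LPs, 2x2 cores each
```

In this toy mesh, grouping cuts the links that need cross-LP synchronization from 24 to 8. Traffic between adjacent cores inside a big LP becomes ordinary local events.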

Page 13:

Final Parallelized version

• Parallel Discrete Event Simulation
    • sinkers grouped into big LPs
    • LPs bound to threads using the SimK API
    • time sync between LPs using PDES
    • scheduling and execution under the SimK framework

[Figure: SimK architecture. Labels: API; SimK (schedule, exec; commu, sync, buffer, deploy); Host (core, core, core, core)]

Page 14:

Outline

• Motivation
• Implementation
• Evaluation
    • Accuracy
    • Speedup
• Conclusion

Page 15:

Evaluation Setup

• GAS vs. P-GAS
• 4 Quad-Core AMD Opteron 8347 SMP
    • 16 cores total, 64GB memory
• Benchmark: SPLASH-2 kernels
    • benchmark computing time is counted in wall-clock time

Page 16:

Cycle Count Error

• Avg. cycle count error: 0.04%
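
The slide does not define how the error is computed. One plausible reading, assumed here, is the relative difference between the cycle count reported by P-GAS and the one reported by the sequential GAS, averaged over benchmarks. The function names and the cycle counts below are made up for illustration.

```python
def cycle_count_error(reference_cycles, parallel_cycles):
    """Relative error of the parallel run against the sequential reference."""
    return abs(parallel_cycles - reference_cycles) / reference_cycles


def average_error(pairs):
    """Mean relative error over (reference, parallel) cycle-count pairs."""
    errors = [cycle_count_error(ref, par) for ref, par in pairs]
    return sum(errors) / len(errors)


# Made-up cycle counts, not measurements from the talk.
pairs = [(1_000_000, 1_000_400), (2_000_000, 2_000_000)]
avg = average_error(pairs)
```

Under this definition, an average of 0.04% corresponds to a value of 0.0004 returned by `average_error`.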


Page 17:

P-GAS Speedup

• 16 threads, SPLASH-2 kernels: avg. speedup is 9.8x
• Best speedup: 13.6x (LU, 16 threads)
• 5.3x super-linear speedup with 4 threads

[Figure: speedup chart, annotated "Avg. 9.8", "Max. 13.6", and "5.3"]
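
As a small worked example, the reported speedups can be turned into parallel efficiency (speedup divided by thread count). This assumes the usual definition of speedup as sequential wall-clock time over parallel wall-clock time. The helper name `efficiency` is ours; the numbers are the ones on this slide.

```python
def efficiency(speedup, threads):
    """Parallel efficiency: the fraction of ideal linear speedup achieved."""
    return speedup / threads


# (speedup, threads) pairs as reported on the slide.
reported = {
    "average, 16 threads": (9.8, 16),
    "best (LU), 16 threads": (13.6, 16),
    "4 threads": (5.3, 4),
}

for label, (speedup, threads) in reported.items():
    eff = efficiency(speedup, threads)
    # An efficiency above 1 means the speedup is super-linear.
    print(f"{label}: speedup {speedup}x, efficiency {eff:.2f}, super-linear: {eff > 1}")
```

Only the 4-thread result has an efficiency above 1, which is why it is singled out as super-linear.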

Page 18:

Why super-linear speedup?

• More cores, more caches to use
• The insert-to-queue time is shorter

5.3x super-linear speedup with 4 threads

Page 19:

Conclusion

• P-GAS uses PDES to speed up a cycle-accurate many-core processor simulator
    • speedup of 9.8x on a 16-core SMP
    • cycle error < 0.04%

• Highly optimized conservative PDES could be used in fast and accurate system simulation
    • multi-core/many-core processor simulation
    • SMP cluster, many-core cluster ...

Page 20:

P-GAS: Parallelizing a Many-Core Processor Simulator Using PDES

Please email me your questions:

[email protected]

Source release of our PDES framework:

http://simk.sf.net