KAIST Computer Architecture Lab. The Effect of Multi-core on HPC Applications in Virtualized Systems Jaeung Han¹, Jeongseob Ahn¹, Changdae Kim¹, Youngjin Kwon¹, Young-ri Choi², and Jaehyuk Huh¹ ¹ KAIST(Korea Advanced Institute of Science and Technology) ² KISTI(Korea Institute of Science and Technology Information)

Apr 01, 2015


KAIST Computer Architecture Lab.

The Effect of Multi-core on HPC Applications in Virtualized Systems

Jaeung Han¹, Jeongseob Ahn¹, Changdae Kim¹, Youngjin Kwon¹, Young-ri Choi², and Jaehyuk Huh¹

¹ KAIST (Korea Advanced Institute of Science and Technology)

² KISTI (Korea Institute of Science and Technology Information)


Outline

• Virtualization for HPC

• Virtualization on Multi-core

• Virtualization for HPC on Multi-core

• Methodology

• PARSEC – shared memory model

• NPB – MPI model

• Conclusion


Benefits of Virtualization

• Improve system utilization by consolidation
• Support for multiple types of OSes on a system
• Fault isolation
• Flexible resource management
• Cloud computing

[Figure: Windows, Linux, and Solaris VMs running on a Virtual Machine Monitor over hardware; a second Hardware / Virtual Machine Monitor stack; a cloud]


Virtualization for HPC

• Benefits of virtualization

– Improve system utilization by consolidation

– Support for multiple types of OSes on a system

– Fault isolation

– Flexible resource management

– Cloud computing

• HPC is performance-sensitive and resource-sensitive

• Virtualization can help HPC workloads


Virtualization on Multi-core

• More VMs on a physical machine
• More complex memory hierarchy (NUCA, NUMA)

[Figure: eight cores, each hosting two VMs; two shared caches and two memories]


Challenges

• VM management cost
• Semantic gaps
– vCPU scheduling, NUMA

[Figure: a Virtual Machine Monitor hosting eight VMs, labeled "Scheduling, Memory, Communication, I/O multiplexing…"; an OS that sees four cores and one memory, running on a Virtual Machine Monitor over eight cores, two caches ($), and two memories]


Virtualization for HPC on Multi-core

• Virtualization may help HPC
• Virtualization on multi-core may have some overheads
• For servers, improving system utilization is a key factor
• For HPC, performance is a key factor

How much overhead is there?

Where does it come from?


Machines

• Single Socket System
– 12-core AMD processor
– Uniform memory access latency
– Two 6MB L3 caches, each shared by 6 cores

• Dual Socket System
– 2x 4-core Intel processor
– Non-uniform memory access latency
– Two 8MB L3 caches, each shared by 4 cores

[Figure: Single socket: 12-core CPU, two groups of six cores (P) with private L2 caches, each group sharing an L3, attached to one memory. Dual socket: 2x 4-core CPUs, each with four cores (P) with private L2 caches sharing an L3]


Workloads

• PARSEC
– Shared memory model
– Input: native
– On one machine
  • Single and Dual socket
– Fix: One VM
– Vary: 1, 4, 8 vCPUs

• NAS Parallel Benchmark
– MPI model
– Input: class C
– On two machines (dual socket)
  • 1Gb Ethernet switch
– Fix: 16 vCPUs
– Vary: 2 ~ 16 VMs

[Figure: "Semantic gaps": an OS on a Virtual Machine Monitor over eight cores, two caches ($), and two memories. "VM management cost": two machines, each running a Virtual Machine Monitor with eight VMs]


PARSEC – Single Socket

• Single socket
• No NUMA effect
• Very low virtualization overheads

[Figure: execution times normalized to native runs for blackscholes, canneal, ferret, fluidanimate, freqmine, streamcluster, swaptions, x264, and AVG with 1, 4, and 8 vCPUs; y-axis 0 to 1.8; annotated 2~4 %]
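The charts report execution times normalized to native runs, so a bar at 1.0 means no virtualization overhead. A minimal sketch of that arithmetic follows; the function names and the timings in the usage lines are illustrative assumptions, not values or code from the slides.

```python
def normalized_time(virtualized_s: float, native_s: float) -> float:
    """Execution time in a VM divided by the native execution time."""
    if native_s <= 0:
        raise ValueError("native time must be positive")
    return virtualized_s / native_s


def overhead_percent(virtualized_s: float, native_s: float) -> float:
    """Virtualization overhead as a percentage of the native run."""
    return (normalized_time(virtualized_s, native_s) - 1.0) * 100.0


# Made-up timings for illustration only (not measurements from the slides).
native, virtualized = 100.0, 103.0
print(f"{normalized_time(virtualized, native):.2f}")    # 1.03
print(f"{overhead_percent(virtualized, native):.1f}%")  # 3.0%
```

With this definition, an annotation such as "2~4 %" corresponds to normalized times between 1.02 and 1.04.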


PARSEC – Single Socket

• Single socket + pin each vCPU to a pCPU
• Reduce semantic gaps by preventing vCPU migration
• vCPU migration has negligible effect

[Figure: execution times normalized to native runs for blackscholes, canneal, ferret, fluidanimate, freqmine, streamcluster, swaptions, x264, and AVG with 1, 4, and 8 pinned vCPUs; y-axis 0 to 1.8; results similar to un-pinned]
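Pinning each vCPU to its own pCPU can be sketched as a one-to-one mapping. The slides do not say which tool performed the pinning; the sketch below assumes Xen's `xl vcpu-pin <domain> <vcpu> <cpus>` command (older Xen releases used the `xm` toolstack instead), and the domain name `hpc-vm` is hypothetical. It only builds command strings and does not run them.

```python
from typing import Dict, List, Sequence


def one_to_one_pinning(num_vcpus: int, pcpus: Sequence[int]) -> Dict[int, int]:
    """Map vCPU i to the i-th physical CPU in pcpus."""
    if num_vcpus > len(pcpus):
        raise ValueError("not enough physical CPUs for one-to-one pinning")
    return {vcpu: pcpus[vcpu] for vcpu in range(num_vcpus)}


def xl_pin_commands(domain: str, mapping: Dict[int, int]) -> List[str]:
    """Build (but do not execute) one pin command per vCPU.

    Assumes the `xl vcpu-pin <domain> <vcpu> <cpus>` form of the Xen CLI.
    """
    return [f"xl vcpu-pin {domain} {vcpu} {pcpu}"
            for vcpu, pcpu in sorted(mapping.items())]


# Hypothetical domain name; 4 vCPUs pinned to pCPUs 0-3.
for command in xl_pin_commands("hpc-vm", one_to_one_pinning(4, [0, 1, 2, 3])):
    print(command)
```

Because every vCPU has exactly one allowed pCPU, the hypervisor scheduler can no longer migrate vCPUs, which is the effect the slide is testing.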


PARSEC – Dual Socket

• Dual socket, unpinned vCPUs
• NUMA effect → semantic gap
• Significant increase of overheads

[Figure: execution times normalized to native runs for blackscholes, canneal, ferret, fluidanimate, freqmine, streamcluster, swaptions, x264, and AVG with 1, 4, and 8 vCPUs; y-axis 0 to 1.8; annotated 16~37 %]


PARSEC – Dual Socket

• Dual socket, pinned vCPUs
• Pinning may also reduce the NUMA effect
• Reduced overheads with 1 and 4 vCPUs

[Figure: execution times normalized to native runs for blackscholes, canneal, ferret, fluidanimate, freqmine, streamcluster, swaptions, x264, and AVG with 1, 4, and 8 vCPUs; y-axis 0 to 1.8]


XEN and NUMA machine

• Memory allocation policy
– Allocate up to a 4GB chunk on one socket

• Scheduling policy
– Pinning to allocated socket
– Nothing more

• Pinning 1 ~ 4 vCPUs on the socket where the memory is allocated is possible

• Impossible with 8 vCPUs

[Figure: two sockets, each with four cores, a shared cache ($), and a memory; VM 0, VM 1, VM 2, and VM 3 placed on the machine]


Mitigating NUMA Effects

• Range pinning

– Pin vCPUs of a VM on a socket

– Works only if # of vCPUs ≤ # of cores on a socket

– Range-pinned (best): memory of VM in the same socket

– Range-pinned (worst): memory of VM in the other socket

• NUMA-first scheduler

– If there is an idle core in the socket where the VM's memory is allocated, pick it

– If not, pick any core in the machine

– Not all vCPUs are active all the time (sync. or I/O)


Range Pinning

• For 4 vCPUs case
• Range-pinned (best) ≈ Pinned

[Figure: execution times normalized to native runs for blackscholes, canneal, ferret, fluidanimate, freqmine, streamcluster, swaptions, x264, and AVG under Unpinned, Range-pinned (worst), Range-pinned (best), and Pinned; y-axis 0 to 1.8]
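Range pinning restricts all vCPUs of a VM to the cores of one socket rather than fixing each vCPU to a single core. The sketch below illustrates the idea under stated assumptions: the core numbering (cores 0-3 on socket 0) and the function names are hypothetical, and a VM is allowed to have as many vCPUs as the socket has cores, since the slides evaluate 4 vCPUs on 4-core sockets.

```python
from typing import Dict, Sequence, Set


def range_pin(num_vcpus: int, socket_cores: Sequence[int]) -> Dict[int, Set[int]]:
    """Give every vCPU the same affinity set: all cores of one socket."""
    if num_vcpus > len(socket_cores):
        raise ValueError("VM does not fit on a single socket")
    return {vcpu: set(socket_cores) for vcpu in range(num_vcpus)}


def placement_case(pinned_socket: int, memory_socket: int) -> str:
    """'best' if the VM's memory is on the pinned socket, else 'worst'."""
    return "best" if pinned_socket == memory_socket else "worst"


# Assumed numbering: socket 0 owns cores 0-3.
affinity = range_pin(4, [0, 1, 2, 3])
print(affinity[0])           # {0, 1, 2, 3}
print(placement_case(0, 0))  # best
print(placement_case(0, 1))  # worst
```

Within the socket the scheduler may still migrate vCPUs freely, which is why this policy cannot express the 8 vCPUs case on the dual-socket machine.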


NUMA-first Scheduler

• For 8 vCPUs case
• Significant improvement by NUMA-first scheduler

[Figure: execution times normalized to native runs for blackscholes, canneal, ferret, fluidanimate, freqmine, streamcluster, swaptions, x264, and AVG under Unpinned, Pinned, and NUMA-first; y-axis 0 to 1.8]
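The NUMA-first policy is described as a two-step choice: prefer an idle core on the socket that holds the VM's memory, otherwise take any core in the machine. A minimal sketch of that selection rule follows. It is not the authors' Xen scheduler code; the function name, the core numbering, and the choice to return `None` when no core is idle are assumptions made for illustration.

```python
from typing import Optional, Sequence, Set


def numa_first_pick(idle_cores: Set[int],
                    home_socket_cores: Sequence[int],
                    all_cores: Sequence[int]) -> Optional[int]:
    """Pick a core for a runnable vCPU, preferring the memory's socket."""
    # Step 1: an idle core on the socket where the VM's memory is allocated.
    for core in home_socket_cores:
        if core in idle_cores:
            return core
    # Step 2: otherwise any idle core in the machine.
    for core in all_cores:
        if core in idle_cores:
            return core
    # Assumption: with no idle core the caller queues the vCPU.
    return None


# Assumed numbering: socket 0 owns cores 0-3, socket 1 owns cores 4-7.
print(numa_first_pick({2, 5}, [0, 1, 2, 3], range(8)))  # 2
print(numa_first_pick({5, 6}, [0, 1, 2, 3], range(8)))  # 5
```

Unlike pinning, this rule never leaves a core idle while a vCPU is runnable, so it remains usable when a VM has more vCPUs than one socket has cores.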


VM Granularity for MPI model

• Fine-grained VMs
– Few processes in a VM
– Small VM: vCPUs, memory
– Fault isolation among processes in different VMs
– Many VMs on a machine
– MPI communications mostly through the VMM

• Coarse-grained VMs
– Many processes in a VM
– Large VM: vCPUs, memory
– Single failure point for processes in a VM
– Few VMs on a machine
– MPI communications mostly within a VM

[Figure: VMM-over-hardware stacks for the fine-grained and coarse-grained configurations]


NPB - VM Granularity

• The amount of work is the same for all granularities
• 2 VMs: each VM has 8 vCPUs, 8 MPI processes
• 16 VMs: each VM has 1 vCPU, 1 MPI process

[Figure: execution times normalized to native runs for BT, CG, EP, FT, IS, LU, MG, SP, and AVG with 2, 4, 8, and 16 VMs; y-axis 0 to 3; annotated 11~54 %]
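The granularity sweep keeps 16 vCPUs in total and varies how many VMs they are split across on the two machines. The sketch below derives the per-VM layout for each point of the sweep. It assumes an even split and one MPI process per vCPU, which matches the 2-VM and 16-VM cases stated on the slide; the 4-VM and 8-VM rows are inferred, and the function name is hypothetical.

```python
from typing import Dict


def vm_layout(total_vcpus: int, num_vms: int, num_machines: int = 2) -> Dict[str, int]:
    """Split a fixed vCPU budget evenly across VMs and machines."""
    if total_vcpus % num_vms or num_vms % num_machines:
        raise ValueError("layout must divide evenly")
    vcpus_per_vm = total_vcpus // num_vms
    return {
        "vms_per_machine": num_vms // num_machines,
        "vcpus_per_vm": vcpus_per_vm,
        # Assumption: one MPI process per vCPU.
        "mpi_procs_per_vm": vcpus_per_vm,
    }


for n in (2, 4, 8, 16):
    print(n, vm_layout(16, n))
```

Moving from 2 to 16 VMs shrinks each VM from 8 processes to 1, so more of the MPI traffic has to cross VM boundaries.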


NPB - VM Granularity

• Fine-grained VMs → significant overheads (avg. 54%)
– MPI communications mostly through VMM
  • Worst in CG with high communication ratio
– Small memory per VM
– VM management costs of VMM

• Coarse-grained VMs → much less overhead (avg. 11%)
– Still dual socket, but less overhead than the shared memory model → the bottleneck is moved to communication
– MPI communication largely within VM
– Large memory per VM


Conclusion

• Questions on virtualization for HPC on multi-core systems
– How much overhead is there?
– Where does it come from?

• For shared memory model
– Without NUMA → little overhead
– With NUMA → large overheads from semantic gaps

• For MPI model
– Less NUMA effect → communication is important
– Fine-grained VMs have large overheads
  • Communication mostly through VMM
  • Small memory / VM management cost

• Future Works
– NUMA-aware VMM scheduler
– Optimize communication among VMs in a machine


Thank you!


Backup slides


PARSEC CPU Usage

• Environments: native Linux, with only 8 cores turned on (8-thread mode)

• Get CPU usage every second, then average the samples

• For all workloads, usage is less than 800% (fully parallel) → NUMA-first can work

[Figure: average CPU usage for blackscholes, canneal, ferret, fluidanimate, freqmine, streamcluster, swaptions, x264, and Avg.; y-axis 0% to 800%]
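The measurement above averages per-second CPU usage samples, where 800% would mean all 8 cores busy all the time. A minimal sketch of that averaging, and of the headroom it implies, follows. The function names are hypothetical and the sample values are made up for illustration, not taken from the chart.

```python
from typing import Sequence


def average_cpu_usage(samples_percent: Sequence[float]) -> float:
    """Mean of per-second CPU usage samples (100% = one busy core)."""
    if not samples_percent:
        raise ValueError("need at least one sample")
    return sum(samples_percent) / len(samples_percent)


def idle_core_equivalents(avg_percent: float, num_cores: int = 8) -> float:
    """Average number of cores' worth of capacity left unused."""
    return num_cores - avg_percent / 100.0


# Made-up samples for illustration only.
avg = average_cpu_usage([700.0, 650.0, 750.0])
print(avg)                         # 700.0
print(idle_core_equivalents(avg))  # 1.0
```

Any average below 800% means some core is idle part of the time, which is the condition under which a NUMA-first choice has an idle core to pick.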