Transcript
Page 1: Exploring the Performance Impact of Virtualization on an HPC Cloud

Nuttapong Chakthranont†, Phonlawat Khunphet†, Ryousei Takano‡, Tsutomu Ikegami‡

†King Mongkut's University of Technology North Bangkok, Thailand
‡National Institute of Advanced Industrial Science and Technology, Japan

IEEE CloudCom 2014@Singapore, 18 Dec. 2014

Exploring the Performance Impact of Virtualization on an HPC Cloud

Page 2: Exploring the Performance Impact of Virtualization on an HPC Cloud

Outline
•  HPC Cloud
•  AIST Super Green Cloud (ASGC)
•  Experiment
•  Conclusion

Page 3: Exploring the Performance Impact of Virtualization on an HPC Cloud

HPC Cloud
•  HPC users are beginning to take an interest in the cloud.
   –  e.g., CycleCloud, Penguin on Demand
•  Virtualization is a key technology.
   –  Pro: a customized software environment, elasticity, etc.
   –  Con: a large overhead, spoiling I/O performance.
•  VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate the overhead.

Page 4: Exploring the Performance Impact of Virtualization on an HPC Cloud

Motivating Observation
•  Performance evaluation of an HPC cloud:
   –  (Para-)virtualized I/O incurs a large overhead.
   –  PCI passthrough significantly mitigates the overhead.

[Figure: Execution time (seconds) of the NAS Parallel Benchmarks 3.3.1, class C, 64 processes (BT, CG, EP, FT, LU), comparing BMM (IB), BMM (10GbE), KVM (IB), and KVM (virtio). BMM: Bare Metal Machine. The annotation highlights the improvement brought by PCI passthrough.]

[Diagram: With KVM (virtio), guest I/O passes from the guest driver through the VMM's physical driver to the 10GbE NIC. With KVM (IB), the guest OS bypasses the VMM and drives the IB QDR HCA directly through the physical driver.]

Page 5: Exploring the Performance Impact of Virtualization on an HPC Cloud

HPC Cloud
•  HPC users are beginning to take an interest in the cloud.
   –  e.g., CycleCloud, Penguin on Demand
•  Virtualization is a key technology.
   –  Pro: a customized software environment, elasticity, etc.
   –  Con: a large overhead, spoiling I/O performance.
•  VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate the overhead.

We quantitatively assess the performance impact of virtualization to demonstrate the feasibility of the HPC Cloud.

Page 6: Exploring the Performance Impact of Virtualization on an HPC Cloud

Outline
•  HPC Cloud
•  AIST Super Green Cloud (ASGC)
•  Experiment
•  Conclusion

Page 7: Exploring the Performance Impact of Virtualization on an HPC Cloud

Easy to Use Supercomputer - Usage Model of AIST Cloud -

Allow users to customize their virtual clusters:
1. Choose a template image of a virtual machine (HPC, Big Data, Web apps).
2. Add required software packages.
3. Save a user-customized template image.

[Diagram: VM template image files are deployed as a virtual cluster; users take snapshots of customized images and launch a virtual machine when necessary.]

Page 8: Exploring the Performance Impact of Virtualization on an HPC Cloud

Easy to Use Supercomputer - Elastic Virtual Cluster -

[Diagram: From the login node, the sgc-tools utility creates a virtual cluster from the image repository. The virtual cluster consists of a frontend node (VM) providing NFSd and a job scheduler, plus compute nodes connected by InfiniBand. Users submit jobs to the frontend, and the cluster scales in/out on demand.]

Note: a single VM runs on a node.

Page 9: Exploring the Performance Impact of Virtualization on an HPC Cloud

ASGC Hardware Spec.

Compute Node:
–  CPU: Intel Xeon E5-2680 v2 (2.8 GHz, 10 cores) x 2
–  Memory: 128 GB DDR3-1866
–  InfiniBand: Mellanox ConnectX-3 (FDR)
–  Ethernet: Intel X520-DA2 (10 GbE)
–  Disk: Intel SSD DC S3500, 600 GB

•  The 155-node cluster consists of Cray H2312 blade servers.
•  The theoretical peak performance is 69.44 TFLOPS.
•  Operation started in July 2014.
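As a cross-check, the quoted peak follows directly from the node spec (a worked equation, assuming the usual Ivy Bridge figure of 8 double-precision FLOPs per cycle per core):

\[
R_{\text{peak}} = 155\ \text{nodes} \times 2\ \text{CPUs} \times 10\ \text{cores} \times 2.8\ \text{GHz} \times 8\ \tfrac{\text{FLOPs}}{\text{cycle}} = 69.44\ \text{TFLOPS}
\]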

Page 10: Exploring the Performance Impact of Virtualization on an HPC Cloud

ASGC Software Stack

Management Stack:
–  CentOS 6.5 (QEMU/KVM 0.12.1.2)
–  Apache CloudStack 4.3 + our extensions
   •  PCI passthrough/SR-IOV support (KVM only)
   •  sgc-tools: virtual cluster construction utility
–  RADOS cluster storage

HPC Stack (Virtual Cluster):
–  Intel Compiler/Math Kernel Library SP1 1.1.106
–  Open MPI 1.6.5
–  Mellanox OFED 2.1
–  Torque job scheduler

Page 11: Exploring the Performance Impact of Virtualization on an HPC Cloud

Outline
•  HPC Cloud
•  AIST Super Green Cloud (ASGC)
•  Experiment
•  Conclusion

Page 12: Exploring the Performance Impact of Virtualization on an HPC Cloud

Benchmark Programs

Micro benchmark:
–  Intel MPI Benchmarks (IMB) version 3.2.4

Application-level benchmarks:
–  HPC Challenge (HPCC) version 1.4.3
   •  G-HPL
   •  EP-STREAM
   •  G-RandomAccess
   •  G-FFT
–  OpenMX version 3.7.4
–  Graph 500 version 2.1.4

Page 13: Exploring the Performance Impact of Virtualization on an HPC Cloud

MPI Point-to-Point Communication (IMB)

[Figure: Throughput (GB/s, log scale) vs. message size (KB) on the physical and virtual clusters. Peak throughput: 5.85 GB/s (physical) vs. 5.69 GB/s (virtual).]

The overhead is less than 3% with large messages, though it is up to 25% with small messages.
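IMB's ping-pong pattern boils down to timed round trips between two ranks. The following is a minimal sketch of that measurement, not the IMB source; the 4 MB message size and iteration count are arbitrary choices for illustration.

/* pingpong.c: minimal MPI ping-pong throughput sketch (not the IMB source).
 * Build: mpicc -O2 pingpong.c -o pingpong; run with exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int iters = 100;
    const int size  = 4 * 1024 * 1024;   /* 4 MB message (assumed) */
    char *buf = malloc(size);
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double sec = MPI_Wtime() - t0;
    if (rank == 0)   /* two messages of `size` bytes cross the wire per round trip */
        printf("throughput: %.2f GB/s\n", 2.0 * iters * size / sec / 1e9);

    free(buf);
    MPI_Finalize();
    return 0;
}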

Page 14: Exploring the Performance Impact of Virtualization on an HPC Cloud

MPI Collectives (64 bytes) (IMB)

[Figure: Execution time (usec) vs. number of nodes, up to 128, for Allgather, Allreduce, and Alltoall on the physical and virtual clusters. At 128 nodes, the virtual cluster is slower by +77% (Allgather), +88% (Allreduce), and +43% (Alltoall).]

The overhead becomes significant as the number of nodes increases.
… load imbalance?
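A 64-byte collective can be timed with the same barrier-then-average pattern. The sketch below is not IMB itself; the payload layout (8 doubles = 64 bytes) and iteration count are assumptions.

/* allreduce_lat.c: minimal 64-byte MPI_Allreduce latency sketch. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 1000;
    double in[8], out[8];               /* 8 doubles = 64 bytes */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < 8; i++) in[i] = rank + i;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Allreduce(in, out, 8, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double usec = (MPI_Wtime() - t0) / iters * 1e6;

    if (rank == 0)
        printf("Allreduce(64B) avg: %.1f usec\n", usec);
    MPI_Finalize();
    return 0;
}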

Page 15: Exploring the Performance Impact of Virtualization on an HPC Cloud

G-HPL (LINPACK) (HPCC)

[Figure: Performance (TFLOPS) vs. number of nodes, up to 128, for the physical and virtual clusters.]

Performance degradation: 5.4 - 6.6%

Efficiency* on 128 nodes:
Physical: 90%
Virtual: 84%

*) Rmax / Rpeak
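In absolute terms, a back-of-envelope check, reusing the per-node peak implied by the hardware slide (2 CPUs x 10 cores x 2.8 GHz x 8 FLOPs/cycle = 448 GFLOPS, an Ivy Bridge assumption):

\[
R_{\text{peak}}(128) = 128 \times 448\ \text{GFLOPS} \approx 57.3\ \text{TFLOPS},\quad
R_{\max}^{\text{physical}} \approx 0.90 \times 57.3 \approx 51.6\ \text{TFLOPS},\quad
R_{\max}^{\text{virtual}} \approx 0.84 \times 57.3 \approx 48.2\ \text{TFLOPS}
\]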

Page 16: Exploring the Performance Impact of Virtualization on an HPC Cloud

EP-STREAM and G-FFT (HPCC)

[Figure: EP-STREAM performance (GB/s) and G-FFT performance (GFLOPS) vs. number of nodes, up to 128, for the physical and virtual clusters.]

The overheads are negligible.
–  EP-STREAM: memory intensive, with no communication.
–  G-FFT: all-to-all communication with large messages.
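EP-STREAM runs the standard STREAM kernels independently on every process, which is why virtualization barely shows up. A minimal OpenMP sketch of the triad kernel each process runs (array size is an assumption):

/* triad.c: the STREAM triad kernel at the core of EP-STREAM (sketch). */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 26)                /* ~67M doubles per array (assumed size) */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    const double scalar = 3.0;

    #pragma omp parallel for
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for        /* pure memory traffic, no communication */
    for (long i = 0; i < N; i++)
        a[i] = b[i] + scalar * c[i];
    double sec = omp_get_wtime() - t0;

    /* three 8-byte doubles move per iteration: 24 bytes each */
    printf("triad bandwidth: %.2f GB/s\n", 24.0 * N / sec / 1e9);
    free(a); free(b); free(c);
    return 0;
}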

Page 17: Exploring the Performance Impact of Virtualization on an HPC Cloud

Graph500 (replicated-csc, scale 26)

[Figure: Performance (TEPS, log scale from 1e7 to 1e10) vs. number of nodes, up to 64, for the physical and virtual clusters.]

Performance degradation: 2% (64 nodes)

Graph500 is a hybrid parallel program (MPI + OpenMP). We used a combination of 2 MPI processes and 10 OpenMP threads per node.
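The hybrid setup pairs each MPI rank with an OpenMP thread team. A generic skeleton of that initialization, not the Graph 500 reference code, looks like this:

/* hybrid.c: skeleton of an MPI + OpenMP hybrid setup (not Graph 500 code). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    /* FUNNELED: only the master thread makes MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel            /* e.g., OMP_NUM_THREADS=10, 2 ranks per node */
    {
        #pragma omp master
        printf("rank %d: %d threads\n", rank, omp_get_num_threads());
        /* ... thread-parallel BFS work within each rank ... */
    }

    MPI_Finalize();
    return 0;
}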

Page 18: Exploring the Performance Impact of Virtualization on an HPC Cloud

Findings
•  PCI passthrough is effective in improving I/O performance; however, it still cannot achieve the low communication latency of a physical cluster, due to virtual interrupt injection.
•  VCPU pinning improves the performance of HPC applications (see the sketch after this list).
•  Almost all MPI collectives suffer from the scalability issue.
•  The overhead of virtualization has less impact on actual applications.
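VCPU pinning constrains each KVM vCPU thread to a fixed physical core; in practice it is configured on the host through libvirt (vcpupin). The underlying Linux primitive is the affinity syscall, sketched below for the calling thread (the core number is an arbitrary example):

/* pin.c: pin the calling thread to one physical core, the primitive
 * beneath KVM vCPU pinning (illustrative sketch only). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* pid 0 = calling thread; returns 0 on success */
    return sched_setaffinity(0, sizeof(set), &set);
}

int main(void)
{
    if (pin_to_core(3) != 0)        /* core 3 is an arbitrary example */
        perror("sched_setaffinity");
    else
        printf("pinned to core 3\n");
    return 0;
}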

 

Page 19: Exploring the Performance Impact of Virtualization on an HPC Cloud

Outline
•  HPC Cloud
•  AIST Super Green Cloud (ASGC)
•  Experiment
•  Conclusion

Page 20: Exploring the Performance Impact of Virtualization on an HPC Cloud

Conclusion and Future Work
•  HPC Cloud is promising.
   –  Micro benchmarks: MPI collectives have a scalability issue.
   –  Application-level benchmarks: the negative impact is limited; the virtualization overhead is about 5%.
   –  Our HPC Cloud operation started in July 2014.
•  Virtualization can contribute to improving system utilization.
   –  SR-IOV
   –  VM placement optimization based on the workloads of virtual clusters

Page 21: Exploring the Performance Impact of Virtualization on an HPC Cloud

Questions?

Thank you for your attention!

Acknowledgments:
The authors would like to thank Assoc. Prof. Vara Varavithya, KMUTNB, and Dr. Yoshio Tanaka, AIST, for valuable guidance and advice. In addition, the authors would also like to thank the ASGC support team for the preparation and troubleshooting of the experiments. This work was partly supported by JSPS KAKENHI Grant Number 24700040.