OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SGE - Lykle Voort

Jul 14, 2015

Page 1

Dynamic virtual private clusters with OpenNebula and SGE

[email protected]

SURFsara

Page 2

HPC Infrastructures at SURFsara

• Cartesius: 40,000 CPU cores (Top-500 #45)

• Lisa: 9,000 CPU cores

• Grid (LHC, life sciences)

• Hadoop

• Cloud (30 × 32 cores, 256 GB)

Page 3

HPC Cloud architecture

• 800 TB shared storage (NFS)

• 2×10 Gbps

• 4 service nodes

• Virtualised services

• 30 large compute nodes

• 10 small compute nodes

Page 4

Who was using our cloud in 2014?

Cell genetics 45%

Linguistics 10%

Medicine 6%

Economy 4%

Marketing 5%

Ecology 4%

Geography 2%

Civil engineering 7%

Physics 5%

Business 3%

Computer sciences 7%

Other 2%

Page 5

Our cluster is old, we want a new cluster... in your cloud!

Page 6

A typical cluster

• Lots of worker nodes

• Central user management (NIS, LDAP)

• Shared home file system

• Local disks for fast I/O

• A job scheduler (Torque, SGE, SLURM)

• Fixed size, bare metal

Page 7

Typical job

Page 8

Queue monitor

• ~400 lines of Ruby code

• Runs within an EventMachine main loop

• Uses qstat to monitor queues

• Uses qconf to add/remove nodes to/from queues

• Uses OCA to start/stop nodes
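
As an illustration of the qstat step, here is a minimal sketch (not the monitor's actual code) that counts pending jobs in plain `qstat` output. It assumes the default column layout, in which the fifth column holds the job state and `qw` marks a queued, waiting job. The function name `count_waiting_jobs` and the sample listing are hypothetical.

```ruby
# Sketch: count pending jobs in plain-text `qstat` output.
# Assumption: default SGE column order (job-ID, prior, name, user, state, ...).

# Returns the number of jobs whose state is exactly "qw".
# Held ("hqw") and errored ("Eqw") jobs are deliberately not counted here,
# since starting a new node would not help them run.
def count_waiting_jobs(qstat_output)
  qstat_output.each_line.count do |line|
    fields = line.split
    # Job lines start with a numeric job ID; header and ruler lines do not.
    next false unless fields[0].to_s.match?(/\A\d+\z/)
    fields[4] == "qw"
  end
end

# Hypothetical sample, shaped like typical `qstat` output.
SAMPLE_QSTAT = <<~OUT
  job-ID  prior   name       user   state submit/start at     queue          slots
  --------------------------------------------------------------------------------
      231 0.55500 align.sh   alice  r     10/14/2014 10:12:01 all.q@node01   1
      232 0.00000 align.sh   alice  qw    10/14/2014 10:12:05                1
      233 0.00000 blast.sh   bob    qw    10/14/2014 10:13:40                1
OUT

puts count_waiting_jobs(SAMPLE_QSTAT) # prints 2
```

In a real monitor the input would come from running `qstat` itself (for example `qstat -u '*'` to see all users' jobs) rather than from a constant.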

Page 9

Adding nodes...

• Inspect queue (qstat...)

• Jobs waiting? If no: wait... and inspect the queue again. If yes: start VM (ruby OCA)

• Machine started (ssh...)? If no: wait... If yes: tell scheduler (qconf)
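
The adding-nodes flow on this slide can be sketched as a small state step. This is an assumption-laden illustration, not the speaker's implementation: the class `ScaleUp` and its four callables are hypothetical stand-ins for qstat, the Ruby OCA bindings, an ssh probe, and qconf, so the logic can be exercised without a cluster.

```ruby
# Sketch of one pass through the "adding nodes" flow.
# All four callables are hypothetical; a real monitor would wrap
# qstat, the Ruby OCA bindings, an ssh reachability check and qconf.
class ScaleUp
  def initialize(waiting_jobs:, start_vm:, vm_ready:, register_node:)
    @waiting_jobs = waiting_jobs   # -> Integer, e.g. parsed from qstat
    @start_vm = start_vm           # -> VM handle, e.g. created through OCA
    @vm_ready = vm_ready           # VM handle -> true/false, e.g. ssh probe
    @register_node = register_node # VM handle -> adds the node via qconf
    @booting = []                  # VMs started but not yet in the queue
  end

  # Runs one iteration and returns a symbol describing what happened.
  def step
    # "Machine started?" Move VMs that answer from booting to ready.
    ready, @booting = @booting.partition { |vm| @vm_ready.call(vm) }
    unless ready.empty?
      ready.each { |vm| @register_node.call(vm) } # "tell scheduler"
      return :registered
    end

    # "Jobs waiting?" Never boot more VMs than there are waiting jobs.
    if @waiting_jobs.call > @booting.size
      @booting << @start_vm.call # "Start VM"
      return :started
    end

    :waiting # "wait..."
  end
end
```

Inside an EventMachine main loop, a step like this could be driven by a periodic timer. The one-VM-per-waiting-job rule is a simplification chosen for the sketch.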

Page 10

Removing nodes...

• Inspect queue (qstat...)

• Nodes idle? If no: wait... and inspect the queue again. If yes: tell scheduler (qconf)

• Shut down VM (ruby OCA)
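
The removing-nodes flow mirrors the adding flow, and the ordering of its last two steps is the part worth spelling out. The sketch below is hypothetical: `scale_down`, its callables, and the `keep` parameter are illustrative names, with `keep` being an assumed policy knob that does not come from the talk.

```ruby
# Sketch of one pass through the "removing nodes" flow.
# The callables are hypothetical stand-ins for qstat (idle_nodes),
# qconf (unregister_node) and the Ruby OCA bindings (shutdown_vm).
def scale_down(idle_nodes:, unregister_node:, shutdown_vm:, keep: 0)
  # Leave the first `keep` idle nodes running (assumed policy, not from the talk).
  victims = idle_nodes.call.drop(keep)

  victims.each do |node|
    # Order matters: take the node out of the queue first, so the scheduler
    # cannot place a job on a machine that is about to disappear.
    unregister_node.call(node)
    shutdown_vm.call(node)
  end

  victims # the nodes that were removed in this pass
end
```

Keeping a small number of idle nodes around trades some cloud capacity for shorter waits when the next job arrives.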

Page 11

Does it work?

• Yes, in principle...

• But...

Page 12

Future of our cloud

• OpenNebula 4.x (January 2015)

• More compute nodes (February 2015)

• Ceph storage (February 2015)

• Local SSDs (February 2015)

• GPUs

Page 13

Conclusions

• Integration with OCA/XML-RPC is possible and flexible

• Know your users and what they want (cattle? pets?)

Page 14

Questions?