Jul 11, 2015
HPC Infrastructures at SURFsara
• Cartesius: 40,000 CPU cores (Top-500 #45)
• Lisa: 9,000 CPU cores
• Grid (LHC, life sciences)
• Hadoop
• Cloud (30 nodes × 32 cores, 256 GB each)
HPC Cloud architecture
• 800 TB shared storage (NFS)
• 2×10 Gbps network
• 4 service nodes (virtualised services)
• 30 large compute nodes
• 10 small compute nodes
Who was using our cloud in 2014?
• Cell genetics: 45%
• Linguistics: 10%
• Medicine: 6%
• Economics: 4%
• Marketing: 5%
• Ecology: 4%
• Geography: 2%
• Civil engineering: 7%
• Physics: 5%
• Business: 3%
• Computer sciences: 7%
• Other: 2%
Our cluster is old, we want a new cluster... in your cloud!
A typical cluster
• Lots of worker nodes
• Central user management (NIS, LDAP)
• Shared home file system
• Local disks for fast I/O
• A job scheduler (Torque, SGE, SLURM)
• Fixed size, bare metal
Typical job
Queue monitor
• ~400 lines of Ruby code
• Runs within an EventMachine mainloop
• Uses qstat to monitor queues
• Uses qconf to add/remove nodes to/from queues
• Uses OCA (the OpenNebula Cloud API) to start/stop nodes
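The monitor's qstat-based check can be sketched in a few lines of plain Ruby. This is an illustrative sketch, not the actual monitor code: the function name, sample output, and job details below are made up, and the real monitor runs this kind of check inside an EventMachine periodic timer.

```ruby
# Illustrative sketch (not the production monitor): count waiting jobs
# by parsing `qstat` output. Grid Engine marks queued jobs with state
# "qw"; the state is the fifth whitespace-separated column.
def waiting_jobs(qstat_output)
  qstat_output.lines.count { |line| line.split[4] == "qw" }
end

# Hypothetical qstat output in the usual SGE column layout:
sample = <<-QSTAT
job-ID  prior   name    user   state submit/start at      queue
---------------------------------------------------------------
    101 0.55500 sim.sh  alice  r     07/11/2015 10:00:00  all.q@node01
    102 0.55500 sim.sh  alice  qw    07/11/2015 10:01:00
    103 0.55500 sim.sh  bob    qw    07/11/2015 10:02:00
QSTAT

puts waiting_jobs(sample)  # → 2
```

In the real monitor the same count feeds the add/remove decisions shown on the next slides.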
Adding nodes...
1. Inspect the queue (qstat)
2. Jobs waiting? If no: wait, then inspect again
3. If yes: start a VM (Ruby OCA)
4. Machine started (reachable over ssh)? If no: wait
5. If yes: tell the scheduler (qconf)
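The steps above can be condensed into a short Ruby sketch. This is hedged pseudocode under stated assumptions, not the production monitor: `FakeQueue`, `FakeCloud`, and `FakeScheduler` are stand-ins for the qstat parsing, the OCA XML-RPC client, and the qconf calls.

```ruby
# Sketch of the add-node flow; the three collaborator objects are
# stubs standing in for qstat, OpenNebula OCA, and qconf.
def add_node_if_needed(queue, cloud, scheduler)
  return :wait unless queue.jobs_waiting?   # qstat: anything queued?
  vm = cloud.start_vm                       # OCA: instantiate a worker VM
  sleep 1 until cloud.reachable?(vm)        # ssh check: wait until booted
  scheduler.enable_node(vm)                 # qconf: add the host to the queue
  :node_added
end

class FakeQueue
  def initialize(waiting); @waiting = waiting; end
  def jobs_waiting?; @waiting > 0; end
end

class FakeCloud
  def start_vm; "vm-1"; end
  def reachable?(_vm); true; end
end

class FakeScheduler
  attr_reader :enabled
  def initialize; @enabled = []; end
  def enable_node(vm); @enabled << vm; end
end

scheduler = FakeScheduler.new
result = add_node_if_needed(FakeQueue.new(3), FakeCloud.new, scheduler)
puts result                     # → node_added
puts scheduler.enabled.inspect  # → ["vm-1"]
```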
Removing nodes...
1. Inspect the queue (qstat)
2. Nodes idle? If no: wait, then inspect again
3. If yes: tell the scheduler (qconf), so no new jobs land on the node
4. Shut down the VM (Ruby OCA)
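The remove-node flow mirrors the add-node one. Again a hedged sketch with stand-in objects (`IdleQueue`, `StubCloud`, `StubScheduler` are illustrative, not real qstat/qconf/OCA interfaces):

```ruby
# Sketch of the remove-node flow: drain each idle worker in the
# scheduler first, then power its VM off through the cloud API.
def remove_idle_nodes(queue, cloud, scheduler)
  idle = queue.idle_nodes                 # qstat: workers with no running jobs
  idle.each do |node|
    scheduler.disable_node(node)          # qconf: drain before shutdown
    cloud.shutdown_vm(node)               # OCA: power the VM off
  end
  idle.size
end

class IdleQueue
  def initialize(nodes); @nodes = nodes; end
  def idle_nodes; @nodes; end
end

class StubCloud
  attr_reader :stopped
  def initialize; @stopped = []; end
  def shutdown_vm(node); @stopped << node; end
end

class StubScheduler
  attr_reader :drained
  def initialize; @drained = []; end
  def disable_node(node); @drained << node; end
end

cloud = StubCloud.new
count = remove_idle_nodes(IdleQueue.new(["vm-1", "vm-2"]), cloud, StubScheduler.new)
puts count                  # → 2
puts cloud.stopped.inspect  # → ["vm-1", "vm-2"]
```

Draining in the scheduler before shutting the VM down matters: the reverse order could kill a node just as the scheduler dispatches a job to it.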
Does it work?
• Yes, in principle...
• But...
Future of our cloud
• OpenNebula 4.x (January 2015)
• More compute nodes (February 2015)
• Ceph storage (February 2015)
• Local SSDs (February 2015)
• GPUs
Conclusions
• Integration with OCA/XML-RPC is possible and flexible
• Know your users and what they want (cattle? pets?)
Questions?