OpenNebula Conf 2014 | Dynamic virtual private clusters with OpenNebula and SGE by Lykle Voort

Jul 11, 2015

NETWAYS
Transcript
Page 1: OpenNebula Conf 2014 | Dynamic virtual private clusters with OpenNebula and SGE by Lykle Voort

Dynamic virtual private clusters with OpenNebula and SGE

Lykle Voort, SURFsara

Page 2

HPC Infrastructures at SURFsara

• Cartesius: 40,000 CPU cores (Top-500 #45)

• Lisa: 9,000 CPU cores

• Grid: (LHC, Life sciences)

• Hadoop

• Cloud (30 × 32 cores, 256 GB)

Page 3

HPC Cloud architecture

[Diagram labels: 800 TB shared storage (NFS); 2×10 Gbps network; 4 service nodes (virtualised services); 30 large compute nodes; 10 small compute nodes]

Page 4

Who was using our cloud in 2014?

Cell genetics 45%

Linguistics 10%

Medicine 6%

Economy 4%

Marketing 5%

Ecology 4%

Geography 2%

Civil engineering 7%

Physics 5%

Business 3%

Computer sciences 7%

Other 2%

Page 5

Our cluster is old, we want a new cluster... in your cloud!

Page 6

A typical cluster

• Lots of worker nodes

• Central user management (NIS, LDAP)

• Shared home file system

• Local disks for fast I/O

• A job scheduler (Torque, SGE, SLURM)

• Fixed size, bare metal

Page 7

Typical job

Page 8

Queue monitor

• ~400 lines of Ruby code

• Runs within an EventMachine main loop

• Uses qstat to monitor queues

• Uses qconf to add/remove nodes to/from queues

• Uses OCA to start/stop nodes
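
The bullets above describe a polling monitor that grows or shrinks the cluster based on queue state. The original is Ruby running under EventMachine; the following is only a minimal Python sketch of what one decision step might look like. The names `QueueState`, `plan_step` and the `max_nodes` cap are hypothetical and do not come from the talk, and how the snapshot is obtained from qstat is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueState:
    """Snapshot of the scheduler (hypothetical; gathering it via qstat is not shown)."""
    pending_jobs: int  # jobs waiting in the queue
    idle_nodes: int    # worker nodes with no running job
    total_nodes: int   # worker VMs currently known to the scheduler


def plan_step(state: QueueState, max_nodes: int) -> str:
    """Return 'grow', 'shrink', or 'wait' for one polling cycle.

    Assumed policy: grow while jobs are waiting and the cap is not reached,
    shrink when nothing is waiting and at least one node is idle.
    """
    if state.pending_jobs > 0:
        return "grow" if state.total_nodes < max_nodes else "wait"
    if state.idle_nodes > 0:
        return "shrink"
    return "wait"
```

A real monitor would call something like `plan_step` on a timer and translate the result into OCA and qconf actions.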

Page 9

Adding nodes...

[Flowchart: Inspect queue (qstat) → jobs waiting? If no: wait, then inspect the queue again. If yes: start VM (ruby OCA) → machine started (ssh)? If no: wait, then check again. If yes: tell scheduler (qconf).]
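
The add-node flow on this slide could be sketched as follows. This is an illustrative Python sketch, not the author's Ruby code: `add_node` and its parameters are hypothetical, and the three callables stand in for the OCA call, the ssh probe and the qconf call mentioned on the slide, so no real API is assumed.

```python
import time
from typing import Callable, Optional


def add_node(
    start_vm: Callable[[], str],
    is_reachable: Callable[[str], bool],
    register_host: Callable[[str], None],
    poll_seconds: float = 10.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Start one worker VM, wait until it answers, then hand it to the scheduler.

    Returns the host name on success, or None if the VM never became reachable.
    """
    host = start_vm()                # stands in for the OCA call
    for _ in range(max_polls):
        if is_reachable(host):       # stands in for the ssh probe
            register_host(host)      # stands in for the qconf call
            return host
        sleep(poll_seconds)          # the "wait..." branch of the flowchart
    return None
```

Bounding the number of polls is a design choice added here so that a VM that never boots does not block the monitor forever; the slide itself does not say how that case is handled.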

Page 10

Removing nodes...

[Flowchart: Inspect queue (qstat) → nodes idle? If no: wait, then inspect the queue again. If yes: tell scheduler (qconf) → shutdown VM (ruby OCA).]
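
The remove-node flow can be sketched in the same spirit. Again this is a hypothetical Python illustration rather than the original Ruby: `remove_idle_nodes` and `keep_idle` are invented names, and the callables stand in for the qconf and OCA steps. The one point taken from the slide is the ordering: the scheduler is told first, and only then is the VM shut down, so no job is dispatched to a node that is about to disappear.

```python
from typing import Callable, Iterable, List


def remove_idle_nodes(
    idle_hosts: Iterable[str],
    unregister_host: Callable[[str], None],
    shutdown_vm: Callable[[str], None],
    keep_idle: int = 0,
) -> List[str]:
    """Remove idle workers, optionally keeping the first `keep_idle` as spares.

    Returns the list of hosts that were removed, in order.
    """
    removed: List[str] = []
    for host in list(idle_hosts)[keep_idle:]:
        unregister_host(host)  # stands in for the qconf call: tell the scheduler first
        shutdown_vm(host)      # stands in for the OCA call: then stop the VM
        removed.append(host)
    return removed
```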

Page 11

Does it work?

• Yes, in principle...

• But...

Page 12

Future of our cloud

• OpenNebula 4.x (January 2015)

• More compute nodes (February 2015)

• Ceph storage (February 2015)

• Local SSDs (February 2015)

• GPUs

Page 13

Conclusions

• Integration with OCA/XML-RPC is possible and flexible

• Know your users and what they want (cattle? pets?)

Page 14

Questions?