Advanced Computing and Information Systems laboratory Self-configuring Condor Virtual Machine Appliances for Ad-Hoc Grids Renato Figueiredo Arijit Ganguly, David Wolinsky, J. Rhett Aultman, P. Oscar Boykin, ACIS Lab, University of Florida http://wow.acis.ufl.edu

Jan 13, 2016


Garey Cole
Transcript
Page 1: Self-configuring Condor Virtual Machine Appliances for Ad-Hoc Grids

Advanced Computing and Information Systems laboratory

Renato Figueiredo, Arijit Ganguly, David Wolinsky, J. Rhett Aultman, P. Oscar Boykin

ACIS Lab, University of Florida
http://wow.acis.ufl.edu

Page 2: Outline

• Motivations
• Background
• Condor Virtual Appliance: features
• On-going and future work

Page 3: Motivations

Goal: plug-and-play deployment of Condor grids
  • High-throughput computing; LAN and WAN
  • Collaboration: file systems, messaging, ...

Synergistic approach: VM + virtual network + Condor
  • “WOWs” are wide-area NOWs (networks of workstations), where:
    • Nodes are virtual machines
    • Network is virtual: IP-over-P2P (IPOP) overlay
  • VMs provide:
    • Sandboxing; software packaging; decoupling
  • Virtual network provides:
    • Virtual private LAN over WAN; self-configuring and capable of firewall/NAT traversal
  • Condor provides:
    • Match-making, reliable scheduling, ... unmodified

Page 4: Condor WOWs - outlook

1. Prime base VM image with O/S, Condor, virtual network; publish (Web/Torrent)
2. Download image; boot using free VM monitor (e.g. VMware Player or Server)
3. Create virtual IP namespace for pool: MyGrid: 10.0.0.0/255.0.0.0
   • Prime custom image with virtual namespace, desired tools
   • Bootstrap manager(s)
4. Download base and custom VM images; boot up
5. VMs obtain IP addresses from MyGrid virtual DHCP server, join virtual IP network, discover available manager(s), and join pool
5b. VMs obtain IP addresses from OtherGrid virtual DHCP server, join virtual IP network, discover available manager(s), and join pool

[Figure labels: virtual addresses 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4 appear in both the MyGrid and OtherGrid pools]
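The namespace in step 3 can be explored with Python's standard `ipaddress` module. This is only a minimal sketch: `lease_addresses` is a hypothetical toy allocator, not the virtual DHCP server the slide refers to.

```python
import ipaddress

# Namespace from the slide: MyGrid uses 10.0.0.0 with netmask 255.0.0.0.
MYGRID = ipaddress.ip_network("10.0.0.0/255.0.0.0")


def lease_addresses(network, count):
    """Return the first `count` usable host addresses as strings (toy allocator)."""
    hosts = network.hosts()  # iterator that skips the network and broadcast addresses
    return [str(next(hosts)) for _ in range(count)]


if __name__ == "__main__":
    print(MYGRID.prefixlen)        # the netmask 255.0.0.0 is a /8 prefix
    print(MYGRID.num_addresses)    # size of the whole private address space
    print(lease_addresses(MYGRID, 4))
```

Because each pool has its own namespace, OtherGrid can hand out the same four addresses without conflicting with MyGrid, which is what the repeated labels in the figure suggest.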

Page 5: Condor WOW snapshot

[Figure: snapshot of a Condor WOW; site labels: Zurich, Gainesville, Long Beach]

Page 6: Roadmap

The basics:
  1.1 VMs and appliances
  1.2 IPOP: IP-over-P2P virtual network
  1.3 Grid Appliance and Condor
The details:
  2.1 Customization, updates
  2.2 User interface
  2.3 Security
  2.4 Performance
Usage experience

Page 7: 1.1: VMs and appliances

System VMs:
  • VMware, KVM, Xen
Homogeneous system
Sandboxing
Co-exist with unmodified hosts
Virtual appliances:
  • Hardware/software configuration packaged in easy-to-deploy VM images
  • Only dependences: ISA (x86), VMM

Page 8: 1.2: IPOP virtual networking

Key technique: IP-over-P2P tunneling
  • Interconnect VM appliances
  • WAN VMs perceive a virtual LAN environment
IPOP is self-configuring
  • Avoid administrative overhead of VPNs
  • NAT and firewall traversal
IPOP is scalable and robust
  • P2P routing deals with node joins and leaves
IPOP networks are isolated
  • One or more private IP address spaces
  • Decentralized DHCP serves addresses for each space
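One way to picture decentralized address assignment is a create-if-absent claim in a shared key/value store, keyed by namespace and address. The sketch below is an illustration under that assumption, not IPOP's actual protocol: `ToyDHT` and `acquire_address` are hypothetical names, and the store is a single in-memory dictionary standing in for a distributed hash table.

```python
import ipaddress
import random


class ToyDHT:
    """In-memory stand-in for a distributed hash table with create-if-absent."""

    def __init__(self):
        self._store = {}

    def create(self, key, value):
        # The claim succeeds only if nobody has created the key before.
        if key in self._store:
            return False
        self._store[key] = value
        return True


def acquire_address(dht, namespace, network, node_id, rng, max_tries=100):
    """Pick random addresses in `network` until one can be claimed for `node_id`."""
    net = ipaddress.ip_network(network)
    first = int(net.network_address) + 1    # skip the network address
    last = int(net.broadcast_address) - 1   # skip the broadcast address
    for _ in range(max_tries):
        candidate = ipaddress.ip_address(rng.randint(first, last))
        # Keys are scoped by namespace, so separate pools never collide.
        if dht.create(f"{namespace}:{candidate}", node_id):
            return str(candidate)
    raise RuntimeError("no free address found")


if __name__ == "__main__":
    dht = ToyDHT()
    rng = random.Random(0)
    for i in range(3):
        print(acquire_address(dht, "MyGrid", "10.0.0.0/8", f"vm{i}", rng))
```

Scoping the key by namespace is what makes the address spaces isolated in this toy model: the same address can be claimed once per pool.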

Page 9: 1.2: IPOP virtual networking

[Figure: Node A (eth0 128.227.136.244) and Node B (eth0 139.70.24.100) each run applications on top of IPOP and are connected through the P2P overlay; tap0 labels: 10.0.0.3 and 10.0.0.2]

Structured overlay network topology
  • Bootstrap 1-hop IP tunnels on demand
  • Discover NAT mappings; decentralized hole punching
  • VM keeps IPOP address even if it migrates on WAN
  • [Ganguly et al., IPDPS 2006, HPDC 2006]
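The effect of a 1-hop shortcut on a structured overlay can be shown with a toy ring and greedy routing. This is a sketch under simplifying assumptions (a 16-bit identifier space, only immediate ring neighbours as structured links); `build_ring` and `greedy_route` are hypothetical helpers and do not reproduce the overlay described in the cited papers.

```python
RING = 2 ** 16  # toy identifier space; real overlays use much larger identifiers


def ring_distance(a, b):
    """Shortest distance between two identifiers on the ring."""
    d = abs(a - b) % RING
    return min(d, RING - d)


def build_ring(node_ids):
    """Connect each node to its immediate ring neighbours."""
    ordered = sorted(node_ids)
    links = {n: set() for n in ordered}
    for i, n in enumerate(ordered):
        nxt = ordered[(i + 1) % len(ordered)]
        links[n].add(nxt)
        links[nxt].add(n)
    return links


def greedy_route(links, src, dst):
    """Forward hop by hop to the neighbour closest to dst; return the path taken."""
    path = [src]
    current = src
    while current != dst:
        best = min(links[current], key=lambda n: ring_distance(n, dst))
        if ring_distance(best, dst) >= ring_distance(current, dst):
            break  # no neighbour is closer, so routing stops here
        current = best
        path.append(current)
    return path


if __name__ == "__main__":
    links = build_ring([100, 20000, 40000, 60000])
    print(greedy_route(links, 100, 40000))   # multi-hop path over the ring
    links[100].add(40000)                    # on-demand shortcut between the endpoints
    links[40000].add(100)
    print(greedy_route(links, 100, 40000))   # direct, single-hop path
```

Once the shortcut link exists, greedy routing picks it automatically because the destination itself is the closest neighbour.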

Page 10: 1.3 Grid appliance and Condor

Base: Debian Linux; Condor; IPOP
  • Works on x86 Linux/Windows/MacOS; VMware, KVM/QEMU
  • 157MB zipped
Uses NAT and host-only NICs
  • No need to get IP address on host network
Managed negotiator/collector VMs
Easy to deploy schedd/startd VMs
  • Flocking is easy – virtual network is a LAN
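To make the manager/worker split concrete, the sketch below renders a few Condor configuration macros for each role. `CONDOR_HOST`, `DAEMON_LIST` and `FLOCK_TO` are standard Condor configuration macros, but the roles, the function name and the addresses are assumptions made for illustration; the slides do not show the appliance's actual configuration, and a working flocking setup needs further settings (for example `FLOCK_FROM` and authorization on the remote pool).

```python
def render_condor_config(role, manager_ip, flock_to=()):
    """Return a minimal local configuration fragment for a 'manager' or 'worker' VM."""
    daemons = {
        "manager": ["MASTER", "COLLECTOR", "NEGOTIATOR"],
        "worker": ["MASTER", "SCHEDD", "STARTD"],
    }[role]
    lines = [
        f"CONDOR_HOST = {manager_ip}",             # virtual IP of the pool's manager
        f"DAEMON_LIST = {', '.join(daemons)}",
    ]
    if flock_to:
        # Other pools' managers are reachable directly over the virtual LAN.
        lines.append(f"FLOCK_TO = {', '.join(flock_to)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical virtual addresses inside a 10.0.0.0/8 namespace.
    print(render_condor_config("manager", "10.0.0.1"))
    print(render_condor_config("worker", "10.0.0.1", flock_to=["10.0.1.1"]))
```

Since every VM sees the others on one virtual LAN, the addresses written into the fragment are virtual IPs and do not depend on the host network.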

Page 11: 2.1: Customization and updates

VM image: Virtual Disks
  • Portable medium for data
  • Growable after distribution
Disks are logically stacked
  • Leverage UnionFS file system
  • Three stacks:
    • Base – O/S, Condor, IPOP
    • Module – site specific configuration (e.g. nanoHUB)
    • Home – user persistent data
Major updates: replace base/module
  • Minor updates: automatic, apt-based
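The stacking idea can be mimicked with `collections.ChainMap`, which searches its layers in order and sends all writes to the first one. This is an analogy only, assuming dictionaries in place of file systems; the paths and the `mount_stack` helper are hypothetical and this is not how UnionFS itself is configured.

```python
from collections import ChainMap


def mount_stack(base, module, home):
    """Union view: home is searched first and receives every write."""
    return ChainMap(home, module, base)


base = {"/etc/motd": "from base", "/usr/bin/condor_status": "from base"}
module = {"/etc/motd": "from module"}   # site-specific override of a base file
home = {}                               # user data starts out empty

root = mount_stack(base, module, home)
root["/home/user/results.txt"] = "user data"   # lands in the home layer only

# Major update: swap in a new base layer; module and home are reused unchanged.
new_base = {**base, "/usr/bin/condor_status": "from new base"}
upgraded = mount_stack(new_base, module, home)

if __name__ == "__main__":
    print(root["/etc/motd"])
    print(upgraded["/usr/bin/condor_status"])
    print(upgraded["/home/user/results.txt"])
```

The design point this illustrates is that user data survives a base replacement because it never lived in the base layer.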

Page 12: 2.2: User interface (Windows host)

[Screenshot labels: VM console: X11 GUI; host-mounted loop-back Samba folder; loopback SSH]

Page 13: 2.2: User interface (Mac host)

[Screenshot labels: VM console: X11 GUI; host-mounted loop-back Samba folder; loopback SSH]

Page 14: 2.2: User interface (Linux host)

[Screenshot labels: VM console: X11 GUI; host-mounted loop-back Samba folder; loopback SSH]

Page 15: 2.3 Security

Appliance firewall
  • eth0: block all outgoing Internet packets
    • Except DHCP, DNS, IPOP’s UDP port
    • Only traffic within WOW allowed
  • eth1 (host-only): allow ssh, Samba
IPsec
  • X.509 host certificates
    • Authentication and end-to-end encryption
  • VM joins WOW only with signed certificate bound to its virtual IP
  • Private net/netmask: ~10 lines of IPsec configuration for an entire class A network!
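The firewall policy reads as a default-deny rule set with a short allow list. The sketch below models it as a lookup table. Several details are assumptions: `IPOP_UDP_PORT` is a hypothetical placeholder because the slides do not name the port, the eth1 rules are treated as inbound from the host, and only the TCP ports for ssh and Samba are listed. It is a model of the policy, not the appliance's actual firewall rules.

```python
# Hypothetical placeholder: the slides do not state which UDP port IPOP uses.
IPOP_UDP_PORT = 15000

# Allowed (interface, direction, protocol, destination port) combinations.
ALLOW_RULES = {
    ("eth0", "out", "udp", 67),             # DHCP requests to a server
    ("eth0", "out", "udp", 53),             # DNS queries
    ("eth0", "out", "tcp", 53),             # DNS over TCP
    ("eth0", "out", "udp", IPOP_UDP_PORT),  # IPOP overlay traffic
    ("eth1", "in", "tcp", 22),              # ssh from the host
    ("eth1", "in", "tcp", 139),             # Samba (NetBIOS session service)
    ("eth1", "in", "tcp", 445),             # Samba (SMB over TCP)
}


def is_allowed(interface, direction, protocol, dst_port):
    """Default deny: a packet passes only if it matches an allow rule exactly."""
    return (interface, direction, protocol, dst_port) in ALLOW_RULES


if __name__ == "__main__":
    print(is_allowed("eth0", "out", "udp", 53))   # DNS leaves the appliance
    print(is_allowed("eth0", "out", "tcp", 80))   # plain web traffic is blocked
    print(is_allowed("eth1", "in", "tcp", 22))    # the host can ssh in
```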

Page 16: 2.4: Performance

User-level C# IPOP implementation (UDP):
  • Link bandwidth: 25-30Mbit/s
  • Latency overhead: ~4ms
Connection times:
  • ~5-10s to join P2P ring and obtain DHCP address
  • ~10s to create shortcuts, UDP hole-punching

[Figure: bar chart, SimpleScalar 3.0 (cycle-accurate CPU simulator); y-axis: Time (0 to 100); bars: Physical, VMware, Xen; data labels: 79.92, 89.35, 80.18]
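The chart's data labels can be turned into relative overheads with a few lines of arithmetic. The pairing of values with bars is an assumption based on the order in which they appear in the transcript (Physical 79.92, VMware 89.35, Xen 80.18), and the time unit is not stated.

```python
# Data labels from the chart, paired with the bars in order of appearance (assumption).
RUN_TIME = {"Physical": 79.92, "VMware": 89.35, "Xen": 80.18}


def overhead_percent(times, baseline="Physical"):
    """Slowdown of each platform relative to the baseline, in percent."""
    base = times[baseline]
    return {
        name: 100.0 * (t - base) / base
        for name, t in times.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, pct in overhead_percent(RUN_TIME).items():
        print(f"{name}: {pct:.1f}% slower than physical")
```

Under that pairing the slowdown is roughly 11.8% for VMware and 0.3% for Xen on this CPU-bound workload.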

Page 17: Experiences

Bootstrap WOW with VMs at UF and partners
  • Currently ~300 VMs, IPOP overlay routers (PlanetLab)
  • Exercised with 10,000s of Condor jobs from real users
    • nanoHUB: 3-week long, 9,000-job batch (BioMoca) submitted via a Condor-G gateway
    • P2Psim, CH3D, SimpleScalar
Pursuing interactions with users and the Condor community for broader dissemination

Page 18: Time scales and expertise

Development of baseline VM image:
  • VM/Condor/IPOP expertise; weeks/months
Development of custom module:
  • Domain-specific expertise; hours/days/weeks
Deployment of VM appliance:
  • No previous experience with VMs or Condor
  • 15-30 minutes to download and install VMM
  • 15-30 minutes to download and unzip appliance
  • 15-30 minutes to boot appliance, automatically connect to a Condor pool, run condor_status and a demo condor_submit job
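The slides do not show the demo job, so the following is a hypothetical stand-in: a helper that writes a minimal submit description using standard submit commands (`universe`, `executable`, `output`, `error`, `log`, `queue`). The executable and file names are illustrative assumptions.

```python
def demo_submit_description(executable="/bin/hostname", tag="demo"):
    """Return a minimal vanilla-universe submit description as text."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"output = {tag}.out",   # stdout of the job
        f"error = {tag}.err",    # stderr of the job
        f"log = {tag}.log",      # job event log
        "queue",                 # submit one instance
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("demo.submit", "w") as handle:
        handle.write(demo_submit_description())
    # Inside the appliance one would then run:
    #   condor_status
    #   condor_submit demo.submit
```

Adding up the three deployment steps gives roughly 45 to 90 minutes from nothing installed to a first job in the pool.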

Page 19: On-going and future work

Enhancing self-organization at the Condor level:
  • Structured P2P for manager publish/discovery
    • Distributed hash table (DHT); primary and flocking
    • Condor integration via configuration files, DHT scripts
  • Unstructured P2P for matchmaking
    • Publish/replicate/cache classads on P2P overlay
    • Support for arbitrary queries
    • Condor integration: proxies for collector/negotiator
Decentralized storage, cooperative caching
  • Virtual file systems (NFS proxies)
  • Distribution of updates, read-only code repositories
  • Caching and COW for diskless, net-boot appliances
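Manager publish/discovery over a DHT can be sketched as a put and a get on a well-known key. The code below is a toy model under stated assumptions: a single-process ring with SHA-1 identifiers, a hypothetical key format `"<pool>:managers"`, and no replication or expiry. It is not the Brunet/IPOP DHT interface.

```python
import hashlib
from bisect import bisect_left


def key_id(text):
    """Map a string to a position on the identifier ring."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")


class ToyRingDHT:
    """Each key is stored on the first node whose identifier follows the key's."""

    def __init__(self, node_names):
        self.ring = sorted((key_id(name), name) for name in node_names)
        self.storage = {name: {} for name in node_names}

    def owner(self, key):
        ids = [node_id for node_id, _ in self.ring]
        index = bisect_left(ids, key_id(key)) % len(self.ring)  # wrap around the ring
        return self.ring[index][1]

    def put(self, key, value):
        self.storage[self.owner(key)].setdefault(key, []).append(value)

    def get(self, key):
        return list(self.storage[self.owner(key)].get(key, []))


def publish_manager(dht, pool, address):
    dht.put(f"{pool}:managers", address)


def discover_managers(dht, pool):
    return dht.get(f"{pool}:managers")


if __name__ == "__main__":
    dht = ToyRingDHT([f"node{i}" for i in range(8)])
    publish_manager(dht, "MyGrid", "10.0.0.1")
    print(discover_managers(dht, "MyGrid"))
```

A worker that boots with only the pool name can compute the same key and find a manager without any statically configured address, which is the self-organization the slide is after.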

Page 20: Acknowledgments

• National Science Foundation NMI, CI-TEAM
• SURA SCOOP (Coastal Ocean Observing and Prediction)

http://wow.acis.ufl.edu
Publications, Brunet/IPOP code (GPL’ed C#), Condor Grid appliance

Page 21: Questions?

Page 22: Self-organizing NAT traversal, shortcuts

[Figure: Node A sends a CTM request to Node B; label: “CTM request: connect to me at my NAT IP:port”]

- A starts exchanging IP packets with B
- Traffic inspection triggers request to create shortcut
- Connect-to-me (CTM)
- “A” tells “B” its known address(es):
  - “A” had learned NATed public IP/port when it joined overlay

Page 23: Self-organizing NAT traversal, shortcuts

[Figure: Node A and Node B; labels: “CTM reply through overlay: send NAT (IP:port) of B”; “Link request: NAT endpoint (IP:port) of A”]

- “B” sends CTM reply, routed through overlay
- “B” tells “A” its address(es)
- “B” initiates linking protocol by attempting to connect to “A” directly

Page 24: Self-organizing NAT traversal, shortcuts

[Figure: Node A and Node B; label: “A gets CTM reply; initiates linking”]

- B’s linking protocol message to A pokes a hole in B’s NAT
- A’s linking protocol message to B pokes a hole in A’s NAT
- CTM protocol establishes direct shortcut
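The three CTM slides can be replayed as a small simulation. The sketch makes several assumptions: each NAT keeps a stable public endpoint and admits inbound packets only from endpoints it has already sent to, the overlay messages are modelled as plain variable hand-offs, and the final retry by B is added for illustration. `ToyNAT`, `send_direct`, `establish_shortcut` and the endpoints (documentation address ranges) are hypothetical.

```python
class ToyNAT:
    """NAT that admits inbound packets only from endpoints it has sent to."""

    def __init__(self, public_endpoint):
        self.public_endpoint = public_endpoint
        self.punched = set()


def send_direct(src, dst_endpoint, internet):
    """Send one direct packet; return True if the destination NAT lets it in."""
    src.punched.add(dst_endpoint)   # the outbound packet pokes a hole in src's NAT
    dst = internet[dst_endpoint]
    return src.public_endpoint in dst.punched


def establish_shortcut(nat_a, nat_b):
    internet = {nat.public_endpoint: nat for nat in (nat_a, nat_b)}
    events = []
    # CTM request, routed over the overlay: A tells B its public endpoint.
    a_seen_by_b = nat_a.public_endpoint
    # CTM reply, routed over the overlay: B tells A its public endpoint.
    b_seen_by_a = nat_b.public_endpoint
    # B starts linking: this opens B's NAT, but A's NAT still drops the packet.
    events.append(("B->A", send_direct(nat_b, a_seen_by_b, internet)))
    # A gets the reply and links: this opens A's NAT and passes through B's hole.
    events.append(("A->B", send_direct(nat_a, b_seen_by_a, internet)))
    # B retries: A's NAT now has a hole for B, so traffic flows both ways.
    events.append(("B->A retry", send_direct(nat_b, a_seen_by_b, internet)))
    return events


if __name__ == "__main__":
    a = ToyNAT(("198.51.100.10", 40001))
    b = ToyNAT(("203.0.113.20", 40002))
    for step, delivered in establish_shortcut(a, b):
        print(step, "delivered" if delivered else "dropped")
```

The overlay is needed only to exchange the endpoints; once both NATs have seen an outbound packet toward the other side, the direct path works without it.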

Page 25: Performance considerations

CPU-intensive application, Condor
  • SimpleScalar 3.0d execution-driven computer architecture simulator

[Figure: bar chart; y-axis: Time (0 to 100); bars: Physical, VMware, Xen; data labels: 79.92, 89.35, 80.18]

Page 26: Performance considerations

I/O: PostMark
  • Version 1.51
  • Parameters:
    • Minimum file size: 500 bytes
    • Maximum file size: 4.77 MB
    • Transactions: 5,000

[Figure: bar chart; y-axis: MB/s (0 to 12); groups: Host, VMware, Xen; series: Read, Write; data labels as extracted: 9.93, 11.94, 4.47, 5.38, 3.56, 4.28]

Page 27: Performance considerations

User-level C# IPOP implementation (UDP):
  • Link bandwidth: 25-30Mbit/s (LAN)
  • Latency overhead: ~4ms
Connection times:
  • (Fine-tuning has reduced mean acquire time to ~6-10s, with degree of redundancy n=8)

Page 28: Condor Appliance on a desktop

[Figure labels: Linux, Condor, IPOP; Domain-specific tools; User files; Swap; VM Hardware configuration]

Page 29: Related Work

Virtual Networking
  • VIOLIN
  • VNET; topology adaptation
  • ViNe
Internet Indirection Infrastructure (i3)
  • Support for mobility, multicast, anycast
  • Decouples packet sending from receiving
  • Based on Chord p2p protocol
IPv6 tunneling
  • IPv6 over UDP (Teredo protocol)
  • IPv6 over P2P (P6P)