Apr 2012, UCSD HEP Group Trainings: Wedding convenience and control with RemoteCondor, by Igor Sfiligoi (RemoteCondor co-developed with J. Dost), UC San Diego
Wedding convenience and control with RemoteCondor

Jan 15, 2015

Igor Sfiligoi

This presentation explains why Condor is not suitable for use on user-owned machines, and why RemoteCondor is the best available solution to the problem.
Transcript
Page 1: Wedding convenience and control with RemoteCondor

Apr 2012 Remote Condor 1

UCSD HEP Group Trainings

Wedding convenience and control with RemoteCondor

by Igor Sfiligoi
RemoteCondor co-developed with J. Dost

UC San Diego

Page 2: Wedding convenience and control with RemoteCondor

The Condor Batch System

● Condor is a Workload Management System
  ● i.e. a batch system
● Strong points
  ● Fault tolerant
  ● Robust feature set
  ● Flexible
● Large community base
  ● Both commercial and scientific

http://research.cs.wisc.edu/condor/

Page 3: Wedding convenience and control with RemoteCondor

Condor Architecture

● Clearly separates
  ● Resource providers: machines (aka worker nodes) with CPUs, memory, IO, ...
  from
  ● Resource consumers: job queues (aka submit nodes) with jobs submitted by users
● Each has a daemon process to represent it
  ● Startd for resource providers
  ● Schedd for resource consumers
● A central service connects them all
  ● Managed by a Collector/Negotiator pair

Page 4: Wedding convenience and control with RemoteCondor

Condor Architecture in a picture

[Diagram: several Schedds and Startds, with the Collector/Negotiator pair in the middle]

Page 5: Wedding convenience and control with RemoteCondor

The truth about submit nodes

● Corollary
  ● The submit node is a server!
● There is no real “Condor client”
  ● The cmdline tools are just a convenience to talk to the daemon process

[Diagram: a submit node hosting condor_submit, condor_q and the Schedd, next to the Collector/Negotiator and a Startd]

Page 6: Wedding convenience and control with RemoteCondor

Implications

● Being a server has several implications
  ● Security implications
    ● Will have incoming connectivity
    ● All security configuration on the submit node
    ● Submit node controls user authentication and authorization
  ● Unfriendly to non-dedicated hardware
    ● Requires always-on operation
    ● Must be on a public & static IP address

Page 7: Wedding convenience and control with RemoteCondor

Implications

High exploit risk

Requires high trust between all nodes in the cluster

Impossible to use on a laptop

Page 8: Wedding convenience and control with RemoteCondor

Implications

Not suitable for an unmanaged user machine

Page 9: Wedding convenience and control with RemoteCondor

What are the alternatives?

● Out of the box, Condor provides
  ● Remote submission
  ● Condor-C
● In the contrib sections, you can find
  ● RemoteCondor

Page 10: Wedding convenience and control with RemoteCondor

What are the alternatives?

This presentation argues that RemoteCondor is the best solution

Page 11: Wedding convenience and control with RemoteCondor

What are the alternatives?

So what is wrong with these?

Page 12: Wedding convenience and control with RemoteCondor

Remote submission

● Essentially, connecting to a remote Schedd
  ● condor_submit -remote … + condor_transfer_data
  and
  ● condor_q -name ..., condor_rm -name ..., …
● So no daemon processes on the submit node
  ● A true client solution!

[Diagram: condor_submit, condor_q and condor_transfer_data on the submit node; the Schedd on a separate Schedd node, reached through an authentication (Auth) step; Collector/Negotiator and Startd]

http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html

http://research.cs.wisc.edu/condor/manual/v7.6/condor_transfer_data.html
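
As a rough illustration of the client-only workflow on this slide, the Python sketch below only assembles the command lines a user would type against a remote Schedd; nothing is executed. The `-remote` and `-name` options are the ones named on the slide (using `-name` with `condor_transfer_data` is my reading of the manual page linked above). The Schedd name, submit file name and cluster id are hypothetical placeholders.

```python
from typing import List


def remote_submit_cmd(schedd: str, submit_file: str) -> List[str]:
    """Command line that submits a job to a remote Schedd."""
    return ["condor_submit", "-remote", schedd, submit_file]


def remote_queue_cmd(schedd: str) -> List[str]:
    """Command line that polls the remote queue (no local user log exists)."""
    return ["condor_q", "-name", schedd]


def remote_fetch_cmd(schedd: str, cluster_id: int) -> List[str]:
    """Command line that fetches the output of a finished cluster."""
    return ["condor_transfer_data", "-name", schedd, str(cluster_id)]


# Hypothetical values, for illustration only.
schedd = "schedd.example.org"
for cmd in (
    remote_submit_cmd(schedd, "job.sub"),
    remote_queue_cmd(schedd),
    remote_fetch_cmd(schedd, 42),
):
    print(" ".join(cmd))
```

Every step needs the Schedd name spelled out again, and progress can only be followed by repeatedly running the `condor_q` line.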

Page 13: Wedding convenience and control with RemoteCondor

So, what's the problem?

● No local user log file
  ● Must use condor_q to monitor progress
  ● Annoying at best
  ● High monitoring load
  ● And it does not work with DAGMan
● Fully Condor-based user authentication
  ● While rich, not what users expect (e.g. no user/password)
  ● Hard to tie into campus-wide auth
● Staged input data not shared
  ● Could be a problem with large datasets

Page 14: Wedding convenience and control with RemoteCondor

Condor-C

● Based on the Grid paradigm
  ● Submit locally, then delegate to remote Schedd
● Still running a daemon process
  ● But requires no incoming connections
    ● Secure
    ● Laptop friendly

[Diagram: condor_submit and condor_q talk to a local Schedd on the submit node, which passes through an authentication (Auth) step to the Schedd on the Schedd node; Collector/Negotiator and Startd]

http://research.cs.wisc.edu/condor/manual/v7.6/5_3Grid_Universe.html#sec:Condor-C

Page 15: Wedding convenience and control with RemoteCondor

What are the drawbacks?

● Awkward syntax
  ● At least compared to Vanilla universe
  ● See the Condor manual for examples
● Has scalability problems
  ● Could likely be improved, but this is the current state-of-the-art
● Fully Condor-based user authentication
● Staged input data not shared

Same as remote submissions

Can be mitigated with Job Router (but adds another layer of complexity)
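
To make the "awkward syntax" point concrete, here is a small Python sketch that renders two submit descriptions side by side: a plain Vanilla universe job and the same job delegated through Condor-C. It follows the grid universe form I recall from the Condor manual (`grid_resource = condor <schedd> <pool>` and `+remote_` attributes); the host names and file names are hypothetical, and the manual remains the authoritative reference.

```python
from typing import Dict


def render_submit(description: Dict[str, str]) -> str:
    """Render a submit description as 'key = value' lines followed by 'queue'."""
    lines = [f"{key} = {value}" for key, value in description.items()]
    lines.append("queue")
    return "\n".join(lines) + "\n"


# A plain Vanilla universe job (file names are hypothetical).
vanilla = {
    "universe": "vanilla",
    "executable": "myjob",
    "output": "myjob.out",
    "error": "myjob.err",
    "log": "myjob.log",
}

# The same job delegated through Condor-C: grid universe, plus the remote
# Schedd and its pool (hypothetical host names).
condor_c = {
    "universe": "grid",
    "grid_resource": "condor schedd.example.org cm.example.org",
    "executable": "myjob",
    "output": "myjob.out",
    "error": "myjob.err",
    "log": "myjob.log",
    # Attributes meant for the remote job carry a +remote_ prefix;
    # job universe 5 is Vanilla.
    "+remote_jobuniverse": "5",
}

print(render_submit(vanilla))
print(render_submit(condor_c))
```

The extra `grid_resource` line and the `+remote_` attributes are what the user has to learn on top of the familiar Vanilla description.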

Page 16: Wedding convenience and control with RemoteCondor

Introducing RemoteCondor

Page 17: Wedding convenience and control with RemoteCondor

What's the big idea?

● Let the users log in to a remote machine
  ● And run the cmdline tools there

True client approach

Page 18: Wedding convenience and control with RemoteCondor

What's the big idea?

Advantages:
● True local Condor experience
● Standard system authentication and authorization
● No admin privileges for the users
● Trust based on “central” Schedd admin skills
● Can regulate and transform Condor submissions

● Minimize security risk
● Central handling
● Familiar to users

No exceptions

Page 19: Wedding convenience and control with RemoteCondor

What's the big idea?

Big deal!

Where's the news?

Page 20: Wedding convenience and control with RemoteCondor

What's the big idea?

● Let the users log in to a remote machine
  ● And run the cmdline tools there
  ● … while preserving the local look-and-feel
● RemoteCondor provides
  ● Wrappers around major Condor cmdline tools
  ● Integration with sshfs

https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor

Page 21: Wedding convenience and control with RemoteCondor

RemoteCondor wrappers

● Provide wrappers that use ssh under the hood
  ● Users (almost) unaware of the trick
  ● But may be prompted for a password
  ● Works best with public key authentication

[Diagram: condor_submit and condor_q wrappers on the submit node; sshd, the real condor_submit and condor_q, and the Schedd on the Schedd node, reached through an authentication (Auth) step; Collector/Negotiator and Startd]
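
The wrapper idea can be sketched in a few lines of Python. This is not the actual RemoteCondor code, only an illustration of "ssh under the hood": the function name, user name and host name are hypothetical, and the sketch builds the ssh command line without running it.

```python
import shlex
from typing import List


def build_ssh_command(user: str, host: str, remote_argv: List[str]) -> List[str]:
    """Build the local ssh invocation that runs remote_argv on the Schedd node."""
    # ssh passes the remote command to a shell as one string,
    # so each argument is quoted before joining (shlex.join, Python 3.8+).
    remote_cmd = shlex.join(remote_argv)
    return ["ssh", f"{user}@{host}", remote_cmd]


# Hypothetical wrapper call: what a local "condor_q -long" would turn into.
print(build_ssh_command("alice", "schedd.example.org", ["condor_q", "-long"]))
```

A real wrapper would hand this list to something like `subprocess.call`, at which point ssh may prompt for a password unless public key authentication is set up.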

Page 22: Wedding convenience and control with RemoteCondor

RemoteCondor and sshfs

● But being able to talk to Condor is not enough
  ● Users must be able to create and read data!
● Using sshfs solves the problem
  ● Schedd-local disk mounted on submit node (disk local to Schedd for maximum performance)
  ● Using ssh as a tunnel
  ● All in user space (FUSE)
● RemoteCondor will properly convert paths (within certain limits)

http://fuse.sourceforge.net/sshfs.html
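
The path conversion mentioned above can be pictured with the following Python sketch. It assumes the Schedd-side directory was mounted locally with something like `sshfs user@host:/remote/dir /local/mountpoint`; the function name and all paths are hypothetical, and the real RemoteCondor logic may differ.

```python
from pathlib import PurePosixPath


def to_remote_path(local_path: str, mount_point: str, remote_root: str) -> str:
    """Map a path under the local sshfs mount to the matching path on the Schedd node."""
    local = PurePosixPath(local_path)
    try:
        relative = local.relative_to(PurePosixPath(mount_point))
    except ValueError:
        # One plausible "limit": files outside the mount cannot be mapped.
        raise ValueError(f"{local_path} is not under {mount_point}") from None
    return str(PurePosixPath(remote_root) / relative)


# Hypothetical layout: /data/alice on the Schedd node mounted at /home/me/condor_mnt.
print(to_remote_path("/home/me/condor_mnt/run1/job.sub", "/home/me/condor_mnt", "/data/alice"))
```

A wrapper would apply such a mapping to file arguments before forwarding the command over ssh, so the Schedd sees paths that exist on its own disk.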

Page 23: Wedding convenience and control with RemoteCondor

RemoteCondor and sshfs

[Diagram: the real disk, sshd and the Schedd on the Schedd node; the sshfs mount on the submit node, reached through an authentication (Auth) step; Collector/Negotiator and Startd]

Page 24: Wedding convenience and control with RemoteCondor

Using RemoteCondor

● Distributed in the Condor src tarball
  ● In the Contrib section
● Requires a “make install”
  ● To put the proper files in place
● Plus minimal configuration
  ● Where is the remote Schedd node?
  ● What username to use?
  ● Where to mount the sshfs partition?

https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
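
The three configuration questions map naturally onto three settings. The Python sketch below models them with a made-up `key = value` format; the key names and the file format are hypothetical and chosen only for illustration, so the wiki page above remains the reference for the real configuration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RemoteCondorConfig:
    schedd_host: str  # where is the remote Schedd node?
    username: str     # what username to use?
    mount_point: str  # where to mount the sshfs partition?


def parse_config(text: str) -> RemoteCondorConfig:
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        values[key.strip()] = value.strip()
    return RemoteCondorConfig(
        schedd_host=values["schedd_host"],
        username=values["username"],
        mount_point=values["mount_point"],
    )


example = """
# hypothetical settings
schedd_host = schedd.example.org
username = alice
mount_point = /home/me/condor_mnt
"""
print(parse_config(example))
```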

Page 25: Wedding convenience and control with RemoteCondor

Summary

● Traditional Condor not suitable for user machines
● Keeping Schedd nodes professionally maintained highly desirable
  ● To minimize security risks and control job flow
● RemoteCondor allows this operation mode while preserving the local look-and-feel
  ● Requires minimal local install

Page 26: Wedding convenience and control with RemoteCondor

Acknowledgements

This work is partially sponsored by
● the US National Science Foundation under Grants No. OCI-0943725 (STCI) and PHY-0612805 (CMS Maintenance & Operations), and
● the US Department of Energy under Grant No. DE-FC02-06ER41436 subcontract No. 647F290 (OSG).