Posted on 25-Feb-2016
INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
gLExec, SCAS and the paths forward
Introduction to pilot jobs and gLExec and SCAS framework
David Groep, Nikhef
release 8
Outline
• Late Binding and the Distribution of Access Control
• Distributing site access control in-depth using gLExec
• gLExec deployment scenarios
• Coordinating Site Access Control with SCAS
gLExec, SCAS, and the road towards distributed access control
Jobs: from early to late binding
The user submits jobs to a resource through a ‘cloud’ of intermediaries
Direct binding of payload and submitted grid job
• the job contains all of the user’s business
• access control is done at the site’s edge
• inside the site, the user job has a specific, site-local system identity
Binding Late
user’s system for job management
job container binds to actual workload
Late binding of workload using ‘pilot jobs’
• generic job containers are sent, which can verify the ‘surroundings’
• they retrieve the payload from a repository ‘elsewhere’
• if the repository is run by the user, on a per-user basis, then it is likely that it is the user’s payload – if communication is secure
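A minimal Python sketch of the late-binding loop above, under stated assumptions: `Payload`, `UserRepository` and `run_pilot` are hypothetical names, the ‘repository elsewhere’ is an in-memory list, and the secure channel to it is assumed rather than implemented. The pilot first checks its surroundings, then pulls work and refuses anything not owned by the repository’s owner.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Payload:
    owner: str           # grid identity (DN) of the user who queued the work
    command: list[str]   # what the pilot should run


@dataclass
class UserRepository:
    """Hypothetical per-user task repository, run by the user 'elsewhere'."""
    owner: str
    queue: list[Payload] = field(default_factory=list)

    def fetch(self) -> Optional[Payload]:
        return self.queue.pop(0) if self.queue else None


def run_pilot(repo: UserRepository,
              surroundings_ok: Callable[[], bool],
              execute: Callable[[Payload], int]) -> list[int]:
    """Generic job container: check the node first, then bind to work late."""
    if not surroundings_ok():
        return []  # unsuitable node: leave before pulling any payload
    results = []
    while (payload := repo.fetch()) is not None:
        # A per-user repository reached over a secure channel should only hand
        # out its owner's work; anything else is refused rather than run.
        if payload.owner != repo.owner:
            continue
        results.append(execute(payload))
    return results
```

The ownership check is only meaningful because the repository is per-user; once one pilot serves many users, that assumption disappears, which is the problem the next slides address.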
Multi-User Pilot Jobs
What if the user ‘outsources’ the running of the pilot jobs?
• then whoever runs the pilot jobs will run workload for multiple users
• but the site only grants access to the ‘service provider’ (VO) …
Impact of late binding on sites and credentials
At the site itself, what does a user job look like?
Pushing access control downwards
Classic model
Pushing access control downwards
Multi-user pilot jobs hiding in the classic model
MUPJ security issues
When multiple users share a common pilot job deployment, they will, by design, use the same account at the site
• Accountability: it is no longer clear at the site who is responsible for activity
• Integrity: a compromise of any user of the MUPJ framework ‘compromises’ the entire framework
  the framework can’t protect itself against such a compromise unless you allow a change of system uid/gid
• Site access control policies are ignored
• … and several more …
Pushing access control downwards
Making multi-user pilot jobs explicit with distributed Site Access Control (SAC)
- on a cooperative basis -
Implementing distributed SAC
Component 1: gLExec
a thin layer to change Unix domain credentials
based on grid identity and attribute information
you can think of it as:
• ‘a replacement for the gatekeeper’
• ‘a griddy version of Apache’s suexec’
• ‘a program wrapper around LCAS, LCMAPS or GUMS’
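The suexec-like core of such a wrapper can be sketched as below. This is an illustration under assumptions, not gLExec’s actual implementation: `glexec_like`, `authorize` and `map_account` are hypothetical stand-ins for the LCAS decision and the LCMAPS acquisition, and the system calls are injected so the sketch can be exercised without root. What it does show is the usual order of a privilege switch: supplementary groups and gid first, uid last, then exec.

```python
import os
from typing import Callable, Optional, Sequence

Mapping = tuple[int, int, list[int]]  # (uid, primary gid, secondary gids)


def glexec_like(dn: str, fqans: Sequence[str], argv: Sequence[str],
                authorize: Callable[[str, Sequence[str]], bool],
                map_account: Callable[[str, Sequence[str]], Optional[Mapping]],
                system=os) -> None:
    """suexec-style identity switch; an illustration, not gLExec's real code.

    authorize   -- hypothetical LCAS-like yes/no decision
    map_account -- hypothetical LCMAPS-like acquisition of uid/gids
    system      -- anything providing setgroups/setgid/setuid/execv (os by default)
    """
    if not authorize(dn, fqans):
        raise PermissionError(f"not authorized: {dn}")
    mapping = map_account(dn, fqans)
    if mapping is None:
        raise PermissionError(f"no local account for: {dn}")
    uid, gid, sgids = mapping
    # Groups first, uid last: once setuid() succeeds the process has lost the
    # privilege it needs to change its groups.
    system.setgroups(sgids)
    system.setgid(gid)
    system.setuid(uid)
    system.execv(argv[0], list(argv))  # replaces this process with the payload
```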
Pilot Jobs and gLExec
On success: gLExec sets the uid/gid to those of the user who owns the new job, and executes it
On failure: gLExec returns with an error, and the pilot job can terminate or obtain another user’s job
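The pilot side of this contract might look like the sketch below. Assumptions: gLExec is installed at `/usr/sbin/glexec` (site-dependent), and the payload owner’s proxy is passed in the `GLEXEC_CLIENT_CERT` environment variable, as we understand the gLExec documentation; check the version deployed at your site. `run_payload` and `pilot_loop` are hypothetical names, and the `runner` parameter exists only so the logic can be tested without a real gLExec.

```python
import os
import subprocess
from typing import Callable, Iterable

GLEXEC = "/usr/sbin/glexec"  # assumed install location; site-dependent

Runner = Callable[..., subprocess.CompletedProcess]


def run_payload(proxy_path: str, argv: list[str],
                runner: Runner = subprocess.run) -> bool:
    """Ask gLExec to run argv as the owner of proxy_path; True on exit code 0."""
    # GLEXEC_CLIENT_CERT names the payload owner's proxy (check your gLExec docs).
    env = dict(os.environ, GLEXEC_CLIENT_CERT=proxy_path)
    result = runner([GLEXEC, *argv], env=env)
    # A non-zero code may come from gLExec itself (e.g. a refused mapping) or
    # from the payload; a real pilot would want to tell these apart.
    return result.returncode == 0


def pilot_loop(jobs: Iterable[tuple[str, list[str]]],
               runner: Runner = subprocess.run) -> list[str]:
    """Run each (proxy, argv) job; on failure, move on to the next user's job."""
    done = []
    for proxy, argv in jobs:
        if run_payload(proxy, argv, runner):
            done.append(proxy)
    return done
```

Moving on after a failure is one of the two options on the slide; a stricter pilot could instead return as soon as `run_payload` reports a refusal.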
gLExec deployment modes
• Identity Mapping Mode – ‘just like on the CE’
– have the VO query (and by policy honour) all site policies
– actually change uid based on the true user’s grid identity
– enforce per-user isolation and auditing using uids and gids
– requires gLExec to have setuid capability
• Non-Privileged (‘Logging Only’) Mode – declare only
– have the VO query (and by policy honour) all site policies
– do not actually change uid: no isolation or auditing per user
– the gLExec invocation will be logged, with the user identity
– does not require setuid powers – job keeps running in pilot space
• ‘Empty Shell’ – do nothing but execute the command…
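The practical difference between the three modes is which uid the payload ends up with, and whether a refusal or a log record is produced. A hypothetical Python sketch (the `Mode` enum and `plan` function are illustrative, not part of gLExec); note that in the logging-only mode honouring a refusal relies on the cooperating pilot, since nothing stops it from running the payload anyway.

```python
import enum
import logging
from typing import Optional


class Mode(enum.Enum):
    IDENTITY_MAPPING = "identity mapping"  # really switch uid/gid (needs setuid)
    LOGGING_ONLY = "logging only"          # decide and log, stay in pilot space
    EMPTY_SHELL = "empty shell"            # just execute the command


log = logging.getLogger("glexec-modes-sketch")


def plan(mode: Mode, dn: str, authorized: bool,
         mapped_uid: int, pilot_uid: int) -> tuple[bool, Optional[int]]:
    """Return (may run?, uid the payload would run as) for a deployment mode."""
    if mode is Mode.EMPTY_SHELL:
        return True, pilot_uid       # no policy check at all
    if not authorized:
        return False, None           # site policy says no
    log.info("payload for %s (mode: %s)", dn, mode.value)  # per-user record
    if mode is Mode.LOGGING_ONLY:
        return True, pilot_uid       # no per-user isolation
    return True, mapped_uid          # per-user isolation via a distinct uid
```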
Identity change
Let’s assume you make it setuid. Fine. Where should it map to?
• To a shared set of common pool accounts
– uid and gid mapping on the CE corresponds to that on the WN
– requires SCAS or a shared state (gridmapdir) directory
– clear view on who-does-what
• To a per-WN set of pool accounts
– no site-wide configuration needed
– only a limited (and generic) set of pool uids on the WN
– need only as many pool accounts as you have job slots
– makes cleanup easier, ‘local’ to the node
• Or something in between, e.g. one pool for the CE and another for the WNs
But if it is not setuid, it cannot isolate & protect the pilot.
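The per-WN variant needs very little machinery: a fixed set of local accounts, one per job slot, leased to whichever grid identity is running on the node. A sketch under assumptions (the `NodePool` class and the `wnpool00`-style account names are hypothetical):

```python
class NodePool:
    """Per-worker-node pool: one local account per job slot (names hypothetical)."""

    def __init__(self, slots: int, prefix: str = "wnpool"):
        self.free = [f"{prefix}{i:02d}" for i in range(slots)]
        self.lease: dict[str, str] = {}  # grid identity (DN) -> local account

    def acquire(self, dn: str) -> str:
        if dn in self.lease:
            return self.lease[dn]        # same user, same account on this node
        if not self.free:
            raise RuntimeError("no free pool account: all job slots in use")
        self.lease[dn] = self.free.pop(0)
        return self.lease[dn]

    def release(self, dn: str) -> str:
        account = self.lease.pop(dn)
        # Cleanup stays local: whatever `account` owns on this node belongs to
        # the user whose work just finished, so it can be wiped before reuse.
        self.free.append(account)
        return account
```

Because the mapping never leaves the node, the same user may get different uids on different WNs; that is the trade-off against the shared pool.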
gLExec: gluing grid computing to the Unix world – CHEP 2007
But all pieces should go together
1. gLExec deployment on the worker nodes
2. A way to keep the pilot job submitters to their word
– mainly: monitor for compromised pilot job submitters’ credentials
– system-level auditing of the pilot jobs, but auditing data on the WN is useful for incident investigations only
3. ‘Internal accounting should be done by the VO’
– the regular site accounting mechanisms are via the batch system, and these will see the pilot job identity
– the site can easily show the usage by the pilot job from those logs
– making a site do accounting based on gLExec jobs is non-standard, and requires non-trivial effort
Batch system and OS compatibility
How does gLExec affect the basic functions of a batch system?
1. Job Submission
2. Job Suspend/Resume
3. Job Kill
4. CPU time accounting
– No change with respect to current behaviour of jobs
– Times are accumulated on wait and collated with the gLExec usage
By keeping the process tree, gLExec is transparent for the tested batch systems
Tests based on work by Ulrich Schwickerath
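Why CPU time accounting is unaffected can be demonstrated with plain POSIX semantics: `getrusage(RUSAGE_CHILDREN)` reports the usage of terminated, waited-for descendants, including grandchildren as long as each intermediate parent waited for its own child. So if gLExec keeps the payload in the pilot’s process tree and each level waits, the payload’s CPU time rolls up to the batch job. The sketch below (Unix only; `child_cpu_seconds` is a hypothetical helper) uses a nested Python process to stand in for pilot and payload.

```python
import resource
import subprocess
import sys


def child_cpu_seconds(argv: list[str]) -> float:
    """CPU seconds charged to this process's waited-for descendants while argv ran."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True)  # run() waits, so the child is reaped here
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return ((after.ru_utime - before.ru_utime) +
            (after.ru_stime - before.ru_stime))


if __name__ == "__main__":
    # The outer python plays the pilot, the inner one the payload it waits for.
    inner = "sum(range(3 * 10**7))"
    outer = f"import subprocess, sys; subprocess.run([sys.executable, '-c', {inner!r}])"
    print(f"{child_cpu_seconds([sys.executable, '-c', outer]):.2f} s")
```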
gLExec: where are we now?
You can deploy without changes if
• you run LSF or Torque and don’t manage disk or processes, or
• you run LSF or Torque and use TMPDIR and process-tree based job slaughtering
You should update your scripts to use the back-mapping dir if
• you use LSF or Torque and use uid recognition for pruning stray processes (but you ought to change this anyway), or
• you use uid recognition for file cleaning
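Process-tree based cleanup works regardless of which uid gLExec switched the payload to, whereas uid-based pruning only sees the pilot’s uid. A minimal Linux-only sketch (`descendants` is a hypothetical helper) that walks the parent links in `/proc/<pid>/stat`. A cleanup script could then signal the returned pids with `os.kill`, which for processes under another uid still needs adequate privileges; processes that detach and get re-parented (e.g. by double-forking) fall outside such a walk.

```python
import os


def descendants(root_pid: int) -> set[int]:
    """Current descendants of root_pid, via parent links in /proc (Linux only)."""
    children: dict[int, list[int]] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Field 2 (comm) may itself contain spaces or ')', so parse after the
        # last ')': what follows is "<state> <ppid> ...".
        ppid = int(stat[stat.rindex(")") + 2:].split()[1])
        children.setdefault(ppid, []).append(int(entry))
    found: set[int] = set()
    stack = [root_pid]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found
```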
What Happens to Access Control?
So, as the workload binding gets pushed deeper into the site, access control by the site has to become layered as well …
… how does that affect site access control software and its deployment?
Site Access Control today
PRO
• already deployed
• no need for external components, amenable to MPI
CON
• when used for MU pilot jobs, all jobs run with a single identity
• end-user payload can back-compromise pilots, and cross-infect other jobs
• incidents impact a large community (everyone utilizing the MUPJ framework)
Node-local access control
PRO
• no single points of failure
• well defined number of pool accounts (as many as there are job slots/node)
• containment of jobs (no cross-WN infection)
CON
• need to distribute the policy through fabric management/config tools
• no cross-workernode mapping (e.g. no support for pilot-launched MPI)
WN-coordinated access control
PRO
• single unique account mapping per user across whole farm, CE, and SE
• transactions database is simple (implemented as an NFS file system)
• communications protocol is well tested and well known
CON
• need to distribute the policy through fabric management/config tools
• coordination only applies to the account mapping, not to authorization
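The ‘transactions database implemented as an NFS file system’ refers to the shared gridmapdir mentioned earlier. Below is a simplified sketch of that idea, based on our understanding of the LCMAPS pool account convention: each pool account is an empty file, a lease is a hard link to it named after the encoded DN, and the link count shows which accounts are taken. The lease naming (`urllib.parse.quote`) and the helper `map_pool_account` are assumptions, and real gridmapdir entries differ in detail. The sketch relies on `os.link` failing when the lease name already exists, which is what lets a shared directory act as the transaction log.

```python
import os
from urllib.parse import quote


def map_pool_account(gridmapdir: str, dn: str) -> str:
    """Lease a pool account for dn, gridmapdir-style (simplified sketch).

    Pool accounts are empty files (e.g. "atlas001"); a lease is a hard link to
    one of them named after the encoded DN. Link counts record which are taken.
    """
    lease_name = quote(dn, safe="")  # assumed encoding: "/" -> "%2F"
    lease = os.path.join(gridmapdir, lease_name)
    if os.path.exists(lease):  # 1. existing lease: find the file with its inode
        ino = os.stat(lease).st_ino
        for name in sorted(os.listdir(gridmapdir)):
            path = os.path.join(gridmapdir, name)
            if name != lease_name and os.stat(path).st_ino == ino:
                return name
        raise RuntimeError(f"dangling lease for {dn}")
    for name in sorted(os.listdir(gridmapdir)):  # 2. claim a free pool account
        path = os.path.join(gridmapdir, name)
        if "%" in name or os.stat(path).st_nlink != 1:
            continue  # a lease, or an account already in use
        try:
            os.link(path, lease)  # fails if the lease name already exists
        except FileExistsError:
            return map_pool_account(gridmapdir, dn)  # dn was leased concurrently
        if os.stat(path).st_nlink == 2:
            return name
        os.unlink(lease)  # another DN grabbed this account at the same time
    raise RuntimeError("pool exhausted")
```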
Site-central access control
PRO
• single unique account mapping per user across whole farm, CE, and SE*
• can do instant banning and access control in a single place
• protocol profile allows interop between SCAS and GUMS (but no others!)
CON
• replicated setup for redundancy needed for H/A sites
• still cannot do credential validation (formalistic issues with the protocol)
* Of course, central policy and distributed per-WN mapping are also possible!
Centralizing decentralized SAC
Supporting consistent
• policy management
• mappings (if they are not WN-local)
• banning
via the Site Central Authorization Service (SCAS)
– network wrapper around LCAS and LCMAPS
– it’s a client-server system using a variant of the SAML2-XACML2 protocol
– it is itself access controlled
Local LCMAPS
• Linked dynamically or statically to the application
• does both credential acquisition
– local grid-mapfile
– VOMS FQAN to uid and gids
• and enforcement
– setuid
– krb5 token requests
– AFS tokens
– LDAP directory update
LCAS is similar in use and design, but makes the basic yes/no decision
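The acquisition step, from VOMS FQANs to local groups, can be sketched with a grid-mapfile-like text format (quoted pattern, then a local name). Assumptions: `parse_map`, `match` and `acquire_gids` are hypothetical Python helpers, not the LCMAPS plugin API (which is in C), the `/*` wildcard rule is simplified, and the gid numbers are made up. In this sketch the first mapped FQAN becomes the primary gid and the rest become secondary gids.

```python
import shlex
from typing import Optional


def parse_map(text: str) -> list[tuple[str, str]]:
    """Parse lines like: "<DN or FQAN pattern>" <local name>   (# comments)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, local = shlex.split(line)[:2]
        rules.append((pattern, local))
    return rules


def match(rules: list[tuple[str, str]], fqan: str) -> Optional[str]:
    """First matching rule wins; a trailing "/*" matches any subgroup."""
    for pattern, local in rules:
        if pattern == fqan or (pattern.endswith("/*") and fqan.startswith(pattern[:-1])):
            return local
    return None


def acquire_gids(fqans: list[str], rules: list[tuple[str, str]],
                 gid_of: dict[str, int]) -> tuple[int, list[int]]:
    """First mapped FQAN gives the primary gid; the others become secondary gids."""
    gids = [gid_of[g] for f in fqans if (g := match(rules, f)) is not None]
    if not gids:
        raise PermissionError("no FQAN maps to a local group")
    return gids[0], gids[1:]
```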
SCAS: LCMAPS in the distance
• The application links LCMAPS dynamically or statically, or includes the Prima client
• The local side talks to SCAS using a variant of the SAML2-XACML2 protocol
– with attribute names and obligations agreed between EGEE and OSG
– the remote service does acquisition and mappings
– both local, VOMS FQAN to uid and gids, etc.
• Local LCMAPS (or an application like gLExec) does the enforcement
Talking to SCAS
• From the CE
– Connect to the SCAS using the CE host credential
– Provide the attributes & credentials of the service requester, the action (“submit job”) and target resource (CE) to SCAS
– Using common (EGEE+OSG+GT) attributes
– Get back: yes/no decision and uid/gid/sgid obligations
• From the WN with gLExec
– Connect to SCAS using the credentials of the pilot job submitter: an extra control to verify that the invoker of gLExec is indeed an authorized pilot runner
– Provide the attributes & credentials of the service requester, the action (“run job now”) and target resource (CE) to SCAS
– Get back: yes/no decision and uid/gid/sgid obligations
• The obligations are now coordinated between CE and WNs
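The request/response shape described above can be modelled in a few lines. This `ToySCAS` is an in-memory stand-in, not the SAML2-XACML2 wire protocol; its class and field names are assumptions, and it ignores the action and resource fields when deciding. It shows the two points of the slide: only known callers (CE hosts, authorized pilot submitters) get a positive answer, and a given DN receives the same uid/gid obligations whether the CE or a WN asks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Request:
    subject_dn: str          # the end user whose work is to run
    fqans: tuple[str, ...]   # VOMS attributes of that user
    action: str              # "submit job" (from the CE) or "run job now" (gLExec)
    resource: str            # target resource, here the CE
    caller: str              # who authenticated to SCAS: CE host or pilot submitter


@dataclass
class Response:
    permit: bool
    uid: Optional[int] = None
    gid: Optional[int] = None
    sgids: list[int] = field(default_factory=list)


class ToySCAS:
    """In-memory stand-in for a site-central mapping service (no real protocol)."""

    def __init__(self, trusted_callers, pool_uids, fqan_gids):
        self.trusted_callers = set(trusted_callers)
        self.free_uids = list(pool_uids)
        self.fqan_gids = dict(fqan_gids)
        self.banned: set[str] = set()
        self.uid_of: dict[str, int] = {}  # farm-wide DN -> uid lease

    def decide(self, req: Request) -> Response:
        if req.caller not in self.trusted_callers:
            return Response(False)  # unknown CE or unauthorized pilot runner
        if req.subject_dn in self.banned:
            return Response(False)  # banning takes effect in one place
        gids = [self.fqan_gids[f] for f in req.fqans if f in self.fqan_gids]
        if not gids:
            return Response(False)
        if req.subject_dn not in self.uid_of:
            if not self.free_uids:
                return Response(False)
            self.uid_of[req.subject_dn] = self.free_uids.pop(0)
        # Same DN -> same obligations, whether the CE or a WN is asking.
        return Response(True, self.uid_of[req.subject_dn], gids[0], gids[1:])
```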
Where does SCAS go?
SCAS is the medium-term answer to distributed access control
– Going to central certification now
– Testing by SA3/AMS shows well over 25 Hz performance (speed was limited only by the available number of client nodes, where bandwidth is limited by running in virtual machines)
– ‘bonus’ features (like central credential validation) may be added on demand – ask if you want this
The long-term solution is part of the new Authorization Framework
• the new Execution Environment Service (EES) will
– take care of the account mapping etc.,
– using technology elements from SCAS
– and leveraging the other AuthZ components for policy administration, coordinated policy decisions and enforcement
Questions?