Security aspects of the WLCG infrastructure: clients and services Maarten Litmaath CERN

Jan 18, 2018

Page 1: Security aspects of the WLCG infrastructure: clients and services Maarten Litmaath CERN.

Security aspects of the WLCG infrastructure: clients and services

Maarten Litmaath, CERN


Outline

• How it all should work
• Proxies
• Incoherence
• Security model examples
• Banning
• Argus
• Site authorization
• Pilot jobs
• Virtual machines and clouds
• Data security
• Other services
• SSO, identity providers
• Vulnerability aspects

HEPiX 2009-10-29, LBNL 2

This list probably is incomplete…


How it all should work (1)

• Users and services have digital certificates signed by trusted certificate authorities (CAs)
  – Certificate lifetime usually is 1 year
• Users are members of virtual organizations (VOs)
  – WLCG: alice, atlas, cms, lhcb, dteam, ops, …
  – Users need to re-sign AUP every year
  – Sites decide which VOs to support at which QoS
• Services are rarely made members of a VO
  – It would be desirable to some extent
    • A service could prove that it is trusted by the VO
    • Now: rely on information system + filtering


How it all should work (2)

• Users create short-lived proxies for grid access

• Long-lived proxies are only found on MyProxy servers

• Proxies are delegated to services as needed
  – Some services can retrieve or renew proxies via MyProxy

• Services interpret proxies consistently
  – The same criteria are used by different services
  – User jobs and data are protected as needed

• Services log security-related information consistently

• Users can easily be banned as needed


Where we want to be


Where we are


Proxies (1)

• Plain grid proxy
  – Usage: grid-proxy-init
  – Mapping can only be based on the DN
  – DNs in grid-mapfile harvested from VOMS servers
    • Different subsets can be mapped differently
• VOMS proxy
  – Usage:
    • voms-proxy-init --voms vo
    • voms-proxy-init --voms vo:/vo/group
    • voms-proxy-init --voms vo:/vo/group/Role=role
  – Plain grid proxy + set of attributes signed by VOMS server
  – Attributes: groups and/or roles
  – Mapping can be based on attributes and/or the DN
    • Attributes usually preferred
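
The attribute-first mapping described on this slide can be illustrated with a small sketch. It is not the code of any grid mapping library: the tables `FQAN_MAP` and `DN_MAP`, the account names, the example DN and the function names are all invented for illustration. It assumes VOMS attributes arrive as strings of the form `/vo/group/Role=role` and falls back on the DN when no attribute matches, as a plain grid proxy would require.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping tables, invented for this sketch.
FQAN_MAP = {
    "/atlas/Role=production": "atlasprd",
    "/atlas": "atlas",
}
DN_MAP = {
    "/DC=org/DC=example/CN=Jane Doe": "atlas042",
}


def parse_fqan(fqan: str) -> Tuple[str, Optional[str]]:
    """Split '/vo/group/Role=role' into (group path, role or None)."""
    groups = []
    role = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capabilities are ignored in this sketch
        elif part:
            groups.append(part)
    return "/" + "/".join(groups), role


def map_account(dn: str, fqans: List[str]) -> Optional[str]:
    """Prefer VOMS attributes; fall back on the DN (plain proxy case)."""
    for fqan in fqans:
        group, role = parse_fqan(fqan)
        key = f"{group}/Role={role}" if role else group
        if key in FQAN_MAP:
            return FQAN_MAP[key]
    return DN_MAP.get(dn)
```

With a plain grid proxy the list of attributes is empty, so only the DN lookup can succeed; a service that skips the fallback would reject the same user, which is one source of the incoherence discussed later.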


Proxies (2)

• Proxy lifetime should be “short”
  – Cf. AFS/Kerberos token lifetime
  – Default 12 hours, 24 hours probably OK
  – Current practice: LHC experiments use multi-day proxies to avoid potential problems with proxy renewal
    • CMS use 8-day proxies!
• Long job needs proxy to be renewed before it expires
• Long-lived proxies can be stored on a MyProxy server
  – Trusted services can retrieve or renew short-lived proxies
• MyProxy server currently is a single point of failure
  – RFE: upload proxies to multiple servers, try all of them for downloading proxies as needed
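
The requested enhancement (try several servers in turn) and the renewal condition can be sketched as follows. This is only an illustration under stated assumptions: `retrieve` stands in for whatever client call actually downloads a proxy, the host names are placeholders, and `fake_retrieve` merely simulates one unreachable server. No real MyProxy API is used.

```python
from typing import Callable, Iterable, Optional


def needs_renewal(seconds_left: float, margin: float = 3600.0) -> bool:
    """Renew once the remaining proxy lifetime drops to the safety margin."""
    return seconds_left <= margin


def fetch_proxy(servers: Iterable[str],
                retrieve: Callable[[str], bytes]) -> Optional[bytes]:
    """Try each server in turn; return the first proxy obtained, else None."""
    for server in servers:
        try:
            return retrieve(server)
        except OSError:
            continue  # server unreachable: move on to the next one
    return None


def fake_retrieve(server: str) -> bytes:
    """Hypothetical stand-in for a real download; only one host 'works'."""
    if server != "myproxy2.example.org":
        raise ConnectionError(f"{server} is unreachable")
    return b"proxy-from-" + server.encode()
```

Calling `fetch_proxy(["myproxy1.example.org", "myproxy2.example.org"], fake_retrieve)` skips the first host and returns the proxy from the second, so a single server outage no longer blocks renewal.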


Incoherence

• Different services treat proxies differently
  – Libraries
  – Mapping
    • Plain proxies
    • VOMS proxies
  – Logging
  – Banning
    • Not possible on certain services!
  – Testing/debugging/forensics tools
    • Available for some scenarios on some services
• Try finding two gLite services with the same security model!
  – OSG, ARC?


Security model examples

• LCG Computing Element
  – VOMS mapping with fallback on plain proxy mapping
• CREAM Computing Element
  – VOMS only
• OSG Computing Element
  – GUMS: VOMS, DN
• Disk Pool Manager
  – Virtual IDs
  – VOMS mapping and plain proxy mapping
• dCache
  – gPlazma: GUMS, vo-role-map, …
• Workload Management System
  – VOMS authZ by 2 different libraries: GridSite, LCMAPS
    • But Condor-G engine only looks at the DN!


Banning

• OSG have SAZ and GUMS, ARC have Charon
• EGEE/gLite: LCAS library and SCAS/Argus services have banning plugins
  – Easy to ban a DN
  – LCG-CE, CREAM-CE, WMS
• DPM/LFC virtual ID table will get banning flags
  – Currently only plain proxies can be fully banned
    • By mapping them to non-existent accounts/VOs
  – VOMS proxies can be banned only from creating new files
• Argus should make this consistent and easy
  – Also can import a grid-wide ban list


Argus

• Argus is the long-term gLite authorization framework
• It should give all gLite services a consistent authZ model
• It allows for authZ decisions to be taken centrally per site
  – A single place to pull the plug
• It can import remote policies
  – Regional, national, project-based, …
  – Give priority to local/national/… users
  – Banning of DNs, e.g. grid-wide
• Policies can affect QoS for DNs or VOMS attributes
  – Preferences
  – Banning
• Argus will be introduced gradually
  – It can coexist with legacy services
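
The idea of one site-wide decision point that merges local rules with imported ones can be pictured with the toy model below. It does not use or imitate the Argus policy language or API; the class `SitePolicy`, its fields and the three result strings are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class SitePolicy:
    """Toy central decision point for one site (hypothetical model)."""
    local_banned_dns: Set[str] = field(default_factory=set)
    imported_banned_dns: Set[str] = field(default_factory=set)  # e.g. grid-wide list
    banned_fqans: Set[str] = field(default_factory=set)
    preferred_fqans: Set[str] = field(default_factory=set)

    def decide(self, dn: str, fqans: Iterable[str]) -> str:
        fqans = list(fqans)
        # Bans win over everything else, whether local or imported.
        if dn in self.local_banned_dns or dn in self.imported_banned_dns:
            return "deny"
        if any(f in self.banned_fqans for f in fqans):
            return "deny"
        # Preferences affect quality of service, not access as such.
        if any(f in self.preferred_fqans for f in fqans):
            return "permit-preferred"
        return "permit"
```

Because every service at the site would ask the same object, adding a DN to either ban set takes effect everywhere at once, which is the "single place to pull the plug" property named above.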


Site authorization

• EGEE
  – SCAS
    • Released to production early July for glexec on the WN
    • Only deployed on the few sites that helped debugging glexec and its use by ATLAS and LHCb
  – Argus
    • In certification
• OSG
  – GUMS
  – SAZ
• ARC
  – Charon
  – Argus support foreseen


Pilot jobs (1)

• A pilot job checks and prepares the worker node environment for a real job, i.e. a task that it downloads from a central task queue
  – Late binding leads to good efficiency
• A multi-user pilot job can pick up a task from any user in the VO
• The task should run with its own associated proxy
  – Access services, store data etc. with the correct identity
• It should run under an account corresponding to that proxy
  – Separate users as the CE head node would have done
  – Protect the pilot proxy against malicious payloads
• A setuid root utility is needed to switch to the correct identity
  – Like “sudo” or Apache “suexec”: gLExec
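
Late binding, as described on this slide, means the pilot inspects the worker node first and only then chooses a task. The sketch below simulates that choice in memory; `Task`, its fields and `pick_task` are hypothetical names, the disk requirement is just one example of a node check, and the identity switch itself (the job of a setuid tool) is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    owner_dn: str      # identity whose proxy the payload should run with
    min_disk_gb: int   # example requirement checked against the node
    command: str


def pick_task(queue: List[Task], free_disk_gb: int) -> Optional[Task]:
    """Return and remove the first queued task this node can run."""
    for index, task in enumerate(queue):
        if task.min_disk_gb <= free_disk_gb:
            return queue.pop(index)
    return None  # nothing suitable: the pilot exits without wasting the slot
```

A multi-user pilot would take the returned task's `owner_dn`, obtain the matching proxy, and hand the payload to the identity-switching utility so that it runs under that user's account rather than the pilot's.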


Pilot jobs (2)

• Each experiment has a pilot job framework
  – ALICE: AliEn
  – ATLAS: PanDA
  – CMS: glideinWMS, only used on OSG
  – LHCb: DIRAC
• All examined by GDB Pilot Job Frameworks Review group
• Current usage
  – Production managers run VO workload for many/all users
  – Individual users may be able to run their own jobs
• Foreseen usage
  – Pilot jobs use glexec to run payload under user account
• Problem: we have no production experience with glexec and there is little time left before the LHC starts


Virtual machines and clouds

• Running each job in its own VM is desirable
  – Reduce security interference between jobs
    • Shared software area and shared services remain
  – Local files left behind can be cleaned up completely
  – Implemented at some sites and becoming more popular
• Shared SW area not needed when SW included in the image
  – Avoids Trojan horses and bottleneck
• Complete images also are a natural fit for clouds
• Some sites are experimenting with clouds


Data security (1)

• Fine-grained security policies for data access are possible in principle
• In practice there are only 2 levels of security today
  – Production managers are responsible for the vast majority of a VO’s data volume (99%)
  – Only they have write access to specific resources used in managing production data
    • Reserved sub-trees in the catalog name space
    • Reserved disk pools and tape access
  – All the remaining resources are group-writable
    • By default writable for the whole VO!
    • Different groups in a VO can be shielded from each other
      – If they are mapped differently
      – This may require site admin intervention
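
The two-level model above amounts to a very small access rule, sketched here. The path prefix, the VO name `myvo` and the `Role=production` attribute are assumptions chosen for the example; real catalogs and storage systems express this through their own ACLs and mappings.

```python
from typing import List

# Hypothetical reserved sub-tree for production data.
RESERVED_PREFIXES = ("/grid/myvo/production/",)


def may_write(path: str, vo: str, fqans: List[str]) -> bool:
    """Two-level rule: reserved areas need the production role,
    everything else is writable by any member of the VO."""
    in_vo = any(f == f"/{vo}" or f.startswith(f"/{vo}/") for f in fqans)
    if not in_vo:
        return False
    if path.startswith(RESERVED_PREFIXES):
        return f"/{vo}/Role=production" in fqans
    return True
```

The last branch is the point the slide warns about: outside the reserved sub-trees any VO member can write, so shielding groups from each other needs an additional mapping that this simple rule does not contain.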


Data security (2)

• BeStMan
  – Classic grid-mapfile, GUMS
• CASTOR
  – Classic grid-mapfile, insecure RFIO!!
• dCache
  – gPlazma supports GUMS, vo-role-mapfile, …
• DPM, LFC
  – Maps to virtual UIDs and GIDs (defined in DB)
  – Native VOMS support, fallback on classic grid-mapfile
    • Lcgdm-mapfile to determine the VO for a plain grid proxy
    • Grid-mapfile is needed by DPM GridFTP server
• StoRM
  – Native VOMS support
  – Uses just-in-time ACLs to give access to data on cluster FS


Other services

• Information system
  – Insecure LDAP
    • Anyone can search for vulnerable hosts
    • Information can be corrupted (DNS spoofing, MITM attack)
  – Any site can claim it supports any VO
    • The VO can configure a filter to get rid of unwanted sites or run a private, static information system
      – Filters currently work only for Computing and Storage Elements
• Monitoring
  – When secure, often viewable for any DN from a trusted CA
• Accounting
  – Secure
  – Privacy
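
The VO-side filter mentioned for the information system can be pictured as below. The record layout (`site`, `type`, `vos`) and the function name are invented for this sketch and do not reflect the real information system schema; the only behaviour taken from the slide is that filtering applies to Computing and Storage Elements while other service types pass through unchecked.

```python
from typing import Dict, List, Set


def filter_sites(published: List[Dict], vo: str,
                 trusted_sites: Set[str]) -> List[Dict]:
    """Keep records claiming support for `vo`, dropping CE/SE entries
    from sites the VO does not trust. Other types are not filtered."""
    kept = []
    for record in published:
        if vo not in record["vos"]:
            continue
        if record["type"] in ("CE", "SE") and record["site"] not in trusted_sites:
            continue
        kept.append(record)
    return kept
```

The unfiltered branch makes the limitation visible: a rogue site could still advertise another kind of service for the VO and survive the filter.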


SSO, identity providers

• SSO for services is popular
• Identity providers
  – Kerberos
  – Shibboleth
  – …
• Why should grid usage be excluded?
• SSO identity can be translated into grid identity
  – FNAL Kerberos CA, SLCS
  – SWITCH SLCS
  – …


Vulnerability aspects

• EGEE Grid Security Vulnerability Group has >70 open issues
  – The vast majority of them are deemed low risk … for now

• A complete list of domains involved in WLCG could be used to configure service firewalls accordingly
  – Outbound client connections might also be constrained

• Jobs/payloads should be signed by the user proxy
  – Close the door to “easy” injection of rogue jobs


Conclusions

• Security aspects of WLCG clients and services show a forest of libraries, configurations and features
  – A lot of legacy

• More consistency and simplicity are highly desirable

• Some important functionalities only implemented partially
  – Banning
  – Site-wide policies
  – Data protection

• There are steady improvements and road maps
  – To get us out of the woods…
