Challenges Running an NFSv4-backed OSG Cluster

Kevin Coffman
kwc@citi.umich.edu

Center for Information Technology Integration

University of Michigan


Dec 21, 2015



Overview

● Basic NFSv4 in production

● Open Science Grid (OSG) Overview

● OSG Installation

● OSG Configuration

● Submitting a job!

● Authentication differences (AFS vs. NFSv4)

● Authentication futures


Basic NFSv4 file service in production

● Basic file storage

● User name mappings

● Home directories

● Kernel builds, etc.


Open Science Grid Overview

● Architecture

– Head node & worker nodes

– Core is NSF Middleware Initiative (including Globus, Condor, kx.509)

● Authentication

– X.509, kx.509, proxy certs

● No cluster file system required

– “Home”, Base, Data, Apps, Temp, Worker node temp


OSG Installation

● New Linux kernels, new NFSv4 code, new OSG releases, repeat!

● Base installation is done solely on head node

● Credentials needed

– Root access assumed for local file system access

● Mapping machine cred now necessary

– Kerberos credentials for NFS file system access

● Name-to-UID mapping issues

– Found the need for tools/scripts for flushing mappings
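Name-to-UID trouble comes from caching. Translations between NFSv4 owner strings (user@domain) and numeric UIDs are cached, so a changed account can keep resolving to an old UID until its entry expires. The sketch below is a hypothetical in-memory model of such a cache, with a time-to-live and an explicit flush in the spirit of the tools mentioned above. The class, its methods, and the `resolve` callback are assumptions for illustration; they do not model idmapd or any real flushing tool.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class IdMapCache:
    """Hypothetical name-to-UID cache with expiry and explicit flushing."""

    def __init__(self, resolve: Callable[[str], int], ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve      # authoritative lookup (passwd, NIS, LDAP, ...)
        self._ttl = ttl              # seconds an entry is trusted
        self._clock = clock
        self._entries: Dict[str, Tuple[int, float]] = {}  # name -> (uid, expiry)

    def uid_for(self, name: str) -> int:
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]            # may be stale, but has not expired yet
        uid = self._resolve(name)
        self._entries[name] = (uid, now + self._ttl)
        return uid

    def flush(self, name: Optional[str] = None) -> None:
        """Drop one cached name, or every entry when name is None."""
        if name is None:
            self._entries.clear()
        else:
            self._entries.pop(name, None)
```

Suppose the authoritative entry for an account changes from 5001 to 5002. `uid_for` keeps returning 5001 until the TTL runs out, unless the entry is flushed first.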


OSG Configuration

● Daemons (e.g., MonALISA and Condor) on the head node and worker nodes require authentication for file system access

– Keytabs

– More name-to-UID mappings required

● Virtual Organization (VO) accounts

– DN to UNIX account name via grid-mapfile

– Name to UID mappings required for file system access
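For VO accounts the lookup happens in two steps. The grid-mapfile maps a certificate subject DN, written in double quotes, to one or more local account names. That UNIX name must then still map to a UID before file system access can succeed. The sketch below shows both steps under some assumptions. It handles a simplified file with one `"DN" account[,account...]` entry per line and `#` comments, and it treats the first account as the default. A hypothetical in-memory UID table stands in for the real passwd/NIS/LDAP lookup.

```python
import shlex
from typing import Dict, List


def parse_grid_mapfile(text: str) -> Dict[str, List[str]]:
    """Map each quoted DN to its local account names (first one is the default)."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # skip blank lines and comments
        fields = shlex.split(line)            # keeps the quoted DN as one field
        if len(fields) != 2:
            raise ValueError(f"malformed grid-mapfile line: {raw!r}")
        dn, accounts = fields
        mapping[dn] = accounts.split(",")
    return mapping


def uid_for_dn(dn: str, gridmap: Dict[str, List[str]], uids: Dict[str, int]) -> int:
    """DN -> default UNIX account -> numeric UID (uids table is hypothetical)."""
    account = gridmap[dn][0]
    return uids[account]
```

A failure in either step leaves the job unable to use the file system. The cause can be a DN missing from the grid-mapfile or an account name with no UID mapping.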


Submitting a job!

● Job submission uses X.509 authentication

– Need Kerberos authentication for file-system access

– Gatekeeper forks a job manager process for each job

● Job manager process runs as the original user and needs the user’s credentials

● Verified that this works as expected using AUTH_SYS, without requiring Kerberos credentials


MGRID Architecture

[Figure: MGRID architecture, with numbered steps 1 to 8. Components shown: User Workstation (Browser, kinit, kx509, libpkcs11); MGRID Portal (Apache with mod_ssl, mod_kx509, mod_kct, mod_jk and mod_php; Tomcat; CHEF); KDC; KCA; KCT; LDAP (authorization); GateKeeper; Resource Mgr; Grid Resource. Protocols shown: Kerberos V5, SSL (client certificate required), GSI, SASL.]


Grid job authentication issues

● Jobs scheduled to run in the future

● Long-running jobs (refreshing credentials)

● Combination of both (future and long-running)

● Distribution of user credentials to worker nodes for file system access
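The first three issues come down to comparing a job's run window with the lifetime of the credentials it will use. A Kerberos ticket has an end time and, if it is renewable, a renew-until limit. A job is therefore either covered outright, coverable only if someone keeps renewing the ticket, or not coverable without new credentials. The sketch below models that decision. The names are hypothetical, and times are plain seconds since the epoch. It does not address the fourth issue, getting the credentials onto the worker node.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    VALID = "credential covers the whole job"
    RENEW = "credential must be renewed before it expires"
    REACQUIRE = "new credential needed; renewal limit too short"


@dataclass
class Credential:
    end_time: int      # ticket expiry (seconds since epoch)
    renew_until: int   # renewal limit; equal to end_time if not renewable


def plan_for_job(cred: Credential, start: int, duration: int) -> Action:
    """Decide how a job starting at `start` and running `duration` seconds is covered."""
    job_end = start + duration
    if job_end <= cred.end_time:
        return Action.VALID
    if job_end <= cred.renew_until:
        # Applies to future starts and to long runs alike, but only if
        # something renews the ticket before each end_time passes.
        return Action.RENEW
    return Action.REACQUIRE
```

Take a ticket valid for 10 hours and renewable for 7 days. A one-hour job starting now is `VALID`. The same job scheduled for tomorrow is `RENEW`. A ten-day run is `REACQUIRE`.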


Authentication differences (AFS vs. NFSv4)

● Credential used by the kernel

– AFS: tokens

– NFSv4: GSS contexts

● When the credential is obtained

– AFS: kernel assumes tokens were obtained prior to file access (klog)

– NFSv4: kernel requests a GSS context on demand at the time of the (first) file access

● Scope

– AFS: a single token for all file servers in a cell

– NFSv4: a separate service ticket (really a GSS context) needed for each server
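The practical consequence of the last difference is the number of credentials involved. An NFSv4 client ends up with one context per (user, server) pair, each created lazily on first access. AFS needs one token per user per cell, obtained up front. The toy model below shows the NFSv4 side. A hypothetical `upcall` callback stands in for the kernel's request to GSSD, and the returned strings are placeholders for real contexts.

```python
from typing import Callable, Dict, Tuple


class GssContextCache:
    """Toy model: one context per (uid, server), negotiated on first access."""

    def __init__(self, upcall: Callable[[int, str], str]) -> None:
        self._upcall = upcall                     # stands in for the GSSD upcall
        self._contexts: Dict[Tuple[int, str], str] = {}
        self.upcalls = 0                          # how many negotiations happened

    def context_for(self, uid: int, server: str) -> str:
        key = (uid, server)
        if key not in self._contexts:             # miss: negotiate now, not at login
            self.upcalls += 1
            self._contexts[key] = self._upcall(uid, server)
        return self._contexts[key]
```

Consider one user making three accesses that touch two servers. They trigger two upcalls, and each upcall needs its own nfs/<server> service ticket. In the AFS model the same three accesses would all be covered by the single cell token.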


Current Architecture

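[Figure: Current architecture, with numbered steps 1 to 13. Client: user process and GSSD in user space; NFS client with gss context cache in the kernel; credentials on disk and a keytab. Server: SVCGSSD in user space; NFSD with gss context cache in the kernel. KDC with AS and TGS.]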


Authentication futures

● SPKM3

– Allows us to stay in the X.509 world

– Anonymous (Diffie-Hellman) mode, with a certificate on the server to prevent man-in-the-middle attacks

– X.509 certificates

● LIPKEY

– Built on top of SPKM3

– Allows TLS-like password authentication


Linux kernel keys support (a.k.a. keyring)

● General credential storage in-kernel

– thread-specific keyring

– process-specific keyring

– session-specific keyring (PAG-like via JOIN_SESSION_KEYRING)

● Different key types: ‘user’, ‘rpcsec_gss context’

● Create, delete, link, search, revoke, etc.

● Quotas and permissions

● Referenced by serial # and description
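The layered keyrings matter because a lookup checks them in a fixed order. The kernel's keys documentation describes request_key() as searching the thread keyring, then the process keyring, then the session keyring. The sketch below is a pure-Python model of that lookup by type and description, returning the key with its serial number. It is not the keyctl API. The serial allocator and class names are hypothetical, and quotas and permissions are omitted.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

_serials = itertools.count(0x1000)   # hypothetical serial-number allocator


@dataclass
class Key:
    key_type: str        # e.g. "user", or the proposed "rpcsec_gss context"
    description: str
    payload: bytes
    serial: int = field(default_factory=lambda: next(_serials))


@dataclass
class Keyring:
    keys: Dict[Tuple[str, str], Key] = field(default_factory=dict)

    def add(self, key: Key) -> int:
        self.keys[(key.key_type, key.description)] = key
        return key.serial

    def find(self, key_type: str, description: str) -> Optional[Key]:
        return self.keys.get((key_type, description))


def search(thread: Keyring, process: Keyring, session: Keyring,
           key_type: str, description: str) -> Optional[Key]:
    """Check thread, then process, then session keyring; first match wins."""
    for ring in (thread, process, session):
        key = ring.find(key_type, description)
        if key is not None:
            return key
    return None
```

A login that joins its own session keyring gets the PAG-like behavior described above. Every process in that session finds the same credentials, and processes in other sessions do not.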


MIT Kerberos ccache using keyring as backing storage

● Assumes a single “active” credentials cache

● Can store more than one ccache in the same session keyring

● All user-level code

Session
 |
 +---> krb5_cc_active (key: contains 0x00004f12)
 |
 +---> /tmp/krb5cc_20010_XF45C2 (keyring: id is 0x000023cd)
 |       |
 |       +---> [email protected] (principal info)
 |       +---> krbtgt/[email protected]
 |       +---> nfs/[email protected]
 |       +---> nfs/[email protected]
 |       +---> pop/[email protected]
 |       +---> [email protected]
 |
 +---> /tmp/krb5cc_20010_umich (keyring: id is 0x00004f12)
         |
         +---> [email protected] (principal info)
         +---> krbtgt/[email protected]
         +---> imap/[email protected]
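In the layout above, the krb5_cc_active key holds a keyring serial number rather than a cache name. Finding the active cache is therefore a two-step lookup. The toy model below uses the serial numbers from the diagram. The dictionary layout and function name are assumptions, and real code would go through the kernel keyctl interface rather than Python dictionaries.

```python
from typing import Dict

# Session keyring contents from the diagram, modelled as plain data.
ACTIVE_POINTER = 0x00004F12           # payload of the krb5_cc_active key

CCACHES: Dict[int, str] = {           # keyring serial -> ccache name
    0x000023CD: "/tmp/krb5cc_20010_XF45C2",
    0x00004F12: "/tmp/krb5cc_20010_umich",
}


def active_ccache(pointer: int, ccaches: Dict[int, str]) -> str:
    """Follow the serial stored in krb5_cc_active to the cache it names."""
    try:
        return ccaches[pointer]
    except KeyError:
        raise LookupError(
            f"active ccache 0x{pointer:08x} not in session keyring") from None
```

Switching the default cache only rewrites the small pointer key. Both caches stay in the session keyring and remain available.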


Mount using keyring support

● Mount program will use keytab to set up machine credentials in keyring

● /sbin/request-key invoked and finds machine credentials

● Context is negotiated and “rpcsec_gss context” key instantiated


User access using keyring support

● Assumes users have credentials in the keyring via kinit or PAM

– No more looking around blindly for creds in the file system

– /sbin/request-key invoked and finds the user’s session-specific credentials
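The mount and user-access slides describe one handler path with two credential sources. The kernel asks for an “rpcsec_gss context” key, and /sbin/request-key runs a handler. For a mount, the handler uses the machine credentials the mount program placed in the keyring. For a user access, it uses the credentials in the caller's session keyring. The sketch below models that choice. The function names and inputs are hypothetical, and context negotiation is reduced to a placeholder string. The key point is that it never falls back to scanning the file system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Creds:
    principal: str   # a user principal, or a host principal from the keytab


def select_creds(is_mount: bool,
                 session_creds: Optional[Creds],
                 machine_creds: Optional[Creds]) -> Creds:
    """Choose which keyring credentials the request-key handler should use."""
    if is_mount:
        if machine_creds is None:
            raise PermissionError("mount: no machine credentials in keyring")
        return machine_creds
    if session_creds is None:
        # No fallback to scanning /tmp: the user must kinit or log in via PAM.
        raise PermissionError("no credentials in caller's session keyring")
    return session_creds


def instantiate_context_key(server: str, creds: Creds) -> str:
    """Placeholder for negotiating a GSS context and instantiating the key."""
    return f"rpcsec_gss context for {creds.principal} -> nfs/{server}"
```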


Keyring issues

● Upcalls from asynchronous events

● Still need to tie “rpcsec_gss context” keys to Kerberos credentials


Future Architecture

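[Figure: Future architecture, with numbered steps 1 to 11. Client: user process and request-key handler in user space; NFS client with gss context cache (in keyring) in the kernel; TGT and keytab. Server: SVCGSSD in user space; NFSD with gss context cache in the kernel. KDC with AS and TGS.]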


Questions / Discussion

http://www.citi.umich.edu/projects


References

● Open Science Grid – http://www.opensciencegrid.org

● MonALISA – http://monalisa.cacr.caltech.edu

● Condor – http://www.cs.wisc.edu/condor

● Keyring – Kernel source: /Documentation/keys.txt


Backup Slides


Krb5: Obtaining gss context

● TGT: currently stored in file system

● Per-NFSD service ticket: currently stored in file system

● GSSD locates user credentials by convention (/tmp/krb5cc_uid)

● Synchronizing gss_context and credential problematic
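Locating credentials “by convention” means guessing a file name from the UID. The sketch below shows that guess in simplified form. It assumes caches named krb5cc_<uid> or krb5cc_<uid>_<suffix> in a single directory, the second form being the one seen in the keyring diagram earlier. The function name and the `require_owner` option are hypothetical, and the real GSSD logic differs in its details.

```python
import os
import stat
from typing import List, Tuple


def find_ccaches(uid: int, ccdir: str = "/tmp",
                 require_owner: bool = True) -> List[str]:
    """Return candidate credential caches for uid, newest first."""
    exact = f"krb5cc_{uid}"
    found: List[Tuple[float, str]] = []
    for name in os.listdir(ccdir):
        if name != exact and not name.startswith(exact + "_"):
            continue                      # krb5cc_200 must not match uid 20
        path = os.path.join(ccdir, name)
        st = os.lstat(path)
        if not stat.S_ISREG(st.st_mode):
            continue                      # skip directories and symlinks
        if require_owner and st.st_uid != uid:
            continue                      # don't trust another user's file
        found.append((st.st_mtime, path))
    found.sort(reverse=True)
    return [path for _, path in found]
```

This guessing also explains the synchronization problem in the last bullet. The file can be replaced or destroyed while a context negotiated from it stays cached in the kernel.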


Linux credential interface

● New system calls for kernel credential placement

● Available for upcoming PAG implementation

● Passed via upcall to GSSD

● Credential vs. gss context management no longer a problem


Linux Krb5 kernel credential

● Pass TGT to kernel as credential

● Stored in user process (PAG)

● Passed to GSSD via gss_init_sec_context upcall

● GSSD manages Krb5 NFSD service tickets

● Multiple in-kernel TGTs vs. cross-realm authentication


Client: LIPKEY with SPKM3

● Initiator

– Anonymous SPKM3 client

● Credential

– LIPKEY username and password

– sent to server encrypted in SPKM3 session key

● Context

– per <user, nfsd> LIPKEY(?) and SPKM3 gss context


Linux LIPKEY kernel credential

● LIPKEY credential (username and password) is per server.

● Not stored in kernel

● Instead, store information to be passed to GSSD, which will prompt the user for the LIPKEY password for each NFSD


Client: SPKM with X.509

● Initiator

– password for long-term user X.509 private key

● Credential

– short-term proxy X.509 credential and private key (grid-proxy-init)

● Context

– per <user, nfsd> SPKM gss context


Linux SPKM kernel credential

● Pass proxy (short-term) X.509 credential and private key to kernel as credential

● Stored in user process (PAG)

● Passed to GSSD via gss_init_sec_context upcall

● GSSD manages CA hierarchy and credential checking