Hadoop Elephant in Active Directory Forest
Marek Gawiński, Arkadiusz Osiński, Allegro Group

Feb 10, 2017


Agenda

● Goals and motivations
● Technology stack
● Architecture evolution
● Automatically integrating new servers
● Making AD users and groups visible to Linux
● Making the architecture resilient to AD service unavailability
● Auto-deployment of client software on desktops

Allegro Hadoop cluster in numbers

● 4 terabytes RAM
● 2 petabytes disk space
● 47 datanodes
● 79 projects
● 612 users

Goals and motivations

● Secured cluster
● Central authentication and authorisation
● Compliance for both real and project users and groups
● Cluster resources available from users' desktops
● Integrating new servers automatically
● Making the whole architecture resilient to AD failures or timeouts
● Auto-deployment and autoconfiguration of Hadoop client software on users' desktops

Technology stack

● Cloudera CDH5
● MIT Kerberos
● Microsoft Active Directory
● FreeIPA
● sssd
● puppet
● msktutil
● Hadoop desktop client

History - FreeIPA+FreeIPA Kerberos

[Diagram: Client, Secured Hadoop cluster, Kerberos KDC, FreeIPA users, local groups management; arrows labelled "User/pass", "Check user/pass", "Kerberos Service Ticket", "Internal hadoop creds", "Check groups".]

History - FreeIPA+own Kerberos

[Diagram: Client, Secured Hadoop cluster, Kerberos KDC, Kerberos KDC MIT, FreeIPA users, local groups management; arrows labelled "User/pass", "Check user/pass", "Kerberos Service Ticket", "Internal hadoop creds", "Check groups".]

History - FreeIPA+own Kerberos+AD

[Diagram: Client, Secured Hadoop cluster, Kerberos KDC MIT, FreeIPA users, local groups management, AD users & groups, AD Kerberos; arrows labelled "User/pass", "Check user/pass", "Kerberos Service Ticket", "Internal hadoop creds", "Check groups".]

Final - own Kerberos+AD

[Diagram: Client, Secured Hadoop cluster, Kerberos KDC MIT, AD users & groups, AD Kerberos; arrows labelled "User/pass", "Check user/pass", "Kerberos Service Ticket", "Internal hadoop creds", "Check groups".]

Integrating new Linux servers automatically with AD

[Diagram: msktutil, AD users & groups, AD Kerberos, Kerberos keytab; arrows labelled "Create user" and "Create principal".]

Integrating new Linux servers automatically with AD

define get_ad_keytab ( $path = '', ...) {
  ...
  $realm     = 'SOME_REALM'
  $pass      = hiera('hadoop_prod/ad/krb_manager_pass')
  $principal = "${title}/${host}@${realm}"
  $command   = "echo ${pass} | kinit _hadoop_manager@${realm}; \
    /usr/local/bin/add_ad_princ.sh ${title} ${host} ${path}; kdestroy"
  ...

msktutil -c -s $PRINCIPAL --upn $PRINCIPAL -k $KEYTAB \
  --computer-name $COMPUTER_NAME \
  --server $SERVER_KRB \
  --realm $REALM \
  -b $USER_LDAP_ROOT \
  --dont-expire-password \
  --description "\"$DESCRIPTION\"" \
  --user-creds-only
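The same msktutil call can also be assembled from a provisioning script. The sketch below is a hypothetical Python helper (build_msktutil_cmd is not part of the authors' setup) that builds the argument list shown above; passing a list to subprocess avoids the nested quoting the shell version needs for --description. All host names, realms, paths and the LDAP base in the example are made up.

```python
import shlex


def build_msktutil_cmd(principal, keytab, computer_name, server, realm,
                       ldap_base, description):
    """Assemble the msktutil call from the slide as an argv list.

    Hypothetical helper: the arguments stand in for $PRINCIPAL, $KEYTAB,
    $COMPUTER_NAME, $SERVER_KRB, $REALM, $USER_LDAP_ROOT and $DESCRIPTION.
    """
    return [
        "msktutil",
        "-c",                           # create the account and keytab entries
        "-s", principal,                # service principal to register
        "--upn", principal,             # userPrincipalName of the account
        "-k", keytab,                   # keytab file to write
        "--computer-name", computer_name,
        "--server", server,             # domain controller to contact
        "--realm", realm,
        "-b", ldap_base,                # relative LDAP base for the new account
        "--dont-expire-password",
        "--description", description,  # a list element needs no extra quoting
        "--user-creds-only",            # use the caller's kinit ticket only
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_msktutil_cmd(
        principal="HTTP/nn1.example.com@AD.EXAMPLE.COM",
        keytab="/etc/hadoop/conf/http.keytab",
        computer_name="nn1-http",
        server="dc1.ad.example.com",
        realm="AD.EXAMPLE.COM",
        ldap_base="OU=Hadoop,OU=Servers",
        description="Hadoop HTTP principal for nn1",
    )
    print(shlex.join(cmd))
    # To run it for real (after kinit as the manager account):
    # subprocess.run(cmd, check=True)
```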

Integrating new Linux servers automatically with AD

root@nn1:~# klist -ket
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/17/2015 13:26:45 host/[email protected] (aes256-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/[email protected] (aes128-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/[email protected] (des3-cbc-sha1)
   1 08/17/2015 13:26:45 host/[email protected] (arcfour-hmac)
   1 08/17/2015 13:26:45 host/[email protected] (camellia128-cts-cmac)
   1 08/17/2015 13:26:45 host/[email protected] (camellia256-cts-cmac)
   4 08/17/2015 13:30:23 [email protected] (arcfour-hmac)
   4 08/17/2015 13:30:23 [email protected] (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 [email protected] (aes256-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/[email protected] (arcfour-hmac)
   4 08/17/2015 13:30:23 host/[email protected] (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/[email protected] (aes256-cts-hmac-sha1-96)
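This output is easier to read once entries are grouped by KVNO and principal: the keytab holds one set of keys at KVNO 1 (including des3 and camellia types) and another at KVNO 4 with only arcfour-hmac and the two AES types. A minimal Python sketch of such grouping follows. The parser, its regular expression and the sample principals are assumptions (the real host names and realms are obscured in this transcript), and it expects the date/time layout shown above.

```python
import re
from collections import defaultdict

# A keytab entry line from `klist -ket`: KVNO, date, time, principal, (enctype).
ENTRY = re.compile(r"^\s*(\d+)\s+(\S+ \S+)\s+(\S+)\s+\(([^)]+)\)\s*$")


def group_keytab_entries(klist_output):
    """Map (kvno, principal) -> encryption types, in the order klist lists them."""
    groups = defaultdict(list)
    for line in klist_output.splitlines():
        match = ENTRY.match(line)
        if match is None:  # header, separator and blank lines
            continue
        kvno, _timestamp, principal, enctype = match.groups()
        groups[(int(kvno), principal)].append(enctype)
    return dict(groups)


# Hypothetical sample: principals and realms are invented stand-ins.
SAMPLE = """\
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/17/2015 13:26:45 host/nn1.example.com@EXAMPLE.COM (aes256-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/nn1.example.com@EXAMPLE.COM (des3-cbc-sha1)
   4 08/17/2015 13:30:23 host/nn1.example.com@AD.EXAMPLE.COM (arcfour-hmac)
   4 08/17/2015 13:30:23 host/nn1.example.com@AD.EXAMPLE.COM (aes256-cts-hmac-sha1-96)
"""

if __name__ == "__main__":
    for (kvno, principal), enctypes in group_keytab_entries(SAMPLE).items():
        print(f"kvno={kvno} {principal}: {', '.join(enctypes)}")
```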

Integrating new Linux servers automatically with AD

Separate subtree in the AD structure

System Security Services Daemon

● Identity and authentication
● Multiple providers (FreeIPA, LDAP, AD)
● High availability for backends
● Provides PAM and NSS modules
● Caching
● Versions > 1.11.x: stable support for AD forest authentication

System Security Services Daemon

AD schema with no modifications

/etc/sssd/sssd.conf

[domain/AD.REALM]
id_provider = ad
ad_server = h1, h2, h3
ad_backup_server = hb1, hb2, hb3
auth_provider = ad
chpass_provider = ad
access_provider = ad
enumerate = False
krb5_realm = AD.REALM
ldap_schema = ad
ldap_id_mapping = True
cache_credentials = True
ldap_access_order = expire
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
fallback_homedir = /home/AD.REALM/%u
default_shell = /bin/false
ldap_referrals = false

root@nn1:~# id _hc_tech_prod | tr "," "\n"
uid=1827653611(_hc_tech_prod) gid=1827600513(domain users) groups=1827600513(domain users)
1827652945(_gr_hc_users_common)
1827647474(_gr_hc_hadoop_prod)
1827652940(_gr_hc_project1_prod)
1827652919(_gr_hc_project2_prod)
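With ldap_id_mapping = True, SSSD does not need uidNumber/gidNumber attributes in AD, which is why the schema needs no modifications: it gives each domain a slice of the POSIX ID space, chosen from a hash of the domain SID, and computes an ID as the slice base plus the object's RID. The id output above fits that scheme. "domain users" has the well-known RID 513 and gid 1827600513, so the slice starts at 1827600000, and every other ID shown is that base plus a RID. The Python sketch below works through the arithmetic. The 200000 slice width is SSSD's default and only an assumption about this deployment, and the helpers are illustrative, not part of SSSD.

```python
# Values copied from the `id _hc_tech_prod` output on the slide.
DOMAIN_USERS_RID = 513          # well-known RID of the "Domain Users" group
DOMAIN_USERS_GID = 1827600513   # gid SSSD reported for "domain users"
RANGE_SIZE = 200_000            # assumption: SSSD's default ldap_idmap_range_size

OBSERVED = {
    "_hc_tech_prod": 1827653611,
    "_gr_hc_users_common": 1827652945,
    "_gr_hc_hadoop_prod": 1827647474,
    "_gr_hc_project1_prod": 1827652940,
    "_gr_hc_project2_prod": 1827652919,
}


def slice_base(known_id, known_rid):
    """First ID of the domain's slice, from one known (POSIX ID, RID) pair."""
    return known_id - known_rid


def rid_of(posix_id, base):
    """Invert posix_id = base + RID; only valid for RIDs inside this slice."""
    rid = posix_id - base
    if not 0 <= rid < RANGE_SIZE:
        raise ValueError(f"{posix_id} is outside the slice starting at {base}")
    return rid


if __name__ == "__main__":
    base = slice_base(DOMAIN_USERS_GID, DOMAIN_USERS_RID)
    print("slice base:", base, "aligned:", base % RANGE_SIZE == 0)
    for name, posix_id in OBSERVED.items():
        print(f"{name}: RID {rid_of(posix_id, base)}")
```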

Making the whole architecture resilient to failures

/etc/sssd/sssd.conf

[nss]
memcache_timeout = 3600

● Local filesystem NSS cache
● Active: closest DC
● Fallback servers in remote DC
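The resilience on this slide comes from layering. SSSD asks the servers in ad_server (presumably the closest DCs) first, falls back to ad_backup_server when none of them answer, and can still answer from its local caches when AD is unreachable. memcache_timeout = 3600 keeps NSS records valid in the fast in-memory cache for an hour, and cache_credentials = True allows offline authentication. The Python model below only illustrates that lookup order. DirectoryClient is hypothetical, and it leaves out SSSD's timeouts, retries and cache expiry.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryClient:
    """Toy lookup path: primary DCs first, then fallback DCs, then the cache.

    Simplified illustration of the ad_server / ad_backup_server idea plus a
    local cache; it ignores SSSD's cache timeouts and retry logic.
    """
    primary: list          # e.g. DCs in the closest location
    backup: list           # e.g. fallback DCs in a remote location
    reachable: set         # servers currently answering (simulated)
    cache: dict = field(default_factory=dict)

    def lookup(self, name, query):
        for server in self.primary + self.backup:
            if server in self.reachable:
                result = query(server, name)
                self.cache[name] = result      # refresh the local copy
                return result, server
        if name in self.cache:                 # no DC reachable: serve cached data
            return self.cache[name], "cache"
        raise LookupError(f"{name}: no DC reachable and no cached entry")


def fake_query(server, name):
    """Stand-in for an LDAP lookup against one server."""
    return f"uid entry for {name}"


if __name__ == "__main__":
    ad = DirectoryClient(["h1", "h2", "h3"], ["hb1", "hb2", "hb3"],
                         reachable={"h1", "hb1"})
    print(ad.lookup("_hc_tech_prod", fake_query))   # answered by h1
    ad.reachable = {"hb2"}
    print(ad.lookup("_hc_tech_prod", fake_query))   # answered by hb2
    ad.reachable = set()
    print(ad.lookup("_hc_tech_prod", fake_query))   # served from the cache
```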

Auto-deployment and autoconfiguration on desktops

● Install script for Hadoop Client on desktops
● Refresh configs to match the current production environment
● Support for HDFS/YARN/Hive/Spark

[marek.gawinski:~/ALLEHADOOP] $ sh env.sh
Password for [email protected]: **************

[marek.gawinski:~/ALLEHADOOP] $ klist
Ticket cache: FILE:/tmp/krb5cc_1511317717
Default principal: [email protected]

Valid starting     Expires            Service principal
09/04/15 23:31:35  09/05/15 09:31:35  krbtgt/[email protected]
	renew until 09/11/15 23:31:33
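The times in this klist output can be checked with simple arithmetic. The TGT is valid for 10 hours and renewable for just under 7 days, which matches Active Directory's default Kerberos policy (10-hour user tickets, 7-day maximum renewal), although the slides do not say which policy is configured. In practice, a desktop session has to renew the ticket (kinit -R) within each 10-hour window and obtain a new one after the renewal limit. A short Python check, assuming the MM/DD/YY date format shown:

```python
from datetime import datetime

# klist prints dates as MM/DD/YY here (09/04/15 -> 4 September 2015).
FMT = "%m/%d/%y %H:%M:%S"

valid_from = datetime.strptime("09/04/15 23:31:35", FMT)
expires = datetime.strptime("09/05/15 09:31:35", FMT)
renew_until = datetime.strptime("09/11/15 23:31:33", FMT)

lifetime = expires - valid_from            # how long the TGT is usable as issued
renewable_for = renew_until - valid_from   # how long kinit -R can keep extending it

if __name__ == "__main__":
    print("ticket lifetime:", lifetime)        # 10:00:00
    print("renewable for:  ", renewable_for)   # 6 days, 23:59:58
```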

Auto-deployment and autoconfiguration on desktops

[marek.gawinski:~/ALLEHADOOP] $ hive
hive (default)> show databases;
OK
database_name
tpch_benchmarks
...
xwing_poc
Time taken: 0.816 seconds, Fetched: 72 row(s)
hive (default)> set hive.execution.engine = tez;
hive (default)> select count(*) from table1;

[marek.gawinski:~/ALLEHADOOP] $ hdfs dfs -ls
Found 8 items
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-06 02:00 .Trash
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-28 21:01 .hiveJars
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-09 10:43 .sparkStaging
drwx------   - marek.gawinski hadoop          0 2015-05-22 02:35 .staging
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-31 13:11 oozie1
-rw-r--r--   3 marek.gawinski hadoop         43 2015-05-26 15:26 ozzietest1.hql
-rw-r--r--   3 marek.gawinski hadoop         13 2015-08-31 12:30 pwd.txt
drwxr-xr-x   - marek.gawinski hadoop          0 2015-04-16 16:21 tables


Benefits

● One standard for access control to all company resources

● Every new employee can automatically start using Hadoop with no additional effort

● One password for all systems


Thank you!

Questions?