Securing Hadoop: Hadoop Security Demystified... and then made more confusing. Presenter: Adam Muise. Content: Balaji Ganesan, Adam Muise. Page 1 © Hortonworks Inc. 2014

2014 sept 4_hadoop_security

Jan 22, 2015



Adam Muise

An overview of securing Hadoop. Content primarily by Balaji Ganesan, one of the leaders of the Apache Argus project. Presented on Sept 4, 2014 at the Toronto Hadoop User Group by Adam Muise.
Transcript
1. Securing Hadoop: Hadoop Security Demystified... and then made more confusing. Presenter: Adam Muise. Content: Balaji Ganesan, Adam Muise.

2. What Do We Mean by Security?
Say you have a house guest:
- Authentication: who gets in the door
- Authorization: how far they are allowed in the house and what rooms they are allowed in
- Auditing: follow them around
- Encryption: when all else fails, lock it up

3. Insecurity: Not Just for Teenagers
- Security is really about risk mitigation; no perfect solution exists unless you locate your datacenter in the hull of the Titanic and cut all communications
- The risks are: inappropriate access to data by internal resources, external data theft, service outages, and no knowledge of theft or inappropriate access
- Hadoop's value to a business is centralizing its data; that can make leaks more detrimental than a DDoS or stolen laptops

4. Attention to Hadoop Security on the Rise
- As Hadoop becomes more widely adopted and more sensitive production data goes into clusters, more attention is being paid to security:
  - Intel/Cloudera working on Project Rhino
  - Hortonworks introduces Apache Knox
  - Cloudera buys Gazzang
  - Hortonworks buys XASecure and turns it into Apache Argus
  - HBase gets cell-level security
  - ...the list goes on

5. Watch Out for Those Malicious Attacks

6. Layers of Hadoop Security
- Perimeter-level security: network security (i.e. firewalls), Apache Knox (i.e. gateways)
- Authentication: Kerberos, delegation tokens
- Authorization: Argus security policies
- OS security: file permissions, process isolation
- Data protection: transport, storage, access

7. Typical Hadoop Security: Vanilla Hadoop
8. Hadoop Out of the Box
- While a lot of security is built into Hadoop, out of the box not much of it is turned on
- Without strong authentication, anyone with sufficient access to the underlying OS has the ability to impersonate users
- Often paired with gateway nodes that provide stronger access restrictions
- HDFS/YARN/Hive:
  - Authentication: derived from OS users local to the box the task/request is submitted from
  - Authorization: dependent on each project/service

9. Typical Flow: Hive Access
[Diagram: Beeline client connects to HiveServer2, which accesses HDFS]

10. Typical Hadoop Security: Strong Authentication through Kerberos

11. Kerberos Primer
1. kinit: log in and get a Ticket Granting Ticket (TGT)
2. Client stores the TGT in its ticket cache
3. Get a NameNode service ticket (NN-ST)
4. Client stores the NN-ST in its ticket cache
5. Read/write a file given the NN-ST and file name; returns block locations, block IDs and Block Access Tokens if access is permitted
6. Read/write a block given a Block Access Token and block ID
[Diagram: client, KDC, NameNode and DataNode; clients keep a Kerberos ticket cache]

12. Kerberos Summary
- Provides strong authentication
- Establishes identity for users, services and hosts
- Prevents impersonation of unauthorized accounts
- Supports a token delegation model
- Works with existing directory services
- Is the basis for authorization

13. Hadoop Authentication
- Users authenticate with the services
  - CLI & API: Kerberos kinit or keytab
  - Web UIs: Kerberos SPNEGO or a custom plugin (e.g. SSO)
- Services authenticate with each other
  - Prepopulated Kerberos keytabs, e.g. DN -> NN, NM -> RM
- Services propagate the authenticated user identity
  - Authenticated trusted proxy services, e.g. Oozie -> RM, Knox -> WebHCat
- Job tasks present the delegated user's identity/access
  - Delegation tokens, e.g. job task -> NN, job task -> JT/RM
- Strong authentication is the basis for authorization
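The kinit-to-service-ticket sequence from the Kerberos primer can be sketched as a toy state machine. This is a simplified illustration only, not real Kerberos: there is no cryptography (real tickets are encrypted with the KDC's and services' keys, and carry lifetimes), and all principal names here are invented.

```python
# Toy model of the Kerberos ticket flow from the primer slide.
# Not real Kerberos: no encryption, no ticket lifetimes.

class ToyKDC:
    def __init__(self, principals):
        self.principals = set(principals)   # known users and services

    def kinit(self, user, password_ok=True):
        # Step 1: authenticate and issue a Ticket Granting Ticket (TGT)
        if user not in self.principals or not password_ok:
            raise PermissionError("authentication failed")
        return {"type": "TGT", "user": user}

    def service_ticket(self, tgt, service):
        # Step 3: exchange the TGT for a service ticket (e.g. the NN-ST)
        if tgt.get("type") != "TGT" or service not in self.principals:
            raise PermissionError("invalid TGT or unknown service")
        return {"type": "ST", "user": tgt["user"], "service": service}

class ToyNameNode:
    def read_file(self, ticket, path):
        # Step 5: the NameNode checks the service ticket before answering
        if ticket.get("type") != "ST" or ticket.get("service") != "nn/host1":
            raise PermissionError("no valid NameNode service ticket")
        return f"block locations for {path} (user={ticket['user']})"

kdc = ToyKDC({"alice", "nn/host1"})
tgt = kdc.kinit("alice")                       # steps 1-2
st = kdc.service_ticket(tgt, "nn/host1")       # steps 3-4
print(ToyNameNode().read_file(st, "/data/x"))  # steps 5-6
```

Note that presenting the TGT itself to the NameNode fails: only a service ticket for that service is accepted, which mirrors why impersonation is blocked once Kerberos is on.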
[Diagram: the client authenticates to the NameNode with Kerberos or a custom plugin; services authenticate to each other with Kerberos; Oozie submits to the JobTracker as doAs(user); job tasks reach the NameNode with a delegation token]

14. User Management
- Most implementations use LDAP for user info
- LDAP guarantees that user information is consistent across the cluster
- An easy way to manage users & groups
- The standard user-to-group mapping comes from the OS on the NameNode
- Kerberos provides authentication
- PAM can automatically log the user into Kerberos

15. Kerberos + Active Directory
- Cross-realm trust between the corporate AD/LDAP user store and the Hadoop cluster KDC
- Users: [email protected]; hosts: [email protected]; services: hdfs/[email protected]
- Use existing directory tools to manage users
- Use Kerberos tools to manage host and service principals

16. Groups
- Define groups for each required role
- Hadoop has a pluggable interface; the mapping from user to group is not stored within Hadoop
  - Defaults to the OS information on the master node
  - Typically driven from LDAP on Linux
- Existing plugins:
  - ShellBasedUnixGroupsMapping: /bin/id
  - JniBasedUnixGroupsMapping: system call
  - LdapGroupsMapping: LDAP call
  - CompositeGroupMapping: combines Unix & LDAP group mapping
- Strong authentication and role-based groups provide protections enabling shared clusters

17. Groups
[Diagram: the NameNode's group-mapping plugin resolves groups from the AD/LDAP user store]

18. Kerberos FAQ
- Where do I install the KDC? On a master-type node
- User provisioning: hook up to the corporate AD/LDAP to leverage existing user provisioning
- Growing a cluster: provision new services and nodes in the MIT KDC, copy keytabs to the new nodes
- Is Kerberos a SPOF? Kerberos supports HA, and with delegation tokens the KDC load is reduced

19. Typical Flow: Authenticate through Kerberos
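The pluggable group-mapping interface described on the Groups slide can be sketched in a few lines. The provider names below mirror the Hadoop plugin names from the slide, but the lookup bodies are stand-ins: the real providers shell out to /bin/id or query LDAP, while these use invented stub data for illustration.

```python
# Sketch of Hadoop-style pluggable user -> group mapping.
# Real providers call /bin/id or LDAP; these use stub data.

class ShellBasedUnixGroupsMapping:
    UNIX_GROUPS = {"alice": ["hadoop", "staff"]}   # stand-in for /bin/id output
    def get_groups(self, user):
        return self.UNIX_GROUPS.get(user, [])

class LdapGroupsMapping:
    LDAP_GROUPS = {"bob": ["analysts"]}            # stand-in for an LDAP query
    def get_groups(self, user):
        return self.LDAP_GROUPS.get(user, [])

class CompositeGroupsMapping:
    """Combine Unix and LDAP lookups, as the composite plugin does."""
    def __init__(self, providers):
        self.providers = providers
    def get_groups(self, user):
        groups = []
        for provider in self.providers:
            for g in provider.get_groups(user):
                if g not in groups:        # de-duplicate, keep order
                    groups.append(g)
        return groups

mapping = CompositeGroupsMapping([ShellBasedUnixGroupsMapping(),
                                  LdapGroupsMapping()])
print(mapping.get_groups("alice"))  # ['hadoop', 'staff']
print(mapping.get_groups("bob"))    # ['analysts']
```

The key design point from the slide holds here too: Hadoop itself stores no group data; it only asks whichever provider is configured.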
[Diagram: the Beeline client gets a service ticket (ST) for Hive from the KDC and uses the Hive ST to submit the query to HiveServer2; Hive gets a NameNode (NN) service ticket and creates MapReduce jobs using the NN ST]

20. Typical Hadoop Security: Strong Authentication + Cross-cutting Authorization

21. Apache Argus (aka HDP Security) Capabilities
- Authentication: cross-platform security via Kerberos, integration with AD
- Gateway for REST APIs: Knox for HTTP/REST APIs
- Role-based authorizations with fine-grained access control:
  - HDFS: folder, file
  - Hive: database, table, column, UDFs
  - HBase: table, column family, column
- Wildcard resource names: yes
- Permission support:
  - HDFS: read, write, execute
  - Hive: select, update, create, drop, alter, index, lock
  - HBase: read, write, create

22. Authorization and Audit
- Authorization: fine-grained access control with flexibility in defining policies
  - HDFS: folder, file
  - Hive: database, table, column
  - HBase: table, column family, column
- Audit: extensive user access auditing in HDFS, Hive and HBase to control access into the system
  - IP address
  - Resource type / resource
  - Timestamp
  - Access granted or denied

23. Central Security Administration
- Apache Argus delivers a single pane of glass for the security administrator
- Centralizes administration of security policy
- Ensures consistent coverage across the entire Hadoop stack

24. Setup Authorization Policies
[Screenshot: file-level access control, flexible policy definition, permission control]

25. Monitor through Auditing
[Screenshot: audit view]

26. Authorization and Auditing with Argus
[Diagram: the Argus Administration Portal, backed by the Argus Policy Server and Audit Server, manages Argus agents embedded in HDFS, HiveServer2 and HBase; agents for Knox, Falcon, Storm and YARN (the Data Operating System) are future integrations; an Integration API serves legacy tools; audit data lands in an RDBMS or HDFS; enterprise users administer policy centrally]
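The policy-plus-audit pattern on these slides can be modeled with a toy policy engine: every check yields an allow/deny decision and emits an audit record carrying the fields the Audit slide lists (IP address, resource, timestamp, granted or denied). This is an illustrative sketch of the pattern, not the Argus API; the policy table and all names are invented.

```python
# Toy Argus-style authorizer: evaluate a policy, always emit an audit record.
import time

POLICIES = [
    # (user, resource prefix, allowed accesses) -- invented sample policies
    ("alice", "/apps/hive/warehouse/sales", {"select"}),
    ("bob",   "/data/raw",                  {"read", "write"}),
]

AUDIT_LOG = []

def authorize(user, resource, access, ip="10.0.0.1"):
    granted = any(
        user == u and resource.startswith(prefix) and access in allowed
        for u, prefix, allowed in POLICIES
    )
    # Audit fields from the slide: IP, resource, timestamp, granted/denied.
    # Denied requests are logged too -- that is the point of auditing.
    AUDIT_LOG.append({
        "ip": ip, "resource": resource, "access": access,
        "timestamp": time.time(),
        "result": "granted" if granted else "denied",
    })
    return granted

print(authorize("alice", "/apps/hive/warehouse/sales/2014", "select"))  # True
print(authorize("alice", "/data/raw/x", "read"))                        # False
print(len(AUDIT_LOG))                                                   # 2
```

In the real deployment this check runs inside the NameNode, HiveServer2 or HBase via the embedded Argus agent, and the audit records are pushed to a central database rather than an in-process list.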
27. Simplified Workflow - HDFS
1. Admin sets policies for HDFS files/folders in the Argus Policy Manager
2. Users access HDFS data through an application: a data scientist runs a MapReduce job; IT users access HDFS through the CLI
3. The NameNode uses the Argus agent for authorization
4. Audit logs are pushed to the audit database
5. The NameNode provides resource access to the user/client

28. Simplified Workflow - Hive
1. Admin sets policies for Hive databases/tables/columns
2. Users access Hive data using JDBC/ODBC; IT users access Hive via the beeline command tool
3. Hive authorizes with the Argus agent
4. Audit logs are pushed to the audit database
5. HiveServer2 provides data access to users

29. Simplified Workflow - HBase
1. Admin sets policies for HBase tables/column families/columns
2. Users access HBase data using the Java API: a data scientist runs a MapReduce job; IT users access HBase via the HBase shell
3. HBase authorizes with the Argus agent
4. Audit logs are pushed to the audit database
5. The HBase server provides data access to users

30. Typical Flow: Add Authorization through Argus
[Diagram: the Kerberos flow from slide 19, with Argus authorizing the query inside HiveServer2]

31. Typical Hadoop Security: Strong Authentication + Cross-cutting Authorization + Perimeter Security

32. What Does Perimeter Security Really Mean?
- A firewall is required at the perimeter (today)
- The Knox Gateway controls all Hadoop REST API access through the firewall
- The Hadoop cluster is mostly unaffected
- The firewall only allows connections through specific ports from the Knox host

33. Why Knox?
- Simplified access: Kerberos encapsulation, extends API reach, single access point, multi-cluster support, single SSL certificate
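One concrete way to see what the gateway on the perimeter-security slide changes is the URL shape: clients talk to a single Knox endpoint instead of individual service hosts and ports. The sketch below contrasts a direct WebHDFS URL with the conventional Knox gateway form; the hostnames, ports and the "default" topology name are assumptions for illustration.

```python
# Sketch: direct WebHDFS access vs. the same call routed through Knox.
# Hostnames, ports and the "default" topology are illustrative assumptions.

def direct_webhdfs_url(namenode, path, op):
    # Direct access: the client must know the NameNode host and have its
    # WebHDFS port opened through the firewall.
    return f"http://{namenode}:50070/webhdfs/v1{path}?op={op}"

def knox_webhdfs_url(gateway, topology, path, op):
    # Through Knox: the client only knows the single SSL gateway endpoint;
    # the topology maps the request to the right cluster service.
    return f"https://{gateway}:8443/gateway/{topology}/webhdfs/v1{path}?op={op}"

print(direct_webhdfs_url("nn.internal", "/tmp", "LISTSTATUS"))
print(knox_webhdfs_url("knox.dmz", "default", "/tmp", "LISTSTATUS"))
```

The firewall then only needs to allow the one Knox port from the DMZ, which is the "specific ports from the Knox host" point on slide 32.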
- Centralized control: central REST API auditing, service-level authorization, an alternative to an SSH edge node
- Enterprise integration: LDAP, Active Directory and SSO integration, Apache Shiro extensibility, custom extensibility
- Enhanced security: protects network details, partial SSL for non-SSL services, web-app vulnerability filter

34. Current Hadoop Client Model
- FileSystem and MapReduce Java APIs
- HDFS, Pig, Hive and Oozie clients (that wrap the Java APIs)
- Typical use of the APIs is via an edge node that is inside the cluster
- Users SSH to the edge node and execute API commands from a shell

35. Hadoop REST APIs
- WebHDFS: supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming
- WebHCat: job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands
- Hive: Hive REST API operations
- HBase: HBase REST API operations
- Oozie: job submission and management, and Oozie administration
- Useful for connecting to Hadoop from outside the cluster, and when more client-language flexibility is required (i.e. a Java binding is not an option)
- Challenges: the client must have knowledge of the cluster topology, and ports must be opened outside the cluster (in some cases, on every host)

36. Knox Deployment with a Hadoop Cluster
[Diagram: a web tier with a load balancer and Knox in the DMZ in front of the application tier; master nodes (NN, SNN) and slave nodes (DN) across racks; Hadoop CLIs]

37. Hadoop REST API Security: Drill-Down
[Diagram: a REST client passes through the firewall into the DMZ, through a load balancer to the Knox Gateway, which authenticates against the enterprise LDAP/AD identity provider and forwards HTTP to one or more Hadoop clusters (NameNode, ResourceManager, Oozie, WebHCat, HBase, HiveServer2); an edge node with Hadoop CLIs uses RPC from inside the firewall]
38. OpenLDAP Configuration
In the sandbox topology file (sandbox.xml):
- main.ldapRealm = org.apache.shiro.realm.ldap.JndiLdapRealm
- main.ldapRealm.userDnTemplate = uid={0},ou=people,dc=hadoop,dc=apache,dc=org
- main.ldapRealm.contextFactory.url = ldap://localhost:33389

39. Service-Level Authorization Configuration
In the topology file, the authorization provider:
- Provider role: authorization, name: AclsAuthz, enabled: true
- webhdfs.acl.mode = OR
- webhdfs.acl = guest;*;*