Deploying enterprise grade security for Hadoop
Brock Noland | Software Engineer, Cloudera
February 27, 2014
Jan 27, 2015
Outline
• Introduction
• Hadoop security primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with impersonation
  • Kerberos with Sentry
• Demo
Introduction
Tonight's focus is SQL-on-Hadoop
• The vast majority of Hadoop users use Hive or Cloudera Impala
• Data warehouse offload is the most common use case
• Data warehouse offload is a two-step process:
  1. Automatic transformations (ETL) are moved to Hadoop
  2. Data analysts are given query access
Data warehouse use case
[Diagram: Online Database, Data Warehouse, Hadoop]
Authentication
• Authentication establishes who you are
• Hadoop supports two models:
  • Default: "trusted network"
  • Strong: Kerberos
Default Authentication – trusted network
• Default security mechanism
• The Hadoop client uses the local username
• Used in:
  • POCs
  • Startups
  • Demos
  • Pre-prod environments
[Diagram: client host → Hadoop]
Client host:
$ whoami
brock
$ cat a.txt
some data
$ hadoop fs -put a.txt .
Hadoop:
User: brock
File: a.txt
Contents: some data
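To make the trust model concrete, here is a minimal Python sketch. It is not Hadoop code. It models a hypothetical server that, like the default mode, records whatever username the client asserts. The names `Request` and `handle_put` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Under the "trusted network" model the client just reports its local
    # username; nothing in the request proves that claim.
    claimed_user: str
    path: str
    contents: str


def handle_put(req: Request, store: dict) -> str:
    """Hypothetical server-side handler: the file is recorded as owned by
    whichever user the client claimed to be, with no credential check."""
    store[req.path] = {"owner": req.claimed_user, "contents": req.contents}
    return req.claimed_user


store = {}
# An honest client sends its real local username ...
handle_put(Request("brock", "a.txt", "some data"), store)
# ... but nothing stops a client from claiming to be someone else.
handle_put(Request("hdfs", "b.txt", "other data"), store)
print(store["a.txt"]["owner"], store["b.txt"]["owner"])  # brock hdfs
```

The missing credential check is the reason this mode belongs in POCs, demos, and pre-prod environments.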
Strong Authentication – Kerberos
• Hadoop is secured with Kerberos
  • Provides mutual authentication
  • Protects against eavesdropping and replay attacks
• Every user and service has a Kerberos "principal"
  • Service: impala/<host>@<REALM>
  • User: <user>@<REALM>
• Credentials
  • Service: keytabs
  • User: password
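A Kerberos principal has the general form `primary[/instance]@REALM`. For service principals the instance is usually the host, and user principals usually have no instance. The Python sketch below splits a principal into those parts. The host and realm (`host1.example.com`, `EXAMPLE.COM`) are hypothetical placeholders. The sketch deliberately ignores escaping and other edge cases that real Kerberos libraries handle.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user name or service name, e.g. "impala"
    instance: Optional[str]  # usually the host for service principals
    realm: str               # Kerberos realm, conventionally upper case


def parse_principal(name: str) -> Principal:
    """Split a principal of the form primary[/instance]@REALM.

    Simplified sketch: no handling of escaped '/' or '@' characters.
    """
    if "@" not in name:
        raise ValueError(f"missing realm in {name!r}")
    ident, realm = name.rsplit("@", 1)
    primary, sep, instance = ident.partition("/")
    return Principal(primary, instance if sep else None, realm)


svc = parse_principal("impala/host1.example.com@EXAMPLE.COM")  # hypothetical
usr = parse_principal("brock@EXAMPLE.COM")                     # hypothetical
print(svc)  # Principal(primary='impala', instance='host1.example.com', realm='EXAMPLE.COM')
print(usr.instance is None)  # True
```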
[Diagram: client host → Hadoop]
Client host:
$ whoami
brock
$ kinit
Password: *******
$ cat a.txt
some data
$ hadoop fs -put a.txt .
Sent to Hadoop:
<Kerberos ticket> <encrypted data>*
* RPC encryption must be enabled (hadoop.rpc.protection = privacy)
• Keytab
  • Encrypted key for servers (similar to a "password")
  • Generated by a Kerberos server such as MIT Kerberos or Active Directory
• Impersonation
  • Services such as HiveServer2 impersonate users
  • Data loaded by "joe" via HS2 is owned by "joe"
  • Oozie jobs submitted by "brock" are run as "brock"
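In Hadoop, a service may only impersonate users if it is configured as a proxy user through the `hadoop.proxyuser.<service>.hosts` and `hadoop.proxyuser.<service>.groups` properties. The Python sketch below is a simplified model of that check, not Hadoop's implementation. The host name, the group memberships, and the helpers `can_impersonate` and `effective_user` are hypothetical.

```python
# Hypothetical proxy-user settings for the "hive" service (HiveServer2).
PROXY_CONF = {
    "hadoop.proxyuser.hive.hosts": "hs2-host.example.com",
    "hadoop.proxyuser.hive.groups": "analysts,senioranalysts",
}

# Hypothetical user -> groups mapping (normally resolved from the OS or LDAP).
USER_GROUPS = {"joe": {"analysts"}, "mallory": {"contractors"}}


def allowed_values(key):
    raw = PROXY_CONF.get(key, "")
    return {v.strip() for v in raw.split(",") if v.strip()}


def can_impersonate(service, remote_host, end_user):
    """The service must connect from an allowed host, and the end user must be
    in an allowed group. "*" is treated as a wildcard."""
    hosts = allowed_values(f"hadoop.proxyuser.{service}.hosts")
    groups = allowed_values(f"hadoop.proxyuser.{service}.groups")
    host_ok = "*" in hosts or remote_host in hosts
    group_ok = "*" in groups or bool(groups & USER_GROUPS.get(end_user, set()))
    return host_ok and group_ok


def effective_user(service, remote_host, end_user):
    """Identity HDFS sees for work the service does on behalf of end_user."""
    if not can_impersonate(service, remote_host, end_user):
        raise PermissionError(f"{service} may not impersonate {end_user}")
    return end_user


print(effective_user("hive", "hs2-host.example.com", "joe"))  # joe
try:
    effective_user("hive", "hs2-host.example.com", "mallory")
except PermissionError as err:
    print(err)  # hive may not impersonate mallory
```

Because the effective user is "joe", files HiveServer2 writes for him are owned by "joe". This is also why the HDFS permission problems discussed later surface directly in Hive.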
HiveServer2 and Oozie
[Diagram: Beeline (Hive CLI), Tableau, and JDBC clients connect to HiveServer2 (HS2); the Oozie CLI and Control-M connect to Oozie; HS2 and Oozie both access Hadoop]
Authorization
• HDFS permissions
  • Unix style
  • Read/Write/Execute for Owner/Group/Other
  • Coarse grained
• Other Hadoop components have their own authorization
  • MapReduce: who can use which job queues
  • HBase: table ACLs
HDFS Permissions
$ hadoop fs -ls file
-rw-r----- 1 analyst1 analysts 2244 2014-01-19 12:15 file
• Permissions
  • Unix style permissions
  • Read/Write/Execute
  • Owner/Group/Other
• Owner
  • One and only one owner
• Group
  • One and only one group
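Under the Unix-style model, exactly one permission class applies to each request. The owner bits apply if the user is the owner. Otherwise the group bits apply if the user is in the file's single group, and otherwise the "other" bits apply. The Python sketch below applies that rule to a listing string like the one shown above. It ignores the HDFS superuser and any extended ACLs, and the user "analyst2" is hypothetical.

```python
SLOT = {"r": 0, "w": 1, "x": 2}


def check_access(mode, owner, group, user, user_groups, want):
    """Return True if `user` may perform `want` ("r", "w" or "x").

    `mode` is a listing string such as "-rw-r-----"; only the last nine
    characters (owner, group, other triplets) are used.
    """
    perms = mode[-9:]  # drop the leading file-type character if present
    if user == owner:
        triplet = perms[0:3]
    elif group in user_groups:
        triplet = perms[3:6]
    else:
        triplet = perms[6:9]
    ch = triplet[SLOT[want]]
    # "r", "w", "x", "s" and "t" mean the bit is set; "-", "S", "T" mean it is not.
    return ch not in "-ST"


listing = "-rw-r-----"  # from `hadoop fs -ls file` above: owner analyst1, group analysts
print(check_access(listing, "analyst1", "analysts", "analyst1", {"analyst1"}, "w"))  # True
print(check_access(listing, "analyst1", "analysts", "analyst2", {"analysts"}, "r"))  # True
print(check_access(listing, "analyst1", "analysts", "analyst2", {"analysts"}, "w"))  # False
```

Since a file has one owner and one group, sharing it with a second set of people means creating a new group, which is exactly the friction the following slides run into.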
Back to our use case
• Scenario facts
  • ETL offload is a success
  • The data warehouse is expensive and at capacity
  • The same data is in Hadoop
• Next step
  • End users start using Hadoop to augment the DW
  • Security becomes the primary concern
End users need to share data
• Unlike automated ETL jobs, end users want to share data with peers
• HDFS permissions must be managed manually
  • Each file has a single group
  • In practice, users end up making files world readable/writable
Hive: Security holes
-- loads arbitrary user-supplied Java code
CREATE TEMPORARY FUNCTION custom_udf AS 'com.mycompany.MaliciousClass';
-- runs an arbitrary script on the cluster
SELECT TRANSFORM(stuff) USING 'malicious-script.pl' AS thing1, thing2;
-- exposes any HDFS path as a table
CREATE EXTERNAL TABLE external_table(column1 string) LOCATION '/path/to/any/table';
-- a custom SerDe class runs arbitrary Java code
CREATE TABLE test (c1 string) ROW FORMAT SERDE 'com.mycompany.MaliciousClass';
-- MAP and REDUCE scripts run arbitrary code on the cluster
FROM (
  FROM t1
  MAP t1.c1 USING 'malicious-script1.pl'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE t2
  REDUCE t2.c1 USING 'malicious-script2.pl' AS c2;
Default: Authorization
• Hive ships with an "advisory" authorization system
  • All users see all databases/tables/columns
  • Does not fix any security holes
  • Users can grant themselves permissions
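"Advisory" means that the system records grants but never checks whether the grantor is entitled to issue them. The toy Python model below shows the consequence. The class `AdvisoryAuthorizer` is hypothetical and is not Hive's code. A junior analyst gains SELECT on a manager's table simply by granting it to himself.

```python
class AdvisoryAuthorizer:
    """Hypothetical model of an advisory scheme: GRANT is stored, but the
    grantor's own rights are never verified."""

    def __init__(self):
        self.grants = set()  # (user, privilege, table)

    def grant(self, grantor, grantee, privilege, table):
        # Missing check: does `grantor` hold `privilege` or an admin role?
        self.grants.add((grantee, privilege, table))

    def allowed(self, user, privilege, table):
        return (user, privilege, table) in self.grants


authz = AdvisoryAuthorizer()
print(authz.allowed("jranalyst1", "SELECT", "manager1_table"))  # False
authz.grant("jranalyst1", "jranalyst1", "SELECT", "manager1_table")
print(authz.allowed("jranalyst1", "SELECT", "manager1_table"))  # True
```

Even if grants were checked, the UDF, SerDe, and TRANSFORM holes would still let users run arbitrary code that bypasses the authorization layer.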
Kerberos with Impersonation: Sharing data
The user "manager1" wants to share the table "manager1_table" with senior analysts but not with junior analysts.
# hadoop fs -ls -R /user/hive/warehouse
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 manager1 0 manager1_table
IT must create a group:
# groupadd senioranalysts
Then add the appropriate members to the group:
# usermod -G analyst,senioranalysts analyst1
# usermod -G management,analyst,senioranalysts manager1
Then "manager1" can manually change the file permissions:
$ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 senioranalysts 0 manager1_table
Now any senior-level analyst can query the data:
$ whoami
analyst1
$ beeline
...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
+------------+
| count(*)   |
+------------+
| 47         |
+------------+
Junior analysts cannot query the data:
$ whoami
jranalyst1
$ beeline
...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T
What happens in the real world?
Table "manager1_table" is owned by user/group "manager1":
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 manager1 0 manager1_table
User "manager1" makes "manager1_table" world readable/writable:
$ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxrwxrwt - manager1 manager1 0 manager1_table
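The listings above end in either `T` or `t` because the sticky bit shares the last column with the "other execute" bit. An uppercase `T` means the sticky bit is set without other-execute, as in `drwxr-x--T`. A lowercase `t` means both are set, as in the world-writable `drwxrwxrwt`. The Python sketch below renders octal modes in that style. The helper `mode_to_string` is illustrative, not a Hadoop API.

```python
def mode_to_string(mode: int, is_dir: bool) -> str:
    """Render permission bits the way `hadoop fs -ls` / `ls -l` display them."""
    chars = ["d" if is_dir else "-"]
    for shift in (6, 3, 0):  # owner, group, other
        bits = (mode >> shift) & 0o7
        chars.append("r" if bits & 4 else "-")
        chars.append("w" if bits & 2 else "-")
        chars.append("x" if bits & 1 else "-")
    if mode & 0o1000:  # sticky bit is shown in the "other execute" slot
        chars[9] = "t" if mode & 0o001 else "T"
    return "".join(chars)


print(mode_to_string(0o1750, True))  # drwxr-x--T  (the private tables)
print(mode_to_string(0o1777, True))  # drwxrwxrwt  (after chmod 777)
print(mode_to_string(0o640, False))  # -rw-r-----
```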
Kerberos with Impersonation: Summary
• Securing Hive with Kerberos and impersonation makes Hive unusable for DW offload
  • Manual file permission management
  • End state is world readable/writable
  • No ability to restrict access to columns or rows
  • All users see all databases/tables/columns
Fine-Grained Security: Apache Sentry
Authorization module for Hive, Search, & Impala
• Unlocks key RBAC requirements
  • Secure, fine-grained, role-based authorization
  • Multi-tenant administration
• Open source
  • Apache Incubator project
• Ecosystem support
  • Apache Solr, HiveServer2, & Impala 1.1+
Key Benefits of Sentry
• Store sensitive data in Hadoop
• Extend Hadoop to more users
• Comply with regulations
Key Capabilities of Sentry
• Fine-grained authorization
  • Specify security for SERVERS, DATABASES, TABLES & VIEWS
• Role-based authorization
  • SELECT privilege on views & tables
  • INSERT privilege on tables
  • ALL privilege on the server, databases, tables & views
  • ALL privilege is needed to create/modify schema
• Multi-tenant administration
  • Separate policies for each database/schema
  • Can be maintained by separate admins
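Sentry Architecture
[Diagram: a binding layer (Impala SQL, Hive/HiveServer2 queries and MR, Search/Solr queries, Pig, …) calls the Sentry authorization provider, made up of a policy engine and a policy provider; the policy provider reads policies from a file on the local FS/HDFS or from a database]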
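The policy provider's job is to load policies from a file or database and hand them to the policy engine. The Python sketch below models a file-based provider. It assumes an INI-style file with `[groups]` and `[roles]` sections, comma-separated role and privilege lists, and privilege strings such as `server=server1->db=sales->table=orders->action=select`. This layout is modeled on Sentry's file-based policy format, but the exact syntax should be checked against the Sentry documentation for your version. All group, role, and object names are hypothetical.

```python
import configparser

# Hypothetical policy file contents.
POLICY = """
[groups]
analysts = analyst_role
managers = analyst_role, manager_role

[roles]
analyst_role = server=server1->db=sales->table=orders->action=select
manager_role = server=server1->db=sales->action=all
"""


def load_policy(text):
    """File policy provider: map each group to a list of (object_path, action)."""
    cp = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    cp.read_string(text)
    role_privs = {}
    for role, value in cp["roles"].items():
        privs = []
        for priv in value.split(","):
            parts = dict(p.split("=", 1) for p in priv.strip().split("->"))
            # Assumption in this sketch: a privilege without an action means "all".
            action = parts.pop("action", "all").lower()
            privs.append((tuple(parts.values()), action))  # server, db, table order
        role_privs[role] = privs
    return {
        group: [p for role in roles.split(",") for p in role_privs[role.strip()]]
        for group, roles in cp["groups"].items()
    }


policy = load_policy(POLICY)
print(policy["analysts"])  # [(('server1', 'sales', 'orders'), 'select')]
```

Keeping loading (provider) separate from evaluation (engine) is what lets the same engine sit on top of either a file or a database backend.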
Query Execution Flow
1. Parse: validate SQL grammar
2. Build: construct statement tree
3. Check: validate statement objects (first check: authorization, performed by Sentry)
4. Plan: forward to execution planner
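The key property of this flow is that authorization runs in the Check stage, before anything reaches the planner. An unauthorized statement is therefore rejected before any work is scheduled. The toy Python pipeline below illustrates that ordering. Its one-statement grammar, the stage functions, and the `can_select` callback are hypothetical stand-ins, not Hive, Impala, or Sentry APIs.

```python
import re


class AuthorizationError(Exception):
    """Raised when the Check stage rejects a statement."""


def parse(sql):
    # Parse: validate a toy grammar that only accepts "SELECT ... FROM <table>".
    m = re.fullmatch(r"\s*select\s+.+?\s+from\s+(\w+)\s*;?\s*", sql, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    return m


def build(match):
    # Build: construct a (tiny) statement tree.
    return {"op": "SELECT", "table": match.group(1).lower()}


def check(tree, user, can_select):
    # Check: validate statement objects; authorization is the first check.
    if not can_select(user, tree["table"]):
        raise AuthorizationError(f"{user} lacks SELECT on {tree['table']}")
    return tree


def plan(tree):
    # Plan: only validated, authorized trees reach the execution planner.
    return f"SCAN {tree['table']}"


def run(sql, user, can_select):
    return plan(check(build(parse(sql)), user, can_select))


grants = {("analyst1", "manager1_table")}  # hypothetical grant
can_select = lambda user, table: (user, table) in grants
print(run("select count(*) from manager1_table;", "analyst1", can_select))  # SCAN manager1_table
try:
    run("select * from manager1_table;", "jranalyst1", can_select)
except AuthorizationError as err:
    print(err)  # jranalyst1 lacks SELECT on manager1_table
```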