Apache Sentry: Enterprise-grade Security for Hadoop
Xuefu Zhang, Sravya Tirukkovalur | Cloudera
April 16, 2014

April 2014 HUG : Apache Sentry

Aug 11, 2014

Transcript
Page 1: April 2014 HUG : Apache Sentry

Apache Sentry: Enterprise-grade Security for Hadoop
Xuefu Zhang, Sravya Tirukkovalur | Cloudera
April 16, 2014

Page 2: April 2014 HUG : Apache Sentry

Outline

• Introduction
• Hadoop security primer
  • Authentication
  • Authorization
  • Data Protection
  • Governance and Auditing
• Introducing Apache Sentry
  • What's Sentry
  • Sentry Architecture
  • Sentry Internal
• Future Work
• Demo
• Q&A

Page 3: April 2014 HUG : Apache Sentry

Introduction

● Hadoop gets bigger ...
  ● Hadoop has been enjoying an increasing adoption rate
  ● More and more data on Hadoop clusters
  ● More and more access to the data
  ● Data warehouse offload is the most common use case
  ● Apache Hive, Apache Drill, Cloudera Impala
  ● SQL on Hadoop is a phenomenon

Page 4: April 2014 HUG : Apache Sentry

Introduction (cont'd)

● But more encumbrance ...
  ● Enterprises want to protect sensitive data
  ● Government regulations and compliance requirements, such as HIPAA, PII, FISMA
  ● Existing security problems with Hadoop have hindered adoption
  ● Security has become the top priority

Page 5: April 2014 HUG : Apache Sentry

Introduction (cont'd)

● Reality is ...
  ● Different components, different security mechanisms
  ● Multiple components may access the same data set
  ● Hadoop was born out of trust, not security
  ● Think of Windows


Page 7: April 2014 HUG : Apache Sentry

Hadoop Security Primer

• Authentication
  ● Identify who you are
  ● Untrusted users have no access to the cluster network
  ● In a trusted network, everyone is a good citizen
  ● Who you are is determined by the client host

Page 8: April 2014 HUG : Apache Sentry

Hadoop Security Primer

• Strong Authentication
  ● Kerberos
  ● LDAP, Active Directory
  ● LDAP, AD integrated with Kerberos, establishing a single point of truth
  ● Single Sign On

Page 9: April 2014 HUG : Apache Sentry

Hadoop Security Primer (cont'd)

• Kerberos
  ● Strong authentication
  ● Provides mutual authentication
  ● Protects against eavesdropping and replay attacks
  ● Every user and service has a Kerberos “principal”
  ● Credentials: keytabs (service), password (user)

Page 10: April 2014 HUG : Apache Sentry

Hadoop Security Primer (cont'd)

• Authorization
  ● Determine whether you can access a resource
  ● HDFS: POSIX-style R/W/X permissions for U/G/O, coarse-grained
  ● Other components have authorization
    ● MR job queue
    ● HBase ACLs on table and column family
    ● Accumulo provides cell-level access control
  ● Impersonation
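To make the "R/W/X for U/G/O" bullet concrete, here is a minimal Python sketch of a POSIX-style permission check. It is an illustration of the general model only, not HDFS code: the function name `posix_allows` and the users, groups, and mode below are hypothetical, and details such as superusers and extended ACLs are ignored.

```python
import stat

def posix_allows(mode, owner, group, user, user_groups, want):
    """Return True if `user` may perform `want` ('r', 'w' or 'x') on an object.

    Exactly one class applies: owner (U), then group (G), then other (O).
    """
    owner_bit, group_bit, other_bit = {
        "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
        "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
        "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
    }[want]
    if user == owner:
        return bool(mode & owner_bit)
    if group in user_groups:
        return bool(mode & group_bit)
    return bool(mode & other_bit)

# Hypothetical file: mode rw-r----- owned by alice, group analysts.
MODE = 0o640
print(posix_allows(MODE, "alice", "analysts", "alice", [], "w"))         # owner may write
print(posix_allows(MODE, "alice", "analysts", "bob", ["analysts"], "w"))  # group may only read
```

The check applies to the whole file or directory, which is why the slide calls this model coarse-grained: there is no way to express access to a single table column or row with these nine bits.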

Page 11: April 2014 HUG : Apache Sentry

Hadoop Security Primer (cont'd)

• Data Protection
  ● Data at rest and in transit
  ● Hadoop provides encryption on data in transit: DTP, HTTP, RPC, JDBC/ODBC
  ● Hadoop has no native encryption on data at rest (HDFS-6134)
  ● Relying on OS-level encryption

Page 12: April 2014 HUG : Apache Sentry

Hadoop Security Primer (cont'd)

• Governance and auditing
  ● Again, handled component by component
  ● DFS and MapReduce provide base audit support
  ● Apache Hive metastore records audit (who/when) information for Hive interactions
  ● Apache Oozie provides an audit trail for services


Page 14: April 2014 HUG : Apache Sentry

Introducing Apache Sentry

● Hadoop Authorization
  ● Existing authorization is fragmented, coarse-grained, and manual
  ● Data is often left unprotected for the sake of simplicity
  ● Enterprises need a centralized authorization component that works across components and is easy to use, fine-grained, and role-based

Page 15: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

● What's Sentry
  ● Sentry is an authorization module for Hive, Search, Impala, and beyond
  ● It unlocks key RBAC requirements: secure, fine-grained, role-based authorization, multi-tenant administration
  ● Open Source, Apache Incubator project
  ● Ecosystem Support: Apache SOLR, HiveServer2, & Impala 1.1+

Page 16: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

● Key Benefits
  ● Store Sensitive Data in Hadoop
  ● Extend Hadoop to More Users
  ● Comply with Regulations

Page 17: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

● Key Capabilities
  ● Fine-Grained: SERVERS, DATABASES, TABLES & VIEWS; INDEXES, COLLECTIONS
  ● Role-Based: roles include privileges such as SELECT, INSERT, ALL; UPDATE, QUERY
  ● Multi-Tenant administration
    ● Separate policies for each database/schema
    ● Can be maintained by separate admins

Page 18: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

[Figure: Sentry Architecture. Diagram labels: Impala, HiveServer2, SOLR, …; Binding Layer (Impala, Hive, Search, Pig, …); Authorization Provider; Policy Engine; Policy Provider (File, Database); Local FS/HDFS]

Page 19: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

[Figure: query processing flow. SQL → Parse (validate SQL grammar) → Build (construct statement tree) → Check (validate statement objects; first check: authorization, via Sentry) → Plan (forward to execution planner) → Query/MR]

Page 20: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

• Actors
  ● User
  ● User group membership
  ● Resources
  ● Privilege
  ● Role

Page 21: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

• User
  ● User authenticated
  ● User identity obtained from session context

Page 22: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

• User group membership
  ● Defined outside Sentry policy
  ● Obtained from user directory (LDAP, AD, HDFS)
  ● May be available from session context

Page 23: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

• Resources
  ● Data to be protected
  ● File or directory on HDFS
  ● Tables or views in Hive
  ● URI
  ● Resources can be hierarchical
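One way to read "hierarchical" is that a grant on a parent resource covers everything beneath it. The Python sketch below illustrates that idea under stated assumptions: resources are modeled as tuples ordered server, database, table (the levels named on the Key Capabilities slide), `*` is treated as a wildcard, and the function `covers` and all resource names are hypothetical rather than Sentry's actual implementation.

```python
def covers(granted, requested):
    """True if the granted resource path is equal to, or an ancestor of, the requested path.

    Paths are tuples such as ("server1", "sales", "customers").
    A "*" component in the grant matches any value at that level (assumption).
    """
    if len(granted) > len(requested):
        # A grant on a child never covers its parent.
        return False
    return all(g == "*" or g == r for g, r in zip(granted, requested))

# A database-level grant covers a table inside that database ...
print(covers(("server1", "sales"), ("server1", "sales", "customers")))
# ... but not a table in a different database.
print(covers(("server1", "sales"), ("server1", "hr", "salaries")))
```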

Page 24: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

• Privilege
  ● Action or operation associated with a resource
  ● Exists in a role only
  ● SELECT on a given TABLE or VIEW
  ● CREATE a TABLE or VIEW
  ● QUERY on a search COLLECTION
  ● DELETE a FILE or DIRECTORY
  ● Example
      collection=customerCol->action=query
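The privilege string in the example is a chain of `key=value` components joined by `->`. As a rough sketch of how such a string could be taken apart, assuming that reading of the syntax, the hypothetical helper `parse_privilege` below splits it into ordered pairs. It is not Sentry's own parser.

```python
def parse_privilege(text):
    """Split 'key=value->key=value' into an ordered list of (key, value) pairs."""
    pairs = []
    for part in text.split("->"):
        key, sep, value = part.strip().partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed privilege component: {part!r}")
        # Keys are lowercased for comparison; values keep their case.
        pairs.append((key.lower(), value))
    return pairs

print(parse_privilege("collection=customerCol->action=query"))
```

Keeping the pairs in order preserves the resource-then-action structure of the privilege, which a plain dictionary of unordered keys would hide.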

Page 25: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

• Roles
  ● A collection of privileges
  ● Defined in Sentry policy
  ● Example
      [roles]
      ana_query_role = collection=sentryColl->action=query
      ana_update_role = collection=sentryColl->action=update
      test_role = collection=testColl->action=update
      full_admin_role = collection=*

Page 26: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

• (Group, Role) mapping
  ● Defined in policy
  ● One-to-Many
  ● Example
      [groups]
      analyts = ana_query_role, ana_update_role
      admins = full_admin_role
      testgroup = test_role
      hbase = full_admin_role
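The `[roles]` and `[groups]` examples from the slides use an INI-like layout, so they can be read with Python's standard `configparser` for illustration. This is only a sketch: the function `load_policy` is hypothetical, the policy text is copied from the slide examples, and treating commas as the separator between multiple entries on one line is an assumption based on the `[groups]` example.

```python
import configparser

# Policy text assembled from the two slide examples.
POLICY = """
[groups]
analyts = ana_query_role, ana_update_role
admins = full_admin_role
testgroup = test_role
hbase = full_admin_role

[roles]
ana_query_role = collection=sentryColl->action=query
ana_update_role = collection=sentryColl->action=update
test_role = collection=testColl->action=update
full_admin_role = collection=*
"""

def load_policy(text):
    """Return (group -> list of roles, role -> list of privilege strings)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep group and role names case-sensitive
    parser.read_string(text)
    groups = {g: [r.strip() for r in v.split(",")] for g, v in parser["groups"].items()}
    roles = {r: [p.strip() for p in v.split(",")] for r, v in parser["roles"].items()}
    return groups, roles

groups, roles = load_policy(POLICY)
print(groups["analyts"])
print(roles["ana_query_role"])
```

`configparser` splits each line at the first `=`, so the `=` and `->` characters inside a privilege string remain part of the value.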

Page 27: April 2014 HUG : Apache Sentry

Introducing Apache Sentry (cont'd)

• Rule evaluation
  ● Who's the user?
  ● Which group(s) does the user belong to?
  ● What resource is to be accessed?
  ● How is the resource accessed (READ, SELECT, etc.)?
  ● Do any of the user's groups have a role that has the right privilege?
    ● Yes – great! Go ahead!
    ● No – sorry! Insufficient privilege!
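The evaluation questions above can be sketched as a short lookup chain: user to groups, groups to roles, roles to privileges, then a match against the requested resource and action. The Python below is a minimal illustration under stated assumptions, not Sentry's engine. The user names and the `GROUPS_OF_USER` directory are hypothetical, the group and role data mirror the slide examples, and a privilege with no `action` component is assumed to allow every action.

```python
# Hypothetical user directory (in practice this comes from LDAP, AD, or HDFS).
GROUPS_OF_USER = {"alice": ["analyts"], "bob": ["testgroup"], "carol": ["admins"]}

# Group-to-role and role-to-privilege data taken from the slide examples.
ROLES_OF_GROUP = {
    "analyts": ["ana_query_role", "ana_update_role"],
    "admins": ["full_admin_role"],
    "testgroup": ["test_role"],
}
PRIVILEGES_OF_ROLE = {
    "ana_query_role": ["collection=sentryColl->action=query"],
    "ana_update_role": ["collection=sentryColl->action=update"],
    "test_role": ["collection=testColl->action=update"],
    "full_admin_role": ["collection=*"],
}

def implies(granted, collection, action):
    """True if one granted privilege string allows `action` on `collection`."""
    parts = dict(p.split("=", 1) for p in granted.split("->"))
    collection_ok = parts.get("collection") in ("*", collection)
    # Assumption: a privilege without an action component allows all actions.
    action_ok = parts.get("action", "*") in ("*", action)
    return collection_ok and action_ok

def is_authorized(user, collection, action):
    """Walk user -> groups -> roles -> privileges and stop at the first match."""
    for group in GROUPS_OF_USER.get(user, []):
        for role in ROLES_OF_GROUP.get(group, []):
            for granted in PRIVILEGES_OF_ROLE.get(role, []):
                if implies(granted, collection, action):
                    return True
    return False  # no group has a role with the right privilege

print(is_authorized("alice", "sentryColl", "query"))
print(is_authorized("bob", "sentryColl", "query"))
```

Access is denied by default: a user with no groups, or groups with no matching role, falls through to `False`, which matches the "No" branch of the slide.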


Page 29: April 2014 HUG : Apache Sentry

Future Work

● Introduce Sentry to more Hadoop components for their authorization needs
● Centralized policy store aiming for the whole enterprise
● Grant/Revoke
● Centralized authorization service for all protected resources including metadata

We appreciate your contribution and support

