HADOOP SECURITY FEATURES That make your risk officer happy By Anurag Shrivastava, ING Commercial Bank, Amsterdam @shri2201
Jul 17, 2015
Security for Hadoop
Source: http://blogs.gartner.com/merv-adrian/2014/01/21/security-for-hadoop-dont-look-now/
Hadoop Security Features 2
Hadoop in Enterprise
Data Lake: an important information asset for the enterprise
• Data from systems of record and logs is stored in Hadoop
• Significant cost savings for the enterprise
• Diverse types of users
Picture Source: http://arunkottolli.blogspot.nl/2014/03/understanding-data-in-big-data.html
Operational Security in Enterprise
• User Access Management
• Security Event Monitoring
• Application State Monitoring
• Security Testing
• Patch Management
• Data Protection
• Backup and restore
User Access Management
Requirements
Privileged, group and generic accounts
Separation of technical and business users
Separation of environments (DTAP)
Separation of admins and other users
Separation of users in different business roles
Application of the four-eyes principle when entering or changing data
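The four-eyes requirement above can be illustrated with a minimal sketch. This is not a Hadoop feature; the `ChangeRequest` class and its fields are hypothetical names chosen for illustration. The sketch assumes that a change may only be applied after a second person, different from the author, has approved it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    """A pending data change that needs approval by a second person (hypothetical model)."""
    author: str
    description: str
    approver: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        # Four-eyes rule: the author may not approve their own change.
        if reviewer == self.author:
            raise PermissionError("author cannot approve their own change")
        self.approver = reviewer

    @property
    def may_apply(self) -> bool:
        # The change may be applied only once someone else has approved it.
        return self.approver is not None
```

In use, a request created by one user stays blocked until `approve` is called with a different user name.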
Security Event Monitoring
• Definition of application-specific events
• All login attempts, failed or successful
• Unauthorized attempt to access a table or file
• Operational performance of application
• Name node performance
• CPU, Disk
• Integration with Master Control Room
• Alerting the asset manager
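As an illustration of monitoring login attempts and raising alerts, the sketch below counts failed logins per user and reports the users who reach a threshold. The event records, field names and threshold are assumptions made for this example; a real deployment would read such events from the cluster's audit logs and forward alerts to the control room.

```python
from collections import Counter

# Hypothetical event records; the field names are assumptions for this sketch.
EVENTS = [
    {"user": "alice", "type": "login", "success": True},
    {"user": "bob", "type": "login", "success": False},
    {"user": "bob", "type": "login", "success": False},
    {"user": "bob", "type": "login", "success": False},
    {"user": "carol", "type": "table_access", "success": False},
]


def users_to_alert(events, max_failed_logins=3):
    """Return the users whose failed login count reaches the threshold."""
    failed = Counter(
        e["user"] for e in events
        if e["type"] == "login" and not e["success"]
    )
    return sorted(user for user, count in failed.items() if count >= max_failed_logins)
```

The same pattern extends to other event types on the slide, such as unauthorized attempts to access a table or file.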
Data Protection (1/2)
• Confidentiality
• Protect information from unauthorized disclosure
• Integrity
• Ensure the accuracy, completeness and timeliness of information and prevent data tampering
• Availability
• Ensure that information and services are available when required
Picture Source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
Data Protection (2/2)
• Confidentiality
• Logon
• Access Control
• Malicious code protection
• Security Event Monitoring
• Encryption
• Integrity
• Message authentication code
• Data Lineage
Picture Source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
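The message authentication code listed under Integrity can be sketched with Python's standard `hmac` module. The key and record contents are placeholders assumed for this example; key management is out of scope here.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check the tag in constant time; False means the data or tag was altered."""
    return hmac.compare_digest(sign(key, message), tag)
```

A record stored together with its tag can later be checked with `verify`; any change to the record makes the check fail, which is how a MAC helps detect tampering.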
Security under spotlight in Data Lake
• All kinds of enterprise data: structured, semi-structured and unstructured
• Many groups of users: Data Scientists, Analysts, Engineers, Marketers, Managers
• Long-term retention of data
• Different types of workloads
• The value of data grows as data from different sources is combined in the Data Lake
Picture source: http://beyondplm.com/2014/05/05/plm-downstream-usage-and-future-information-rivers/
Data Lake Risks
• The Data Lake is an attractive target for inside and outside attackers
• A security compromise in the Data Lake can have major or catastrophic business impact
The IT risk assessment gives a Hadoop implementation the highest risk rating for the Data Lake use case.
Lab-Like Security Is Not Enough
• Play Area
• Big Data Predictive Analytics Lab
• Production System
Predictive Analytics Lab
• Stepping Stone (Citrix)
• 18 x Hadoop Nodes
• GIT, Libraries, Build Tools
• Monitoring Services
• Data Files in Batches
• Dedicated VLAN
• Shared Services
• SMTP Relay
• Internet via Corporate Infrastructure
• Firewall Rules Guard the Perimeter Security of Hadoop Cluster
Lab-like security works for a small group of people
Limitations of Hadoop
• No “Data at Rest” Encryption
• A Kerberos-Centric Approach
• Limited Authorization Capabilities
• Complexity of the Security Model and Configuration
Unfortunately, this is not sufficient for a Data Lake that ingests all the data and caters to thousands of users.
Hadoop Security
Hadoop Security Solutions from Major Vendors
Hortonworks acquired XASecure to bring ACLs to Hadoop
• Apache Ranger
• Apache Knox
• Apache Falcon
Cloudera is working on Project Rhino
• Apache Sentry
HDP-Apache Ranger
Apache Ranger
Apache Ranger currently supports authorization, auditing and security administration of a limited number of HDP components:
• Hive
• HBase
• Storm
• Knox
• HDFS
Apache Ranger Goals
1. Centralized security administration to manage all security-related tasks in a central UI or using REST APIs.
2. Fine-grained authorization to do a specific action and/or operation with a Hadoop component/tool, managed through a central administration tool.
3. Standardized authorization method across all Hadoop components.
4. Enhanced support for different authorization methods: role-based access control, attribute-based access control, etc.
5. Centralized auditing of user access and administrative actions (security related) within all the components of Hadoop.
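The combination of role-based authorization and centralized auditing described in these goals can be sketched generically. This is not Apache Ranger's API or policy format; the `Policy` class, the resource names and the in-memory `AUDIT_LOG` are hypothetical and only illustrate a deny-by-default check that records every decision.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Policy:
    """Grants a set of actions on one resource to one role (hypothetical model)."""
    resource: str
    role: str
    actions: FrozenSet[str]


# Hypothetical policies, held in one central place.
POLICIES = [
    Policy("sales_db.orders", "analyst", frozenset({"select"})),
    Policy("sales_db.orders", "engineer", frozenset({"select", "update"})),
]

# Central audit trail: (user, resource, action, allowed).
AUDIT_LOG: List[Tuple[str, str, str, bool]] = []


def is_allowed(user, roles, resource, action, policies=POLICIES):
    """Deny by default; allow only if a policy grants the action to one of the user's roles."""
    allowed = any(
        p.resource == resource and p.role in roles and action in p.actions
        for p in policies
    )
    # Record every decision, allowed or denied, for later review.
    AUDIT_LOG.append((user, resource, action, allowed))
    return allowed
```

Keeping policies and the audit trail in one place is the design point: every component asks the same question and every answer is logged.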
Apache Knox and Hadoop Services
Hadoop Services Covered
• WebHDFS (HDFS)
• Templeton (HCatalog)
• Stargate (HBase)
• Oozie
• Hive/JDBC
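To make the gateway idea concrete, the sketch below only builds the URL a client would call to reach WebHDFS through Knox instead of contacting the cluster directly. It follows the general Knox URL layout of gateway path, topology (cluster) name, then service path. The host name, the topology name `default`, port 8443 and the gateway path `gateway` are assumptions for this example and depend on the actual deployment.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(host, topology, hdfs_path, op="LISTSTATUS",
                     port=8443, gateway_path="gateway"):
    """Build a WebHDFS URL that goes through a Knox gateway (values are deployment-specific)."""
    path = quote(hdfs_path.lstrip("/"))  # keep "/" separators, escape other characters
    query = urlencode({"op": op})
    return f"https://{host}:{port}/{gateway_path}/{topology}/webhdfs/v1/{path}?{query}"
```

Because clients only ever see the gateway address, the cluster's internal hosts and ports can stay behind the firewall.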
Apache Falcon
• Visualize Data Pipeline Lineage
• Track Data Pipeline audit logs
• End-to-End Monitoring of Data Pipeline
• Policies for Data Replication and Retention
Apache Sentry and Project Rhino
Goals of Project Rhino
• Provide encryption with hardware-enhanced performance
• Support enterprise-grade authentication and single sign-on for Hadoop services
• Provide role-based access control in Hadoop with cell-level granularity in HBase
• Ensure consistent auditing across essential Apache Hadoop components
Making Risk Officer Happy
• Hadoop security has more to offer
• Role based access
• Audit logging
• Data encryption
• User Access Management
• Security Event Monitoring
• Application State Monitoring
• Security Testing
• Patch Management
• Data Protection
• Backup and restore
Overlapping vendor efforts, incomplete coverage across products, and varying commitment to open source could slow down the adoption of Hadoop.
THANK YOU
Anurag Shrivastava
@shri2201