Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Hortonworks: Hadoop for the Enterprise We Do Hadoop
Jul 15, 2015
Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP
Customer Momentum
• 330+ customers (as of end of 2014)
Hortonworks Data Platform
• Completely open multi-tenant platform for any app & any data.
• A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.
Partner for Customer Success
• Open source community leadership focus on enterprise needs
• Unrivaled world class support
• Founded in 2011
• Original 24 architects, developers, operators of Hadoop from Yahoo!
• 600+ Employees
• 1000+ Ecosystem Partners
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to Scale
[Figure: business value for laggards versus industry leaders. Traditional data: ERP, CRM, SCM. New data: clickstream, geolocation, web data, Internet of Things, docs and emails, server logs. Data volume: 2.8 zettabytes in 2012, 40 zettabytes in 2020.]
Hadoop emerged as foundation of new data architecture
Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data
• Built by Yahoo! to be the heartbeat of its ad & search business
• Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises
• Incredibly disruptive to current platform economics
Traditional Hadoop Advantages
• Manages new data paradigm
• Handles data at scale
• Cost effective
• Open source
Traditional Hadoop Had Limitations
• Batch-only architecture
• Single purpose clusters, specific data sets
• Difficult to integrate with existing investments
• Not enterprise-grade
[Figure: traditional Hadoop stack: Application; Batch Processing (MapReduce); Storage (HDFS).]
Security in HDP Making Hadoop Enterprise Ready
Hadoop exacerbates the security challenge
New Security Requirements
• Provide consistent and granular access control to data for each application on top of Hadoop
• Enable complete & comprehensive definition and application of policy across all the different access types
• Must retain privacy and security despite ability to infer knowledge from co-existing & unstructured data
[Figure: modern data architecture. Sources: existing systems (ERP, CRM, SCM), clickstream, web & social, geolocation, sensor & machine, server logs, unstructured. Platform: HDFS (Hadoop Distributed File System) and YARN: Data Operating System, running batch, interactive, real-time, and partner ISV workloads, alongside MPP and EDW systems. Analytics: data marts, applications, business analytics, visualization & dashboards.]
HDP Security: comprehensive, complete and simple
Security in HDP is comprehensive and complete for Hadoop
• Administration: Central management & consistent security
• Authentication: Authenticate users and systems
• Authorization: Provision access to data
• Audit: Maintain a record of data access
• Data Protection: Protect data at rest and in motion
• HDP ensures comprehensive enforcement of security policy across the entire Hadoop stack
• HDP provides functionality across the complete set of security requirements
• HDP is the only solution to provide a single simple interface for security policy definition and maintenance
HDP Security: comprehensive, complete and simple
In order to protect any data system you must implement the following:
• Administration: Central management & consistent security
− Only HDP delivers a single administrative console to set policy across the entire cluster (Apache Ranger)
• Authentication: Authenticate users and systems
− Integrate with existing AD and LDAP authentication for perimeter and project access (Apache Knox, Native Kerberos)
• Authorization: Provision access to data
− Work within all Apache projects to provide consistent authorization controls (Apache Ranger)
• Audit: Maintain a record of data access
− Maintain a record of events across all components that is consistent and accessible (Apache Ranger)
• Data Protection: Protect data at rest and in motion
− Wire and storage encryption in Hadoop. Refer to partner encryption solutions for more advanced needs (HDFS, Partner Encryption)
A Leader in Hadoop
The Forrester Wave™ Big Data Hadoop Solutions Q1 2014
“Hortonworks loves and lives open source innovation”
World Class Support and Services. Hortonworks' Customer Support received a maximum score and was significantly higher in rating compared to other vendors
© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
HP Security Voltage HP SecureData for Hadoop
A History of Excellence
• HP Security Voltage: Founded in 2002 out of Stanford University, based in Cupertino, California.
• Acquired by HP: February 2015
• Mission: To protect the world’s sensitive data
• By: Providing encryption and tokenization solutions that protect data wherever it is used or stored
• Market Leadership:
– PCI solutions are used by six of the top eight U.S. payment processors
– Provide the world’s most pervasive email encryption solutions
– Contribute technology to multiple standards organizations
Discussion Topics Today
• Traditional techniques are insufficient to protect sensitive data in Hadoop from new, advanced threats
− Data-at-rest protection does not secure data in analytics or in motion
− This leaves major compliance gaps and exploitable security gaps
• A data-centric security strategy, complementary to Hadoop security options:
− Enables data to be protected from advanced threats – always-on protection of data wherever it's stored, used or moved
− Enables data de-identification in test, development, and analytics
− Enables Hadoop deployment without compliance and insider risks
− Can cut compliance costs by as much as 90%
• Data-centric security is the new standard adopted by leaders in banking, insurance, retail, healthcare, and related sectors
Why is Securing Hadoop Difficult?
• Multiple sources of data from multiple enterprise systems, and real-time feeds with varying (or unknown) protection requirements
• Rapid innovation in a well-funded open-source developer community
• Multiple types of data combined together in the Hadoop “data lake”
• Automatic replication of data across multiple nodes once entered into the HDFS data store
• Access by many different users with varying analytic needs
• Reduced control if Hadoop clusters are deployed in a cloud environment
Existing Ways to Secure Hadoop
• Existing IT security:
− Network firewalls
− Logging and monitoring
− Configuration management
• Enterprise-scale security for Apache Hadoop
− Apache Knox: Perimeter security
− Kerberos: Strong authentication
− Apache Ranger: Monitoring and Management
Need to augment these with “data-centric” protection of data in use, in motion and at rest
Introducing: “Data-Centric” Security
[Figure: data security coverage across the data ecosystem. Layers: data & applications, middleware, databases, file systems, storage. Traditional IT infrastructure security: authentication management, SSL/TLS/firewalls, database encryption, disk encryption. Threats to data: credential compromise, traffic interceptors, SQL injection, malware, insiders. Security gaps remain between the layers.]
HP Security Voltage Provides This Protection
[Figure: the layered data-ecosystem diagram (storage, file systems, databases, middleware, data & applications) with its traditional security controls, threats, and security gaps, overlaid with HP Security Voltage data-centric security providing end-to-end data protection.]
HP Format-Preserving Encryption (FPE)
[Figure: FPE versus AES on sample data.]
Tax ID: 934-72-2356
FPE: 345-753-5772
AES: 8juYE%Uks&dDFa2345^WFLERG
Original record: First Name: Gunther, Last Name: Robertson, SSN: 934-72-2356, DOB: 20-07-1966
FPE-protected record: First Name: Uywjlqo, Last Name: Muwruwwbp, SSN: 253-67-2356, DOB: 18-06-1972
AES-encrypted record: Ija&3k24kQotugDF2390^32 0OWioNu2(*872weW Oiuqwriuweuwr%oIUOw1@
• Supports data of any format: name, address, dates, numbers, etc.
• Preserves referential integrity
• Only applications that need the original value need change
• Used for production protection and data masking
• Currently in the NIST standardization process
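The format-preserving idea can be illustrated with a small toy, offered only as a sketch: a Feistel network over the decimal digits of a value, with HMAC-SHA256 as the round function, leaving separators such as hyphens in place. This is not HP's FPE, not the NIST FFX/FF1 construction, and not reviewed for security; the function names, round count, and key are assumptions made for illustration.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # even, so both halves return to their original positions


def _prf(key: bytes, round_no: int, half: str) -> int:
    """Keyed round function: HMAC-SHA256 over the round number and one half."""
    message = f"{round_no}|{half}".encode("ascii")
    return int.from_bytes(hmac.new(key, message, hashlib.sha256).digest(), "big")


def _feistel(key: bytes, digits: str, decrypt: bool) -> str:
    """Run an alternating Feistel network over a string of decimal digits."""
    n = len(digits)
    if n < 2:
        raise ValueError("need at least two digits")
    u, v = n // 2, n - n // 2
    a, b = digits[:u], digits[u:]
    rounds = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    for i in rounds:
        m = u if i % 2 == 0 else v  # width of the half being rewritten
        if decrypt:
            c = (int(b) - _prf(key, i, a)) % 10**m
            a, b = str(c).zfill(m), a
        else:
            c = (int(a) + _prf(key, i, b)) % 10**m
            a, b = b, str(c).zfill(m)
    return a + b


def _apply(key: bytes, text: str, decrypt: bool) -> str:
    """Transform only the digits of text; every other character stays put."""
    digits = "".join(ch for ch in text if ch in DIGITS)
    out = iter(_feistel(key, digits, decrypt))
    return "".join(next(out) if ch in DIGITS else ch for ch in text)


def fpe_encrypt(key: bytes, text: str) -> str:
    return _apply(key, text, decrypt=False)


def fpe_decrypt(key: bytes, text: str) -> str:
    return _apply(key, text, decrypt=True)
```

Because the output has the same length and separator positions as the input, a protected tax ID still fits the original column, and the same input under the same key always gives the same output, which is what keeps joins across tables consistent.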
HP Secure Stateless Tokenization (SST)
[Figure: SST examples.]
Type | Credit Card | Tax ID
Original | 1234 5678 8765 4321 | 934-72-2356
SST | 8736 5533 4678 9453 | 347-982-8309
Partial SST | 1234 5633 4678 4321 | 347-982-2356
Obvious SST | 1234 56AZ UYTZ 4321 | AZS-UXD-2356
• Tokenization for PCI scope reduction
• Replaces token database with a smaller token mapping table
• Token values mapped using random numbers
• Numerous advantages over traditional tokenization:
− No database hardware, software, replication problems, etc.
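A rough sketch of table-based tokenization follows, under stated assumptions: a fixed, randomly shuffled lookup table replaces the per-value token database, and only the middle digits of a card number are tokenized (the "partial" style, keeping the first six and last four digits). The table construction, the seed, and all function names are hypothetical and are not taken from HP SST; a real system would generate the table with a cryptographic random source and protect it accordingly.

```python
import random

DIGITS = "0123456789"


def build_token_table(num_digits: int, seed: int):
    """Return (forward, inverse) tables: a random permutation of all values."""
    forward = list(range(10**num_digits))
    # Assumption for the sketch: a seeded PRNG stands in for a cryptographic RNG.
    random.Random(seed).shuffle(forward)
    inverse = [0] * len(forward)
    for value, token in enumerate(forward):
        inverse[token] = value
    return forward, inverse


def _map_middle(card: str, table, head: int = 6, tail: int = 4) -> str:
    """Replace the middle digits via the table; keep head, tail, and spacing."""
    digits = "".join(ch for ch in card if ch in DIGITS)
    middle = digits[head:len(digits) - tail]
    if 10 ** len(middle) != len(table):
        raise ValueError("table size does not match the number of middle digits")
    mapped = str(table[int(middle)]).zfill(len(middle))
    new_digits = iter(digits[:head] + mapped + digits[len(digits) - tail:])
    return "".join(next(new_digits) if ch in DIGITS else ch for ch in card)


def tokenize(card: str, forward) -> str:
    return _map_middle(card, forward)


def detokenize(token: str, inverse) -> str:
    return _map_middle(token, inverse)
```

For example, `forward, inverse = build_token_table(6, seed=2015)` followed by `tokenize("1234 5678 8765 4321", forward)` yields a value that still starts with `1234 56` and ends with `4321`. No per-card record is ever written, so there is nothing to replicate or back up beyond the table itself.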
Data Protection with HP FPE (AES FFX) and HP SST
[Figure: sample records before and after protection. Column labels on the slide: FPE on four columns and SST* on one; the credit card column shows the SST format.]
Original data:
Name | SS# | Credit Card # | Street Address | Customer ID
James Potter | 385.12.1199 | 37123 456789 01001 | 1279 Farland Avenue | G8199143
Ryan Johnson | 857.64.4190 | 5587 0806 2212 0139 | 111 Grant Street | S3626248
Carrie Young | 761.58.6733 | 5348 9261 0695 2829 | 4513 Cambridge Court | B0191348
Brent Warner | 604.41.6687 | 4929 4358 7398 4379 | 1984 Middleville Road | G8888767
Anna Berman | 416.03.4226 | 4556 2525 1285 1830 | 2893 Hamilton Drive | S9298273
Protected data:
Name | SS# | Credit Card # | Street Address | Customer ID
Kwfdv Cqvzgk | 161.82.1292 | 37123 48BTIR 51001 | 2890 Ykzbpoi Clpppn | S7202483
Veks Iounrfo | 200.79.7127 | 5587 08MG KYUP 0139 | 406 Cmxto Osfalu | B0928254
Pdnme Wntob | 095.52.8683 | 5348 92VK DEPD 2829 | 1498 Zejojtbbx Pqkag | G7265029
Eskfw Gzhqlv | 178.17.8353 | 4929 43KF PPED 4379 | 8261 Saicbmeayqw Yotv | G3951257
Jsfk Tbluhm | 525.25.2125 | 4556 25ZX LKRT 1830 | 8412 Wbbhalhs Ueyzg | B6625294
• Enables large amounts of sensitive data to be “de-identified” in Hadoop
• Majority of analysis, MapReduce jobs, etc. can occur on de-identified data
• Reduces insider threats and improves compliance
• Enables developers to test without exposure
• Enables Hadoop and cloud adoption
HP SecureData
• HP Stateless Key Management
− No key database to store or manage
− High performance, unlimited scalability
• Both encryption & tokenization technologies
− Customize solution to meet your exact requirements
• Broad Platform Support
− On-premise / cloud / Big Data
− Structured / Unstructured
− Linux, Hadoop, Windows, AWS, IBM z/OS, HP NonStop, Teradata, etc.
• Quick time-to-value
− Complete end-to-end protection within a common platform
− Format-preservation dramatically reduces implementation effort
[Figure: HP SecureData components: Key Servers; Central Management Console; Web Services API; Command Line and Automated Parsers; Native APIs (C, Java, C#, .NET).]
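One way to picture "no key database" is on-demand key derivation: a key server holding a single master secret recomputes each key from an identity string whenever it is requested, so nothing per-key is stored. The sketch below assumes that model; the slides do not describe HP's actual derivation, and the function name, identity format, and use of HMAC-SHA256 are illustrative choices.

```python
import hashlib
import hmac


def derive_key(master_secret: bytes, identity: str, length: int = 32) -> bytes:
    """Recompute a per-identity key from the master secret; nothing is stored."""
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 bytes")
    digest = hmac.new(master_secret, identity.encode("utf-8"), hashlib.sha256)
    return digest.digest()[:length]
```

Calling `derive_key(secret, "ssn-field@example.com")` on any server that holds the same master secret returns the same key, which is why such a design scales out without key replication.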
Options for Securing Data in Hadoop with HP Security Voltage
[Figure: data flow through a Hadoop cluster with numbered HP Security Voltage interface points. Inbound: applications & data, ETL & batch, landing zone. Cluster: Hadoop jobs, Hadoop jobs & analytics, HDFS with storage encryption. Outbound: egress zone, ETL & batch, BI tools & downstream applications, applications, analytics & data. Legend: standard application; application with HP Security Voltage interface point; unprotected data; de-identified data.]
Securing Data During Ingestion
[Figure: ingestion paths into the Hadoop cluster with HP Security Voltage interface points 1 and 2. Source data & applications, ETL & batch, and a landing zone feed Hadoop jobs and HDFS with storage encryption. Legend: standard application; application with HP Security Voltage interface point; unprotected data; de-identified data.]
• Data protection upon import – Outside Hadoop with standard tools
• Data protection using Sqoop – Unique HP Security Voltage integration
• Data protection using MapReduce & other tools
• Data protected at the source
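Field-level protection during ingestion can be sketched as a record filter that rewrites only the sensitive columns before data lands in HDFS, for instance as the mapper of a Hadoop Streaming job. Everything here is a hypothetical stand-in: `protect_value` uses a keyed hash purely as a placeholder for a call into a protection library, and the column positions and key are assumptions.

```python
import csv
import hashlib
import hmac
import io


def protect_value(key: bytes, value: str) -> str:
    """Placeholder for a protection call; a keyed hash, not FPE or SST."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def protect_rows(lines, sensitive_columns, key: bytes):
    """Yield CSV lines with the sensitive columns replaced by protected values."""
    for row in csv.reader(lines):
        for col in sensitive_columns:
            row[col] = protect_value(key, row[col])
        out = io.StringIO()
        csv.writer(out, lineterminator="").writerow(row)
        yield out.getvalue()
```

In a streaming mapper this would be driven by `for line in protect_rows(sys.stdin, [1], key): print(line)`, so the unprotected value never reaches the cluster's storage. Unlike FPE, the placeholder hash is one-way and changes the field's format; it only marks where the real call would sit.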
Using Data for Analytics, Applications and Export
[Figure: paths out of the Hadoop cluster with HP Security Voltage interface points 4, 5, 6 and 7. Hadoop jobs & analytics and HDFS with storage encryption feed an egress zone, ETL & batch, and BI tools & downstream applications. Legend: standard application; application with HP Security Voltage interface point; unprotected data; de-identified data.]
• Decrypt/de-tokenize data within Hadoop analytics and programs (Hive, MapReduce and other tools) – Can export data as needed
• Decrypt/de-tokenize data outside Hadoop for additional post-processing – Using standard tools
• Using de-identified data with Hadoop analytics and programs
Storage-Level Encryption
• Uses open source “dm-crypt” program included with Linux
• Big advantage: HP Stateless Key Management
• Use case: General protection for all data in Hadoop
• Physical theft/loss of storage
• “Data-at-rest” protection only
[Figure: Hadoop cluster with HP Security Voltage storage encryption applied to HDFS. Legend: standard application; application with HP Security Voltage interface point; unprotected data; de-identified data.]
Use Case 1: Global Telecommunications Company
Need
• Analyze several hundred million customer records for analytic patterns, retail optimization, business intelligence
• Records contain personal customer data, log data, activity data, location information, buying information etc.
• 17 fields are deemed to be sensitive
• Deployed a 500 node Hadoop cluster; moving into the thousands
• Typically ingest 300 million customer records in > 1.5 minutes. SLAs should not be significantly affected
Solution
• Integrated HP SecureData into MapReduce jobs that ingest data
• Sensitive data in 17 fields is protected using HP Format-Preserving Encryption
• Almost all analysis is performed on protected data
• HP Security Voltage tools integrate into Hive and MapReduce if results are to be re-identified
• HP Security Voltage added 90 seconds to the ingestion process
• Data that is protected by HP Security Voltage tools at source (z/OS, Teradata, Oracle, etc.) can directly flow into Hadoop
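The figures quoted in this use case imply a rough protection throughput. The back-of-the-envelope calculation below assumes that all 17 sensitive fields of each of the 300 million records were protected within the added 90 seconds and that the work was spread evenly over the 500 nodes; neither assumption is stated on the slide.

```python
# Figures taken from the use case; the even-spread model is an assumption.
records = 300_000_000
sensitive_fields = 17
added_seconds = 90
nodes = 500

field_operations = records * sensitive_fields          # total values protected
per_second = field_operations / added_seconds          # cluster-wide rate
per_node_per_second = per_second / nodes               # rate per node

print(f"{field_operations:,} field operations")
print(f"{per_second:,.0f} per second across the cluster")
print(f"{per_node_per_second:,.0f} per second per node")
```

Under those assumptions the cluster protects about 5.1 billion values, roughly 57 million per second overall or about 113 thousand per second on each node.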
Use Case 2: Health Care Insurance Company
Need
• Better health analysis to customers: One of their use cases for Hadoop is to provide better analysis of health status to customers on their web site
• Catch prescription fraud: Fraudsters collect prescriptions from 5-6 doctors and get them filled by 5-6 pharmacies. The manual process takes several weeks to track. Hadoop will enable them to do this almost instantly
• Reverse claim overpayment: Claims are often overpaid because of errors and mistakes. They hope to catch this as it happens with Hadoop
• Developer hackathons: Open the system up to their Hadoop developers as a sandbox, enabling innovation, discovery and competitive advantage – without risk
Solution
• Utilized the massive untapped data sets for analysis that were hampered by compliance and risk
• Integrated HP SecureData in Sqoop so data is de-identified as it is copied from databases
• Ability to initially scale to 1000 Hadoop nodes
• Currently investigating the use of HP SecureData enterprise-wide for open systems and mainframe platforms
• Enabling innovation through data access without risk with HIPAA/HITECH regulated data sets
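The prescription-fraud pattern described for this use case can be detected on de-identified data, because a deterministic token identifies the same patient consistently without revealing who it is. The sketch below is a hypothetical rule, not the insurer's actual logic: it flags any patient token seen with at least five distinct doctors and five distinct pharmacies, thresholds borrowed from the "5-6" figure in the slide.

```python
from collections import defaultdict


def flag_doctor_shopping(claims, min_doctors: int = 5, min_pharmacies: int = 5):
    """Return patient tokens linked to many distinct doctors and pharmacies.

    claims: iterable of (patient_token, doctor_id, pharmacy_id) tuples, where
    patient_token is a de-identified value rather than the real identifier.
    """
    doctors = defaultdict(set)
    pharmacies = defaultdict(set)
    for patient_token, doctor_id, pharmacy_id in claims:
        doctors[patient_token].add(doctor_id)
        pharmacies[patient_token].add(pharmacy_id)
    return sorted(
        token
        for token in doctors
        if len(doctors[token]) >= min_doctors
        and len(pharmacies[token]) >= min_pharmacies
    )
```

Only the flagged tokens would then need to be re-identified by an authorized process, which keeps the bulk of the analysis on protected data.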
Use Case 3: Global Financial Services Company
Need
• Customer is rapidly moving to adopt open source storage and data analysis platforms
• Use cases: Fraud detection, marketing (360 degree view of what the customer is doing, to provide more relevant marketing), creating data sets or reports to sell or provide to other companies, financial modeling
• Invested in multiple data warehouse and big data platforms
• Using complex ETL tools to import data into Hadoop from sources including mainframe, distributed databases, flat files, etc.
• Protection in Hadoop is the first step in an enterprise-wide data protection strategy
Solution
• Protect sensitive PCI and PII data as it is being imported into Hadoop. Fields protected include PAN, Bank Account, SSN, Address, City, Zip Code, Date of birth
• HP Secure Stateless Tokenization (SST) offers PCI audit scope reduction for the Hadoop environment
• Central key and policy management infrastructure can scale enterprise-wide to mainframe and distributed platforms
• Data can be protected at ingestion through integration with Sqoop and MapReduce
Conclusion
• Multi-platform enterprises adopting a data lake architecture need a cross-platform solution for protection of sensitive data
• Big data partners bring comprehensive security within Hadoop, with core capabilities for authentication, authorization and auditing
• HP Security Voltage brings data-centric security across data stores including Hadoop, protecting data at rest, in use and in motion, and maintaining the value of the data for analytics
• Together enabling comprehensive security for the enterprise, and rapid and successful Hadoop adoption!