Jan 20, 2015
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Q&A box is available for your questions
Webinar will be recorded for future viewing
Thank you for joining!
We’ll get started soon…
Evolving your data into a strategic asset … using HDP and Red Hat JBoss Data Virtualization
We do Hadoop.
Your speakers…
Raghu Thiagarajan, Director, Partner Product Management, Hortonworks
Kimberly Palko, Principal Product Manager, Red Hat
Kenny Peeples, Principal Technical Marketing Manager, Red Hat
Hadoop Value: New types of data
Clickstream: Capture and analyze website visitors’ data trails and optimize your website
Sensors: Discover patterns in data streaming automatically from remote sensors and machines
Server Logs: Research logs to diagnose process failures and prevent security breaches
Sentiment: Understand how your customers feel about your brand and products – right now
Geographic: Analyze location-based data to manage operations where they occur
Unstructured: Understand patterns in files across millions of web pages, emails, and documents
HDP 2.1: Enterprise Hadoop
HDP 2.1 Hortonworks Data Platform
YARN: Data Operating System
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
DATA ACCESS: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (In-Memory Analytics, ISV engines)
DATA MANAGEMENT: HDFS (Hadoop Distributed File System), nodes 1 … N
SECURITY: Authentication, Authorization, Accounting, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
Deployment Choice: Linux, Windows, On-Premise, Cloud
HDP is deeply integrated in the data center
SOURCES: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
DATA SYSTEM: RDBMS, EDW, MPP, HANA; HDP 2.1 (Governance & Integration, Security, Operations, Data Access, Data Management, YARN)
APPLICATIONS: BusinessObjects BI
Also shown: Operational Tools, Dev & Data Tools, Infrastructure
• Enables millions of JBoss developers to quickly build applications with Hadoop
• Simplifies deployment of Hadoop on OpenStack
• Develops and deploys Apache Hadoop as integrated components of the open modern data architecture
Modern Data Architecture + Red Hat Data Virtualization: Extract and Refine
• Easily combine data from multiple sources without moving or copying data
• Use any reporting or analytical tool
[Diagram labels]
SOURCE DATA: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory; DBs, JMS Queues, Files
LOAD: Sqoop, Flume, WebHDFS, NFS (HTTP, STREAM)
DATA REFINEMENT: Hive, Pig, Custom
INTERACTIVE: HiveServer2
Platform: Ambari, MapReduce, YARN, HDFS, REST
Also shown: Application, Database Server
Hive creates Structure on Raw Sentiment Data
Hive: Interactive SQL-IN-Hadoop
Stinger Initiative – DELIVERED
Next-generation SQL-based interactive query in Hadoop
Speed: Hive query performance has increased by 100X to allow for interactive query times (seconds)
Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
SQL: Support the broadest range of SQL semantics for analytic applications running against Hadoop
[Diagram: HDP 2.1 stack with Governance & Integration, Security, Operations, Data Access, Data Management]
An Open Community at its finest: Apache Hive Contribution
1,672 Jira Tickets Closed
145 Developers
44 Companies
~330,000 Lines Of Code Added… (2.5x)
[Diagram labels: Business Analytics, Custom Apps; Apache Hive (SQL); Apache Tez; Apache MapReduce; Apache YARN; HDFS (Hadoop Distributed File System), nodes 1 … N]
Stinger Project
Stinger Phase 1:
• Base Optimizations
• SQL Types
• SQL Analytic Functions
• ORCFile Modern File Format
Stinger Phase 2:
• SQL Types
• SQL Analytic Functions
• Advanced Optimizations
• Performance Boosts via YARN
Stinger Phase 3:
• Hive on Apache Tez
• Query Service (always on)
• Buffer Cache
• Cost Based Optimizer (Optiq)
Delivered: 13 Months
Federation: Regulatory requirements drive geo-specific clusters
[Diagram labels]
Layers: Consume, Compose, Connect
Consume: DV Dashboard to analyze the aggregated data by User Role
Compose: JBoss Data Virtualization
Connect: two HDP clusters, each showing Ambari, MapReduce, YARN, HDFS, REST; DATA REFINEMENT (Hive, Pig, Custom); LOAD (Sqoop, Flume, WebHDFS, NFS; HTTP, STREAM); INTERACTIVE (HiveServer2)
SOURCE 1: Hive/Hadoop in the HDP contains US Region Data
SOURCE 2: Hive/Hadoop in the HDP contains EU Region Data
Also shown: Store, Web, Catalog; Refine, Explore
Customer data with confidentiality requirements
Red Hat JBoss Data Virtualization and Hortonworks HDP
Kimberly Palko
Engineering Collaboration Benefits
Integration with JBoss Data Virtualization: Enables agile Big Data Hadoop integration with existing enterprise assets and maximizes universal data utilization to enable self-service analytics
Integration with multiple products in the Red Hat JBoss Middleware family: Enables millions of JBoss developers to quickly build applications with Hadoop
Integration with Red Hat Storage: Enables Hadoop to use the Red Hat Storage secure, resilient storage pool for data applications
Integration with Red Hat Enterprise Linux OpenStack Platform: Simplifies automated deployment of Hadoop on OpenStack
Integration with Red Hat Enterprise Linux and OpenJDK: Develop and deploy Apache Hadoop as an integrated component for multiple deployment scenarios
Data Supply and Integration Solution
Data Virtualization sits in front of multiple data sources and
✓ allows them to be treated as a single source
✓ delivering the desired data
✓ in the required form
✓ at the right time
✓ to any application and/or user.
THINK VIRTUAL MACHINE FOR DATA
Easy Access to Big Data
• Reporting tool accesses the data virtualization server via rich SQL dialect
• The data virtualization server translates rich SQL dialect to HiveQL
• Hive translates HiveQL to MapReduce
• MapReduce runs MR job on big data
[Diagram: Analytical Reporting Tool → Data Virtualization Server → Hadoop (Hive → MapReduce → HDFS, Big Data)]
Different Users, Different Views of Big Data
• Logical tables with different forms of aggregation
• Logical tables containing extra derived data
• Logical tables with filtered data
• All reports/users share the same specifications
[Diagram labels: Hive, MapReduce, HDFS]
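The three kinds of logical table listed above can be sketched with ordinary SQL views. The following is a minimal illustration only: it uses Python's built-in `sqlite3` as a stand-in for the virtualization layer, and the `sales` table, its columns, and the view names are all hypothetical, not taken from the webinar.

```python
import sqlite3

# Hypothetical base table standing in for raw data exposed through Hive.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, units INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("US", "ticket", 10, 8.0),
        ("US", "shirt", 2, 20.0),
        ("EU", "ticket", 5, 9.0),
    ],
)

# Logical table with aggregation: total units per region.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(units) AS total_units FROM sales GROUP BY region"
)

# Logical table with extra derived data: revenue computed per row.
conn.execute(
    "CREATE VIEW sales_with_revenue AS "
    "SELECT region, product, units, units * unit_price AS revenue FROM sales"
)

# Logical table with filtered data: only EU rows are visible.
conn.execute("CREATE VIEW eu_sales AS SELECT * FROM sales WHERE region = 'EU'")

by_region = conn.execute(
    "SELECT region, total_units FROM sales_by_region ORDER BY region"
).fetchall()
revenue = conn.execute(
    "SELECT product, revenue FROM sales_with_revenue ORDER BY product, region"
).fetchall()
eu_rows = conn.execute("SELECT product, units FROM eu_sales").fetchall()
```

Each view is defined once and shared by every report that queries it, which is the sense in which all reports and users share the same specifications.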
Caching For Faster Performance – Virtual View
[Diagram: Query 1 and Query 2 both read Cached View 1 of View 1 inside the Virtual Database (VDB)]
• Same cached view for multiple queries
• Refreshed automatically or manually
• Cache repository can be any supported data source
Caching for Faster Performance – Result Set
[Diagram: Query 1, issued twice, is answered from the Result Set Cache in front of View 1 in the Virtual Database (VDB)]
• Results for a single query are cached after first execution
• Each unique query has its own cache
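The behaviour described above can be sketched as a small cache keyed by the query text. This is only an illustration of the idea, not how JBoss Data Virtualization implements it; the `ResultSetCache` class, the `fake_source` function, and the query strings are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, ...]


class ResultSetCache:
    """Caches the rows of each distinct query after its first execution."""

    def __init__(self, run_query: Callable[[str], List[Row]]) -> None:
        self._run_query = run_query
        self._cache: Dict[str, List[Row]] = {}
        self.source_calls = 0  # how many times the underlying source was hit

    def execute(self, sql: str) -> List[Row]:
        # The exact query text is the cache key, so each unique query
        # gets its own cache entry.
        if sql not in self._cache:
            self.source_calls += 1
            self._cache[sql] = self._run_query(sql)
        return self._cache[sql]


def fake_source(sql: str) -> List[Row]:
    # Hypothetical stand-in for an expensive query against Hadoop.
    return [("row for", sql)]


cache = ResultSetCache(fake_source)
first = cache.execute("SELECT * FROM view1")
second = cache.execute("SELECT * FROM view1")  # served from the cache
other = cache.execute("SELECT * FROM view1 WHERE region = 'EU'")  # new entry
```

A real cache would also need an expiry or invalidation policy; that is left out here to keep the sketch short.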
Demonstration: Combining sentiment data from Hadoop with data from traditional relational sources
Kenny Peeples
Use Case 1 - Overview
Objective: Determine if sentiment data from the first week of the Iron Man 3 movie is a predictor of sales
Problem: Cannot utilize social data and sentiment analysis with the sales management system
Solution: Leverage JBoss Data Virtualization to mash up sentiment analysis data with ticket and merchandise sales data on MySQL into a single view of the data.
[Diagram labels]
Layers: Consume, Compose, Connect
Consume: Excel Powerview and DV Dashboard to analyze the aggregated data
Compose: JBoss Data Virtualization
Connect: Hive
SOURCE 1: Hive/Hadoop contains Twitter data including sentiment
SOURCE 2: MySQL data that includes ticket and merchandise sales
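As a rough illustration of what a single view over the two sources means, the sketch below combines per-tweet sentiment scores with daily ticket sales in plain Python. The rows, the day keys, the scoring scheme, and the `combined_view` function are all made up for the example; in the demonstration itself the combination is done by JBoss Data Virtualization, not by application code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows standing in for the Hive source: (day, sentiment score).
tweets: List[Tuple[str, int]] = [
    ("day-1", 1),
    ("day-1", -1),
    ("day-1", 1),
    ("day-2", 1),
]

# Hypothetical rows standing in for the MySQL source: (day, tickets sold).
sales: List[Tuple[str, int]] = [("day-1", 120), ("day-2", 95)]


def combined_view(
    tweets: List[Tuple[str, int]], sales: List[Tuple[str, int]]
) -> List[Tuple[str, int, int]]:
    """Return (day, net sentiment, tickets sold), joined on the day."""
    net_sentiment: Dict[str, int] = defaultdict(int)
    for day, score in tweets:
        net_sentiment[day] += score
    # Days with sales but no tweets get a net sentiment of 0.
    return [(day, net_sentiment.get(day, 0), tickets) for day, tickets in sales]


view = combined_view(tweets, sales)
```

The consumer sees one table with sentiment and sales side by side, without either source having been copied into the other.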
Use Case 1 - Architecture
[Diagram labels: DATA SYSTEM with TRADITIONAL REPOSITORIES (RDBMS, EDW, MPP); VIRTUAL DATA MART; APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications)]
Use Case 1 - Resources
• GUIDE
How-to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase1
Tutorial: http://hortonworks.com/hadoop-tutorial/evolving-data-stratagic-asset-using-hdp-red-hat-jboss-data-virtualization/
• VIDEOS:
http://vimeo.com/user16928011/hortonworksusecase1short
http://vimeo.com/user16928011/hortonworksusecase2short
• SOURCE: https://github.com/DataVirtualizationByExample/HortonworksUseCase1
JBoss Data Virtualization Security
Kimberly Palko
Role based access control
Roles
• Define roles based on organization hierarchy
Users
• External authentication via Kerberos, LDAP, etc.
VDB
• Assign users and groups to a virtual database
Authentication
Kerberos: From client to the virtual database. New in Data Virtualization 6.1: Kerberos authentication to the data source
Login Modules: LDAP (MS Active Directory, OpenLDAP, etc.), any JAAS based security domain
REST and Web Services: WS-UsernameToken, HTTP Basic authentication
SAML: SAML authentication for web client applications
Audit Logging via Dashboard
Row and Column Masking
- Row based masking
  Example: keyed off geographic marker
- Column masking to a constant, null, or a SQL statement
  Example: change all but the last 4 digits in a credit card number to stars
  concat('****', substring(column, length(column)-3))
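The column-masking example can be checked outside the database with a few lines of Python. This is a sketch of the masking rule only; the function name `mask_all_but_last4` and the sample card number are hypothetical. Unlike the slide's expression, which prepends a fixed `'****'`, this variant emits one star per hidden character, which is a design choice of the sketch.

```python
def mask_all_but_last4(value: str) -> str:
    """Replace every character except the last four with a star."""
    if len(value) <= 4:
        # Nothing precedes the last four characters, so nothing is hidden.
        return value
    return "*" * (len(value) - 4) + value[-4:]


# A well-known test card number, used here purely as sample input.
masked = mask_all_but_last4("4111111111111111")
```

Preserving the original length can leak the type of card; a fixed-width prefix, as on the slide, avoids that at the cost of losing the length.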
Summary of Security Capabilities
• Authentication
  – Kerberos, LDAP, WS-UsernameToken, HTTP Basic, SAML
• Authorization
  – Virtual data views, Role based access control
• Administration
  – Centralized management of VDB privileges
• Audit
  – Centralized audit logging and dashboard
• Protection
  – Row and column masking
  – SSL encryption (ODBC and JDBC)
Demonstration: Geographically Distributed Hadoop Clusters with Data Virtualization - Securing Data by User Role
Kenny Peeples
Use Case 2 - Overview
Objective: Secure data according to role, using row-level security and column masking
Problem: Cannot hide region data from region-specific users
Solution: Leverage JBoss Data Virtualization to provide row-level security and masking of columns
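Row-level security by role can be pictured as a filter applied before any rows reach the user. The sketch below is an assumption-laden illustration: the role names, the `ROLE_REGION` mapping, the `admin` role, and the sample rows are all hypothetical, and in the demonstration the filtering is configured in JBoss Data Virtualization rather than written in application code.

```python
from typing import Dict, List

# Hypothetical mapping from role to the one region that role may see.
ROLE_REGION: Dict[str, str] = {"us_user": "US", "eu_user": "EU"}

# Hypothetical rows, as they might arrive from the US and EU clusters.
ROWS: List[Dict[str, str]] = [
    {"region": "US", "customer": "alice"},
    {"region": "EU", "customer": "bruno"},
]


def rows_for_role(rows: List[Dict[str, str]], role: str) -> List[Dict[str, str]]:
    """Return only the rows the given role is allowed to see."""
    if role == "admin":
        return list(rows)  # assumed: an admin role sees every region
    region = ROLE_REGION.get(role)
    # Unknown roles map to no region and therefore see no rows.
    return [row for row in rows if row["region"] == region]


us_view = rows_for_role(ROWS, "us_user")
eu_view = rows_for_role(ROWS, "eu_user")
unknown_view = rows_for_role(ROWS, "guest")
```

Defaulting unknown roles to an empty result keeps the filter closed by default, so a misconfigured role cannot expose another region's data.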
Use Case 2 - Architecture
[Diagram labels: DATA SYSTEM; VIRTUAL DATA MART; APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications)]
Use Case 2 - Resources
• GUIDE
How-to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase2
Tutorial: Available soon
• VIDEOS:
http://vimeo.com/user16928011/hortonworksusecase2short
• SOURCE: https://github.com/DataVirtualizationByExample/HortonworksUseCase2
Benefits of JBoss Data Virtualization with Hortonworks HDP 2.1
• Combines new data in Hadoop with data in traditional data sources without moving or copying data
• Gives access to a variety of BI and analytics tools
• Provides caching for faster access to data
• Provides consistent security policy across multiple data sources
Thank you! Hortonworks and Red Hat JBoss Data Virtualization
Next Steps...
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about Red Hat & Hortonworks http://hortonworks.com/partner/redhat
Contact us: [email protected]
Don’t Forget to Register for our Next Webinar!
September 17th, 10 AM PST Red Hat JBoss Data Virtualization and Hortonworks Data Platform
http://info.hortonworks.com/RedHatSeries_Hortonworks.html