Hortonworks and HP Vertica Webinar

Jan 20, 2015

Hortonworks

Learn how organizations that combine the HP Vertica Analytics Platform and Hortonworks can quickly explore and analyze a broad variety of data types, transforming them into actionable information that helps them better understand how their customers and site visitors interact with their business, offline and online.
Transcript
Page 1: Hortonworks and HP Vertica Webinar

Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Using HP Vertica and Apache Hadoop …for customer analytics

We do Hadoop.

Page 2: Hortonworks and HP Vertica Webinar

Your speakers…

John Kreisa, VP Strategic Alliance Marketing, Hortonworks

Chris Selland, VP Business Development, HP Software, Big Data Group

Page 3: Hortonworks and HP Vertica Webinar

Poll

Where are you in your Hadoop journey?
•  Researching our options
•  Currently evaluating some software
•  Deep in a trial
•  What's Hadoop?

Page 4: Hortonworks and HP Vertica Webinar

Big Data Market Trends & Projections

Big Data Explosion
•  % by which organizations leveraging modern information management systems outperform peers by 2015
•  Hadoop-enabled DBMSs
•  85% from new data types
•  50x data growth, 2010 to 2020 (1 zettabyte (ZB) = 1 billion TB)
•  15x growth rate of machine-generated data by 2020
•  The US has 1/3 of the world's data
•  Big Data is 1 of 5 US GDP game changers: $325 billion incremental annual GDP from big data analytics in retail and manufacturing by 2020
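The growth figures above can be sanity-checked with compound-growth arithmetic. Assuming the 50x figure spans the ten years from 2010 to 2020, the implied compound annual growth rate is 50^(1/10) - 1, roughly 48% per year. The ZB-to-TB conversion assumes decimal (SI) units; binary units (ZiB, TiB) would give a slightly different ratio. This is a sketch for checking the slide's numbers, not part of the original material.

```python
def cagr(multiple: float, years: float) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years` years."""
    return multiple ** (1.0 / years) - 1.0


# Decimal (SI) units assumed: 1 ZB = 10**21 bytes, 1 TB = 10**12 bytes.
ZB_IN_BYTES = 10**21
TB_IN_BYTES = 10**12

if __name__ == "__main__":
    print(f"50x over 2010-2020 implies {cagr(50, 10):.1%} per year")  # 47.9% per year
    print(f"1 ZB = {ZB_IN_BYTES // TB_IN_BYTES:,} TB")  # 1,000,000,000 TB
    # The same check for IDC's 2.8 ZB (2012) to 40 ZB (2020) figures:
    print(f"2.8 ZB to 40 ZB over 8 years implies {cagr(40 / 2.8, 8):.1%} per year")
```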

Page 5: Hortonworks and HP Vertica Webinar

•  Cameras and microphones widely deployed
•  New routes to market via intelligent objects
•  Content and services via connected products
•  Everything has a URL
•  Remote sensing of objects and environment
•  Augmented reality
•  Situational decision support
•  Building and infrastructure management

Over 50% of Internet connections are things:
•  2011: 15+ billion permanent, 50+ billion intermittent
•  2020: 30+ billion permanent, >200 billion intermittent

Source: Gartner Keynote at Hadoop Summit 2013

Page 6: Hortonworks and HP Vertica Webinar

A Data Architecture Under Pressure From New Data

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)

Data sources shown: OLTP, ERP, CRM Systems; Unstructured documents, emails; Clickstream; Server logs; Sentiment, Web Data; Sensor, Machine Data; Geolocation

•  2.8 ZB in 2012
•  85% from new data types
•  15x machine data by 2020
•  40 ZB by 2020

Source: IDC

Page 7: Hortonworks and HP Vertica Webinar

Hadoop Within An Emerging Modern Data Architecture

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
DATA SYSTEM: Repositories (RDBMS, EDW, MPP); Data Access; Data Management; Governance & Integration; Security; Operations
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data

Page 8: Hortonworks and HP Vertica Webinar

Hadoop: Typically Used For New Analytic Applications

(Chart axes: Scale and Scope) New Analytic Apps: new types of data, LOB-driven

Page 9: Hortonworks and HP Vertica Webinar

Hadoop Value: New Types of Data
•  Clickstream: Capture and analyze website visitors' data trails and optimize your website
•  Sensors: Discover patterns in data streaming automatically from remote sensors and machines
•  Server Logs: Research logs to diagnose process failures and prevent security breaches
•  Sentiment: Understand how your customers feel about your brand and products, right now
•  Geographic: Analyze location-based data to manage operations where they occur
•  Unstructured: Understand patterns in files across millions of web pages, emails, and documents
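One common way to turn raw clickstream "data trails" into analyzable units is sessionization: grouping each visitor's clicks into sessions separated by a gap of inactivity. The sketch below assumes a hypothetical (visitor_id, timestamp, url) click schema and a 30-minute inactivity timeout, a widely used convention rather than anything specified on the slide.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical click schema: (visitor_id, timestamp, url)
Click = Tuple[str, datetime, str]


def sessionize(clicks: List[Click],
               timeout: timedelta = timedelta(minutes=30)) -> Dict[str, List[List[str]]]:
    """Group each visitor's clicks into sessions of URLs.

    A new session starts on a visitor's first click or whenever the gap since
    that visitor's previous click exceeds `timeout` (30 minutes assumed)."""
    sessions: Dict[str, List[List[str]]] = {}
    last_seen: Dict[str, datetime] = {}
    # Sort by visitor, then time, so each visitor's trail is processed in order.
    for visitor, ts, url in sorted(clicks, key=lambda c: (c[0], c[1])):
        if visitor not in last_seen or ts - last_seen[visitor] > timeout:
            sessions.setdefault(visitor, []).append([])  # open a new session
        sessions[visitor][-1].append(url)
        last_seen[visitor] = ts
    return sessions
```

At clickstream scale the same logic would typically run as a distributed job or a SQL window query rather than in memory; the in-memory version only shows the rule.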

Page 10: Hortonworks and HP Vertica Webinar

New Analytic Applications For New Types Of Data

Financial Services
•  New Account Risk Screens
•  Fraud Prevention
•  Trading Risk
•  Maximize Deposit Spread
•  Insurance Underwriting
•  Accelerate Loan Processing

Retail
•  360° View of the Customer
•  Analyze Brand Sentiment
•  Localized, Personalized Promotions
•  Website Optimization
•  Optimal Store Layout

Telecom
•  Call Detail Records (CDRs)
•  Infrastructure Investment
•  Next Product to Buy (NPTB)
•  Real-time Bandwidth Allocation
•  New Product Development

Manufacturing
•  Supplier Consolidation
•  Supply Chain and Logistics
•  Assembly Line Quality Assurance
•  Proactive Maintenance
•  Crowdsourced Quality Assurance

Healthcare
•  Genomic data for medical trials
•  Monitor patient vitals
•  Reduce re-admittance rates
•  Store medical research data
•  Recruit cohorts for pharmaceutical trials

Utilities, Oil & Gas
•  Smart meter stream analysis
•  Slow oil well decline curves
•  Optimize lease bidding
•  Compliance reporting
•  Proactive equipment repair
•  Seismic image processing

Public Sector
•  Analyze public sentiment
•  Protect critical networks
•  Prevent fraud and waste
•  Crowdsource reporting for repairs to infrastructure
•  Fulfill open records requests

Page 11: Hortonworks and HP Vertica Webinar

360° Customer View for Home Supply Retailer

Retail: Major home improvement retailer (>$74B in revenue, >300K employees, >2,200 stores)
Creating Opportunity. Data: Clickstream, Unstructured, Structured

Problem: Lack of a unified customer record across all channels
•  Global distribution online, in home and across 2,000+ stores
•  No "golden record" for analytics on customer buying behavior across all channels
•  Data repositories on website traffic, POS transactions and in-home services existed in isolation from each other
•  Limited ability for targeted marketing to specific segments
•  Data storage costs increasing

Solution: HDP delivers targeted marketing & data storage savings
•  Golden record enables targeted, customized marketing
•  Data warehouse offload saved millions in recurring expense
•  Customer team continues to find unexpected, unplanned uses for their 360-degree view of customer buying behavior
•  New use case: price optimization versus competitors → several millions in top-line revenue growth

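A "golden record" unifies per-channel data about the same customer. The sketch below assumes each channel's records carry an email address usable as the matching key, a hypothetical simplification (production entity resolution usually needs fuzzy matching across names, addresses and loyalty IDs). The field names `email` and `amount` are illustrative, not taken from the case study.

```python
from collections import defaultdict
from typing import Dict, List


def normalize_email(email: str) -> str:
    """Hypothetical matching key: trimmed, lower-cased email address."""
    return email.strip().lower()


def build_golden_records(web: List[dict], pos: List[dict],
                         in_home: List[dict]) -> Dict[str, dict]:
    """Merge web visits, POS sales and in-home service jobs into one profile per customer."""
    golden = defaultdict(lambda: {"web_visits": 0, "pos_spend": 0.0, "in_home_jobs": 0})
    for visit in web:                      # website traffic
        golden[normalize_email(visit["email"])]["web_visits"] += 1
    for sale in pos:                       # point-of-sale transactions
        golden[normalize_email(sale["email"])]["pos_spend"] += sale["amount"]
    for job in in_home:                    # in-home services
        golden[normalize_email(job["email"])]["in_home_jobs"] += 1
    return dict(golden)
```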

Page 12: Hortonworks and HP Vertica Webinar

Hadoop Incrementally Delivers A 'Data Lake'

(Chart axes: Scale and Scope) New Analytic Apps: new types of data, LOB-driven; A Modern Data Architecture/Data Lake
DATA SYSTEM: RDBMS, MPP, EDW; Data Access; Data Management; Governance & Integration; Security; Operations

Data Lake: An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale

Page 13: Hortonworks and HP Vertica Webinar

Hadoop: An Integrated Part Of The Modern Data Architecture

DEPTH: Hortonworks engages in deep engineered relationships with the leaders in the data center, applications and operations.
BREADTH: Hundreds of partners work with us to certify their applications to work with Hadoop so they can extend big data to their users.

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Provision, Manage & Monitor
DATA SYSTEM: Repositories; HDP 2.1 (Data Access, Data Management, Governance & Integration, Security, Operations)
INFRASTRUCTURE: On Premise or in the Cloud
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data

Page 14: Hortonworks and HP Vertica Webinar

Customer Analytics with HP Vertica + Hortonworks
Chris Selland, VP Business Development, HP Software, Big Data Group

Page 15: Hortonworks and HP Vertica Webinar

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 15

Completing Analytical Vision

(Chart: Accuracy and Insight vs. Data Types, spanning Structured, Semi-Structured and Unstructured)
Data types shown: CRM, ERP, Data Warehouse, Web, Social, Log Files, Machine Data, Images, Audio, Video
Categories: Traditional Enterprise Data, Big Data, Dark Data

Page 16: Hortonworks and HP Vertica Webinar

Customer Analytics in the Big Data Era

Structured Data
Select customers with < 2 months remaining on contract, 5+ dropped calls per week and lifetime value > $500.
From a database, get me all matches from the CRM and Call Detail Records that match the query.

Unstructured Data
Customer expressed negative sentiment through social media, web logs and/or support within the last 3 months.
From unstructured sources, get me all matches from weblogs, calls, chat and email that were negative for the structured results.
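The structured and unstructured criteria on this slide combine into one filter over customer records followed by an intersection with negative-sentiment interactions. A minimal sketch with hypothetical record types: it assumes sentiment labels were already produced by an upstream text-analytics step and approximates "3 months" as 90 days.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Customer:                  # hypothetical CRM + CDR summary row
    customer_id: str
    months_remaining: float      # months left on contract
    dropped_calls_per_week: int
    lifetime_value: float        # dollars


@dataclass
class Interaction:               # hypothetical output of sentiment analysis
    customer_id: str
    channel: str                 # e.g. "social", "weblog", "support"
    sentiment: str               # assumed labels: "negative", "neutral", "positive"
    days_ago: int


def at_risk_customers(customers: List[Customer], interactions: List[Interaction],
                      window_days: int = 90) -> List[str]:
    # Structured criteria: < 2 months left, 5+ dropped calls/week, LTV > $500.
    structured = {c.customer_id for c in customers
                  if c.months_remaining < 2
                  and c.dropped_calls_per_week >= 5
                  and c.lifetime_value > 500}
    # Unstructured criteria: any negative interaction inside the window.
    negative = {i.customer_id for i in interactions
                if i.sentiment == "negative" and i.days_ago <= window_days}
    return sorted(structured & negative)
```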

Page 17: Hortonworks and HP Vertica Webinar

Introducing HP Vertica Dragline

Faster answers from Big Data at a fraction of the cost of traditional data warehouses
•  Store all your data in any format cost-effectively across Vertica + Hadoop
•  Explore all your data directly in Hadoop without moving or changing it
•  Serve all of your data consumers without compromise, from individualized queries to large complex reports

Page 18: Hortonworks and HP Vertica Webinar

HP Vertica Dragline: The Richest, Most Open SQL on Hadoop

Challenge: Extracting data from Hadoop requires complex and brittle ETL processes
Solution: Hadoop Navigation and Analytics
Benefits:
•  Navigate Hadoop data using its native catalog
•  Quickly & easily load native data types from Hadoop to Vertica
•  Avoid creating and maintaining time-consuming schemas
•  Use the full power of HP Vertica SQL and Analytics

(Diagram: HDP 2.1 within the modern data architecture, as shown on Page 13.)

Page 19: Hortonworks and HP Vertica Webinar

Flexible Vertica Hadoop Connectivity
Leverage existing tools in a shared Vertica and Hadoop storage environment

Vertica connectivity: ANSI SQL; webHDFS; webHCAT Hadoop Connector; HDFS Connector (External Tables and Copy); HCatalog Connector; Storage Tiering

HDP 2.1 Hortonworks Data Platform
•  OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
•  GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
•  BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive, HCatalog), Search (Solr), NoSQL (HBase, Accumulo), Stream (Storm), In-Memory (Spark), Batch (MapReduce), Tez
•  YARN: Data Operating System
•  DATA MANAGEMENT: HDFS (Hadoop Distributed File System), nodes 1 through N
•  SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)
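Several of the connectors above go through webHDFS, Hadoop's REST interface to HDFS. A minimal client-side sketch of how such requests are addressed, following the WebHDFS URL pattern `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The host name is hypothetical, port 50070 is assumed (the usual Hadoop 2.x NameNode HTTP port), and `user.name` is the simple-authentication parameter; secured clusters need Kerberos/SPNEGO instead. This shows the REST API itself, not how Vertica's connectors are implemented.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host: str, port: int, path: str, op: str,
                params: Optional[Dict[str, str]] = None) -> str:
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1<path>?op=<OP>&..."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /data/clickstream")
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_status(host: str, port: int, path: str, user: str) -> List[str]:
    """List entry names under an HDFS directory via op=LISTSTATUS.

    Makes a live HTTP request, so it needs a reachable NameNode."""
    url = webhdfs_url(host, port, path, "LISTSTATUS", {"user.name": user})
    with urlopen(url) as resp:
        body = json.load(resp)
    # WebHDFS returns {"FileStatuses": {"FileStatus": [{"pathSuffix": ...}, ...]}}
    return [s["pathSuffix"] for s in body["FileStatuses"]["FileStatus"]]
```

For example, `webhdfs_url("namenode.example.com", 50070, "/data/clickstream", "LISTSTATUS")` yields `http://namenode.example.com:50070/webhdfs/v1/data/clickstream?op=LISTSTATUS`.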

Page 20: Hortonworks and HP Vertica Webinar

Data Tiering and Cost Optimization

•  Hot: Interactive data, frequently queried (Vertica data cache). Serve: convert data to Vertica storage format.
•  Cool: Batch data. Explore: any format.
•  Cold: Archive data. Store: any format.

Tier off older data. Value discovery: Dark Data.
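The hot/cool/cold split above amounts to a data-placement policy. A minimal sketch of such a policy, assuming placement is decided from two hypothetical signals (data age in days and queries per day) with illustrative thresholds. It does not reflect how Vertica storage tiering is actually configured.

```python
from enum import Enum


class Tier(Enum):
    HOT = "hot"    # interactive data in Vertica storage format ("Serve")
    COOL = "cool"  # batch data explored in place in Hadoop, any format ("Explore")
    COLD = "cold"  # archive data stored in Hadoop, any format ("Store")


def assign_tier(age_days: int, queries_per_day: float,
                hot_queries_per_day: float = 10.0,
                cold_after_days: int = 365) -> Tier:
    """Pick a storage tier from access frequency and age.

    The thresholds are hypothetical defaults, not values from the slide."""
    if queries_per_day >= hot_queries_per_day:
        return Tier.HOT   # frequently queried: keep in the Vertica data cache
    if age_days > cold_after_days:
        return Tier.COLD  # older and rarely queried: tier off to archive
    return Tier.COOL      # everything else stays explorable in Hadoop
```

Checking frequency before age means old data that is still queried heavily stays hot; a policy that tiers strictly by age would simply swap the two checks.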

Page 21: Hortonworks and HP Vertica Webinar

JSON Record-Unstructured Data

{"filter_level":"medium","contributors":null,"text":"Listening to Meg Whitman talk about the New Style of IT at #HPDiscover","geo":null,"retweeted":false,"in_reply_to_screen_name":null,"truncated":false, "lang":"en","entities":{"symbols":[],"urls":[],"hashtags":[{"text":"nope","indices":[51,56]}], "user_mentions":[]},"in_reply_to_status_id_str":null,"id":346104750565097474,"source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone<\/a>", "in_reply_to_user_id_str":null,"favorited":false,"in_reply_to_status_id":null,"retweet_count":0,"created_at":"Tue Jun 11 03:19:37 +0000 2013","in_reply_to_user_id":null, "favorite_count":0,"id_str": "346104750565097474","place":null,"user":{"location":"","default_profile":false,"profile_background_tile":true,"statuses_count":2354,"lang":"en","profile_link_color":"FF0000","profile_banner_url":"https://pbs.twimg.com/profile_banners/271588683/1370571522","id":271588683,"following":null,"protected":false,"favourites_count":121,"profile_text_color":"3D1957","description":"Dance It is a part of me A part of who I am It has entered my life Taken over my body It is in my walk In my movements In my thoughts I have become a DANCER","verified":false,"contributors_enabled": false,"profile_sidebar_border_color":"65B0DA","name":"ashley tousignant", "profile_background_color":"642D8B","created_at": "Thu Mar 24 20:25:59 +0000 2011","default_profile_image":false,"followers_count":434,"profile_image_url_https":"https://si0.twimg.com/profile_images/3765534455/eee814d484d70b8eb9ca5db08a122cbb_normal.jpeg","geo_enabled":true,"profile_background_image_url":"http://a0.twimg.com/images/themes/theme10/bg.gif","profile_background_image_url_https":"https://si0.twimg.com/images/themes/theme10/bg.gif","follow_request_sent":null,"url":null,"utc_offset":null,"time_zone":null,"notifications":null,"profile_use_background_image":true,"friends_count":844,"profile_sidebar_fill_color":"7AC3EE","screen_name":"01ashleymt","id_str":"271588683","profile_image_url":"http://a0.twimg.com/profile_images/3765534455/eee814d484d70b8eb9ca5db08a122cbb_normal.jpeg","listed_count":0,"is_translator":false},"coordinates":null}

More than 140 Characters

Page 22: Hortonworks and HP Vertica Webinar

HP Vertica Flex Zone
Load, manage, and explore semi-structured data

Avoid creating and maintaining time-consuming schemas on semi-structured data
•  Faster SQL querying
•  Semi-structured data loading
•  Auto-schematization for JSON and delimited data
•  Flexible parsers for blazing-fast performance
•  One-step schema
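The slides describe auto-schematization only at a high level. The sketch below illustrates the general idea on records like the JSON tweet above: discover every key (nested objects flattened to dotted names) and the value types seen for it across records. It is plain Python, not Vertica's implementation or API; the dotted-key naming and the type labels are assumptions.

```python
import json
from typing import Any, Dict, List, Set


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON objects into dotted keys such as "user.screen_name".

    Arrays are kept as single values rather than expanded."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Exact-type lookup, so booleans are not mistaken for integers.
TYPE_NAMES = {str: "string", bool: "boolean", int: "integer", float: "number",
              list: "array", type(None): "null"}


def infer_keys(raw_records: List[str]) -> Dict[str, Set[str]]:
    """Map every dotted key seen across the records to the JSON types observed for it."""
    keys: Dict[str, Set[str]] = {}
    for raw in raw_records:
        for key, value in flatten(json.loads(raw)).items():
            keys.setdefault(key, set()).add(TYPE_NAMES[type(value)])
    return keys
```

A key whose type set grows across records (for instance `geo` being `null` in one tweet and an object in another) is exactly the kind of schema drift that makes hand-maintained schemas for semi-structured data time-consuming.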

Page 23: Hortonworks and HP Vertica Webinar

Analyzing Billions of Clicks

Major Computer Products Manufacturer
Online

Challenge
•  Millions of website visitors generate billions of clicks per month
•  Must store 5 years' worth of data to get the full value of year-over-year clickstream analysis
•  Legacy database had sluggish performance: queries took 48 hours after each day's transactions
•  Extremely complex website: many pages are generated dynamically, creating complex clickstream trails

HP Vertica Solution
•  Queries run in hours or even minutes; 48x – 100x faster
•  Industry-standard SQL accelerated acceptance and proficiency
•  Speed of HP Vertica allows iterative and recursive analysis for deeper dives
•  Functionality tailored to individual interactions based on nuanced understanding of user behavior at an individual level
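Year-over-year clickstream analysis, as described in the challenge above, compares each period's activity with the same period one year earlier, which is why several years of history must be retained. A minimal in-memory sketch, assuming click timestamps are available as Python `datetime` values. At billions of clicks per month this aggregation would run inside the database (for example as a SQL `GROUP BY` on year and month), not in a Python loop.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple

Month = Tuple[int, int]  # (year, month)


def monthly_counts(click_times: Iterable[datetime]) -> Dict[Month, int]:
    """Count clicks per calendar month."""
    return dict(Counter((t.year, t.month) for t in click_times))


def year_over_year(counts: Dict[Month, int]) -> Dict[Month, float]:
    """Percent change versus the same month one year earlier.

    Months without a prior-year value (or with zero clicks a year earlier) are skipped."""
    yoy: Dict[Month, float] = {}
    for (year, month), n in counts.items():
        prev = counts.get((year - 1, month))
        if prev:
            yoy[(year, month)] = 100.0 * (n - prev) / prev
    return yoy
```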

Page 24: Hortonworks and HP Vertica Webinar

Next steps…

Download the Hortonworks Sandbox

Learn Hadoop

Build Your Analytic App

Try Hadoop 2

More about HP Vertica & Hortonworks: http://hortonworks.com/partner/HP/

Don't miss our next webinar! HP Converged Systems and Hortonworks: Planning for the Impacts of Big Data in the Data Center http://info.hortonworks.com/hpconvergedandhortonworks.html

Page 25: Hortonworks and HP Vertica Webinar

End