Post on 04-Jun-2018
Future of Data
Hortonworks Data Platform and Hortonworks DataFlow
Eric Thorsen, VP Industry Solutions
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Explosion of Data
Consumer Behavior
• Millennials now outnumber Baby Boomers as the dominant transactional generation and will constitute 50% of the workforce in the next few years and 75% by 2030
• 2.5 Billion Connected People on Social networks by 2020, 75 Billion Connected Devices by 2020
Big Data Trends
• The number of U.S. firms using big data has jumped 58 percentage points to 63% penetration
• 70% of firms now say that big data is of critical importance to their business, up from only 21% in 2012. This is one of the fastest tech-adoption rates ever.
• The title of chief data officer (the C-suite manager of big data), which until recently didn’t even exist, is now found in 54% of companies surveyed.
Data Exploding with unprecedented data types
• Sensors, iBeacons, Weighted Shelves, Smart Hangers, Smart Bins, Smart Racks
• Social Media, Tweets, Mentions, Likes, Blogs
• Clickstream, Web logs, video feeds
• Server activity
“80% of the world’s data has been created in the last two years.”
Ginni Rometty, IBM CEO – January 2014
Big Data Executive Survey 2016 – NewVantage Partners
DATA – More Volume and More Types
INCREASING DATA VARIETY AND COMPLEXITY
USER GENERATED CONTENT
MOBILE WEB
SMS/MMS
SENTIMENT
EXTERNAL DEMOGRAPHICS
HD VIDEO
SPEECH TO TEXT
PRODUCT/SERVICE LOGS
SOCIAL NETWORK
BUSINESS DATA FEEDS
USER CLICK STREAM
WEB LOGS
OFFER HISTORY
DYNAMIC PRICING
A/B TESTING
AFFILIATE NETWORKS
SEARCH MARKETING
BEHAVIORAL TARGETING
DYNAMIC FUNNELS
PAYMENT RECORD
SUPPORT CONTACTS
CUSTOMER TOUCHES
PURCHASE DETAIL
PURCHASE RECORD
SEGMENTATION
OFFER DETAILS
GIGABYTES
TERABYTES
PETABYTES
EXABYTES
ERP
CRM
WEB
BIG DATA
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to Scale
Business Value
Clickstream
Geolocation
Web Data
Internet of Things
Docs, emails
Server logs
Traditional: ERP, CRM, SCM
New Data: much of the new data exists in-flight between systems and devices as part of the Internet of Anything
Data growth:
2012: 2.8 Zettabytes
2014: 4.1 Zettabytes
2020: 40 Zettabytes
*Multiples of bytes: Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte, Yottabyte
1 Zettabyte = 1,000,000,000,000,000,000,000 bytes
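The three data-volume figures on this slide (2.8 ZB in 2012, 4.1 ZB in 2014, 40 ZB in 2020) can be turned into an implied yearly growth rate. The sketch below is illustrative arithmetic only: it assumes constant compound growth between the points, which the slide does not claim, and the function and variable names are hypothetical.

```python
# Illustrative arithmetic only. The three data points come from the slide;
# the constant compound-growth assumption and all names here are ours.

def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Compound annual growth rate that takes start_zb to end_zb in `years` years."""
    return (end_zb / start_zb) ** (1 / years) - 1

# Data volumes in zettabytes, as shown on the slide.
SLIDE_POINTS = {2012: 2.8, 2014: 4.1, 2020: 40.0}

rate_2012_2014 = implied_annual_growth(SLIDE_POINTS[2012], SLIDE_POINTS[2014], 2)
rate_2014_2020 = implied_annual_growth(SLIDE_POINTS[2014], SLIDE_POINTS[2020], 6)
rate_2012_2020 = implied_annual_growth(SLIDE_POINTS[2012], SLIDE_POINTS[2020], 8)

for label, rate in [
    ("2012 to 2014", rate_2012_2014),
    ("2014 to 2020", rate_2014_2020),
    ("2012 to 2020", rate_2012_2020),
]:
    print(f"{label}: {rate:.1%} per year")
```

Under that assumption, the slide's figures imply growth of roughly 21% per year between 2012 and 2014 and roughly 46% per year between 2014 and 2020, which is the acceleration the chart is meant to convey.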
DATA AT REST
DATA IN MOTION
ACTIONABLE INTELLIGENCE
Modern Data Applications
PERISHABLE INSIGHTS
HISTORICAL INSIGHTS
INTERNET OF ANYTHING
Hortonworks DataFlow
Hortonworks Data Platform
Hortonworks Delivers Connected Data Platforms
Hortonworks Connected Data Platforms and Solutions
Hortonworks Solutions
Enterprise Data Warehouse Optimization
Cyber Security and Threat Management
Internet of Things and Streaming Analytics
Hortonworks Connection
Subscription Support
SmartSense
Premier Support
Educational Services
Professional Services
Community Connection
Cloud
Hortonworks Data Cloud
AWS
HDInsight
Data Center
Hortonworks Data Suite
HDF
HDP
Holistic Customer Interaction Model
HDP and HDF Subscription
Operational Services
Applications
Support / “Break Fix”
Professional Services and Partner SIs
Configure, Manage and Upgrade
Components Included
Customer Proposal Components
Hortonworks Influences the Apache Community
We Employ the Committers
One third of all committers to the Apache® Hadoop™ project, and a majority in Apache NiFi and other important projects
Our Committers Innovate
and improve Connected Data Platforms
We Influence the Hadoop Roadmap
by communicating important requirements to the community through our leaders
APACHE HADOOP COMMITTERS
Social Mapping
Payment Tracking
Factory Yields
Defect Detection
Call Analysis
Machine Data
Product Design
M & A
Due Diligence
Next Product Recs
Cyber Security
Risk Modeling
Ad Placement
Proactive Repair
Disaster Mitigation
Investment Planning
Inventory Predictions
Customer Support
Sentiment Analysis
Supply Chain
Ad Placement
Basket Analysis
Segments
Cross-Sell
Customer Retention
Vendor Scorecards
Optimize Inventories
OPEX Reduction
Mainframe Offloads
Historical Records
Data as a Service
Public Data Capture
Fraud Prevention
Device Data Ingest
Rapid Reporting
Digital Protection
Product Monitoring
Process Monitoring
Quality Analysis
Quality Data
Batch Genealogy
Time Series Data
MFG Data
Warranty Data
Yield Analysis
Product
Equipment
Production Line
Supply Chain
Customer
Process
Factory
Logistics
Business
Connected Car
Real-time Operations
Yield Optimization
Quality Optimization
Energy Management
Supply Chain Optimization
Proactive Repair
Inventory Predictions
Predictive Maintenance
OPEX Reduction
Demand Sensing
Mainframe Offloads
Device Data Ingest
Rapid Reporting
Digital Protection
Data as a Service
Fraud Prevention
Public Data Capture
INNOVATE
RENOVATE
EXPLORE, OPTIMIZE, TRANSFORM
ACTIVE ARCHIVE
ETL ONBOARD
DATA ENRICHMENT
DATA DISCOVERY
SINGLE VIEW
PREDICTIVE ANALYTICS
MANUFACTURING
Merck’s Journey
Improving Life Sciences Manufacturing Yields Presents a Complex Data Discovery Challenge
Vaccine manufacturing requires precise control of complex fermentation processes
Two batches of a vaccine, produced using an identical manufacturing process, can exhibit significant yield variances
Batches that fail quality standards can cost $1 million each
Merck analyzed one vaccine: 10 years of manufacturing data stored across 16 systems
Merck’s Journey
Scientific Search
Sensor Data Storage
Vaccine Yield Optimization
Innovate
Renovate
The Journey to the Golden Batch
Combined 10 years data amounted to 1 billion records
5.5 million batch comparisons
1st-year yield boost of 40K more doses, a $10M profit impact
McKinsey: 50% yield improvement
Epidemiology
DATA DISCOVERY
ACTIVE ARCHIVE
DATA DISCOVERY
DATA DISCOVERY
The Golden Batch
Risk Assessment
Due Diligence
Social Mapping
Product Design
M & A
Call Analysis
Sensor Data
Loss Control
Telematics
Customer Support
Claim Analysis
Market Segments
Customer Retention
Sentiment Analysis
Fraud Investigation
Risk Analysis
Cross-Sell
Channel Scorecards
Ad Placement
Cyber Security
Cat Models
Investment Planning
Risk Appetite
Risk Modeling
Loss Control
Claim Severity
Next Best Action
OPEX Reduction
Historical Records
Mainframe Offloads
Device Data Ingest
Rapid Reporting
Digital Protection
Data as a Service
Fraud Prevention
Public Data Capture
INNOVATE
RENOVATE
EXPLORE, OPTIMIZE, TRANSFORM
ACTIVE ARCHIVE
ETL ONBOARD
DATA ENRICHMENT
DATA DISCOVERY
SINGLE VIEW
PREDICTIVE ANALYTICS
INSURANCE
Fraud Mitigation
Solvency Analysis
Progressive’s Journey
Progressive Wanted to Ingest IoT Data to Predict Risk for its Usage-based Insurance Product
Progressive Snapshot offers usage-based insurance through an in-car sensor that transmits IoT driving data
Sensors collect up to six months of data from drivers and the data is archived for years, per regulatory requirements
Progressive’s existing systems were not scaling efficiently
It took 5–7 days to transform only 25% of available UBI data
Progressive’s Journey
Rewarding Safer Drivers and Improving Traffic Safety
Snapshot plug-in devices capture driving detail, 100% stored in HDP, ingested in 2-3 days
More than 12 billion miles driven stored
Through a web app, customers can review their own driving detail and improve their safety
Snapshot and usage-based insurance drove $2.6 billion in 2014 Progressive premiums, growing since then
Innovate
Renovate
Claims Notes Mining
Individual Driving Histories
Usage-Based Insurance (UBI)
Web Log Analysis
Online Ad Placement
Sensor Data Ingest
PREDICTIVE ANALYTICS
ACTIVE ARCHIVE
DATA DISCOVERY
DATA DISCOVERY
DATA DISCOVERY
ETL ONBOARD
Safe Roads
Arizona State University’s Journey
A Genome Represents 20 Billion Rows of Data and Researchers Couldn’t Explore Enough Genetic Data to Understand How Genes Affect Cancer
Cancer is both complicated (the interplay between multiple biological processes) and complex (affected by biological, genetic, environmental and social factors)
Since each genome represents so much data, legacy platforms couldn't amass enough genomic data to explore cancer patterns across a broad genetic spectrum
This created a “lamp-posting” phenomenon, forcing a focus around incremental research clustered around genes known to influence cancer
ASU turned to HDP to store and process huge amounts of genomic data, to make that data broadly available to researchers and to do it all at a scalable cost
Arizona State University’s Journey
HDP’s Storage and Compute Efficiencies Allow Individual Researchers to Do the Work Previously Done By Entire Teams
ASU’s Next Generation Cyber Capability (NGCC) project combines HDP with high-performance computing, for genomic analysis in Apache Spark
The NGCC architecture follows President Obama’s “National Cancer Moonshot” guidelines, with a federated framework that encourages data sharing
Previously, a single query against a table with 20 billion rows would time out before it could return results. In HDP it returned results in 1-2 minutes.
“Now with HDP we have both the availability of data and the technical capability to analyze it. We are able to explore spaces where we simply couldn’t go before. It just wasn’t possible before having this technology. This has sped our time to insight infinitely in some cases. Some questions were not possible before, and now they return results in a day.”
-- Dr. Kenneth Buetow, Director of Computational Sciences and Informatics
Claims Optimization
Social Sentiment
Cohort Selection
Bill Shock
Physician Notes
Device Monitoring
R & D
Quality Benchmarks
Patient Experience
Seasonal Staffing
Net Promoter Score
Supply Chain
Sentiment Analysis
Patient Outreach
360° Patient View
Patient Throughput
Customer Churn Analysis
STARS Ratings
Genomics
Remote Monitoring
Drug Diversion
Census
Proactive Maintenance
Preventative Medicine
Inventory
Medication Safety
OPEX Reduction
Lab Notes Archive
Mainframe Offloads
Device Data Ingest
Rapid Reporting
Digital Protection
Data as a Service
Fraud Prevention
Real-time Decision Support
INNOVATE
RENOVATE
EXPLORE, OPTIMIZE, TRANSFORM
ACTIVE ARCHIVE
ETL ONBOARD
DATA ENRICHMENT
DATA DISCOVERY
SINGLE VIEW
PREDICTIVE ANALYTICS
HEALTHCARE
Care-path Best Practices
OR Optimization
HCAHPS Scores
Staffing Predictions
Proactive Outreach
Legacy System Data
Imaging Archive
Historical Patient Records
Improved Drug Yields
Mercy’s Journey
Mercy Medical System Sought a Data Lake for a Single View of its Patients: “One Patient, One Record”
Existing platform impeded the goal of enriching Epic data for 1 million patients across 35 hospitals and 500 clinics
Moving Epic EMR data to Clarity EDW took 24 hours and was “never going to enable real-time analytics”. Now that takes 3-5 minutes with HDP.
Improved billing processes resulted in $1M additional annual revenue from newly documented secondary diagnoses and care
PREDICTIVE ANALYTICS
Mercy’s Journey
Billing
Vital Signs
Single Patient Record
Lab Notes
Privacy Database
Medical Decision Support
Device Data Ingest
Preventive Care
Epic Enrichment
OPEX Efficiency
Epic EMR Replication
Better Health Through Data
Searches of free-text lab notes speed researcher insight from “never” to “seconds”
Ingest of ICU vital signs increased by 900X, letting clinicians respond more quickly
Mercy is building real-time tools to support surgical decisions and preventive care
Innovate
Renovate
Better Health
DATA DISCOVERY
SINGLE VIEW
DATA DISCOVERY
SINGLE VIEW
ACTIVE ARCHIVE
ACTIVE ARCHIVE
ACTIVE ARCHIVE
DATA ENRICHMENT
ETL ONBOARD
PREDICTIVE ANALYTICS
Hadoop for Retail
DATA REPOSITORIES
ANALYSIS
Single view of consumer
Targeted promotions
Recommendation engines
Basket analysis
Price optimization
Inventory optimization
Loyalty management
Path to purchase
Security
Operations
Governance & Integration
Nodes 1 … N
YARN: Data Operating System
Script, SQL, NoSQL, Stream, Search, In-Mem, Others
HDFS (Hadoop Distributed File System)
ERP
EDW
RDBMS
CRM
EMERGING & NON-TRADITIONAL SOURCES
SOCIAL MEDIA
BEACONS
SENSOR RFID
CLICKSTREAM
IN-STORE WIFI LOGS
SERVER LOGS
TRADITIONAL SOURCES
CRM STORES PRODUCT CATALOG STAFFING PLANS
ERP POS TRANSACTIONS INVENTORY WEB TRANSACTIONS
Interoperable with Leading Datacenter Technologies
Partners
Start Your Journey with Hortonworks in 6 Easy Steps
1. Schedule a use case workshop with the local Hortonworks team
2. Complete the Big Data Scorecard – hortonworks.com/get-started/big-data-scorecard
3. Download Hortonworks Sandbox – hortonworks.com/sandbox
4. Subscribe for Support – hortonworks.com/services/jumpstart/
5. Join Hortonworks Community Connection – hortonworks.com/community/
6. Follow the Hortonworks blog – hortonworks.com/blog/