Apr 16, 2017
#FutureOfData
Hortonworks Inc. 2011-2016. All Rights Reserved
Agenda
Future of Data
Traditional Data Architectures
What's Apache Hadoop?
Data Access with Hadoop
Lab Intro
Customers are building Modern Data Applications to transform their industries, renovating their IT architectures and innovating with their Data in Motion or Data at Rest to power actionable intelligence.
Social Mapping
Payment Tracking
Factory Yields
Defect Detection
Call Analysis
Machine Data
Product Design
M&A Due Diligence
Next Product Recs
Cyber Security
Risk Modeling
Ad Placement
Proactive Repair
Disaster Mitigation
Investment Planning
Inventory Predictions
Customer Support
Sentiment Analysis
Supply Chain
Ad Placement
Basket Analysis
Segments
Cross-Sell
Customer Retention
Vendor Scorecards
Optimize Inventories
OPEX Reduction
Mainframe Offloads
Historical Records
Data as a Service
Public Data Capture
Fraud Prevention
Device Data Ingest
Rapid Reporting
Digital Protection
Future of Data
INTERNET OF ANYTHING
The Future of Data is about actionable intelligence derived from a constantly connected society with easy, secure access to rich data sets coming from the Internet of Anything
Data Powers Highway Safety
Tire Pressure
Server log
Mobile
Sensor
Location
Precipitation
Social
Click-stream
New Data Paradigm Opens Up New Opportunity
2.8 zettabytes in 2012
44 zettabytes in 2020
NEW
1 zettabyte (ZB) = 1 million petabytes (PB); Sources: IDC, IDG Enterprise, and AMR Research
Clickstream
ERP, CRM, SCM
Web & social
Geolocation
Internet of Things
Server logs
Files, emails
Transform every industry via full fidelity of data and analytics
Opportunity
TRADITIONAL
LAGGARDS
LEADERS
Ability to Consume Data
Enterprise Blind Spot
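The two volume figures and the unit conversion quoted on this slide imply a growth rate that the slide itself does not state. The following is a minimal Python sketch of that arithmetic. The function names are illustrative, and the compound annual rate assumes smooth year-over-year growth between the two data points, which is an assumption and not a claim made by the slide.

```python
# Figures as quoted on the slide (attributed there to IDC, IDG Enterprise, AMR Research).
ZB_2012 = 2.8
ZB_2020 = 44.0
PB_PER_ZB = 1_000_000  # 1 ZB = 1 million PB, as stated on the slide


def to_petabytes(zettabytes: float) -> float:
    """Convert zettabytes to petabytes using the slide's conversion."""
    return zettabytes * PB_PER_ZB


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


growth_factor = ZB_2020 / ZB_2012
cagr = compound_annual_growth(ZB_2012, ZB_2020, 2020 - 2012)

print(f"2012 volume: {to_petabytes(ZB_2012):,.0f} PB")
print(f"2020 volume: {to_petabytes(ZB_2020):,.0f} PB")
print(f"Overall growth factor: {growth_factor:.1f}x")
print(f"Implied compound annual growth: {cagr:.1%}")
```

Under that assumption the volume grows by a factor of roughly 15.7 over the eight years, or about 41% per year.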
What disrupted the data center?
Data?
Modern Data Applications
Polyglot Persistence
SQL, NoSQL
NewSQL, Search
Graph
At-Rest, In-Motion
Analytics
Data Variety
Integration
Data Lake, Federation
Optimization
Storage, Compute
Distributed Computing
Commodity Hardware
Cloud
Hybrid
Distributed Computing
The Future of Data
Actionable Intelligence
DATA IN MOTION
STORAGE
STORAGE
GROUP 2
GROUP 1
GROUP 4
GROUP 3
DATA AT REST
INTERNET OF ANYTHING
Connected Data Platforms are powering Actionable Intelligence
Any and all data from sensors, machines, geolocation, clicks, files, social.
Secure point-to-point and bi-directional data flows
Collect and curate all data.
Traditional Data Architectures
Systems of Intelligence
Systems of Engagements
Systems of Interactions
Data Systems
Systems of Record
Systems of Insight
Events In Gray
Analytics In Green
Operators
Developers
RDBMS
Sales
NoSQL
Unstructured
Visualization & Dashboards
Business Analytics
Data Marts
Data Marts
Archive
Statistics
OLAP
EDW
File Server
Clickstream Logs
Web & Social Logs
Audio
Video
Logs
Logs
Logs
Geolocation
JSON
ETL
POS, CRM, ERP
ECM
Filter
App Server
Message Bus
Documents
Traditional Data Architecture Challenges with Big Data
Too expensive and slow as data growth keeps accelerating
Too slow to get the data prepared for analytics
Analytics is only leveraging a limited data set
Cold data becomes archived and is no longer usable for analytics
Data ingest is rigid and slow for new IoAT data types
Limited real-time insights
Modern Data Applications
Traditional Analytics: Structured & Repeatable
Structure built to store data
Business Users Determine Questions
IT Team Builds System To Answer Known Questions
Available Information
Analyzed Information
Capacity constrained down sampling of available information
Carefully cleanse all information before any analysis
Next Generation Analytics: Iterative & Exploratory
Data is the structure
IT Team Delivers Data On Flexible Platform
Business Users Explore and Ask Any Question
Analyze ALL Available Information
Whole population analytics connects the dots
Analyze information as is & cleanse as needed
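The contrast this slide draws between capacity-constrained down-sampling and whole-population analytics can be illustrated with a small Python sketch. Everything here is hypothetical: the event stream is synthetic, the positions of the rare "defect" events are chosen by hand, and keeping every 100th record stands in for whatever sampling a constrained system might apply.

```python
from collections import Counter


def make_events(n=10_000):
    """Synthetic event stream: mostly 'ok', with a few hand-placed rare 'defect' events."""
    rare_positions = {1234, 4321, 5678, 7777, 9001}  # hypothetical positions
    return ["defect" if i in rare_positions else "ok" for i in range(n)]


def down_sample(events, keep_every=100):
    """Capacity-constrained analysis: keep only every `keep_every`-th record."""
    return events[::keep_every]


events = make_events()
sample_counts = Counter(down_sample(events))
full_counts = Counter(events)

print("Defects seen in the 1% sample:      ", sample_counts["defect"])
print("Defects seen in the whole population:", full_counts["defect"])
```

In this contrived case the sample contains no defect events at all, while analyzing every record finds all five. Real sampling schemes may do better, but rare signals are the ones most easily lost.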
Modern Data Applications Has Two Themes
Traditional Analytics: Structured & Repeatable
Structure built to store data
Start with hypothesis. Test against selected data
Question
Data
Answer
Hypothesis
Analyzed Information
Next Generation Analytics: Iterative & Exploratory
Data is the structure
Data leads the way. Explore all data, identify correlations
Data
Correlation
All Information
Exploration
Actionable Insight
Analyze after landing
Analyze in motion
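The two themes, analyzing data after it lands and analyzing it in motion, can be sketched in Python as a batch computation versus an incremental one. This is only an illustration under stated assumptions: the readings are made-up tire-pressure values, and the function names are hypothetical and not part of any Hadoop component.

```python
def analyze_after_landing(readings):
    """Batch style: store every reading first, then compute the average once."""
    stored = list(readings)  # the data has "landed"
    return sum(stored) / len(stored)


def analyze_in_motion(readings):
    """Streaming style: update a running average as each reading arrives."""
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


# Hypothetical tire-pressure readings (psi).
readings = [30, 32, 31, 35]

print("After landing:", analyze_after_landing(readings))
print("In motion:    ", list(analyze_in_motion(readings)))
```

Both approaches end at the same average, but the in-motion version has an answer available after every reading instead of only once all the data is stored.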
What's Apache Hadoop?
Hadoop Architecture
Data Access Engines
Distributed Reliable Storage
Distributed Compute Framework
Resource Management, Data Locality
Data Operating System
Batch, Interactive, Real-time
Governance & Integration
Security
Applications
Deploy Anywhere
Hadoop Data Platform Architecture
DATA MANAGEMENT: Store and process all of your Corporate Data Assets (YARN: Data Operating System)
SECURITY: Provide a layered approach to security through Authentication, Authorization, Accounting, and Data Protection
DATA ACCESS: Access your data simultaneously in multiple ways (batch, interactive, real-time)
GOVERNANCE & INTEGRATION: Load data and manage according to policy
ENTERPRISE MGMT & SECURITY: Empower existing operations and security tools to manage Hadoop
PRESENTATION & APPLICATION: Enable both existing and new applications to provide value to the organization
DEPLOYMENT OPTIONS: Provide deployment choice across on-premise, appliance, virtualized, cloud
OPERATIONS: Deploy and effectively manage the platform
Hadoop Ecosystem
ETL
RDBMS Import/Export
Distributed Storage & Processing Framework
Secure NoSQL DB
SQL on HBase
NoSQL DB
Workflow Management
SQL
Streaming Data Ingestion
Cluster System Operations
Secure Gateway
Distributed Registry
ETL
Search & Indexing
Even Faster Data Processing
Data Management
Machine Learning
Open Enterprise Hadoop Capabilities
Hortonworks Data Platform
GOVERNANCE & INTEGRATION
Data Lifecycle & Governance: Falcon, Atlas
Data Workflow: Sqoop, Flume, Kafka, NFS, WebHDFS
DATA ACCESS
Batch: MapReduce
Script: Pig
Search: Solr
SQL: Hive
NoSQL: HBase, Accumulo, Phoenix
Stream: Storm
In-memory: Spark
Others: ISV Engines
Tez, Slider
YARN: Data Operating System
SECURITY
Administration, Authentication, Authorization, Auditing, Data Protection: Ranger, Knox, Atlas, HDFS Encryption
OPERATIONS
Provisioning, Managing, & Monitoring: Ambari, Cloudbreak, Zookeeper
Scheduling: Oozie
DATA MANAGEMENT
HDFS: Hadoop Distributed File System
Deployment Choice: Linux, Windows, On-Premise, Cloud
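The batch engine listed under data access is MapReduce. As a rough illustration of that programming model only, here is a single-process Python sketch of the map, shuffle, and reduce phases applied to a word count. It does not use Hadoop's actual API, nothing in it is distributed, and the function names and sample log lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the grouped values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Hypothetical server log lines.
logs = ["GET /index", "GET /cart", "POST /cart"]
counts = word_count(logs)
print(counts)
```

In a real cluster the map and reduce functions would run on many nodes, with the framework handling the shuffle between them. This sketch only shows the shape of the computation.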
HORTONWORKS DATA PLATFORM
DATA MGMT
HDP 2.2 Dec 2014
HDP 2.1 April 2014
HDP 2.0 Oct 2013
2.2.0
2.4.0
2.6.0
Ong