Hadoop Crash Course Rafael Coss Community Team Developer Advocate @racoss [email protected]

Hadoop Crash Course

Apr 13, 2017

Transcript
Page 1: Hadoop Crash Course

Hadoop Crash Course
Rafael Coss, Community Team Developer Advocate
@racoss [email protected]

Page 2: Hadoop Crash Course

© Hortonworks Inc. 2011–2016. All Rights Reserved

Evolve with Data Trends by Using Open Source

→ Keep up with rapidly evolving Data Trends

→ Differentiate your biz with Insights from Data

→ Speed up your adaptation to constant change by using open source

→ A Modern Data Architecture is a Journey

Page 3: Hadoop Crash Course

Agenda

Hadoop Use Cases

Traditional Data Architectures

What’s Apache Hadoop?

Data Access with Hadoop

Lab Intro

Page 4: Hadoop Crash Course

[Figure: use-case map. Axes: INNOVATE, RENOVATE. Stages: EXPLORE, OPTIMIZE, TRANSFORM. Categories: ACTIVE ARCHIVE, ETL ONBOARD, DATA ENRICHMENT, DATA DISCOVERY, SINGLE VIEW, PREDICTIVE ANALYTICS.]

Use cases shown: Payment Tracking, Due Diligence, Social Mapping, Product Design, M&A, Call Analysis, Machine Data, Defect Detecting, Factory Yields, Customer Support, Basket Analysis, Segments, Customer Retention, Sentiment Analysis, Optimize Inventories, Supply Chain, Cross-Sell, Vendor Scorecards, Ad Placement, OPEX Reduction, Historical Records, Mainframe Offloads, Device Data Ingest, Rapid Reporting, Digital Protection, Data as a Service, Fraud Prevention, Public Data Capture, Cyber Security, Disaster Mitigation, Investment Planning, Risk Modeling, Proactive Repair, Inventory Predictions, Next Product Recs

Customers are building Modern Data Applications to transform their industries, renovating their IT architectures and innovating their Data in Motion and Data at Rest platforms to power actionable intelligence.

Page 5: Hadoop Crash Course


Page 6: Hadoop Crash Course


Must do to make modern data applications possible

Page 7: Hadoop Crash Course


Powerful means to optimize current business

Page 8: Hadoop Crash Course


Pathway to transform for strategic advantage and new revenue streams

Page 9: Hadoop Crash Course

Customers are building Modern Data Applications to transform their industries, renovating their IT architectures and innovating with their Data in Motion or Data at Rest to power actionable intelligence.


Page 10: Hadoop Crash Course

Future of Data

Page 11: Hadoop Crash Course

INTERNET OF ANYTHING

The Future of Data is about actionable intelligence derived from all your data coming from the Internet of Anything

Page 12: Hadoop Crash Course

What are all the dimensions of all your data?

• Structured ---------------- Unstructured
• At Rest ---------------- In-motion
• KPI ---------------- Data Exhaust
• Core ---------------- Jagged Edge
• Within Your Firewall ---------------- External Data
• On-prem ---------------- Cloud

Page 13: Hadoop Crash Course

The Future of Data: Actionable Intelligence

Connected Data Architecture across your Data Plane is powering Actionable Intelligence

• Any and all data from sensors, machines, geolocation, clicks, files, social.
• Secure point-to-point and bi-directional data flows
• Collect and curate all data.

[Figure labels: INTERNET OF ANYTHING; DATA IN MOTION; DATA AT REST; STORAGE; GROUP 1, GROUP 2, GROUP 3, GROUP 4]

Page 14: Hadoop Crash Course

New Data Paradigm Opens Up New Opportunity

Transform every industry via full fidelity of data and analytics

• 2.8 zettabytes in 2012
• 44 zettabytes in 2020

1 zettabyte (ZB) = 1 million petabytes (PB); Sources: IDC, IDG Enterprise, and AMR Research

Data sources shown (NEW and TRADITIONAL): Clickstream; ERP, CRM, SCM; Web & social; Geolocation; Internet of Things; Server logs; Files, emails

[Figure labels: Opportunity; LAGGARDS; LEADERS; Ability to Consume Data; Enterprise Blind Spot]
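The growth figures quoted on this slide can be turned into a growth factor and a yearly rate with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the yearly rate assumes smooth compound growth between 2012 and 2020, which the slide does not claim.

```python
# Figures quoted on the slide: 2.8 ZB in 2012 and 44 ZB in 2020.
ZB_2012 = 2.8
ZB_2020 = 44.0
PB_PER_ZB = 1_000_000  # 1 zettabyte = 1 million petabytes (slide footnote)


def growth_factor(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Yearly rate that turns start into end, assuming smooth compounding."""
    return (end / start) ** (1 / years) - 1


factor = growth_factor(ZB_2012, ZB_2020)
cagr = compound_annual_growth_rate(ZB_2012, ZB_2020, 2020 - 2012)

print(f"Growth factor 2012 to 2020: {factor:.1f}x")
print(f"Implied compound annual growth: {cagr:.0%}")
print(f"2020 volume in petabytes: {ZB_2020 * PB_PER_ZB:,.0f} PB")
```

Under these assumptions the data volume grows by a factor of roughly 16 over the eight years.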

Page 15: Hadoop Crash Course

What disrupted the data center?

Data?

Page 16: Hadoop Crash Course

Traditional Architecture and its gaps

Page 17: Hadoop Crash Course


Observation

Interaction

Intelligence

Page 18: Hadoop Crash Course

[Figure labels: Systems of Intelligence; Systems of Engagements; Systems of Interactions; Systems of Insight; Systems of Record; Data Systems; Actionable Intelligence; Operators; Developers; Products. Legend: Events in Gray; Analytics in Green]

Page 19: Hadoop Crash Course

Modern Data Applications: Data Scope

Traditional Analytics: Structured & Repeatable. Structure built to store data.
Next Generation Analytics: Iterative & Exploratory. Data is the structure.

Traditional | Next Generation
Capacity constrained down sampling of available information | Whole population analytics connects the dots
Carefully cleanse all information before any analysis | Analyze information as is & cleanse as needed

Page 20: Hadoop Crash Course

[Figure: traditional data architecture. Labels: POS, CRM, ERP, ECM; RDBMS, NoSQL, File Server, App Server, Message Bus; Sales, Unstructured, Documents, Clickstream Logs, Web & Social Logs, Audio, Video, Logs, Geolocation, JSON; ETL, Filter; EDW, Data Marts, Archive; Statistics, OLAP, Business Analytics, Visualization & Dashboards]

Page 21: Hadoop Crash Course

Traditional Data Architecture Challenges with Big Data

→ Too expensive and slow as data growth keeps accelerating

→ Too slow to get the data prepared for analytics

→ Analytics is only leveraging a limited data set

→ Cold data becomes archived and is no longer usable for analytics

→ Data ingest is rigid and slow for new IoAT data types

→ Limited real-time insights

Page 22: Hadoop Crash Course


Page 23: Hadoop Crash Course

What’s Apache Hadoop?

Page 24: Hadoop Crash Course

Hadoop Architecture

• Applications
• Data Access Engines: Batch, Interactive, Real-time
• Distributed Compute Framework: Resource Management, Data Locality (Data Operating System)
• Distributed Reliable Storage
• Governance & Integration
• Security
• Deploy Anywhere

Page 25: Hadoop Crash Course

Hadoop Ecosystem

[Figure: ecosystem project logos, each labeled with its role: Distributed Storage & Processing Framework; ETL; RDBMS Import/Export; Secure NoSQL DB; SQL on HBase; NoSQL DB; Workflow Management; SQL; Streaming Data Ingestion; Cluster System Operations; Secure Gateway; Distributed Registry; Search & Indexing; Even Faster Data Processing; Data Management; Machine Learning. One relation is labeled "runs on".]

Page 26: Hadoop Crash Course

Open Enterprise Hadoop Capabilities

Hortonworks Data Platform

GOVERNANCE & INTEGRATION
• Data Lifecycle & Governance: Falcon, Atlas
• Data Workflow: Sqoop, Flume, Kafka, NFS, WebHDFS

DATA ACCESS
• Batch: MapReduce
• Script: Pig
• Search: Solr
• SQL: Hive
• NoSQL: HBase, Accumulo, Phoenix
• Stream: Storm
• In-memory: Spark
• Others: ISV Engines
• Tez, Slider

DATA MANAGEMENT
• YARN: Data Operating System
• HDFS: Hadoop Distributed File System (nodes 1 ... N)

SECURITY
• Administration, Authentication, Authorization, Auditing, Data Protection: Ranger, Knox, Atlas, HDFS Encryption

OPERATIONS
• Provisioning, Managing, & Monitoring: Ambari, Cloudbreak, Zookeeper
• Scheduling: Oozie

Deployment Choice: Linux, Windows, On-Premise, Cloud

Page 27: Hadoop Crash Course

What is Apache Hadoop?

HORTONWORKS DATA PLATFORM, DATA MGMT: Ongoing Innovation in Apache

Hadoop Core: HDFS, YARN, MapReduce

Timeline labels:
• Google publishes GFS & MapReduce papers, 2004-2005
• Yahoo!, 2006
• Yahoo! starts to focus on multiple Hadoop apps & clusters; contributes Hadoop to Apache
• 2008
• Hortonworks, Oct 2011
• HDP 1.0, Oct 2012
• Apache Hadoop v2 (YARN)
• HDP 2.0, Oct 2013: Hadoop 2.2.0
• HDP 2.1, April 2014: Hadoop 2.4.0
• HDP 2.2, Dec 2014: Hadoop 2.6.0
• HDP 2.3, July 2015: Hadoop 2.7.1
• HDP 2.4, March 2016: Hadoop 2.7.1
• HDP 2.5, Aug 2016: Hadoop 2.7.3

Page 28: Hadoop Crash Course

Apache Hadoop = Storage + Compute

• Storage: Hadoop Distributed File System (HDFS)
• Compute (CPU, RAM): Yet Another Resource Negotiator (YARN)

Page 29: Hadoop Crash Course

Core

• NameNode (NN), HDFS: directory structure held in memory (/directory/structure/in/memory.txt)
• ResourceManager (RM), YARN: resource management + scheduling (Disk, CPU, Memory)

Worker Node

• DataNode (HDFS)
• NodeManager (YARN)

Legend: Hadoop daemon; User application

Page 30: Hadoop Crash Course

Hadoop Distributed File System (HDFS)

Fault Tolerant Distributed Storage
• Divide files into big blocks and distribute 3 copies randomly across the cluster
• Processing: Data Locality
• Not just storage but computation

[Figure: a logical file is divided into blocks 1 to 4; three copies of each block are spread across the cluster]
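The block-and-replica idea on this slide can be sketched in a few lines of Python. This is a toy model, not HDFS code: the 128 MB block size, the node names and both helper functions are assumptions made for illustration, and the placement is purely random as the slide describes, whereas a real HDFS cluster also takes rack topology into account.

```python
import random

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MB); configurable in HDFS
REPLICATION = 3                 # three copies per block, as on the slide


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block of a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION, seed: int = 0) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes chosen at random."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    return {block: rng.sample(nodes, replication) for block in range(num_blocks)}


nodes = [f"datanode{i}" for i in range(1, 7)]   # hypothetical six-node cluster
blocks = split_into_blocks(500 * 1024 * 1024)   # a 500 MB logical file
placement = place_replicas(len(blocks), nodes)

for block, replicas in placement.items():
    print(f"block {block} ({blocks[block]} bytes): {replicas}")
```

With these assumed sizes the 500 MB file becomes four blocks. Because every block lives on three different nodes, losing any single node still leaves at least two copies of each block.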

Page 31: Hadoop Crash Course

HDFS Storage Architecture - Before

Before
• DataNode is a single storage
• Storage is uniform - only storage type is Disk
• Storage types hidden from the file system

All disks as single storage

Page 32: Hadoop Crash Course

HDFS Storage Architecture - Now

New Architecture
• DataNode is a collection of storages
• Support different types of storages
  – Disk, SSDs, Memory

Block Storage Policies
  – Describes how to store data blocks in HDFS

Collection of tiered storage

Cloud Storage

Page 33: Hadoop Crash Course


It Looks Like a File System

Page 34: Hadoop Crash Course

Batch Processing in Hadoop

MapReduce: Batch Access to Data
Original data access mechanism for Hadoop

• Framework: Made for developing distributed applications to process vast amounts of data in parallel on large clusters

• Proven: Reliable interface to Hadoop which works from GB to PB. But it is batch oriented: speed is not its strong point.

• Ecosystem: Ported to Hadoop 2 to run on YARN. Supports original investments in Hadoop by customers and partner ecosystem.

MapReduce Job Lifecycle: Map Phase, Shuffle/Sort, Reduce Phase

[Figure: Mappers on DataNode 1, DataNode 2 and DataNode 3; data is shuffled across the network & sorted; Reducers on DataNode 1, DataNode 2 and DataNode 3]

Saying that MapReduce is dead is preposterous
- Would limit us to only new workloads
- ALL Hadoop clusters use MapReduce
- Proven at Enterprise Scale

YARN: Data Operating System (Batch, Interactive, Real-Time)

Page 35: Hadoop Crash Course

What is MapReduce?

Break a large problem into sub-solutions

Map

• Iterate over a large # of records

• Extract something of interest from each record

Shuffle

• Sort intermediate results

Reduce

• Aggregate, summarize, filter or transform intermediate results

• Generate final output
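The three phases above can be imitated on a single machine to make the data flow concrete. The sketch below is plain Python, not the Hadoop MapReduce API; the word-count task, the function names and the sample records are all assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: iterate over the records and emit (word, 1) for each word."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key and sort by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: aggregate the values of each key into the final output."""
    return {key: sum(values) for key, values in grouped}


records = [
    "Hadoop stores data",
    "Hadoop processes data",
    "YARN schedules work",
]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)
```

On a real cluster the map calls run in parallel next to the data blocks and the shuffle moves data across the network; here all three steps run in one process, so only the logic carries over.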

[Figure: Data blocks feed Map Processes (Read & ETL); intermediate results pass through Shuffle & Sort; Reduce Processes (Aggregation) write the output Data]

Page 36: Hadoop Crash Course

1st Gen Hadoop: Cost Effective Batch at Scale

HADOOP 1.0: Built for Web-Scale Batch Apps

Silos created for distinct use cases

[Figure: separate HDFS clusters, each serving a single app: BATCH, INTERACTIVE, BATCH, BATCH, ONLINE]

Page 37: Hadoop Crash Course

Hadoop emerged as foundation of new data architecture

Apache Hadoop is an open source data platform for managing large volumes of high velocity and variety of data

• Built by Yahoo! to be the heartbeat of its ad & search business

• Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises

• Incredibly disruptive to current platform economics

Traditional Hadoop Advantages
✓ Manages new data paradigm
✓ Handles data at scale
✓ Cost effective
✓ Open source

Traditional Hadoop Had Limitations
• Batch-only architecture
• Single purpose clusters, specific data sets
• Difficult to integrate with existing investments
• Not enterprise-grade

[Figure: Application; Batch Processing: MapReduce; Storage: HDFS]

Page 38: Hadoop Crash Course

YARN extends Hadoop into data center leaders

YARN: The Architectural Center of Hadoop

• Common data platform, many applications

• Support multi-tenant access & processing

• Batch, interactive & real-time use cases

• Supports 3rd-party ISV tools

(ex. SAS, Syncsort, Actian, etc.)

YARN Ready Applications Facilitates ongoing innovation and enterprise adoption via ecosystem of new and existing “YARN Ready” solutions

[Figure: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS engines (Batch: MapReduce; Script: Pig; Search: Solr; SQL: Hive; NoSQL: HBase, Accumulo, Phoenix; Stream: Storm; In-memory: Spark; Others: ISV Engines; Tez, Slider) on YARN: Data Operating System, over DATA MANAGEMENT with HDFS (Hadoop Distributed File System), nodes 1 ... N]

Page 39: Hadoop Crash Course

What do iOS 6 and Windows 3.1 have in common?

Page 40: Hadoop Crash Course

Hadoop Beyond Batch with YARN

A shift from the old to the new…

HADOOP 1: MapReduce as the Base
Single Use System: Batch Apps
• Data Flow (Pig), SQL (Hive), Others
• MapReduce (cluster resource management & data processing): API, Engine, and System
• HDFS (redundant, reliable storage), nodes 1 ... N

HADOOP 2: Apache YARN as a Base
Multi Use Data Platform: Batch, Interactive, Online, Streaming, …
• APIs: Data Flow (Pig), SQL (Hive), Other ISV
• Engine: Batch MapReduce, Tez
• System: YARN (Data Operating System: resource management, etc.)
• HDFS (redundant, reliable storage), nodes 1 ... N

Page 41: Hadoop Crash Course

Hadoop Workload Evolution

A shift from the old to the new…

HADOOP 1: Single Use System (Batch Apps)
• MapReduce on HDFS

HADOOP 2: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)
• MR2, Multiple (Script, SQL, NoSQL, …), Others (ISV Engines)
• YARN
• HDFS (redundant, reliable storage)

HADOOP.Next: Multi Use Platform (Data & Beyond)
• DATA ACCESS: MR2, Multiple (Script, SQL, NoSQL, …), Others (ISV Engines)
• APPS: Docker (MySQL), Docker (Tomcat), Docker (Other)
• YARN
• HDFS (redundant, reliable storage)

Page 42: Hadoop Crash Course

What’s new in Apache Hadoop 3.0?

• Storage Optimization: HDFS: Erasure codes
• Improved Utilization: YARN: Long Running Services; YARN: Schedule Enhancements
• Additional Workloads: YARN: Docker & Isolation
• Easier to Use: New User Interface
• Refactor Base: Lots of Trunk content; JDK8 and newer dependent libraries

Release Timeline
- 3.0.0-alpha1: Sep/3/2016
- Alpha2: Jan/25/2017
- Alpha3: Q2 2017 (Estimated)
- Beta/GA: Q3/Q4 2017 (Estimated)

Page 43: Hadoop Crash Course

Gartner: What is Hadoop?

→ Common Apache Projects
  – ALL = 7 (6)
  – Except for 1 = 3 (5)
  – Except for 2 = 4 (4)
  • About 14 Common Projects

→ Uncommon Projects
  – Only 1 = 9 (1)
  – Only 2 = 7 (2)
  – Only 3 = 6 (3)
  • About 22 Uncommon Projects

http://blogs.gartner.com/merv-adrian/2015/07/02/now-what-is-hadoop/

ODPi

Hortonworks.com/tutorials

Page 44: Hadoop Crash Course

HORTONWORKS DATA PLATFORM: Ongoing Innovation in Apache

[Figure: component version matrix by HDP release. Column groups: DATA MGMT, DATA ACCESS, GOVERNANCE & INTEGRATION, OPERATIONS, SECURITY. Components: Hadoop & YARN, Flume, Oozie, Pig, Hive, Tez, Sqoop, Ambari, Slider, Kafka, Knox, Solr, Zookeeper, Spark, Falcon, Ranger, HBase, Atlas, Accumulo, Storm, Phoenix, Zeppelin, Druid. Releases: HDP 2.0 (Oct 2013), HDP 2.1 (April 2014), HDP 2.2 (Dec 2014), HDP 2.3 (Oct 2015), HDP 2.4 (Mar 2016), HDP 2.5 (Aug 2016), HDP 2.6* (1H 2017). Hadoop & YARN versions across these releases: 2.2.0, 2.4.0, 2.6.0, 2.7.1, 2.7.1, 2.7.3, 2.7.3.]

* HDP 2.6: Shows current Apache branches being used. Final component version subject to change based on Apache release process.
** Spark 1.6.3 + Spark 2.1: HDP 2.6 supports both Spark 1.6.3 and Spark 2.1 as GA.
*** Hive 2.1 is GA within HDP 2.6.

Page 45: Hadoop Crash Course

Next Generation Data Vendors Investment for the Enterprise

Vertical Integration with YARN and HDFS: Ensure engines can run reliably and respectfully in a YARN based cluster

Horizontal Integration for Enterprise Services: Ensure consistent enterprise services are applied across the Hadoop stack
• GOVERNANCE: Load data and manage according to policy
• SECURITY: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
• OPERATIONS: Deploy and effectively manage the platform
  – Provision, Manage & Monitor: Ambari, Zookeeper
  – Scheduling: Oozie

[Figure: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS engines (Script: Pig; SQL: Hive; Java, Scala: Cascading; Stream: Storm; Search: Solr; NoSQL: HBase, Accumulo; In-Memory: Spark; Others: ISV Engines; Tez, Slider) on YARN: Data Operating System (Cluster Resource Management), over HDFS (Hadoop Distributed File System)]

Page 46: Hadoop Crash Course

What do distributions do?

→ Define a stack of components
  • Rich and latest set of Apache Projects (open source & open community) without lock-in

→ Vertical and Horizontal integration of components
  • Vertical: Best Speed and Scale
  • Horizontal: Open Enterprise Ready

→ Provision and Upgrade stack
  • Robust, Easy and Anywhere

→ Accelerate time to value (ease of use)
  • New Face of Hadoop with UIs from Ambari, Ambari Views, Ranger, Falcon, Atlas

→ Partner Ecosystem
  • Rich and Deep

→ Support
  • Industry’s best, SmartSense and influence community

Page 47: Hadoop Crash Course

Hadoop Operations & Tools

Page 48: Hadoop Crash Course

How Do You Operate a Hadoop Cluster?

Apache™ Ambari is a platform to provision, manage and monitor Hadoop clusters

Page 49: Hadoop Crash Course

Ambari Core Features and Extensibility

Ambari provides core services for operations, development and extension points for both

Task | How? Core Features | Extensibility Features
Install & Configure | Install Wizard & Web | Stacks, Blueprints & REST APIs
Operate, Manage & Administer | Web, Operator Views, Metrics & Alerts | Views Framework & REST APIs
Develop | User Views | Views Framework
Optimize & Tune | User Views | Views Framework

Personas shown: Cluster Admin, Developer, Data Architect

Page 50: Hadoop Crash Course

New user interface enables fast & easy SQL definition and execution.

Page 51: Hadoop Crash Course

Data Worker and DevOps Tooling

Ambari Views

→ A pluggable way to provide a common user experience across multiple user personas.

→ Single point of entry for all users.

→ Open Community

[Figure: System Admin/operators, Data Workers and Application Developers; AMBARI; HDP]

Page 52: Hadoop Crash Course

New User Views for DevOps

• Capacity Scheduler View: Browse and manage YARN queues

• Tez View: View information related to Tez jobs that are executing on the cluster

Page 53: Hadoop Crash Course

New User Views for Development

• Pig View: Author and execute Pig scripts.

• Hive View: Author, execute and debug Hive queries.

• Files View: Browse HDFS file system.

Page 54: Hadoop Crash Course

Apache Zeppelin

• Web-based notebook for data engineers, data analysts and data scientists
• Brings interactive data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark
• Modern data science studio
  • Scala with Spark
  • Python with Spark
  • SparkSQL
  • Apache Hive, and more.

Page 55: Hadoop Crash Course

Hadoop Data Access

Page 56: Hadoop Crash Course

Access patterns enabled by YARN

• Batch: Needs to happen, but with no timeframe limitations
• Interactive: Needs to happen at human time
• Real-Time: Needs to happen at machine execution time

[Figure: Applications (Batch, Interactive, Real-Time) on YARN: Data Operating System, over HDFS (Hadoop Distributed File System), nodes 1 ... N]

Page 57: Hadoop Crash Course


Apache Hive: SQL in Hadoop

• Created by a team at Facebook

• Provides a standard SQL interface to data stored in Hadoop

• Quickly find value in raw data files

• Proven at petabyte scale

• Compatible with ALL major BI tools such as Tableau, Excel, MicroStrategy, Business Objects, etc…

[Figure: SQL queries over Sensor, Mobile, Weblog and Operational/MPP data]

Page 58: Hadoop Crash Course

Hive Architecture

1. User issues SQL query (Hive SQL, through the Web UI, JDBC / ODBC or the CLI)
2. Hive parses and plans query (HiveServer2; Hive MR/Tez Compiler, Optimizer and Executor; Hive MetaStore on MySQL, Postgresql or Oracle)
3. Query converted to MapReduce and executed on Hadoop (MapReduce, Tez or Spark job; data-local processing)

Page 59: Hadoop Crash Course

Hive and the Stinger Initiative

Performance Optimizations: 100x+ faster time to insight; deeper analytical capabilities

• Base Optimizations: Generate simplified DAGs; In-memory Hash Joins
• Vector Query Engine: Optimized for modern processor architectures
• Tez: Express tasks more simply; Eliminate disk writes; Pre-warmed Containers
• ORCFile: Column Store; High Compression; Predicate/Filter Pushdowns
• YARN: Next-gen Hadoop data processing framework
• Query Planner: Intelligent Cost-Based Optimizer

Page 60: Hadoop Crash Course

Stinger.next and Sub-Second SQL

Emergence of LLAP brings Sub-Second SQL response times within reach with Hive.

STINGER (DELIVERED)
• Delivery: HDP 2.1, version 0.13
• Speed: batch & interactive
• SQL: SQL:2003+
• Updates: read-only
• Engines: MR, Tez

PROGRESS (DELIVERED)
• Delivery: HDP 2.3, version 1.2.1
• Speed: batch & interactive
• SQL: SQL:2011 subset
• Updates: SQL INSERT/UPDATE/DELETE
• Engines: MR, Tez

STINGER.NEXT (FUTURE, IN DEVELOPMENT)
• Delivery: final version
• Speed: batch, interactive & sub-second
• SQL: comprehensive SQL:2011 based analytics
• Updates: complete ACID support including MERGE
• Engines: MR, Tez, LLAP

Also shown: Tiered Data Storage; Stinger.next Phase 3; YARN: Containerized Applications

Page 61: Hadoop Crash Course

Apache Hive: Journey to SQL:2011 Analytics

Legend: Existing; New with Hive 2.0; Future

Data Types
• Numeric: FLOAT/DOUBLE, DECIMAL, INT/TINYINT/SMALLINT/BIGINT, BOOLEAN
• String: CHAR/VARCHAR, STRING, BINARY
• Date, Time: DATE, TIMESTAMP, Interval Types
• Complex Types: ARRAY, MAP, STRUCT, UNION

SQL Features
• Core SQL Features: Date, Time and Arithmetical Functions; INNER, OUTER, CROSS and SEMI Joins; Derived Table Subqueries; Correlated + Uncorrelated Subqueries; UNION ALL; UDFs, UDAFs, UDTFs; Common Table Expressions; UNION DISTINCT
• Advanced Analytics: OLAP and Windowing Functions; CUBE and Grouping Sets
• Nested Data Analytics: Nested Data Traversal; Lateral Views
• ACID Transactions: INSERT/UPDATE/DELETE

File Formats
• Columnar: ORCFile, Parquet
• Text: CSV, Logfile
• Nested/Complex: Avro, JSON, XML
• Custom Formats
• Other Features: XPath Analytics

Latest Additions…
• Scalable Cross Product; Primary Key/Foreign Key; Non-Equijoin; Tech Preview: Proc. Extensions (PL/SQL)
• Future: ACID MERGE; Multi Subquery; Comparison to sub-select; INTERSECT and EXCEPT

Page 62: Hadoop Crash Course

Apache Hive: Modern Architecture

Legend: Historical; Current; In-development

• Storage: Columnar Storage (ORCFile, Parquet); Unstructured Data (JSON, CSV, Text, Avro, Custom, Weblog)
• Engine: SQL Engines (Row Engine, Vector Engine)
• SQL: SQL Support (SQL:2011, Optimizer, HCatalog, HiveServer2)
• Cache: Block Cache (Linux Cache); Vector Cache (LLAP)
• Distributed Execution: Hadoop 1 (MapReduce); Hadoop 2 (Tez, Spark); Persistent Server (LLAP)

Page 63: Hadoop Crash Course

Why is Tez Important?

Apache Tez is a critical innovation of the Stinger Initiative.

• Along with YARN, Tez not only improves Hive, but improves all things batch and interactive for Hadoop: Pig, Cascading…

• More efficient processing than MapReduce
  • Reduces operations and complexity of back-end processing
  • Allows for Map-Reduce-Reduce, which saves hard disk operations
  • Implements a “service” which is always on, decreasing start times of jobs
  • Allows caching of data in memory

[Figure: Applications (Scripting: Pig; SQL: Hive; Dev: Cascading/Scalding) running on Tez, on YARN: Data Operating System (Batch, Interactive, Real-Time), over HDFS (Hadoop Distributed File System), nodes 1 ... N]

Page 64: Hadoop Crash Course

Apache Tez

Hive – MapReduce vs. Hive – Tez

SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state
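To see what this query computes, it can be run on a few toy rows. The sketch below uses Python's built-in SQLite as a stand-in for Hive, so nothing here runs on MapReduce or Tez; the column types and all sample rows are assumptions, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory database with toy versions of tables a, b and c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, itemId INTEGER, state TEXT);
CREATE TABLE b (id INTEGER);
CREATE TABLE c (itemId INTEGER, price REAL);
INSERT INTO a VALUES (1, 10, 'CA'), (2, 11, 'CA'), (3, 10, 'NY');
INSERT INTO b VALUES (1), (2), (3);
INSERT INTO c VALUES (10, 4.0), (11, 6.0);
""")

# The slide's query, plus ORDER BY for a deterministic result order.
QUERY = """
SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state
ORDER BY a.state
"""

rows = conn.execute(QUERY).fetchall()
for state, row_count, avg_price in rows:
    print(state, row_count, avg_price)
```

Each join and the final aggregation is a separate stage of work. On a cluster, those stages are what Hive turns into map and reduce tasks.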

[Figure: execution plan of this query on Hive – MapReduce (a chain of map/reduce jobs with HDFS between them) and on Hive – Tez (a single DAG of map and reduce tasks). Plan stages: SELECT a.state, c.itemId; SELECT c.price; SELECT b.id; JOIN (a, c); JOIN (a, b); GROUP BY a.state; COUNT(*), AVG(c.price)]

Tez avoids unneeded writes to HDFS

Tez allows Reducer-only jobs within the DAG

Page 65: Hadoop Crash Course

Sub-second Queries in Hive: LLAP (Live Long and Process)

→ Persistent daemons
  – Saves time on process start-up (eliminates container allocation and JVM start-up time)
  – All code JITed within a query or two

→ Data caching with an async I/O elevator
  – Hot data cached in memory (columnar aware, so only hot columns cached)
  – When possible, work is scheduled on a node with the data cached; if not, work will be run on another node

→ Operators can be executed inside LLAP when it makes sense
  – Large, ETL-style queries usually don’t make sense
  – User code not run in LLAP for security

→ Working on interface to allow other data engines to read securely in parallel

→ Beta in 2.0
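The cache-aware scheduling point above can be pictured with a toy scheduler. This is not LLAP code or its API: the `schedule` function, the node names and the cached column sets are hypothetical, and the sketch only shows the "prefer a node with the data cached, otherwise run elsewhere" rule.

```python
def schedule(needed_columns: set[str],
             cache_by_node: dict[str, set[str]],
             fallback_node: str) -> tuple[str, bool]:
    """Pick a node that already caches every needed column, else fall back.

    Returns (node, cache_hit).
    """
    for node, cached_columns in cache_by_node.items():
        if needed_columns <= cached_columns:  # all needed columns are hot here
            return node, True
    return fallback_node, False  # no cache hit: run on another node


# Hypothetical cluster state: which hot columns each daemon holds in memory.
cache_by_node = {
    "node1": {"state", "price"},
    "node2": {"id"},
}

print(schedule({"price"}, cache_by_node, fallback_node="node3"))
print(schedule({"itemId"}, cache_by_node, fallback_node="node3"))
```

The first task lands on node1 because the column it reads is already cached there. The second finds no cached copy and runs on the fallback node, where it would have to read from storage.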

Page 66: Hadoop Crash Course

Hive 2 with LLAP: Architecture Overview

[Figure: ODBC/JDBC SQL queries reach HiveServer2 (query endpoint); query coordinators; a YARN cluster of LLAP daemons, each with query executors and an in-memory cache; deep storage on HDFS, S3 and other HDFS-compatible filesystems]

Page 67: Hadoop Crash Course

67 ©HortonworksInc.2011– 2016.AllRightsReserved

Hive with LLAP Execution Options

[Diagram: task graphs (nodes labeled AM, T, M, R) for three execution options: Tez only, LLAP + Tez, and LLAP only]

Page 68: Hadoop Crash Course


Scripting Data Pipeline & ETL: Apache Pig

• Dataflow engine and scripting language (Pig Latin)

• Allows you to transform data and datasets

Advantages over MapReduce
• Reduces time to write jobs
• Community support
• Piggybank has a significant number of UDFs to help adoption
• A large number of existing shops use Pig

[Diagram: YARN: Data Operating System (batch, interactive, real-time)]

Page 69: Hadoop Crash Course


Why use Pig?

• Maybe we want to join two datasets, from different sources, on a common value, and want to filter, and sort, and get the top 5 sites
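The use case above is a dataflow of join, filter, sort, and limit. As a rough sketch of those steps in plain Python (not Pig Latin), assume two hypothetical datasets that share a "url" key: per-site visit counts and site metadata. The field names, the threshold, and the sample rows are all invented for illustration.

```python
def top_sites(visits, sites, min_visits=1, n=5):
    """Join two datasets on 'url', filter, sort by visits, keep the top n."""
    # Index the second dataset by the join key.
    categories = {row["url"]: row["category"] for row in sites}
    # Inner join: keep only visit rows whose url appears in both datasets.
    joined = [
        {"url": v["url"], "visits": v["visits"], "category": categories[v["url"]]}
        for v in visits
        if v["url"] in categories
    ]
    # Filter, then sort in descending order of visits, then take the top n.
    filtered = [row for row in joined if row["visits"] >= min_visits]
    ranked = sorted(filtered, key=lambda row: row["visits"], reverse=True)
    return ranked[:n]

# Hypothetical sample data: seven sites with visit counts, six with metadata.
visits = [{"url": f"site{i}.example", "visits": i * 10} for i in range(1, 8)]
sites = [{"url": f"site{i}.example", "category": "news"} for i in range(2, 8)]

if __name__ == "__main__":
    for row in top_sites(visits, sites, min_visits=20):
        print(row["url"], row["visits"])
```

In Pig each of these steps would be one relational operator in a script; the point of the sketch is only the shape of the pipeline.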

Page 70: Hadoop Crash Course


Apache Spark enthusiasm

Elegant Developer APIs: DataFrames, Machine Learning, and SQL

Made for Data Science: All apps need to get predictive at scale and fine granularity

Democratize Machine Learning: Spark is doing to ML on Hadoop what Hive did for SQL on Hadoop

Community: Broad developer, customer and partner interest

Realize Value of Data Operating System: A key tool in the Hadoop toolbox

[Diagram: applications use Scala, Java, and Python libraries (MLlib (machine learning), Spark SQL*, Spark Streaming*) on the Spark Core Engine, above the resource management and storage layers]

Page 71: Hadoop Crash Course


Apache Spark & Apache Hadoop Perfect Together

General Purpose Data Access Engine for fast, large-scale data processing

Designed for Iterative, In-Memory computations and interactive data mining

Expressive Multi-Language APIs for Java, Scala, Python and R

Built-in Libraries: Enable data workers to rapidly iterate over data for ETL, Machine Learning, SQL and Stream processing

[Diagram: Scala, Java, Python, and R APIs over Spark SQL, Spark Streaming, MLlib, and GraphX on the Spark Core Engine, running on YARN over HDFS on nodes 1 to N]

Page 72: Hadoop Crash Course


Apache Projects Enable Access Patterns

Various open source projects have incubated in order to meet these access pattern needs

Today, they can all run on a single cluster on a single set of data because of YARN

All powered by a broad open community

[Diagram: YARN: Data Operating System over HDFS (Hadoop Distributed File System) on nodes 1 to N, hosting applications across batch, interactive, and real-time access patterns: MapReduce, Pig, Hive, Spark, Solr, HBase, Accumulo, Storm, and Kafka]

Page 73: Hadoop Crash Course

Evolve with Data Trends by Using Open Source

→ Keep up with rapidly evolving data trends

→ Differentiate your biz with insights from data

→ Speed up your adaptation to constant change by using open source

→ A Modern Data Architecture is a journey

Page 74: Hadoop Crash Course

Hands-on Lab Overview

Page 75: Hadoop Crash Course

HDP 2.5 Sandbox

→ Provides free preconfigured HDP
  – Runs in a virtual machine, Docker, or Azure
  – Hortonworks.com/sandbox

→ Easy to use
  – Operations
    • Ambari
  – Dev and DevOps
    • Ambari User Views
  – Web notebook
    • Zeppelin

→ Works with 60+ free tutorials
  – Hortonworks.com/tutorials

Page 76: Hadoop Crash Course

Sandbox Start

→ Import the Sandbox image

→ Takes about 4 min

→ Splash screen
  – VirtualBox sample splash screen
  – Save the host IP

Page 77: Hadoop Crash Course

http://localhost:8888

Page 78: Hadoop Crash Course

Data Discovery Lab

• Elefante Wine Company has a fleet of over 100 trucks.

• The geolocation data collected from the trucks contains events generated while the truck drivers are driving.

• The company's goal with Hadoop is to mitigate risk:
  o Understand correlations between miles driven and events
  o Compute the risk factor for each driver based on mileage & events

• Lab Env
  o Sandbox 2.5

• Lab Doc
  o URL: http://tinyurl.com/hello-hdp
  o Load Data
  o Query Data
  o Process Data

Page 79: Hadoop Crash Course

Elefante Wine Current Challenges

The Company
Elefante Wine is a boutique wine fulfillment company with a large fleet of trucks. It delivers wine in a highly regulated industry with stringent transportation requirements.

The Situation
Recently a number of driver violations led to fines and increased insurance rates.

The Challenges
• Rising Operational Costs
• Driver Safety
• Risk Management
• Logistics Optimization

Page 80: Hadoop Crash Course

© Hortonworks Inc. 2012 Professional Services

Elefante Wine Company has a large fleet of trucks in the USA

A truck generates millions of events for a given route; an event could be:

§ 'Normal' events: starting/stopping of the vehicle

§ 'Violation' events: speeding, excessive acceleration and braking, unsafe tail distance

The company uses an application that monitors truck locations and violations from the truck/driver in real time to calculate risk

Route? Truck? Driver?

Analysts query a broad history to understand if today's violations are part of a larger problem with specific routes, trucks, or drivers

Page 81: Hadoop Crash Course

Elefante Wine Risk and Driver Safety Challenges

Trucks outfitted with new sensors generating large volumes of new data:

• Location

• Speed

• Driver Violations

Need to integrate real-time & historical data

Increase safety and reduce liabilities

Anticipate driver violations BEFORE they happen and take precautionary actions

Find predictive correlations in driver behavior over large volumes of real-time data

Difficult to deliver timely insights to the right people and systems to take action

Data Discovery: Uncover new findings

Predictive Analytics: Identify your next best action

Better Understanding of the Past

Better Prediction of the Future

Page 82: Hadoop Crash Course

What's our goal?

→ Solution:
  – Collect additional data via sensors in trucks to better understand risk factors

→ How:
  – Quickly store new sensor data in a common repository
  – Prepare the data for analysis
  – Explore the data
  – Calculate risk
  – Generate a report
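The "calculate risk" step can be sketched as follows. The formula here (violation events per 1,000 miles driven) and the field names are assumptions made for illustration; the slides only say that a risk factor is computed per driver from mileage and events, and the lab document may define it differently.

```python
from collections import Counter

def risk_factors(events, mileage):
    """Compute an assumed per-driver risk metric.

    events:  list of (driver_id, event_type) tuples
    mileage: dict mapping driver_id to total miles driven
    """
    # Count only violation events; 'normal' events do not add risk.
    violations = Counter(driver for driver, kind in events if kind != "normal")
    # Assumed metric: violations per 1,000 miles. Drivers with no recorded
    # miles are skipped to avoid dividing by zero.
    return {
        driver: 1000.0 * violations[driver] / miles
        for driver, miles in mileage.items()
        if miles > 0
    }

# Hypothetical sample data.
sample_events = [
    ("A1", "normal"),
    ("A1", "speeding"),
    ("A2", "speeding"),
    ("A2", "unsafe tail distance"),
]
sample_mileage = {"A1": 2000, "A2": 500, "A3": 0}

if __name__ == "__main__":
    print(risk_factors(sample_events, sample_mileage))
```

In the lab this aggregation would run over the stored event and mileage tables rather than in-memory lists; the sketch only shows the per-driver grouping and ratio.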

Page 83: Hadoop Crash Course

[Diagram: geolocation.csv and trucks.csv are each loaded (LOAD) as CSV into a temporary table, then copied with SQL into the ORC tables Geolocation and Trucks]

Temporary tables: created, loaded, deleted
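The load pattern on this slide (CSV into a temporary table, SQL copy into the final table, then delete the temporary table) can be sketched as below. This uses Python's csv and sqlite3 modules as stand-ins for Hive; sqlite has no ORC format, so the "final" table here is an ordinary table, and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3

def load_via_staging(conn, csv_text):
    # 1. Create the temporary (staging) table and load the raw CSV rows as text.
    conn.execute("CREATE TABLE trucks_stage (truckid TEXT, model TEXT, miles TEXT)")
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO trucks_stage VALUES (?, ?, ?)", reader)
    # 2. Copy into the final table with SQL, casting to the intended types.
    conn.execute(
        "CREATE TABLE trucks AS "
        "SELECT truckid, model, CAST(miles AS INTEGER) AS miles FROM trucks_stage"
    )
    # 3. Delete the staging table once the final table is populated.
    conn.execute("DROP TABLE trucks_stage")
    return conn.execute(
        "SELECT truckid, model, miles FROM trucks ORDER BY truckid"
    ).fetchall()

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    sample = "truckid,model,miles\nA1,Ford,1200\nA2,Volvo,800\n"
    print(load_via_staging(connection, sample))
```

Keeping the raw load and the typed copy as separate steps means the staging table can accept the file as-is, while the final table gets the storage format and types used for queries.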

Page 84: Hadoop Crash Course

Trucking Risk Analysis – Hadoop ELT

[Diagram: SQL over the Geolocation and Trucks ORC tables, together with a Pig or Spark risk calculation, produces further ORC tables: Truck_mileage, Avg_mileage, DriverMileage, Events, and RiskFactor]

Page 85: Hadoop Crash Course

Calculate Risk

Page 86: Hadoop Crash Course

Getting Started Resources

Page 87: Hadoop Crash Course


Connected Data Architecture with HDC for AWS Marketplace

CLOUD

Ideal Use Cases:

• Data Science and Exploration (Spark, Zeppelin)

• ETL and Data Preparation (Hive, Spark)

• Analytics and Reporting (Hive 2 w/ LLAP, Zeppelin)

Cloud Data Processing (HDC for AWS)

Page 88: Hadoop Crash Course


Big Data Tutorials

→ Get Started
  – hortonworks.com/tutorials
  – Apache Hadoop & Ecosystem
    • hor.tn/hello-hdp
  – Apache Spark
    • hor.tn/spark-zep-intro
  – Apache NiFi
    • hor.tn/nifi-intro
  – Use Case
    • IoT
    • Social Media

Page 89: Hadoop Crash Course


developer.hortonworks.com

Page 90: Hadoop Crash Course


Hortonworks Nourishes the Community

HORTONWORKS COMMUNITY CONNECTION

HORTONWORKS PARTNERWORKS

https://community.hortonworks.com

Page 91: Hadoop Crash Course


Want to continue the technical introduction?

→ Hadoop Summit Crash Courses
  – Replays
  – Free

→ hadoopsummit.org/san-jose/agenda
  – Apache Hadoop
  – Apache Spark
  – Apache NiFi
  – IoT & Streaming
  – Data Science

Page 92: Hadoop Crash Course


Thank you!

[email protected]

@racoss

Page 93: Hadoop Crash Course


Protecting the Elephant in the Castle...

• Kerberos

• Wire Encryption

• HDFS Encryption

• Apache Ranger

• Network Segmentation, Firewalls

• LDAP/AD

• Apache Knox