
Apache Hadoop Crash Course - HS16SJ

Apr 16, 2017


  • #FutureOfData

  • 2 Hortonworks Inc. 2011-2016. All Rights Reserved

    Agenda

    Future of Data

    Traditional Data Architectures

    What's Apache Hadoop?

    Data Access with Hadoop

    Lab Intro

  • 3

    Customers are building Modern Data Applications to transform their industries, renovating their IT architectures and innovating with their Data in Motion or Data at Rest to power actionable intelligence.

    Social Mapping, Payment Tracking, Factory Yields, Defect Detection, Call Analysis, Machine Data, Product Design, M&A Due Diligence, Next Product Recs, Cyber Security, Risk Modeling, Ad Placement, Proactive Repair, Disaster Mitigation, Investment Planning, Inventory Predictions, Customer Support, Sentiment Analysis, Supply Chain, Basket Analysis, Segments, Cross-Sell, Customer Retention, Vendor Scorecards, Optimize Inventories, OPEX Reduction, Mainframe Offloads, Historical Records, Data as a Service, Public Data Capture, Fraud Prevention, Device Data Ingest, Rapid Reporting, Digital Protection

  • Future of Data

  • 5

    Internet of Anything

    The Future of Data is about actionable intelligence derived from a constantly connected society with easy, secure access to rich data sets coming from the Internet of Anything.

  • Data Powers Highway Safety

  • 7

    Data Powers Highway Safety

    Tire Pressure, Server Log, Mobile, Sensor, Location, Precipitation, Social, Click-stream

  • 8

    New Data Paradigm Opens Up New Opportunity

    2.8 zettabytes in 2012; 44 zettabytes in 2020

    1 zettabyte (ZB) = 1 million petabytes (PB); Sources: IDC, IDG Enterprise, and AMR Research

    New: Clickstream, Web & social, Geolocation, Internet of Things, Server logs, Files, emails

    Traditional: ERP, CRM, SCM

    Opportunity: Transform every industry via full fidelity of data and analytics

    Ability to Consume Data: Laggards, Leaders

    Enterprise Blind Spot
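
The two volume figures on this slide (2.8 ZB in 2012, 44 ZB in 2020) and the footnote conversion (1 ZB = 1 million PB) can be turned into a growth rate with a few lines of arithmetic. The sketch below is only an illustration: the figures come from the slide, but treating the growth as a steady compound rate is an assumption made here, and the function name is hypothetical.

```python
# Figures as stated on the slide.
ZB_2012 = 2.8
ZB_2020 = 44.0
PB_PER_ZB = 1_000_000  # slide footnote: 1 ZB = 1 million PB


def compound_annual_growth(start, end, years):
    """Annual rate that turns `start` into `end` after `years` years.

    Assumes smooth compound growth, which the slide itself does not claim.
    """
    return (end / start) ** (1 / years) - 1


growth_factor = ZB_2020 / ZB_2012
cagr = compound_annual_growth(ZB_2012, ZB_2020, 2020 - 2012)
added_pb = (ZB_2020 - ZB_2012) * PB_PER_ZB

print(f"Overall growth factor: {growth_factor:.1f}x")
print(f"Implied annual growth: {cagr:.0%}")
print(f"Data added, in PB: {added_pb:,.0f}")
```

Under that assumption the slide's numbers work out to roughly a 15.7-fold increase over eight years, or about 41% growth per year.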

  • 9

    What disrupted the data center?

    Data?

  • 10

    Modern Data Applications

    Polyglot Persistence: SQL, NoSQL, NewSQL, Search, Graph

    At-Rest, In-Motion

    Analytics, Data Variety

    Integration: Data Lake, Federation

    Optimization: Storage, Compute

    Distributed Computing: Commodity Hardware, Cloud, Hybrid

  • 11

    The Future of Data: Actionable Intelligence

    [Diagram: Internet of Anything; Data in Motion; Data at Rest; Storage; Groups 1-4]

    Connected Data Platforms are powering Actionable Intelligence

    Any and all data from sensors, machines, geolocation, clicks, files, social.

    Secure point-to-point and bi-directional data flows

    Collect and curate all data.

  • 12

    Traditional Data Architectures

  • 13

    Systems of Intelligence, Systems of Engagements, Systems of Interactions, Data Systems, Systems of Record, Systems of Insight

    Legend: Events in Gray, Analytics in Green

    Operators, Developers

  • 14

    [Diagram labels: RDBMS, Sales, NoSQL, Unstructured, Visualization & Dashboards, Business Analytics, Data Marts, Archive, Statistics, OLAP, EDW, File Server, Clickstream Logs, Web & Social Logs, Audio, Video, Logs, Geolocation, JSON, ETL, POS, CRM, ERP, ECM, Filter, App Server, Message Bus, Documents]

  • 15

    Traditional Data Architecture Challenges with Big Data

    [Diagram repeated from slide 14]

    Too expensive and slow as data growth keeps accelerating

    Too slow to get the data prepared for analytics

    Analytics is only leveraging a limited data set

    Cold data becomes archived and is no longer usable for analytics

    Data ingest is rigid and slow for new IoAT data types

    Limited real-time insights

  • 16

    [Diagram repeated from slide 14]

  • 17

    Modern Data Applications

    Traditional Analytics: Structured & Repeatable. Structure built to store data.

    Business Users Determine Questions; IT Team Builds System To Answer Known Questions

    Capacity constrained down sampling of available information (Analyzed Information is a subset of Available Information)

    Carefully cleanse all information before any analysis

    Next Generation Analytics: Iterative & Exploratory. Data is the structure.

    IT Team Delivers Data On Flexible Platform; Business Users Explore and Ask Any Question

    Analyze ALL Available Information: whole population analytics connects the dots

    Analyze information as is & cleanse as needed

  • 18

    Modern Data Applications Have Two Themes

    Traditional Analytics: Structured & Repeatable. Structure built to store data.

    Start with hypothesis; test against selected data

    Hypothesis, Question, Data, Answer

    Next Generation Analytics: Iterative & Exploratory. Data is the structure.

    Data leads the way: explore all data, identify correlations

    All Information, Data, Exploration, Correlation, Actionable Insight

    Analyze after landing vs. analyze in motion

  • What's Apache Hadoop?

  • 20

    Hadoop Architecture

    Applications

    Data Access Engines: Batch, Interactive, Real-time

    Distributed Compute Framework: Resource Management, Data Locality, Data Operating System

    Distributed Reliable Storage

    Governance & Integration

    Security

    Deploy Anywhere

  • 21

    Hadoop Data Platform Architecture

    DATA MANAGEMENT (YARN: Data Operating System): Store and process all of your corporate data assets

    DATA ACCESS: Access your data simultaneously in multiple ways (batch, interactive, real-time)

    GOVERNANCE & INTEGRATION: Load data and manage according to policy

    SECURITY: Provide a layered approach to security through Authentication, Authorization, Accounting, and Data Protection

    OPERATIONS: Deploy and effectively manage the platform

    PRESENTATION & APPLICATION: Enable both existing and new applications to provide value to the organization

    ENTERPRISE MGMT & SECURITY: Empower existing operations and security tools to manage Hadoop

    DEPLOYMENT OPTIONS: Provide deployment choice across on-premise, appliance, virtualized, cloud

  • 22

    Hadoop Ecosystem

    Component roles: ETL, RDBMS Import/Export, Distributed Storage & Processing Framework, Secure NoSQL DB, SQL on HBase, NoSQL DB, Workflow Management, SQL, Streaming Data Ingestion, Cluster System Operations, Secure Gateway, Distributed Registry, Search & Indexing, Even Faster Data Processing, Data Management, Machine Learning

  • 23

    Open Enterprise Hadoop Capabilities

    Hortonworks Data Platform

    GOVERNANCE & INTEGRATION

    Data Lifecycle & Governance: Falcon, Atlas

    Data Workflow: Sqoop, Flume, Kafka, NFS, WebHDFS

    DATA ACCESS

    Batch: MapReduce

    Script: Pig

    Search: Solr

    SQL: Hive

    NoSQL: HBase, Accumulo, Phoenix

    Stream: Storm

    In-memory: Spark

    Others: ISV Engines

    Tez, Slider

    YARN: Data Operating System

    DATA MANAGEMENT

    HDFS (Hadoop Distributed File System), nodes 1 to N

    SECURITY

    Administration, Authentication, Authorization, Auditing, Data Protection: Ranger, Knox, Atlas, HDFS Encryption

    OPERATIONS

    Provisioning, Managing, & Monitoring: Ambari, Cloudbreak, ZooKeeper

    Scheduling: Oozie

    Deployment Choice: Linux, Windows, On-Premise, Cloud
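
The batch engine named on this slide, MapReduce, processes data in three steps: map, shuffle, reduce. As a rough illustration of that pattern, here is a single-process word count in plain Python. It is not the Hadoop API; the function names and the sample lines are made up for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input standing in for files stored in HDFS.
sample_lines = ["server log click stream", "sensor log", "click stream click"]
counts = word_count(sample_lines)
print(counts)
```

On a real cluster the map and reduce functions run as many parallel tasks, and map tasks are scheduled close to the HDFS blocks they read; this sketch only shows the data flow between the three steps.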

  • 24

    Hortonworks Data Platform

    HDP 2.0 (Oct 2013), HDP 2.1 (April 2014), HDP 2.2 (Dec 2014)

    DATA MGMT: 2.2.0, 2.4.0, 2.6.0

    Ong