Future-proofing your Data Lake Extending Storage and Lifecycle of Data Scott Gidley, Zaloni and Gus Horn, NetApp Webinar: October 05 2016

Webinar - Data Lake Management: Extending Storage and Lifecycle of Data

Jan 21, 2017

Page 1

Future-proofing your Data Lake: Extending Storage and Lifecycle of Data

Scott Gidley, Zaloni, and Gus Horn, NetApp

Webinar: October 5, 2016

Page 2

• Award-winning provider of enterprise data lake management solutions:
  ▪ Integrated data lake management platform
  ▪ Self-service data preparation

• Data Lake Design and Implementation Services: POC, Pilot, Production, Operations, Training

• Data Science Professional Services

Page 3

3 Zaloni Proprietary

Data lakes are central to the modern data architecture

• Increased agility

• New insights

• Improved scalability

Page 4


Data architecture modernization

[Diagram comparing Traditional and Modern architectures. Traditional: Sources, ETL, EDW; Data Discovery, Analytics, BI. Modern: Various Sources, Streaming, Unstructured Data; a Data Lake with Derived (Transformed) data and a Discovery Sandbox; EDW; Data Science, Data Discovery, Analytics, BI.]

Page 5

Data Lake Promise

• Stores all types of data (structured and unstructured) in its raw format

• Stores data for longer periods of time to enable historical analysis

• Manages real-time, streaming, and reference data all in the same environment

• Integrates storage and compute environments

Data Lake Reality

• Homogeneous data storage degrades performance and efficiency

• Aged or non-relevant data pollutes the data lake

• Lack of business-driven SLAs for data archival impacts compliance and automated initiatives

Zaloni Confidential and Proprietary - Provided under NDA

Big data opportunities come with challenges

Page 6


Data Lake 360°: Zaloni’s holistic approach to actionable big data

1. Enable the lake

• Leverage the full power of a scale-out architecture with an actionable, scalable data lake

2. Govern the data

• Improve data visibility, reliability and quality to reduce time-to-insight

• Safeguard sensitive data and enable regulatory compliance

3. Engage the business

• Foster a data-driven business through self-service data discovery and preparation

Page 7

Data lakes show promise, but success can be short-lived!

▪ Internet retailer relies on a data lake to enable:
  ▪ Real-time inventory analytics
  ▪ Customer next-best-offer programs

▪ Initial implementation shows promise and delivers measurable business value

▪ Increasing costs and decreasing performance due to unmanaged data growth limit long-term ROI

Real-Time Inventory Management

Customer 360: Next Best Offer

Page 8

Data Lake Reference Architecture

Sources: Sensors (or other time series data); Relational Data Stores (OLTP/ODS/DW); Logs (or other unstructured data); Social and shared data

Data Lake zones: Transient Landing Zone, Raw Zone, Refined Zone, Analytic Zone, Sandbox

Transient Landing Zone

• Temporary store of source data

• Consumers are IT, Data Stewards

• Implemented in highly regulated industries

Raw Zone

• Original source data ready for consumption

• Consumers are ETL developers, data stewards, some data scientists

• Single source of truth with history

Refined and Analytic Zones

• Standardized on corporate governance/quality policies

• Consumers are anyone with appropriate role-based access

• Single version of truth

• Data required for LOB-specific views, transformed from existing certified data

• Consumers are anyone with appropriate role-based access

Page 9

Data Lake Reference Architecture with Zaloni

Sources: Sensors (or other time series data); Relational Data Stores (OLTP/ODS/DW); Logs (or other unstructured data); Social and shared data

Source System: File Data, DB Data, ETL Extracts, Streaming

Data Lake: Transient Landing Zone, Raw Zone, Refined Zone, Analytic Zone, Sandbox

Data Lake Management & Governance Platform: Metadata Management, Data Quality, Data Catalog, Security

Consumption Zone: Business Analysts, Researchers, Data Scientists; APIs

Page 10

Zaloni DLM: Future-proof your data lake

Bedrock Data Lifecycle Management: Policy Definition and Policy Execution

Stages: INGEST, ORGANIZE, ENRICH, ENGAGE

Sources: File Data, DB Data, ETL Extracts, Streams

Data Lake: Raw Data Zone, Refined Data Zone, Analytic Data Zone

Consumption Zone: Applications; Business Analysts, Researchers, Data Scientists; APIs

DLM policies (data tier by data age):

• < 360 days = Warm; > 360 days = S3 Vault

• < 30 days = Hot; > 30 and < 120 days = Warm; > 120 days = S3 Vault

• < 30 days = Hot; > 30 days = S3 Vault

Data storage by tier:

• Hot: E-Series Flash

• Warm: E-Series Disk

• S3 Vault: StorageGRID Webscale
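Each DLM policy on this slide is essentially an age-based lookup from a data set's age to a storage tier. The following is a minimal Python sketch of that lookup, not Zaloni's Bedrock implementation. It assumes the three policies apply to the Raw, Refined and Analytic zones in the order they are listed, and it treats each boundary as "younger than N days" because the slide does not say which tier data exactly 30, 120 or 360 days old belongs to. The `Tier`, `DlmPolicy` and `POLICIES` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Storage tiers and the NetApp systems the slide assigns to each."""
    HOT = "E-Series Flash"
    WARM = "E-Series Disk"
    S3_VAULT = "StorageGRID Webscale"


@dataclass(frozen=True)
class DlmPolicy:
    """Age-based tiering policy (hypothetical structure).

    rules: (max_age_days, tier) pairs in increasing age order; data younger
    than max_age_days goes to that tier. Older data falls through to default.
    """
    rules: tuple[tuple[int, Tier], ...]
    default: Tier

    def tier_for(self, age_days: int) -> Tier:
        # First matching rule wins, so rules must be sorted by age bound.
        for max_age_days, tier in self.rules:
            if age_days < max_age_days:
                return tier
        return self.default


# Assumption: the slide's three policies map to these zones in listed order.
POLICIES = {
    "raw": DlmPolicy(rules=((360, Tier.WARM),), default=Tier.S3_VAULT),
    "refined": DlmPolicy(rules=((30, Tier.HOT), (120, Tier.WARM)), default=Tier.S3_VAULT),
    "analytic": DlmPolicy(rules=((30, Tier.HOT),), default=Tier.S3_VAULT),
}

if __name__ == "__main__":
    for zone, age in [("raw", 400), ("refined", 45), ("analytic", 10)]:
        tier = POLICIES[zone].tier_for(age)
        print(f"{zone} zone, {age} days old -> {tier.name} ({tier.value})")
```

Keeping the rules ordered by increasing age bound makes each policy a simple first-match scan, and adding a tier only means adding one rule.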

Page 11

The complexities of the connected vehicle

The classic problems associated with Big Data: Volume, Velocity, Variability & Privacy!

© 2016 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---11

Page 12

The promise of a connected car’s data lake: how to manage billions of unstructured records


INGEST: Manage data ingestion so you know what is in your Hadoop data lake

ORGANIZE: Define and capture metadata for ease of searching and browsing

ENRICH: Orchestrate and manage the data preparation process

ENGAGE: Self-service data preparation

Page 13

The NetApp Solution for Hadoop

Validated Certified Designs with all Distributions of Hadoop
• MapR
• Cloudera
• Hortonworks

Uses high performance storage
• Resilient, compact footprint
• Protection of data: DDP, RAID 5/6/10
• Less network congestion

Higher capacity and density
• 480TB in 4U
• Expandable to 3.1 PB / controller
• Fully serviceable storage system
• No architectural limit

Reliability
• 99.9999% reliability (< 35 sec of downtime per year)
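The six-nines reliability figure on this slide can be checked with simple arithmetic. This sketch assumes 99.9999% is the fraction of a 365-day year during which the system is available; that interpretation is an assumption, not NetApp's stated calculation, and `max_downtime_seconds` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s in a non-leap year


def max_downtime_seconds(availability_pct: float) -> float:
    """Downtime per year allowed by an availability percentage."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    # Compare common "nines" levels; six nines is the slide's figure.
    for pct in (99.9, 99.99, 99.999, 99.9999):
        print(f"{pct}% -> {max_downtime_seconds(pct):.1f} s/year")
```

Six nines works out to about 31.5 seconds per year, which is consistent with the slide's "< 35 sec / year".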

13Insight © 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use Only

Enterprise Grade Hadoop (consistent performance during all modes of operation)

• Hadoop Analytic Platform: High Performance HDFS; Heterogeneous File System; Tiered HOT/WARM/COLD Storage; Tested, Validated Architecture

• High Performance Building Block: High Performance HDFS; Scales to Thousands of Nodes; Exabytes of Capacity

• Fully Connected Building Block: High Performance, NFS Optimized; Augments an Existing Hadoop Cluster; Exabytes of Capacity

• NFS Connector for Hadoop

• Resource Manager, Name Node(s)

• Data Nodes: 4:1 ratio

• 12 Gb/s SAS

• Networks: 10 Gb Ethernet data-intensive network; 1 or 10 Gb Ethernet management network

Page 14

Fleet maintenance

▪ Large commercial hauling company in the US has over 400,000 leased vehicles

▪ Trucks are under warranty

▪ Fleet must operate at maximum efficiency to maintain profits

▪ Truck drivers have predictable behavior: they will continue to drive even with warning lights indicating problems with the vehicle; they keep on trucking

▪ Minor problems oftentimes escalate to major ones if not addressed early in the failure process

▪ Drivers perceive that because the vehicle is under warranty, they should keep driving it to the final destination while it still runs, i.e. complete the delivery before addressing any issue

Proactive maintenance is much more cost-effective than reactive maintenance


Page 15

Solution for fleet maintenance

▪ Placed cellular data telemetry devices in all leased vehicles

▪ Collected all telemetry:
  ▪ Vehicle speeds and GPS coordinates
  ▪ All mechanical sensor data
  ▪ Could identify the employee

▪ Alerts the driver to a mechanical issue immediately and schedules proactive maintenance, with an appointment at the next rest stop and a predicted time out of service

▪ Minor problems do not escalate to major failures

▪ Immediate improvement in fleet uptime, with reduced warranty expense and fewer out-of-service situations

▪ Saved over $5M in the first year of operation

Maintenance and vehicle readiness are correlated
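Moving from reactive to proactive maintenance starts with flagging sensor readings that leave their normal operating range. The Python sketch below illustrates only that idea: the `Reading` schema, the sensor names and the limits in `LIMITS` are hypothetical, and the webinar does not describe how the company's alerting or scheduling was actually implemented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Reading:
    """One telemetry sample from a vehicle (hypothetical schema)."""
    vehicle_id: str
    sensor: str
    value: float


# Hypothetical operating ranges (low, high); None means no bound on that side.
LIMITS: dict[str, tuple[Optional[float], Optional[float]]] = {
    "coolant_temp_c": (None, 110.0),
    "oil_pressure_kpa": (140.0, None),
}


def out_of_range(r: Reading) -> bool:
    """True when a reading falls outside its sensor's configured range."""
    low, high = LIMITS.get(r.sensor, (None, None))
    return (low is not None and r.value < low) or (high is not None and r.value > high)


def maintenance_alerts(readings: Iterable[Reading]) -> dict[str, list[Reading]]:
    """Group out-of-range readings by vehicle so each vehicle gets one alert."""
    alerts: dict[str, list[Reading]] = {}
    for r in readings:
        if out_of_range(r):
            alerts.setdefault(r.vehicle_id, []).append(r)
    return alerts


if __name__ == "__main__":
    stream = [
        Reading("truck-17", "coolant_temp_c", 118.0),
        Reading("truck-17", "oil_pressure_kpa", 300.0),
        Reading("truck-42", "oil_pressure_kpa", 95.0),
        Reading("truck-99", "coolant_temp_c", 90.0),
    ]
    for vehicle, hits in maintenance_alerts(stream).items():
        print(vehicle, [h.sensor for h in hits])
```

Grouping by vehicle lets one alert cover several faulty sensors; booking the rest-stop appointment would be a separate step downstream of this check.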


Page 16

Large strip-mining operation in the Midwest

▪ Vehicles were large Caterpillar earth movers

▪ Maintenance costs in the millions (oils, hydraulics, engines, transmissions, etc.)

▪ Vehicles only make money when moving product

▪ Rather than maintenance based on the Hobbs meter (hours of operation), telemetry-based maintenance was implemented

▪ Minor issues never progressed to major downtime issues

▪ Driver behavior had a direct correlation to vehicle damage and wear (brakes and suspension)

▪ Maintenance cost reduction paid for the Hadoop cluster and related software within the first quarter of operation

Telemetry proved benefits beyond the vehicle


Page 17

Savings extended beyond pure maintenance

▪ Vehicle load sensors transmitted load in real time to the production plant

▪ Suspension load sensors transmitted road conditions:
  ▪ Abnormal angles were detected in real time
  ▪ Potholes and terrain requiring re-grading were detected before causing excessive strain to the suspension of the earth movers
  ▪ Prior to telemetry, the mine guessed where to maintain the road and often missed major issues, causing excessive suspension strain and out-of-limit failures that cost millions of dollars in downtime and repairs

▪ Driver behavior had a direct correlation to vehicle damage and wear (brakes and suspension):
  ▪ Drivers were trained to brake and accelerate better with the vehicles, saving millions in unneeded repairs

▪ As a side effect, telemetry produced more than $10M in cost reduction in vehicle and road maintenance, with greater fleet uptime

Route maintenance, driver behaviors and real-time product tracking


Page 18

DATA LAKE MANAGEMENT AND GOVERNANCE PLATFORM

SELF-SERVICE DATA PREPARATION