Consuming the Data Lake - Reporting, Analytics, Machine Learning
Axel Larsson, Enterprise Solutions Architect
Joyjeet Banerjee, Enterprise Solutions Architect
9 April 2019
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Consuming Data Lake · learnandbecurious.cloud/data/decks/Consuming_Data_Lake.pdf

Jul 08, 2020


Page 2

What have we learned so far

Athena?

Page 3

Anti-Pattern

Everything

Query

Page 4

Also an Anti-Pattern

Everything

Query

Page 5

One tool to rule them all

Page 6

Where do I start?

• Understand your data
  • Data structure, access patterns & characteristics, temperature, cost, size
• Know your audience
  • Business users, data scientists, developers
• Select the right service

Page 7

Understand your Data

[Figure: storage categories (In-memory, NoSQL, Search, Warehouse, Object, Archival) plotted across hot data, warm data, and cold data. Axes, each running between Low and High: Data structure, Latency, Data volume, Request rate, Cost / GB.]

Page 8

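Understand your Data

[Figure: the chart from the previous page, with an AWS service shown for each storage category across hot data, warm data, and cold data. Axes, each running between Low and High: Data structure, Latency, Data volume, Request rate, Cost / GB.]

• In-Memory: Amazon ElastiCache
• NoSQL: Amazon DynamoDB
• Search: Amazon ES
• Warehouse: Amazon Redshift
• Object: Amazon S3
• Archival: Amazon Glacier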

Page 9

Who is your audience?

Page 10

ROLE, PRIORITIES, NEEDS

Data Visualizer
• Priorities: Creating engaging visual and narrative journeys for analytical solutions
• Needs: Visualization, Dashboards, Reporting

Data Product Manager
• Priorities: Manages data as a product. Ensures freshness and consistency of data; understands lineage and compliance needs; treats DS as customers
• Needs: Reports – data quality, errors

DevOps Engineer
• Priorities: Monitoring for reliability, quickly diagnosing deployment or availability issues
• Needs: Ad hoc querying, Dashboards

Data Scientist
• Priorities: Makes sense of data, generates and communicates insights to improve or create business processes, creates predictive ML models to support them
• Needs: Ad hoc querying, Robust ML tools

Data Engineer
• Priorities: Builds scalable pipelines, transforms and loads data into structures complete with metadata that can be readily consumed by DS
• Needs: Ad hoc querying, Quick visualization

Business Sponsor
• Priorities: Vetting the prioritization and ROI, funding projects, providing ongoing feedback
• Needs: Reporting, Dashboards

Page 11

Enabling your Consumers

Dashboards – Reports – Ad-Hoc Analysis – Machine Learning

Page 12

Dashboards

Visual representation of key metrics that change over time
• Data structure - Low
• Usage - Near real-time visualization
• Data temperature - Hot

Available Services:
• Lambda
• DynamoDB + Streams
• Elasticsearch
• Amazon Kinesis Firehose

Page 13

Dashboards – Near Real-time

[Diagram components]
• Data Lake on Amazon S3: Raw Bucket, ETL (Amazon EMR OR AWS Glue), Transformed Data Bucket
• DynamoDB
• Web serving layer: EC2 OR Containers OR Serverless
• Users

Page 14

Dashboards + Search

[Diagram components]
• Data Lake on Amazon S3: Raw Bucket, ETL (Amazon EMR OR AWS Glue), Transformed Data Bucket
• DynamoDB
• Dynamo Streams
• AWS Lambda
• Amazon Kinesis Firehose
• Amazon Elasticsearch
• Users
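
One way to read the Dashboards + Search diagram is that item changes flow from DynamoDB through Dynamo Streams into an AWS Lambda function, which forwards them to Amazon Kinesis Firehose for delivery to Amazon Elasticsearch. The sketch below assumes that ordering, since the slide text does not spell out the arrows. The delivery stream name `dashboard-metrics` and the helper names are hypothetical, only the `S`, `N`, and `BOOL` attribute types are handled, and the table's stream is assumed to include new images. The Firehose client is passed in as a parameter, so a real function could supply `boto3.client("firehose")` there.

```python
import json

DELIVERY_STREAM = "dashboard-metrics"  # hypothetical Firehose delivery stream name


def plain_value(attr):
    """Convert one DynamoDB typed attribute, e.g. {"N": "3"}, to a Python value."""
    if "S" in attr:
        return attr["S"]
    if "N" in attr:
        text = attr["N"]  # DynamoDB sends numbers as strings
        try:
            return int(text)
        except ValueError:
            return float(text)
    if "BOOL" in attr:
        return attr["BOOL"]
    raise ValueError(f"unsupported attribute type: {sorted(attr)}")


def to_firehose_records(event):
    """Turn INSERT/MODIFY stream records into newline-delimited JSON payloads."""
    records = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE events carry no new image to index
        image = record.get("dynamodb", {}).get("NewImage")
        if image is None:
            continue  # stream view type does not include new images
        doc = {name: plain_value(attr) for name, attr in image.items()}
        payload = json.dumps(doc, sort_keys=True) + "\n"
        records.append({"Data": payload.encode("utf-8")})
    return records


def handler(event, context, firehose=None):
    """Lambda-style entry point; `firehose` is an injected Firehose client."""
    records = to_firehose_records(event)
    if records and firehose is not None:
        # PutRecordBatch accepts at most 500 records per call.
        for start in range(0, len(records), 500):
            firehose.put_record_batch(
                DeliveryStreamName=DELIVERY_STREAM,
                Records=records[start:start + 500],
            )
    return {"forwarded": len(records)}
```

Injecting the client keeps the transformation testable without AWS credentials; the Elasticsearch side is then a matter of Firehose delivery configuration rather than code.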

Page 15

Reports

Static representations of data rendered at a point in time
• Usage - Point in time data extraction
• Data structure - High
• Data temperature - Cold

Available Services:
• Amazon Redshift
• Athena
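
For a point-in-time report served by Athena, the request largely comes down to a SQL string, a database, and an S3 location for results. The sketch below only builds the keyword arguments that boto3's Athena `start_query_execution` call accepts (`QueryString`, `QueryExecutionContext`, `ResultConfiguration`); it does not contact AWS. The database `datalake`, the table `sales_transformed`, its columns, and the results bucket are all hypothetical.

```python
from datetime import date


def build_report_query(report_day, table="sales_transformed"):
    """SQL for a report of totals per region as of `report_day` (hypothetical schema)."""
    # The date literal is generated from a date object, not from user text.
    return (
        "SELECT region, SUM(amount) AS total_amount "
        f"FROM {table} "
        f"WHERE order_date <= DATE '{report_day.isoformat()}' "
        "GROUP BY region ORDER BY region"
    )


def athena_request(report_day,
                   database="datalake",
                   output="s3://example-athena-results/reports/"):
    """Keyword arguments for an Athena start_query_execution call."""
    return {
        "QueryString": build_report_query(report_day),
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output},
    }
```

With credentials configured, the result could be passed along as `boto3.client("athena").start_query_execution(**athena_request(date(2019, 4, 9)))`.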

Page 16

Ad Hoc Analysis

Information sought on an as-needed basis
• Usage - Dynamic data querying
• Data structure - Case based
• Data temperature - Medium - cold

Available Services:
• Amazon Redshift
• Athena
• Amazon EMR
• Amazon Elasticsearch

Page 17

Reports and Ad-Hoc Analysis

[Diagram components]
• Data Lake on Amazon S3: Raw Bucket, ETL (Amazon EMR OR AWS Glue), Transformed Data Bucket
• Amazon Redshift OR Athena
• Amazon QuickSight

Page 18

Machine Learning

Data labeled with outcomes to train prediction models
• Usage - Machine learning data preparation
• Data structure - Case based
• Data temperature - Medium - cold

Available Services:
• Amazon EMR
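
The four consumption patterns (Dashboards, Reports, Ad Hoc Analysis, Machine Learning) share the same attributes, so they can be captured as a small lookup table. The sketch below copies the usage, data structure, data temperature, and available services from the slides; the class and function names are hypothetical, and matching a temperature by substring is a simplification introduced here, not something the deck prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumptionPattern:
    name: str
    usage: str
    data_structure: str
    data_temperature: str
    services: tuple


# Values transcribed from the Dashboards, Reports, Ad Hoc Analysis,
# and Machine Learning slides.
PATTERNS = (
    ConsumptionPattern(
        "Dashboards", "Near real-time visualization", "Low", "Hot",
        ("Lambda", "DynamoDB + Streams", "Elasticsearch", "Amazon Kinesis Firehose"),
    ),
    ConsumptionPattern(
        "Reports", "Point in time data extraction", "High", "Cold",
        ("Amazon Redshift", "Athena"),
    ),
    ConsumptionPattern(
        "Ad Hoc Analysis", "Dynamic data querying", "Case based", "Medium - cold",
        ("Amazon Redshift", "Athena", "Amazon EMR", "Amazon Elasticsearch"),
    ),
    ConsumptionPattern(
        "Machine Learning", "Machine learning data preparation", "Case based",
        "Medium - cold", ("Amazon EMR",),
    ),
)


def patterns_for(temperature):
    """Names of patterns whose temperature range mentions `temperature`."""
    wanted = temperature.lower()
    return [p.name for p in PATTERNS if wanted in p.data_temperature.lower()]


def services_for(pattern_name):
    """Services the slides list for one pattern (case-insensitive name match)."""
    for pattern in PATTERNS:
        if pattern.name.lower() == pattern_name.lower():
            return list(pattern.services)
    raise KeyError(pattern_name)
```

For example, `patterns_for("hot")` narrows the choice to Dashboards, and `services_for("Reports")` returns the two services listed on the Reports slide.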

Page 19

Reports and Ad-Hoc Analysis

[Diagram components]
• Data Lake on Amazon S3: Raw Bucket, ETL (Amazon EMR OR AWS Glue), Transformed Data Bucket
• Amazon EMR
• Users

Page 20

What else?

Athena?

Page 21

Processing & Analytics

[Diagram: AWS services grouped by workload]
• Transactional & RDBMS: DynamoDB (NoSQL DB), Aurora (Relational Database)
• BI & Data Visualization
• Kinesis Streams & Firehose
• Batch: EMR (Hadoop, Spark, Presto), Redshift (Data Warehouse), Athena (Query Service), AWS Batch
• Predictive
• Real-time: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Elasticsearch Service, Kinesis Analytics, Kinesis Streams
• ElastiCache, DAX

Page 22

Thank you!