© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Olivier Klein, Solutions Architect, AWS. 23rd June 2015. Cloud & Big Data Analytics Summit 2015, Hong Kong. Real-Time Analytics at Scale in the AWS Cloud

Apr 13, 2018

Page 1: Real-Time Analytics at Scale in the AWS Cloud · AWS Data Pipeline Amazon S3 Amazon Lambda Amazon EC2 Amazon Glacier . Stream in Real Time: Amazon Kinesis ... 200 additional servers

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Olivier Klein

Solutions Architect, AWS

23rd June 2015

Cloud & Big Data Analytics Summit 2015

Hong Kong

Real-Time Analytics at Scale in the AWS Cloud

Page 2:

Three Types of Data Analytics

• Retrospective analysis and reporting
• Here-and-now real-time processing and dashboards
• Predictions to enable smart apps

Page 4:

How Fast is Real-Time?

Page 5:

“There’s no such thing as real time. There’s only near-real time. Typically when we talk about real-time, what we mean is architectures that allow you to respond to data without persisting it to a database first!”

John Akred, CTO, Silicon Valley Data Science

Page 6:

So what is near real-time?

• Ability to process data as it arrives
• Roughly speaking, process data in “the present” rather than “the future”
• But what is “the present”?
  • eCommerce – Attention span of a potential customer
  • Options Trader – Milliseconds
  • Guided Missile – Microseconds

Page 7:

Solution: Stream Processing

• Stream “storage” that lets you process events as they come in and react accordingly

Page 8:

A high-throughput distributed messaging system.

Page 9:

What do we expect from a real-time data stream?

Page 10:

Real-Time Data Stream Expectations

• What do we expect from a real-time data stream?
  • Highly Available
  • Fully Scalable
  • Fault Tolerant
  • (Temporarily) Durable
• How can we achieve this?
  • Multiple Datacenter Facilities
  • Auto-Scalable Server Infrastructure
  • Global Load-Balancers
  • etc.

Page 11:
Page 12:

AWS Global Infrastructure

• 11 Regions: N. Virginia, Oregon, Northern California, GovCloud, São Paulo, Ireland, Frankfurt, Singapore, Tokyo, Sydney, Beijing
• 29 Availability Zones
• 53 Edge Locations
• Continuous Expansion

Page 13:

Amazon Web Services

• Infrastructure: Regions, Availability Zones, Edge Locations
• Core Services: Compute, Storage, Database, Networking
• Platform Services: Analytics, App Deployment, Mobile
• Access Control, Auditing, Monitoring, Encryption, Security
• Virtual Desktops, Collaboration & Sharing, App Delivery, E-Mail, Applications
• API & SDKs

Page 15:

Let’s simplify Big Data with AWS!

Page 16:

Simplified Big Data Pipeline

Data → Ingest → Store → Process → Visualize → Answers

[Diagram: a time arrow runs from Data to Answers]

Page 17:

Ingest → Store → Process → Visualize

[Diagram: AWS services placed along the pipeline: Amazon Kinesis, AWS Import/Export, Amazon Mobile Analytics, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Glacier, Amazon EC2, Amazon EMR, Amazon Redshift, AWS Lambda, Amazon Machine Learning, Amazon CloudSearch, AWS Data Pipeline]

Page 19:

Stream in Real Time: Amazon Kinesis

• Real-time data processing over large distributed streams
• Elastic capacity that scales to millions of events per second
• React in real time to incoming stream events
• Reliable stream storage replicated across 3 facilities
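
How a distributed stream spreads events over its shards can be sketched as follows. Kinesis hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains that value. The even split of the hash space and the helper names `shard_ranges` and `shard_for_key` are assumptions made for illustration; a real stream reports its own shard ranges.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_ranges(shard_count):
    """Split the 128-bit hash space into contiguous, evenly sized ranges (assumed layout)."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the MD5 hash of the key."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = shard_ranges(4)
# Hypothetical partition keys; the same key always lands on the same shard.
assignments = {key: shard_for_key(key, ranges) for key in ("user-1", "user-2", "user-3")}
```

Because the mapping depends only on the key, all events for one key stay ordered within one shard, while adding shards widens total throughput.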

Page 20:

Kinesis for Real-Time

Page 21:

Amazon Kinesis: Produce and Consume

[Diagram: producers write to an Amazon Kinesis stream; several applications consume from it]

• Producers: HTTP Post, AWS SDKs, LOG4J, Flume, Fluentd, Kinesis Producer Library (IoT)
• Consumers: App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Decision Making Tree], App.4 [Machine Learning]
• Also shown: Amazon S3, Amazon DynamoDB, Apache Storm, Amazon EMR
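
The produce-and-consume pattern on this slide, where several applications read the same stream independently, can be sketched with an in-memory stand-in. Nothing here is the Kinesis API: `InMemoryStream`, `ConsumerApp`, and the record fields are hypothetical names chosen for illustration. The point is that each consumer tracks its own read position, so one application's progress never affects another's.

```python
class InMemoryStream:
    """Append-only record log; a stand-in for a stream, not the Kinesis API."""

    def __init__(self):
        self._records = []

    def put(self, record):
        self._records.append(record)

    def read_from(self, position, limit=100):
        """Return a batch starting at position plus the next position to read."""
        batch = self._records[position:position + limit]
        return batch, position + len(batch)


class ConsumerApp:
    """One consuming application with its own read position."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.position = 0
        self.results = []

    def poll(self, stream):
        batch, self.position = stream.read_from(self.position)
        self.results.extend(self.handler(record) for record in batch)


def make_deduplicator():
    """Build a handler that reports whether a record id is seen for the first time."""
    seen = set()

    def is_new(record):
        new = record["id"] not in seen
        seen.add(record["id"])
        return new

    return is_new


stream = InMemoryStream()
for record_id, value in ((1, 3), (2, 5), (1, 3), (3, 8)):
    stream.put({"id": record_id, "value": value})

dedupe_app = ConsumerApp("App.1", make_deduplicator())
metrics_app = ConsumerApp("App.2", lambda record: record["value"])
dedupe_app.poll(stream)
metrics_app.poll(stream)

# A later record is only seen by the applications that poll again.
stream.put({"id": 4, "value": 1})
metrics_app.poll(stream)
```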

Page 23:

React in Real Time: AWS Lambda

• Run your code in the cloud, fully managed and highly available
• Triggered through invocation or state changes in your setup
• Scales automatically to match the incoming event rate
• Can be connected to an Amazon Kinesis stream to react to every incoming event
• Charged per 100 ms of execution time
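
A function connected to a Kinesis stream receives records in batches, with each record's payload base64-encoded under `Records[i].kinesis.data`. The sketch below is a minimal handler under stated assumptions: the payload is taken to be JSON with a hypothetical `hashtags` field, and `make_event` is a local helper that builds a test event rather than anything AWS provides.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode every Kinesis record in the batch and count occurrences per hashtag."""
    counts = {}
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        tweet = json.loads(payload)  # assumed payload format: JSON
        for tag in tweet.get("hashtags", []):
            counts[tag] = counts.get(tag, 0) + 1
    return counts


def make_event(messages):
    """Build a Kinesis-shaped event locally for testing (helper, not an AWS API)."""
    return {
        "Records": [
            {
                "kinesis": {
                    "partitionKey": "demo",
                    "data": base64.b64encode(json.dumps(m).encode("utf-8")).decode("ascii"),
                }
            }
            for m in messages
        ]
    }


sample_event = make_event([{"hashtags": ["aws", "bigdata"]}, {"hashtags": ["aws"]}])
result = lambda_handler(sample_event, None)
```

Keeping the handler free of stored state fits the automatic scaling described above, since any number of copies can run against different batches.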

Page 25:

Amazon DynamoDB

Fully Managed NoSQL Database Service

• Schemaless Data Model
• Seamless scalability
• No storage or throughput limits
• Consistent low latency performance
• High durability and availability
• Replicated across 3 facilities

[Diagram: a DynamoDB table contains items; items contain attributes]
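
The table, items, and attributes model can be illustrated with a small in-memory stand-in. This is not the DynamoDB API: the `Table` class and the `tweet_id` key are hypothetical. It shows what "schemaless" means in practice, namely that only the key attribute is required and every other attribute may differ from item to item.

```python
class Table:
    """In-memory stand-in for a table keyed by one attribute; not the DynamoDB API."""

    def __init__(self, key_name):
        self.key_name = key_name
        self.items = {}

    def put_item(self, item):
        # Only the key attribute is mandatory; all other attributes are free-form.
        if self.key_name not in item:
            raise KeyError(f"item is missing key attribute {self.key_name!r}")
        self.items[item[self.key_name]] = dict(item)

    def get_item(self, key):
        return self.items.get(key)


tweets = Table("tweet_id")
tweets.put_item({"tweet_id": "1", "text": "hello", "lang": "en"})
tweets.put_item({"tweet_id": "2", "text": "hi", "retweets": 7})  # different attributes
```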

Page 26:
Page 27:

500,000 writes / second to their Amazon DynamoDB tables

200 additional servers during the Super Bowl

0 additional servers right after

Page 28:

1 instance x 100 hours = 100 instances x 1 hour
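
The equivalence can be checked with a short calculation. It assumes pure per-hour billing with no minimum charges and a job that parallelizes perfectly; the hourly rate below is a hypothetical figure, not an AWS price.

```python
from decimal import Decimal


def on_demand_cost(instances, hours, hourly_rate):
    """Cost under simple per-instance-hour billing (assumed model)."""
    return instances * hours * hourly_rate


rate = Decimal("0.10")  # hypothetical price per instance-hour
slow = on_demand_cost(1, 100, rate)   # one instance for 100 hours
fast = on_demand_cost(100, 1, rate)   # 100 instances for one hour
```

Both runs consume 100 instance-hours, so the cost is the same while the second finishes in one hundredth of the time.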

Page 29:

Let’s put it all together: Demo Time!

Page 30:

Demo: Live Twitter Feed Analysis

[Diagram components: Twitter Stream, Amazon Kinesis, AWS Lambda, Amazon DynamoDB, Amazon SNS, Amazon S3, Visualization with D3.js]

Page 31:

Demo: Live Twitter Feed Analysis

Cost of running this demo?

• Kinesis Shard: $0.15/h
• DynamoDB: $0.0065/h + $0.25/GB
• Lambda: $0.000000208/100ms
• S3: $0.03/GB

Total: $0.436502080 ~ $0.43

Highly available with virtually unlimited scalability.
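
The slide lists unit prices but not the usage behind the total. The sketch below adds them up under assumed usage: one hour of runtime, 1 GB stored in DynamoDB, 1 GB stored in S3, and a configurable number of 100 ms Lambda units. The `demo_cost` function and its parameter names are illustrative. Under these assumptions, ten Lambda units give exactly the slide's $0.436502080, while a single unit gives $0.436500208.

```python
from decimal import Decimal

# Unit prices as listed on the slide (2015 figures).
PRICES = {
    "kinesis_shard_hour": Decimal("0.15"),
    "dynamodb_hour": Decimal("0.0065"),
    "dynamodb_gb": Decimal("0.25"),
    "lambda_100ms": Decimal("0.000000208"),
    "s3_gb": Decimal("0.03"),
}


def demo_cost(hours=1, dynamodb_gb=1, s3_gb=1, lambda_units=1):
    """Sum the listed unit prices for the assumed usage quantities."""
    return (
        PRICES["kinesis_shard_hour"] * hours
        + PRICES["dynamodb_hour"] * hours
        + PRICES["dynamodb_gb"] * dynamodb_gb
        + PRICES["lambda_100ms"] * lambda_units
        + PRICES["s3_gb"] * s3_gb
    )


one_unit = demo_cost()
ten_units = demo_cost(lambda_units=10)
```

Decimal arithmetic is used so that the nine-digit Lambda price is not distorted by floating-point rounding.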

Page 32:

What’s next?

• Many AWS Services can help your Big Data Roadmap
• Talk to us at the AWS and Masterson booth to learn how to build a cost-effective data analytics platform on us
• US$50 AWS Credits to get you started

Page 33:

Thank you!

Olivier Klein

Solutions Architect, AWS

23rd June 2015

Cloud & Big Data Analytics Summit 2015

Hong Kong