© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jon Handler, Principal Solutions Architect November 29, 2016 Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana BDM302

AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana (BDM302)

Jan 06, 2017

Transcript
Page 1: AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana (BDM302)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Jon Handler, Principal Solutions Architect

November 29, 2016

Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana

BDM302

Page 2

What to do with a terabyte of logs?

Page 3

Page 4

What to Expect from the Session

Figure: session roadmap: (1) data source → (2) Amazon Kinesis Firehose → (3) Amazon Elasticsearch Service → (4) Kibana, then (5) Query DSL

Page 5

Demo: create an Amazon ES domain

Page 6

An index is a collection of documents, divided into shards

Figure: documents, each with an ID, go through indexing and compression into an index split across Shard 1, Shard 2, Shard 3, and Shard 4
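To make the shard idea concrete, here is a minimal Python sketch of how a document could be routed to exactly one shard. Elasticsearch computes `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id` and the hash is Murmur3; this sketch substitutes `zlib.crc32` as the hash, so its shard numbers will not match a real cluster. The function names are illustrative.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Route a document: hash(routing value) % number of primary shards.

    Elasticsearch hashes the _routing value (the _id by default) with Murmur3;
    zlib.crc32 stands in here, so shard numbers differ from a real cluster.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards: int) -> Counter:
    """Count how many documents land on each shard."""
    return Counter(shard_for(d, num_primary_shards) for d in doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(16)]
    print("docs per shard:", dict(distribute(ids, 4)))
    # Re-routing the same IDs with a different shard count moves many of them.
    moved = sum(shard_for(d, 4) != shard_for(d, 5) for d in ids)
    print(f"{moved} of {len(ids)} documents would move with 5 shards")
```

Because the modulus is the primary shard count, changing that count would send most existing documents to different shards. That is why the shard count is fixed when an index is created.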

Page 7

Deployment of indices to a cluster

• Index 1
  – Shard 1
  – Shard 2
  – Shard 3

• Index 2
  – Shard 1
  – Shard 2
  – Shard 3

Figure: Amazon ES cluster (legend: Primary, Replica). Instance 1 (master) holds shard copies 1, 3, 3, 1; Instance 2 holds 2, 1, 1, 2; Instance 3 holds 3, 2, 2, 3.

Page 8

How many instances?

The index size will be about the same as the corpus of source documents

• Double this if you are deploying an index replica

Size based on storage requirements

• Either local storage or up to 512 GB of Amazon Elastic Block Store (EBS) per instance

• Example: a 2 TB corpus will need 8 instances
  – Assuming a replica and using EBS
  – With i2.2xlarge nodes using 1.6 TB of ephemeral storage, 4 nodes would be enough
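The sizing rule on this slide reduces to one line of arithmetic. This Python sketch (the function name is hypothetical) multiplies the corpus by one plus the number of replicas and divides by the storage available per instance, rounding up. It counts storage only and ignores any other overhead.

```python
import math


def instances_needed(corpus_gb: float, replicas: int, storage_per_node_gb: float) -> int:
    """Storage-only estimate of data instances.

    Index size is taken as roughly equal to the corpus, and each replica
    adds another full copy.
    """
    total_gb = corpus_gb * (1 + replicas)
    return math.ceil(total_gb / storage_per_node_gb)


if __name__ == "__main__":
    print(instances_needed(2048, 1, 512))   # EBS, 512 GB per instance
    print(instances_needed(2048, 1, 1600))  # i2.2xlarge, 1.6 TB instance storage
```

For the slide's example, 2,048 GB × 2 / 512 GB gives 8 EBS-backed instances. With 1,600 GB of instance storage the storage-only floor is 3, so the slide's 4 i2.2xlarge nodes leave headroom above that floor.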

Page 9

Cluster with no dedicated masters

Figure: Amazon ES cluster in which Instance 1 is the master and also holds data (shard copies 1, 3, 3, 1); Instance 2 holds 2, 1, 1, 2; Instance 3 holds 3, 2, 2, 3.

Page 10

Cluster with dedicated masters

Figure: Amazon ES cluster with dedicated master nodes, separate from the data nodes. The data nodes handle queries and updates: Instance 1 holds shard copies 1, 3, 3, 1; Instance 2 holds 2, 1, 1, 2; Instance 3 holds 3, 2, 2, 3.
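Dedicated masters are usually deployed in odd numbers because electing a master requires a majority of the master-eligible nodes, floor(n/2) + 1 (the value Elasticsearch 2.x expects in `discovery.zen.minimum_master_nodes`). Amazon ES manages this setting itself; the Python sketch below only illustrates the arithmetic behind the three-master recommendation in the best practices that follow. Function names are illustrative.

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes needed to elect a master."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can fail while a majority still remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master(s): quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Two masters tolerate no failures (losing either one loses the majority), while three tolerate one, which makes three the smallest dedicated-master count that survives a node loss.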

Page 11

Cluster with zone awareness

Figure: Amazon ES cluster of four instances (Instance 1 to Instance 4) split across Availability Zone 1 and Availability Zone 2, with copies of shards 1 to 3 spread across both zones.
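The invariant behind zone awareness is that the two copies of a shard never share an Availability Zone, so losing one zone still leaves a complete copy of the data. The toy Python allocator below (zone and instance names are hypothetical) places each primary and its replica in different zones and round-robins within each zone. Elasticsearch's real shard allocator is considerably more involved; this only demonstrates the invariant.

```python
from collections import defaultdict
from itertools import cycle


def place_zone_aware(num_shards: int, zones: dict) -> dict:
    """Place one primary and one replica per shard in different zones.

    zones maps zone name -> list of instance names; this sketch assumes two zones.
    Returns instance -> list of (shard, role).
    """
    zone_names = sorted(zones)
    if len(zone_names) != 2:
        raise ValueError("this sketch assumes exactly two zones")
    # Round-robin over each zone's instances so copies spread evenly.
    rr = {z: cycle(zones[z]) for z in zone_names}
    placement = defaultdict(list)
    for shard in range(1, num_shards + 1):
        # Alternate the primary's zone so primaries are balanced across zones.
        primary_zone = zone_names[shard % 2]
        replica_zone = zone_names[(shard + 1) % 2]
        placement[next(rr[primary_zone])].append((shard, "primary"))
        placement[next(rr[replica_zone])].append((shard, "replica"))
    return dict(placement)


def zones_holding(placement: dict, zones: dict) -> dict:
    """Map each shard to the set of zones that hold a copy of it."""
    zone_of = {inst: z for z, insts in zones.items() for inst in insts}
    result = defaultdict(set)
    for inst, copies in placement.items():
        for shard, _role in copies:
            result[shard].add(zone_of[inst])
    return dict(result)


if __name__ == "__main__":
    az = {"az1": ["instance-1", "instance-2"], "az2": ["instance-3", "instance-4"]}
    layout = place_zone_aware(3, az)
    for instance in sorted(layout):
        print(instance, layout[instance])
    print(zones_holding(layout, az))
```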

Page 12

Best practices

Data nodes = Storage needed/Storage per node

Use General Purpose SSD (gp2) EBS volumes

Use 3 dedicated master nodes for production deployments

Enable zone awareness

Set indices.fielddata.cache.size = 40 (percent of the Java heap)
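One way to apply these practices when creating a domain programmatically is the boto3 `es` client's `create_elasticsearch_domain` call. The Python sketch below only builds the keyword arguments; the helper name, the instance types, and the Elasticsearch version are placeholder assumptions. It also assumes that zone awareness requires an even number of data nodes, as it did when zone awareness spanned two zones.

```python
import math


def domain_request(domain_name, storage_needed_gb, storage_per_node_gb=512,
                   data_instance_type="m4.large.elasticsearch",      # placeholder
                   master_instance_type="m4.large.elasticsearch"):   # placeholder
    """Keyword arguments for boto3's es.create_elasticsearch_domain."""
    # Data nodes = storage needed / storage per node, rounded up.
    data_nodes = math.ceil(storage_needed_gb / storage_per_node_gb)
    if data_nodes % 2:
        data_nodes += 1  # assumption: zone awareness needs an even node count
    return {
        "DomainName": domain_name,
        "ElasticsearchVersion": "2.3",  # placeholder version
        "ElasticsearchClusterConfig": {
            "InstanceType": data_instance_type,
            "InstanceCount": data_nodes,
            "DedicatedMasterEnabled": True,
            "DedicatedMasterType": master_instance_type,
            "DedicatedMasterCount": 3,
            "ZoneAwarenessEnabled": True,
        },
        "EBSOptions": {
            "EBSEnabled": True,
            "VolumeType": "gp2",
            "VolumeSize": storage_per_node_gb,
        },
        "AdvancedOptions": {"indices.fielddata.cache.size": "40"},
    }


if __name__ == "__main__":
    params = domain_request("weblogs", 4096)
    print(params["ElasticsearchClusterConfig"])
    # With credentials configured, the request would be sent with:
    # boto3.client("es").create_elasticsearch_domain(**params)
```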

Page 13

Amazon Elasticsearch Service overview

Figure: Amazon ES architecture showing Amazon Route 53, Elastic Load Balancing, AWS IAM, Amazon CloudWatch, AWS CloudTrail, and the Elasticsearch API

Page 14

Amazon Elasticsearch Service benefits

Easy to use

Open-source compatible

Secure

Highly available

AWS integrated

Scalable

Page 15

Kinesis Firehose

Page 16

Kinesis Firehose overview

Delivery Stream: Underlying AWS resource

Destination: Amazon ES, Amazon Redshift, or Amazon S3

Record: Put records in streams to deliver to destinations

Page 17

Firehose delivery architecture today

Figure: the data source sends source records to a Firehose delivery stream, which delivers them to Amazon Elasticsearch Service; source records and delivery failures are also written to S3 (intermediate Amazon S3 bucket, backup S3 bucket).
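Firehose buffers incoming records and delivers a batch when either the buffer size or the buffer interval is reached, whichever comes first; for an Amazon ES destination these are configurable within roughly 1 to 100 MB and 60 to 900 seconds. The Python class below is a toy model of that behavior, not Firehose code: the class name and the `deliver` callback are hypothetical. Real Firehose also retries a failed delivery for a configurable duration before writing the records to the backup S3 bucket, which this sketch skips.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BufferSketch:
    """Toy Firehose-style buffer: flush on size OR interval, whichever first."""
    max_bytes: int
    max_seconds: float
    deliver: Callable[[List[bytes]], bool]  # returns False on delivery failure
    backup: List[bytes] = field(default_factory=list)  # stands in for the S3 bucket
    _buf: List[bytes] = field(default_factory=list)
    _size: int = 0
    _opened_at: float = 0.0

    def put(self, record: bytes, now: float) -> None:
        if not self._buf:
            self._opened_at = now  # the interval starts with the first buffered record
        self._buf.append(record)
        self._size += len(record)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> None:
        if self._buf and (self._size >= self.max_bytes
                          or now - self._opened_at >= self.max_seconds):
            batch, self._buf, self._size = self._buf, [], 0
            if not self.deliver(batch):
                # No retries here: failed batches go straight to backup.
                self.backup.extend(batch)
```

A smaller `max_bytes` flushes more often, which is the trade-off behind the buffer-size advice on the best-practices slide for this section.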

Page 18

Coming soon! Firehose delivery architecture with transformations

Figure: the data source sends source records to a Firehose delivery stream, which transforms them and delivers the transformed records to Amazon Elasticsearch Service; source records, transformed records, transformation failures, and delivery failures are written to S3 (intermediate Amazon S3 bucket, backup S3 bucket).

Page 19

Kinesis Firehose features for ingest

Serverless scale

Error handling

S3 backup

Page 20

Demo: create a Kinesis Firehose stream

Page 21

Best practices

Use smaller buffer sizes to increase throughput, but be careful of concurrency

Use index rotation based on sizing

Default stream limits: 2,000 transactions/second, 5,000 records/second, and 5 MB/second
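A quick way to see which default limit binds first is to express each one as a records-per-second ceiling. The Python sketch below uses the slide's three defaults, treats 5 MB as 5 MiB (an assumption), and uses PutRecordBatch's cap of 500 records per call; the constant and function names are illustrative.

```python
MAX_TRANSACTIONS_PER_S = 2_000      # default API calls (transactions) per second
MAX_RECORDS_PER_S = 5_000           # default records per second
MAX_BYTES_PER_S = 5 * 1024 * 1024   # default 5 MB/second, treated as MiB here
RECORDS_PER_BATCH = 500             # PutRecordBatch accepts up to 500 records per call


def max_records_per_second(avg_record_bytes: int,
                           records_per_call: int = RECORDS_PER_BATCH) -> int:
    """Sustained records/second allowed by whichever default limit binds first."""
    by_records = MAX_RECORDS_PER_S
    by_bytes = MAX_BYTES_PER_S // avg_record_bytes
    by_transactions = MAX_TRANSACTIONS_PER_S * records_per_call
    return min(by_records, by_bytes, by_transactions)


if __name__ == "__main__":
    print(max_records_per_second(200))                      # small records, batched
    print(max_records_per_second(2048))                     # 2 KB records, batched
    print(max_records_per_second(200, records_per_call=1))  # one record per call
```

With records of about 200 bytes the 5,000 records/second limit binds; at 2 KB per record the byte limit lowers the ceiling to 2,560; and sending one record per call caps throughput at 2,000 records/second, which is why batching matters.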

Page 22

Upload template and data

Page 23

Number of shards = index size / 30 GB

Define the number of shards when you create the index

Less is more

Writes occupy 1 shard, reads occupy all shards

Figure: Amazon ES cluster; Instance 1 (master) holds shard copies 1, 3, 3, 1; Instance 2 holds 2, 1, 1, 2; Instance 3 holds 3, 2, 2, 3.
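The 30 GB rule combines naturally with index rotation: the rotation period decides how large each index grows, and that size sets its shard count. In this Python sketch the 10 GB/day volume is a made-up figure and the helper name is illustrative.

```python
import math

TARGET_SHARD_GB = 30  # rule of thumb from the slide


def shards_for(index_size_gb: float) -> int:
    """Primary shards = index size / 30 GB, rounded up, with at least 1."""
    return max(1, math.ceil(index_size_gb / TARGET_SHARD_GB))


if __name__ == "__main__":
    daily_gb = 10  # hypothetical daily log volume
    print("daily index:", shards_for(daily_gb), "shard(s)")
    print("weekly index:", shards_for(daily_gb * 7), "shard(s)")
    print("monthly index:", shards_for(daily_gb * 30), "shard(s)")
```

At 10 GB/day, a daily index needs 1 shard, a weekly index 3, and a monthly index 10.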

Page 24

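Mapping controls how data is indexed

not_analyzed text is best for Kibana visualizations

Define a _template to apply to all new indexes

The template also defines the number of shards

Figure: the index writer builds an inverted index mapping each term (with its term ID) to the IDs of the documents that contain it:
0 delete: 1, 3, 5
1 get: 2, 3, 4, 6
2 head: 1, 7, 9
3 post: 2, 8
4 put: 24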
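The difference between analyzed and not_analyzed fields shows up directly in the inverted index. This Python sketch builds one for two request paths. The `analyzed` tokenizer (lowercase, split on non-alphanumeric characters) is a rough stand-in, not Elasticsearch's standard analyzer, and the sample paths are illustrative.

```python
import re
from collections import defaultdict


def analyzed(value: str):
    """Rough stand-in for an analyzed field: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", value.lower()) if t]


def not_analyzed(value: str):
    """not_analyzed: the whole value is indexed as a single term."""
    return [value]


def build_inverted_index(docs: dict, tokenize) -> dict:
    """Map each term to the sorted IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in tokenize(value):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


if __name__ == "__main__":
    requests = {1: "/images/KSC-logosmall.gif", 2: "/images/NASA-logosmall.gif"}
    print(build_inverted_index(requests, analyzed))
    print(build_inverted_index(requests, not_analyzed))
```

With the analyzed version, a Kibana terms chart would show fragments such as `images` and `gif` as top values; with not_analyzed, each bucket is a complete request path.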

Page 25

Transform log lines to search documents

d104.aa.net - - [01/Jul/1995:00:00:15 -0400] "GET /images/KSC-logosmall.gif HTTP/1.0" 200 1204

{"status": 200, "ident": "-", "@timestamp": "1995-07-01T00:00:15", "request": "/images/KSC-logosmall.gif HTTP/1.0", "auth": "-", "host": "d104.aa.net", "verb": "GET", "time": "01/Jul/1995:00:00:15 -0400", "size": 1204}
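This transformation can be done with a regular expression for the Apache Common Log Format. The Python sketch below produces the same field names as the slide's document. Mapping a `-` size to 0 and keeping the local wall-clock time in `@timestamp` (dropping the UTC offset) are assumptions chosen to match the slide's output format; the names `CLF` and `parse_line` are illustrative.

```python
import re
from datetime import datetime

# Apache Common Log Format: host ident auth [time] "verb request" status size
CLF = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line: str):
    """Turn one log line into a search document, or None if it doesn't match."""
    m = CLF.match(line.strip())
    if m is None:
        return None
    doc = m.groupdict()
    doc["status"] = int(doc["status"])
    doc["size"] = 0 if doc["size"] == "-" else int(doc["size"])  # assumption: "-" -> 0
    ts = datetime.strptime(doc["time"], "%d/%b/%Y:%H:%M:%S %z")
    # Like the slide, keep the local wall-clock time and drop the offset.
    doc["@timestamp"] = ts.strftime("%Y-%m-%dT%H:%M:%S")
    return doc


if __name__ == "__main__":
    sample = ('d104.aa.net - - [01/Jul/1995:00:00:15 -0400] '
              '"GET /images/KSC-logosmall.gif HTTP/1.0" 200 1204')
    print(parse_line(sample))
```

If you would rather store true UTC, convert with `ts.astimezone(timezone.utc)` (importing `timezone` from `datetime`) before formatting.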

Page 26

Send_data method
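The transcript names a `send_data` method but does not include its code. The Python sketch below is a hypothetical reconstruction: it serializes each document as JSON and sends batches through the boto3 Firehose client's `put_record_batch` call, keeping every batch within that API's limits of 500 records and 4 MiB per request. The client is passed in, so the example can be exercised without AWS credentials; `batches` is an illustrative helper name.

```python
import json

MAX_RECORDS_PER_CALL = 500            # PutRecordBatch: records per request
MAX_BYTES_PER_CALL = 4 * 1024 * 1024  # PutRecordBatch: 4 MiB per request


def batches(docs):
    """Yield lists of Firehose records, each within PutRecordBatch's limits."""
    batch, size = [], 0
    for doc in docs:
        data = json.dumps(doc).encode("utf-8")
        # Start a new batch if adding this record would exceed either limit.
        if batch and (len(batch) == MAX_RECORDS_PER_CALL
                      or size + len(data) > MAX_BYTES_PER_CALL):
            yield batch
            batch, size = [], 0
        batch.append({"Data": data})
        size += len(data)
    if batch:
        yield batch


def send_data(client, stream_name, docs):
    """Send documents to a delivery stream; return how many records failed.

    `client` is expected to be a Firehose client, e.g. boto3.client("firehose").
    """
    failed = 0
    for records in batches(docs):
        response = client.put_record_batch(DeliveryStreamName=stream_name,
                                           Records=records)
        failed += response["FailedPutCount"]
    return failed
```

Production code should also resend the individual records whose entries in the response's `RequestResponses` list carry an `ErrorCode`; this sketch only counts them through `FailedPutCount`.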

Page 27

Demo: upload template, send logs to Firehose

Page 28

Best practices

• Use a template for settings

• Set number of shards based on 30 GB per shard

• Best case, 1 active shard per node

• For analysis use cases, set not_analyzed on all fields
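A template that applies these settings might look like the following. The Python sketch builds the request body for `PUT _template/logs` in Elasticsearch 2.x syntax, matching the `not_analyzed` terminology used here. The template name, the `logs-*` index pattern, and the dynamic-template name are placeholders.

```python
import json


def log_template(pattern="logs-*", shards=1, replicas=1):
    """Elasticsearch 2.x index template: fixed shard count, and every
    dynamically mapped string field indexed as not_analyzed."""
    return {
        "template": pattern,  # applies to every new index matching this pattern
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "_default_": {
                "dynamic_templates": [
                    {
                        "strings_not_analyzed": {
                            "match_mapping_type": "string",
                            "mapping": {"type": "string", "index": "not_analyzed"},
                        }
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    # Body for: PUT _template/logs
    print(json.dumps(log_template(shards=2), indent=2))
```

In Elasticsearch 5.x the `string` type is replaced by `text` and `keyword`, and a `keyword` field plays the role of a not_analyzed string.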

Page 29

Analyze Apache web logs

Page 30

Amazon ES aggregations

Buckets – a collection of documents meeting some criterion

Metrics – calculations on the content of buckets

Figure: example visualization (bucket: time; metric: count)
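Bucket-plus-metric is easy to mimic locally. The Python sketch below groups timestamps into hourly buckets and counts each one, and alongside it shows an Amazon ES request body that asks the cluster for roughly the same thing with a `date_histogram` aggregation. The function and aggregation names are illustrative, and the `@timestamp` field and format follow the log documents shown earlier.

```python
from collections import Counter
from datetime import datetime


def count_per_hour(timestamps):
    """Bucket: truncate each @timestamp to the hour. Metric: count per bucket."""
    buckets = Counter(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").replace(minute=0, second=0)
        for ts in timestamps
    )
    return sorted(buckets.items())


# Roughly equivalent request body for Amazon ES: a date_histogram bucket
# aggregation, whose buckets each carry a doc_count metric.
HOURLY_COUNT_QUERY = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "per_hour": {"date_histogram": {"field": "@timestamp", "interval": "1h"}}
    },
}

if __name__ == "__main__":
    sample = ["1995-07-01T00:00:15", "1995-07-01T00:59:59", "1995-07-01T01:00:00"]
    for hour, count in count_per_hour(sample):
        print(hour.isoformat(), count)
```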

Page 31

Best practices

Make sure that your fields are not_analyzed

Visualizations are based on buckets/metrics

Use a histogram on the x-axis first, then sub-aggregate
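Following the last practice, a Kibana-style chart of requests over time split by status corresponds roughly to a `date_histogram` with a `terms` sub-aggregation. This Python sketch only builds the Query DSL body; the field names assume the log documents shown earlier, and the aggregation names are illustrative.

```python
import json


def histogram_with_subagg(time_field="@timestamp", interval="1h",
                          split_field="status", top=5):
    """Query DSL body: date_histogram on the x-axis first, then a terms
    sub-aggregation that splits each time bucket by another field."""
    return {
        "size": 0,  # aggregations only, no hits
        "aggs": {
            "over_time": {
                "date_histogram": {"field": time_field, "interval": interval},
                "aggs": {
                    "by_" + split_field: {"terms": {"field": split_field, "size": top}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(histogram_with_subagg(), indent=2))
```

Because the terms sub-aggregation runs on a not_analyzed field, each sub-bucket is a whole value (for example a full status code or request path) rather than a token fragment.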

Page 32

Run Elasticsearch in the AWS Cloud with Amazon Elasticsearch Service

Use Kinesis Firehose to ingest data simply

Kibana for monitoring, Elasticsearch queries for deeper analysis

Page 33

What to do next

Qwiklab:

https://qwiklabs.com/searches/lab?keywords=introduction%20to%20amazon%20elasticsearch%20service

Centralized logging solution

https://aws.amazon.com/answers/logging/centralized-logging/

Our overview page on AWS

https://aws.amazon.com/elasticsearch-service/

Page 34

Thank you!

Page 35

Remember to complete your evaluations!