Spark and Elasticsearch for real-time data analysis
Costin Leau, @costinl, Elastic

Spark and Elasticsearch for real-time data analysis · What is Elasticsearch? Scalable, real-time search and analytics engine Open-source (on Github, Apache 2 License)

Jul 02, 2018

Transcript
Page 1

Spark and Elasticsearch for real-time data analysis

Costin Leau, @costinl, Elastic

Page 2

What is Elasticsearch?

Scalable, real-time search and analytics engine

Open-source (on Github, Apache 2 License)

Page 3

Page 4

Unstructured search

Page 5

Sorting

Page 6

Pagination

Page 7

Enrichment

Page 8

Suggestions

Page 9

Structured search

Page 10

Aggregations

Page 11

Page 12

Elasticsearch for Apache Hadoop

Page 13

Map/Reduce integration

Page 14

Scala API

Page 15

Java API

Page 16

Spark SQL support

Page 17

Spark SQL Data Sources

Page 18

Partition-to-Partition Architecture

Page 19

Dynamic Runtime Matching

Page 20

Failure Handling

Page 21

Co-location

Page 22

Reacting to streaming data

Page 23

Live loops

Data keeps on changing

Adapt the set of rules

Improves reaction time

Build a model for fast decision making

Keeps the prevention rate high

Categorize data on the fly
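
The loop described on this slide (data keeps changing, the rule adapts, events are categorized as they arrive) can be illustrated with a small plain-Python sketch. It is not taken from the talk: the `AdaptiveRule` class, its `factor` parameter, and the sample values are all hypothetical, and a running mean stands in for whatever model a real deployment would build.

```python
class AdaptiveRule:
    """Hypothetical rule: flag values far above a running mean."""

    def __init__(self, factor=3.0):
        self.factor = factor  # assumed multiplier, not from the slides
        self.count = 0        # number of "normal" events seen so far
        self.mean = 0.0       # running mean of the "normal" events

    def categorize(self, value):
        # Decide using the model as it stands right now.
        suspicious = self.count > 0 and value > self.factor * self.mean
        if not suspicious:
            # Fold the event into the model so the rule follows the data.
            self.count += 1
            self.mean += (value - self.mean) / self.count
        return "suspicious" if suspicious else "normal"


rule = AdaptiveRule()
labels = [rule.categorize(v) for v in [10, 12, 11, 90, 13]]
print(labels)
```

Each decision is a constant-time comparison against the current model, which is the sense in which a prebuilt model allows fast decisions; only events judged normal update the mean, so a single outlier does not shift the threshold.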

Page 24

Finding interesting data: basic approach

Page 25

Finding interesting data: analytics

Page 26

Finding interesting data: through an ML model

Page 27

MLlib integration (work in progress)

Hashing and featurize functions

Expose the Elasticsearch engine data structures

term vectors

term frequency

document frequency

(vectorize API in the works)
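
The terms on this slide (term frequency, document frequency, hashing-based featurization) can be illustrated independently of any library. The sketch below is plain Python and is not the connector's API, which the slide describes as still in progress; the function names, the whitespace tokenizer, and the bucket count of 16 are assumptions made for illustration only.

```python
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


def term_frequency(doc):
    """Count how often each term occurs in a single document."""
    return Counter(tokenize(doc))


def document_frequency(docs):
    """Count in how many documents each term occurs at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def hashed_features(doc, num_buckets=16):
    """Hashing trick: map each term to a bucket and sum the term counts."""
    vector = [0] * num_buckets
    for term, count in term_frequency(doc).items():
        # crc32 is stable across runs, unlike Python's built-in hash().
        index = zlib.crc32(term.encode("utf-8")) % num_buckets
        vector[index] += count
    return vector


docs = [
    "spark reads from elasticsearch",
    "elasticsearch indexes data",
    "spark spark streaming",
]
tf = term_frequency(docs[2])
df = document_frequency(docs)
features = hashed_features(docs[2])
print(tf["spark"], df["spark"], sum(features))
```

The hashed vector has a fixed length regardless of vocabulary size, at the cost of possible collisions between terms that land in the same bucket.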

Page 28

Thank you!

@costinl github.com/elastic

elastic.co