Satish Mohan Your Data, Your Search Tuesday, 12 March 13
elasticsearch

May 10, 2015

Satish Mohan

Search-oriented architecture using elasticsearch
Transcript
Page 1: elasticsearch

Satish Mohan

Your Data, Your Search

Tuesday, 12 March 13

Page 2: elasticsearch

Enterprises today collect, and have access to, more data points across their ecosystems than ever before.

Page 3: elasticsearch

File Store Example

Regular File Store:

• File / Folder Navigation

• Integration - Mount Points

• Limited Metadata

• Hierarchical Structure

Page 4: elasticsearch

• Find a document from December 2011 about transfer containing proposal and David

• Find the document received from John containing David and transfer

• Find the revisions of the transfer document

File Store Example

Regular File Store:

• File / Folder Navigation

• Integration - Mount Points

• Limited Metadata

• Hierarchical Structure

Page 5: elasticsearch

• Find a document from December 2011 about transfer containing proposal and David

• Find the document received from John containing David and transfer

• Find the revisions of the transfer document

File Store Example

Regular File Store:

• File / Folder Navigation

• Integration - Mount Points

• Limited Metadata

• Hierarchical Structure

Intelligent File Store:

• Collections / Documents

• Local / Distributed Integrations

• Semantic Metadata

• Declarative Queries

• Automatic Indexing

• Provenance

• Automatic Organization

• Virtual Collections
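
The first example question can be written as a declarative query instead of a folder walk. Below is a minimal Python sketch that builds an ElasticSearch-style query body for it. The helper december_2011_query and the field names content and created_on are hypothetical, the body is only built (not sent), and query DSL details vary between ElasticSearch versions, so treat it as an illustration rather than a tested request.

```python
import json

def december_2011_query(*words):
    """Build a query-DSL-style body: every word must match, and the date must fall in December 2011.

    Hypothetical helper; the field names "content" and "created_on" are assumptions.
    """
    # One "match" clause per word, so each word has to appear in the document.
    must = [{"match": {"content": word}} for word in words]
    # Only keep documents created during December 2011.
    must.append({"range": {"created_on": {"gte": "2011-12-01", "lt": "2012-01-01"}}})
    return {"query": {"bool": {"must": must}}}

body = december_2011_query("transfer", "proposal", "David")
print(json.dumps(body, indent=2))
```

The point of the design is that the question is stated as conditions on content and metadata, not as a path through a folder hierarchy.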

Page 6: elasticsearch

ElasticSearch is an open-source, scalable, distributed, cloud-ready, highly available full-text search engine and database with powerful aggregation features. It is built on Apache Lucene, and clients communicate with it using JSON over RESTful HTTP.

Page 7: elasticsearch

Playing with ElasticSearch

[Diagram: ElasticSearch architecture. Data sources: Structured Data, Unstructured Data, Message Queues, Streams. Data Refinery stages: Capture & Curate, Index, Analyse, Search. Document Store: Inverted index, Transaction Log, Versioning, Source Document. Storage options: Memory, Shared FS, FS + Memory, Local FS.]
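
The inverted index sits at the heart of the document store in this diagram. A minimal, in-memory Python sketch of the idea follows; it is not Lucene's actual data structure (which adds compression, scoring and on-disk segments), and the class and method names are illustrative. Each term maps to the set of documents containing it, and an AND query intersects those sets.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

class InvertedIndex:
    """Toy inverted index: each term maps to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.sources = {}  # original documents, kept alongside the index

    def add(self, doc_id, text):
        self.sources[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search_all(self, query):
        """Return the ids of documents that contain every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = InvertedIndex()
index.add(1, "Transfer proposal from David")
index.add(2, "David's holiday photos")
index.add(3, "Proposal for the data transfer project")
print(sorted(index.search_all("transfer proposal")))  # [1, 3]
```

Lookups touch only the posting sets for the query terms, which is why search stays fast however many documents are stored.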

Page 8: elasticsearch

Playing with ElasticSearch: Rivers

• Data flows in from sources through Rivers

• Rivers keep adding data as it flows in

• Rivers can be added, removed and configured dynamically

Page 9: elasticsearch

Playing with ElasticSearch: Rivers

• Data flows in from sources through Rivers

• Rivers keep adding data as it flows in

• Rivers can be added, removed and configured dynamically

[Diagram: three Data Sources, each feeding an ES Index on an ES Node through its own River.]
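
The river idea (a long-running component that pulls from a source and keeps indexing as new data arrives) can be sketched in plain Python. This is a hypothetical toy, not the ElasticSearch river plugin API: a list stands in for the data source, a dict stands in for the ES index, and a sequence-number checkpoint lets each run pick up only records it has not seen.

```python
from dataclasses import dataclass, field

@dataclass
class ToyRiver:
    """Hypothetical river: pulls new records from a source and indexes them.

    source holds (sequence_number, doc_id, document) tuples standing in for a
    database table, queue or feed; checkpoint remembers how far we have read.
    """
    source: list
    index: dict = field(default_factory=dict)
    checkpoint: int = 0

    def run_once(self):
        """Index every record newer than the checkpoint and return how many were added."""
        new = [rec for rec in self.source if rec[0] > self.checkpoint]
        for seq, doc_id, doc in sorted(new, key=lambda rec: rec[0]):
            self.index[doc_id] = doc
            self.checkpoint = seq
        return len(new)

feed = [(1, "a", {"msg": "first"}), (2, "b", {"msg": "second"})]
river = ToyRiver(source=feed)
print(river.run_once())                   # 2
feed.append((3, "c", {"msg": "third"}))   # more data flows in
print(river.run_once())                   # 1
print(sorted(river.index))                # ['a', 'b', 'c']
```

A real river would run this loop continuously in the background; the checkpoint is what lets it "continue to add data as it flows" without re-indexing old records.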

Page 10: elasticsearch

Page 11: elasticsearch

Playing with ElasticSearch: River Modules

• CouchDB • JDBC

• MongoDB • Solr

• Wikipedia • Jira

• Twitter • CSV

• ActiveMQ • FileSystem

• RabbitMQ • SysInfo

• NSQ • Logs

• RSS • LDAP

Page 12: elasticsearch

Playing with ElasticSearch: Index

• The mapping describes the document structure to the search engine

• It is created automatically with sensible defaults

• An explicit mapping can be provided (generally a good idea)

• Simple types:

  • string, integer/long, float/double, boolean, and null

• Complex types:

  • array, object
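
To show what "created automatically with sensible defaults" means, here is a simplified Python sketch that guesses a mapping from a JSON document using the type names on this slide. It is an approximation, not ElasticSearch's dynamic-mapping code: integers map to long and decimals to double (the wider type of each pair, chosen here as an assumption), date detection is skipped, and null values or empty arrays add no field in this sketch.

```python
import json

def infer_mapping(value):
    """Guess a field mapping for a JSON value, loosely imitating dynamic mapping.

    Simplified: no date detection, and null or empty-array values yield None
    (this sketch adds no field for them).
    """
    if value is None:
        return None
    if isinstance(value, bool):  # test bool before int: True is an int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):  # an array takes the type of its elements
        return infer_mapping(value[0]) if value else None
    if isinstance(value, dict):  # an object gets nested properties
        properties = {}
        for name, child in value.items():
            mapping = infer_mapping(child)
            if mapping is not None:
                properties[name] = mapping
        return {"type": "object", "properties": properties}
    raise TypeError(f"unsupported JSON value: {value!r}")

# Hypothetical document; "views", "rating", "draft" and "reviewer" are made-up fields.
doc = {"title": "ElasticSearch Understands JSON!", "tags": ["search", "json"],
       "views": 42, "rating": 4.5, "draft": False, "reviewer": None,
       "author": {"first_name": "Bruce", "last_name": "Croft"}}
print(json.dumps(infer_mapping(doc)["properties"], indent=2))
```

Guessing like this is convenient but can pick the wrong type (a numeric-looking code, for instance), which is why the slide recommends an explicit mapping.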

Page 13: elasticsearch

Playing with ElasticSearch

[Diagram: the architecture diagram from Page 7, adding a Distributed layer: Shards, Replication, Load Balancing, Nodes.]

Page 14: elasticsearch

Playing with ElasticSearch: Distributed Model

• The number of shards is the unit of scaling [ #shards > #nodes ]

• Each shard is a separate Lucene index, so many per-index settings are available

• Moving shards around is faster than splitting them (no reindexing)

• Replicas also serve reads, which allows search to scale

• The number of replicas can be changed dynamically after index creation

[Diagram: three nodes linked by an Automatic Discovery Protocol. Node 1 holds user (0) and user (1); Node 2 holds user1 (0) and user (1); Node 3 holds user (0) and user2 (0). Each copy is marked as a Shard or a Replica.]
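
The routing and replica ideas on this slide can be illustrated with a small Python sketch. A document goes to shard hash(routing) % number_of_primary_shards, where routing defaults to the document id. Two assumptions: zlib.crc32 stands in for ElasticSearch's own hash function, and the round-robin allocate function is a hypothetical simplification of the real shard allocator, which weighs many more factors.

```python
import zlib

def shard_for(routing, num_primary_shards):
    """Choose a primary shard as hash(routing) % number_of_primary_shards.

    crc32 is a stand-in for ElasticSearch's own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

def allocate(num_shards, num_replicas, nodes):
    """Toy allocator: primaries round-robin, each replica on the next node along.

    Returns {node: ["0P", "2R", ...]} where P marks a primary and R a replica.
    """
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep copies apart")
    layout = {node: [] for node in nodes}
    for shard in range(num_shards):
        for copy in range(num_replicas + 1):  # copy 0 is the primary
            node = nodes[(shard + copy) % len(nodes)]
            layout[node].append(f"{shard}{'P' if copy == 0 else 'R'}")
    return layout

print(shard_for("user-42", 5))
# 5 shards on 3 nodes: a 4th node could take over whole shards, no splitting needed.
print(allocate(num_shards=5, num_replicas=1, nodes=["node1", "node2", "node3"]))
```

Because the shard count is part of the routing formula, it is the number of replicas, not primaries, that can be changed freely after creation; having more shards than nodes leaves room to grow by moving shards.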

Page 15: elasticsearch

Playing with ElasticSearch: Index Aliases

curl -X POST 'http://localhost:9200/_aliases' -d '{
  "actions" : [{
    "add" : {
      "index" : "users",
      "alias" : "user_1",
      "filter" : { "term" : { "user" : "1" } },
      "routing" : "1"
    }
  }]
}'

Indexing and search can both go through the alias: routing is applied automatically, and searches through the alias are also filtered automatically.
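
To illustrate what a filtered alias does, here is a hypothetical in-memory model in Python (not how ElasticSearch implements aliases): searching through user_1 only sees documents whose user field is "1". The routing value is stored but not modelled; in ElasticSearch it directs the request to the single shard that the routing value hashes to.

```python
def make_alias(index, term_filter=None, routing=None):
    """Hypothetical in-memory alias: an index name plus an optional term filter and routing value."""
    # routing is kept for completeness but is not used by this toy model.
    return {"index": index, "filter": term_filter or {}, "routing": routing}

def search_alias(indices, alias, predicate=lambda doc: True):
    """Search through an alias: only documents that pass the alias's term filter are visible."""
    docs = indices[alias["index"]]
    term_filter = alias["filter"]
    return [doc for doc in docs
            if all(doc.get(name) == value for name, value in term_filter.items())
            and predicate(doc)]

indices = {"users": [
    {"user": "1", "msg": "hello"},
    {"user": "2", "msg": "hi there"},
    {"user": "1", "msg": "bye"},
]}
user_1 = make_alias("users", term_filter={"user": "1"}, routing="1")
print([doc["msg"] for doc in search_alias(indices, user_1)])  # ['hello', 'bye']
```

This is the "one index, many users" pattern: every user shares the users index, while each alias behaves like a private per-user index.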

Page 16: elasticsearch

Playing with ElasticSearch: Index Aliases

curl -X POST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "user_1", "alias" : "users" } },
    { "add" : { "index" : "user_2", "alias" : "users" } }
  ]
}'

[Diagram: the alias users points to the indices user_1 and user_2.]

curl -X GET "http://localhost:9200/users/_search?q=..."
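
The alias-over-several-indices setup can also be built programmatically. A minimal Python sketch using only the standard library: it constructs, but does not send, the same POST /_aliases request. alias_request is a hypothetical helper, and the Content-Type header is an addition that newer ElasticSearch versions require for JSON bodies.

```python
import json
import urllib.request

def alias_request(alias, indices, host="http://localhost:9200"):
    """Build (but do not send) a POST /_aliases request adding alias to each index."""
    actions = [{"add": {"index": name, "alias": alias}} for name in indices]
    body = json.dumps({"actions": actions}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/_aliases", data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )

req = alias_request("users", ["user_1", "user_2"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a running node, urllib.request.urlopen(req) would send it.
```

Adding a new per-user index to the group is then one more "add" action, and searches against users pick it up without any client change.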

Page 17: elasticsearch

Playing with ElasticSearch

[Diagram: the architecture diagram from Page 13, adding a Transport layer: HTTP, WebSockets, Thrift, ZeroMQ, memcached, TCP.]

Page 18: elasticsearch

Playing with ElasticSearch

[Diagram: the architecture diagram from Page 17, adding Extend (Modules): Data Sources, Tokenisers, Retrieval Models, Structured Results, Language Bindings, Transport.]

Page 19: elasticsearch

Playing with ElasticSearch

[Diagram: the architecture diagram from Page 18, adding Discovery: Zen, EC2.]

Page 20: elasticsearch

Playing with ElasticSearch

[Diagram: the architecture diagram from Page 19, adding Script: mvel, Python, Groovy, Javascript.]

Page 21: elasticsearch

Playing with ElasticSearch

[Diagram: the architecture diagram from Page 20, adding Monitor.]

Page 22: elasticsearch

Playing with ElasticSearch

[Diagram: the architecture diagram from Page 21, adding RESTful.]

Page 23: elasticsearch

Playing with ElasticSearch

REST API: http://host:port/[index]/[type]/[_action/id]

HTTP Methods: GET, POST, PUT, DELETE

Page 24: elasticsearch

Playing with ElasticSearch

REST API: http://host:port/[index]/[type]/[_action/id]

HTTP Methods: GET, POST, PUT, DELETE

Some definitions:

• index -> like a database

• type -> like a table

• id -> like a row in a table (identifies a single document)
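
The URL pattern and the database analogy can be captured in a tiny Python helper. es_url and its parameter names are hypothetical; the sketch only joins the path parts and does not validate index names or contact a server.

```python
def es_url(host, port, index=None, doc_type=None, endpoint=None):
    """Join the REST path parts http://host:port/[index]/[type]/[_action or id].

    Missing parts are skipped, e.g. es_url("localhost", 9200, "articles", endpoint="_search").
    """
    parts = [p for p in (index, doc_type, endpoint) if p]
    return f"http://{host}:{port}/" + "/".join(parts)

# index -> database, type -> table, id -> row
print(es_url("localhost", 9200, "articles", "article", "1"))
# http://localhost:9200/articles/article/1
print(es_url("localhost", 9200, "articles", endpoint="_search"))
# http://localhost:9200/articles/_search
```

The HTTP method then supplies the verb: GET reads, POST and PUT write, DELETE removes.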

Page 25: elasticsearch

Playing with ElasticSearch

REST API: http://host:port/[index]/[type]/[_action/id]

HTTP Methods: GET, POST, PUT, DELETE

request:

curl -X POST "http://localhost:9200/articles/article/1" -d '{
  "title" : "ElasticSearch Understands JSON!",
  "body" : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
  "published_on" : "2013/02/06 10:00:00",
  "tags" : ["search", "json"],
  "author" : {
    "first_name" : "Bruce",
    "last_name" : "Croft",
    "email" : "bruce@croft.org"
  }
}'

Page 26: elasticsearch

Playing with ElasticSearch

REST API: http://host:port/[index]/[type]/[_action/id]

HTTP Methods: GET, POST, PUT, DELETE

request:

curl -X POST "http://localhost:9200/articles/article/1" -d '{
  "title" : "ElasticSearch Understands JSON!",
  "body" : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
  "published_on" : "2013/02/06 10:00:00",
  "tags" : ["search", "json"],
  "author" : {
    "first_name" : "Bruce",
    "last_name" : "Croft",
    "email" : "bruce@croft.org"
  }
}'

response:

{"ok" : true, "_index" : "articles", "_type" : "article", "_id" : "1", "_version" : 1}
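
For comparison, a minimal Python sketch (standard library only) that builds, without sending, the same indexing request as the curl command, then parses the response shown on the slide. The Content-Type header is an addition that newer versions require; later versions also return a different set of response fields (the "ok" field belongs to this era's API).

```python
import json
import urllib.request

article = {
    "title": "ElasticSearch Understands JSON!",
    "body": "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
    "published_on": "2013/02/06 10:00:00",
    "tags": ["search", "json"],
    "author": {"first_name": "Bruce", "last_name": "Croft", "email": "bruce@croft.org"},
}

# Build (but do not send) the equivalent of the curl command.
req = urllib.request.Request(
    "http://localhost:9200/articles/article/1",
    data=json.dumps(article).encode("utf-8"),
    method="POST",
    headers={"Content-Type": "application/json"},
)

# The response as shown on the slide.
sample_response = '{"ok": true, "_index": "articles", "_type": "article", "_id": "1", "_version": 1}'
resp = json.loads(sample_response)
print(resp["_index"], resp["_type"], resp["_id"], resp["_version"])  # articles article 1 1
```

Indexing the same id again would replace the document and increase _version, which is how the versioning in the architecture diagram surfaces in the API.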

Page 27: elasticsearch

Playing with ElasticSearch

REST API: http://host:port/[index]/[type]/[_action/id]

HTTP Methods: GET, POST, PUT, DELETE

request:

curl -X GET "http://localhost:9200/articles/_search?q=author.first_name:BRUCE"
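
Two details of this request, sketched in Python: the query-string query should be URL-encoded, and "BRUCE" matches the stored "Bruce" because, assuming the default standard analyzer, terms are lowercased both at index time and at query time. The analyze function below is a toy stand-in, not the real analyzer.

```python
import re
from urllib.parse import urlencode

def analyze(text):
    """Toy analyzer: split on non-word characters and lowercase each token."""
    return [t.lower() for t in re.split(r"\W+", text) if t]

# Percent-encode the query-string query so it is safe to put in a URL.
url = "http://localhost:9200/articles/_search?" + urlencode({"q": "author.first_name:BRUCE"})
print(url)  # http://localhost:9200/articles/_search?q=author.first_name%3ABRUCE

# "BRUCE" still matches the stored "Bruce" because both sides are lowercased.
print(analyze("BRUCE") == analyze("Bruce"))  # True
```

The dotted name author.first_name is how the nested author object from the indexed document is addressed in a query.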

Page 28: elasticsearch

Playing with ElasticSearch

REST API: http://host:port/[index]/[type]/[_action/id]

HTTP Methods: GET, POST, PUT, DELETE

request:

curl -X GET "http://localhost:9200/articles/_search?q=author.first_name:BRUCE"

response:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{
  "total":1,"max_score":0.30685282,"hits":[{"_index":"articles","_type":"article","_id":"1","_score":0.30685282,"_source":
    {"title":"ElasticSearch Understands JSON!","body":"ElasticSearch not only “works” with JSON, it understands it! Let’s first ...","published_on":"2013/02/06 10:00:00","tags":["search","json"],"author":{
      "first_name":"Bruce","last_name":"Croft","email":"bruce@croft.org"
    }}
  }]
}}

Page 29: elasticsearch

Playing with ElasticSearch

REST API: http://host:port/[index]/[type]/[_action/id]

HTTP Methods: GET, POST, PUT, DELETE

request:

curl -X GET "http://localhost:9200/articles/_search?q=author.first_name:BRUCE"

response:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{
  "total":1,"max_score":0.30685282,"hits":[{"_index":"articles","_type":"article","_id":"1","_score":0.30685282,"_source":
    {"title":"ElasticSearch Understands JSON!","body":"ElasticSearch not only “works” with JSON, it understands it! Let’s first ...","published_on":"2013/02/06 10:00:00","tags":["search","json"],"author":{
      "first_name":"Bruce","last_name":"Croft","email":"bruce@croft.org"
    }}
  }]
}}

Callouts:

• Location & ID: "_index", "_type" and "_id"

• Document Source: "_source"

• Total number of documents: "total" inside "hits" (the number of matching documents)
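
Reading those callouts from code: a minimal Python sketch that parses the slide's response (with the document source shortened to two fields) and pulls out the total, each hit's location and ID, and its source. The field layout matches this 2013-era response; later versions changed some fields, for example hits.total.

```python
import json

# The search response from the slide, with _source shortened to two fields.
raw = """{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"articles","_type":"article",
"_id":"1","_score":0.30685282,"_source":{"title":"ElasticSearch Understands JSON!",
"author":{"first_name":"Bruce","last_name":"Croft"}}}]}}"""

resp = json.loads(raw)
print("total:", resp["hits"]["total"])                   # total number of documents
for hit in resp["hits"]["hits"]:
    location = (hit["_index"], hit["_type"], hit["_id"])  # location & ID
    source = hit["_source"]                               # document source
    print(location, hit["_score"], source["title"])
```

Because the full _source comes back with each hit, a client can display results without a second lookup in another database.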

Page 30: elasticsearch

Playing with ElasticSearch

[Diagram: the architecture diagram from Page 22, adding Micro Apps.]

Page 31: elasticsearch

Playing with ElasticSearch

[Diagram: the architecture diagram from Page 30, adding HTML5/CSS3 and Javascript for the Micro Apps.]

Page 32: elasticsearch

Micro Applications

Rich, interactive single-page web applications powered by JavaScript, HTML and CSS.

Page 33: elasticsearch

Micro Applications

Page 34: elasticsearch

Micro Applications

Page 35: elasticsearch

Micro Applications

Page 36: elasticsearch

Micro Applications

Page 37: elasticsearch

Micro Applications

Rich, interactive single-page web applications powered by JavaScript, HTML and CSS.

• Ember.js: a self-described framework for ambitious applications

• Rails-inspired “convention over configuration” approach

• High-level abstractions, two-way binding and auto-updating templates

[Diagram: application structure with a Router, Controllers, Views and Models, backed by a Data Model.]

Page 38: elasticsearch

Micro Applications

Rich, interactive single-page web applications powered by JavaScript, HTML and CSS.

• Ember.js: a self-described framework for ambitious applications

• Rails-inspired “convention over configuration” approach

• High-level abstractions, two-way binding and auto-updating templates

• Ember Data

  • Client-side storage adapter

  • Provides a common interface for persisting application data to:

    • a RESTful HTTP service (the primary endpoint)

    • the browser’s localStorage

    • emerging web databases such as IndexedDB

[Diagram: application structure with a Router, Controllers, Views and Models, backed by a Data Model.]
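
The adapter idea behind Ember Data (one interface with interchangeable storage back ends) can be illustrated in Python. This is a hypothetical sketch of the pattern only: the class and method names are made up and are not Ember Data's API, the in-memory adapter stands in for localStorage, and the REST adapter merely records the requests it would send.

```python
import json
from abc import ABC, abstractmethod

class StorageAdapter(ABC):
    """Common persistence interface, in the spirit of an Ember Data adapter (not its real API)."""

    @abstractmethod
    def save(self, model, record_id, data): ...

    @abstractmethod
    def find(self, model, record_id): ...

class InMemoryAdapter(StorageAdapter):
    """Stand-in for browser localStorage: a key/value store of JSON strings."""

    def __init__(self):
        self.store = {}

    def save(self, model, record_id, data):
        self.store[f"{model}:{record_id}"] = json.dumps(data)

    def find(self, model, record_id):
        raw = self.store.get(f"{model}:{record_id}")
        return json.loads(raw) if raw is not None else None

class RestAdapterSketch(StorageAdapter):
    """Maps the same calls onto RESTful HTTP routes; here it only records the requests."""

    def __init__(self, base_url):
        self.base_url = base_url
        self.sent = []

    def save(self, model, record_id, data):
        self.sent.append(("PUT", f"{self.base_url}/{model}/{record_id}", json.dumps(data)))

    def find(self, model, record_id):
        self.sent.append(("GET", f"{self.base_url}/{model}/{record_id}", None))
        return None  # a real adapter would return the decoded HTTP response

def publish(adapter, article_id, title):
    """Application code depends only on the interface, not on where the data lives."""
    adapter.save("article", article_id, {"title": title})
    return adapter.find("article", article_id)

print(publish(InMemoryAdapter(), "1", "Hello"))  # {'title': 'Hello'}
rest = RestAdapterSketch("http://localhost:9200/articles")
publish(rest, "1", "Hello")
print(rest.sent[0][:2])  # ('PUT', 'http://localhost:9200/articles/article/1')
```

Since ElasticSearch already speaks JSON over REST, a REST-style adapter can point a micro app at it with little glue code.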

Page 39: elasticsearch

Playing with ElasticSearch: More Features

• document oriented • load balancing

• versioning • plugins

• parent/child docs • more_like_this

• scripting • multi_field mapping

• dynamic mapping templates • percolation

• bulk indexing • facets

• geo location • index aliases

• auto-complete • ngrams & edge-ngrams

• histograms • rivers
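
Bulk indexing sends many operations in one request. A minimal Python sketch that builds a _bulk body, assuming the 2013-era format with _type: one action line followed by one source line per document, as newline-delimited JSON ending in a newline. bulk_body is a hypothetical helper.

```python
import json

def bulk_body(index, doc_type, docs):
    """Build a _bulk request body: one action line, then one source line, per document.

    The body is newline-delimited JSON and must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = bulk_body("articles", "article", [("1", {"title": "First"}), ("2", {"title": "Second"})])
print(body, end="")
# Saved to body.json, it could be sent with:
#   curl -X POST "http://localhost:9200/_bulk" --data-binary @body.json
```

curl's --data-binary is used rather than -d because -d strips the newlines that the bulk format depends on.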

Page 40: elasticsearch

[Diagram: the complete architecture from Page 31: Document Store, Data Refinery, Distributed, Transport, Extend (Modules), Discovery, Script, Monitor, RESTful and Micro Apps (HTML5/CSS3, Javascript).]

An alternative that would allow scientists or even casual users to perform analysis of distributed data regardless of where the data resides.

Page 41: elasticsearch

Search is the primary interface for getting information today. Let’s build on it.

Search

Discover

Analyse

Page 42: elasticsearch

Page 43: elasticsearch

Data Management Tools - Challenges

• Interactive queries, data exploration and iterative query refinement pose significant challenges for current methods

• Building and running jobs and queries requires a deep understanding of cluster size and structure, job performance, etc.

• Current tools are time-consuming to set up, deploy and use
