Introduction to Elasticsearch 27th May 2014 - BigData Meetup Eric Rodriguez @wavyx
Jan 26, 2015
About Me
Eric Rodriguez, founder of data.be
• Web entrepreneur
• Data addict
• Multi-language: PHP, Java/Groovy/Grails, .Net, …
• be.linkedin.com/in/erodriguez
• github.com/wavyx
• @wavyx
Elasticsearch - Company
• Founded in 2012 => http://www.elasticsearch.com
• Professional services
• Training
• Consultancy / Development support
• Production support subscription (3 levels of SLAs)
Enterprises using Elasticsearch
(M)ELK Stack
• Elasticsearch - Search server based on Lucene
• Logstash - Tool for managing events and logs
• Kibana - Visualize logs and time-stamped data
• Marvel - Monitor your cluster’s heartbeat
You Know, for Search…
Logstash
• Collect, parse, index, and search logs
Kibana
• A versatile dashboard to see and interact with your data
Marvel
• Monitor the health of your cluster
• Cluster-wide metrics, an overview of all nodes and indices, and events (master election, new nodes)
• Real-time search and analytics engine
• Open source
• Lucene
• JSON
• Schema free
• Document store
• RESTful API
• Documentation
• Scalability
• High availability
• Distributed
• Multi-tenancy
• Per-operation persistence
Use Cases
• Full-Text Search
• Data Store
• Analytics
• Alerts
• Ads
• …
Copyright 2014 Elasticsearch Inc / Elasticsearch BV. All rights reserved. Content used with permission from Elasticsearch.
Elasticsearch core
• Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java
• Elasticsearch added value: “Simple is best”
• Simple API (with documentation)
• JSON & RESTful
• Sharding & Replication
• Extensibility: plugins and scripts
• Interoperability: clients and integrations
Terms for DBAs
Elasticsearch => RDBMS
• Index => Database
• Type => Table
• Document => Row
• Field => Column
• Mapping => Schema
Plug & Play
• Zero configuration
• 4 LoC to get started ;)
Alive !
=> http://localhost:9200/?pretty
REST
• Check your cluster, node, and index health, status, and statistics
• Administer your cluster, node, and index data and metadata
• Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
• Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others
Basic Operations 1/3
• Add a document
• Create index
Basic Operations 2/3
• Modify/Replace a document
• Delete a document
• Delete index
Basic Operations 3/3
• Update a document
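The request examples for these slides were screenshots and are not reproduced here. As a rough sketch, assuming the Elasticsearch 1.x REST conventions, the operations map to HTTP calls roughly as follows. The index `blog`, the type `post`, and the helper names are hypothetical; the helpers only build (method, path, body) tuples and never contact a server.

```python
import json

def create_index(index, shards=5, replicas=1):
    # PUT /{index} with optional settings creates the index.
    body = {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}
    return ("PUT", "/%s" % index, body)

def add_document(index, doc_type, doc_id, doc):
    # PUT /{index}/{type}/{id} adds the document; sending the same
    # request again with a new body replaces the whole document.
    return ("PUT", "/%s/%s/%s" % (index, doc_type, doc_id), doc)

def update_document(index, doc_type, doc_id, partial_doc):
    # POST .../_update merges the partial document into the stored one.
    path = "/%s/%s/%s/_update" % (index, doc_type, doc_id)
    return ("POST", path, {"doc": partial_doc})

def delete_document(index, doc_type, doc_id):
    return ("DELETE", "/%s/%s/%s" % (index, doc_type, doc_id), None)

def delete_index(index):
    return ("DELETE", "/%s" % index, None)

requests_to_send = [
    create_index("blog"),
    add_document("blog", "post", 1, {"title": "Hello", "views": 0}),
    update_document("blog", "post", 1, {"views": 1}),
    delete_document("blog", "post", 1),
    delete_index("blog"),
]

for method, path, body in requests_to_send:
    payload = "" if body is None else " " + json.dumps(body, sort_keys=True)
    print("%s http://localhost:9200%s%s" % (method, path, payload))
```

Any HTTP client can send these tuples to the node shown on the "Alive !" slide.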
Mapping 1/2
• Define how a document should be mapped (similar to schema): searchable fields, tokenization, storage, ..
• Explicit mapping is defined on an index/type level
• A default mapping is automatically created
Mapping 2/2
• Core types: string, integer/long, float/double, boolean, and null
• Other types: Array, Object, Nested, IP, GeoPoint, GeoShape, Attachment
• Example
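The example on the original slide was an image. A minimal sketch of what an explicit mapping could look like, assuming Elasticsearch 1.x syntax; the type `post` and all field names are made up for illustration:

```python
import json

# Hypothetical mapping for a "post" type (ES 1.x style).
mapping = {
    "mappings": {
        "post": {
            "properties": {
                # Analyzed string: tokenized for full-text search.
                "title": {"type": "string", "analyzer": "standard"},
                # Not analyzed: stored as one exact term, good for filters.
                "tag": {"type": "string", "index": "not_analyzed"},
                "views": {"type": "integer"},
                "published": {"type": "boolean"},
                "client_ip": {"type": "ip"},
                "location": {"type": "geo_point"},
            }
        }
    }
}

# This JSON would be the body of the index creation request.
request_body = json.dumps(mapping, indent=2, sort_keys=True)
print(request_body)
```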
Search API 1/2
• Multi-index, Multi-type
• URI search - Google-like operators (AND/OR), fields, sort, paging, wildcards, …
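A small sketch of how a multi-index URI search could be assembled. The index, type, and field names are hypothetical; `q`, `sort`, `from`, and `size` are the URI search parameters assumed here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def uri_search(indices, types, query, sort=None, start=0, size=10):
    # Several indices or types are separated by commas in the path.
    path = "/%s/%s/_search" % (",".join(indices), ",".join(types))
    params = [("q", query), ("from", start), ("size", size)]
    if sort:
        params.append(("sort", sort))
    return path + "?" + urlencode(params)

url = uri_search(["blog", "news"], ["post"],
                 "title:elastic* AND tag:search",
                 sort="views:desc", size=5)
print(url)

# Decoding the query string gives back the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```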
Search API 2/2
• Paging & Sort
• Fields: selection, scripts
• Post filter
• Highlighting
• Rescoring
• Explain
• …
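Several of these options can be combined in a single request body. The sketch below assumes Elasticsearch 1.x request-body search; the function name and the fields `title`, `tag`, and `views` are hypothetical.

```python
import json

def build_search(text, tag, page=1, per_page=10):
    return {
        "query": {"match": {"title": text}},
        # Paging: "from" is the offset of the first hit to return.
        "from": (page - 1) * per_page,
        "size": per_page,
        # Sort by a field first, then by relevance score.
        "sort": [{"views": {"order": "desc"}}, "_score"],
        # Return only selected fields instead of the whole document.
        "fields": ["title", "tag"],
        # Post filter: applied to the hits after aggregations are computed.
        "post_filter": {"term": {"tag": tag}},
        "highlight": {"fields": {"title": {}}},
        # Ask for a per-hit explanation of how the score was computed.
        "explain": True,
    }

search_body = build_search("elasticsearch", "search", page=3, per_page=20)
print(json.dumps(search_body, indent=2, sort_keys=True))
```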
Query DSL
• “SQL” for Elasticsearch
• Queries should be used
  • for full text search
  • where the result depends on a relevance score
• Filters should be used
  • for binary yes/no searches
  • for queries on exact values
Basic Queries
Basic Filters
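The query and filter examples on these slides were images. As a hedged sketch of the distinction, the body below combines a scored `match` query with unscored filters using the `filtered` query, which is Elasticsearch 1.x syntax (later major versions replaced it with the `bool` query). The function and field names are hypothetical.

```python
import json

def filtered_search(text, tag, min_views):
    return {
        "query": {
            "filtered": {
                # Query part: full-text match, contributes to the score.
                "query": {"match": {"title": text}},
                # Filter part: yes/no checks on exact values, no scoring.
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"tag": tag}},
                            {"range": {"views": {"gte": min_views}}},
                        ]
                    }
                },
            }
        }
    }

body = filtered_search("introduction to elasticsearch", "search", 100)
print(json.dumps(body, indent=2, sort_keys=True))
```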
Analysis 1/2
• Analysis is extracting “terms” from a given text
• Processing natural language to make it computer searchable
• Configurable registry of Analyzers that can be used
  • to break analyzed fields into terms when a document is indexed
  • to process query strings
Analysis 2/2
• Analyzers are composed of
• a single Tokenizer (may be preceded by one or more CharFilters)
• zero or more TokenFilters
• Default Analyzers: standard, pattern, whitespace, language, snowball
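To make the composition concrete, here is a toy analyzer in plain Python. It is not how Lucene implements analysis; the function names and the stop word list are invented for illustration. It only mirrors the order of the pipeline: CharFilters, then a single Tokenizer, then TokenFilters.

```python
import re

def html_strip(text):
    # CharFilter: works on the raw character stream before tokenizing.
    return re.sub(r"<[^>]+>", " ", text)

def word_tokenizer(text):
    # Tokenizer: splits the text into individual tokens.
    return re.findall(r"\w+", text)

def lowercase(tokens):
    # TokenFilter: normalizes each token.
    return [t.lower() for t in tokens]

STOPWORDS = {"the", "a", "an", "and", "of", "to", "is"}

def stop(tokens):
    # TokenFilter: removes very common words.
    return [t for t in tokens if t not in STOPWORDS]

def make_analyzer(tokenizer, char_filters=(), token_filters=()):
    def analyze(text):
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze

analyzer = make_analyzer(word_tokenizer, [html_strip], [lowercase, stop])
terms = analyzer("<p>The Quick brown fox is <b>FAST</b></p>")
print(terms)  # ['quick', 'brown', 'fox', 'fast']
```

The same analyzer has to run on the query string as well, otherwise a search for "Quick" would never match the indexed term "quick".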
Analytics
• Aggregation of information: similar to “group by”
• Facets
  • Aggregated data based on a search query
  • One-dimensional results
  • Ex: “terms facets” return facet counts for the various values of a specific field. Think color, tag, category, …
• Aggregations (ES 1.0+)
  • Nested Facets
  • Basic Stats: mean, min, max, std dev, term counts
  • Significant Terms, Percentiles, Cardinality estimations
Facets
• Not yet deprecated, but use aggregations!
• Various facets: terms, range, histogram, date, statistical, geo distance, …
Aggregations
• A generic, powerful framework that can be divided into 2 main families:
  • Bucketing: each bucket is associated with a key and a document criterion. The aggregation process provides a list of buckets, each one with a set of documents that "belong" to it.
  • Metric: aggregations that keep track of and compute metrics over a set of documents.
• Aggregations can be nested!
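The two families can be illustrated without a cluster. The sketch below is an in-memory toy, not Elasticsearch code: a terms-style bucketing step groups documents by key, and a metric is then computed inside each bucket, which is what nesting a metric under a bucket aggregation amounts to. The documents and field names are invented.

```python
from collections import defaultdict

docs = [
    {"color": "red", "price": 10},
    {"color": "blue", "price": 20},
    {"color": "red", "price": 30},
]

def terms_buckets(documents, field):
    # Bucketing: one bucket per distinct value, holding the matching docs.
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return dict(buckets)

def avg_metric(documents, field):
    # Metric: a single number computed over a set of documents.
    values = [doc[field] for doc in documents]
    return sum(values) / len(values)

# Nesting: the metric runs once per bucket.
result = {
    key: {"doc_count": len(bucket), "avg_price": avg_metric(bucket, "price")}
    for key, bucket in terms_buckets(docs, "color").items()
}
print(result)
```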
Bucket Aggregators
• global
• filter
• missing
• terms
• range
• date range
• ip range
• histogram
• date histogram
• geo distance
• geohash grid
• nested
• reverse nested
• top hits (version 1.3)
Metrics Aggregators
• count
• stats
• extended stats
• cardinality
• percentiles
• min
• max
• sum
• avg
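A sketch of how bucket and metric aggregators from these two lists could be combined in one request body, assuming the Elasticsearch 1.x `aggs` syntax. The aggregation names and the fields `color`, `price`, `buyer`, and `sold_at` are hypothetical.

```python
import json

aggs_body = {
    # No hits needed, only the aggregation results.
    "size": 0,
    "aggs": {
        # Bucket aggregator: one bucket per distinct color.
        "by_color": {
            "terms": {"field": "color"},
            # Metric aggregators nested inside each bucket.
            "aggs": {
                "price_stats": {"stats": {"field": "price"}},
                "unique_buyers": {"cardinality": {"field": "buyer"}},
            },
        },
        # Bucket aggregator: one bucket per calendar month.
        "sales_per_month": {
            "date_histogram": {"field": "sold_at", "interval": "month"},
            "aggs": {"revenue": {"sum": {"field": "price"}}},
        },
    },
}

print(json.dumps(aggs_body, indent=2, sort_keys=True))
```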
Search for end users
• Suggesters - “Did you mean”: Terms, Phrases, Completion, Context
• “More like this”: find documents that are "like" the provided text by running it against one or more fields
Percolator
• Classic ES
1. Add & Index documents
2. Search with queries
3. Retrieve matching documents
• Percolator
1. Add & Index queries
2. Percolate documents
3. Retrieve matching queries
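The reversed flow can be sketched with a toy in-memory percolator. This is not the Elasticsearch percolator API; the class, the query ids, and the simplification of a query to "all of these terms must appear" are assumptions made for illustration.

```python
class ToyPercolator:
    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        # Step 1: store the query itself instead of a document.
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, doc_text):
        # Step 2: run the incoming document against every stored query.
        doc_terms = set(doc_text.lower().split())
        # Step 3: return the ids of the queries that match.
        return sorted(query_id
                      for query_id, required in self.queries.items()
                      if required <= doc_terms)

percolator = ToyPercolator()
percolator.register("storm-alert", ["storm", "brussels"])
percolator.register("es-mentions", ["elasticsearch"])

matches = percolator.percolate("Heavy storm expected over Brussels tonight")
print(matches)  # ['storm-alert']
```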
Why Percolate ?!
• Alerts: social media mentions, weather forecast, news alerts
• Automatic Monitoring: price monitoring, stock alerts, logs
• Ads: display targeted ads based on user’s search queries
• Enrich: percolate new documents, then add query matches as document tags
High Availability 1/2
• Sharding - Write Scalability
  • Split logical data over multiple machines & control data flows
  • Each index has a fixed number of shards
  • Improve indexing performance
• Replication - Read Scalability
  • Each shard can have 0-many replicas (dynamic setup)
  • Removing SPOF (Single Point Of Failure)
  • Improve search performance
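Why the number of shards is fixed while replicas are dynamic can be seen from how a document is routed to a shard: a hash of the routing value (the document id by default) modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a stand-in hash, which is an assumption and not the function Elasticsearch uses; the document ids are invented.

```python
import zlib

def shard_for(doc_id, number_of_primary_shards):
    # shard = hash(routing) % number_of_primary_shards
    # crc32 is only a stable stand-in for the real hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

doc_ids = ["doc-%d" % i for i in range(1000)]

# With 5 primary shards every id always lands on the same shard.
before = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}

# Changing the shard count changes the result of the modulo, so existing
# documents would be looked up on the wrong shard without a full reindex.
after = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

moved = sum(1 for doc_id in doc_ids if before[doc_id] != after[doc_id])
print("%d of %d documents would change shard" % (moved, len(doc_ids)))
```

Replicas are copies of a shard and do not take part in this calculation, which is why their number can be changed at any time.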
High Availability 2/2
• Zen Discovery
  • Automatic discovery of nodes within a cluster and election of a master node
  • Useful for failover and replication
  • Specific modules: Amazon EC2, Microsoft Azure, Google Compute Engine
• Snapshot & Restore module
Cluster Management
• Marvel - http://www.elasticsearch.org/overview/marvel/
• BigDesk - http://bigdesk.org/
• Paramedic - https://github.com/karmi/elasticsearch-paramedic
• KOPF - https://github.com/lmenezes/elasticsearch-kopf/
• Elastic HQ - http://www.elastichq.org/
Clients & Integration
• Ecosystem: Kibana, Logstash, Marvel, Hadoop integration
• API Clients: Java, Javascript, Groovy, PHP, Perl, Python, .Net, Ruby, Scala, Clojure, Go, Erlang, …
• Integrations: Grails, Django, Play!, Symfony2, Carrot2, Spring, Drupal, Wordpress, …
• Rivers: CouchDB, JDBC, MongoDB, Neo4j, Redis, RabbitMQ, ActiveMQ, Amazon SQS, File System, Twitter, Wikipedia, RSS, …
Fast & Furious Evolution
Version 1.0 - Feb 12, 2014
• Aggregations
• Snapshot & Restore
• Distributed Percolator
• Cat API
• Federated search
• Doc values
• Circuit breaker
Version 1.1 - March 25, 2014
• Cardinality Agg
• Percentiles Agg
• Significant Terms Agg
• Search Templates
• Cross fields search
• Alias for indices & templates
Version 1.2 - May 22, 2014
• Java 7
• Indexing & Merging performance
• Aggregations performance
• Context suggester
• Deep scrolling
• Field value factor
Benchmark API coming in 1.3
Resources
• http://www.elasticsearch.org/guide/
• http://www.elasticsearch.org/videos/
• http://www.elasticsearchtutorial.com/
• http://exploringelasticsearch.com/
• http://joelabrahamsson.com/elasticsearch-101/
• http://belczyk.com/2014/01/elasticsearch-recomended-learning-materials/
• http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html
Books
• Elasticsearch Server - http://www.packtpub.com/elasticsearch-server-2e/book
• Elasticsearch in Action - http://www.manning.com/hinman/
• Elasticsearch Cookbook - http://www.packtpub.com/elasticsearch-cookbook/book
• Mastering Elasticsearch - http://www.packtpub.com/mastering-elasticsearch-querying-and-data-handling/book
• Elasticsearch - The Definitive Guide - http://www.elasticsearch.org/blog/elasticsearch-definitive-guide/
Thank you
[email protected] - @wavyx
be.linkedin.com/in/erodriguez - github.com/wavyx
http://www.meetup.com/ElasticSearch-User-Group-Belux-Belgium-Luxembourg/