Sonian Inc.
• Cloud-based email archiving
• Founded in 2007
• Headquarters: Newton, MA
Small team of about 15 developers, distributed from Campinas, Brazil to Vancouver, Canada
Using elasticsearch since June 2010, v0.8.0
We have about 6 billion records indexed in elasticsearch
100,000 Netflix DVD titles
3,000,000 Pages in en.wikipedia.org
22,000,000 Books in Library of Congress catalog
150,000,000 LinkedIn profiles
3,000,000,000 Estimated bing.com index size
6,000,000,000 Sonian Inc. index size
50,000,000,000 Estimated google.com index size
Infrastructure
http://www.sonian.com/awssonian-technical-diagram/
Ingestion (safe): Clojure
Search Engine: elasticsearch
Web App: Ruby on Rails
Deployment: Chef
Monitoring: Sensu
10 clusters
6 AWS Regions
2-17 nodes in each cluster
Custom version of elasticsearch based on 0.19.9, with several plugins
jetty plugin
• jetty-based HTTP transport
• SSL support
• Authentication
• Request logging (JSON, plain)
Request logs are also indexed in elasticsearch
Open source: https://github.com/sonian/elasticsearch-jetty
Zookeeper plugin
Zookeeper-based discovery
Replacement for zen discovery
Experimental!
Open source: https://github.com/sonian/elasticsearch-zookeeper
Valve plugin
• Custom jetty plugin filter
• Rejects bulk indexing requests if cluster is overloaded
Lessons learned in the last two years
or
Proper Care and Feeding of
Elasticsearch Nodes
Rule 1: Give nodes plenty of space
Running out of disk space or memory is the simplest
way to corrupt your index.
Make sure elasticsearch doesn’t swap
It reduces performance and causes nodes to leave
clusters
elasticsearch.yml
bootstrap.mlockall: true
Increase the number of open file descriptors to 64k.
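A quick way to see what limit a process actually gets is to ask the operating system. The sketch below is only an illustration: it reads the limit of the Python process that runs it (not of the elasticsearch process), it works on Unix-like systems only, and both the helper name `fd_limit_ok` and the reading of "64k" as 65536 are assumptions.

```python
import resource  # Unix-only standard library module

# Assumption: "64k" on the slide is read as 64 * 1024 = 65536.
REQUIRED_FDS = 64 * 1024


def fd_limit_ok(soft_limit: int, required: int = REQUIRED_FDS) -> bool:
    """Return True if the soft open-file limit is at least `required`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no limit configured at all
    return soft_limit >= required


# Soft and hard limits on open file descriptors for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard} enough={fd_limit_ok(soft)}")
```

Run it as the same user, and from the same kind of shell or init script, that starts elasticsearch, since limits are inherited per process.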
Rule 2: Distributed but well connected
All nodes should be able to talk to each other all the time.
Otherwise your cluster might get split-brain syndrome.
Consider setting discovery.zen.minimum_master_nodes
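A common rule of thumb for this setting is a majority (quorum) of the master-eligible nodes, so that two halves of a partitioned cluster can never both elect a master. A minimal sketch of the arithmetic, where the helper name `quorum` is hypothetical:

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest node count that is a strict majority of the given total."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Example cluster sizes; 17 is the largest cluster size mentioned earlier.
for n in (3, 5, 17):
    print(f"{n} master-eligible nodes -> "
          f"discovery.zen.minimum_master_nodes: {quorum(n)}")
```

Note that with only two master-eligible nodes the majority is 2, so losing either node leaves the cluster without a master.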
Rule 3: Throttle the bulk indexing load
Asynchronous architecture makes elasticsearch scalable and fast, but susceptible to running out of memory under excessive bulk indexing load.
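One way to throttle on the client side is to cap how many bulk requests are in flight at once. The sketch below is an assumption-laden illustration, not the Valve plugin and not part of any elasticsearch client: `BulkThrottle`, `submit`, and `send_bulk` are hypothetical names, and `send_bulk` stands for whatever function actually ships a batch to the cluster.

```python
import threading


class BulkThrottle:
    """Caps the number of bulk requests that may be in flight at once."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def submit(self, send_bulk, batch, timeout=None):
        """Call send_bulk(batch) once a slot is free.

        Returns None if no slot frees up within `timeout` seconds;
        the caller should then back off and retry later.
        """
        if not self._slots.acquire(timeout=timeout):
            return None
        try:
            return send_bulk(batch)
        finally:
            self._slots.release()  # free the slot even if send_bulk raises


# Usage with a stand-in sender that just reports the batch size.
throttle = BulkThrottle(max_in_flight=2)
print(throttle.submit(len, ["doc1", "doc2", "doc3"]))
```

Indexing threads share one `BulkThrottle`, so the cluster never sees more than `max_in_flight` concurrent bulk requests from this client.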
Rule 4: Try to make all shards approximately the same size
Elasticsearch balances shards across nodes by shard count. It doesn’t consider shard sizes or available disk space.
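To see why uneven shard sizes hurt, consider a toy model of count-based placement. This is not elasticsearch's actual allocator, only a sketch of the idea; the function name `allocate_by_count` and the shard sizes are made up for illustration.

```python
def allocate_by_count(shard_sizes_gb, node_count):
    """Place each shard on the node holding the fewest shards.

    Shard size is deliberately ignored, mirroring count-based balancing.
    Ties go to the lowest-numbered node.
    """
    nodes = [[] for _ in range(node_count)]
    for size in shard_sizes_gb:
        target = min(nodes, key=len)  # fewest shards wins, size is not checked
        target.append(size)
    return nodes


# Hypothetical shard sizes in GB: large and small shards alternate.
sizes = [200, 5, 200, 5, 200, 5]
for i, shards in enumerate(allocate_by_count(sizes, 2)):
    print(f"node {i}: {len(shards)} shards, {sum(shards)} GB")
```

Both nodes end up with three shards each, yet one holds 600 GB and the other 15 GB. With equally sized shards, the same placement would also balance disk usage.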
4 rules for happy elasticsearch
1. Give nodes plenty of space
2. Distributed but well connected
3. Throttle the load
4. Make all shards the same size
Questions?
More Information
Latest stable release: 0.19.10
Web Site: http://www.elasticsearch.org/
Follow @elasticsearch on Twitter
IRC: #elasticsearch on irc.freenode.net
GitHub: https://github.com/elasticsearch/elasticsearch
Mailing list: elasticsearch on http://groups.google.com/
Stack Overflow tag: elasticsearch