Elasticsearch - index server used as a document database (with examples) Robert Lujo, 2014

ElasticSearch - index server used as a document database

Jun 24, 2015

Robert Lujo

Presentation given on 5.10.2014 at http://2014.webcampzg.org/talks/.

Although the primary purpose of ElasticSearch (ES) is to serve as an index/search server, its feature set overlaps with that of a common NoSQL database or, more precisely, a document database.

Why could this be interesting, and how could it be used effectively?

Talk overview:

- ES - history, background, philosophy, feature set overview, focus on indexing/search features
- short presentation on how to get started - installation, indexing and search/retrieval
- a database should provide the following functions: store, search, retrieve -> differences between relational, document and search databases
- it is not unusual to use ES additionally as a document database (store and retrieve)
- a use case will be presented where ES can be used as the single database in the system (benefits and drawbacks)
- what if a relational database is introduced into the previously demonstrated system (benefits and drawbacks)

ES is a nice example, ready to use in practice, that can change one's perspective on developing certain types of software systems.
Transcript
Page 1: ElasticSearch - index server used as a document database

Elasticsearch - index server used as a document database

(with examples)

Robert Lujo, 2014

Page 2

about me

software

professionally 17 y.

freelancer

more info -> linkedin

Page 3

Elasticsearch

search server based on Apache Lucene

distributed, multitenant-capable

full-text search engine

RESTful web interface

schema-free JSON documents

NoSQL capabilities

https://en.wikipedia.org/wiki/Elasticsearch

Page 4

Elasticsearch

first release in February 2010

until now raised total funding > $100M

latest release 1.3 & 1.4 beta

+ Logstash + Kibana => ELK stack

Apache 2 Open Source License

Page 5

Very popular

and used by

… Wikimedia, Mozilla, Stack Exchange, Quora, CERN …

Page 6

Professional services also available

Page 7

What about docs?

Page 8

Features

Sys:

real time data, distributed, multi-tenancy, real time analytics, high availability

Dev:

restful api, document oriented, schema free, full text search, per-operation persistence, conflict management

http://www.elasticsearch.org/overview/elasticsearch/

Page 9

Install, run …

prerequisite: JDK - Java (Lucene, remember?)

wget https://download.elasticsearch.org/.../elasticsearch-1.3.4.zip

unzip elasticsearch-1.3.4.zip

elasticsearch-1.3.4/bin/elasticsearch

Page 10

& use

# curl localhost:9200

{
  "status" : 200,
  "name" : "The Night Man",
  "version" : {
    "number" : "1.3.4",
    "build_hash" : "…",
    "build_timestamp" : "…",
    "build_snapshot" : false,
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}

Page 11

create index & put some data

# curl -XPUT localhost:9200/mainindex/company/1 -d '{
    "name" : "CoolComp Ltd.",
    "employees" : 10,
    "founded" : "2014-10-05",
    "services" : ["software", "consulting"],
    "management": [
      {"role" : "CEO", "name" : "Petar Petrovich"},
      {"name" : "Ivan Ivić"}
    ],
    "updated" : "2014-10-05T22:31:55"
  }'

=> {"_index":"mainindex","_type":"company","_id":"1","_version":4,"created":false}

Page 12

fetch document by id (key/value database)

# curl -XGET localhost:9200/mainindex/company/1

=> {"_index":"mainindex", "_type":"company", "_id":"1", "_version":4, "found":true,
    "_source":{ "name" : "CoolComp Ltd.", "employees" : 10, … }}

Page 13

search documents

# curl -XGET 'http://localhost:9200/mainindex/_search?q=management.name:petar'   # no type

{"took":128, "timed_out":false,
 "_shards":{"total":5,"successful":5,"failed":0},
 "hits":{
   "total":1,
   "max_score":0.15342641,
   "hits" : [
     {"_index":"mainindex", "_type":"company", "_id":"1", "_score":0.15342641,
      "_source":{ "name" : "CoolComp Ltd.", … "updated" : "2014-10-05T22:31:55" }}
   ]}}
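The same search can also be sent as a JSON request body instead of a URL parameter. The following is only a sketch: the helper names are made up for illustration, the index and field names come from the slides, and the body wraps the identical Lucene query string in a `query_string` query. It builds the request without contacting a server.

```python
import json
from urllib.parse import urlencode

# Host and index as used on the slides; the helper names are hypothetical.
BASE_URL = "http://localhost:9200"
INDEX = "mainindex"


def uri_search_url(field, term):
    """Build the URI-search form shown on the slide (?q=field:term)."""
    query = urlencode({"q": "%s:%s" % (field, term)})
    return "%s/%s/_search?%s" % (BASE_URL, INDEX, query)


def body_search(field, term):
    """Build a request body that carries the same Lucene query string."""
    return {"query": {"query_string": {"query": "%s:%s" % (field, term)}}}


print(uri_search_url("management.name", "petar"))
print(json.dumps(body_search("management.name", "petar")))
```

The body form becomes useful once a search needs more than one clause, since the URL form quickly gets hard to read.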

Page 14

Database is …

an organized (or structured) collection of data

Database management system (DBMS) is …

a software system that provides an interface between users and database(s)

4 common groups of interactions:

1. Data definition

2. Update - CrUD

3. Retrieval - cRud

4. Administration

Page 15

Elasticsearch is a database?

1. Data definition

2. Update - CrUD

3. Retrieval - cRud

4. Administration

Page 16

Data representation - document-oriented-database

Document-oriented-database - “NoSql branch”? Not really but …

Document is … blah blah blah … something like this:

{
  "_id" : 1,
  "_type" : "company",
  "name" : "CoolComp Ltd.",
  "employees" : 10,
  "founded" : "2014-10-05",
  "services" : ["software", "consulting"],
  "management": [
    {"role" : "CEO", "name" : "Petar Petrovich"},
    {"name" : "Ivan Ivić"}
  ],
  "updated" : "2014-10-05T22:31:55"
}

Page 17

Data representation - relational databases

company:
id  name             employees  founded
--  ---------------  ---------  ------------
1   'CoolComp Ltd.'  10         '2014-10-05'

services:
id   value
---  ----------------
1    'software'
2    'consulting'

company_services:
id   id_comp  id_serv
---  -------  --------
1    1        1
2    1        2

person:
id  name
--  -----------------
1   'Petar Petrovich'
2   'Ivan Ivić'

comp_management:
id   id_comp  id_pers  role
---  -------  -------  -----
1    1        1        CEO
2    1        2        MEM
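To make the contrast with the document form concrete, here is a minimal sketch that loads these five tables into an in-memory SQLite database and joins them back into one nested document. SQLite, the column types and the `company_document` helper are assumptions made for illustration; the slides do not name a particular RDBMS.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table company (id integer primary key, name text, employees integer, founded text);
create table services (id integer primary key, value text);
create table company_services (id integer primary key, id_comp integer, id_serv integer);
create table person (id integer primary key, name text);
create table comp_management (id integer primary key, id_comp integer, id_pers integer, role text);
insert into company values (1, 'CoolComp Ltd.', 10, '2014-10-05');
insert into services values (1, 'software'), (2, 'consulting');
insert into company_services values (1, 1, 1), (2, 1, 2);
insert into person values (1, 'Petar Petrovich'), (2, 'Ivan Ivić');
insert into comp_management values (1, 1, 1, 'CEO'), (2, 1, 2, 'MEM');
""")


def company_document(conn, company_id):
    """Join the five tables back into one nested, document-style dict."""
    name, employees, founded = conn.execute(
        "select name, employees, founded from company where id = ?", (company_id,)
    ).fetchone()
    services = [row[0] for row in conn.execute(
        "select s.value from company_services cs join services s on s.id = cs.id_serv "
        "where cs.id_comp = ? order by cs.id", (company_id,))]
    management = [{"role": role, "name": person} for role, person in conn.execute(
        "select cm.role, p.name from comp_management cm join person p on p.id = cm.id_pers "
        "where cm.id_comp = ? order by cm.id", (company_id,))]
    return {"name": name, "employees": employees, "founded": founded,
            "services": services, "management": management}


doc = company_document(conn, 1)
print(json.dumps(doc, ensure_ascii=False, indent=2))
```

Three queries and two joins are needed to rebuild what the document store returns in a single fetch by id.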

Page 18

Data definition

Elasticsearch is “schemaless”

But it does allow defining a schema - mappings

Very important when setting up for search:

• data types - string, integer, float, date/timestamp, boolean, binary, array, object, nested, geo, attachment

• search analysers, boosting, etc.
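As an illustration of what such a mapping might look like for the company document, here is a sketch in the ES 1.x style used throughout the talk. The field types chosen here (for example `not_analyzed` on the role) are assumptions, not the author's actual mapping. The dict would be sent as the JSON body when creating the index.

```python
import json

# Hypothetical mapping for the "company" type; field types are illustrative
# assumptions following the ES 1.x conventions current at the time of the talk.
mapping = {
    "mappings": {
        "company": {
            "properties": {
                "name": {"type": "string"},
                "employees": {"type": "integer"},
                "founded": {"type": "date"},
                # An array of strings needs no special type: any field may hold several values.
                "services": {"type": "string"},
                "management": {
                    "type": "object",
                    "properties": {
                        # Keep the role as one exact token instead of analyzing it as text.
                        "role": {"type": "string", "index": "not_analyzed"},
                        "name": {"type": "string"},
                    },
                },
                "updated": {"type": "date"},
            }
        }
    }
}

body = json.dumps(mapping, indent=2)
print(body)
```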

Page 19

Data definition - compared to RDBMS

But we lose some things that an RDBMS offers:

• data validation / integrity

• removing data redundancy - normalization

• “fine grained” structure definition

• standard and common usage (SQL)

Page 20

Retrieval

We had this example before:

# curl -XGET 'http://localhost:9200/mainindex/_search?q=management.name:petar'   # no type

equivalent SQL query:

select * from company c
where exists (
  select 1
  from comp_management cm
  inner join person p on p.id = cm.id_pers
  where cm.id_comp = c.id
    and lower(p.name) like '%petar%');
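A runnable variant of this SQL query, as a sketch against an in-memory SQLite database (SQLite and the second company row, 'OtherComp Ltd.', are assumptions added for illustration). This version correlates the subquery with the outer company row and searches for 'petar', so only companies that actually have a matching manager are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table company (id integer primary key, name text);
create table person (id integer primary key, name text);
create table comp_management (id integer primary key, id_comp integer, id_pers integer, role text);
-- 'OtherComp Ltd.' is a made-up row with no management, to show that it is filtered out
insert into company values (1, 'CoolComp Ltd.'), (2, 'OtherComp Ltd.');
insert into person values (1, 'Petar Petrovich'), (2, 'Ivan Ivić');
insert into comp_management values (1, 1, 1, 'CEO'), (2, 1, 2, 'MEM');
""")

rows = conn.execute("""
select c.id, c.name
from company c
where exists (
    select 1
    from comp_management cm
    inner join person p on p.id = cm.id_pers
    where cm.id_comp = c.id
      and lower(p.name) like '%petar%'
)
""").fetchall()

print(rows)
```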

Page 21

Retrieval - ES-QDSL

based on my experience, I would rather use ES:

• for searches: full text, fuzzy, multi field, multi document types, multi indexes/databases

• in programming - better to convert/deal with JSON than with ORM/raw SQL results

• single-page web applications

Page 22

Retrieval - SQL

on the other hand, I would rather use SQL and RDBMS:

• when composing a complex query - easier to do with SQL

• for data exploring/researching

SQL is a much more expressive DSL

Page 23

Joining & denormalization

object hierarchy … must be denormalized.

increases retrieval performance (since no query joining is necessary),

uses more space

makes keeping things consistent and up-to-date more difficult

They’re excellent for write-once-read-many-workloads

https://www.found.no/foundation/elasticsearch-as-nosql/

Page 24

Joining options

ES has several ways to “join” objects/documents/types:

1. embedding objects

2. “nested” objects

3. parent / child relation between types

4. compose manual query

When fetching by id - very handy (1 & 2).

When querying - not so handy.
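The practical difference between options 1 and 2 shows up at query time. Plainly embedded objects are indexed as flattened, multi-valued fields, so conditions on two fields can be satisfied by two different inner objects, whereas "nested" objects keep each inner object separate. The sketch below only simulates that behaviour in plain Python with exact-value matching (no text analysis); the function names are hypothetical and the 'MEM' role is borrowed from the relational slide.

```python
def flatten(doc, prefix=""):
    """Collapse inner objects into multi-valued dotted fields."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


def matches_embedded(doc, conditions):
    """Each condition may be satisfied by a different inner object."""
    flat = flatten(doc)
    return all(value in flat.get(field, []) for field, value in conditions.items())


def matches_nested(doc, path, conditions):
    """All conditions must hold within one and the same inner object."""
    return any(
        all(item.get(field) == value for field, value in conditions.items())
        for item in doc.get(path, [])
    )


company = {
    "name": "CoolComp Ltd.",
    "management": [
        {"role": "CEO", "name": "Petar Petrovich"},
        {"role": "MEM", "name": "Ivan Ivić"},
    ],
}

# "Is Ivan Ivić the CEO?" The flattened form wrongly says yes, the nested form says no.
print(matches_embedded(company, {"management.role": "CEO", "management.name": "Ivan Ivić"}))
print(matches_nested(company, "management", {"role": "CEO", "name": "Ivan Ivić"}))
```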

Page 25

Updating - CrUD

Elasticsearch

I would rather use Elasticsearch:

• when creating, updating and deleting single nested document

Page 26

Updating - CrUD

RDBMS

on the other hand, I found an RDBMS handy:

• for flat entities/documents

• for mass objects manipulation

• transactions & integrity (ACID)

Page 27

Administration

install, configure, maintenance, monitoring, scaling … quite satisfying!

OS specific install - apt-get, yum, zypper, brew, …

plugins installation

./bin/plugin -i elasticsearch/marvel/latest

Page 28

Administration - tools

Page 29

Elasticsearch as Database

to avoid maintenance and development time overhead

Page 30

Hybrid solution

Elasticsearch + …

Page 31

ES + … - hybrid solution

So why can you use ElasticSearch as a single point of truth (SPOT)?

Elasticsearch … used in addition to another database.

A database system for constraints, correctness and robustness, transactionally updatable, holds the master record, which is then asynchronously pushed/pulled to Elasticsearch
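One way to picture the "push" half of this setup is a periodic job that selects recently changed master records from the relational database and serializes them in the newline-delimited format accepted by the Elasticsearch `_bulk` endpoint. This is a sketch under assumptions: SQLite stands in for the master database, the `updated` column drives change detection, and `bulk_body` and the second company row are made up for illustration. Nothing is actually sent to a server.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table company (id integer primary key, name text, employees integer, updated text);
insert into company values (1, 'CoolComp Ltd.', 10, '2014-10-05T22:31:55');
insert into company values (2, 'OtherComp Ltd.', 3, '2014-10-01T08:00:00');
""")


def bulk_body(conn, since):
    """Serialize rows changed after `since` as newline-delimited bulk actions."""
    lines = []
    query = "select id, name, employees, updated from company where updated > ? order by id"
    for cid, name, employees, updated in conn.execute(query, (since,)):
        # Action line: which index/type/id the following source line belongs to.
        lines.append(json.dumps({"index": {"_index": "mainindex", "_type": "company", "_id": str(cid)}}))
        # Source line: the document itself.
        lines.append(json.dumps({"name": name, "employees": employees, "updated": updated}))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


payload = bulk_body(conn, "2014-10-02T00:00:00")
print(payload)
```

The payload would then be POSTed to `localhost:9200/_bulk`. Because indexing by id overwrites the previous version, re-sending the same rows is harmless.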

Page 32

Hybrid solution

Page 33

Elasticsearch rivers

besides classic indexing - rivers provide an alternative way of inserting data into ES

a service that fetches the data from an external source (one shot or periodically) and puts the data into the cluster

Among those listed on the official site:

• RDBMS/JDBC

• MongoDB

• Redis

• Couchbase

• …

Page 34

Use-case - RDBMS & Elasticsearch

• Indexing & reindexing subdocuments is major job

• upsert mode

• issues - not indexing, memory hungry, full reindex when new field/subdoc

• building AST when building a query - quite demanding

• satisfied with the final result!

Page 35

What about others?

Page 36

Riak & Solr

September 16, 2014 - With 2.0, we have added distributed Solr to Riak Search. For every instance of Riak, there is an instance of Solr. While this drastically improves full-text search, it also improves Riak’s overall functionality. Riak Search now allows for Riak to act as a document store (instead of key/value) if needed.

Despite being a part of Riak, Riak Search is a separate Erlang application. It monitors for changes to data in Riak and propagates those changes to indexes managed by Solr.

Page 37

Couchbase and Elasticsearch

integrates Couchbase Server and Elasticsearch,

by streaming data in real-time from Couchbase to Elasticsearch.

combined solution … with full-text search, indexing and querying and real-time analytics … content store or aggregation of data from different data sources.

Couchbase Server provides easy scalability, low-latency document access, indexing and querying of JSON documents and real-time analytics with incremental map reduce.

Page 38

MongoDB and Elasticsearch

“addition of Elasticsearch represents only a first step in its mission to enable developers to choose the database that's right for their needs”

“big weakness of MongoDB is the free text search, which MongoDB tried to address in version 2.4 in some aspects.”

Page 39

Not to forget good old school …

Page 40

RDBMS with FTS

Page 41

Elasticsearch use when …

you need a very good, reliable, handy, web-oriented search index engine

you have a read-intensive, document-oriented application

“write” balance - depending on how much - ES as a NoSQL only or as a hybrid solution

Page 42

Summary

no silver bullet, “the right tool for the job”

learn & get familiar with different solutions and choose optimal one

be objective & productive

The general trend is towards heterogeneous systems => a lot of integration tasks lately

learn new things & have fun!

Page 43

Thank you for your patience!

Questions?

@trebor74hr