Berlin Buzzwords · June 2010 · Basho Technologies · Rusty Klophaus - @rklophaus · Riak Search: A Full-Text Search and Indexing Engine based on Riak

Riak Search - Berlin Buzzwords 2010

Jan 15, 2015

Rusty Klophaus

Riak Search is a distributed data indexing and search platform built on top of Riak. The talk will introduce Riak Search, covering overall goals, architecture, and core functionality.
Transcript
Page 1: Riak Search - Berlin Buzzwords 2010

Berlin Buzzwords · June 2010

Basho Technologies

Rusty Klophaus - @rklophaus

Riak Search

A Full-Text Search and Indexing Engine based on Riak

Page 2: Riak Search - Berlin Buzzwords 2010

Why did we build it?

What are the major goals?

How does it work?

Page 3: Riak Search - Berlin Buzzwords 2010

Part One

Why did we build Riak Search?

Page 4: Riak Search - Berlin Buzzwords 2010

Riak is a scalable, highly-available, networked, open-source key/value store.

Page 5: Riak Search - Berlin Buzzwords 2010

Writing to a Key/Value Store

[Diagram: CLIENT and RIAK. Label: Key/Value]

Page 6: Riak Search - Berlin Buzzwords 2010

Writing to a Key/Value Store

[Diagram: CLIENT and RIAK. Label: Object]

Page 7: Riak Search - Berlin Buzzwords 2010

Querying a Key/Value Store

[Diagram: CLIENT and RIAK. Labels: Key; Object]

Page 8: Riak Search - Berlin Buzzwords 2010

Querying Riak via Link Walking

[Diagram: CLIENT and RIAK. Labels: Key + Instructions; Object(s); Walk to Related Keys]

Page 9: Riak Search - Berlin Buzzwords 2010

Querying Riak via Map/Reduce

[Diagram: CLIENT and RIAK. Labels: Key(s) + JS Functions; Computed Value(s); Map and Reduce phases]
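
To make the flow on this slide concrete, here is a minimal Python sketch of a map phase followed by a reduce phase over a key/value store. The store contents and the names `map_reduce`, `extract_price`, and `total` are hypothetical and exist only for illustration; the slide describes the functions as JavaScript, and nothing here is Riak's API.

```python
# Hypothetical in-memory stand-in for a key/value store (not Riak's API).
STORE = {
    "shoe-1": {"category": "Shoes", "price": 40},
    "shoe-2": {"category": "Shoes", "price": 65},
    "hat-1": {"category": "Hats", "price": 15},
}


def map_reduce(store, keys, map_fn, reduce_fn):
    """Run map_fn on each requested object, then reduce_fn on all mapped values."""
    mapped = []
    for key in keys:
        mapped.extend(map_fn(key, store[key]))
    return reduce_fn(mapped)


def extract_price(key, obj):
    """Map phase: emit zero or more values for one object."""
    return [obj["price"]]


def total(values):
    """Reduce phase: combine the mapped values into one computed value."""
    return sum(values)


result = map_reduce(STORE, ["shoe-1", "shoe-2"], extract_price, total)
```

Note that the caller still has to name the keys up front; the query is driven by keys, not by the contents of the objects.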

Page 10: Riak Search - Berlin Buzzwords 2010

Key/Value Stores like Key-Based Queries

Page 11: Riak Search - Berlin Buzzwords 2010

Query by Secondary Index

[Diagram: CLIENT and RIAK. Query: where Category == "Shoes". RIAK: "WTF!? I'm a KV store!"]

Page 12: Riak Search - Berlin Buzzwords 2010

Full-Text Query

[Diagram: CLIENT and RIAK. Query: "Converse AND Shoes". RIAK: "This is getting old."]

Page 13: Riak Search - Berlin Buzzwords 2010

These kinds of queries need an Index.

*Market Opportunity!*
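
A minimal Python sketch of why such queries need an index, assuming a plain dictionary as the key/value store. Without an index, answering `Category == "Shoes"` means reading every object; with a secondary index kept next to the data, it is a single lookup. `STORE`, `scan`, and `build_secondary_index` are hypothetical names, not part of Riak.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a key/value store (not Riak's API).
STORE = {
    "shoe-1": {"category": "Shoes", "brand": "Converse"},
    "shoe-2": {"category": "Shoes", "brand": "Other"},
    "hat-1": {"category": "Hats", "brand": "Other"},
}


def scan(store, field, value):
    """No index: inspect every object in the store."""
    return sorted(key for key, obj in store.items() if obj.get(field) == value)


def build_secondary_index(store, field):
    """Index: map each value of the field to the keys of the objects that have it."""
    index = defaultdict(list)
    for key in sorted(store):
        index[store[key][field]].append(key)
    return index


by_category = build_secondary_index(STORE, "category")
shoes = by_category["Shoes"]  # one lookup instead of a full scan
```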

Page 14: Riak Search - Berlin Buzzwords 2010

Part Two

What are the major goals of Riak Search?

Page 15: Riak Search - Berlin Buzzwords 2010

An application built on Riak.

[Diagram: Your Application; Riak]

Page 16: Riak Search - Berlin Buzzwords 2010

Hrm... I need an index.

[Diagram: Your Application; Riak; Index Object]

Page 17: Riak Search - Berlin Buzzwords 2010

Hrm... I need an index with more features.

[Diagram: Your Application; Riak; ???]

Page 18: Riak Search - Berlin Buzzwords 2010

Lucene should do the trick...

[Diagram: Your Application; Riak; Lucene]

Page 19: Riak Search - Berlin Buzzwords 2010

...shard to add more storage capacity...

[Diagram: Your Application; Riak; three Lucene instances]

Page 20: Riak Search - Berlin Buzzwords 2010

...replicate to add more throughput.

[Diagram: Your Application; Riak; nine Lucene instances (three rows of three)]

Page 21: Riak Search - Berlin Buzzwords 2010

...replicate to add more throughput.

[Diagram: Your Application; Riak; nine Lucene instances (three rows of three)]

Operations nightmare!

Page 22: Riak Search - Berlin Buzzwords 2010

What do we really want?

[Diagram: Your Application; Riak; Riak-ified Lucene]

Page 23: Riak Search - Berlin Buzzwords 2010

What do we really want?

[Diagram: Your Application; Riak; Riak Search]

Page 24: Riak Search - Berlin Buzzwords 2010

Functionality? Be like Lucene (and more).

• Lucene Syntax

• Leverages Java Lucene Analyzers

• Solr Endpoints

• Integration via Riak Post-Commit Hook (Index)

• Integration via Riak Map/Reduce (Query)

• Near-Realtime

• Schema-less

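
The post-commit integration listed above can be pictured with a small Python sketch: every completed write triggers a hook that updates the index, which is what lets new data become searchable shortly after it is stored. The `KVStore` class, its `post_commit` list, and `index_hook` are hypothetical and only illustrate the pattern; this is not Riak's hook API.

```python
from collections import defaultdict


class KVStore:
    """Hypothetical key/value store that runs hooks after each write."""

    def __init__(self):
        self.data = {}
        self.post_commit = []  # callables invoked after a write has been stored

    def put(self, key, value):
        self.data[key] = value
        for hook in self.post_commit:
            hook(key, value)


INDEX = defaultdict(set)  # term -> keys of the objects containing that term


def index_hook(key, value):
    """Index the written text. Stale terms from overwrites are not removed here."""
    for term in value.lower().split():
        INDEX[term].add(key)


store = KVStore()
store.post_commit.append(index_hook)
store.put("p1", "Converse shoes")
store.put("p2", "Running shoes")
```

The application only writes objects; keeping the index current is a side effect of the write path.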

Page 25: Riak Search - Berlin Buzzwords 2010

Operations? Be like Riak.

• No special nodes

• Add nodes, get more compute and storage

• Automatically load balance

• Replicas for durability and performance

• Index and query in parallel

• Swappable storage backends

Page 26: Riak Search - Berlin Buzzwords 2010

Part Three

How do we do it?

Page 27: Riak Search - Berlin Buzzwords 2010

A Gentle Introduction to Document Indexing

Page 28: Riak Search - Berlin Buzzwords 2010

The Inverted Index

Document #1: Every dog has his day.

Inverted Index (term, document):
day, 1
dog, 1
every, 1
has, 1
his, 1

Page 29: Riak Search - Berlin Buzzwords 2010

The Inverted Index

Documents:
#1: Every dog has his day.
#2: The dog's bark is worse than his bite.
#3: Let the cat out of the bag.
#4: It's raining cats and dogs.

Combined Inverted Index (term, document):
and, 4
bag, 3
bark, 2
bite, 2
cat, 3
cat, 4
day, 1
dog, 1
dog, 2
dog, 4
every, 1
has, 1
...
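
As a rough illustration of how such a combined index can be built, here is a minimal Python sketch. The tokenizer and the small `NORMALIZE` table are assumptions standing in for a real analyzer (the slide's index maps "dogs", "dog's", and "cats" to "dog" and "cat"); the function names are hypothetical and this is not how Riak Search itself is implemented.

```python
import re
from collections import defaultdict

# Hand-written stand-in for an analyzer's stemming step (assumption).
NORMALIZE = {"dogs": "dog", "dog's": "dog", "cats": "cat"}


def tokenize(text):
    """Lowercase the text, split it into words, and normalize known variants."""
    words = re.findall(r"[a-z']+", text.lower())
    return [NORMALIZE.get(word, word) for word in words]


def build_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


DOCS = {
    1: "Every dog has his day.",
    2: "The dog's bark is worse than his bite.",
    3: "Let the cat out of the bag.",
    4: "It's raining cats and dogs.",
}

INDEX = build_index(DOCS)
```

Keeping each posting list sorted by document id is what makes the merge operations at query time cheap.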

Page 30: Riak Search - Berlin Buzzwords 2010

At Query Time...

Query: "dog AND cat"

[Diagram: query tree with AND above the terms dog and cat]

Page 31: Riak Search - Berlin Buzzwords 2010

At Query Time...

[Diagram: query tree with AND above the terms dog and cat, each with its postings]

dog: (dog, 1), (dog, 2), (dog, 4)
cat: (cat, 3), (cat, 4)

Page 32: Riak Search - Berlin Buzzwords 2010

At Query Time...

AND (Merge Intersection)

dog: 1, 2, 4
cat: 3, 4

Result: 4

Page 33: Riak Search - Berlin Buzzwords 2010

At Query Time...

OR (Merge Union)

dog: 1, 2, 4
cat: 3, 4

Result: 1, 2, 3, 4
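
The two merges can be sketched in Python as a single pass over two sorted posting lists. This is a textbook two-pointer merge written for illustration, with hypothetical function names; the slides do not show Riak Search's actual implementation.

```python
def merge_intersection(a, b):
    """AND: return the ids present in both sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def merge_union(a, b):
    """OR: return the ids present in either sorted list, without duplicates."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


dog = [1, 2, 4]
cat = [3, 4]
and_result = merge_intersection(dog, cat)  # [4]
or_result = merge_union(dog, cat)          # [1, 2, 3, 4]
```

Both functions read each list once, so the cost grows with the combined length of the posting lists.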

Page 34: Riak Search - Berlin Buzzwords 2010

Complex Behavior from Simple Structures

Page 35: Riak Search - Berlin Buzzwords 2010

Storage Approaches...

Page 36: Riak Search - Berlin Buzzwords 2010

Riak Search uses Consistent Hashing to store data on Partitions

Page 37: Riak Search - Berlin Buzzwords 2010

Introduction to Consistent Hashing and Partitions

Partitions = 10
Number of Nodes = 5
Partitions per Node = 2
Replicas (NVal) = 2

Page 38: Riak Search - Berlin Buzzwords 2010

Introduction to Consistent Hashing and Partitions

[Diagram. Label: Object]
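
A minimal Python sketch of the idea, using the numbers from the slide (10 partitions, 5 nodes, NVal = 2). The SHA-1 hash, the round-robin assignment of partitions to nodes, the node names, and the rule that replicas go to the next partition on the ring are assumptions made for illustration; the slides do not specify these details.

```python
import hashlib

RING_SIZE = 2 ** 160  # SHA-1 produces 160-bit hashes
NUM_PARTITIONS = 10   # values below are taken from the slide
NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical names
N_VAL = 2


def partition_for(key):
    """Hash the key onto the ring and return the index of its partition."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position * NUM_PARTITIONS // RING_SIZE


def owner(partition):
    """Assumed round-robin claim: partition p belongs to node p mod 5."""
    return NODES[partition % len(NODES)]


def preference_list(key):
    """Return the N_VAL consecutive (partition, node) pairs that store the key."""
    first = partition_for(key)
    partitions = [(first + k) % NUM_PARTITIONS for k in range(N_VAL)]
    return [(p, owner(p)) for p in partitions]


replicas = preference_list("dog")
```

With this layout every node owns two partitions, and the two replicas of any key land on different nodes because neighboring partitions have different owners.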

Page 39: Riak Search - Berlin Buzzwords 2010

Document Partitioning vs. Term Partitioning

Page 40: Riak Search - Berlin Buzzwords 2010

...and the Resulting Tradeoffs

Page 41: Riak Search - Berlin Buzzwords 2010

Document Partitioning @ Index Time

Document #1: Every dog has his day.

Page 42: Riak Search - Berlin Buzzwords 2010

Document Partitioning @ Query Time

Query: "dog OR cat"

Page 43: Riak Search - Berlin Buzzwords 2010

Term Partitioning @ Index Time

Document #1: Every dog has his day.

Postings (term, document):
day, 1
dog, 1
every, 1
has, 1
his, 1

Page 44: Riak Search - Berlin Buzzwords 2010

Term Partitioning @ Index Time

[Diagram: the postings distributed across partitions]

day, 1
has, 1
every, 1
his, 1
dog, 1

Page 45: Riak Search - Berlin Buzzwords 2010

Term Partitioning @ Query Time

Query: "dog OR cat"
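
To contrast the two layouts, here is a minimal Python sketch that places the same postings either by document id or by term and then counts how many partitions a query has to contact. The hash function, the partition count, and the names `place` and `query_or` are assumptions for illustration, not Riak Search internals.

```python
import hashlib

NUM_PARTITIONS = 10  # assumed ring size for the example


def partition_of(value):
    """Hash a term or document id to a partition index."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def place(postings, by):
    """Distribute (term, doc_id) postings, hashing by "document" or by "term"."""
    layout = {p: {} for p in range(NUM_PARTITIONS)}
    for term, doc_id in postings:
        p = partition_of(doc_id if by == "document" else term)
        layout[p].setdefault(term, []).append(doc_id)
    return layout


def query_or(layout, terms, by):
    """Return (matching doc ids, number of partitions contacted)."""
    if by == "document":
        targets = range(NUM_PARTITIONS)  # any partition may hold matching documents
    else:
        targets = {partition_of(term) for term in terms}  # only the terms' partitions
    hits = set()
    for p in targets:
        for term in terms:
            hits.update(layout[p].get(term, []))
    return sorted(hits), len(targets)


POSTINGS = [("dog", 1), ("dog", 2), ("dog", 4), ("cat", 3), ("cat", 4), ("day", 1)]

doc_result = query_or(place(POSTINGS, "document"), ["dog", "cat"], "document")
term_result = query_or(place(POSTINGS, "term"), ["dog", "cat"], "term")
```

Both layouts return the same documents. The document-partitioned query touches every partition, while the term-partitioned query touches at most one partition per query term, at the price of concentrating a popular term's whole posting list in one place.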

Page 46: Riak Search - Berlin Buzzwords 2010

Tradeoffs...

Document Partitioning:
+ Lower Latency Queries
- Lower Throughput
- Lots of Disk Seeks

Term Partitioning:
- Higher Latency Queries
+ Higher Throughput
- Hotspots in Ring (the "Obama" problem)

Page 47: Riak Search - Berlin Buzzwords 2010

Riak Search: Term Partitioning

Term-partitioning is the most viable approach for our beta clients’ needs: high throughput on Really Big Datasets.

Optimizations:

• Term splitting to reduce hot spots

• Bloom filters & caching to save query-time bandwidth

• Batching to save query-time & index-time bandwidth

Support for either approach eventually.
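
As one illustration of the Bloom filter optimization listed above, here is a minimal Python sketch. The scenario is an assumption, not something the slides spell out: the partition holding one term sends a compact filter of its document ids, so the partition holding another term can drop ids that cannot match before shipping its posting list. The `BloomFilter` class and its parameters are hypothetical.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        """Derive num_hashes bit positions for the item from salted SHA-1 hashes."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# The "cat" partition summarizes its postings (3 and 4) in a filter.
cat_filter = BloomFilter()
for doc_id in [3, 4]:
    cat_filter.add(doc_id)

# The "dog" partition sends only the ids that might also contain "cat".
dog_postings = [1, 2, 4]
candidates = [doc_id for doc_id in dog_postings if doc_id in cat_filter]

# Candidates may include false positives, so an exact check still follows.
result = sorted(set(candidates) & {3, 4})
```

The filter is a fixed 1024 bits regardless of how many ids it summarizes, which is where the bandwidth saving would come from; the exact intersection at the end removes any false positives.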

Page 48: Riak Search - Berlin Buzzwords 2010

Part Four

Review

Page 49: Riak Search - Berlin Buzzwords 2010

Riak Search turns this...

[Diagram: CLIENT and RIAK. Query: "Converse AND Shoes". RIAK: "WTF!? I'm a KV store!"]

Page 50: Riak Search - Berlin Buzzwords 2010

...into this...

[Diagram: CLIENT and RIAK. Query: "Converse AND Shoes". RIAK: "Gladly!"]

Page 51: Riak Search - Berlin Buzzwords 2010

...into this...

[Diagram: CLIENT and RIAK. Query: "Converse AND Shoes". Result: Keys or Objects]

Page 52: Riak Search - Berlin Buzzwords 2010

...while keeping operations easy.

[Diagram: Your Application; Riak Search; Riak]

Page 53: Riak Search - Berlin Buzzwords 2010

Thanks! Questions?

Search Team:

John Muellerleile - @jrecursive

Rusty Klophaus - @rklophaus

Kevin Smith - @kevsmith

Currently working with a small set of Beta users.

Open-source release planned for Q3.

www.basho.com