A SNEAK PEEK AT CBFT Couchbase Full-Text Server Marty Schoch & Steve Yen, Couchbase, Inc.

A sneak peek at CBFT: A Full Text Search for Couchbase: Couchbase Connect 2015

Aug 15, 2015


A SNEAK PEEK AT CBFT

Couchbase Full-Text Server

Marty Schoch & Steve Yen, Couchbase, Inc.

©2015 Couchbase Inc.

about the speakers

Steve Yen, Couchbase


Marty Schoch, lead contributor to bleve, the most popular open-source full-text indexing engine for golang


agenda

• why cbft?
• what’s full-text search and how’s it work?
• design
• demo
• status / roadmap / what’s next


why cbft?

search system    couchbase connector?    yet another tier & cluster to manage?
[logo]           yes                     yes
[logo]           yes                     yes
Lucidworks       yes                     yes


why cbft?

• simple
• integrated
• 80/20 of features


what’s full-text search?


advanced search


search results

• Spelling Suggestions
• Result Text Snippets
• Highlighted Search Terms


faceted search


JSON document in Couchbase

Key: akay1980

Document:

    {
      "name": "Alan Kay",
      "description": "... the wisest engineer ..."
    }


Text Analysis : tokenizer + token filters

A pipeline of transformations

One Tokenizer

Zero or more Token Filters


Text Analysis : tokenizer + token filters

Tokenizer: “… the wisest engineer …” becomes the tokens: the | wisest | engineer

• Seems like simple whitespace… but this doesn’t work for all languages
• Unicode standard rules help (see Unicode Standard Annex #29)
• Still need to account for exceptions
  • E-mail addresses and URLs don’t follow normal rules


Text Analysis : tokenizer + token filters

Tokens: the | wisest | engineer
After Stop Word Removal: wisest | engineer (“the” is dropped)
After Stemming: wise | engineer
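The tokenizer-plus-filters pipeline on these slides can be sketched in a few lines of Python. This is a toy illustration, not bleve's API: the function names, the stop-word list, and the lookup-table "stemmer" are all assumptions made for the example. A real analyzer would use Unicode-aware segmentation (UAX #29) and a proper stemming algorithm.

```python
import re

# Assumed for illustration: a tiny stop-word list and a lookup-table stemmer.
STOP_WORDS = {"the", "a", "an", "and", "of"}
STEM_OVERRIDES = {"wisest": "wise", "wiser": "wise"}


def tokenize(text):
    """Tokenizer: split text into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens):
    """Token filter: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(tokens):
    """Token filter: toy stemmer (lookup table, then strip a plural 's')."""
    stemmed = []
    for t in tokens:
        if t in STEM_OVERRIDES:
            t = STEM_OVERRIDES[t]
        elif len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
            t = t[:-1]
        stemmed.append(t)
    return stemmed


def analyze(text, filters=(lowercase, remove_stop_words, stem)):
    """Run one tokenizer followed by zero or more token filters."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("… the wisest engineer …"))  # ['wise', 'engineer']
```

Passing `filters=()` gives the bare tokenizer output, which mirrors the "zero or more token filters" idea.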


Inverted Index

wise → …, akay1980, …
engineer → …, akay1980, …
…


Search

Query term: engineers
After analysis: engineer

Apply the same analysis at search time that we used at index time.

Inverted Index

wise → …, akay1980, …
engineer → …, akay1980, … (Exact Match with the analyzed query term)
…
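A minimal sketch of the idea above, assuming a toy analyzer (lowercase, drop "the", a lookup-table stemmer) rather than bleve's real analysis chain. The helper names `analyze`, `build_index`, and `search` are hypothetical. The point it illustrates is that the query "engineers" only matches because it goes through the same analysis as the indexed text.

```python
import re
from collections import defaultdict

# Toy analysis settings, assumed for illustration only.
STOP_WORDS = {"the"}
STEM_OVERRIDES = {"wisest": "wise"}


def analyze(text):
    """Tokenize, lowercase, drop stop words, and apply a toy stemmer."""
    terms = []
    for token in re.findall(r"\w+", text.lower()):
        if token in STOP_WORDS:
            continue
        if token in STEM_OVERRIDES:
            token = STEM_OVERRIDES[token]
        elif len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        terms.append(token)
    return terms


def build_index(docs):
    """Map each analyzed term to the set of document keys containing it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for term in analyze(text):
            index[term].add(key)
    return index


def search(index, query):
    """Analyze the query the same way, then look up each term exactly."""
    keys = set()
    for term in analyze(query):
        keys |= index.get(term, set())
    return sorted(keys)


docs = {"akay1980": "... the wisest engineer ..."}
index = build_index(docs)
print(search(index, "engineers"))  # ['akay1980']
```

Here a multi-term query returns documents matching any term; that choice is an assumption of the sketch, not something the slides specify.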


Document Scoring

• tf/idf scoring
  • Term Frequency
    • How often does a term occur in a doc?
    • More often yields a higher score
  • Inverse Document Frequency
    • How many docs have this term?
    • More docs yield lower score (because the term is more common)
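The two effects above can be seen in a small worked example. The formula used here, tf × (1 + ln(N / df)), is one common textbook variant chosen for illustration; it is an assumption and not necessarily the exact scoring formula bleve uses. The corpus is made up.

```python
import math


def tf_idf(term, doc_terms, corpus):
    """Score one term for one document using a textbook tf-idf variant."""
    tf = doc_terms.count(term)                         # term frequency in this doc
    df = sum(1 for terms in corpus if term in terms)   # docs containing the term
    if tf == 0 or df == 0:
        return 0.0
    idf = 1.0 + math.log(len(corpus) / df)             # rarer term, larger idf
    return tf * idf


# Hypothetical corpus of already-analyzed documents.
corpus = [
    ["wise", "engineer"],
    ["engineer", "engineer", "manager"],
    ["wise", "owl"],
]

# More occurrences in a doc yield a higher score.
print(tf_idf("engineer", corpus[1], corpus) > tf_idf("engineer", corpus[0], corpus))
# A term found in fewer docs scores higher than a more common one.
print(tf_idf("owl", corpus[2], corpus) > tf_idf("wise", corpus[2], corpus))
```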


Quality Results

• Getting high quality results depends on the right text analysis

• Beware: adjustments that increase precision may reduce recall (and the other way around)
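To make the precision/recall trade-off concrete, here is a small sketch with made-up result sets. The document ids and the "strict" versus "loose" analysis labels are hypothetical; only the standard definitions of precision and recall are taken as given.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents are truly relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}
strict_results = ["d1", "d2"]                          # e.g. no stemming
loose_results = ["d1", "d2", "d3", "d7", "d8", "d9"]   # e.g. aggressive stemming

print(precision_recall(strict_results, relevant))  # (1.0, 0.5)
print(precision_recall(loose_results, relevant))   # (0.5, 0.75)
```

The looser analysis finds more of the relevant documents (higher recall) but also returns more irrelevant ones (lower precision).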



cbft design / index partitioning

bucket partitions (1024 vbuckets): 0, 1, 2, 3, 4, …, 1021, 1022, 1023

index partitions (groups of vbuckets):
  A: 0-399
  B: 400-799
  C: 800-1023

cbft nodes: X, Y, Z

index partitions are assigned to cbft nodes, and replicas, too
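The partitioning scheme can be sketched as follows. The vbucket ranges and node names come from the slide; the assignment policy (round-robin primaries, with each replica on the next node) is purely an assumption for illustration, since the slide's diagram did not survive and cbft's actual planner may place partitions differently.

```python
# Index partitions as groups of vbuckets (ranges taken from the slide).
INDEX_PARTITIONS = {
    "A": range(0, 400),
    "B": range(400, 800),
    "C": range(800, 1024),
}
CBFT_NODES = ["X", "Y", "Z"]


def partition_for_vbucket(vbucket):
    """Find which index partition covers a given vbucket."""
    for name, vbuckets in INDEX_PARTITIONS.items():
        if vbucket in vbuckets:
            return name
    raise ValueError(f"vbucket {vbucket} is outside 0-1023")


def assign_partitions(partitions, nodes, replicas=1):
    """Assumed policy: primary on node i, replicas on the following nodes."""
    if replicas >= len(nodes):
        raise ValueError("need more nodes than replicas")
    plan = {}
    for i, name in enumerate(partitions):
        plan[name] = [nodes[(i + r) % len(nodes)] for r in range(replicas + 1)]
    return plan


print(partition_for_vbucket(512))                       # B
print(assign_partitions(INDEX_PARTITIONS, CBFT_NODES))
# {'A': ['X', 'Y'], 'B': ['Y', 'Z'], 'C': ['Z', 'X']}
```

In each list the first node holds the primary copy and the rest hold replicas, so no partition has both copies on the same node.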


cbft design / indexing

[diagram: three couchbase nodes feeding three cbft nodes via DCP streams]


cbft design / queries

[diagram: your application talks REST to one cbft node, which talks to the other cbft nodes]

a query sent to any cbft node…

…is scatter / gathered to the other cbft nodes
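Scatter/gather can be illustrated with a small in-process sketch. Everything here is hypothetical: `CbftNode` is a stand-in class, the hits and scores are made up, and threads replace the network calls a real cluster would make. It only shows the shape of the technique: fan the query out to every node, then merge the partial results by score.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class CbftNode:
    """Hypothetical stand-in for a cbft node holding some index partitions."""

    def __init__(self, name, local_hits):
        self.name = name
        self.local_hits = local_hits  # term -> list of (score, doc key)

    def search_local(self, term):
        """Search only the partitions this node holds."""
        return self.local_hits.get(term, [])


def scatter_gather(nodes, term, limit=10):
    """Scatter the query to all nodes in parallel, gather, keep the top hits."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partial_results = list(pool.map(lambda n: n.search_local(term), nodes))
    merged = [hit for hits in partial_results for hit in hits]
    return heapq.nlargest(limit, merged)  # highest score first


nodes = [
    CbftNode("X", {"engineer": [(0.9, "akay1980")]}),
    CbftNode("Y", {"engineer": [(0.4, "doc-y1"), (0.7, "doc-y2")]}),
    CbftNode("Z", {}),
]
print(scatter_gather(nodes, "engineer", limit=2))
# [(0.9, 'akay1980'), (0.7, 'doc-y2')]
```

The sketch treats the receiving node like any other node; which node coordinates the query does not change the merged result.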


project status

cbft is developer preview!

please help kick the tires

http://labs.couchbase.com/cbft


project status / roadmap / what’s next

feature                                    today   future
bleve full-text engine                     y       y
advanced mappings                          y       y
faceted search                             y       y
incremental indexing                       y       y
index partitioning and replication         y       y
index aliases                              y       y
integrated into Couchbase Server & N1QL    -       y
API stability                              -       y
production quality                         -       y
performance optimization / tuning          -       y
forestdb storage & partial rollbacks       -       y
security, SSL                              -       y
more docs, examples, SDK support           -       y


links & Q+A

http://labs.couchbase.com/cbft

downloads, getting started, tech docs

and, share your feedback!

THANKS! (and please do the survey!)


cbft design

[diagram: three couchbase nodes, three cbft nodes, and a cfg bucket]

• DCP streams for incremental index updates
• a cfg bucket holds metadata about the indexes