Mroonga - Fast fulltext search for all languages on MySQL

Sep 24, 2020

groonga storage engine

Brazil, Inc. Tasuku SUENAGA a.k.a. gunyarakun

What I talk about

MySQL fulltext index
• Phrase search is slow.
• Updating the index is slow.
• A fulltext search index cannot be combined with other indexes (like B-Tree).

Our earlier product, Tritonn, solves these problems.

Tritonn
• Tritonn = groonga + patches for MyISAM

MySQL vs. Tritonn

                                               MySQL (5.0) fulltext index   Tritonn
Index size                                     109 MB                       1028 MB
Phrase search for 'united states'              44.91 sec                    0.40 sec
Indexing after inserting records               1,474 sec                    1,808 sec
Inserting records after index creation         28,182 sec                   1,839 sec
WHERE MATCH AGAINST and ORDER BY primary key   20.33 sec                    0.89 sec
WHERE MATCH AGAINST and primary key > 200000   6.55 sec                     0.32 sec

Target dataset: English Wikipedia, 458,713 records, 1088 MB
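To make the comparison easier to read, the timings in the table can be turned into ratios. The short sketch below only divides the numbers shown on the slide; the dictionary keys and the `speedup` helper are names chosen for this illustration.

```python
# Timings in seconds, copied from the table: (MySQL 5.0 fulltext index, Tritonn).
timings = {
    "phrase search": (44.91, 0.40),
    "indexing after inserting records": (1474, 1808),
    "inserting records after index creation": (28182, 1839),
    "MATCH AGAINST + ORDER BY primary key": (20.33, 0.89),
    "MATCH AGAINST + primary key > 200000": (6.55, 0.32),
}


def speedup(mysql_sec, tritonn_sec):
    """How many times faster Tritonn is (a value below 1 means it is slower)."""
    return mysql_sec / tritonn_sec


ratios = {name: round(speedup(*pair), 1) for name, pair in timings.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")

# Index size is the cost side of the trade-off: 1028 MB vs. 109 MB.
index_growth = round(1028 / 109, 1)
print(f"index size: {index_growth}x larger")
```

With these numbers, phrase search comes out roughly 112 times faster, bulk indexing after insertion is somewhat slower, and the index is about 9.4 times larger.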

So Tritonn provides …
• Fast phrase search
• Fast index update (realtime)
• Works well with other indexes.

But some problems remain.

Remaining problems
• MyISAM based
 ‒ Table lock
  • While a table is being updated, read accesses are blocked.
• Patch based
 ‒ Maintaining the patches and building a patched MySQL is too messy.

Need for a new solution.

The new solution is
• groonga storage engine
 ‒ Uses the column store of groonga instead of MyISAM.
 ‒ Not a patch but a storage engine.

[Diagram: Tritonn (old) and groonga storage engine (new)]

Advantage
• Table lock free
 ‒ Column store of groonga is lock-free.
• Only access columns required
 ‒ Not row-based.
• Easy to build and develop
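The "only access columns required" point can be pictured with a toy example. This is a minimal sketch of the general row-store versus column-store idea, not groonga's actual storage format; the sample records and function names are made up for illustration.

```python
# Hypothetical illustration only; this is not groonga's on-disk layout.

# Row-based layout: each record is stored as one unit.
rows = [
    {"id": 1, "title": "MySQL", "body": "a long article body ..."},
    {"id": 2, "title": "groonga", "body": "another long article body ..."},
    {"id": 3, "title": "Spider", "body": "yet another long article body ..."},
]

# Column-based layout: each column is stored as its own array.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def titles_from_rows(rows):
    """Walks every whole record, even though only 'title' is needed."""
    return [row["title"] for row in rows]


def titles_from_columns(columns):
    """Reads only the 'title' array; 'id' and 'body' are never touched."""
    return list(columns["title"])
```

Both functions return the same titles; the difference is how much stored data has to be visited to produce them.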

And some optimizations for typical queries

Optimization (1)
• COUNT(*) optimization.
 ‒ For queries like the one below.

SELECT COUNT(*) FROM table WHERE MATCH(col) AGAINST ('query');
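The slide does not say how the COUNT(*) optimization is implemented. One plausible reading, sketched here under that assumption, is that the number of hits can be taken from the fulltext index itself instead of reading every matching row. The toy table, `build_index`, and both counting functions are hypothetical names for this illustration.

```python
from collections import defaultdict

# Hypothetical toy table: row id -> text column.
table = {
    1: "united states of america",
    2: "united kingdom",
    3: "the united states",
}


def build_index(table):
    """Build a word -> set-of-row-ids inverted index."""
    index = defaultdict(set)
    for row_id, text in table.items():
        for word in text.split():
            index[word].add(row_id)
    return index


index = build_index(table)


def count_by_fetching_rows(table, word):
    """Naive plan: read every row's text, test it, and count the matches."""
    return sum(1 for text in table.values() if word in text.split())


def count_from_index(index, word):
    """Shortcut plan: the size of the posting list is already the answer."""
    return len(index.get(word, ()))
```

Both plans give the same count; the second never looks at the stored rows.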

Optimization (2)
• ORDER BY score and LIMIT optimization.
 ‒ For queries like the one below.

SELECT * FROM table WHERE MATCH(col) AGAINST ('query')
ORDER BY MATCH(col) AGAINST ('query')
LIMIT 10;
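The benefit of handling ORDER BY score together with LIMIT can be sketched as a top-k selection: when only the best few hits are wanted, the full hit list does not need to be sorted. The sketch assumes the highest scores are wanted first (the query on the slide does not spell out a direction), and `top_k_by_score` is a hypothetical helper, not an engine API.

```python
import heapq


def top_k_by_score(hits, k):
    """Return the k highest-scoring (row_id, score) pairs, best first.

    heapq.nlargest keeps at most k candidates at a time, so the full
    hit list is never sorted.
    """
    return heapq.nlargest(k, hits, key=lambda hit: hit[1])


hits = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7), (5, 0.1)]
top = top_k_by_score(hits, 2)
print(top)
```

For a large number of hits and a small LIMIT, this does noticeably less work than sorting everything and then cutting the result.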

Conclusion of my part
• groonga storage engine provides
 • Fast phrase search
 • Fast index update (realtime)
 • Inserting records doesn't block reading records

The combination of Groonga and Spider

Kentoku SHIBA kentokushiba at gmail dot com

The combination of Groonga and Spider

This time ...

What is Spider Storage Engine

[Diagram: AP, SPIDER, and three databases: DB1 (tbl_a), DB2 (tbl_b), DB3 (tbl_c)]

1. Request
2. Just connect to Spider
3. Response

Spider Storage Engine is a storage engine for transparent database sharding.
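The "just connect to Spider" idea can be pictured as a thin routing layer: the application talks to one place, and that place knows which backend holds which table. The sketch below is a toy model under that assumption; `Backend` and `Router` are hypothetical classes with no real network access, and they do not reflect Spider's actual interfaces.

```python
class Backend:
    """Stand-in for a remote database server (hypothetical; no network I/O)."""

    def __init__(self, name, tables):
        self.name = name
        self.tables = tables  # table name -> list of rows

    def select_all(self, table):
        return list(self.tables[table])


class Router:
    """The application talks only to this object; it finds the right backend."""

    def __init__(self, backends):
        self.location = {t: b for b in backends for t in b.tables}

    def select_all(self, table):
        backend = self.location[table]
        return backend.name, backend.select_all(table)


router = Router([
    Backend("DB1", {"tbl_a": [("a", 1)]}),
    Backend("DB2", {"tbl_b": [("b", 2)]}),
    Backend("DB3", {"tbl_c": [("c", 3)]}),
])
print(router.select_all("tbl_b"))
```

The caller never names DB2; the placement of tbl_b stays hidden behind the router, which is the sense of "transparent" used here.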

The combination of Groonga and Spider

Combining Groonga and Spider gives you the following:

- Optimization of fulltext search with sorting by score.
- Optimization of sorting by a range partition key column.
- Optimization of fulltext search with filtering by a partition key column.

The optimization for the fulltext searching with sorting by score

(The case of scanning all partitions)

Sorting by score

- Parallel searching
- 2 step limitation

[Diagram: DB1, DB2, DB3, and DB4, each with table t1; one arrow labeled 1 and three arrows labeled 2. The same query is shown at both steps:]

SELECT * FROM t1 WHERE MATCH(c2) AGAINST('hoge') ORDER BY _score LIMIT 100;

Parallel searching is coming soon.
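One way to read "2 step limitation" is that the LIMIT is applied twice: once on each data node and once more when the partial results are merged. The sketch below works under that assumption; `search_shard` and `search_all` are hypothetical names, hits are modeled as (row, score) pairs, and the shards are visited one after another rather than in parallel.

```python
import heapq


def search_shard(shard_hits, limit):
    """Step 1 (per shard): return only the shard's own top `limit` hits."""
    return heapq.nlargest(limit, shard_hits, key=lambda hit: hit[1])


def search_all(shards, limit):
    """Step 2 (coordinator): merge partial results and cut to `limit` again."""
    partial = []
    for shard_hits in shards:  # sequential here; the calls are independent
        partial.extend(search_shard(shard_hits, limit))
    return heapq.nlargest(limit, partial, key=lambda hit: hit[1])


shards = [
    [("a1", 0.9), ("a2", 0.1)],
    [("b1", 0.8), ("b2", 0.7)],
    [("c1", 0.3)],
]
best = search_all(shards, 2)
print(best)
```

Because no shard can contribute more than `limit` rows to the final answer, each shard sending its own top `limit` is enough for the merged result to be exact, and far fewer rows cross the network.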

The optimization for the sorting by range partition key column

(coming soon)

The sorting by range partition key column

- Sort optimization with range partition

[Diagram: DB1, DB2, DB3, and DB4, each with table t1; arrows labeled 1, 2, 3, 4; range partitions c1 < 50, c1 < 100, c1 >= 100]

SELECT * FROM t1 WHERE MATCH(c2) AGAINST('hoge') ORDER BY c1 LIMIT 100;

SELECT * FROM t1 WHERE MATCH(c2) AGAINST('hoge') ORDER BY c1 LIMIT 100; (LIMIT value is decreasing gradually)
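The note "LIMIT value is decreasing gradually" suggests that partitions are visited in key order and each one is asked only for the rows still missing. The sketch below illustrates that reading; it is an assumption about the behavior, and `search_in_key_order` and the sample values are hypothetical.

```python
def search_in_key_order(partitions, limit):
    """Visit range partitions in ascending key order with a shrinking limit.

    `partitions` must be ordered by key range; each entry lists the
    matching c1 values held by that partition.
    """
    result = []
    limits_sent = []
    for values in partitions:
        remaining = limit - len(result)
        if remaining <= 0:
            break  # later partitions only hold larger keys
        limits_sent.append(remaining)
        result.extend(sorted(values)[:remaining])
    return result, limits_sent


partitions = [
    [3, 10, 42],       # c1 < 50
    [50, 77, 99, 60],  # c1 < 100
    [100, 250],        # c1 >= 100
]
found, limits_sent = search_in_key_order(partitions, 5)
print(found, limits_sent)
```

In this run the first partition is asked for 5 rows, the second for the remaining 2, and the third is never contacted, because range partitioning guarantees that its keys are larger than everything already collected.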

The optimization for the fulltext searching with filtering by partition key column

The filtering by partition key column

- Partition pruning

[Diagram: DB1, DB2, DB3, and DB4, each with table t1; arrows labeled 1 and 2; range partitions c1 < 50, c1 < 100, c1 >= 100. The same query is shown at both steps:]

SELECT * FROM t1 WHERE MATCH(c2) AGAINST('hoge') AND c1 = 60 ORDER BY _score LIMIT 100;
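Partition pruning with the ranges on the slide can be sketched as a lookup over the partition bounds: an equality condition on the partition key can match rows in only one partition, so only that one needs to be searched. `BOUNDS`, `PARTITIONS`, and `partition_for` are hypothetical names for this illustration.

```python
from bisect import bisect_right

# Upper bounds of the range partitions on the slide:
# c1 < 50, c1 < 100, c1 >= 100.
BOUNDS = [50, 100]
PARTITIONS = ["c1 < 50", "c1 < 100", "c1 >= 100"]


def partition_for(c1):
    """Return the index of the only partition that can contain this c1."""
    return bisect_right(BOUNDS, c1)


target = partition_for(60)
print(PARTITIONS[target])
```

For `c1 = 60` the lookup selects the `c1 < 100` partition, so the fulltext search runs on a single node instead of all of them.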

End of the session

Source code and binary

If you want to try the Spider features introduced here, you can download them from the following locations.

source code
http://groonga.org/pkg/mysql-5.5.8-spider-2.24h-vp-0.13-hs-1.0.src.tgz

binary (Linux x86_64 glibc2.3)
http://groonga.org/pkg/mysql-5.5.8-spider-2.24h-vp-0.13-hs-1.0.bin.tgz

initialize SQL
http://groonga.org/pkg/spider-init-2.24-for-5.5.8.tgz

Contact us

If you have any questions, comments, or suggestions, please contact us here.

http://bit.ly/fSs5vx

Kentoku SHIBA (kentokushiba at gmail dot com)

Thank you for taking your time!!

Daijiro MORI (morita at razil dot jp)
Tasuku SUENAGA (a at razil dot jp)