Top Banner
New features and enhancements of Spider Storage Engine for sharding MariaDB Corporation Kentoku SHIBA
48

New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

May 22, 2020

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

New features and enhancements

of Spider Storage Engine for sharding

MariaDB Corporation

Kentoku SHIBA

Page 2: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

1. what is SPIDER?

2. how long is SPIDER used in the big environment?

3. why SPIDER? what SPIDER can do for you?

4. when SPIDER is right for you?

what cases should you use SPIDER?

5. SPIDER sharding architecture

6. how to get SPIDER working?

7. SPIDER’s other features

8. New features and enhancements

Agenda

Page 3: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

What is Spider

Page 4: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

What is the Spider Storage Engine?

Spider is a sharding solution and proxying

solution. Spider Storage Engine is a

plugin of MariaDB/MySQL. Spider tables

can be used to federate from other servers

MariaDB/MySQL/OracleDB tables as if they

stand on local server. And Spider can

create database sharding by using table

partitioning feature.

Page 5: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

What is the Spider Storage Engine?

1.request

2. Execute SQL

4.response

AP

All databases can be used as ONE database through Spider.

APAP AP AP

SPIDER(MariaDB/MySQL)

MariaDB

tbl_a

MySQL

tbl_b

SPIDER(MariaDB/MySQL)

SPIDER(MariaDB/MySQL)

OracleDB

tbl_c

3. Distributed SQL3. Distributed SQL 3. Distributed SQL

Page 6: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

What is the Spider Storage Engine?

Spider is bundled in MariaDB

from 10.0 and all patches for MariaDB is

applied in 10.3

Page 7: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

How long is SPIDER used

in the big environment?

Page 8: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

How long is SPIDER used in the big environment?

Siemens

They handle 200 Billion records per

quarter on 3 Spider nodes and 4 data

nodes.

They use this cluster for data quality

analytics.

Page 9: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

How long is SPIDER used in the big environment?

Tencent Games

They handle 100TB datas on 396 Spider

nodes and 2800 data nodes.

They use this cluster for produce online

games.

Page 10: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Why SPIDER?

What SPIDER can do for you?

Page 11: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Why Spider? What Spider can do for you?

For federation

You can attach tables from other servers or

from local server by using Spider.

For sharding

You can divide huge tables and huge

traffics to multiple servers by using Spider.

Page 12: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Why Spider? What Spider can do for you?

Cross shard join

You can join all tables by using Spider,

even if tables are on different servers.

Page 13: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

simple

sharding

solution

Join operation with simple sharding solution (without Spider)

DB1

tbl_a1

1.Request

2. Execute SQL with JOIN

3.Response

DB2

AP

Join operation requires that all joined tables are on same

server.

APAP AP AP

tbl_a2tbl_b1 tbl_b2

Page 14: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Join operation with Spider

1.request

2. Execute SQL with JOIN

3.response

AP

You can JOIN all tables, even if tables are on different servers.

APAP AP AP

SPIDER(MariaDB/MySQL)

DB1

tbl_a1

DB2

tbl_a2tbl_b1 tbl_b2

Page 15: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Why Spider? What Spider can do for you?

Join push down

If it is possible, Spider executes JOIN

operation at data node directly.

Page 16: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

JOIN push down

1.request

2. Execute SQL with JOIN

3.response

AP

If all tables are on same data node, Spider executes JOIN

operation on data node directly.

APAP AP AP

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_ctbl_b tbl_d

Page 17: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

JOIN push down

Simple join operation are two times faster

on simple JOIN pushdown test.

Also, in this pushdown of JOIN, when

aggregate functions are included in the

query, since the aggregation processing is

also executed at the data node, the amount

of data transfer is greatly reduced and it

becomes super high speed.

Page 18: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

When SPIDER is right for you?

What cases should you use SPIDER?

Page 19: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

When SPIDER is right for you? What cases should you use SPIDER?

You should better use Spider

1.when you have 2 or more services and

the services needs to use data of

other services.

2.when you need scaling out for huge data

or write traffics.

Page 20: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

When SPIDER is right for you? What cases should you use SPIDER?

You should better use Spider

3.Unless some big data solutions you can

benefit indexing on shards.

4.You need sharding using sharding key

you want.

5.You need sharding and consistency.

Page 21: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

SPIDER sharding architecture

Page 22: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

SPIDER sharding architecture

Spider stores partitions on different servers.

This sharding design is done using the

database native table partitioning. You

can use all partitioning rules. (key, range,

hash, and so on)

Page 23: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Sharding

1.request

2. Execute SQL

3.response

AP

Using table partitioning rules

APAP AP AP

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_a

DB3

tbl_a

DB4

tbl_a

Page 24: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

SPIDER sharding architecture

You can federate multiple servers for the

same partition to bring HA and load

balancing per partition.

Page 25: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Duplicating

1.request

2. Execute SQL

3.response

AP

Duplicating for each partitions

APAP AP AP

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_a

DB3

tbl_a

DB4

tbl_a

DB5

tbl_a

DuplicatingDuplicating

Page 26: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

SPIDER sharding architecture

When writing multiple replicates or on

multiple servers, Spider use 2 phase

commit to preserve write consistency.

Page 27: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Write consistency

1.request

2. Execute SQL

3.response

AP

Using 2 phase commit

APAP AP AP

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_a

DB3

tbl_a

DB4

tbl_a

2 phase commit

Page 28: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

How to get SPIDER working?

Page 29: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

How to get SPIDER working? (1/5)

1. Install Spider bundled with

MariaDB/MySQL.

2. Login MariaDB/MySQL then

install Spider as a plugin.

(execute install_spider.sql)

3. Create Spider table.

Page 30: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

How to get SPIDER working? (2/5)

Create one to one Spider table.

CREATE TABLE t1(

c1 int,

c2 varchar(100),

PRIMARY KEY(c1)

)ENGINE=spider DEFAULT CHARSET=utf8

COMMENT '

table "rt1", database "test", port "3306",

host "host name of data node",

user "user name for data node",

password "password for data node"

';

Set engine name to “Spider” and write connect

information (and parameter) in the comment.

Page 31: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

How to get SPIDER working? (3/5)

You can create Spider tables without column definitions in

MariaDB. In this case Spider gets the column definition from

data node.

CREATE TABLE t1

ENGINE=spider DEFAULT CHARSET=utf8

COMMENT '

table "rt1", database "test", port "3306",

host "host name of data node",

user "user name for data node",

password "password for data node"

';

Page 32: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

How to get SPIDER working? (4/5)

Create one to many (sharding) Spider table

CREATE TABLE t1(

c1 int,

c2 varchar(100),

PRIMARY KEY(c1)

)ENGINE=spider DEFAULT CHARSET=utf8

COMMENT 'table "rt1", database "test", port "3306",

user "user name for data node", password "password for data node"'

PARTITION BY RANGE(c1) (

PARTITION p0 VALUES LESS THAN (100000) COMMENT 'host "h1"',

PARTITION p1 VALUES LESS THAN (200000) COMMENT 'host "h2"',

PARTITION p2 VALUES LESS THAN (300000) COMMENT 'host "h3"',

PARTITION p3 VALUES LESS THAN MAXVALUE COMMENT 'host "h4"'

);

Write shared connect information to table comment,

shard specific connect information to partition comment.

Page 33: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

How to get SPIDER working? (5/5)

You can use “CREATE SERVER” statement for defining

connection information.

CREATE SERVER srv1

FOREIGN DATA WRAPPER mysql

HOST 'host name of data node',

DATABASE 'test',

USER 'user name for data node',

PASSWORD 'password for data node',

PORT 3306

;

You can use create server definition by writing “server” parameter

into table/partition comment.

CREATE TABLE t1(

c1 int,

c2 varchar(100),

PRIMARY KEY(c1)

)ENGINE=spider DEFAULT CHARSET=utf8

COMMENT 'table "rt1", server "srv1"';

Page 34: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Spider’s other features

Page 35: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Spider’s other features

Fulltext/Geo search feature

You can use backend Fulltext/Geo search feature

transparently.

NoSQL feature (not available for MariaDB yet)

You can use HandlerSocket for Spider.

OracleDB connecting

You can use OracleDB for data node.

Note: You need to build from source code

for using this feature

Page 36: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Spider’s other features

Parallel searching

You can search sharded table by parallel.

Direct updating

Improve updating performance.

Direct aggregating

Improve aggregating(group by) performance.

Page 37: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Spider’s other features

Engine condition pushdown

Improve searching with full-scan performance.

Multi Range Read

(include Batched Key Access)

Improve searching with join performance.

Page 38: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

New features and enhancements

Page 39: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

DDL(Create Table) Pushdown (working)

Create Table

When Spider gets DDL, Spider creates

DDL for data nodes and pushes it.

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_a

Create TableCreate Table

Page 40: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

The setting of DDL pushdown

Working…

Page 41: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Support more aggregate functions for direct join

Support count(distinct), sum(distinct),

avg() and avg(distinct).

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_a

Select count(distinct c2)

from tbl_a where c1 = 1;

Select count(distinct c2)

from tbl_a where c1 = 1;

Page 42: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Support outer joins for direct join

Spider can pushdown a query with outer joins if

these tables are on same data node.

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_a

Select … from tbl_a left join tbl_b

on … where tbl_a.c2 = tbl_b.c1;

Select … from tbl_a left join tbl_b

on … where tbl_a.c2 = tbl_b.c1;

Page 43: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Transaction isolation for slave sql threads

Replication

Spider changes transaction isolation level

for slave sql threads.

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_a

Master

tbl_aRepeatable Read

Repeatable read for user threads

Read committed only for slave sql threads

(spider_slave_trx_isolation = 1)

Page 44: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Use disk by size of a resultset from a data node

Spider uses a disk temporary table if a

resultset from a data node is too big.

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_a

Page 45: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Parallel checksum table

Checksum Table

SPIDER(MariaDB/MySQL)

DB1

tbl_a

DB2

tbl_a

Checksum TableChecksum Table

Page 46: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Other new features and enhancements

Adjusting wait_timeout

A feature of keeping connections.

Synchornizing sql_mode

Support rules of sql_mode transparently.

Support mixed charsets in a query

Support multiple lingual use cases.

Performance improvement of partition with

a large amount of partitions (working)

Improve performance of sharding environments.

Page 47: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Other new features and enhancements

Bugfixes

Thanks to

Eric, Mattias, Simon at Booking.com

Felix, Will at Tencent Games

Michal at Fedora Project

and others.

Page 48: New features and enhancements of Spider Storage Engine for ... · What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is

Thank you for

taking your

time!!