
Scalable Persistent Storage for Erlang: Theory and Practice

Amir Ghaffari, Jon Meredith, Natalia Chechina, Phil Trinder

London Riak Meetup - October 22, 2013


http://www.release-project.eu


Outline

• RELEASE Project

• General principles of scalable DBMSs

• NoSQL DBMSs for Erlang

• Riak 1.1.1 Scalability in Practice

• Investigating the scalability of distributed Erlang

• Riak Elasticity

• Conclusion & Future work


RELEASE project

• RELEASE is a European project aiming to scale Erlang onto commodity architectures with 100,000 cores.


RELEASE project

The RELEASE consortium works at the following levels:

• Virtual machine

• Language

• Scalable computation model

• Scalable in-memory data structures

• Scalable persistent data structures

• Infrastructure levels

• Profiling and refactoring tools


General principles of scalable DBMSs

Data Fragmentation

1. Decentralized model (e.g. P2P model)

2. Systematic load balancing (makes life easier for developers)

3. Location transparency


Figure: e.g. 20 kB of data is fragmented among 10 nodes (0-2 kB, 2-4 kB, 4-6 kB, ..., 16-18 kB, 18-20 kB).
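To make the fragmentation principle concrete, here is a minimal Python sketch, assuming 2 kB fragments and a hash of the fragment id to choose a node. The node names and the SHA-1-modulo placement rule are illustrative assumptions, not taken from the slides. Because any client can recompute a fragment's location from its id alone, no central directory is needed, which is what location transparency asks for.

```python
import hashlib

FRAGMENT_SIZE = 2 * 1024  # 2 kB per fragment, matching the 20 kB / 10-node example

def fragment(data: bytes, size: int = FRAGMENT_SIZE) -> list:
    """Split a byte string into consecutive fixed-size fragments."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def place(fragment_id: int, nodes: list) -> str:
    """Choose a node for a fragment from its id alone (assumed SHA-1 modulo rule)."""
    digest = hashlib.sha1(str(fragment_id).encode()).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]

# Hypothetical cluster of 10 nodes holding 20 kB of data.
nodes = [f"node{i}" for i in range(1, 11)]
fragments = fragment(bytes(20 * 1024))
placement = {i: place(i, nodes) for i in range(len(fragments))}
```

With only 10 fragments a hash does not guarantee exactly one fragment per node, as drawn in the figure; it balances load on average as the number of fragments grows.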


General principles of scalable DBMSs

Replication

1. Decentralized model (e.g. P2P model)

2. Location transparency

3. Asynchronous replication (a write is considered complete as soon as one node acknowledges it)


Figure: e.g. Key X is replicated on three nodes.
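A minimal Python sketch of asynchronous replication, assuming three replicas whose latencies are simulated with sleep: the write is sent to every replica, and the caller is told it succeeded as soon as the first acknowledgement arrives. The function names (write_replica, async_put), node names and latencies are hypothetical.

```python
import concurrent.futures as cf
import time

def write_replica(node, key, value, delay):
    """Simulate storing key/value on one replica; `delay` stands in for network and disk latency."""
    time.sleep(delay)
    return node  # the acknowledgement identifies the replica

def async_put(key, value, replicas, executor):
    """Send the write to every replica, but report success on the first acknowledgement.

    The slower replica writes keep running in the background."""
    futures = [executor.submit(write_replica, node, key, value, delay)
               for node, delay in replicas]
    done, _pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    return next(iter(done)).result()

# Hypothetical replicas of key X with simulated latencies in seconds.
replicas = [("node_a", 0.5), ("node_b", 0.01), ("node_c", 0.4)]
with cf.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
    first_ack = async_put("X", "v1", replicas, pool)
print(first_ack)  # node_b
```

Dynamo-style stores such as Riak make the number of acknowledgements to wait for configurable (a write quorum), so waiting for a single node is one end of a latency versus durability trade-off.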


General principles of scalable DBMSs

Figure: CAP diagram with Consistency, Availability and Partition Tolerance; labels: ACID Systems, Eventual Consistency.

CAP theorem: a distributed system cannot simultaneously guarantee:

• Partition tolerance: the system continues to operate even when nodes can't talk to each other

• Availability: every request receives a response

• Consistency: all nodes see the same data at the same time

Guaranteeing all three is not achievable because network failures are inevitable: a system must tolerate partitions, and during a partition it has to trade consistency against availability.


Solution: Eventual consistency and reconciling conflicts via data versioning

ACID = Atomicity, Consistency, Isolation, Durability
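Data versioning with vector clocks, the reconciliation mechanism the NoSQL comparison attributes to Riak, can be sketched in a few lines of Python. This is a simplified illustration, assuming a clock is a map from actor id to counter; the function names are hypothetical and this is not Riak's implementation.

```python
from typing import Dict

VClock = Dict[str, int]  # actor (node) id -> number of updates seen from that actor

def increment(clock: VClock, actor: str) -> VClock:
    """Return a new clock recording one more update by `actor`."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new

def descends(a: VClock, b: VClock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def compare(a: VClock, b: VClock) -> str:
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a_newer"
    if b_ge_a:
        return "b_newer"
    return "concurrent"  # conflicting versions: keep both until reconciled

def merge(a: VClock, b: VClock) -> VClock:
    """Component-wise maximum: the clock for a value that reconciles a and b."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in sorted(a.keys() | b.keys())}

v0 = increment({}, "node_a")  # first version
v1 = increment(v0, "node_a")  # node_a updates it again
v2 = increment(v0, "node_b")  # node_b updates v0 concurrently
print(compare(v1, v2))         # concurrent
print(merge(v1, v2))           # {'node_a': 2, 'node_b': 1}
```

Unlike a single timestamp, the clocks distinguish "one version supersedes the other" from "the versions conflict", so concurrent writes are detected rather than silently overwritten.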


NoSQL DBMSs for Erlang

Comparison of Mnesia, CouchDB, Riak and Cassandra:

Fragmentation

• Mnesia: explicit placement; client-server; automatic by using a hash function

• CouchDB: explicit placement; multi-server; Lounge is not part of each CouchDB node

• Riak: implicit placement; peer to peer; automatic by using the consistent hashing technique

• Cassandra: implicit placement; peer to peer; automatic by using the consistent hashing technique

Replication

• Mnesia: explicit placement; client-server; asynchronous (dirty operations)

• CouchDB: explicit placement; multi-server; asynchronous

• Riak: implicit placement; peer to peer; asynchronous

• Cassandra: implicit placement; peer to peer; asynchronous

Partition tolerance

• Mnesia: strong consistency

• CouchDB: eventual consistency; Multi-Version Concurrency Control for reconciliation

• Riak: eventual consistency; vector clocks for reconciliation

• Cassandra: eventual consistency; timestamps to reconcile

Query processing & backend storage

• Mnesia: the largest possible Mnesia table is 4 GB

• CouchDB: no limitation; supports Map/Reduce queries

• Riak: Bitcask has a memory limitation (all keys must fit in RAM); LevelDB has no limitation; supports Map/Reduce queries

• Cassandra: no limitation; supports Map/Reduce queries
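The implicit, consistent-hashing placement used by Riak and Cassandra can be sketched as follows. This Python sketch assumes a 160-bit ring indexed by SHA-1 (the hash space Riak uses), 64 equal partitions, three replicas per key, and a hypothetical round-robin assignment of partitions to nodes; real systems use more elaborate ownership (claim) algorithms.

```python
import hashlib

RING_TOP = 2 ** 160   # SHA-1 yields 160-bit hashes
NUM_PARTITIONS = 64   # assumed ring size
N_VAL = 3             # assumed number of replicas per key

def key_hash(bucket: str, key: str) -> int:
    """Position of bucket/key on the ring."""
    return int.from_bytes(hashlib.sha1(f"{bucket}/{key}".encode()).digest(), "big")

def partition_of(h: int) -> int:
    """Map a ring position to one of NUM_PARTITIONS equal-sized partitions."""
    return h // (RING_TOP // NUM_PARTITIONS)

def claim(nodes: list) -> list:
    """Hypothetical round-robin ownership: partition p belongs to nodes[p % len(nodes)]."""
    return [nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)]

def preference_list(bucket: str, key: str, owners: list, n: int = N_VAL) -> list:
    """The key's own partition plus the next n-1 partitions clockwise, with their owners."""
    first = partition_of(key_hash(bucket, key))
    return [((first + i) % NUM_PARTITIONS, owners[(first + i) % NUM_PARTITIONS])
            for i in range(n)]

owners = claim(["riak1", "riak2", "riak3", "riak4"])
print(preference_list("users", "X", owners))  # three consecutive partitions on distinct nodes
```

Because the number of partitions is fixed, a key always maps to the same partition; when nodes join or leave, only partition ownership changes, and every node can compute a key's replicas without a central directory.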


Initial Evaluation Results

General Principles

Initial Evaluation

• Mnesia

• CouchDB

• Riak

• Cassandra

Scalable persistent storage for SD Erlang can be provided by Dynamo-style DBMSs such as Riak and Cassandra.


Riak Scalability in Practice

• Basho Bench: a benchmarking tool for Riak

• We run Basho Bench on the 348-node Kalkyl cluster

• Scalability: how does adding more Riak nodes affect the throughput?

• There are two kinds of nodes in a cluster:

  • Traffic generators

  • Riak nodes


Node Organisation


Heuristic: one traffic generator per 3 Riak nodes
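The heuristic reduces to simple arithmetic: with g generators and r Riak nodes we need g >= r/3, and for a fixed allocation of T nodes r = T - g, so the smallest sufficient g is the ceiling of T/4. The helper below is hypothetical and for illustration only; the split actually used in each benchmark run is not given on this slide.

```python
def split_cluster(total_nodes: int, riak_per_generator: int = 3) -> tuple:
    """Split total_nodes into (traffic generators, Riak nodes) so that there is
    at least one generator per riak_per_generator Riak nodes."""
    generators = -(-total_nodes // (riak_per_generator + 1))  # ceiling division
    return generators, total_nodes - generators

print(split_cluster(100))  # (25, 75)
print(split_cluster(10))   # (3, 7)
```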


Traffic Generator


Riak 1.1.1 Scalability

Benchmark on 100-node cluster (800 cores)


Failures


Profiling Resource Usage


CPU Usage


Profiling Resource Usage


DISK Usage


Profiling Resource Usage


Memory Usage


Profiling Resource Usage


Network Traffic of Generator Nodes


Profiling Resource Usage


Network Traffic of Riak Nodes


Bottleneck for Riak Scalability

CPU, RAM, disk, and network profiling reveal that none of these resources is the bottleneck for Riak scalability.

Is the Riak scalability limit due to limits in distributed Erlang?

To find out, let's measure the scalability of distributed Erlang.


DE-Bench


• DE-Bench: a benchmarking tool for distributed Erlang

• It is based on Basho Bench

• Measures the throughput of a cluster of Erlang nodes

• Records the latency of distributed Erlang commands individually
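A benchmarking loop in the spirit of DE-Bench might pick commands at random according to configured weights, time each call, and report overall throughput plus per-command mean latency. The Python sketch below is an assumption about the general structure, using local stand-in functions instead of real distributed Erlang commands; it is not DE-Bench's code or configuration format.

```python
import random
import time
from collections import defaultdict
from statistics import mean

def run_bench(commands, duration_s, weights, rng):
    """Call randomly chosen commands for duration_s seconds.

    Returns (operations per second, {command name: mean latency in seconds})."""
    names = list(commands)
    name_weights = [weights[n] for n in names]
    latencies = defaultdict(list)
    ops = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        name = rng.choices(names, weights=name_weights)[0]
        start = time.perf_counter()
        commands[name]()  # the operation under test
        latencies[name].append(time.perf_counter() - start)
        ops += 1
    return ops / duration_s, {n: mean(v) for n, v in latencies.items()}

# Hypothetical stand-ins: a cheap "spawn" and a slower "register_name" chosen 1% of the time.
commands = {"spawn": lambda: None, "register_name": lambda: time.sleep(0.001)}
throughput, latency = run_bench(commands, 0.2, {"spawn": 99, "register_name": 1},
                                random.Random(1))
```

Recording latency per command name is what lets a benchmark attribute a throughput drop to one kind of operation rather than to the cluster as a whole.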


Distributed Erlang Commands


• Spawn/RPC: peer-to-peer commands

• register_name: updates the global name tables located on every node

• unregister_name: updates the global name tables located on every node

• whereis_name: a lookup in the local table

Figure: Register and Unregister operations update the global name table held by each Erlang VM in the cluster.
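The cost asymmetry between these commands can be shown with a toy Python model in which every node holds its own copy of the name table: register_name and unregister_name must update every copy, while whereis_name reads only the local one. The class and its message counter are hypothetical; the return values mirror those of OTP's global module, but the real module does additional work, such as locking, that is not modelled here.

```python
class Cluster:
    """Toy model: every node keeps its own copy of the global name table."""

    def __init__(self, n_nodes: int):
        self.tables = [{} for _ in range(n_nodes)]
        self.messages = 0  # remote table updates sent so far

    def _broadcast(self, origin, update):
        for i, table in enumerate(self.tables):
            update(table)
            if i != origin:
                self.messages += 1  # one message per remote node

    def register_name(self, origin, name, pid):
        if name in self.tables[origin]:
            return "no"
        self._broadcast(origin, lambda t: t.__setitem__(name, pid))
        return "yes"

    def unregister_name(self, origin, name):
        self._broadcast(origin, lambda t: t.pop(name, None))

    def whereis_name(self, node, name):
        return self.tables[node].get(name, "undefined")  # local lookup, no messages

cluster = Cluster(10)
cluster.register_name(0, "db", "<0.42.0>")
print(cluster.messages)  # 9
```

Each registration costs one message per other node, so its cost grows with cluster size, whereas a lookup costs the same on 10 nodes as on 1,000.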


DE-Bench’s P2P Design


Figure: DE-Bench's peer-to-peer design across physical host 1 and physical host 2.


Frequency of Global Operations

Frequency of global operations    Max throughput reached at
1%                                30 nodes
0.5%                              50 nodes
0.33%                             70 nodes
0%                                1600 nodes

Global operations limit the scalability of distributed Erlang.


Riak Software Scalability

• Monitoring the global.erl module from the OTP library shows that Riak does NOT use any global operations.

• Instrumenting the gen_server.erl module reveals that, of the 15 most time-consuming operations, only the time of rpc:call grows with cluster size. Moreover, of the five Riak RPC calls, only the start_put_fsm function from module riak_kv_put_fsm_sup grows with cluster size.


Eliminating the Bottlenecks

• Independently, Basho identified that two supervisor processes, riak_kv_get_fsm_sup and riak_kv_put_fsm_sup, become bottlenecks under heavy load, exhibiting a build-up in message queue length.

• To improve Riak scalability in versions 1.3 and 1.4, Basho applied a number of techniques and introduced a new library, sidejob (https://github.com/basho/sidejob).
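The slide names sidejob without describing it. Basho describes sidejob as a capacity-limited request pool; the Python sketch below shows only that general idea, bounded concurrency with load shedding, as a contrast to a single supervisor whose message queue grows without bound. The class name and the "overload" return value are hypothetical and do not reproduce sidejob's Erlang API.

```python
import threading

class BoundedPool:
    """Capacity-limited request pool: excess requests are rejected, not queued."""

    def __init__(self, limit: int):
        self._slots = threading.BoundedSemaphore(limit)

    def call(self, fn, *args):
        # Try to take a slot without waiting; if none is free, shed the load.
        if not self._slots.acquire(blocking=False):
            return "overload"
        try:
            return fn(*args)
        finally:
            self._slots.release()
```

Rejecting work at the door keeps latency bounded for accepted requests and pushes back on the caller, instead of letting a queue, and with it every request's waiting time, grow under heavy load.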


Riak 1.1.1 Elasticity

Timeline showing the Riak cluster losing and gaining nodes


Riak 1.1.1 Elasticity

How the Riak cluster deals with nodes leaving and joining


Observation

• Number of failures: 37

• Number of successful operations: approximately 3.41 million

• When failed nodes come back up, throughput grows again, which shows that Riak 1.1.1 has good elasticity.
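Putting the two counts together gives the failure rate during the elasticity experiment, roughly one failed operation per 92,000:

```latex
\text{failure rate} = \frac{37}{37 + 3.41 \times 10^{6}} \approx 1.1 \times 10^{-5} \approx 0.0011\,\%
```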


Conclusion and Future work

Our benchmark confirms that Riak has good elasticity.

We establish scientifically, for the first time, the scalability limit of Riak 1.1.1 as 60 nodes.

We have shown how global operations limit the scalability of distributed Erlang.

The Riak scalability bottlenecks are eliminated in Riak 1.3 and subsequent versions.

In RELEASE, we are working to scale up distributed Erlang by grouping nodes into smaller partitions.
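Why grouping helps can be seen with a toy cost model, assuming every node registers one name and each registration sends one message to every other node that shares its name table. With a single global table the total grows quadratically with cluster size; with fixed-size partitions it grows linearly. The function and its parameters are hypothetical, and the model does not describe SD Erlang's actual design.

```python
from typing import Optional

def messages_per_round(n_nodes: int, group_size: Optional[int] = None) -> int:
    """Messages sent when every node registers one name.

    group_size=None models one global name table shared by all nodes; otherwise
    each registration only reaches the other members of the node's own partition."""
    peers = n_nodes if group_size is None else min(group_size, n_nodes)
    return n_nodes * (peers - 1)

print(messages_per_round(1000))      # 999000
print(messages_per_round(1000, 10))  # 9000
```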


References

Benchmarking Riak: https://github.com/amirghaffari/benchmark_riak

Basho Bench: http://docs.basho.com/riak/latest/ops/building/benchmarking/

DE-Bench: https://github.com/amirghaffari/DEbench

A. Ghaffari, N. Chechina, P. Trinder, and J. Meredith. Scalable Persistent Storage for Erlang: Theory and Practice. In Proceedings of the Twelfth ACM SIGPLAN Workshop on Erlang, pages 73-74, September 2013. ACM Press.

Clusters at UPPMAX: http://www.uppmax.uu.se/hardware

Sidejob: https://github.com/basho/sidejob
