Designing and Evaluating a Distributed Computing Language Runtime
Christopher Meiklejohn (@cmeik), Université catholique de Louvain, Belgium

Jul 27, 2018


[Figure: two replicas, RA and RB, of a register. After set(1), one replica applies set(2) while the other concurrently applies set(3). The final value at each replica is unknown (?).]

Synchronization
• To enforce an order: makes programming easier
• Eliminate accidental nondeterminism: prevent race conditions
• Techniques: locks, mutexes, semaphores, monitors, etc.

Difficult Cases
• “Internet of Things”: low power, limited memory and connectivity
• Mobile gaming: offline operation with replicated, shared state

Weak Synchronization
• Can we achieve anything without synchronization? Not really.
• Strong Eventual Consistency (SEC): “Replicas that deliver the same updates have equivalent state”
• Primary requirement: eventual replica-to-replica communication
• Order insensitive! (Commutativity)
• Duplicate insensitive! (Idempotence)

[Figure: the same two replicas, RA and RB, using a max register. After set(1), one replica applies set(2) while the other concurrently applies set(3); when they exchange state, each computes max(2,3) = 3, so both converge to 3.]
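To make the convergence argument concrete, here is a minimal state-based max register in Erlang. The module name `max_register` and its functions are hypothetical (not part of Lasp), and the sketch assumes non-negative integer values. Because merge is `max/2`, it is commutative and idempotent, which is exactly the order and duplicate insensitivity listed under SEC above.

```erlang
%% Hypothetical state-based max register; illustrative only, not Lasp.
-module(max_register).
-export([new/0, set/2, merge/2, value/1]).

%% Initial state; assumes values are non-negative integers.
new() -> 0.

%% A local update can only move the register up: keep the larger value.
set(State, Value) -> max(State, Value).

%% merge/2 is the lattice join (max): commutative, associative, idempotent.
merge(A, B) -> max(A, B).

%% Reading the register returns the current state.
value(State) -> State.
```

Replaying the figure: start both replicas from `set(1)`, apply `set(2)` at one and `set(3)` at the other, then merge in either order. Both read 3, and merging a state with itself changes nothing.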

How can we succeed with Strong Eventual Consistency?

Programming SEC
1. Eliminate accidental nondeterminism (e.g., determinism, modeling non-monotonic operations monotonically)
2. Retain the properties of functional programming (e.g., confluence, referential transparency over composition)
3. Provide a distributed, fault-tolerant runtime (e.g., replication, membership, dissemination)


Convergent Objects: Conflict-Free Replicated Data Types (SSS 2011)

Conflict-Free Replicated Data Types
• Many types exist with different properties: sets, counters, registers, flags, maps, graphs
• Strong Eventual Consistency: instances satisfy the SEC property per object

[Figure: an observed-remove set replicated at RA, RB, and RC; each element's state is shown as (element, add tags, remove tags). RA applies add(1), giving (1, {a}, {}) and value {1}. Another replica applies add(1) tagged c, giving (1, {c}, {}). A replica that has observed only the add tagged c then applies remove(1), giving (1, {c}, {c}) and value {}. After the replicas exchange state, all three hold (1, {a, c}, {c}) and read {1}: the remove cancels only the add it observed (tag c), so the concurrent add tagged a survives.]
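The tagged states in the figure can be sketched as a small state-based OR-Set in Erlang. This is an illustrative model under stated assumptions, not Lasp's implementation: the module `orset` is hypothetical, each element maps to a pair of ordsets `{AddTags, RemoveTags}`, and callers supply unique tags (such as the `a` and `c` in the figure). Merge is a per-element union, so it inherits commutativity and idempotence from set union.

```erlang
%% Hypothetical state-based observed-remove set (OR-Set); not Lasp's code.
%% State: #{Element => {AddTags, RemoveTags}}, tags kept as ordsets.
-module(orset).
-export([new/0, add/3, remove/2, merge/2, value/1]).

new() -> #{}.

%% add/3 records a caller-supplied tag, assumed unique per add.
add(Elem, Tag, S) ->
    {Adds, Rems} = maps:get(Elem, S, {ordsets:new(), ordsets:new()}),
    S#{Elem => {ordsets:add_element(Tag, Adds), Rems}}.

%% remove/2 tombstones only the add tags this replica has observed.
remove(Elem, S) ->
    case maps:find(Elem, S) of
        {ok, {Adds, Rems}} -> S#{Elem => {Adds, ordsets:union(Adds, Rems)}};
        error -> S
    end.

%% merge/2 takes the per-element union of add tags and remove tags.
merge(S1, S2) ->
    maps:fold(
      fun(Elem, {A2, R2}, Acc) ->
              {A1, R1} = maps:get(Elem, Acc, {ordsets:new(), ordsets:new()}),
              Acc#{Elem => {ordsets:union(A1, A2), ordsets:union(R1, R2)}}
      end, S1, S2).

%% An element is present if at least one of its add tags was not removed.
value(S) ->
    lists:sort([E || {E, {Adds, Rems}} <- maps:to_list(S),
                     ordsets:subtract(Adds, Rems) =/= []]).
```

Replaying the scenario, the replica that removed 1 reads `[]` on its own, but after merging with the replica that added 1 under tag `a`, the state is `(1, {a, c}, {c})`, which reads `[1]`.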


Convergent Programs: Lattice Processing (PPDP 2015)

Lattice Processing (Lasp)
• Distributed dataflow: declarative, functional programming model
• Convergent data structures: the primary data abstraction is the CRDT
• Enables composition: provides functional composition of CRDTs that preserves the SEC property

%% Create the initial set.
S1 = declare(set),

%% Add the elements 1, 2, and 3 to the initial set.
update(S1, {add, [1,2,3]}),

%% Create a second set.
S2 = declare(set),

%% Apply map from S1 into S2, doubling each element;
%% S2 converges to {2,4,6}.
map(S1, fun(X) -> X * 2 end, S2).
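The convergence argument behind `map` can be checked on a local model. This is not the Lasp API: the module `lasp_map_model` is hypothetical, and it models only grow-only sets, represented as ordsets with union as merge. Because mapping distributes over union, replicas that each map a partial view of S1 and then merge their outputs reach the same S2 as mapping the fully merged input.

```erlang
%% Hypothetical local model of a map between grow-only sets; not Lasp.
-module(lasp_map_model).
-export([map/2, merge/2]).

%% map/2: the derived set is the image of the input set under F.
map(F, Set) -> ordsets:from_list([F(X) || X <- Set]).

%% merge/2: the join of two grow-only sets is their union.
merge(A, B) -> ordsets:union(A, B).
```

The sketch ignores removals; supporting them requires carrying the CRDT's metadata through the operation, which this model does not attempt.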



Distributed Runtime: Selective Hearing (W-PSDS 2015)

Selective Hearing
• Epidemic-broadcast-based runtime system: a runtime that scales to large numbers of nodes, is resilient to failures, and executes efficiently
• Well matched to Lattice Processing (Lasp)
• Epidemic broadcast mechanisms provide weak ordering but are resilient and efficient
• Lasp’s programming model is tolerant to message reordering, disconnections, and node failures
• “Selective receive”: nodes selectively receive and process messages based on interest
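A minimal model of epidemic dissemination, assuming each node holds a grow-only set and merge is set union. To keep the example deterministic, the hypothetical `gossip_model` module has every node push its state to a fixed successor on a ring each round; real epidemic protocols pick peers at random, so this is a simplification, not Selective Hearing's protocol. Extra rounds resend state the receiver already has, and the idempotent merge absorbs those duplicates.

```erlang
%% Hypothetical, deterministic model of push gossip on a ring; not Lasp.
-module(gossip_model).
-export([round/1, run/2]).

%% One synchronous round: node I receives the state of node I-1
%% (wrapping around) and merges it into its own with set union.
round(States) ->
    N = length(States),
    Pred = [lists:nth(((I - 2 + N) rem N) + 1, States) || I <- lists:seq(1, N)],
    [ordsets:union(S, P) || {S, P} <- lists:zip(States, Pred)].

%% run/2 applies K rounds.
run(States, 0) -> States;
run(States, K) -> run(round(States), K - 1).
```

With four nodes that each start with one element, every node holds all four after three rounds, and a further round leaves the states unchanged.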

Layered Approach
• Membership: configurable membership protocol that can operate in client-server or peer-to-peer mode
• Broadcast (via gossip, tree, etc.): efficient dissemination of both program state and application state via gossip, a broadcast tree, or a hybrid mode
• Auto-discovery: integration with Mesos; automatic discovery of Lasp nodes for ease of configuration

[Figure: layered runtime architecture, built up layer by layer: Membership Overlay, Broadcast Overlay, Mobile Phone, Distributed Hash Table, Lasp Execution.]


What can we build? Advertisement Counter

Advertisement Counter
• Mobile game platform selling advertisement space: advertisements are paid according to a minimum number of impressions
• Clients will go offline: clients have limited connectivity, and the system still needs to make progress while clients are offline
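Per-ad impression counting fits a grow-only counter (G-Counter): each client increments only its own entry, so increments made while offline never conflict, and merging takes the per-client maximum. The `gcounter` module, client ids such as `phone1`, and `threshold_reached/2` are hypothetical names for this sketch; the threshold check mirrors the deck's idea of reading a counter against a minimum number of impressions (for example 50,000).

```erlang
%% Hypothetical grow-only counter (G-Counter) sketch; not Lasp's code.
%% State: #{ClientId => Count}.
-module(gcounter).
-export([new/0, increment/2, merge/2, value/1, threshold_reached/2]).

new() -> #{}.

%% Each client bumps only its own entry (starting at 1 if absent).
increment(ClientId, C) ->
    maps:update_with(ClientId, fun(N) -> N + 1 end, 1, C).

%% Merge keeps the per-client maximum, so it is commutative and idempotent.
merge(A, B) ->
    maps:fold(fun(K, V, Acc) ->
                      maps:update_with(K, fun(W) -> max(V, W) end, V, Acc)
              end, A, B).

%% The counter's value is the sum over all clients.
value(C) -> lists:sum(maps:values(C)).

%% True once the merged count reaches Limit impressions.
threshold_reached(C, Limit) -> value(C) >= Limit.
```

Merging is order- and duplicate-insensitive: combining a phone that counted 3 impressions with one that counted 2 yields 5 regardless of merge order or how often the same state arrives.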

[Figure: advertisement counter dataflow (client side, single copy at client). Per-ad counters (Rovio Ad Counter 1 and 2, Riot Ad Counter 1 and 2) form the Rovio Ads and Riot Ads sets, which Union combines into Ads. Product pairs Ads with Contracts into AdsContracts, and Filter yields AdsWithContracts. Clients Increment and Read the counters; once a Read reaches ≥ 50,000 impressions, the ad is Removed. Legend: Lasp Operation, User-Maintained CRDT, Lasp-Maintained CRDT.]
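The Union, Product, and Filter stages of the advertisement dataflow can be modeled locally over plain ordsets. This is a hypothetical sketch, not Lasp's operators: ads are `{AdId, Counter}` pairs (the counter is a placeholder integer here rather than a CRDT), contracts are `{AdId, Terms}` pairs, and the filter is assumed to keep an (ad, contract) pair only when both name the same ad id.

```erlang
%% Hypothetical local model of the ad dataflow stages; not Lasp's API.
-module(ads_dataflow).
-export([union/2, product/2, filter/2, ads_with_contracts/3]).

%% Union: combine two sets of ads.
union(A, B) -> ordsets:union(A, B).

%% Product: every pairing of an element of A with an element of B.
product(A, B) -> ordsets:from_list([{X, Y} || X <- A, Y <- B]).

%% Filter: keep elements satisfying Pred (order, hence ordset, preserved).
filter(Pred, Set) -> [X || X <- Set, Pred(X)].

%% Ads = Rovio Ads ∪ Riot Ads; AdsContracts = Ads × Contracts;
%% AdsWithContracts keeps pairs whose ad ids match.
ads_with_contracts(RovioAds, RiotAds, Contracts) ->
    Ads = union(RovioAds, RiotAds),
    AdsContracts = product(Ads, Contracts),
    filter(fun({{AdId, _Counter}, {ContractAdId, _Terms}}) ->
                   AdId =:= ContractAdId
           end, AdsContracts).
```

Each stage is a pure function of its inputs, so recomputing it after new ads or contracts arrive yields the same result on every replica that has seen the same inputs.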



Evaluation: Initial Evaluation


Background: Distributed Erlang

• Transparent distribution: built-in cross-node message passing, provided by Erlang/BEAM.

• Known scalability limitations: analyzed in various academic publications.

• Single connection: one connection between each pair of nodes, leading to head-of-line blocking.

• Full membership: all-to-all failure detection with heartbeats and timeouts.
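A back-of-the-envelope sketch of why full membership scales poorly: if every node keeps a connection to, and heartbeats, every other node, connections grow as n(n-1)/2 and heartbeat messages per interval as n(n-1). The one-message-per-ordered-pair-per-interval model is an assumption for illustration, not a measurement of Distributed Erlang's tick traffic, and the function names are hypothetical.

```python
def full_mesh_connections(n: int) -> int:
    """Undirected connections when every pair of n nodes is connected."""
    return n * (n - 1) // 2


def full_mesh_heartbeats(n: int) -> int:
    """Heartbeat messages per interval, assuming each node pings every peer once."""
    return n * (n - 1)


for n in (10, 100, 1000):
    # Quadratic growth: 10x the nodes gives roughly 100x the traffic.
    print(f"{n:>5} nodes: {full_mesh_connections(n):>7} connections, "
          f"{full_mesh_heartbeats(n):>7} heartbeats/interval")
```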


Background: Erlang Port Mapper Daemon

• Operates on a known port: similar to the Solaris sunrpc-style portmap, a known port is used to map to dynamic, port-based services.

• Bridged networking: problematic for clustering under bridged networking with dynamic port allocation.
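The portmap idea can be modelled in a few lines: a registry reachable on a well-known port (4369 by default for EPMD) maps node names to the dynamic ports the nodes actually listen on. The class below is a hypothetical in-memory model, not EPMD's wire protocol. It does reflect one documented property: a node stays registered with EPMD only while its connection to EPMD remains open, so a lost connection unregisters the node.

```python
from typing import Dict, Optional


class PortMapper:
    """Toy in-memory model of a port mapper reachable on a well-known port.

    Nodes register a name together with the dynamically allocated port they
    listen on; peers look the name up before connecting. A registration lives
    only as long as the node's registration connection.
    """

    WELL_KNOWN_PORT = 4369  # EPMD's default port

    def __init__(self) -> None:
        self._ports: Dict[str, int] = {}

    def register(self, name: str, port: int) -> bool:
        """Register `name`; fails if the name is already taken."""
        if name in self._ports:
            return False
        self._ports[name] = port
        return True

    def connection_closed(self, name: str) -> None:
        """The node's registration connection dropped: forget the node."""
        self._ports.pop(name, None)

    def lookup(self, name: str) -> Optional[int]:
        return self._ports.get(name)


mapper = PortMapper()
mapper.register("lasp", 41234)   # port chosen by the node at startup
print(mapper.lookup("lasp"))      # 41234
mapper.connection_closed("lasp")
print(mapper.lookup("lasp"))      # None: peers can no longer find the node
```

Under bridged networking with dynamic port allocation, the port a node registers is plausibly the one it sees inside its container, which need not match the host port a remote peer has to dial; the registry then hands out an address that does not work from outside.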


Experiment Design

• Single application: the advertisement counter example from Rovio Entertainment.

• Runtime configuration: the application is controlled through runtime environment variables.

• Membership: full membership with Distributed Erlang via EPMD.

• Dissemination: state-based object dissemination through an anti-entropy protocol (fanout-based, PARC-style).
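A minimal simulation of state-based dissemination through a fanout-based anti-entropy protocol, assuming grow-only counters (G-Counters) as the replicated objects and synchronous rounds in which every node pushes its full state to `fanout` randomly chosen peers. This is an illustrative sketch, not Lasp's dissemination code; `rounds_to_converge` and its parameters are hypothetical.

```python
import random
from typing import Dict, List

GCounter = Dict[str, int]  # replica id -> number of increments by that replica


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Join of two G-Counter states: pointwise maximum per replica."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(c: GCounter) -> int:
    return sum(c.values())


def anti_entropy_round(states: List[GCounter], fanout: int,
                       rng: random.Random) -> List[GCounter]:
    """Each node pushes its full state to `fanout` random peers, which merge it."""
    n = len(states)
    nxt = [dict(s) for s in states]
    for i, state in enumerate(states):
        peers = rng.sample([j for j in range(n) if j != i], k=min(fanout, n - 1))
        for j in peers:
            nxt[j] = merge(nxt[j], state)
    return nxt


def rounds_to_converge(n: int = 50, fanout: int = 2, seed: int = 1) -> int:
    rng = random.Random(seed)
    states = [{f"node{i}": 1} for i in range(n)]  # each node counted one event
    rounds = 0
    while any(value(s) != n for s in states):
        states = anti_entropy_round(states, fanout, rng)
        rounds += 1
    return rounds


print(rounds_to_converge())
```

Because the merge is commutative, associative and idempotent, the order and duplication of pushes do not matter; the cost is that every push carries the entire state.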


Experiment Orchestration

• Docker and Mesos with Marathon: used to deploy both EPMD and the Lasp application.

• Single EPMD instance per slave: controlled through host networking and HOSTNAME:UNIQUE constraints in Mesos.

• Lasp: local execution using host networking; connects to the local EPMD.

• Service discovery: facilitated by clustering the EPMD instances through Sprinter.
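The one-EPMD-per-agent setup could be expressed as a Marathon app definition like the one generated below. This is a hypothetical sketch: the app id, image name and resource values are invented, and the field layout assumes the Marathon 1.x format with the older `docker.network` field; only the `["hostname", "UNIQUE"]` constraint and host networking come from the slide.

```python
import json


def epmd_app(instances: int) -> dict:
    """Hypothetical Marathon app definition running one EPMD per agent."""
    return {
        "id": "/epmd",                       # hypothetical app id
        "instances": instances,
        "cpus": 0.1,                         # illustrative values
        "mem": 64,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "example/epmd:latest",  # hypothetical image
                "network": "HOST",               # host networking
            },
        },
        # At most one instance of this app per agent hostname.
        "constraints": [["hostname", "UNIQUE"]],
    }


print(json.dumps(epmd_app(instances=3), indent=2))
```

Host networking lets a Lasp container on the same agent reach EPMD on the host's well-known port, and the UNIQUE constraint ensures there is exactly one such EPMD to reach.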


Ideal Experiment

• Local deployment: high thread concurrency when operating with a lower node count.

• Cloud deployment: low thread concurrency when operating with a higher node count.


Results: Initial Evaluation


Initial Evaluation

• Moved to DC/OS exclusively: the environments were too different, and too much work was needed to adapt them for things to work correctly.

• Single orchestration task: dispatched events, controlled when to start and stop the evaluation, and performed log aggregation.

• Bottleneck: events were dispatched immediately; pacing them would require blocking on a processing acknowledgment.

• Unrealistic: in practice, events do not queue up all at once for processing by the client.


Lasp Difficulties

• Too expensive: 2.0 CPU and 2048 MiB of memory.

• Weeks spent adding instrumentation: process-level, VM-level, and Erlang Observer instrumentation to identify CPU- and memory-heavy processes.

• Dissemination too expensive: 1000 threads feeding a single dissemination process (one Mesos task) led to backed-up message queues and memory leaks.

• Unrealistic: two different dissemination mechanisms, thread to thread and node to node, one of which is synthetic.


EPMD Difficulties

• Nodes become unregistered: nodes were randomly unregistered from EPMD during execution.

• Lost connection: EPMD lost its connections with nodes for unknown reasons.

• EPMD task restarted by Mesos: restarted for an unknown reason, which caused the Lasp instances to restart in their own containers.


Overhead Difficulties

• Too much state: the client would ship around 5 GiB of state within 90 seconds.

• Delta dissemination: delta dissemination provides only around a 30% decrease in state transmission.

• Unbounded queues: message buffers would lead to VMs crashing because of large memory consumption.
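The difference between shipping full state and shipping deltas can be sketched with a G-Counter: an increment produces a delta holding only the changed entry, and merging that delta into a replica that already holds the previous state gives the same result as merging the full state. This is a minimal sketch under that assumption, not Lasp's delta implementation; the function names are hypothetical.

```python
from typing import Dict, Tuple

GCounter = Dict[str, int]


def join(a: GCounter, b: GCounter) -> GCounter:
    """Pointwise maximum: the join of two G-Counter states."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def increment(state: GCounter, replica: str) -> Tuple[GCounter, GCounter]:
    """Increment `replica`'s entry, returning (new full state, delta).

    The delta holds only the entry that changed; a delta-based protocol
    ships it instead of the full state.
    """
    delta = {replica: state.get(replica, 0) + 1}
    return join(state, delta), delta


old = {f"client{i}": 5 for i in range(1000)}
peer = dict(old)                        # a replica that already holds `old`
new, delta = increment(old, "client7")
# Merging the delta has the same effect as merging the full state ...
assert join(peer, delta) == join(peer, new)
# ... but ships 1 entry instead of 1000.
print(len(new), len(delta))             # 1000 1
```

How much this saves depends on how much of the state each round's updates touch and on whether peers already hold the earlier state; deltas help most when updates modify a small fraction of a large object.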


Evaluation: Rearchitecture


Ditch Distributed Erlang

• Pluggable membership service: build a pluggable membership service with an abstract interface, initially backed by EPMD, and migrate away once it has been tested.

• Adapt Lasp and Broadcast layer: integrate the pluggable membership service throughout the stack and liberate existing libraries from Distributed Erlang.

• Build service discovery mechanism: mechanize node discovery outside of EPMD, based on the new membership service.
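The "abstract interface" idea can be sketched as follows: upper layers such as a broadcast layer depend only on an interface, and the backend behind it can be swapped without touching them. The names `MembershipService`, `FullMembership` and `broadcast` are hypothetical and do not reproduce Partisan's actual API.

```python
from abc import ABC, abstractmethod
from typing import List, Set, Tuple


class MembershipService(ABC):
    """Hypothetical interface that upper layers (e.g., broadcast) depend on."""

    def __init__(self, myself: str) -> None:
        self.myself = myself

    @abstractmethod
    def join(self, node: str) -> None: ...

    @abstractmethod
    def leave(self, node: str) -> None: ...

    @abstractmethod
    def members(self) -> Set[str]: ...


class FullMembership(MembershipService):
    """One possible backend: track every node that has joined."""

    def __init__(self, myself: str) -> None:
        super().__init__(myself)
        self._members: Set[str] = {myself}

    def join(self, node: str) -> None:
        self._members.add(node)

    def leave(self, node: str) -> None:
        self._members.discard(node)

    def members(self) -> Set[str]:
        return set(self._members)


def broadcast(service: MembershipService, msg: str) -> List[Tuple[str, str]]:
    """Upper layer: addresses every peer without knowing which backend is used."""
    return [(peer, msg) for peer in sorted(service.members()) if peer != service.myself]


svc = FullMembership("a")
svc.join("b")
svc.join("c")
print(broadcast(svc, "hello"))  # [('b', 'hello'), ('c', 'hello')]
```

An EPMD-backed, client-server or partial-view backend would implement the same three methods, which is what allows starting on EPMD and migrating later.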


Partisan (Membership Layer)

• Pluggable protocol membership layer: allows runtime configuration of the protocols used for cluster membership.

• Several protocol implementations:

    • Full membership via EPMD.

    • Full membership via TCP.

    • Client-server membership via TCP.

    • Peer-to-peer membership via TCP (with HyParView).

• Visualization: provides a force-directed, graph-based visualization engine for real-time cluster debugging.
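The protocol choice determines how many peers each node must track. The sketch below compares the modes listed above and picks one at runtime from an environment variable. `MEMBERSHIP_MODE`, the mode names and `active_size=5` are hypothetical; the HyParView line follows this talk's description (fixed-size active view, passive view of about log n, taken here as log2 n) rather than Partisan's actual defaults.

```python
import math
import os
from typing import Dict


def peers_per_node(mode: str, n: int, active_size: int = 5) -> Dict[str, int]:
    """Approximate peer counts per node for each membership mode (n >= 2)."""
    if mode in ("full_epmd", "full_tcp"):
        return {"any node": n - 1}
    if mode == "client_server":
        return {"server": n - 1, "client": 1}  # assumes a single server
    if mode == "hyparview":
        return {"active": min(active_size, n - 1),
                "passive": min(math.ceil(math.log2(n)), n - 1)}
    raise ValueError(f"unknown membership mode: {mode}")


# Hypothetical variable name: the mode is chosen at runtime, not compile time.
mode = os.environ.get("MEMBERSHIP_MODE", "hyparview")
print(mode, peers_per_node(mode, n=1024))
```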


Partisan (Full via EPMD or TCP)

• Full membership: nodes have full visibility into the entire graph.

• Failure detection: performed by peer-to-peer heartbeat messages with a timeout.

• Limited scalability: the heartbeat interval increases as the node count increases, leading to false or delayed detection.

• Testing: used to create the initial test suite for Partisan.
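A timeout-based failure detector of the kind described above can be sketched in a few lines: a peer is suspected when no heartbeat has arrived from it within `timeout` seconds. The class and its parameters are hypothetical; time is passed in explicitly so the behaviour is deterministic.

```python
from typing import Dict, Set


class HeartbeatDetector:
    """Suspect a peer if no heartbeat arrived from it within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now

    def suspected(self, now: float) -> Set[str]:
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


d = HeartbeatDetector(timeout=3.0)
d.heartbeat("b", now=0.0)
d.heartbeat("c", now=0.0)
d.heartbeat("b", now=2.5)
print(d.suspected(now=4.0))  # {'c'}: last heard 4.0 s ago; b was heard 1.5 s ago
```

The trade-off on the slide follows from the two knobs: as heartbeats from more peers must be sent and processed, the interval grows, so a fixed timeout fires on merely late heartbeats (false detection), while a longer timeout postpones detection of real failures.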


Partisan (Client-Server Model)

• Client-server membership: the server has all nodes in the system as peers; a client has only the server as a peer.

• Failure detection: nodes heartbeat, with a timeout, all peers they are aware of.

• Limited scalability: the server is a single point of failure, and visibility scales poorly.

• Testing: used for baseline evaluations as the “reference” architecture.
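The single point of failure is easy to see when the views are written down: the server lists every client, each client lists only the server, so losing the server leaves every client with an empty view, whereas losing a client affects no one else. A minimal sketch with hypothetical function names, assuming one server:

```python
from typing import Dict, List, Set

Views = Dict[str, Set[str]]


def client_server_views(server: str, clients: List[str]) -> Views:
    """The server lists every client; each client lists only the server."""
    views: Views = {server: set(clients)}
    for c in clients:
        views[c] = {server}
    return views


def after_crash(views: Views, failed: str) -> Views:
    """Views once `failed` has crashed and every survivor has dropped it."""
    return {n: peers - {failed} for n, peers in views.items() if n != failed}


views = client_server_views("server", ["c1", "c2", "c3"])
survivors = after_crash(views, "server")
# Every client is left with no peers at all.
print(all(not peers for peers in survivors.values()))  # True
```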


Partisan (HyParView, default)

• Partial view protocol: two views, active (fixed size) and passive (log n); the passive view supplies replacements for failed members of the active view.

• Failure detection: performed by monitoring the active TCP connections to peers, with keep-alive enabled.

• Very scalable (10k+ nodes during academic evaluation): however, the protocol is probabilistic and can potentially leave nodes isolated during churn.
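The active/passive split can be sketched as two bounded sets, with a random passive peer promoted when an active peer fails. This is a simplified, hypothetical model: in HyParView the candidate is sent a neighbor request it may refuse, and the passive view is maintained by periodic shuffles, whereas here promotion is immediate and the passive view is simply filled as peers are learned.

```python
import random
from typing import Optional, Set


class PartialView:
    """Simplified HyParView-style views: a small active view of connected
    peers and a larger passive view of backup candidates."""

    def __init__(self, active_size: int, passive_size: int, seed: int = 0) -> None:
        self.active_size = active_size
        self.passive_size = passive_size
        self.active: Set[str] = set()
        self.passive: Set[str] = set()
        self._rng = random.Random(seed)

    def add(self, peer: str) -> None:
        """Fill the active view first, then the passive view; drop the rest."""
        if peer in self.active or peer in self.passive:
            return
        if len(self.active) < self.active_size:
            self.active.add(peer)
        elif len(self.passive) < self.passive_size:
            self.passive.add(peer)

    def on_failure(self, peer: str) -> Optional[str]:
        """Remove a failed active peer and promote a random passive peer."""
        if peer not in self.active:
            self.passive.discard(peer)
            return None
        self.active.remove(peer)
        if not self.passive:
            return None  # nothing to promote: this node's degree shrinks
        replacement = self._rng.choice(sorted(self.passive))
        self.passive.remove(replacement)
        self.active.add(replacement)
        return replacement


view = PartialView(active_size=2, passive_size=3)
for p in ["a", "b", "c", "d"]:
    view.add(p)                      # active {a, b}, passive {c, d}
replacement = view.on_failure("a")   # "c" or "d", chosen at random
print(replacement, sorted(view.active))
```

The `None` branch hints at how isolation arises under churn: once the passive view is exhausted, failures shrink a node's active view until it may have no neighbors left.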


Sprinter (Service Discovery)

• Responsible for clustering tasks: uses Partisan to cluster all nodes and ensure a connected overlay network; reads information from Marathon.

• Node local: operates at each node and is responsible for taking actions to ensure a connected graph, which is required for probabilistic protocols.

• Membership mode specific: knows, based on the membership mode, how to properly cluster nodes, and enforces proper join behaviour.
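"Membership mode specific" join behaviour might look like the policy below, which turns a task list (standing in for what is read from Marathon) into the nodes a given node should join. The policy is a hypothetical illustration, not Sprinter's code: under full membership join everyone, under client-server clients join the servers, and under peer-to-peer join a few random nodes and let the overlay protocol do the rest.

```python
import random
from typing import List


def join_targets(mode: str, myself: str, servers: List[str], clients: List[str],
                 k: int = 3, seed: int = 0) -> List[str]:
    """Hypothetical join policy for a node-local discovery agent.

    `servers` and `clients` stand in for the task list read from the scheduler.
    """
    others = sorted(n for n in servers + clients if n != myself)
    if mode == "full":
        return others                          # connect to everyone
    if mode == "client_server":
        # Clients join every server; servers wait to be joined.
        return [] if myself in servers else sorted(servers)
    if mode == "p2p":
        # Join a few random nodes; the overlay protocol manages the rest.
        return sorted(random.Random(seed).sample(others, k=min(k, len(others))))
    raise ValueError(f"unknown membership mode: {mode}")


tasks_servers, tasks_clients = ["s1"], ["c1", "c2", "c3", "c4"]
print(join_targets("client_server", "c2", tasks_servers, tasks_clients))  # ['s1']
print(join_targets("full", "c2", tasks_servers, tasks_clients))  # ['c1', 'c3', 'c4', 's1']
```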


Debugging Sprinter

• S3 archival: nodes periodically snapshot their membership view for analysis.

• Elected node (or group) analyses: periodically analyses the information in S3 for the following:

    • Isolated node detection: identifies isolated nodes and takes corrective measures to repair the overlay.

    • Verifies symmetric relationship: ensures that if a node knows about another node, the relationship is symmetric; prevents “I know you, but you don’t know me.”

• Periodic alerting: alerts regarding disconnected graphs, so that external measures can be taken if necessary.
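The checks performed over the archived snapshots can be sketched as plain graph functions: find asymmetric pairs, compute connected components, and report singleton components as isolated nodes. The snapshot format (node mapped to the set of peers in its view) and the function names are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Views = Dict[str, Set[str]]  # node -> peers listed in its membership snapshot


def asymmetric_pairs(views: Views) -> List[Tuple[str, str]]:
    """(a, b) pairs where a lists b as a peer but b does not list a."""
    return sorted((a, b) for a, peers in views.items() for b in peers
                  if a not in views.get(b, set()))


def components(views: Views) -> List[Set[str]]:
    """Connected components, treating each view entry as an undirected edge."""
    adj: Dict[str, Set[str]] = {n: set() for n in views}
    for a, peers in views.items():
        for b in peers:
            adj[a].add(b)
            adj.setdefault(b, set()).add(a)
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            for nxt in adj[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    comp.add(nxt)
                    queue.append(nxt)
        result.append(comp)
    return result


def isolated(views: Views) -> Set[str]:
    """Nodes that appear in no one's view and list no peers themselves."""
    return {next(iter(c)) for c in components(views) if len(c) == 1}


snapshots = {"a": {"b"}, "b": {"a", "c"}, "c": set(), "d": set()}
print(asymmetric_pairs(snapshots))   # [('b', 'c')]: c does not know b
print(isolated(snapshots))           # {'d'}
print(len(components(snapshots)))    # 2 -> disconnected overlay: raise an alert
```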


Evaluation: Next Evaluation


Evaluation Strategy

• Deployment and runtime configuration: the ability to deploy a cluster of nodes and configure simulations at runtime.

• Each simulation:

    • Different application scenario: executes a distinct application scenario selected by runtime configuration.

    • Result aggregation: aggregates results at the end of execution and archives them.

    • Plot generation: automatically generates plots for the execution and aggregates the results of multiple executions.

• Minimal coordination: work must be performed with minimal coordination, as a single orchestrator is a scalability bottleneck for large applications.


Completion Detection• “Convergence Structure”

Uninstrumented CRDT of grow-only sets containing counters that each node manipulates.

• Simulates a workflowNodes use this operation to simulate a lock-stop workflow for the experiment.

• Event GenerationEvent generation toggles a boolean for the node to show completion.

• Log Aggregation Completion triggers log aggregation.

• ShutdownUpon log aggregation completion, nodes shutdown.

74

Page 147: Designing and Evaluating a Distributed Computing Language ... · Evaluating a Distributed Computing Language Runtime Christopher Meiklejohn ... 26 %% Create initial ... Selective

Completion Detection

• “Convergence Structure”: an uninstrumented CRDT of grow-only sets containing counters that each node manipulates.

• Simulates a workflow: nodes use this structure to simulate a lockstep workflow for the experiment.

• Event generation: each node toggles a boolean to signal that it has completed event generation.

• Log aggregation: completion triggers log aggregation.

• Shutdown: once log aggregation completes, nodes shut down.

• External monitoring: when events complete execution, nodes automatically begin the next experiment.
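The convergence structure can be sketched as a state-based CRDT. This is a simplified illustration, not Lasp's implementation: it maps each node to a grow-only set of completed phase labels (standing in for the counters on the slide), and the phase names and the `next_action` helper are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable

# Node id -> grow-only set of completed phases. Phase names are hypothetical.
State = Dict[str, FrozenSet[str]]

EVENTS_DONE = "events_done"
LOGS_AGGREGATED = "logs_aggregated"


def mark(state: State, node: str, phase: str) -> State:
    """Local update: `node` records that it finished `phase`.
    Entries are only ever added, never removed (grow-only)."""
    updated = dict(state)
    updated[node] = state.get(node, frozenset()) | {phase}
    return updated


def merge(a: State, b: State) -> State:
    """Join of two replicas: point-wise set union. It is commutative,
    associative and idempotent, so replicas converge regardless of the
    order or number of times states are exchanged."""
    return {
        node: a.get(node, frozenset()) | b.get(node, frozenset())
        for node in a.keys() | b.keys()
    }


def all_reached(state: State, nodes: Iterable[str], phase: str) -> bool:
    """Lockstep barrier: true once every expected node has marked `phase`.
    Because the state only grows, a barrier that has passed stays passed."""
    return all(phase in state.get(node, frozenset()) for node in nodes)


def next_action(state: State, nodes: Iterable[str]) -> str:
    """Hypothetical workflow driver: events -> log aggregation -> shutdown."""
    nodes = list(nodes)
    if not all_reached(state, nodes, EVENTS_DONE):
        return "generate_events"
    if not all_reached(state, nodes, LOGS_AGGREGATED):
        return "aggregate_logs"
    return "shutdown"
```

Because merge is a set union and entries are never removed, a node that observes a barrier as passed can act on it without further coordination, which is what lets the workflow avoid a central orchestrator.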

74

Results Next Evaluation

75

Results: Lasp

• Single-node orchestration: bad. It breaks down once you exceed a few nodes: message queues, memory, delays.

• Partial views required: rely on transitive dissemination of information and partial network knowledge.

• Results: reduced the Lasp memory footprint to 75MB; larger in practice, for debugging.
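Transitive dissemination over partial views can be illustrated with a small simulation. The ring overlay, the view size, and the synchronous rounds are simplifying assumptions for illustration; they are not the membership protocol used by Lasp.

```python
from typing import Dict, List, Set


def ring_views(n: int, k: int) -> Dict[str, List[str]]:
    """Hypothetical overlay: node i only knows its k successors on a ring,
    i.e. a partial view of size k instead of full membership."""
    return {f"n{i}": [f"n{(i + j) % n}" for j in range(1, k + 1)] for i in range(n)}


def gossip_rounds(views: Dict[str, List[str]], origin: str) -> int:
    """Synchronous rounds until an update from `origin` reaches every node,
    when each informed node forwards it only to the peers in its own view."""
    informed: Set[str] = {origin}
    rounds = 0
    while len(informed) < len(views):
        newly = {peer for node in informed for peer in views[node]} - informed
        if not newly:
            raise ValueError("views are disconnected: update cannot reach every node")
        informed |= newly
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(gossip_rounds(ring_views(8, 2), "n0"))  # prints 4
```

With partial views of size 2 on 8 nodes, an update from `n0` reaches every node in 4 rounds even though no node knows more than two peers; the `ValueError` branch is the case where the overlay itself is partitioned.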

76

Results: Partisan

• Fast churn isolates nodes

Need a repair mechanism: random promotion of isolated nodes; mainly issues of symmetry.

• FIFO across connections: FIFO ordering holds per connection, not across connections, but the protocol assumed it across all connections, leading to false disconnects.

• Unrealistic system model: you need per-message acknowledgements for safety.

• Pluggable protocol helps debugging: being able to switch to full membership or client-server helps distinguish protocol problems from application problems.
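One way to picture the repair step is the sketch below. It is a hypothetical illustration, not Partisan's code: a node counts as isolated when no other node holds it in its view, repair promotes it into a randomly chosen peer's view, and a final pass makes every link symmetric, since the slide attributes most problems to symmetry.

```python
import random
from typing import Dict, Set

Views = Dict[str, Set[str]]


def isolated(views: Views) -> Set[str]:
    """Nodes that appear in no other node's view: gossip can never reach them."""
    referenced = {peer for node, peers in views.items() for peer in peers if peer != node}
    return set(views) - referenced


def symmetrize(views: Views) -> Views:
    """Make every link bidirectional: if a has b in its view, b gets a."""
    out = {node: set(peers) for node, peers in views.items()}
    for node, peers in views.items():
        for peer in peers:
            if peer != node:
                out.setdefault(peer, set()).add(node)
    return out


def repair(views: Views, rng: random.Random) -> Views:
    """Promote each isolated node into the view of a randomly chosen peer,
    then restore symmetry so the new link works in both directions."""
    out = {node: set(peers) for node, peers in views.items()}
    for node in sorted(isolated(views)):
        candidates = sorted(n for n in out if n != node)
        if candidates:
            out[rng.choice(candidates)].add(node)
    return symmetrize(out)
```

Passing an explicit `random.Random` keeps the promotion reproducible in tests while still being random across deployments.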

77
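Per-message acknowledgements with sequence numbers are a standard way to avoid relying on FIFO order across connections. The sketch below is a generic illustration of that technique, assuming a single sender and a single receiver; it is not Partisan's implementation, and the class names are hypothetical.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str]  # (sequence number, payload)


class Sender:
    """Keeps every message until it is individually acknowledged."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.unacked: Dict[int, str] = {}

    def send(self, payload: str) -> Message:
        msg = (self.next_seq, payload)
        self.unacked[self.next_seq] = payload
        self.next_seq += 1
        return msg

    def on_ack(self, seq: int) -> None:
        self.unacked.pop(seq, None)

    def pending(self) -> List[Message]:
        """Messages to retransmit, e.g. on a new connection after a disconnect."""
        return sorted(self.unacked.items())


class Receiver:
    """Delivers in sequence order, independent of which connection carried
    a message, and discards duplicates caused by retransmission."""

    def __init__(self) -> None:
        self.expected = 0
        self.buffer: Dict[int, str] = {}
        self.delivered: List[str] = []

    def on_message(self, seq: int, payload: str) -> int:
        if seq >= self.expected:
            self.buffer[seq] = payload
        while self.expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return seq  # acknowledge every copy, including duplicates


if __name__ == "__main__":
    s, r = Sender(), Receiver()
    m0, m1, m2 = s.send("m0"), s.send("m1"), s.send("m2")
    for seq, payload in (m2, m0):  # m1 is lost; m2 overtakes m0
        s.on_ack(r.on_message(seq, payload))
    print(r.delivered)  # prints ['m0']
    for seq, payload in s.pending():  # retransmit only what was never acked
        s.on_ack(r.on_message(seq, payload))
    print(r.delivered)  # prints ['m0', 'm1', 'm2']
```

Because ordering and loss detection live in the sequence numbers rather than in any one connection, a reconnect or a message that travels over a different connection no longer looks like a failure.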

Latest Results

• Reproducibility at 300 nodes for full applications

At 500 to 1000 nodes (across 140 instances), the cluster stays connected, but with transient partitions and isolated nodes.

• Limited financially and by Amazon: larger evaluations are harder to run because of our budget (as a university) and Amazon's limits.

• Mean state reduction per client: around 100x improvement over our initial PaPoC 2016 evaluation results.

78

Takeaways

• Visualizations are important!

Graph performance and visualize your cluster: both make debugging easier.

• Control changes: no Lasp PR is accepted without divergence, state transmission, and overhead graphs.

• Automation: developers use graphs when they are easy to make, so lower the cost of generating them and understand how changes alter system behaviour.

• Make work easily testable: when you test locally and deploy globally, you need to make things easy to test, deploy, and evaluate (for good science, I say!)

79

80

Christopher Meiklejohn (@cmeik)
http://www.lasp-lang.org
http://github.com/lasp-lang

Thanks!