Sharded Joins for Scalable Incremental Graph Queries

Apr 13, 2017

Page 1: Sharded Joins for Scalable Incremental Graph Queries

Budapest University of Technology and Economics

Department of Measurement and Information Systems

Fault Tolerant Systems Research Group

Sharded Joins for Scalable Incremental Graph Queries

János Maginecz, Gábor Szárnyas

Page 9: Sharded Joins for Scalable Incremental Graph Queries

Performance issues

Agile Model-Driven Development

Modeling

Code generation

Testing

Early validations

Transformations

Scalability challenges

Page 15: Sharded Joins for Scalable Incremental Graph Queries

MDD

Scalability

Incrementality

Incremental queries

Incremental transformation

Storing partial results

Tracking changes

Page 21: Sharded Joins for Scalable Incremental Graph Queries

Motivating Example

Error pattern for an AUTOSAR validation constraint

Communication channel

Logical signal Mapping Physical signal

Invalid submodel

Validation

Valid submodel

Page 42: Sharded Joins for Scalable Incremental Graph Queries

Rete Algorithm

Communication channel, Logical signal, Mapping, Physical signal

Join

Join

Antijoin

Fill indexer nodes
Store interim results
Read result set
Edit model
Propagating changes
Read result set

Result set
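The steps listed on this slide (fill the indexers, store interim results, propagate changes, read the result set) can be illustrated with a minimal sketch of one incremental join node. This is not the IncQuery code: the class name `JoinNode`, the `update` method, and the sample tuples are hypothetical, and the sketch assumes a plain equi-join on a single key with set semantics.

```python
from collections import defaultdict


class JoinNode:
    """Hypothetical incremental equi-join node; both inputs stay indexed by join key."""

    def __init__(self):
        self.left = defaultdict(set)    # join key -> left tuples seen so far
        self.right = defaultdict(set)   # join key -> right tuples seen so far
        self.results = set()            # stored result set of this node

    def update(self, side, key, tup, insert=True):
        """Apply one change on one input and return the delta it causes."""
        if side == "left":
            own, other = self.left, self.right
        else:
            own, other = self.right, self.left
        # Keep the indexer of the changed side up to date.
        if insert:
            own[key].add(tup)
        else:
            own[key].discard(tup)
        # Only partners with the same key on the other side are affected.
        if side == "left":
            delta = {(tup, partner) for partner in other[key]}
        else:
            delta = {(partner, tup) for partner in other[key]}
        # Propagate the change into the stored result set.
        if insert:
            self.results |= delta
        else:
            self.results -= delta
        return delta


node = JoinNode()
node.update("left", "ch1", ("ch1", "sigA"))            # fill indexer nodes
node.update("right", "ch1", ("ch1", "map1"))           # a match appears
print(node.results)                                     # read result set
node.update("right", "ch1", ("ch1", "map1"), False)    # edit model
print(node.results)                                     # read result set again
```

The point of the design is that an edit touches only the tuples sharing its join key, so the result set is updated without re-evaluating the whole query.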

Page 43: Sharded Joins for Scalable Incremental Graph Queries

Single Workstation Rete Implementation

Rete-based incremental graph query engine

Open-source Eclipse project

Java Virtual Machine limitations

o Cannot handle 15+ GB heap memory efficiently

Proposed solution

o Horizontal scaling: distributed system

Page 47: Sharded Joins for Scalable Incremental Graph Queries

IncQuery-D Architecture

Transaction

In-memory EMF model

Rete net

Indexer layer

EMF-INCQUERY

Indexing

In-memory storage

Production network
• Stores intermediate query results
• Propagates changes

Page 49: Sharded Joins for Scalable Incremental Graph Queries

IncQuery-D Architecture

Server 1

Database shard 1

Server 2

Database shard 2

Server 3

Database shard 3

Transaction

Database shard 0

Server 0

Rete net

Indexer layer

INCQUERY-D

Page 55: Sharded Joins for Scalable Incremental Graph Queries

IncQuery-D Architecture

Server 1

Database shard 1

Server 2

Database shard 2

Server 3

Database shard 3

Transaction

Database shard 0

Server 0

INCQUERY-D

Distributed query evaluation network

Distributed indexer Model access adapter

Distributed indexing, notification

Distributed persistent storage

Distributed production network
• Each intermediate node can be allocated to a different host
• Remote internode communication

Page 57: Sharded Joins for Scalable Incremental Graph Queries

IncQuery-D Architecture

Server 1

Database shard 1

Server 2

Database shard 2

Server 3

Database shard 3

Transaction

Database shard 0

Server 0

INCQUERY-D

Distributed query evaluation network

Indexer Indexer Indexer Indexer

Join

Join

Antijoin
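The antijoin at the bottom of this network can also be maintained incrementally. The sketch below is an assumption-laden illustration, not the IncQuery-D implementation: the slides do not say how the four indexers are wired to the nodes, so the class `AntijoinNode`, its methods, and the sample keys are hypothetical. It keeps a left tuple in the result only while no right tuple shares its key.

```python
from collections import defaultdict


class AntijoinNode:
    """Hypothetical incremental antijoin: left tuples without any right partner."""

    def __init__(self):
        self.left = defaultdict(set)         # key -> left tuples
        self.right_count = defaultdict(int)  # key -> number of right tuples
        self.results = set()                 # left tuples that currently have no partner

    def update_left(self, key, tup, insert=True):
        if insert:
            self.left[key].add(tup)
            if self.right_count[key] == 0:
                self.results.add(tup)
        else:
            self.left[key].discard(tup)
            self.results.discard(tup)

    def update_right(self, key, insert=True):
        before = self.right_count[key]
        after = before + 1 if insert else max(before - 1, 0)
        self.right_count[key] = after
        if before == 0 and after > 0:
            # The first partner arrived: matching left tuples leave the result.
            self.results -= self.left[key]
        elif before > 0 and after == 0:
            # The last partner disappeared: matching left tuples re-enter.
            self.results |= self.left[key]


node = AntijoinNode()
node.update_left("sig1", ("sig1",))
print(node.results)                 # sig1 has no partner yet, so it is reported
node.update_right("sig1")
print(node.results)                 # a partner exists, so the result is empty
```

Counting the right-hand tuples per key, instead of storing a boolean, keeps deletions correct when several right tuples share a key.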

Page 59: Sharded Joins for Scalable Incremental Graph Queries

Working around Memory Limits

Solution 1

Local (EMF-IncQuery)

Hosts: Host-1
Nodes: Input, Node A, Node B, Output

+ Simple and efficient

Memory of the machine is an upper bound for the network

Distributed (IncQuery-D)

Hosts: Host-1, Host-2
Nodes: Input, Node A, Node B, Output

+ Nodes run on different computers

The memory of each node is limited to the assigned machine

Page 61: Sharded Joins for Scalable Incremental Graph Queries

Working around Memory Limits

Solution 2

Distributed (IncQuery-D)

Hosts: Host-1, Host-2
Nodes: Input, Node A, Node B, Output

+ Nodes run on different computers

The memory of each node is limited to the assigned machine

Distributed + Sharded (IncQuery-DS)

Hosts: Host-1, Host-2, Host-3
Nodes: Input, Node A, Node B, Output

+ Nodes may be allocated on more than 1 computer

Network overhead

Page 67: Sharded Joins for Scalable Incremental Graph Queries

Sharded Rete Algorithm

Communication channel, Logical signal, Mapping, Physical signal

Join / Shard 1

Join / Shard 2

Join

Antijoin

IncQuery-DS: Distributed and Sharded
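One way to picture a join split into Shard 1 and Shard 2 is to route every incoming tuple by a hash of its join key, so that matching tuples always meet in the same shard. The sketch below works under that assumption; the slides do not state the partitioning rule used in IncQuery-DS, and the names `JoinShard`, `ShardedJoin`, and `shard_for` are hypothetical. In a real deployment each shard would live on a different host; here they are plain objects in one process.

```python
import zlib
from collections import defaultdict


class JoinShard:
    """One shard of a join node; holds only the tuples routed to it."""

    def __init__(self):
        self.left = defaultdict(set)
        self.right = defaultdict(set)

    def insert(self, side, key, tup):
        """Store the tuple and return the new matches it produces."""
        if side == "left":
            self.left[key].add(tup)
            return {(tup, r) for r in self.right[key]}
        self.right[key].add(tup)
        return {(l, tup) for l in self.left[key]}


class ShardedJoin:
    """Routes tuples to shards by join key (assumed hash partitioning)."""

    def __init__(self, shard_count):
        self.shards = [JoinShard() for _ in range(shard_count)]

    def shard_for(self, key):
        # crc32 is stable across processes, unlike Python's built-in hash() for strings.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.shards)

    def insert(self, side, key, tup):
        return self.shards[self.shard_for(key)].insert(side, key, tup)


join = ShardedJoin(shard_count=2)
join.insert("left", "ch1", ("ch1", "sigA"))
print(join.insert("right", "ch1", ("ch1", "map1")))  # the pair meets in one shard
```

Because both inputs use the same routing function, no shard ever needs tuples held by another shard, and each shard's memory use is bounded by its share of the keys. The price is the network overhead named on the slide.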

Page 68: Sharded Joins for Scalable Incremental Graph Queries

Validation of Critical Systems

Model validation for large models

Well-formedness constraints with complex graph patterns

Train Benchmark

o Open-source performance measurement framework

o Presented yesterday

Page 69: Sharded Joins for Scalable Incremental Graph Queries

Train Benchmark

Phases

o Initial read and validation

o Small changes and revalidation

• Simulating modifications from a user

Goal: Measure response times

[Figure: benchmark phases Read, Validation, Transformation, Revalidation, with execution time measured; repetition labels × 10 and × 3]
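The phase structure above can be sketched as a small timing harness. This is not the Train Benchmark code: `run_benchmark` and its callback parameters are hypothetical, and the number of transformation and revalidation rounds is left as a parameter because the transcript does not say what the repetition labels refer to.

```python
import time


def run_benchmark(read, validate, transform, iterations):
    """Run read, validation, then repeated transformation and revalidation; time each phase."""
    timings = {}

    def timed(name, fn):
        start = time.perf_counter()
        result = fn()
        timings.setdefault(name, []).append(time.perf_counter() - start)
        return result

    model = timed("read", read)
    timed("validation", lambda: validate(model))
    for _ in range(iterations):
        # Small change simulating a user edit, followed by revalidation.
        timed("transformation", lambda: transform(model))
        timed("revalidation", lambda: validate(model))
    return timings


# Toy stand-ins for a model and a query, for illustration only.
timings = run_benchmark(
    read=lambda: [1, 2, 3],
    validate=lambda model: [x for x in model if x < 0],
    transform=lambda model: model.append(-1),
    iterations=2,
)
print({phase: len(samples) for phase, samples in timings.items()})
```

Timing the revalidation phase separately from the initial validation is what makes the benefit of incremental evaluation visible.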

Page 70: Sharded Joins for Scalable Incremental Graph Queries

Sharding Results

Page 73: Sharded Joins for Scalable Incremental Graph Queries

Join Optimization

Hash join

o Using hash maps

Sort merge join

o Using red-black trees

Collection frameworks

o Standard library in Scala

o Goldman Sachs Collections
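The two join strategies compared here can be sketched side by side. The slides name hash maps and red-black trees on the Scala side; this illustration uses Python instead, with a `defaultdict` for the hash join and sorted lists standing in for the ordered trees of the sort merge join. The function names and the `(key, value)` tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def hash_join(left, right):
    """Index the left input by key in a hash map, then probe it with the right input."""
    index = defaultdict(list)
    for key, value in left:
        index[key].append(value)
    return [(key, lv, rv) for key, rv in right for lv in index.get(key, [])]


def sort_merge_join(left, right):
    """Sort both inputs by key, then merge them in a single pass."""
    left = sorted(left, key=lambda pair: pair[0])
    right = sorted(right, key=lambda pair: pair[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on both sides and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c")]
right = [(2, "x"), (3, "y"), (1, "z")]
print(sorted(hash_join(left, right)) == sorted(sort_merge_join(left, right)))
```

Both functions return the same matches; they differ in how the inputs are organized, which is what affects memory use and the cost of applying updates.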

Page 75: Sharded Joins for Scalable Incremental Graph Queries

Join Optimization Results – Execution Time

Does not scale well for updates

Page 76: Sharded Joins for Scalable Incremental Graph Queries

Summary

Designed a sharded Rete engine

Evaluated its scalability

Analysis of join algorithms and collection frameworks

Future work

o Domains with similar challenges

Page 77: Sharded Joins for Scalable Incremental Graph Queries

Ω