Dynamic Data Partitioning for Distributed Graph Databases
Xavier Martínez Palau, David Domínguez Sal, Josep Lluís Larriba Pey

Mar 31, 2015


Outline

Introduction
Contributions
System Overview
Experiments


Introduction: Databases

Database: software to store large amounts of data with high performance

Several ways to store a graph: graph database, relational database, RDF, key-value datastore, …
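To make the key-value option concrete: one common layout stores one entry per vertex, keyed by vertex id, whose value is that vertex's adjacency list. The sketch below uses a plain Python dict as a stand-in for a real key-value datastore; `build_adjacency` is a hypothetical helper, not part of any system described in these slides.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Store a directed graph as key-value pairs: vertex id -> list of out-neighbors."""
    store = defaultdict(list)
    for src, dst in edges:
        store[src].append(dst)  # append the edge to the source vertex's entry
    return dict(store)

edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
kv = build_adjacency(edges)
print(kv)  # {1: [2, 3], 2: [3], 3: [1]}
```

With this layout, fetching a vertex's neighbors is a single key lookup, which is why traversals map naturally onto key-value stores.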


Distributed Databases

Distributed databases store more data and improve throughput
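Distributing a graph means deciding which machine owns each vertex. The experiments later compare against "static partitioning"; the slides do not say which static scheme is used, so the sketch below assumes plain hash partitioning (vertex id modulo the number of machines) and counts "cut" edges, whose endpoints live on different machines and therefore cost a network hop to traverse. Function names are hypothetical.

```python
def hash_partition(vertex, num_machines):
    # Static placement: the owner depends only on the vertex id,
    # never on how the data is actually accessed.
    return vertex % num_machines

def cut_edges(edges, num_machines):
    # An edge is "cut" when its two endpoints are owned by different machines.
    return sum(
        1
        for u, v in edges
        if hash_partition(u, num_machines) != hash_partition(v, num_machines)
    )

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(cut_edges(edges, 2))  # 4
```

Hash placement balances storage well but ignores access patterns, so most traversed edges may cross machines; this is the gap a dynamic scheme tries to close.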


Contributions

System design in two levels: physical storage and memory management

Data access pattern monitoring with a specific data structure

Load and network balancing for increased throughput


System Overview

Two levels: memory management and storage
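The slides name the two levels but not how they interact. One possible reading, sketched here purely as an assumption, is that each machine keeps a bounded in-memory cache (the memory management level) in front of a record store (the storage level), fetching from storage on a miss. The LRU replacement policy and the `Machine`, `read`, and `capacity` names are hypothetical.

```python
from collections import OrderedDict

class Machine:
    """Hypothetical node: a bounded LRU cache in front of the storage level."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # vertex -> record, oldest first
        self.misses = 0

    def read(self, vertex, storage):
        if vertex in self.cache:
            self.cache.move_to_end(vertex)  # mark as most recently used
            return self.cache[vertex]
        self.misses += 1
        record = storage[vertex]  # fetch from the storage level
        self.cache[vertex] = record
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return record

storage = {v: f"record-{v}" for v in range(10)}
m = Machine(capacity=2)
for v in [1, 2, 1, 3, 2]:
    m.read(v, storage)
print(m.misses, list(m.cache))  # 4 [3, 2]
```

Separating the levels this way means a query can be served from whichever machine already holds its data in memory, independently of where that data is physically stored.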


Partition Manager

We propose a new data structure that monitors data access patterns and uses this information in a simple way to decide how to route queries.

Matrix of data access sequences

New compressed data structure
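The slides describe the partition manager only at this level, so the following is a minimal sketch under stated assumptions: each executed query reports the machine it ran on and the vertices it touched, and a new query is routed to the machine that has accessed its start vertex most often, on the expectation that the data is already in that machine's memory. A plain dict of counters (vertex by machine) stands in for the compressed structure, which is not described here; `PartitionManager`, `record`, and `route` are hypothetical names.

```python
from collections import Counter, defaultdict

class PartitionManager:
    """Hypothetical router driven by monitored data access patterns."""

    def __init__(self, num_machines):
        self.num_machines = num_machines
        # vertex -> Counter(machine -> accesses): an uncompressed stand-in
        # for the compressed access structure mentioned in the slides.
        self.access = defaultdict(Counter)

    def record(self, machine, vertices):
        # Monitor the data touched by a query executed on `machine`.
        for v in vertices:
            self.access[v][machine] += 1

    def route(self, start_vertex):
        counts = self.access.get(start_vertex)  # .get does not create entries
        if not counts:
            return start_vertex % self.num_machines  # no history: static fallback
        # Machine with the most accesses to this vertex; ties go to the lowest id.
        return min(counts, key=lambda m: (-counts[m], m))

pm = PartitionManager(num_machines=4)
pm.record(2, [10, 11, 12])
pm.record(3, [11])
pm.record(2, [11])
print(pm.route(11), pm.route(7))  # 2 3
```

Routing by access history rather than by a fixed hash is what lets the placement adapt as the workload changes.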


Experiments: Scalability with cluster size

Tested on up to 32 machines

Systems compared: static partitioning and dynamic partitioning (ours)

R-MAT graph with 37M vertices and 1B edges

Queries: BFS and k-hops
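For reference, a k-hops query can be read as "all vertices within k edges of a start vertex", which is a breadth-first search cut off at depth k; a full BFS is the same loop without the depth limit. The sketch below is a single-machine illustration on an adjacency dict (here the start vertex is included in the result); it is not the distributed implementation used in the experiments.

```python
from collections import deque

def k_hops(adj, start, k):
    """Return the vertices reachable from `start` in at most k hops (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        v, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for w in adj.get(v, []):
            if w not in seen:
                seen.add(w)
                frontier.append((w, depth + 1))
    return seen

adj = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
print(sorted(k_hops(adj, 1, 2)))  # [1, 2, 3, 4]
```

In a distributed setting, every neighbor owned by another machine turns into a remote request, which is why the placement of vertices matters for these queries.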


Experiments

[Plots: throughput (higher is better) and imbalance (lower is better)]
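The slides do not define how imbalance is measured. A common choice, assumed here, is the load of the busiest machine divided by the mean load: 1.0 means perfectly balanced and larger values are worse. The `imbalance` function and the per-machine load figures are hypothetical illustrations, not data from the experiments.

```python
def imbalance(loads):
    """Hypothetical metric: busiest machine's load relative to the mean load."""
    if not loads:
        raise ValueError("need at least one machine")
    mean = sum(loads) / len(loads)
    if mean == 0:
        raise ValueError("total load is zero")
    return max(loads) / mean

print(imbalance([100, 100, 100, 100]))  # 1.0
print(imbalance([250, 50, 50, 50]))     # 2.5
```

Under this definition, the cluster's throughput is limited by the busiest machine, so lowering imbalance and raising throughput tend to go together.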