Big Graph Analytics Engine Yinglong Xia 6/23/2016 8th Linked Data Benchmark Council TUC Meeting@Oracle Conference Center
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Feb 11, 2017

Transcript
Page 1: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Big Graph Analytics Engine

Yinglong Xia, 6/23/2016

8th Linked Data Benchmark Council TUC Meeting@Oracle Conference Center

Page 2: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Introduction

Page 3: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Introduction

http://www.huawei.com/en/about-huawei/corporate-governance/corporate-governance

Page 4: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Recent Growth

Revenue

Net Profits

Cash flow from operating activities

http://www.huawei.com/en/about-huawei

Page 5: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Collaboration

Industrial Partners

Universities

Standards

Technical organizations

Global Research Institute & Labs

Open Source

Page 6: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Graph Analytics for Smart Big Data

Big Data Analytics & Management

Graph Machine Learning

NLP Deep Learning

Page 7: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Graph in ONOS

HotSDN’2014

Page 8: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Topology Impact on Information Propagation

Page 9: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Explore the Variety in Graph Analytics

Graph

Page 10: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Challenges

● Very large scale graphs for analysis
  • 10B~1000B vertices
  • a few hundred properties, static and dynamic
  • distributed communication introduces additional overhead
● Irregularity in graph data access
  • Low data locality results in high disk/communication IO overhead
  • Data access patterns are diverse among graph analysis algorithms
● Near real-time requirement
  • Incorporate incremental graph updates
  • Approximate query & analysis should be considered
● Efficiency and productivity must be balanced

Page 11: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Graph Platform for Smart Big Data

● Infrastructure: Single Machine, Cluster, GPU Server, Cloud
● Data Management: Structure Management, Property Management, Metadata Management, Permission Control
● Graph engines: Basic Engine, Streaming Graph, Graphical Model, Hyper Graph
● Analytics: Bayes Net, Community, Label propagation, Centrality, Anomaly detection, Matching, Ego Feature, Max Flow
● Visualization: Dynamic Graph Vis, Property Vis, Large Graph Vis
● Incremental Update

Page 12: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Graph Platform

[Architecture diagram; recoverable labels, grouped by where they appear]

● Data Source, Ingestion, Graph Topology and Property
● Main Storage (Dynamic Graph), online update property graph
  • KC/KV Store: V→Adjacency, V→Vertex Prop, E→Edge Prop
  • Property indices: Prop Idx, Encoding, External (Solr/CLucene)
  • Concurrency Control
  • Online query/modification, bi-temp query
● Streaming graph storage
  • Sliding Window KC/KV Store: V→TimeStamp→Adjacency, V→TimeStamp→Vprop, V→TimeStamp→Eprop
  • Streaming Graph, Streaming algorithms
● Periodically updated static graph snapshots
  • Snapshot Management, MVCC, KV Store
  • Snapshots, Double buffering, Batch processing
  • CSR (Sparse subgraphs), Dense subgraphs
  • GPU Offload, Direct Solver, Iterative Solver
● Graph Inference: TripleStore, Inference Tools (Virtuoso, Jena, etc.), Knowledge Graph
● Probabilistic Graphical Model & Inference, Offline Batch Processing
● online/offline analysis
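The streaming storage labels on this slide (V→TimeStamp→Adjacency inside a sliding window) suggest a two-level mapping from vertex to timestamp to neighbors. The slide gives no implementation detail, so the following is only a minimal in-memory sketch of that idea. The class name `StreamingAdjacency`, its methods, and the expiry policy are all hypothetical and are not taken from the talk.

```python
import bisect
from collections import defaultdict


class StreamingAdjacency:
    """Hypothetical V -> TimeStamp -> Adjacency map with a sliding window."""

    def __init__(self, window):
        self.window = window                  # window length, in timestamp units
        self._times = defaultdict(list)       # vertex -> sorted list of timestamps
        self._adj = defaultdict(dict)         # vertex -> {timestamp: [neighbors]}

    def add_edge(self, src, dst, ts):
        """Record edge src -> dst observed at time ts."""
        if ts not in self._adj[src]:
            bisect.insort(self._times[src], ts)
            self._adj[src][ts] = []
        self._adj[src][ts].append(dst)

    def expire(self, now):
        """Drop every entry older than now - window (assumed expiry policy)."""
        cutoff = now - self.window
        for vertex, times in self._times.items():
            k = bisect.bisect_left(times, cutoff)
            for ts in times[:k]:
                del self._adj[vertex][ts]
            del times[:k]

    def neighbors(self, vertex, start, end):
        """Neighbors of vertex seen with start <= timestamp <= end."""
        times = self._times.get(vertex, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_right(times, end)
        return [d for ts in times[lo:hi] for d in self._adj[vertex][ts]]
```

Keeping the timestamps sorted per vertex makes both a time-range query and window expiry a binary search plus a slice, which is one reason a vertex-then-timestamp key order is convenient for this kind of store.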

Page 13: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Unified Graph Data Access Patterns

[Figure: a weighted directed graph on vertices 1 to 6 and its equivalent 6×6 adjacency matrix, split by destination vertex into three shards. The graph and the shard tables are repeated for step 1, step 2, and step 3 of iteration i.]

shard 1 (1, 2)
src  dst  value
1    2    0.3
3    2    0.2
4    1    1.4
5    1    0.5
5    2    0.6
6    2    0.8

shard 2 (3, 4)
src  dst  value
1    3    0.4
2    3    0.3
3    4    0.8
5    3    0.2
6    4    1.9

shard 3 (5, 6)
src  dst  value
2    5    0.6
3    5    0.9
3    6    1.2
4    5    0.3
5    6    1.1

Observation on PSW data access patterns inspires highly efficient sharding representation

Page 14: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Construct Edge-set Flows

[Figure: the 6×6 adjacency matrix of the example graph, shown before and after row permutation and column permutation (rows listed in the order 3, 5, 1, 2, 4, 6). The permuted matrix is tiled into physical edge-sets numbered 1 to 9 in a 3×3 grid, with arrows showing the flow direction.]
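The slide does not define an edge-set in text. The sketch below assumes that an edge-set is the block of edges whose source lies in one vertex interval and whose destination lies in another, which would give the 3×3 grid of nine blocks seen in the figure. It reuses the example edge list from the sharding slide, skips the row and column permutations, and uses the hypothetical names `interval_index` and `build_edge_sets`.

```python
from collections import defaultdict

# Example edge list (src, dst, value) from the sharding slide.
EDGES = [
    (1, 2, 0.3), (3, 2, 0.2), (4, 1, 1.4), (5, 1, 0.5), (5, 2, 0.6), (6, 2, 0.8),
    (1, 3, 0.4), (2, 3, 0.3), (3, 4, 0.8), (5, 3, 0.2), (6, 4, 1.9),
    (2, 5, 0.6), (3, 5, 0.9), (3, 6, 1.2), (4, 5, 0.3), (5, 6, 1.1),
]

# Assumed vertex intervals; the slide's permuted ordering is not applied here.
INTERVALS = [(1, 2), (3, 4), (5, 6)]


def interval_index(vertex, intervals):
    """Return the index of the interval that contains vertex."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= vertex <= hi:
            return i
    raise ValueError(f"vertex {vertex} is outside every interval")


def build_edge_sets(edges, intervals):
    """Map (source interval, destination interval) to the edges in that block."""
    grid = defaultdict(list)
    for src, dst, value in edges:
        key = (interval_index(src, intervals), interval_index(dst, intervals))
        grid[key].append((src, dst, value))
    return dict(grid)


edge_sets = build_edge_sets(EDGES, INTERVALS)
```

With three intervals every edge lands in exactly one of nine blocks, so a computation can visit the blocks in a fixed order. The order the slide calls the flow direction is not recoverable from the transcript and is not modelled here.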

Page 15: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Preliminary Experiments - Preproc.

Graph Ingestion/Preprocessing Time

Create the data in our format

Page 16: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Preliminary Experiments - Comp.

PageRank w/o Loading Time

Decent speedup achieved w/ or w/o loading time

Page 17: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Preliminary Experiments

PageRank Total Time

Page 18: 8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Conclusion

● Many big data problems involve links among many entities, naturally represented as a graph
● Property graph is highly expressive
● Industry is looking for graph/graphical model engines for complex network analysis, streaming graph, probabilistic graphical models, and RDF graph computing
● Efficiency is the key in many industry graph analysis systems, especially when the data volume is big
● Eventually, the graph engine should serve AI business systems