
Graph Data Modeling in DataStax Enterprise

Apr 15, 2017


Artem Chebotko
Transcript
Page 1: Graph Data Modeling in DataStax Enterprise

Artem Chebotko

Graph Data Modeling in DataStax Enterprise

Page 2: Graph Data Modeling in DataStax Enterprise

1 DataStax Enterprise Graph
2 Property Graph Data Model
3 Data Modeling Framework
4 Schema Optimizations

© DataStax, All Rights Reserved.

Page 3: Graph Data Modeling in DataStax Enterprise

DSE Graph

• Real-time graph DBMS
• Very large graphs
• Many concurrent users
• Proven technologies and standards
• OLTP and OLAP capabilities

Page 4: Graph Data Modeling in DataStax Enterprise

DSE Graph Design

[Diagram layers: Graph Applications; DSE Graph]

Page 5: Graph Data Modeling in DataStax Enterprise

DSE Graph Design

[Diagram layers: Graph Applications; Property Graph and Gremlin, DSE schema API; DSE Graph]

Page 6: Graph Data Modeling in DataStax Enterprise

DSE Graph Design

[Diagram layers: Graph Applications; Property Graph and Gremlin, DSE schema API; DSE Graph; fully integrated backend technologies]

Page 7: Graph Data Modeling in DataStax Enterprise

DSE Graph Design

[Diagram layers: Graph Applications; Property Graph and Gremlin, DSE schema API; DSE Graph (schema, data, and query mappings; OLTP and OLAP engines); fully integrated backend technologies]

Page 8: Graph Data Modeling in DataStax Enterprise

DSE Graph Use Cases

• Customer 360
• Internet of Things
• Personalization
• Recommendations
• Fraud detection

Page 9: Graph Data Modeling in DataStax Enterprise

1 DataStax Enterprise Graph
2 Property Graph Data Model
3 Data Modeling Framework
4 Schema Optimizations

Page 10: Graph Data Modeling in DataStax Enterprise

Property Graph Data Model

• Instance
  • Defined in Apache TinkerPop™
  • Vertices, edges, and properties
• Schema
  • Defined in DataStax Enterprise
  • Vertex labels, edge labels, and property keys
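The split between instance and schema can be illustrated with a small in-memory sketch. This is plain Python, not the TinkerPop or DSE API: the `Vertex`, `Edge`, `SCHEMA`, and `conforms` names are hypothetical, and the pairing of this particular user, movie, and rating is chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Instance-level element: a label plus key/value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """Instance-level element: a labeled, directed link that may carry properties."""
    label: str
    out_v: Vertex
    in_v: Vertex
    properties: dict = field(default_factory=dict)


# Schema level: the labels that exist and the property keys each label allows.
SCHEMA = {
    "vertex_labels": {
        "user": {"userId", "age", "gender"},
        "movie": {"movieId", "title", "year", "duration", "country"},
    },
    "edge_labels": {"rated": {"rating"}},
}


def conforms(element, kind):
    """Return True if the element's label and property keys are declared in SCHEMA."""
    allowed = SCHEMA[kind].get(element.label)
    return allowed is not None and set(element.properties) <= allowed


user = Vertex("user", {"userId": "u75", "age": 17, "gender": "F"})
movie = Vertex("movie", {"movieId": "m267", "title": "Alice in Wonderland", "year": 2010})
rated = Edge("rated", user, movie, {"rating": 6})
```

Checking `conforms(user, "vertex_labels")` is the kind of data validation that a declared schema makes possible.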

Page 11: Graph Data Modeling in DataStax Enterprise

Vertices

[Diagram: example vertices labeled movie (two), user (two), genre, and person]

Page 12: Graph Data Modeling in DataStax Enterprise

Edges

[Diagram: the movie, user, genre, and person vertices connected by edges labeled rated (two), knows, belongsTo (two), and actor]

Page 13: Graph Data Modeling in DataStax Enterprise

Properties

[Diagram: the same graph with properties attached to its vertices and edges]

movie: movieId: m267, title: Alice in Wonderland, year: 2010, duration: 108, country: United States
movie: movieId: m16, title: Alice in Wonderland, year: 1951, duration: 75, country: United States
user: userId: u75, age: 17, gender: F
user: userId: u185, age: 12, gender: M
genre: genreId: g2, name: Adventure
person: personId: p4361, name: Johnny Depp
rated edges: rating: 6 and rating: 5
Other edges: knows, belongsTo (two), actor

Page 14: Graph Data Modeling in DataStax Enterprise

Multi- and Meta-Properties

movie m267
movieId: m267
title: Alice in Wonderland
year: 2010
duration: 108
country: United States
production: [Tim Burton Animation Co., Walt Disney Productions]
budget: [$150M, $200M]

Meta-properties on the budget values:
source: Bloomberg Businessweek, date: March 5, 2010
source: Los Angeles Times, date: March 7, 2010

Page 15: Graph Data Modeling in DataStax Enterprise

Graph Schema

Vertex labels and property keys:
movie: movieId :text, title :text, year :int, duration :int, country :text, production :text*
person: personId :text, name :text
genre: genreId :text, name :text
user: userId :text, age :int, gender :text

Edge labels:
rated (rating :int): user → movie
belongsTo: movie → genre
knows: user → user
cinematographer, actor, director, composer, screenwriter: movie → person

Page 16: Graph Data Modeling in DataStax Enterprise

Importance of Graph Schema

• DSE needs a graph schema to generate a C* schema
  • Vertex labels → tables
  • Property keys → columns
  • Graph indexes → materialized views, secondary indexes, search indexes
• Additional data validation benefits

Page 17: Graph Data Modeling in DataStax Enterprise

Schema Mapping Example

Property Table
CREATE TABLE user_p (
  community_id int,
  member_id bigint,
  "~~property_key_id" int,
  "~~property_id" uuid,
  age int,
  gender text,
  "userId" text,
  "~~vertex_exists" boolean,
  PRIMARY KEY (community_id, member_id, "~~property_key_id", "~~property_id"))

[Diagram: the graph schema from Page 15]

Page 18: Graph Data Modeling in DataStax Enterprise

Schema Mapping Example

Property Table
CREATE TABLE user_p (
  community_id int,
  member_id bigint,
  "~~property_key_id" int,
  "~~property_id" uuid,
  age int,
  gender text,
  "userId" text,
  "~~vertex_exists" boolean,
  PRIMARY KEY (community_id, member_id, "~~property_key_id", "~~property_id"))

Adjacency Table
CREATE TABLE user_e (
  community_id int,
  member_id bigint,
  "~~edge_label_id" int,
  "~~adjacent_vertex_id" blob,
  "~~adjacent_label_id" smallint,
  "~~edge_id" uuid,
  "~rating" int,
  "~~edge_exists" boolean,
  "~~simple_edge_id" uuid,
  PRIMARY KEY (community_id, member_id, "~~edge_label_id", "~~adjacent_vertex_id", "~~adjacent_label_id", "~~edge_id"))

[Diagram: the graph schema from Page 15]
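The naming pattern in the two tables above (a `_p` property table and an `_e` adjacency table per vertex label, with one column per property key) can be sketched as follows. This is a simplified illustration and not how DSE generates its schema: the function name and the type table are hypothetical, and the internal columns (`community_id`, `member_id`, and the `"~~..."` columns) are deliberately left out.

```python
# Hypothetical mapping from the schema diagram's type names to CQL types.
CQL_TYPES = {"text": "text", "int": "int"}


def sketch_tables(vertex_label, property_keys):
    """Return the table names and user-visible columns for one vertex label.

    property_keys maps a property key name to its type, e.g. {"age": "int"}.
    Only columns derived from property keys are listed; internal columns
    are omitted from this sketch.
    """
    columns = {name: CQL_TYPES[type_name] for name, type_name in property_keys.items()}
    return {
        "property_table": f"{vertex_label}_p",
        "adjacency_table": f"{vertex_label}_e",
        "property_columns": columns,
    }


user_tables = sketch_tables("user", {"userId": "text", "age": "int", "gender": "text"})
```

For the `user` label this yields `user_p` and `user_e`, matching the table names on the slide.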

Page 19: Graph Data Modeling in DataStax Enterprise

1 DataStax Enterprise Graph
2 Property Graph Data Model
3 Data Modeling Framework
4 Schema Optimizations

Page 20: Graph Data Modeling in DataStax Enterprise

Data Modeling

• Process of organizing and structuring data
• Based on a well-defined set of rules or a methodology
• Results in a graph or database schema
• Affects data quality, data storage, and data retrieval

Page 21: Graph Data Modeling in DataStax Enterprise

Traditional Schema Design

Data Model → Purpose
• Conceptual Data Model (CDM) → Understand data and its applications
• Logical Data Model (LDM) → Sketch a graph data model
• Physical Data Model (PDM) → Optimize physical design

Page 22: Graph Data Modeling in DataStax Enterprise

Conceptual Data Model

• Entity types
• Relationship types
• Attribute types

[ER diagram]
Entity types: User (userId, age, gender); Movie (movieId, title, year, duration, country, production); Genre (genreId, name); Person (personId, name)
IsA subtypes of Person: Actor, Director, Composer, Screenwriter, Cinematographer
Relationship types: knows (User, User); rated (User, Movie) with attribute rating; belongsTo (Movie, Genre); involved (Movie, Person)

Page 23: Graph Data Modeling in DataStax Enterprise

Transition from CDM to LDM

• Both CDM and LDM are graphs
  • Entity types → Vertex labels
  • Relationship types → Edge labels
  • Attribute types → Property keys
• Mostly straightforward with a few nuances
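The three mapping rules can be sketched as a small generator that turns a conceptual model into schema statements. The statement shapes follow the DSE schema API calls that appear on the Physical Data Model slide of this deck; the Python function, its input format, and the lowercased label names are assumptions made for this illustration, and edge cardinality such as `single()` is not handled.

```python
# Hypothetical mapping from attribute types to the schema API's type methods.
TYPE_METHODS = {"text": "Text", "int": "Int"}


def schema_statements(entity_types, relationship_types):
    """Generate schema statements from a conceptual model.

    entity_types: {vertex_label: {attribute: type}}
    relationship_types: [(edge_label, out_label, in_label, {attribute: type})]
    """
    # Attribute types -> property keys (collected once across all types).
    keys = {}
    for attrs in entity_types.values():
        keys.update(attrs)
    for _, _, _, attrs in relationship_types:
        keys.update(attrs)
    stmts = [
        f'schema.propertyKey("{name}").{TYPE_METHODS[t]}().create()'
        for name, t in keys.items()
    ]
    # Entity types -> vertex labels.
    for label, attrs in entity_types.items():
        props = ",".join(f'"{a}"' for a in attrs)
        stmts.append(f'schema.vertexLabel("{label}").properties({props}).create()')
    # Relationship types -> edge labels.
    for label, out_label, in_label, attrs in relationship_types:
        props = ",".join(f'"{a}"' for a in attrs)
        prop_part = f".properties({props})" if attrs else ""
        stmts.append(
            f'schema.edgeLabel("{label}"){prop_part}'
            f'.connection("{out_label}","{in_label}").create()'
        )
    return stmts


statements = schema_statements(
    {"user": {"userId": "text", "age": "int", "gender": "text"}},
    [("knows", "user", "user", {})],
)
```

The generated strings would still need review before being run, since the nuances mentioned above (keys, relationship direction, hierarchies) require design decisions that a mechanical mapping cannot make.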

Page 24: Graph Data Modeling in DataStax Enterprise

Logical Data Model

• Vertex labels
• Edge labels
• Property keys

[Diagram: the graph schema from Page 15, with vertex labels movie, person, genre, and user, and edge labels rated, belongsTo, knows, cinematographer, actor, director, composer, and screenwriter]

Page 25: Graph Data Modeling in DataStax Enterprise

Keys

• Entity type keys → Property keys
  • Uniqueness is not enforced
  • Vertex IDs are auto-generated
• Entity type keys → Custom vertex IDs
  • Uniqueness is enforced
  • Overriding default partitioning
  • Advanced feature

[Diagram: entity type User (userId, age, gender) mapped to vertex label user (userId :text, age :int, gender :text)]

Page 26: Graph Data Modeling in DataStax Enterprise

Symmetric Relationships

[Diagram: the CDM relationship User rated Movie and three alternative mappings: an edge label rated (user → movie), an edge label wasRatedBy (movie → user), or both edge labels together]

Page 27: Graph Data Modeling in DataStax Enterprise

Bi-Directional Relationships

[Diagram: the CDM relationship User knows User and alternative mappings that place knows edges between user vertices in one direction or in both directions]

Page 28: Graph Data Modeling in DataStax Enterprise

Qualified Bi-Directional Relationships

[Diagram: the CDM relationship User likes User with attribute strength, mapped to likes edges between user vertices with property strength :int; the example edges carry strength: 7 and strength: 9]

Page 29: Graph Data Modeling in DataStax Enterprise

Hierarchies

[Diagram: the CDM hierarchy (Movie involved Person; Actor and Director IsA Person) and alternative graph schema mappings built from these elements: vertex labels movie, person, actor, and director; edge labels involved and isA; an involved edge label with property role :text; and edge labels actor and director between movie and person]

Page 30: Graph Data Modeling in DataStax Enterprise

Physical Data Model

schema.propertyKey("userId").Text().create()
schema.propertyKey("name").Text().create()
schema.propertyKey("age").Int().create()
schema.vertexLabel("user").properties("userId","age",…).create()
schema.vertexLabel("movie").properties("movieId",…).create()
schema.edgeLabel("knows").connection("user","user").create()
schema.edgeLabel("rated").single().properties("rating").connection("user","movie").create()

Page 31: Graph Data Modeling in DataStax Enterprise

1 DataStax Enterprise Graph
2 Property Graph Data Model
3 Data Modeling Framework
4 Schema Optimizations

Page 32: Graph Data Modeling in DataStax Enterprise

Optimizing PDM for Performance

• Indexing data
• Controlling partitioning
• Materializing aggregates and inferences
• Rewriting traversals

Page 33: Graph Data Modeling in DataStax Enterprise

Vertex Indexes

schema.vertexLabel("movie").index("moviesById").materialized().by("movieId").add()

g.V().has("movie","movieId","m267")

[Diagram: vertex label movie (movieId :text, title :text, year :int, duration :int, country :text, production :text*)]

Page 34: Graph Data Modeling in DataStax Enterprise

Property Indexes

schema.vertexLabel("movie").index("movieBudgetBySource").property("budget").by("source").add()

g.V().has("movie","movieId","m267").
  properties("budget").
  has("source","Los Angeles Times").value()

[Diagram: movie m267 with multi-properties production: [Tim Burton Animation Co., Walt Disney Productions] and budget: [$150M, $200M]; the budget values carry meta-properties source: Bloomberg Businessweek, date: March 5, 2010 and source: Los Angeles Times, date: March 7, 2010]

Page 35: Graph Data Modeling in DataStax Enterprise

Edge Indexes

schema.vertexLabel("user").index("toMoviesByRating").outE("rated").by("rating").add()

g.V().has("user","userId","u1").
  outE("rated").has("rating",gt(6)).inV()

[Diagram: a user vertex with three rated edges to movie vertices, carrying rating: 7, rating: 9, and rating: 7]
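Why an edge index helps a range predicate such as `gt(6)` can be sketched with a sorted adjacency list. This is an in-memory analogy in plain Python, not DSE's storage layout: the `RatedEdgeIndex` class and the movie IDs are hypothetical. Keeping each user's `rated` edges ordered by `rating` lets the lookup jump to the first qualifying edge instead of scanning every edge.

```python
import bisect

# Sentinel that sorts after any ordinary movie ID string.
_MAX_ID = chr(0x10FFFF)


class RatedEdgeIndex:
    """Per-user list of (rating, movieId) pairs kept sorted by rating."""

    def __init__(self):
        self._by_user = {}

    def add(self, user_id, rating, movie_id):
        """Insert an edge while keeping the user's list sorted."""
        bisect.insort(self._by_user.setdefault(user_id, []), (rating, movie_id))

    def movies_rated_above(self, user_id, threshold):
        """Return movie IDs whose rating is strictly greater than threshold."""
        edges = self._by_user.get(user_id, [])
        # Binary search for the first edge whose rating exceeds the threshold.
        start = bisect.bisect_right(edges, (threshold, _MAX_ID))
        return [movie_id for _, movie_id in edges[start:]]


index = RatedEdgeIndex()
index.add("u1", 7, "m1")
index.add("u1", 9, "m2")
index.add("u1", 7, "m4")
index.add("u1", 5, "m3")  # below the threshold, skipped by the lookup
high_rated = index.movies_rated_above("u1", 6)
```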

Page 36: Graph Data Modeling in DataStax Enterprise

Custom Partitioning

schema.vertexLabel("movie").partitionKey("year","country").clusteringKey("movieId").properties("title","duration").create()

Resulting tables (K = partition key column, C↑ = clustering column, ascending):

movie_p
year K
country K
movieId C↑
~~property_key_id C↑
~~property_id C↑
duration
title
~~vertex_exists

movie_e
year K
country K
movieId C↑
~~edge_label_id C↑
~~adjacent_vertex_id C↑
~~adjacent_label_id C↑
~~edge_id C↑
~~edge_exists
~~simple_edge_id
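The effect of the partition key and clustering key above can be sketched by grouping rows in memory. This is only an analogy in plain Python, under the assumption that rows sharing a partition key are stored together and ordered by the clustering key; the `partition_movies` function and the movie `m100` are hypothetical.

```python
from collections import defaultdict


def partition_movies(movies):
    """Group movies by (year, country) and order each group by movieId."""
    partitions = defaultdict(list)
    for movie in movies:
        # Partition key: movies with the same year and country land together.
        partitions[(movie["year"], movie["country"])].append(movie)
    for rows in partitions.values():
        # Clustering key: rows inside a partition are ordered by movieId.
        rows.sort(key=lambda m: m["movieId"])
    return dict(partitions)


movies = [
    {"movieId": "m267", "title": "Alice in Wonderland", "year": 2010,
     "country": "United States", "duration": 108},
    {"movieId": "m16", "title": "Alice in Wonderland", "year": 1951,
     "country": "United States", "duration": 75},
    # Hypothetical extra movie to show ordering inside one partition.
    {"movieId": "m100", "title": "Example", "year": 2010,
     "country": "United States", "duration": 90},
]
partitions = partition_movies(movies)
```

A query that fixes both `year` and `country` then touches a single group, which is the motivation for choosing partition keys that match the expected access pattern.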

Page 37: Graph Data Modeling in DataStax Enterprise

Materializing Aggregates

g.V().hasLabel("movie").
  property("avg",__.inE("rated").values("rating").mean())

[Diagram: vertex label movie (movieId :text, title :text, year :int, duration :int, country :text, production :text*, avg :float)]
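The idea behind the traversal above, computing the mean rating once and storing it on the movie, can be sketched in plain Python. The function name and data layout are hypothetical, and the pairing of users, movies, and ratings is illustrative.

```python
from statistics import mean


def materialize_avg(movies, rated_edges):
    """Store each movie's mean rating in its own properties.

    movies: {movieId: properties dict}
    rated_edges: [(userId, movieId, rating)]
    """
    ratings = {}
    for _, movie_id, rating in rated_edges:
        ratings.setdefault(movie_id, []).append(rating)
    for movie_id, values in ratings.items():
        # Reads of "avg" no longer need to visit the rated edges.
        movies[movie_id]["avg"] = mean(values)
    return movies


movies = {"m267": {"year": 2010}, "m16": {"year": 1951}}
rated_edges = [("u75", "m267", 6), ("u185", "m267", 5)]
materialize_avg(movies, rated_edges)
```

The stored value goes stale when new ratings arrive, so a materialized aggregate has to be recomputed or updated as part of the write path or on a schedule.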

Page 38: Graph Data Modeling in DataStax Enterprise

Materializing Inferences

g.V().has("person","name","Tom Hanks").as("tom").
  in("actor").out("actor").where(neq("tom")).dedup().
  addE("knows").from("tom")

[Diagram: person tom and other person vertices connected to a movie by actor edges, with inferred knows edges from tom to the other persons]
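The inference in the traversal above (people who acted in the same movie know each other) can be sketched over a plain list of edges. The `infer_knows` function and the person and movie IDs are hypothetical; using a set plays the role of `dedup()`.

```python
def infer_knows(actor_edges, person):
    """Infer knows edges from a person to everyone sharing a movie with them.

    actor_edges: list of (movie, person) pairs, read as movie -actor-> person.
    Returns a set of (person, co_actor) pairs.
    """
    # Movies the person acted in: the in("actor") step.
    movies = {m for m, p in actor_edges if p == person}
    # Other people in those movies: out("actor") plus where(neq(...)).
    co_actors = {p for m, p in actor_edges if m in movies and p != person}
    return {(person, other) for other in co_actors}


actor_edges = [
    ("m1", "tom"), ("m1", "p1"),
    ("m2", "tom"), ("m2", "p1"), ("m2", "p2"),
    ("m3", "p3"),
]
knows_edges = infer_knows(actor_edges, "tom")
```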

Page 39: Graph Data Modeling in DataStax Enterprise

Rewriting Traversals

• Equivalent results
• Different execution plans
• Different response times

g.V().has("movie","year",2010).out("actor").has("name","Johnny Depp").count()

g.V().has("person","name","Johnny Depp").in("actor").has("year",2010).count()
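Why two equivalent traversals can differ in cost can be sketched by counting how many actor edges each starting point has to follow. This is a toy model in plain Python, not a description of the DSE query planner; the function names and all data except movie m267 and person p4361 are hypothetical.

```python
def start_from_movies(movies, persons, actor_edges, year, name):
    """Start at movies of the given year, follow actor edges, filter by name."""
    start = {m for m, y in movies.items() if y == year}
    followed = [(m, p) for m, p in actor_edges if m in start]
    count = sum(1 for _, p in followed if persons[p] == name)
    return count, len(followed)


def start_from_person(movies, persons, actor_edges, year, name):
    """Start at the named person, follow actor edges backwards, filter by year."""
    start = {p for p, n in persons.items() if n == name}
    followed = [(m, p) for m, p in actor_edges if p in start]
    count = sum(1 for m, _ in followed if movies[m] == year)
    return count, len(followed)


movies = {"m267": 2010, "m1": 2010, "m2": 2010, "m3": 2005}
persons = {"p4361": "Johnny Depp", "p1": "Person One", "p2": "Person Two"}
actor_edges = [
    ("m267", "p4361"), ("m267", "p1"),
    ("m1", "p1"), ("m1", "p2"),
    ("m2", "p2"),
    ("m3", "p4361"),
]
by_movies = start_from_movies(movies, persons, actor_edges, 2010, "Johnny Depp")
by_person = start_from_person(movies, persons, actor_edges, 2010, "Johnny Depp")
```

Both functions return the same count, but in this data the person-first version follows fewer edges, because one person has far fewer actor edges than all movies of a year combined.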

Page 40: Graph Data Modeling in DataStax Enterprise

Profiling Traversals

Page 41: Graph Data Modeling in DataStax Enterprise

Thank You

Artem [email protected]/in/artemchebotko

Page 42: Graph Data Modeling in DataStax Enterprise

The End