Page 1: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Titan

By Isaac Rieksts @IsaacRieksts

These thoughts are my own and do not represent the company

Page 2: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Outline

Graph Database overview

Tinkerpop

Titan

Graph Queries

Our use of Titan

Demo

Page 3: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Graph Databases

Person

Id  Name
1   Bob
2   Tom
3   Joe

Crosswalk

Person  Knows
1       2
1       3

Bob --Knows--> Tom
Bob --Knows--> Joe
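
The contrast on this slide can be sketched in plain Python: the same "knows" facts held as two relational tables that are joined at query time, and as a graph whose vertices carry their own edges. This is only an illustration of the two data models, not Titan's API; the names `person`, `crosswalk`, `graph`, `friends_relational`, and `friends_graph` are made up for the example.

```python
# Relational form: a Person table plus a crosswalk (join) table.
person = {1: "Bob", 2: "Tom", 3: "Joe"}   # Id -> Name
crosswalk = [(1, 2), (1, 3)]              # (Person, Knows)

def friends_relational(person_id):
    """Scan the crosswalk table, then look each match up in Person."""
    return [person[knows] for pid, knows in crosswalk if pid == person_id]

# Graph form: each vertex keeps its own outgoing "Knows" edges.
graph = {"Bob": ["Tom", "Joe"], "Tom": [], "Joe": []}

def friends_graph(name):
    """Follow the edges stored on the vertex itself; no join table needed."""
    return graph[name]

print(friends_relational(1))
print(friends_graph("Bob"))
```

Both calls return the same people; the difference is that the relational version scans a separate join table, while the graph version reads edges directly off the starting vertex.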

Page 4: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Tinkerpop

Abstraction layer

Query Language

Computing

Page 5: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Graphs with Spark

GraphX

Pregel

GraphLab

Page 6: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Why Titan

Flexible backend

No added infrastructure cost

Community support

Page 7: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Backends

Cassandra

HBase

Hazelcastcache

Persistit

Berkeley

Page 8: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Database Triangle

http://blog.nahurst.com/visual-guide-to-nosql-systems

Page 9: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

HBase

Strong consistency at the record level

Transaction support

Stored procedures

Replication

Page 10: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Cassandra

Tunable consistency

Multiple datacenter support

Built-in replication and fault tolerance

CQL query language

Keyspace passwords
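
As a rough illustration of what "tunable consistency" lets you trade off: with N replicas of a row, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below only does that arithmetic; it does not talk to Cassandra, and the function names `quorum` and `overlaps` are hypothetical.

```python
def quorum(replication_factor):
    """Replicas needed for a majority (QUORUM) of the replicas."""
    return replication_factor // 2 + 1

def overlaps(read_replicas, write_replicas, replication_factor):
    """True when every read set must share a replica with every write set."""
    return read_replicas + write_replicas > replication_factor

n = 3  # assumed replication factor for the example
print(quorum(n))
print(overlaps(quorum(n), quorum(n), n))  # QUORUM reads with QUORUM writes
print(overlaps(1, 1, n))                  # ONE reads with ONE writes
```

Lower levels such as ONE favour latency and availability; higher levels such as QUORUM favour reading the most recent write.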

Page 11: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Indexing

Built-in: fast for exact matches

Lucene: more advanced queries; good for a single box

Elasticsearch: advanced queries; large-scale clusters
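
To make the difference between exact-match and "more advanced" lookups concrete, here is a toy sketch in plain Python. It is not how Titan, Lucene, or Elasticsearch are implemented; the sample names and the helpers `find_exact` and `find_word` are invented for the illustration.

```python
from collections import defaultdict

# Made-up vertex names for the example.
names = {1: "Bob Smith", 2: "Tom Smith", 3: "Joe Bobson"}

# Exact-match index: the whole value maps to vertex ids.
exact = defaultdict(set)
for vid, name in names.items():
    exact[name].add(vid)

# Token index: each lowercased word maps to vertex ids (a tiny inverted index).
tokens = defaultdict(set)
for vid, name in names.items():
    for word in name.lower().split():
        tokens[word].add(vid)

def find_exact(value):
    """Return ids whose full value equals `value`."""
    return sorted(exact.get(value, set()))

def find_word(word):
    """Return ids whose value contains `word` as a whole token."""
    return sorted(tokens.get(word.lower(), set()))

print(find_exact("Bob Smith"))
print(find_exact("Smith"))
print(find_word("smith"))
```

The exact index answers only when the whole value is known, while the token index can answer a query for a single word inside the value; full-text engines build on the second idea.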

Page 12: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Gremlin vs SPARQL

Gremlin: support for complex queries

http://gremlindocs.com/

SPARQL: easy query language

http://www.w3.org/TR/rdf-sparql-query/

Page 13: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Gremlin vs SPARQL example 1

Gremlin

g.v('tg:1')
  .out('tg:knows')

SPARQL

SELECT ?x WHERE {
  tg:1 tg:knows ?x
}

Page 14: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Gremlin vs SPARQL example 2

Gremlin

g.v('tg:1')
  .out('tg:knows')
  .out('tg:name')

SPARQL

SELECT ?y WHERE {
  tg:1 tg:knows ?x .
  ?x tg:name ?y
}
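
Both examples describe the same walk over the data: start at `tg:1`, follow `tg:knows`, then follow `tg:name`. The sketch below mimics that walk over an in-memory list of triples in plain Python. It is neither Gremlin nor SPARQL, and the triples (the ids `tg:2` and `tg:3`, and which name belongs to which id) are assumptions made up for the illustration, as is the helper `out`.

```python
# Triples as (subject, predicate, object); the data is invented for the sketch.
triples = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:3"),
    ("tg:1", "tg:name", "Bob"),
    ("tg:2", "tg:name", "Tom"),
    ("tg:3", "tg:name", "Joe"),
]

def out(starts, predicate):
    """One traversal step: follow `predicate` edges out of every start node."""
    return [o for start in starts
            for s, p, o in triples
            if s == start and p == predicate]

# Example 1: whom does tg:1 know?
known = out(["tg:1"], "tg:knows")

# Example 2: what are their names?
known_names = out(known, "tg:name")

print(known)
print(known_names)
```

Chaining `out` calls mirrors the chained Gremlin steps, while each SPARQL triple pattern corresponds to one filter over the triple list.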

Page 15: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Our Mission

Deliver the most current information on the U.S. healthcare provider universe using integrated solutions in order for customers to:

› Prevent fraud, waste and abuse across the healthcare system

› Comply with evolving state and federal regulations

› Improve market opportunity for non-retail drugs and devices

Health Market Science, a LexisNexis Company

Page 16: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

The Business

Business

Health Care Provider & Facilities

Variety/Velocity
• >2,000 sources
• 6 million unique HCPs
• 10+ years of history

Data Challenges
• Constant change in real-world data
• Conflicting & partial info
• Frequent changes to source structure
• Authoritative sources vs. crowdsource
• Predicting source quality

Medical Procedures & Diagnosis

Volume/Velocity
• ~1B claims annually
• +5B records annually
• 5+ years of history

Data Challenges
• Sources have incomplete capture
• Overlapping source data
• Statistical projections & biases
• Social media type relationships

Solutions

Master Data Solutions: Batch (CompleteView, Expense Manager); Transactional (PRS/MDM/VerifyRx)

Medical Claims Data: Big Data Relational DB & Analytics (Claims)

Page 17: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Master Data Management

Visualization: Dashboard / Reports

Structured Storage: Relational, Indexing

Flexible Storage: NoSQL, Graph(s)

Interfacing: Web Services

Distributed Processing: Standardize, Validate, Match, Consolidate

Analytics

Data Sources: Government, Web, Customer

I’m happy

User Interface

Page 18: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Our use of Titan

Link storage

Analytics of links

Affiliation of business influences

Visualization of relationships

Page 19: Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

Demo