www.Objectivity.com · Making Sense of the Graph Revolution · Nick Quinn, Principal Engineer, InfiniteGraph

Making sense of the Graph Revolution

May 11, 2015


Technology

InfiniteGraph

In 2013:
- 1.4 trillion digital interactions happen per month.
- 2.9 million emails are sent every second.
- 72.9 products are ordered on Amazon per second.

That is a lot of connected data; graphs are truly everywhere. Companies are finding that graph database technology helps them make sense of their big data.

Objectivity’s Nick Quinn, Chief Architect of InfiniteGraph, shows just how popular graph databases have become, where they are being used, and the ins and outs of working with them.

Do you want to build technology that does great things with big data? You might want to find out what your colleagues are Tweeting about, make recommendations for apps, music or other retail that result in higher purchase rates, discover hidden connections between new and recorded medical research data, or maybe even leverage intel across government agencies to catch the bad guys.

All this is possible with a graph database.
Transcript
Page 1: Making sense of the Graph Revolution

www.Objectivity.com


Making Sense of the Graph Revolution

Nick Quinn, Principal Engineer, InfiniteGraph

Page 2: Making sense of the Graph Revolution

Why Call it a Revolution?

• “a forcible overthrow of the current order in favor of a new system.”

• NoSQL (Not Only SQL)
– Driven by Choice + Big Data Needs

• Scalable
• Performing
• Distributed
• Highly Available

Page 3: Making sense of the Graph Revolution

Big Data + Graph = Big Graph Data

• Social Scale – 1 billion vertices, 100 billion edges

• Web Scale – 50 billion vertices, 1 trillion edges

• Brain Scale – 100 billion vertices, 100 trillion edges

AND GROWING!

Page 4: Making sense of the Graph Revolution

Why Call it a Graph Revolution?

• After 2011, NoSQL and graph databases begin to follow the same trend line and forecast.

Page 5: Making sense of the Graph Revolution

The Growing Graph Database Landscape

Page 6: Making sense of the Graph Revolution

What is a Graph Database?

• A graph database is a native storage engine that enables efficient storage and retrieval of graph structured data.

• Graph databases are typically used:
– When the data source is highly connected,
– Where the connections are important (add value to the data), and
– When the user access pattern requires traversals of those connections.

Page 7: Making sense of the Graph Revolution

What is a Graph Database

• Graph Databases have a unique data model (Vertices and Edges).

• They are optimized around concurrent access of persisted data, so users can navigate the data as it is being added or updated.

[Figure: data model diagram with the labels VERTEX, EDGE, 2 and N]
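To make the vertex-and-edge model concrete, here is a minimal in-memory sketch in plain Java. It is not the InfiniteGraph API: the class `TinyGraph`, its methods, and the string vertex ids are all hypothetical names chosen for illustration. It stores labeled directed edges per vertex and navigates them with a breadth-first traversal.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model (not the InfiniteGraph API):
// vertices are string ids, each directed edge carries a label.
class TinyGraph {
    static final class Edge {
        final String label;
        final String target;

        Edge(String label, String target) {
            this.label = label;
            this.target = target;
        }
    }

    private final Map<String, List<Edge>> outgoing = new LinkedHashMap<>();

    void addVertex(String id) {
        outgoing.computeIfAbsent(id, k -> new ArrayList<>());
    }

    void addEdge(String source, String label, String target) {
        addVertex(source);
        addVertex(target);
        outgoing.get(source).add(new Edge(label, target));
    }

    // Breadth-first traversal: every vertex reachable from start, in visit order.
    Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        if (!outgoing.containsKey(start)) {
            return seen; // unknown vertex: nothing to traverse
        }
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (Edge e : outgoing.get(current)) {
                if (seen.add(e.target)) {
                    queue.add(e.target);
                }
            }
        }
        return seen;
    }
}
```

A real graph database persists this structure and handles concurrent readers and writers. The sketch only shows why following connections is a direct lookup rather than a join.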

Page 8: Making sense of the Graph Revolution

Why Use a Graph Database?

Relational Database

Think about the SQL query for finding all links between the two “blue” rows... Good luck!

Table_A Table_B Table_C Table_D Table_E Table_F Table_G

Relational databases aren’t good at handling complex relationships!

Page 9: Making sense of the Graph Revolution

Why use a Graph Database?

Objectivity/DB or InfiniteGraph - The solution can be found with a few lines of code

Relational Database

A3 G4

Table_A Table_B Table_C Table_D Table_E Table_F Table_G
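As a rough illustration of the "few lines of code" claim, the sketch below finds every link between two rows once the rows are modeled as vertices. It is plain Java, not Objectivity/DB or InfiniteGraph code. `LinkFinder` and its methods are hypothetical, and only the ids A3 and G4 come from the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rows become vertices, foreign-key style
// relationships become undirected links between them.
class LinkFinder {
    private final Map<String, List<String>> neighbors = new HashMap<>();

    void link(String a, String b) {
        neighbors.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // All simple paths (no vertex repeated) from source to target.
    List<List<String>> allPaths(String source, String target) {
        List<List<String>> results = new ArrayList<>();
        List<String> path = new ArrayList<>();
        path.add(source);
        extend(path, target, results);
        return results;
    }

    // Depth-first extension of the current path with backtracking.
    private void extend(List<String> path, String target, List<List<String>> results) {
        String last = path.get(path.size() - 1);
        if (last.equals(target)) {
            results.add(new ArrayList<>(path));
            return;
        }
        for (String next : neighbors.getOrDefault(last, new ArrayList<>())) {
            if (path.contains(next)) {
                continue; // already on this path
            }
            path.add(next);
            extend(path, target, results);
            path.remove(path.size() - 1);
        }
    }
}
```

Calling `allPaths("A3", "G4")` returns each chain of intermediate rows. The equivalent SQL would need one self-join per hop, with the number of hops unknown in advance.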

Page 10: Making sense of the Graph Revolution

Why use a Graph Database?

Page 11: Making sense of the Graph Revolution

Specialized Graph Use Cases

• Cyber Security – Identifying potential cyber threats and their targets

• Network Management – Answer very complex navigational queries on a social network in near real time

• Targeted Advertising – Customize marketing to the consumer by compiling a large knowledge graph with an integrated recommendation engine

Page 12: Making sense of the Graph Revolution

Example 1 - Ad Placement Networks

Smartphone Ad placement - based on the user’s profile and location data captured by opt-in applications.

• The location data can be stored and distilled in a key-value and column store hybrid database, such as Cassandra

• The locations are matched with geospatial data to deduce user interests.

• As Ad placement orders arrive, an application built on a graph database, such as InfiniteGraph, matches groups of users with Ads:

• Maximizes relevance for the user.

• Yields maximum value for the advertiser and the placer.

Page 13: Making sense of the Graph Revolution

Example 2 - Market Analysis

The 10 companies that control a majority of U.S. consumer goods brands

Page 14: Making sense of the Graph Revolution

Example 3 - Seed To Consumer Tracking

?

Page 15: Making sense of the Graph Revolution

Supply Chain Management Use Case

• Identifying the optimal route for a fleet of trucks at a particular time of the year is quite complex.
– number of drivers to pay and their salaries
– gas, weather patterns, timing requirements, container sizes, distances, roads, hazards, repairs

• Find the optimal route during the winter, when certain highways around the Great Lakes tend to become hazardous.

Page 16: Making sense of the Graph Revolution

Supply Chain Management Use Case

• Find the most cost-effective route in December with weather conditions X and highway conditions Y, and stay below Z latitude while optimizing costs to achieve a rush delivery

GraphView myView = new GraphView();

myView.excludeClass(myGraphDb.getTypeId(Highway.class.getName()),
    "(weather.precipitation > precipitationX && weather.temperature < temperatureX) || traffic.speed < speedY || traffic.accidents > accidentsY");

myView.excludeClass(myGraphDb.getTypeId(City.class.getName()), "latitude >= Z");
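The exclusion rule for highways can be restated as an ordinary Java predicate, which may make the boolean logic easier to check. This is a sketch only: `RouteFilter`, the `Highway` fields and the threshold parameters are hypothetical stand-ins, and it does not use or imitate the GraphView implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical plain-Java restatement of the highway exclusion rule.
class RouteFilter {
    static final class Highway {
        final String from;
        final String to;
        final double precipitation;
        final double temperature;
        final double trafficSpeed;
        final int accidents;

        Highway(String from, String to, double precipitation,
                double temperature, double trafficSpeed, int accidents) {
            this.from = from;
            this.to = to;
            this.precipitation = precipitation;
            this.temperature = temperature;
            this.trafficSpeed = trafficSpeed;
            this.accidents = accidents;
        }
    }

    // True when a highway should be left out of the view:
    // (wet AND cold) OR slow traffic OR too many accidents.
    static Predicate<Highway> excluded(double precipitationX, double temperatureX,
                                       double speedY, int accidentsY) {
        return h -> (h.precipitation > precipitationX && h.temperature < temperatureX)
                || h.trafficSpeed < speedY
                || h.accidents > accidentsY;
    }

    // Keeps only the highways the predicate does not exclude.
    static List<Highway> usable(List<Highway> all, Predicate<Highway> excluded) {
        List<Highway> kept = new ArrayList<>();
        for (Highway h : all) {
            if (!excluded.test(h)) {
                kept.add(h);
            }
        }
        return kept;
    }
}
```

Filtering before navigation, rather than inside the result handler, means excluded highways are never traversed at all.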

Page 17: Making sense of the Graph Revolution

Supply Chain Management Use Case

City origin, target = …; // Use query or index to look up "origin" & "target" city
VertexIdentifier resultQualifier = new VertexIdentifier(target);

// Set policies
PolicyChain myPolicies = new PolicyChain();
myPolicies.addPolicy(new MaximumPathDepthPolicy(MAXIMUM_STEPS));
myPolicies.addPolicy(new NoRevisitPolicy()); // Don't revisit the cities more than once

// Define logic on how to process results
NavigationResultHandler myNavHandler = new NavigationResultHandler() {
    @Override
    public void handleResultPath(Path result) {
        // The first path returned is the shortest path, but may not be the cheapest
        float cost = calculateCost(result);
        float time = calculateTime(result);
        // Minimize cost
        …
    }

    @Override
    public void handleNavigatorFinished(Navigator navigator) {}
};

Navigator navigator = origin.navigate(myView, Guide.DEPTH_FIRST_SEARCH,
    Qualifier.ANY /** Path Qualifier **/, resultQualifier, myPolicies, myNavHandler);
navigator.start();
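The navigation above combines a depth limit, a no-revisit rule, and a handler that keeps the cheapest path seen. A self-contained sketch of that combination, assuming a single cost per road, could look like this. `CheapestRoute` and its methods are hypothetical and written in plain Java. They are not the InfiniteGraph navigator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical depth-first search with a step limit and no revisits,
// keeping the lowest-cost path that reaches the target.
class CheapestRoute {
    private final Map<String, Map<String, Double>> roads = new HashMap<>();
    private List<String> bestPath = null;
    private double bestCost = Double.POSITIVE_INFINITY;

    void addRoad(String from, String to, double cost) {
        roads.computeIfAbsent(from, k -> new HashMap<>()).put(to, cost);
        roads.computeIfAbsent(to, k -> new HashMap<>()).put(from, cost);
    }

    // Returns the cheapest path found, or null if the target is unreachable
    // within maxSteps edges.
    List<String> find(String origin, String target, int maxSteps) {
        bestPath = null;
        bestCost = Double.POSITIVE_INFINITY;
        List<String> path = new ArrayList<>();
        path.add(origin);
        search(path, 0.0, target, maxSteps);
        return bestPath;
    }

    double bestCost() {
        return bestCost;
    }

    private void search(List<String> path, double cost, String target, int stepsLeft) {
        String here = path.get(path.size() - 1);
        if (here.equals(target)) {
            // Plays the role of the result handler: remember the cheapest path.
            if (cost < bestCost) {
                bestCost = cost;
                bestPath = new ArrayList<>(path);
            }
            return;
        }
        if (stepsLeft == 0) {
            return; // maximum path depth reached
        }
        for (Map.Entry<String, Double> road
                : roads.getOrDefault(here, new HashMap<>()).entrySet()) {
            if (path.contains(road.getKey())) {
                continue; // no-revisit rule
            }
            path.add(road.getKey());
            search(path, cost + road.getValue(), target, stepsLeft - 1);
            path.remove(path.size() - 1);
        }
    }
}
```

Raising the step limit can surface a cheaper but longer route, so the shortest path and the cheapest path may differ.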

Page 18: Making sense of the Graph Revolution

Graph Database Challenge #1: Reading Distributed Data

• If your graph data is distributed, traversing a desired path across partitions can be extremely difficult and slow.

Page 19: Making sense of the Graph Revolution

Graph Database Challenge #1: Reading Distributed Data

• Mitigate bottlenecks and optimize performance by using the following strategies:
– Custom Placement: data isolation/localization of logically related information (to achieve close to subgraph partitioning) in order to minimize the number of network calls
– Distributed Navigation Engine: Distributes the load on the partitions where the data is located.

Page 20: Making sense of the Graph Revolution

Reading Distributed Data: Custom Placement

• Consider the case where you are placing medical data for hospitals and patients. Using a custom placement model you can achieve fairly high isolation of the subgraphs.
– Doctor ↔ Hospitals, Patients ↔ Visits.
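One way to picture custom placement is a function that sends every vertex of a logical group to the same partition, then counts how many edges still cross partitions. The sketch below assumes a simple hash of a group key. `PlacementSketch`, the group names and the partition count are all hypothetical, and InfiniteGraph's actual placement model is configured differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical placement model: vertices that share a group key
// (for example a hospital or a patient) land on the same partition.
class PlacementSketch {
    private final Map<String, Integer> partitionOf = new HashMap<>();
    private final List<String[]> edges = new ArrayList<>();
    private final int partitionCount;

    PlacementSketch(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // The partition is derived from the group, not from the vertex itself.
    void place(String vertex, String group) {
        partitionOf.put(vertex, Math.floorMod(group.hashCode(), partitionCount));
    }

    void connect(String a, String b) {
        if (!partitionOf.containsKey(a) || !partitionOf.containsKey(b)) {
            throw new IllegalArgumentException("place both vertices before connecting them");
        }
        edges.add(new String[] {a, b});
    }

    // Each cross-partition edge stands for a potential network call during traversal.
    int crossPartitionEdges() {
        int count = 0;
        for (String[] e : edges) {
            if (!partitionOf.get(e[0]).equals(partitionOf.get(e[1]))) {
                count++;
            }
        }
        return count;
    }
}
```

Edges inside a group never leave their partition, so only the comparatively rare links between groups cost a remote hop.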

Page 21: Making sense of the Graph Revolution

Reading Distributed Data: Distributed Navigation Engine

• Google Pregel (2010)
– Batch algorithms on large graphs
– Avoids passing graph state; sends messages instead
– Apache Giraph, JPregel, Hama

while any vertex is active and max iterations not reached:
    for each vertex: (this loop is run in parallel)
        process messages from neighbors (update internal state)
        send messages to neighbors
        possibly synchronize results
        set active flag (unless no messages or state doesn’t change)
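The superstep loop can be tried out on one machine. The sketch below runs it sequentially to label connected components: each vertex keeps the smallest vertex id it has heard of and only sends messages when its label changes. `SuperstepSketch` is a hypothetical teaching aid in plain Java. It is not Pregel, Giraph, or the InfiniteGraph navigation engine, and the per-vertex loop that a real system runs in parallel is an ordinary `for` loop here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process imitation of vertex-centric supersteps.
class SuperstepSketch {
    private final Map<Integer, List<Integer>> neighbors = new HashMap<>();

    void connect(int a, int b) {
        neighbors.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Labels each vertex with the smallest vertex id in its connected component,
    // provided maxIterations is large enough for the labels to settle.
    Map<Integer, Integer> componentLabels(int maxIterations) {
        Map<Integer, Integer> label = new HashMap<>();
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        // Initial step: every vertex announces its own id to its neighbors.
        for (Integer v : neighbors.keySet()) {
            label.put(v, v);
            for (Integer n : neighbors.get(v)) {
                inbox.computeIfAbsent(n, k -> new ArrayList<>()).add(v);
            }
        }
        // A vertex is active only if it received messages in the previous step.
        for (int step = 0; step < maxIterations && !inbox.isEmpty(); step++) {
            Map<Integer, List<Integer>> nextInbox = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> mail : inbox.entrySet()) {
                int vertex = mail.getKey();
                int smallest = Collections.min(mail.getValue());
                if (smallest < label.get(vertex)) {
                    label.put(vertex, smallest); // state changed: stay active
                    for (Integer n : neighbors.get(vertex)) {
                        nextInbox.computeIfAbsent(n, k -> new ArrayList<>()).add(smallest);
                    }
                }
            }
            inbox = nextInbox; // barrier: messages are delivered in the next superstep
        }
        return label;
    }
}
```

Messages produced in one superstep are read only in the next, which is what lets the per-vertex work run in parallel without sharing graph state.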

Page 22: Making sense of the Graph Revolution

Reading Distributed Data: Distributed Navigation Engine

• Pregel is optimized for large distributed graph analytics

• Limitation on Pregel logic: When the traversal is occurring locally, the logic is to still execute by sending messages from vertex to vertex

• Ideally, when local, the traversal should be executed in memory and when remote, Pregel logic should be used.
– InfiniteGraph’s Distributed Navigation Engine uses the QueryServer (oqs) to achieve this optimized behavior.

Page 23: Making sense of the Graph Revolution

Graph Database Challenge #2: Supernodes

• A supernode is a vertex with a disproportionately high number of outgoing edges.
– Inefficient to traverse through these vertices

Page 24: Making sense of the Graph Revolution

Supernodes (Avoid the Tonight Show!)

In the IMDB data set, some examples of supernodes may be talk shows, awards shows, compilations or variety shows.

Page 25: Making sense of the Graph Revolution

Supernodes: GraphViews and Policies

• With InfiniteGraph, we offer two strategies for addressing the supernode problem within the navigation context.
– Use GraphViews to filter out vertex or edge types
– Globally limit the number of edges traversed using the FanoutLimitPolicy

Page 26: Making sense of the Graph Revolution

Supernodes: GraphViews and Policies

• Consider calculating the number of links to interesting companies on LinkedIn.
– If you are connected to recruiters, the navigation can be slowed down and the result set possibly polluted when traversing through these recruiters.

GraphView myView = new GraphView();
myView.excludeClass(myGraphDb.getTypeId(Person.class.getName()), "CONTAINS(profession, 'recruiter')");

PolicyChain chain = new PolicyChain();
// Limits # of edges traversed to 10
chain.addPolicy(new FanoutLimitPolicy(10));
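To see how a fanout limit tames a supernode, the following sketch caps the number of edges followed out of any single vertex during a breadth-first traversal. Treating the limit as a per-vertex cap is an assumption made for illustration. The slides do not spell out the exact semantics of `FanoutLimitPolicy`, and `FanoutSketch` is a hypothetical plain-Java class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical traversal that follows at most fanoutLimit edges per vertex.
class FanoutSketch {
    private final Map<String, List<String>> outgoing = new HashMap<>();

    void addEdge(String from, String to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    Set<String> reachable(String start, int fanoutLimit) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> edges = outgoing.getOrDefault(current, new ArrayList<>());
            // Assumed semantics: only the first fanoutLimit edges are followed.
            int followed = Math.max(0, Math.min(fanoutLimit, edges.size()));
            for (String next : edges.subList(0, followed)) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }
}
```

With a small limit, a recruiter connected to thousands of people contributes only a handful of onward hops, so the traversal stays bounded at the cost of possibly missing some results.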

Page 27: Making sense of the Graph Revolution

Supernodes: Edge Discovery Methods

• When walking the graph, edge discovery methods on the vertex API allow for easy lookup.

• More edge discovery methods and optimizations are coming!

Vertex start = …; // lookup by query or index
// Get all 'Facebook' connections
EdgePredicate edgeQualifier = new EdgePredicate(Knows.class, "how == 'Facebook'");
Iterable edgeHandles = start.getEdges(edgeQualifier);
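The same lookup idea, expressed with standard Java streams instead of the InfiniteGraph vertex API, might look as follows. `EdgeLookup`, its nested `Knows` class and the `how` field are hypothetical stand-ins that mirror the names in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical vertex holding its "knows" edges; not the InfiniteGraph API.
class EdgeLookup {
    static final class Knows {
        final String person; // vertex at the other end of the edge
        final String how;    // how the two people know each other

        Knows(String person, String how) {
            this.person = person;
            this.how = how;
        }
    }

    private final List<Knows> edges = new ArrayList<>();

    void addKnows(String person, String how) {
        edges.add(new Knows(person, how));
    }

    // Returns only the edges that satisfy the qualifier.
    List<Knows> getEdges(Predicate<Knows> qualifier) {
        return edges.stream().filter(qualifier).collect(Collectors.toList());
    }
}
```

For example, `start.getEdges(e -> "Facebook".equals(e.how))` selects the Facebook connections without visiting the vertices at the far end of the other edges.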

Page 28: Making sense of the Graph Revolution

Graph Database Challenge #3: Writing Distributed Data

[Figure: App-1, App-2 and App-3 ingest vertices V1, V2 and V3 concurrently; App-1 adds edge E12 {V1, V2} and App-2 adds edge E23 {V2, V3}. All writes go through InfiniteGraph to the Objectivity/DB persistence layer.]

Page 29: Making sense of the Graph Revolution

Graph Database Challenge #3: Writing Distributed Data

• Concurrent writes (multithreaded, multiprocess and/or multiuser access) to a database that holds highly connected data lead to highly contentious locking behavior, poor write performance and retried transactions.

• NoSQL databases with relaxed consistency modes typically offer higher write performance
– System maintains data integrity (ACID), handles lock conflicts, optimizes batch processing

Page 30: Making sense of the Graph Revolution

Writing Distributed Data: Accelerated Ingest (Pipelining)

• InfiniteGraph offers a relaxed consistency ingest mode, Accelerated Ingest.
– Vertex, Edge objects are placed immediately
– Edge updates are “pipelined” (no lock contention) and updates are batch processed (optimized)
– Graph is built up in background
– Achieves highest rate of ingest in distributed environments
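The pipelining idea can be sketched as a queue of pending edge updates that a background agent later groups and applies in batches. The code below is a simplified illustration in plain Java and rests on several assumptions: updates are grouped by source vertex, the agent is invoked explicitly rather than running in the background, and `IngestPipeline` with all of its methods is hypothetical rather than part of InfiniteGraph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical pipeline: writers enqueue edge updates, an agent applies them in batches.
class IngestPipeline {
    private final Queue<int[]> pending = new ConcurrentLinkedQueue<>(); // {source, target}
    private final Map<Integer, List<Integer>> adjacency = new HashMap<>();
    private int batchesApplied = 0;

    // Writers only append to the queue, so they never contend on a vertex.
    void submitEdge(int source, int target) {
        pending.add(new int[] {source, target});
    }

    // The agent drains the queue, groups updates by source vertex and applies
    // each group in one step (one "lock" per vertex instead of one per edge).
    void runAgentOnce() {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        int[] update;
        while ((update = pending.poll()) != null) {
            grouped.computeIfAbsent(update[0], k -> new ArrayList<>()).add(update[1]);
        }
        for (Map.Entry<Integer, List<Integer>> batch : grouped.entrySet()) {
            adjacency.computeIfAbsent(batch.getKey(), k -> new ArrayList<>())
                     .addAll(batch.getValue());
            batchesApplied++;
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int batchesApplied() {
        return batchesApplied;
    }

    int outDegree(int vertex) {
        return adjacency.getOrDefault(vertex, new ArrayList<>()).size();
    }
}
```

Until the agent runs, submitted edges are not visible to readers, which is the relaxed-consistency trade-off: faster ingest in exchange for a graph that is built up slightly behind the writers.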

Page 31: Making sense of the Graph Revolution

Writing Distributed Data: Accelerated Ingest (Pipelining)

[Figure: the IG Core/API writes edge updates such as E(1->2), E(2->3), E(3->1), E(2->1) and E(3->2) into Pipeline Containers; the Pipeline Agent applies them to the Target Containers (C1, C2, C3, E12, E23).]

Page 32: Making sense of the Graph Revolution

Accelerated Ingest Performance Results

Page 33: Making sense of the Graph Revolution

Graph Database Challenge #4: Tools

• Typically, when databases don’t offer tools for analysis or visualization, the tools that are used are general purpose.

• Tools offered by databases are generally integrated well with native features.
– Sometimes exposing “hidden” features
– These tools can generally be useful for debugging and development of applications built on top of the database.

Page 34: Making sense of the Graph Revolution

Tools: The IG Visualizer

• Excellent for development and debugging of applications built on top of an IG database.

Page 35: Making sense of the Graph Revolution

Why InfiniteGraph™?

• Objectivity/DB is a proven foundation
– Building distributed databases since 1993
– A complete database management system
    • Concurrency, transactions, cache, schema, query, indexing

• It’s a Graph Specialist!
– Simple but powerful API tailored for data navigation.
– Easy to configure distribution model

Page 36: Making sense of the Graph Revolution

QUESTIONS?