Metadata and the Power of Pattern-Finding

Posted on 09-Feb-2017


MAY 24, 2016, FOR DATAVERSITY

LEON GUZENDA, Chief Technology Marketing Officer

AGENDA

• Who We Are

• Open Source Big & Fast Data Analytics

• Our Core Technology & New Product

• Pattern Finding Examples

• Q & A

OBJECTIVITY, INC.

OBJECTIVITY, INC. OVERVIEW

• Private company, headquartered in Silicon Valley since 1988

• Verticals:
  • Government: Intelligence, defense, crime detection & prevention
  • Financial Services
  • Industrial Internet of Things (IIoT)
  • Energy
  • Healthcare

• Horizontals:
  • Graph analytics
  • Complex, distributed, scalable database applications

SAMPLE CUSTOMERS AND PARTNERS

[Logo slide, grouped as: Capital Intensive Customers; Government Customers; Telco & Network Customers; Technology Partners; SI Partners]

OPEN SOURCE BIG & FAST DATA ANALYTICS

OPEN SOURCE ANALYTICS...

[Diagram: open source analytics stack (Fall 2016) with labels R; Proprietary Rules, Ontologies, Queries...; Reports, Archives...; Workflow Design GUI (Proprietary)]

...OPEN SOURCE ANALYTICS

PROS:
• Large community
• Lots of algorithms
• Model works at scale
• Low startup costs
• Cost effective

CONS:
• Most algorithms are based on statistical correlation, clustering or filtering
• Graph algorithms mainly tackle theoretical problems
• Hadoop mostly targets files, not metadata
• Metadata tools focus on technical parameters, not semantic content

APACHE SPARK GRAPHX API

• Vertex, Edge and Triplet operations
• Graph modification operations
• RDD join operations
• Adjacent triplet operations
• Iterative graph-parallel operations
• PageRank, connected components, triangle counting, etc.

Spark GraphFrames adds Motifs (simple subgraph definitions).

BUT

Efficient pathfinding and complex navigation are inhibited by the table/triplet approach.
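The table/triplet limitation can be illustrated with a small sketch. It assumes the graph is held only as an edge table of (src, dst) rows, as in a triplet view, so every extra hop in a path query costs one more join of the partial paths against that table. This is plain Python written for illustration, not GraphX or GraphFrames code; in GraphFrames the two-hop case corresponds to a motif such as `(a)-[e1]->(b); (b)-[e2]->(c)`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # one (src, dst) row of a hypothetical edge table


def index_by_src(edges: List[Edge]) -> Dict[str, List[str]]:
    """Hash index on the join key (edge source), as a hash join would build."""
    index: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        index[src].append(dst)
    return index


def k_hop_paths(edges: List[Edge], k: int) -> List[List[str]]:
    """Return every k-hop path by repeatedly joining path.last == edge.src."""
    index = index_by_src(edges)
    paths = [[src, dst] for src, dst in edges]  # hop 1 is the edge table itself
    for _ in range(k - 1):                      # each further hop is one more join
        paths = [path + [nxt] for path in paths for nxt in index.get(path[-1], [])]
    return paths
```

With edges A→B, B→C and C→D, `k_hop_paths(edges, 2)` yields the two-hop matches [A, B, C] and [B, C, D]. The per-hop join, and the intermediate table it materializes, is the overhead that a navigation-oriented store aims to avoid.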

OUR CORE TECHNOLOGY

OUR FOCUS

• Complex objects at scale:
  • Relationships are first-class citizens
  • Ultra-fast navigation and pathfinding
  • Not restricted by available RAM

• Scalability, performance, reliability and flexibility:
  • Distributed database and distributed processing
  • Light, small database kernel, from embedded to cluster to cloud

DISTRIBUTED DATA: SINGLE LOGICAL VIEW

Put the data and processing where it's needed.

• 1,000s of trillions of unique objects
• 1,000s of petabytes of storage
• Resolving an ID quickly, regardless of the number of objects
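One way to resolve an ID in constant time, whatever the object count, is to make the ID a physical address instead of a key that must be looked up in an index. The sketch below assumes a hypothetical 64-bit ID split into four 16-bit fields (database, container, page, slot); the layout and the names `ObjectId`, `pack` and `unpack` are illustrative only and are not taken from ThingSpan documentation.

```python
from typing import NamedTuple

FIELD_BITS = 16                     # hypothetical: four 16-bit fields = 64 bits
FIELD_MASK = (1 << FIELD_BITS) - 1


class ObjectId(NamedTuple):
    """Hypothetical physical object address."""
    database: int
    container: int
    page: int
    slot: int


def pack(oid: ObjectId) -> int:
    """Encode the four fields into one integer, database in the high bits."""
    value = 0
    for field in oid:
        if not 0 <= field <= FIELD_MASK:
            raise ValueError(f"field {field} does not fit in {FIELD_BITS} bits")
        value = (value << FIELD_BITS) | field
    return value


def unpack(value: int) -> ObjectId:
    """Decode with shifts and masks: constant work, no index lookup."""
    fields = []
    for _ in range(4):
        fields.append(value & FIELD_MASK)
        value >>= FIELD_BITS
    return ObjectId(*reversed(fields))
```

Under this assumption, `unpack(pack(ObjectId(3, 7, 1024, 42)))` tells the storage layer to open database 3, container 7, page 1024 and read slot 42, with no lookup structure that grows as objects are added.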

DISTRIBUTED PROCESSING

[Diagram labels: ThingSpan; Cache; Client Processes]

THINGSPAN

THINGSPAN ENVIRONMENT

THINGSPAN FEATURES

• Uses the Apache Spark open source processing engine
• In partnership with Cloudera, Databricks, Hortonworks and MapR
• Powerful object and relationship modeling
• Can store data in HDFS and/or POSIX file systems
• Ultra-fast graph navigation, pathfinding and pattern finding
• REST server and API for loading data and performing graph analytics
• Spark DataFrame support to leverage MLlib, GraphX, SQL, etc.

DISTRIBUTED PROCESSING & DATABASE

Distributed from top to bottom

[Diagram label: Hadoop Distributed File System]

THINGSPAN COMPONENTS

[Diagram: the OPEN SOURCE ANALYTICS STACK (Fall 2016), with R; Proprietary Rules, Ontologies, Queries...; Reports, Archives...; Workflow Design GUI, shown beside the THINGSPAN ENHANCED ANALYTICS STACK (later this year)]

PATTERN FINDING

PATTERN FINDING TECHNIQUES

• Conventional Business Intelligence analytics: uses filters and statistical correlation to find relationships between parameters.

• Graph pattern finding analytics: uses a combination of outlier, navigational and pathfinding queries.
  • Find outliers with SQL or MLlib.
  • A navigational query can specify Vertex and Edge types to include or exclude, and can invoke methods during the traversal, e.g. to compute the transit time to a node.
  • A pathfinding query can find the shortest path, or all paths, between two or more Vertices.
  • The order of the query types depends on the problem.
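A navigational query, as described above, can be sketched as a breadth-first traversal that filters on vertex and edge types and calls a user-supplied function on every edge it follows. The in-memory graph shape and the names `navigate` and `on_edge` are hypothetical, chosen only for illustration; this is not ThingSpan's query API.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical graph: vertex -> (vertex_type, [(edge_type, target, edge_attrs)])
Graph = Dict[str, Tuple[str, List[Tuple[str, str, dict]]]]


def navigate(graph: Graph, start: str,
             include_vertex_types: Set[str],
             exclude_edge_types: Set[str],
             on_edge: Callable[[float, dict], float]) -> Dict[str, float]:
    """Breadth-first navigation with type filters.

    on_edge(value_so_far, edge_attrs) is invoked on each edge followed, e.g. to
    accumulate transit time. The value recorded for a vertex is the one along the
    path on which BFS first reaches it (fewest hops, not necessarily least time).
    """
    reached = {start: 0.0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        _, edges = graph[vertex]
        for edge_type, target, attrs in edges:
            if edge_type in exclude_edge_types or target in reached:
                continue
            if graph[target][0] not in include_vertex_types:
                continue
            reached[target] = on_edge(reached[vertex], attrs)
            queue.append(target)
    return reached
```

For example, excluding `"Rail"` edges and passing `lambda t, a: t + a["hours"]` as `on_edge` yields the road-only transit time to each city reachable from the start.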

PATH-FINDING QUERY

[Diagram: schema with CITY vertices joined by LINK edges that carry Mode, Duration and Cost]

• Problem: Find the least expensive route between San Francisco and New York for a 60-ton, very wide load that must arrive by Saturday, while minimizing mode transitions (road, rail, water, etc.).

• Implied: We can avoid Rail connections.
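The path-finding problem above can be sketched as an "all paths" query followed by a filter and a ranking. Assumptions, all hypothetical: each link is a (next_city, mode, hours, cost) tuple, "arrive by Saturday" is modeled as a maximum number of hours, rail is excluded by default per the implied constraint, and fewer mode transitions serves as the tie-breaker after cost.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Leg = Tuple[str, str, float, float]   # (next_city, mode, hours, cost), hypothetical
Links = Dict[str, List[Leg]]          # city -> outgoing links


def all_paths(links: Links, src: str, dst: str, excluded_modes: frozenset) -> Iterator[List[Leg]]:
    """Enumerate every simple path (no city repeated) from src to dst, depth-first."""
    stack: List[Tuple[str, List[Leg]]] = [(src, [])]
    while stack:
        city, legs = stack.pop()
        if city == dst:
            yield legs
            continue
        visited = {src} | {leg[0] for leg in legs}
        for leg in links.get(city, []):
            nxt, mode = leg[0], leg[1]
            if mode not in excluded_modes and nxt not in visited:
                stack.append((nxt, legs + [leg]))


def mode_transitions(legs: List[Leg]) -> int:
    """Count changes of transport mode between consecutive legs."""
    return sum(1 for a, b in zip(legs, legs[1:]) if a[1] != b[1])


def best_route(links: Links, src: str, dst: str, max_hours: float,
               excluded_modes: frozenset = frozenset({"rail"})) -> Optional[List[Leg]]:
    """Cheapest route within the time budget; fewer mode transitions break ties."""
    feasible = [legs for legs in all_paths(links, src, dst, excluded_modes)
                if sum(leg[2] for leg in legs) <= max_hours]
    if not feasible:
        return None
    return min(feasible, key=lambda legs: (sum(leg[3] for leg in legs), mode_transitions(legs)))
```

Exhaustive enumeration keeps the sketch simple and correct, but the number of simple paths grows exponentially with graph size, so a real pathfinder would prune as it goes.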

PATTERN FINDING EXAMPLES

• Financial: Money Laundering Detection
• Intelligence Analysis: Threat Detection
• AdTech: Recommendation Engine Support
• Industrial Internet of Things (IIoT): Network Congestion Analysis

FINANCIAL: MONEY LAUNDERING DETECTION

1. Load Person, Account and Transaction data into ThingSpan.

[Diagram: people P1, P2 and P3 linked to accounts Acc 1, Acc 2, Acc 20 to Acc 24, Acc 31 to Acc 33 and Acc 35, with $ transactions between accounts and house icons]

FINANCIAL: APPLY SPARK GRAPHX

2. Identify people with more than 5 accounts (centrality).

[Diagram: the same Person/Account graph (P1, P2, P3 and their accounts)]
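Step 2 is a degree-centrality filter: count each person's ownership edges and keep those above the threshold. The plain-Python sketch below illustrates the technique; the ownership pairs in the test data are hypothetical, since the slide does not say who owns which accounts. In GraphX the equivalent would start from the graph's `outDegrees` and filter on the count.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def people_with_many_accounts(owns: Iterable[Tuple[str, str]], threshold: int = 5) -> List[str]:
    """Degree centrality on Person -> Account ownership edges.

    Returns the people who own more than `threshold` distinct accounts;
    duplicate ownership edges are counted once.
    """
    degree = Counter(person for person, _account in set(owns))
    return sorted(person for person, count in degree.items() if count > threshold)
```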

FINANCIAL: APPLY A NAVIGATIONAL QUERY

3. Look at all of that person's transactions to see if they terminate in just 1 or 2 offshore accounts.

4. INVESTIGATE.

[Diagram: the Person/Account graph (P1, P2, P3 and their accounts) with $ transaction flows and house icons]
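Step 3 can be sketched as a breadth-first walk along transaction edges, starting from the flagged person's accounts and collecting the accounts where the money trail stops. The (from_account, to_account) transfer shape, the set of offshore accounts and the function names are hypothetical. Trails that only cycle never "end" and are therefore not reported by this sketch.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def terminal_accounts(transfers: Iterable[Tuple[str, str]], start_accounts: Iterable[str]) -> Set[str]:
    """Follow money breadth-first from start_accounts along (from, to) transfer
    edges and return the accounts where a trail ends (no outgoing transfer)."""
    outgoing: Dict[str, Set[str]] = {}
    for src, dst in transfers:
        outgoing.setdefault(src, set()).add(dst)
    seen = set(start_accounts)
    queue = deque(seen)
    ends: Set[str] = set()
    while queue:
        account = queue.popleft()
        targets = outgoing.get(account, set())
        if not targets:
            ends.add(account)
        for nxt in targets:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return ends


def funnels_offshore(transfers: Iterable[Tuple[str, str]], start_accounts: Iterable[str],
                     offshore: Set[str], max_ends: int = 2) -> bool:
    """True if all trails end in at most `max_ends` accounts, every one of them offshore."""
    ends = terminal_accounts(transfers, start_accounts)
    return 0 < len(ends) <= max_ends and ends <= offshore
```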

HUMINT: THREAT DETECTION

1. Load People, Calls, Places and Sightings into the graph.

[Diagram: people P1, P2, P3 and P5 to P18 connected by call detail records CDR1 to CDR17, with sightings Seen1 to Seen4 linking people to PlaceX, PlaceY and PlaceZ]

HUMINT: APPLY SPARK GRAPHX

2. Use Spark GraphX to find "islands" of callers and callees.

[Diagram: the people and call detail records (CDRs) from step 1, grouped into islands]
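Finding "islands" of callers and callees is a connected-components computation; GraphX provides it as `connectedComponents()`. The sketch below shows the same idea in plain Python with union-find, treating each call as an undirected (caller, callee) pair. The call pairs in the test are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def call_islands(calls: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Connected components ("islands") of the call graph, largest first.

    calls: (caller, callee) pairs, e.g. one per call detail record (CDR).
    """
    parent: Dict[str, str] = {}

    def find(person: str) -> str:
        parent.setdefault(person, person)
        while parent[person] != person:
            parent[person] = parent[parent[person]]  # path halving keeps trees shallow
            person = parent[person]
        return person

    for caller, callee in calls:
        root_a, root_b = find(caller), find(callee)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two islands

    islands: Dict[str, Set[str]] = {}
    for person in list(parent):
        islands.setdefault(find(person), set()).add(person)
    return sorted((sorted(group) for group in islands.values()), key=len, reverse=True)
```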

HUMINT: APPLY A NAVIGATIONAL QUERY

3. Use a navigational query to see if any of those people have been seen near places that need to be protected.

[Diagram: the call islands from step 2, with sightings Seen1 to Seen4 linking people to PlaceX, PlaceY and PlaceZ]
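Step 3 navigates Person → Sighting → Place for the members of an island. The sketch below flattens that traversal into a filter over (sighting_id, person, place) records, which is a hypothetical record shape. It also flags the whole island when any member was seen near a protected place, mirroring the plan in step 4. The sighting data in the test is invented for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Sighting = Tuple[str, str, str]  # (sighting_id, person, place), hypothetical record shape


def people_near_protected(island: Set[str], sightings: Iterable[Sighting],
                          protected: Set[str]) -> Dict[str, List[str]]:
    """Person -> Sighting -> Place: which island members were seen at protected places."""
    hits: Dict[str, List[str]] = {}
    for sighting_id, person, place in sightings:
        if person in island and place in protected:
            hits.setdefault(person, []).append(sighting_id)
    return {person: sorted(ids) for person, ids in hits.items()}


def surveillance_list(island: Set[str], sightings: Iterable[Sighting], protected: Set[str]) -> List[str]:
    """If anyone in the island was seen near a protected place, flag the whole island."""
    return sorted(island) if people_near_protected(island, sightings, protected) else []
```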

HUMINT: PLAN ACTION

4. P14 and P15 have been seen near potential target PlaceX, so they, plus P11, P7 and P8, should be put under surveillance.

[Diagram: the call graph and sightings from the previous steps, with PlaceX, PlaceY and PlaceZ]

ADTECH: PRE-PLANNED ADS

1. Load Products, Orders, People and Social_Links into ThingSpan.

[Diagram: people Joe, Fred, Mary, Jane and Bill; products Pr1 to Pr6; sales Sale1 to Sale5 linking people to products; Follows links between people]

ADTECH: PRE-PLANNED ADS

2. We want to place ads for Product Pr2.

[Diagram: the same people, products, sales and Follows links]

ADTECH: WHO FOLLOWS BUYERS OF THE PRODUCT?

3. Use ThingSpan to find bloggers who bought Pr2 and who also have followers.

Result: Fred bought Pr2. Mary follows Fred's blogs. Jane & Bill follow Mary's.

[Diagram: the same people, products, sales and Follows links]
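Step 3 amounts to finding the buyers of a product and then walking "follows" edges backwards, transitively, to collect everyone who reads a buyer directly or through another follower. The sketch below is plain Python, not ThingSpan's API. The (buyer, product) and (follower, followed) pairs are hypothetical, apart from the relationships stated in the slide's result.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def ad_audience(sales: Iterable[Tuple[str, str]], follows: Iterable[Tuple[str, str]],
                product: str) -> Set[str]:
    """People who follow a buyer of `product`, directly or through a chain of
    follows, excluding the buyers themselves."""
    followers_of: Dict[str, Set[str]] = {}
    for follower, followed in follows:
        followers_of.setdefault(followed, set()).add(follower)
    buyers = {buyer for buyer, prod in sales if prod == product}
    audience: Set[str] = set()
    queue = deque(buyers)
    while queue:
        person = queue.popleft()
        for follower in followers_of.get(person, set()):
            if follower not in buyers and follower not in audience:
                audience.add(follower)
                queue.append(follower)
    return audience
```

With Fred as the only buyer of Pr2, Mary following Fred, and Jane and Bill following Mary, the audience is {Mary, Jane, Bill}, which matches the result above.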

ADTECH: DISPLAY THE AD

4. Next time you spot Mary, Jane or Bill, display a personalized ad for Pr2.

[Diagram: the same people, products, sales and Follows links, with a "Buy 1!" ad mock-up]

IIoT: TELCO NETWORK CONGESTION

1. Load Location, Equipment and Link (+ Load) data into the graph.

[Diagram: locations L1 to L4 in San Jose, Salt Lake City, Chicago and New York, holding equipment E1 to E3, E20 to E22, E30 to E33 and E40, connected by Link 1 to Link 9 with loads between 20% and 95%; one link is Off]

IIoT: APPLY SPARK SQL

2. Use Spark SQL to find links that are over 90% loaded.

[Diagram: the same network; one link is at 95% load]
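The step 2 query is a plain SQL filter. To keep the sketch runnable without a cluster, Python's built-in `sqlite3` stands in for Spark SQL here. In Spark, a similar SELECT could be issued through `spark.sql` after registering the link data as a temporary view, e.g. with `createOrReplaceTempView`. The table name, column names and the status column are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def overloaded_links(rows: Iterable[Tuple[str, float, str]], threshold: float = 90) -> List[str]:
    """rows: (link_name, load_percent, status) tuples, status 'on' or 'off'.
    Returns the names of active links loaded above `threshold` percent."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE link (name TEXT, load_pct REAL, status TEXT)")
        conn.executemany("INSERT INTO link VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT name FROM link WHERE status = 'on' AND load_pct > ? ORDER BY name",
            (threshold,),
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()
```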

IIoT: APPLY A THINGSPAN NAVIGATIONAL QUERY

3. Use a graph query to find the leaf nodes (branch ends)...

... Then investigate...

[Diagram: the same network of locations L1 to L4, equipment and links]
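Step 3 can be sketched as: cut the overloaded link, walk outward from one of its ends, and report the branch ends (nodes with a single connection) found on that side. These are the candidates generating the traffic. The sketch assumes a tree-shaped network of active links, so links that are Off are simply left out of `edges`. The equipment topology used in the test is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def leaves_behind(edges: Iterable[Tuple[str, str]], link: Tuple[str, str], side: str) -> List[str]:
    """Leaf nodes (branch ends) on one side of a link in a tree-shaped network.

    edges: undirected connections that are switched on, including `link`.
    The walk starts at `side` and never crosses `link`, so it stays on that side.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    cut = frozenset(link)
    seen, queue = {side}, deque([side])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if frozenset((node, nxt)) == cut or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    # A branch end has exactly one connection in the full network.
    return sorted(node for node in seen if len(adjacency[node]) == 1)
```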

IIoT: DIAGNOSE

4. Aha! E2 and E3 in San Jose are streaming 8K UHDTV movies from MovieFlix in New York, overloading Link 6.

[Diagram: Link 6 is at 95% load while Link 5 is Off]

IIoT: FIX

5. Solved by switching on Link 5.

[Diagram: Link 5 is now on and carries 45% load; the link previously at 95% now shows 50%]

SUMMARY

• Open Source Big & Fast Data analytics tools are great at what they're designed for.

• ThingSpan adds a Metadata Store and scalable graph analytics:
  • Ultra-fast navigation and pathfinding queries

• It can interoperate with streaming systems and Big Data platforms:
  • ThingSpan is extensible to other open source systems

QUESTIONS?

Info@objectivity.com
408-992-7100
