METADATA AND THE POWER OF PATTERN-FINDING, MAY 24, 2016, FOR DATAVERSITY. LEON GUZENDA, Chief Technology Marketing Officer

Metadata and the Power of Pattern-Finding

Feb 09, 2017


DATAVERSITY
Page 1: Metadata and the Power of Pattern-Finding

METADATA AND THE POWER OF PATTERN-FINDING

MAY 24, 2016, FOR DATAVERSITY

LEON GUZENDA, Chief Technology Marketing Officer

Page 2: Metadata and the Power of Pattern-Finding

AGENDA

• Who We Are

• Open Source Big & Fast Data Analytics

• Our Core Technology & New Product

• Pattern Finding Examples

• Q & A

Page 3: Metadata and the Power of Pattern-Finding

OBJECTIVITY, INC.

Page 4: Metadata and the Power of Pattern-Finding

OBJECTIVITY INC. OVERVIEW

• Private company, headquartered in Silicon Valley since 1988

• Verticals:
  • Government: Intelligence, defense, crime detection & prevention
  • Financial Services
  • Industrial Internet of Things (IIoT)
  • Energy
  • Healthcare

• Horizontals:
  • Graph analytics
  • Complex, distributed, scalable database applications

Page 5: Metadata and the Power of Pattern-Finding

SAMPLE CUSTOMERS AND PARTNERS

[Figure: logo panels grouped as Capital Intensive Customers, Government Customers, Telco & Network Customers, Technology Partners and SI Partners]

Page 6: Metadata and the Power of Pattern-Finding

OPEN SOURCE BIG & FAST DATA ANALYTICS

Page 7: Metadata and the Power of Pattern-Finding

OPEN SOURCE ANALYTICS...

[Figure: open source analytics stack diagram. Surviving labels: [Fall 2016]; R; Proprietary Rules, Ontologies, Queries...; Reports, Archives...; Workflow Design GUI; Proprietary]

Page 8: Metadata and the Power of Pattern-Finding

...OPEN SOURCE ANALYTICS

PROS:
• Large community
• Lots of algorithms
• Model works at scale
• Low startup costs
• Cost effective

CONS:
• Most algorithms are based on statistical correlation, clustering or filtering
• Graph algorithms mainly tackle theoretical problems
• Hadoop mostly targets files, not metadata.
• Metadata tools focus on technical parameters, not semantic content.

Page 9: Metadata and the Power of Pattern-Finding

APACHE SPARK GRAPHX API

• Vertex, Edge and Triplet operations

• Graph modification operations

• RDD join operations

• Adjacent triplet operations

• Iterative graph-parallel operations

• PageRank, connected components, triangle counts, etc.

Page 10: Metadata and the Power of Pattern-Finding

APACHE SPARK GRAPHX API

• Spark GraphFrames add Motifs (a simple subgraph definition) on top of the operations listed on the previous slide

Page 11: Metadata and the Power of Pattern-Finding

APACHE SPARK GRAPHX API

BUT: Efficient pathfinding and complex navigation are inhibited because of a table/triplet approach.
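
One way to read the table/triplet remark is that every extra hop in a pattern becomes another join over the edge table. The sketch below is plain Python, not the GraphX or GraphFrames API; the edge data and the function name `two_hop_motif` are hypothetical and only illustrate the idea.

```python
# Hypothetical edge table: one (src, dst) row per edge, as in a triplet/table layout.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]


def two_hop_motif(edge_rows):
    """Match the motif (x)->(y); (y)->(z) by joining the edge table with itself."""
    # Index the table by source vertex so the join is a dictionary lookup.
    by_src = {}
    for src, dst in edge_rows:
        by_src.setdefault(src, []).append(dst)
    # Each matched row is one (x, y, z) chain; a third hop would need a third join.
    return sorted(
        (x, y, z)
        for x, y in edge_rows
        for z in by_src.get(y, [])
    )


matches = two_hop_motif(edges)
```

A fixed-length motif maps cleanly onto a fixed number of joins. A path of unknown length does not, which is presumably the limitation the slide has in mind.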

Page 12: Metadata and the Power of Pattern-Finding

OUR CORE TECHNOLOGY

Page 13: Metadata and the Power of Pattern-Finding

OUR FOCUS

• Complex Objects at scale:
  • Relationships are first class citizens
  • Ultra-fast navigation and pathfinding
  • Not restricted by available RAM

• Scalability, performance, reliability and flexibility:
  • Distributed database and distributed processing
  • Light, small database kernel - from embedded to cluster to cloud

Page 14: Metadata and the Power of Pattern-Finding

DISTRIBUTED DATA - SINGLE LOGICAL VIEW

Put the data and processing where it’s needed

• 1,000’s of trillions of unique objects

• 1,000’s of petabytes of storage

• Resolving an ID fast and regardless of the number of objects

Page 15: Metadata and the Power of Pattern-Finding

DISTRIBUTED PROCESSING

Put the data and processing where it’s needed

[Figure: architecture diagram. Surviving labels: ThingSpan, Cache, Client Processes]

Page 16: Metadata and the Power of Pattern-Finding

THINGSPAN

Page 17: Metadata and the Power of Pattern-Finding

THINGSPAN ENVIRONMENT

Page 18: Metadata and the Power of Pattern-Finding

THINGSPAN FEATURES

• Uses Apache Spark open source processing engine

• In partnership with Cloudera, Databricks, HortonWorks and MapR

• Powerful object and relationship modeling

• Can store data in HDFS and/or POSIX

• Ultra-fast graph navigation, pathfinding and pattern finding

• REST Server and API for loading data and performing graph analytics

• Spark DataFrame support to leverage MLlib, GraphX, SQL etc.

Page 19: Metadata and the Power of Pattern-Finding

DISTRIBUTED PROCESSING & DATABASE

Distributed from top to bottom

[Figure: architecture diagram. Surviving label: Hadoop Distributed File System]

Page 20: Metadata and the Power of Pattern-Finding

OPEN SOURCE ANALYTICS STACK

[Figure: open source analytics stack diagram. Surviving labels: [Fall 2016]; R; Proprietary Rules, Ontologies, Queries...; Reports, Archives...; Workflow Design GUI; Proprietary]

Page 21: Metadata and the Power of Pattern-Finding

THINGSPAN ENHANCED ANALYTICS STACK

[Later this year]

Page 22: Metadata and the Power of Pattern-Finding

THINGSPAN COMPONENTS

Page 23: Metadata and the Power of Pattern-Finding

PATTERN FINDING

Page 24: Metadata and the Power of Pattern-Finding

PATTERN FINDING TECHNIQUES

• Conventional Business Intelligence Analytics: Uses filters and statistical correlation to find relationships between parameters.

• Graph Pattern Finding Analytics: Uses a combination of outlier, navigational and pathfinding queries.

• Find outliers with SQL or MLlib

• Navigational query can specify Vertex and Edge types to be included/excluded and can invoke methods during the traversal, e.g. to compute transit time to a node.

• Pathfinding query can find shortest or all paths between two or more Vertices.

• Query type order depends upon the problem
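
As an illustration of the navigational query bullet, here is a minimal sketch of a traversal that excludes an edge type and computes a transit time as it goes. It is plain Python, not the ThingSpan API; the graph, the vertex names and the function `navigate` are all hypothetical.

```python
from collections import deque

# Hypothetical typed graph: vertex -> list of (edge_type, neighbor, minutes).
GRAPH = {
    "depot": [("road", "hub", 30), ("rail", "port", 20)],
    "hub": [("road", "store", 15)],
    "port": [("water", "store", 60)],
    "store": [],
}


def navigate(start, excluded_edge_types=frozenset()):
    """Breadth-first traversal that skips excluded edge types and records
    the transit time to each vertex along the route by which it is first reached."""
    transit = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for edge_type, neighbor, minutes in GRAPH[vertex]:
            if edge_type in excluded_edge_types or neighbor in transit:
                continue
            # The "method invoked during traversal": accumulate transit time.
            transit[neighbor] = transit[vertex] + minutes
            queue.append(neighbor)
    return transit


times = navigate("depot", excluded_edge_types={"rail"})
```

With rail excluded, "port" is never reached, so it does not appear in the result. The recorded time is for the first route found, not necessarily the fastest one.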

Page 25: Metadata and the Power of Pattern-Finding

PATH-FINDING QUERY

• Problem: Find the least expensive route between San Francisco and New York for a 60 ton, very wide load that must arrive by Saturday and minimizes mode transitions (road/rail/water etc.)

• Implied: We can avoid Rail connections.

[Figure: graph of CITY vertices joined by LINK edges; each LINK carries Mode, Duration and Cost]
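
The routing problem above can be sketched as a cost-ordered search over CITY/LINK data with a deadline and an excluded mode. This is a plain Python illustration under stated assumptions, not the ThingSpan pathfinding query: the link records, durations, costs and the function `cheapest_route` are hypothetical, and the weight and width limits are left out.

```python
import heapq

# Hypothetical LINK records: (from_city, to_city, mode, duration_hours, cost).
LINKS = [
    ("San Francisco", "Salt Lake City", "road", 14, 900),
    ("San Francisco", "Chicago", "rail", 50, 700),
    ("Salt Lake City", "Chicago", "road", 22, 1400),
    ("Chicago", "New York", "road", 13, 800),
    ("Chicago", "New York", "water", 90, 300),
]


def cheapest_route(start, goal, max_hours, excluded_modes=frozenset()):
    """Uniform-cost search over simple paths. Returns (cost, mode_changes, path)
    for the cheapest route that fits in max_hours, or None if there is none.
    Equal-cost routes are ordered by fewer mode changes."""
    outgoing = {}
    for src, dst, mode, hours, cost in LINKS:
        if mode not in excluded_modes:
            outgoing.setdefault(src, []).append((dst, mode, hours, cost))
    # Heap entries: (cost, mode_changes, hours, last_mode, path)
    heap = [(0, 0, 0, "", (start,))]
    while heap:
        cost, changes, hours, last_mode, path = heapq.heappop(heap)
        city = path[-1]
        if city == goal:
            return cost, changes, list(path)
        for dst, mode, link_hours, link_cost in outgoing.get(city, []):
            total_hours = hours + link_hours
            if dst in path or total_hours > max_hours:
                continue  # no cycles, and respect the deadline
            changed = 1 if last_mode and mode != last_mode else 0
            heapq.heappush(heap, (cost + link_cost, changes + changed,
                                  total_hours, mode, path + (dst,)))
    return None


route = cheapest_route("San Francisco", "New York", max_hours=60,
                       excluded_modes={"rail"})
```

The search keeps whole paths on the heap rather than a visited set, because a deadline means the cheapest way to reach an intermediate city is not always part of the cheapest feasible route. That is fine for a small example but would need pruning on a large graph.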

Page 26: Metadata and the Power of Pattern-Finding

PATTERN FINDING EXAMPLES

• Financial: Money Laundering Detection

• Intelligence Analysis: Threat Detection

• AdTech: Recommendation Engine Support

• Industrial Internet of Things (IIoT): Network Congestion Analysis

Page 27: Metadata and the Power of Pattern-Finding

FINANCIAL: MONEY LAUNDERING DETECTION

1. Load Person, Account and Transaction data into ThingSpan

[Figure: graph of persons P1, P2 and P3, accounts Acc 1, Acc 2, Acc 20 to Acc 24, Acc 31 to Acc 33 and Acc 35, and the transactions ($) between them]

Page 28: Metadata and the Power of Pattern-Finding

FINANCIAL: APPLY SPARK GRAPHX

2. Identify people with more than 5 accounts (centrality)

[Figure: the same graph of persons P1 to P3, their accounts and transactions]
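
Step 2 amounts to a degree count on the Person-to-Account edges. A minimal sketch in plain Python follows; in GraphX the same idea would start from the graph's vertex degrees. The ownership edges below are hypothetical (the slide's figure does not survive well enough to recover who owns what), as is the function name.

```python
from collections import Counter

# Hypothetical (person, account) ownership edges; not recovered from the slide.
OWNS = [
    ("P1", "Acc 1"), ("P1", "Acc 2"),
    ("P2", "Acc 20"), ("P2", "Acc 21"), ("P2", "Acc 22"),
    ("P2", "Acc 23"), ("P2", "Acc 24"), ("P2", "Acc 35"),
    ("P3", "Acc 31"), ("P3", "Acc 32"), ("P3", "Acc 33"),
]


def people_with_many_accounts(ownership, threshold=5):
    """Return the people whose number of owned accounts exceeds the threshold."""
    counts = Counter(person for person, _account in ownership)
    return sorted(person for person, n in counts.items() if n > threshold)


suspects = people_with_many_accounts(OWNS)
```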

Page 29: Metadata and the Power of Pattern-Finding

FINANCIAL: APPLY A NAVIGATIONAL QUERY

3. Look at all of that person's transactions to see if they terminate in just 1 or 2 offshore accounts

4. INVESTIGATE

[Figure: the same graph of persons P1 to P3, their accounts and transactions, with the flagged person's transactions followed to their end points]
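
Step 3 can be sketched as a traversal that follows transfers from a person's accounts until it reaches accounts with no outgoing transfers. This is plain Python, not a ThingSpan query; the transfer map, the offshore set and both function names are hypothetical.

```python
# Hypothetical data: account -> accounts it sends money to.
TRANSFERS = {
    "Acc 20": ["Acc 31"], "Acc 21": ["Acc 31"], "Acc 22": ["Acc 32"],
    "Acc 23": ["Acc 32"], "Acc 24": ["Acc 31"], "Acc 35": ["Acc 32"],
    "Acc 31": [], "Acc 32": [],
}
# Hypothetical set of accounts known to be offshore.
OFFSHORE = {"Acc 31", "Acc 32"}


def terminal_accounts(start_accounts):
    """Follow transfers depth-first and collect accounts with no outgoing transfers."""
    seen, terminals = set(), set()
    stack = list(start_accounts)
    while stack:
        account = stack.pop()
        if account in seen:
            continue
        seen.add(account)
        targets = TRANSFERS.get(account, [])
        if not targets:
            terminals.add(account)
        stack.extend(targets)
    return terminals


def looks_suspicious(start_accounts, max_terminals=2):
    """True when all money ends up in at most max_terminals offshore accounts."""
    ends = terminal_accounts(start_accounts)
    return 0 < len(ends) <= max_terminals and ends <= OFFSHORE


p2_accounts = ["Acc 20", "Acc 21", "Acc 22", "Acc 23", "Acc 24", "Acc 35"]
flagged = looks_suspicious(p2_accounts)
```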

Page 30: Metadata and the Power of Pattern-Finding

HUMINT: THREAT DETECTION

1. Load People, Calls, Places and Sightings into the Graph

[Figure: graph of people P1 to P3 and P5 to P18, call detail records CDR1 to CDR17, places PlaceX, PlaceY and PlaceZ, and sightings Seen1 to Seen4]

Page 31: Metadata and the Power of Pattern-Finding

HUMINT: APPLY SPARK GRAPHX

2. Use Spark GraphX to find "islands" of callers/callees.

[Figure: the call graph of people and CDRs, separated into islands]
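
The "islands" in step 2 correspond to connected components of the call graph. The sketch below computes them with a small union-find in plain Python rather than calling GraphX; the call records are hypothetical, chosen so that one island matches the group named later in the deck (P7, P8, P11, P14, P15).

```python
# Hypothetical call records: (caller, callee).
CALLS = [("P1", "P2"), ("P2", "P6"), ("P7", "P8"), ("P8", "P11"),
         ("P11", "P14"), ("P14", "P15")]


def islands(calls):
    """Group people into connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for caller, callee in calls:
        parent[find(caller)] = find(callee)

    groups = {}
    for person in parent:
        groups.setdefault(find(person), set()).add(person)
    return sorted(sorted(group) for group in groups.values())


components = islands(CALLS)
```

Names sort as strings here, so "P11" comes before "P7" inside a component.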

Page 32: Metadata and the Power of Pattern-Finding

HUMINT: APPLY A NAVIGATIONAL QUERY

3. Use a navigational query to see if any of those People have been seen near Places that need to be protected.

[Figure: the islands of people and CDRs, now linked through sightings Seen1 to Seen4 to PlaceX, PlaceY and PlaceZ]
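
Step 3 walks from each person in an island through their sightings to a place, and keeps the people who end at a protected place. A minimal plain Python sketch, not a ThingSpan query: the deck states that P14 and P15 were seen near PlaceX, but the mapping of Seen1 to Seen4 onto people and places below is hypothetical, as are the function names.

```python
# Hypothetical sightings: sighting id -> (person, place).
SIGHTINGS = {
    "Seen1": ("P14", "PlaceX"),
    "Seen2": ("P15", "PlaceX"),
    "Seen3": ("P5", "PlaceY"),
    "Seen4": ("P17", "PlaceZ"),
}
# Assumption: PlaceX is the only place that needs protection.
PROTECTED_PLACES = {"PlaceX"}


def seen_near_protected(island):
    """People in the island with at least one sighting at a protected place."""
    return sorted({person for person, place in SIGHTINGS.values()
                   if person in island and place in PROTECTED_PLACES})


def flag_island(island):
    """Flag the whole island if any member was seen near a protected place."""
    return sorted(island) if seen_near_protected(island) else []


island = {"P7", "P8", "P11", "P14", "P15"}
sighted = seen_near_protected(island)
```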

Page 33: Metadata and the Power of Pattern-Finding

HUMINT: PLAN ACTION

4. P14 and P15 have been seen near potential target PlaceX, so they plus P11, P7 and P8 should be put under surveillance.

[Figure: the same graph of people, CDRs, sightings and places, with the island containing P7, P8, P11, P14 and P15 linked to PlaceX]

Page 34: Metadata and the Power of Pattern-Finding

ADTECH: PRE-PLANNED ADS

1. Load Products, Orders, People and Social_Links into ThingSpan.

[Figure: graph of people Joe, Fred, Mary, Jane and Bill, products Pr1 to Pr6, sales Sale1 to Sale5, and three Follows links]

Page 35: Metadata and the Power of Pattern-Finding

ADTECH: PRE-PLANNED ADS

2. We want to place ads for Product Pr2

[Figure: the same graph of people, products, sales and Follows links, with Pr2 picked out]

Page 36: Metadata and the Power of Pattern-Finding

ADTECH: WHO FOLLOWS BUYERS OF THE PRODUCT?

3. Use ThingSpan to find bloggers who bought Pr2 and who also have followers.

Result: Fred bought Pr2. Mary follows Fred's blogs. Jane & Bill follow Mary's.

[Figure: the same graph of people, products Pr1 to Pr6, sales Sale1 to Sale5 and three Follows links]
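
The result above can be reproduced with a short traversal: start from the buyers of the product, then walk the Follows links backwards to collect direct and indirect followers. This is a plain Python sketch, not the ThingSpan query; the purchase and Follows facts come from the slide's result line, while the data layout and the function `ad_audience` are assumptions.

```python
from collections import deque

# From the slide's result line: Fred bought Pr2.
BOUGHT = {"Fred": {"Pr2"}}
# (follower, followed) pairs, also from the result line.
FOLLOWS = [("Mary", "Fred"), ("Jane", "Mary"), ("Bill", "Mary")]


def ad_audience(product):
    """People who follow a buyer of the product, directly or through other followers."""
    followers_of = {}
    for follower, followed in FOLLOWS:
        followers_of.setdefault(followed, []).append(follower)
    buyers = {person for person, items in BOUGHT.items() if product in items}
    audience, queue = set(), deque(buyers)
    while queue:
        person = queue.popleft()
        for follower in followers_of.get(person, []):
            if follower not in audience and follower not in buyers:
                audience.add(follower)
                queue.append(follower)
    return sorted(audience)


audience = ad_audience("Pr2")
```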

Page 37: Metadata and the Power of Pattern-Finding

ADTECH: DISPLAY THE AD

4. Next time you spot Mary, Jane or Bill, display a personalized Ad for Pr2.

Result: Fred bought Pr2. Mary follows Fred's blogs. Jane & Bill follow Mary's.

[Figure: the same graph of people, products, sales and Follows links, with a "Buy 1!" ad callout]

Page 38: Metadata and the Power of Pattern-Finding

IIOT: TELCO NETWORK CONGESTION

1. Load Location, Equipment, Link (+Load) into the graph

[Figure: network of locations L1 to L4, labeled San Jose, Salt Lake City, Chicago and New York, with equipment E1 to E3, E20 to E22, E30 to E33 and E40, and Link 1 to Link 9 annotated with loads from 20% to 95%; one link is marked Off]

Page 39: Metadata and the Power of Pattern-Finding

IIOT: APPLY SPARK SQL

2. Use Spark SQL to find links that are over 90% loaded.

[Figure: the same network of locations, equipment and links; one link shows a 95% load and one is marked Off]
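
The query in step 2 is an ordinary filter on link load. The sketch below runs it against an in-memory SQLite table so that it executes without a Spark cluster; a similar SELECT could be issued through Spark SQL over a registered table. The table name, the column names and most of the load values are hypothetical; only Link 6 being overloaded and Link 5 being off come from the deck.

```python
import sqlite3

# In-memory stand-in for a table of links; schema and names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (name TEXT, load_pct REAL, enabled INTEGER)")
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    ("Link 4", 65.0, 1),   # hypothetical load
    ("Link 5", 0.0, 0),    # switched off, per the slide
    ("Link 6", 95.0, 1),   # the overloaded link, per the slide
    ("Link 7", 50.0, 1),   # hypothetical load
])

# Find enabled links that are more than 90% loaded.
overloaded = [row[0] for row in conn.execute(
    "SELECT name FROM link WHERE enabled = 1 AND load_pct > 90 ORDER BY name")]
conn.close()
```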

Page 40: Metadata and the Power of Pattern-Finding

IIOT: APPLY A THINGSPAN NAVIGATIONAL QUERY

3. Use a graph query to find the leaf nodes (branch ends)...

... Then Investigate...

[Figure: the same network of locations L1 to L4, equipment and Link 1 to Link 9, with the branch ends behind the overloaded link picked out]
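
Finding the branch ends in step 3 means walking outward from the overloaded link until nodes with no further children are reached. A minimal plain Python sketch, not the ThingSpan query; the parent-to-children map below is a hypothetical slice of the network, and `leaf_nodes` is an illustrative name.

```python
# Hypothetical slice of the network hanging off the overloaded link: parent -> children.
CHILDREN = {
    "Link 6": ["L1"],
    "L1": ["E1", "E2", "E3"],
    "E1": [], "E2": [], "E3": [],
}


def leaf_nodes(root):
    """Depth-first walk from root that returns the nodes with no children."""
    leaves, stack = [], [root]
    while stack:
        node = stack.pop()
        kids = CHILDREN.get(node, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.append(node)
    return sorted(leaves)


branch_ends = leaf_nodes("Link 6")
```

The leaves are the pieces of equipment to investigate for unusual traffic.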

Page 41: Metadata and the Power of Pattern-Finding

IIOT: DIAGNOSE

4. Aha! E2 and E3 in San Jose are streaming 8K UHDTV video movies from MovieFlix in New York, overloading Link 6.

[Figure: the same network; one link shows a 95% load and Link 5 is marked Off]

Page 42: Metadata and the Power of Pattern-Finding

IIOT: FIX

5. Solved - by switching on Link 5.

[Figure: the same network after the fix; the 95% load has dropped to 50% and Link 5 now carries 45%]

Page 43: Metadata and the Power of Pattern-Finding

SUMMARY

• Open Source Big & Fast Data analytics tools are great at what they're designed for.

• ThingSpan adds a Metadata Store and scalable graph analytics
  • Ultra-fast navigation and pathfinding queries.

• It can interoperate with streaming systems and Big Data platforms
  • ThingSpan is extensible to other open source systems

Page 44: Metadata and the Power of Pattern-Finding

QUESTIONS?
