Graph Data Analytics
Resolving Complexity at an Enterprise Scale
Arka Mukherjee, Ph.D., Global IDs
www.globalids.com
Feb 23, 2016
© 2013 Global IDs
Proprietary
Topics
1. The “Complex Data” Context
2. Current Challenges
3. Governance Methodology
The “Complex Data” Context
The Big Shift
The cost structure is unsustainable
The cost of managing information is going up exponentially.
The growth in complexity is unmanageable
1. Complex data ecosystems
2. Highly dynamic
3. Limited traceability
4. Systemic risk: hard to measure
Financial Services Institutions
Question
How can enterprises handle the cost and complexity of managing complex data landscapes?
Global IDs Focus
To organize enterprise data landscapes
Global IDs: Product Suite
© Global IDs Inc. (2001-2013)
Global IDs Software Products

Suites: Metadata Governance Suite, Master Data Governance Suite, Enterprise Data Governance Suite, Big Data Governance Suite
Objective Function: Create Transparency, Improve Quality, Accelerate Integration, Embed Analytics
Legend: Under Development Using Hadoop Stack

Function: Deliverable
1. Discover: Comprehensive Data Asset Inventory
2. Ingest: Inventory of External Data Assets
3. Profile: Semantic Metadata Repository
4. Classify: Business Taxonomies
5. Map: Business Ontologies
6. Search: Enterprise Search
7. Model: Master Data Models
8. Monitor: Change Monitors, Impact Analysis
9. Rules: Rules Repository
10. Validation: Data Quality Metrics
11. Stewardship: RACI Matrix of Data Stewards
12. Dashboards: Master Data Governance Portals
13. Move: Data Repositories in Relational Databases or Hadoop
14. Standardize: Enriched Master Data
15. Integrate: Integrated Master Data
16. Distribute: Data Services for Master Data
17. Analyze: Reporting and Ad-Hoc Analysis
18. Measure: KPIs and Trend Metrics
19. Link: Graph Databases with Linked Data
20. Visualize: Dashboards and Infographics
Challenges
The typical financial institution has:
# Databases > 1,000
# Tables > 200,000
# Columns > 2,000,000
Question
How can we understand the relationships across 2,000,000 attributes?
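To give a sense of the scale behind this question: a naive approach that compares every column with every other column has to examine n(n-1)/2 pairs. The short sketch below works out that count for the 2,000,000 columns quoted above. The function name `pairwise_comparisons` is illustrative only and is not taken from the presentation.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Column count quoted for a typical financial institution.
columns = 2_000_000
pairs = pairwise_comparisons(columns)
print(f"{columns:,} columns -> {pairs:,} column pairs")
```

At roughly two trillion pairs, exhaustive comparison is impractical, which is one way to read the case for automation made later in the deck.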
Converging Data Variety
Structured
Unstructured
Multi-Structured
Data Content
Converging Data Ecosystems
Social Data
Enterprise Data
Machine Data
Data Ecosystems
Current Approaches Do Not Scale
# Databases
Small: > 1,000
Average: > 10,000
Large: > 100,000
A New Approach is Required
5 Utilize Graph Structures for Governance
Graph Analytics: Use Cases
Key Challenges
• Vast diversity and volume of metadata and data
• Storage and indexing of metadata to facilitate search and navigation
• Understanding the connection between different pieces of metadata (Crosswalk)
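The indexing challenge in the list above can be illustrated with a small inverted index over column names. This is only a sketch under stated assumptions: the slides do not describe the actual indexing scheme, and the column names, `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(name: str) -> list[str]:
    """Split an identifier such as 'CUST_ID' or 'customerName' into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)  # break camelCase
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def build_index(columns: list[str]) -> dict[str, set[str]]:
    """Map each token to the set of fully qualified columns that contain it."""
    index = defaultdict(set)
    for col in columns:
        for token in tokenize(col):
            index[token].add(col)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the columns that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical column names in database.table.column form.
columns = [
    "crm.customer.CUST_ID",
    "billing.invoice.customer_id",
    "hr.employee.EMP_ID",
]
index = build_index(columns)
print(sorted(search(index, "customer id")))
```

Matching on name tokens alone misses columns that hold the same data under unrelated names, which is where the crosswalk between pieces of metadata comes in.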
Utilize Graph Structures for Storing Complex Data
Use Case 1: Enterprise Metadata Search with Hadoop
Use Case 2: Unstructured Data Integration
Use Case 3: Cross Database Similarity Mapping
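One common way to map similar columns across databases is to compare the sets of values they hold. The sketch below uses Jaccard similarity for that purpose. This is an assumption: the slides do not say which similarity measure is used, and the column names, sample values, and threshold are made up for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_columns(columns: dict[str, set], threshold: float) -> list[tuple[str, str, float]]:
    """Return cross-database column pairs whose value overlap meets the threshold."""
    names = sorted(columns)
    matches = []
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            # Only compare columns that live in different databases.
            if left.split(".")[0] == right.split(".")[0]:
                continue
            score = jaccard(columns[left], columns[right])
            if score >= threshold:
                matches.append((left, right, score))
    return matches


# Hypothetical samples of distinct values per column (database.table.column).
samples = {
    "crm.customer.country": {"US", "DE", "FR", "IN"},
    "billing.invoice.country_code": {"US", "DE", "FR", "JP"},
    "hr.employee.dept": {"HR", "IT"},
}

for left, right, score in similar_columns(samples, threshold=0.5):
    print(f"{left} ~ {right} (Jaccard {score:.2f})")
```

The nested loop is quadratic in the number of columns, so at enterprise scale a real system would need sampling, blocking, or a distributed implementation rather than this all-pairs form.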
Use Case 4: Graph Analytics
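A simple example of graph analytics on a data landscape is impact analysis: given a column, find everything downstream of it. The sketch below does this with a breadth-first traversal over a hypothetical lineage graph. The edge data and the function name `downstream` are assumptions for illustration and do not come from the demo.

```python
from collections import deque


def downstream(lineage: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first traversal returning every node reachable from start."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in lineage.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical data-flow edges: source column -> columns derived from it.
lineage = {
    "crm.customer.cust_id": ["dw.dim_customer.customer_key"],
    "dw.dim_customer.customer_key": [
        "mart.sales.customer_key",
        "report.churn.customer_key",
    ],
    "mart.sales.customer_key": ["report.churn.customer_key"],
}

for column in downstream(lineage, "crm.customer.cust_id"):
    print(column)
```

The same traversal run against reversed edges would answer the traceability question in the other direction: where a given report column ultimately comes from.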
Demo
Methodology
What we do
1. Scan
2. Analyze
3. Map / Organize
4. Govern
Automation
1: Scan
2: Semantic Analysis
3: Automate Semantic Mapping
4: Link the Data Landscape
Thank You!