DataSToRM: Data Science and Technology Research Environment
Mr. Vitaliy Gleyzer
MIT Lincoln Laboratory
5 March 2018
The Future of Advanced (Secure) Computing
This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering.
Distribution Statement A: Approved for public release: distribution unlimited.
Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.
DataSToRM - 2VG 03/05/18
Advancing the State of Big Data Analytics: Raw Data to Insight
Diagram: a Big Data Application at the center, linking data, analytics, technologies, and presentation. Questions raised:
• What new insight can be gained from the data?
• What new information should be collected?
• Can analytics be modified to allow efficient implementation?
• Can different data help improve human understanding?
• How can existing analytics be accelerated?
• How can the insight be presented to improve human cognition?
• Are there new application areas enabled by new analytics?
• How can data be stored/collected more efficiently?
Large-Scale Graph Applications Today
• Applications: Cyber Data Analysis, Recommender Systems, Functional Brain Mapping, Personalized Healthcare, Fraud Detection, Drug Discovery
• Algorithms: PageRank, Centrality, Walktrap, InfoMap, K-Truss, BFS, MST, …
• Graph Analysis Frameworks and Databases: D4M, …
• Visualization Toolkits: …
• Development Environment
• Hardware Platform: Lincoln Laboratory Supercomputing Center, …
Diverse, quickly evolving ecosystem
Advancing the State of Big Data Analytics: Challenges
• Technology moves quickly
– New algorithms and analytic techniques
– New storage solutions
– New processing technologies
– New database technologies and frameworks
– New applications
• New framework adoption is a serious investment
• How to leverage new technologies?
• How to enable co-design opportunities?
• How to integrate disparate communities to enable co-design?
Keeping up with big data technology is challenging
Example Standardization Efforts
Annotated version of the same ecosystem diagram (applications, algorithms, graph analysis frameworks and databases, visualization toolkits, development environment, hardware platform), highlighting two standardization efforts: GraphBLAS and Apache TinkerPop Gremlin
Different communities are attempting to unify and standardize interfaces and languages across the ecosystem
Backend interface standardization enables hardware innovation!
• Parallel in-memory database
• 100–1000× faster graph edge traversal
• Simple linear algebra API
Unifying Principles for Big Data Graphs
• Graphs capture relationship information between entities
– Molecular forces
– Social interactions
– Semantic concepts
– Vehicle tracks
• Graphs can be fully expressed in the language of linear algebra
– Represented as sparse matrices
– Enable a mathematical foundation for data analysis
– Leverage existing linear algebra techniques and methods
– Define a small set of well-defined mathematical operations
Graph Representations
• Adjacency matrix (N × N): rows and columns are vertices
• Incidence matrix (N × M): rows are vertices, columns are edges
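To make the two representations concrete, here is a minimal sketch on a hypothetical four-edge directed graph. The slide does not say how edge direction is encoded in the N × M incidence matrix, so this sketch assumes it is split into an out-incidence matrix Eout and an in-incidence matrix Ein (names are ours). Under that assumption the adjacency matrix is recovered as A = Eout · Ein^T. Dense Python lists are used only because the graph is tiny; real workloads would use sparse storage.

```python
# Hypothetical toy directed graph: edge k goes from src[k] to dst[k].
src = [0, 0, 1, 2]
dst = [1, 2, 2, 0]
N = 3          # number of vertices
M = len(src)   # number of edges


def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Dense matrix product; adequate for a tiny illustrative graph."""
    inner = len(Y)
    return [[sum(X[i][k] * Y[k][j] for k in range(inner))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


# Adjacency matrix (N x N): A[i][j] counts edges from vertex i to vertex j.
A = zeros(N, N)
for s, d in zip(src, dst):
    A[s][d] += 1

# Incidence matrices (N x M), split by direction (an assumption, see above):
# Eout[i][k] = 1 if edge k leaves vertex i; Ein[j][k] = 1 if edge k enters vertex j.
Eout = zeros(N, M)
Ein = zeros(N, M)
for k, (s, d) in enumerate(zip(src, dst)):
    Eout[s][k] = 1
    Ein[d][k] = 1

# Both views describe the same graph: A = Eout * Ein^T.
A_from_incidence = matmul(Eout, transpose(Ein))
print(A)                 # [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
print(A_from_incidence)  # [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
```

Entry (i, j) of Eout · Ein^T sums, over all edges k, whether k leaves i and enters j, which is exactly the number of i → j edges.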
DataSToRM: Data Science and Technology Research Environment
Layered architecture, top to bottom:
• Applications (composed of analytics): Threat Detection, Sentiment Analysis, Recommender Engine, …
• Graph Analysis Kernels (implemented on top of a standard API): Community Detection, Classification, Centrality Analysis, …
• API: GraphBLAS (Semi-ring Linear Algebra API), which enables hardware diversity
• Hardware
Hardware acceleration of a small number of well-defined mathematical operations enables an extensive analytic ecosystem
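The layering can be illustrated with a small Python sketch. It assumes a hypothetical two-operation API (vxm and reduce, named after two of the GraphBLAS standard functions but not matching any real GraphBLAS signature). Analytics such as in-degree and PageRank centrality are written only against that interface, so the pure-Python reference backend could in principle be swapped for an accelerated one without touching the analytics. The PageRank variant (damping 0.85, dangling-vertex rank spread uniformly) is a common textbook formulation chosen for illustration, not the DataSToRM implementation.

```python
from typing import Dict, Protocol

# Illustrative sparse containers: vector = index -> value,
# matrix = row -> {column: value}.
Vector = Dict[int, float]
Matrix = Dict[int, Dict[int, float]]


class LinearAlgebraBackend(Protocol):
    """Hypothetical 'standard API': the only operations analytics may call."""

    def vxm(self, u: Vector, A: Matrix) -> Vector: ...
    def reduce(self, u: Vector) -> float: ...


class PythonBackend:
    """Reference backend in pure Python; a hardware backend would
    implement the same two methods."""

    def vxm(self, u: Vector, A: Matrix) -> Vector:
        # w[j] = sum_i u[i] * A[i][j], touching only stored entries
        w: Vector = {}
        for i, ui in u.items():
            for j, aij in A.get(i, {}).items():
                w[j] = w.get(j, 0.0) + ui * aij
        return w

    def reduce(self, u: Vector) -> float:
        return sum(u.values())


# Analytics layer: depends only on the API, never on a specific backend.
def in_degree(api: LinearAlgebraBackend, A: Matrix, n: int) -> Vector:
    ones = {i: 1.0 for i in range(n)}
    return api.vxm(ones, A)


def pagerank(api: LinearAlgebraBackend, A: Matrix, n: int,
             damping: float = 0.85, iters: int = 50) -> Vector:
    out_deg = {i: api.reduce(A.get(i, {})) for i in range(n)}
    rank = {i: 1.0 / n for i in range(n)}
    for _ in range(iters):
        # Split each vertex's rank across its (weighted) out-edges.
        scaled = {i: rank[i] / out_deg[i] for i in range(n) if out_deg[i] > 0}
        # Rank held by vertices with no out-edges is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_deg[i] == 0)
        spread = api.vxm(scaled, A)
        rank = {j: (1 - damping) / n + damping * (spread.get(j, 0.0) + dangling / n)
                for j in range(n)}
    return rank


if __name__ == "__main__":
    A = {0: {1: 1.0, 2: 1.0}, 1: {2: 1.0}, 2: {0: 1.0}}
    backend = PythonBackend()
    print(in_degree(backend, A, 3))  # {1: 1.0, 2: 2.0, 0: 1.0}
    print(pagerank(backend, A, 3))
```

The design choice is that Protocol describes the contract structurally: any object with matching vxm and reduce methods can serve as a backend, without inheriting from a common base class.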
GraphBLAS Overview
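• Five key operations
– Construct a matrix from triples: A = NxM(i,j,v)
– Extract triples from a matrix: (i,j,v) = A
– Element-wise add: C = A ⊕ B
– Element-wise multiply: C = A ⊗ B
– Matrix multiply: C = A ⊕.⊗ B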
• Can be used to build 12 GraphBLAS standard functions
– buildMatrix, extractTuples, Transpose, mXm, mXv, vXm, extract, assign, eWiseAdd, eWiseMult, apply, reduce
• Can be used to build a variety of graph utility functions
– Tril(), Triu(), Degree-Filtered BFS, …
• Can be used to build a variety of graph algorithms
– K-Truss, Jaccard Coefficient, Non-Negative Matrix Factorization, …
• These work on a wide range of graphs
– Hyper, multi-directed, multi-weighted, multi-partite, multi-edge
Unifying interface for backend graph processing
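One multiply can drive different algorithms simply by changing the semiring (⊕, ⊗). The sketch below is a minimal pure-Python illustration, assuming dict-based sparse vectors and matrices and helper names of our own (Semiring, vxm, bfs_levels, shortest_paths); it is not the GraphBLAS C API. With the (OR, AND) semiring and a "not yet visited" mask, repeated vector-matrix multiplies perform breadth-first search. With the (min, +) semiring, the same vxm performs Bellman-Ford-style shortest-path relaxation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Vector = Dict[int, Any]               # sparse vector: index -> value
Matrix = Dict[int, Dict[int, float]]  # sparse matrix: row -> {col: value}


@dataclass(frozen=True)
class Semiring:
    add: Callable[[Any, Any], Any]  # plays the role of ⊕
    mul: Callable[[Any, Any], Any]  # plays the role of ⊗


def vxm(u: Vector, A: Matrix, sr: Semiring) -> Vector:
    """w = u ⊕.⊗ A, computed over stored entries only."""
    w: Vector = {}
    for i, ui in u.items():
        for j, aij in A.get(i, {}).items():
            p = sr.mul(ui, aij)
            w[j] = sr.add(w[j], p) if j in w else p
    return w


BOOL_OR_AND = Semiring(add=lambda x, y: x or y, mul=lambda x, y: bool(x) and bool(y))
MIN_PLUS = Semiring(add=min, mul=lambda x, y: x + y)


def bfs_levels(A: Matrix, source: int) -> Dict[int, int]:
    level = {source: 0}
    frontier: Vector = {source: True}
    depth = 0
    while frontier:
        depth += 1
        reached = vxm(frontier, A, BOOL_OR_AND)
        # Complement mask: keep only vertices not visited yet.
        frontier = {j: True for j, hit in reached.items() if hit and j not in level}
        for j in frontier:
            level[j] = depth
    return level


def shortest_paths(A: Matrix, source: int, n: int) -> Dict[int, float]:
    """Single-source distances; assumes no negative-weight cycles."""
    dist: Vector = {source: 0.0}
    for _ in range(n - 1):
        relaxed = vxm(dist, A, MIN_PLUS)
        new = dict(dist)
        for j, d in relaxed.items():
            if d < new.get(j, float("inf")):
                new[j] = d
        if new == dist:  # no distance improved: converged early
            break
        dist = new
    return dist


A = {0: {1: 4.0, 2: 1.0}, 1: {3: 1.0}, 2: {1: 2.0, 3: 5.0}}
print(bfs_levels(A, 0))         # {0: 0, 1: 1, 2: 1, 3: 2}
print(shortest_paths(A, 0, 4))  # {0: 0.0, 1: 3.0, 2: 1.0, 3: 4.0}
```

On this example graph, the cheapest route to vertex 3 is 0 → 2 → 1 → 3 (cost 4) rather than 0 → 1 → 3 (cost 5), even though BFS reaches vertex 3 in two hops.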
Lincoln Laboratory Technologies Targeting Large-Scale Graph Analytics
Diagram: graph algorithms running on top of four complementary technologies.
• GraphBLAS: standard API for graph analytics using sparse linear algebra primitives
• Dynamic Distributed Dimensional Data Model (D4M): data analysis framework based on associative array algebra
– Together, GraphBLAS and D4M provide a simple, hardware-agnostic API; a concise language for complex graph analytics; mathematical closure; and a linear algebra underpinning
• Graph Processor: novel graph processing architecture
– Scalable graph processing hardware architecture
– Unprecedented performance
– Native linear algebra instruction set
• Lincoln Laboratory Supercomputing Center (LLSC): state-of-the-art supercomputing environment
– Heterogeneous processing capabilities
– Ideal technology integration environment
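To illustrate what associative array algebra means in practice, the sketch below treats an associative array as a Python dict keyed by (row key, column key) string pairs, with addition and multiplication defined as for sparse matrices. The function names and the "word|graph" key convention are hypothetical and do not reproduce D4M's own API. Multiplying a document-by-word array by its transpose yields word co-occurrence counts, one example of the correlations this algebra expresses directly.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical associative array: (row key, column key) -> value.
Assoc = Dict[Tuple[str, str], float]


def assoc_from_triples(rows, cols, vals) -> Assoc:
    A = defaultdict(float)
    for r, c, v in zip(rows, cols, vals):
        A[(r, c)] += v  # duplicate (row, col) keys are summed
    return dict(A)


def assoc_add(A: Assoc, B: Assoc) -> Assoc:
    # Union of keys; values at shared keys are added.
    C = dict(A)
    for key, v in B.items():
        C[key] = C.get(key, 0.0) + v
    return C


def assoc_transpose(A: Assoc) -> Assoc:
    return {(c, r): v for (r, c), v in A.items()}


def assoc_matmul(A: Assoc, B: Assoc) -> Assoc:
    # Join A's column keys with B's row keys, summing products.
    b_rows = defaultdict(list)
    for (r, c), v in B.items():
        b_rows[r].append((c, v))
    C = defaultdict(float)
    for (r, k), a in A.items():
        for c, b in b_rows.get(k, []):
            C[(r, c)] += a * b
    return dict(C)


# Document-by-word incidence array (hypothetical data).
E = assoc_from_triples(
    ["doc1", "doc1", "doc2", "doc2"],
    ["word|graph", "word|matrix", "word|graph", "word|sparse"],
    [1, 1, 1, 1],
)
cooc = assoc_matmul(assoc_transpose(E), E)  # word-by-word co-occurrence
print(cooc[("word|graph", "word|graph")])   # 2.0 (appears in two documents)
print(cooc[("word|graph", "word|matrix")])  # 1.0
```

Because rows and columns are indexed by strings rather than integers, the same operations apply directly to raw keys such as document IDs and words, with no separate index-mapping step.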
Graph Processor Matrix Multiply Performance
[Figure, two panels ("Graph Processor Matrix Multiply Performance" and "Graph Processor Performance"): traversed edges per second (10^7 to 10^14) versus watts (10^1 to 10^9), log scales. Series: ASIC Graph Processor (projected), FPGA Graph Processor (measured), Cray XK7 Titan (measured), Cray XT4 Franklin (measured). Annotations: embedded applications, data center applications, today's state of the art, system sizes (2 nodes; 2017 1-chassis system; 1, 4, 16, and 64 racks), FPGA and ASIC markers, and a 100× marker.]
Highly efficient graph processing technology that is hundreds to thousands of times more efficient than traditional architectures
Collaboration Opportunities
• Developing graph algorithms in the language of linear algebra
– Community detection, subgraph isomorphism, subgraph matching, etc.
• Developing graph algorithms that can scale to datasets with billions to trillions of vertices
– Sparsity-aware, distributed-memory algorithms
• Identifying or developing new technologies to leverage the linear algebra abstraction
– Compilers, optimizers, hardware accelerators, etc.
• Integrating a GraphBLAS backend into popular frameworks
– e.g., Apache TinkerPop, Neo4j, Elasticsearch
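As a small example of developing graph algorithms in the language of linear algebra, the sketch below expresses triangle counting and K-Truss through the masked product A ⊙ (A·A). For each edge (i, j), the entry (A·A)[i][j] counts the common neighbors of i and j, i.e., the triangles containing that edge. This is a minimal single-process Python sketch, assuming a symmetric adjacency stored as a dict of neighbor sets (helper names are ours), where set intersection stands in for the masked sparse multiply; the sparsity-aware, distributed-memory versions called for above are not shown.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # symmetric adjacency: vertex -> neighbor set


def from_edges(edges: Iterable[Tuple[int, int]]) -> Graph:
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_count(adj: Graph) -> int:
    total = 0
    for i, nbrs in adj.items():
        for j in nbrs:
            # |N(i) ∩ N(j)| equals (A·A)[i][j], evaluated only where A[i][j] != 0.
            total += len(nbrs & adj[j])
    # Each triangle is counted once per ordered edge: 3 edges x 2 directions.
    return total // 6


def k_truss(adj: Graph, k: int) -> Graph:
    """Repeatedly drop edges contained in fewer than k - 2 triangles."""
    adj = {v: set(n) for v, n in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for i in list(adj):
            for j in list(adj[i]):
                # Edge support = (A·A)[i][j]; check each undirected edge once.
                if i < j and len(adj[i] & adj[j]) < k - 2:
                    adj[i].discard(j)
                    adj[j].discard(i)
                    changed = True
    return {v: n for v, n in adj.items() if n}  # drop isolated vertices


# Complete graph on {0, 1, 2, 3} plus a pendant edge (3, 4).
G = from_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)])
print(triangle_count(G))     # 4
print(sorted(k_truss(G, 4)))  # [0, 1, 2, 3]
```

The pendant edge (3, 4) lies in no triangle, so it is peeled away in the 4-truss, while every edge of the complete subgraph on {0, 1, 2, 3} lies in exactly two triangles and survives.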