StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices

Hugo Gualdron, Robson L. F. Cordeiro, Jose F Rodrigues-Jr

University of Sao Paulo, in collaboration with Carnegie Mellon University

(Prof. Christos Faloutsos and PhD Danai Koutra)

Funded by the research agency FAPESP (2013/03906-0, 2014/07879-0, 2015/18335)

In: The Fifth IEEE ICDM Workshop on Data Mining in Networks, Atlantic City, NJ, USA, November 2015

http://www.icmc.usp.br/pessoas/junio

Introduction

Motivation

Big Data!!!

A lot of information, much of it in the form of relationships;

Large-scale graphs: graphs generated by applications whose users or entities are spread across large geographical areas, even the entire planet;

Social networks, recommendation networks, road nets, e-commerce, computer networks, client-product logs, and many others.

Data analysis is a key differentiator in industrial competition.

General Electric & Accenture.


Introduction

Problem

Such graphs are too big:

node-link visualizations cannot handle graphs with even a thousand vertices;

adjacency matrices are limited by the number of pixels on the screen;

in either case, the sheer number of nodes prevents reasoning about the graph;

non-visual analytical techniques might produce far too many patterns, overwhelming human cognition.

Still, we want to characterize the structure of graphs for:

understanding the overall structure, and not only distribution-based analyses;

spotting outliers and trends that are not dominant;

requesting details on demand about subregions of the graph topology.


Introduction

Problem

Layouts: node-link and adjacency matrix

[Figure: node-link layout (scalability: hundreds of nodes) and adjacency matrix layout (scalability: thousands of nodes)]


Introduction

Methodology overview

Assumptions:

graphs are made of recurrent simple structures (cliques, bipartite cores, stars, and chains);

such structures are more meaningful than individual nodes;

even at lower resolutions, the main properties of the graph are preserved in a visualization.

Hypothesis: we can reach more scalable and meaningful graph visualizations with:

graph summarization by detecting recurrent structures of the graph;

dense adjacency matrices.


Methodology

Proposed method: StructMatrix

Our method has two parts:

1 An algorithm to detect substructures;

2 A dense adjacency matrix of the structures that were detected.


Methodology

1. Structure detection

We designed a graph partitioning algorithm based on the fact that real-world graphs follow power-law degree distributions;

In such graphs, a few nodes have very high degree and the majority of nodes have low degree;

Kang and Faloutsos [1] demonstrated that the ordered removal of the highest-degree nodes removes hubs from the giant connected component (CC), creating satellite (much smaller) connected components;

This ordered removal lends itself to a structural scanning of the graph.


Methodology

1. Structure detection–Structure vocabulary

StructMatrix Vocabulary ψ


Methodology

1. Structure detection–Algorithm

1 If the queue holds connected components, StructMatrix takes the first one for processing.


Methodology


2 StructMatrix selects the highest-degree vertices (up to 1% of the vertices) and removes their edges.


Methodology


2 We get a set of smaller connected subcomponents.


Methodology


3 We classify the subcomponents according to the vocabulary.


Methodology

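1. Structure detection–Structure classification

$\alpha = \frac{n^2}{4}, \qquad \beta = \frac{n(n-1)}{2}, \qquad \varepsilon = 0.2$, where $n$ is the number of nodes in the subcomponent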
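The slide lists the quantities but not the rules that use them. One plausible reading, sketched below in Python under our own assumptions (not the authors' exact rules), treats β = n(n−1)/2 as the edge count of a full clique on n nodes and ε as a tolerance: a subcomponent counts as a "near" structure when it has at least (1 − ε) of the edges of the corresponding full structure. The sketch does not use α; bipartite cores are compared against the product of their two side sizes instead. False stars (fs) are left out because their definition is not given here, and the names `from_edges`, `two_coloring`, and `classify` are hypothetical.

```python
from collections import deque

EPS = 0.2  # tolerance epsilon from the slide


def from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbors)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def two_coloring(adj, sub):
    """BFS 2-coloring of the subgraph induced by `sub`; None if it has an odd cycle."""
    color = {}
    for s in sub:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in sub:
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None
    side_a = {u for u in sub if color[u] == 0}
    return side_a, set(sub) - side_a


def classify(adj, sub, eps=EPS):
    """Hypothetical rules for a connected subcomponent `sub` (a set of nodes)."""
    n = len(sub)
    if n < 3:
        return None
    deg = {u: sum(1 for v in adj[u] if v in sub) for u in sub}
    m = sum(deg.values()) // 2
    beta = n * (n - 1) // 2              # beta: edge count of a full clique on n nodes
    if m == beta:
        return "fc"                      # full clique
    if m == n - 1:                       # connected with n - 1 edges: a tree
        if max(deg.values()) == n - 1:
            return "st"                  # star: one center linked to all others
        if max(deg.values()) <= 2:
            return "ch"                  # chain: a simple path
    sides = two_coloring(adj, sub)
    if sides is not None:
        full = len(sides[0]) * len(sides[1])  # edges of the full bipartite core
        if m == full:
            return "fb"                  # full bipartite core
        if m >= (1 - eps) * full:
            return "nb"                  # near bipartite core
    elif m >= (1 - eps) * beta:
        return "nc"                      # near clique
    return None                          # unidentified: goes back to the queue
```

Restricting the near-clique test to non-bipartite subcomponents loses nothing for n ≥ 3: a bipartite graph has at most n²/4 edges, which is below (1 − ε)·n(n−1)/2 = 0.4·n(n−1) whenever n ≥ 3.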


Methodology


4 We store the classified subcomponents; those that were not identified go back to the queue to await a new round of shattering.


Methodology


5 We proceed to the next element in the queue.
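The five steps above can be sketched in Python as a queue-driven loop. This is a minimal sketch under our own assumptions, not the authors' implementation: the graph is an adjacency dict, hubs are the top `hub_fraction` (1%, at least one node) by degree inside the current component, hubs are simply set aside once their edges are removed, and `classify` is any function returning a vocabulary label or `None`. All function names are hypothetical, and the toy `full_clique` classifier exists only for the demo.

```python
from collections import deque


def components(adj, nodes):
    """Connected components of the subgraph of `adj` induced by `nodes`."""
    nodes, seen, comps = set(nodes), set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def shatter(adj, comp, hub_fraction=0.01):
    """Step 2: set aside the highest-degree nodes (at least one) and split the rest."""
    k = max(1, int(hub_fraction * len(comp)))
    degree = {u: sum(1 for v in adj[u] if v in comp) for u in comp}
    hubs = set(sorted(comp, key=lambda u: -degree[u])[:k])
    return components(adj, comp - hubs)


def detect_structures(adj, classify, hub_fraction=0.01):
    """Queue-driven shattering; returns a list of (label, node set) pairs."""
    queue = deque(components(adj, adj))
    found = []
    while queue:                                      # step 5: until the queue is empty
        comp = queue.popleft()                        # step 1: take the first component
        for sub in shatter(adj, comp, hub_fraction):  # step 2: remove hubs, split
            label = classify(adj, sub)                # step 3: match the vocabulary
            if label is not None:
                found.append((label, sub))            # step 4: store identified ones
            else:
                queue.append(sub)                     # step 4: re-queue the rest
    return found


def full_clique(adj, sub):
    """Toy classifier for the demo: only recognizes full cliques (fc) of 3+ nodes."""
    n = len(sub)
    m = sum(1 for u in sub for v in adj[u] if v in sub) // 2
    return "fc" if n >= 3 and m == n * (n - 1) // 2 else None


# Demo: a hub (node 0) attached to every node of two triangles.
adj = {0: {1, 2, 3, 4, 5, 6},
       1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2},
       4: {0, 5, 6}, 5: {0, 4, 6}, 6: {0, 4, 5}}
print(sorted((label, sorted(sub)) for label, sub in detect_structures(adj, full_clique)))
# [('fc', [1, 2, 3]), ('fc', [4, 5, 6])]
```

Because every round sets aside at least one node, each re-queued subcomponent is strictly smaller than its parent, so the loop always terminates.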


Methodology

1. Structure detection–Structure detection results

Graph            # Structures    fs    st    ch    nc    fc    nb    fb
DBLP                  160,885   76%    5%    2%    2%   15%   <1%     -
WWW-barabasi           15,652   32%   52%    5%    3%    2%    4%    2%
cit-HepPh              14,479   79%   13%    6%    1%   <1%   <1%   <1%
Wikipedia-vote          1,706   65%   33%    2%     -     -   <1%     -
Epinions                8,774   52%   31%   14%   <1%   <1%    2%   <1%
Roadnet PA             51,175   23%   45%   27%     -     -    5%     -
Roadnet CA             88,993   27%   39%   29%     -     -    4%     -
Roadnet TX             62,614   25%   43%   28%     -     -    4%     -

fs = false star, st = star, ch = chain, nc = near clique, fc = full clique, nb = near bipartite core, fb = full bipartite core.


Methodology

1. Structure detection–Runtime

We compare StructMatrix against VoG (Koutra et al. [2]): StructMatrix has better performance and a bigger vocabulary.


Methodology

2. Visualization–Projection

After structure detection, we build a structure-to-structure adjacency matrix whose edge weights give the number of edges between the nodes of each pair of structures;

Although smaller than the original matrix, for million-scale graphs the structure matrix is still too large to fit on the screen;

For this reason, we create a dense matrix by a direct proportion $(x, y) \to (\rho_x, \rho_y)$, with:

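$$\rho_x = \left\lceil (Res_x - 1)\,\frac{x - x_{min}}{x_{max} - x_{min}} + \frac{1}{2} \right\rceil, \qquad \rho_y = \left\lceil (Res_y - 1)\,\frac{y - y_{min}}{y_{max} - y_{min}} + \frac{1}{2} \right\rceil \tag{1}$$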

where $(x, y)$ are points of the original matrix, $x_{min}, x_{max}$ and $y_{min}, y_{max}$ are the minimum and maximum coordinates, and $Res_x, Res_y$ are the target resolutions; the higher the resolution, the more detail is shown, so these parameters let the user interactively explore details.
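As a minimal sketch, assuming integer structure ids 0..k−1 and that each node belongs to at most one structure (both our simplifications), the Python below builds the weighted structure-to-structure matrix and then applies Equation (1) with $x_{min} = y_{min} = 0$ and $x_{max} = y_{max} = k - 1$. The helper names (`structure_matrix`, `project`, `dense_matrix`) are hypothetical, and summing the weights of structure pairs that land on the same pixel is an assumed aggregation rule; the slides do not state one.

```python
import math
from collections import Counter


def structure_matrix(edges, structure_of):
    """Weighted structure-to-structure adjacency as a Counter {(s, t): edge count}.

    Hypothetical simplification: each node belongs to at most one structure;
    nodes missing from `structure_of` (e.g. removed hubs) are skipped.
    """
    weights = Counter()
    for u, v in edges:
        s, t = structure_of.get(u), structure_of.get(v)
        if s is None or t is None:
            continue
        weights[(min(s, t), max(s, t))] += 1  # undirected: one key per pair
    return weights


def project(x, x_min, x_max, res):
    """Equation (1): map x in [x_min, x_max] to a pixel index in 1..res."""
    if x_max == x_min:  # degenerate range; mapping it to pixel 1 is our choice
        return 1
    return math.ceil((res - 1) * (x - x_min) / (x_max - x_min) + 0.5)


def dense_matrix(weights, k, res_x, res_y):
    """Project a k-by-k structure matrix onto a res_x-by-res_y pixel grid.

    Weights landing on the same pixel are summed (assumption); both
    triangles are filled so the dense matrix stays symmetric.
    """
    dense = Counter()
    for (s, t), w in weights.items():
        for x, y in {(s, t), (t, s)}:
            dense[(project(x, 0, k - 1, res_x), project(y, 0, k - 1, res_y))] += w
    return dense
```

For example, with k = 4 structures and a 2 × 2 target resolution, `project` maps structures 0 and 1 to pixel 1 and structures 2 and 3 to pixel 2, so each dense cell accumulates the edges of a 2 × 2 block of the structure matrix. The $+\frac{1}{2}$ inside the ceiling makes Equation (1) round to the nearest pixel on a 1-based grid (ties round down).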


Methodology

2. Visualization–Projection


Methodology

2. Visualization–Layout

We organize the matrix by structure type and then by number of edges; the size of each structure (number of nodes) is encoded by color.
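A minimal sketch of this ordering in Python, assuming the structure-to-structure weights are stored as a mapping {(s, t): edge count} with s ≤ t. Following the road-network comparison later in the deck, the secondary key counts edges to other structures only. The order of types along the axes (`TYPE_ORDER`) and the descending edge order are our assumptions, since the slide does not specify them; the function name `layout_order` is hypothetical, and color-coding by structure size is not shown.

```python
# Hypothetical left-to-right order of structure types along both matrix axes.
TYPE_ORDER = ["fc", "nc", "fb", "nb", "st", "fs", "ch"]


def layout_order(types, weights):
    """Order structure ids by type, then by edges to other structures (descending).

    types:   dict {structure id: type label from the vocabulary}
    weights: dict {(s, t): edge count} with s <= t (structure-to-structure matrix)
    """
    external = {s: 0 for s in types}
    for (s, t), w in weights.items():
        if s != t:  # edges inside a structure do not count toward the order
            external[s] = external.get(s, 0) + w
            external[t] = external.get(t, 0) + w
    # ties are broken by structure id so the order is deterministic
    return sorted(types, key=lambda s: (TYPE_ORDER.index(types[s]), -external[s], s))
```

Sorting by a tuple key keeps all structures of one type contiguous, which is what produces the block pattern of the matrix.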



Experiments

Experiments–Real datasets

The real graphs are DBLP, WWW-barabasi, cit-HepPh, Wikipedia-vote, Epinions, and the Pennsylvania, California, and Texas road networks (Roadnet PA, CA, and TX); their structure counts are given in the structure detection results table.


Experiments

Experiments–Real datasets–WWW-barabasi

WWW-barabasi: webpages and links between them.

Stars (st and fs) refer to webpages with many out-links.

Most webpages have fewer than one thousand connections; however, some have an unusually high number of connections, in the thousands.


Experiments

Experiments–Real datasets–Road nets

[Figure panels: Pennsylvania, California, Texas]

The three road graphs have a similar structure, as all of them are U.S. road networks;

There is a hierarchical connectivity: bigger to smaller cities;

Surprising grid-like structure (due to symmetry): intersections refer to hub cities, and lines refer to inter-city paths.


Experiments

Experiments–Real datasets–Road nets

Comparison: Structure-to-structure vs Node-to-node.

[Figure panels: California (structure-to-structure), California (node-to-node)]

Main differences:

1 The partitioning according to structures;

2 The ordering by number of edges to other structures;

3 There is a hierarchical connectivity: bigger to smaller cities;

4 Surprising grid-like structure: intersections refer to hub cities, and lines refer to inter-city paths.


Experiments

Experiments–Real datasets–DBLP

[Figure panels: Overall, FC-FC zoom]

DBLP is mainly characterized by false stars, possibly because advisors have students, and the students are connected to one another;

By zooming into the FC-FC region, one can spot outliers, for instance k3 = “The Biomolecular Interaction Network Database and related tools 2005 update” (75 authors).


Conclusions

Contributions

Visualization technique: we introduce a processing and visualization methodology that combines algorithmic techniques and visual design to achieve large-scale visualizations;

Analytical scalability: our technique extends the most scalable technique found in the literature; in addition, it is engineered to plot millions of edges in a matter of seconds;

Practical analysis: we show that large-scale graphs have well-defined behaviors concerning the distribution of structures, their sizes, and how they relate to one another; finally, using a standard laptop, our technique allowed us to experiment on real, large-scale graphs from high-impact domains, namely WWW, Wikipedia, Roadnet, and DBLP.


References

[1] U. Kang and C. Faloutsos, “Beyond ‘caveman communities’: Hubs and spokes for graph compression and mining,” in ICDM, 2011, pp. 300–309.

[2] D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos, “VoG: Summarizing and understanding large graphs,” in SDM, 2014.

