Multilevel Network Visualization
Emmanuel Oppong
Computer Science and Engineering
The Pennsylvania State University
SROP 2014 Report
August 4, 2014

Abstract

In this research project, we investigate the problem of
visualizing large networks. Networks, or graphs, are used to
describe relationships between different objects. Graphs are widely
used in social networks, roadway systems, and in general, to
describe a system that has interactions among multiple entities.
Visualizing relationships through graph drawings is important so
that information can be easily comprehended and navigated. Some
networks, for instance social networks, can become very large when
they represent a large number of entities. In this project, we
develop a new multilevel method for visualizing graphs, using
existing tools and algorithms for graph drawing. We tested this
method on real-world networks from several online repositories,
such as the Koblenz network collection and Stanford large network
collection. We evaluated the method and compared it to
alternatives. This research tool will allow users to generate
multilevel network visualizations for describing systems such as
social connections, microorganism relationships, highway systems,
large populations, and map topologies.

Introduction

A network is a system of interconnected objects. Graph theory is the
mathematical language used to describe networks. It is a very old
branch of mathematics which started in 1736 when Leonhard Euler
attempted to solve the problem of the seven bridges of Königsberg. He
proved that there is no way of crossing every bridge without crossing
at least one of them twice [1].

Since then, graph theory has
evolved. Currently, researchers study how networks arise in
real-world scenarios and analyze their properties. Graphs are used
to model relations in physical, social, biological, and information
systems. They are a unifying information abstraction to capture
various types of data. Graphs are currently widely used on the
internet to make sense of large datasets. In 2012, Google announced
the Knowledge Graph feature as an addition to their search engine
[3]. The idea was to build a massive graph of real-world objects
and the connections between them. The Knowledge Graph uses links
between documents on the web to understand their semantic context.
The graph contains millions of objects and billions of facts
connecting them, which it uses to understand the meaning of the
keywords entered for the search. Facebook also utilizes a
graph-based search engine. It combines big data from its
billions of users and external data into one big search engine
providing user-specific search results. The amount of data on the
internet continues to grow each day. Graphs are used to create
network connections to make it easier to understand the type of
information coming in, and the information that is already there on
the web. With the growth of data, especially on the internet,
graphs have become very large. They can contain millions of
vertices and billions of connections.

Visual representation of networks is an important way of describing
the data they represent. Visualization of graphs is done with graph
drawing techniques. A graph drawing is a visual representation of the
vertices and edges the graph contains. The typical drawing of a graph
consists of shaded circles depicting the vertices and line
segments depicting the edges, which connect related vertices.
Graph drawing makes the information in the graph legible and
navigable. The data within a network can be explored by
displaying the vertices and edges in various layouts and
assigning them colors, sizes, and other properties. The display
highlights patterns, shows connections, and provides visual
information about a vertex. These factors are used to draw
conclusions about a certain dataset, in order to solve complex
problems.

There are many graph drawing techniques that use mathematical
algorithms to space out the vertices and edges. The arc diagram
method (see Figure 1) lays out all the vertices evenly on a single
line, and the edges are drawn as semicircles that go above or below
the line to connect the vertices. The layered drawing method (also
shown in Figure 1) places the vertices of a directed graph in
horizontal rows, with the edges directed downwards. These methods
are well suited to displaying networks with few vertices and edges.
However, they are not ideal for drawing larger graphs.

Figure 1: Arc diagram (left) [5], and layered method (right) [4].

The force-directed system (see example in Figure 2) is a
physics-based method that calculates the attractive and repulsive
forces between vertices, and moves the vertices along the direction
of the resulting force [7]. The process is repeated until the edges
are close to equal length and there are as few crossing edges as
possible. This method is better suited for displaying clustered
graphs. The larger the graph, however, the longer it takes for the
vertices to be repositioned.

The spring-electrical model
(Figure 3) is a type of force-directed algorithm, where the system
is visualized as electrically-charged vertices connected by springs
[7]. Springs are imagined to be placed between vertices that share
edges. The vertices are pulled together by the springs, while a
repulsive electrical force acts between all pairs of vertices. The
force computation is repeated until the system reaches equilibrium
[7].
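
The spring-electrical iteration described above can be sketched in
Python as follows. This is a minimal illustration, not the Sigma Js
plug-in: the force terms (attraction d^2/k along each edge, repulsion
k^2/d between every pair of vertices), the parameters k, step, and
cooling, and the function name spring_electrical_layout are all
assumptions chosen for the sketch.

```python
import math


def spring_electrical_layout(positions, edges, k=1.0, iterations=300,
                             step=0.1, cooling=0.95):
    """Return new 2-D positions after a fixed number of force iterations.

    positions: dict mapping vertex -> (x, y)
    edges:     list of (u, v) pairs
    k:         assumed natural spring length
    """
    pos = {v: [float(x), float(y)] for v, (x, y) in positions.items()}
    vertices = list(pos)
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in vertices}

        # Repulsive "electrical" force between every pair of vertices.
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                force[u][0] += push * dx / dist
                force[u][1] += push * dy / dist
                force[v][0] -= push * dx / dist
                force[v][1] -= push * dy / dist

        # Attractive "spring" force along every edge.
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            force[u][0] += pull * dx / dist
            force[u][1] += pull * dy / dist
            force[v][0] -= pull * dx / dist
            force[v][1] -= pull * dy / dist

        # Move each vertex along its net force, capped by the current step.
        for v in vertices:
            fx, fy = force[v]
            magnitude = math.hypot(fx, fy)
            if magnitude > 0.0:
                move = min(magnitude, step)
                pos[v][0] += fx / magnitude * move
                pos[v][1] += fy / magnitude * move

        step *= cooling  # shrink the step so the layout settles

    return {v: (x, y) for v, (x, y) in pos.items()}
```

With these force terms, two vertices joined by a single edge settle at
a distance close to k, where attraction and repulsion balance. The
all-pairs repulsion loop also shows why repositioning takes longer as
the graph grows: every iteration visits every pair of vertices.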

Figure 2: Force-directed graph drawing technique [6].
Figure 3: Spring-electrical model [7].

The multilevel approach to graph drawing aims to scale very large
graphs down to small ones. This is done by taking the vertices and
the edge connections between them and separating them into layers.
Figure 4 shows a demonstration of the multilevel approach. First the
original graph is broken down into parts, and then new vertices are
created, each encapsulating the part it represents. The new vertices
are used to construct a smaller graph, which is then displayed. The
smaller graph allows easy visualization of the entire graph.
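
The construction of the smaller graph can be sketched as follows,
assuming the parts are already known. The function name coarsen, the
part_of mapping, and the animal names in the usage lines are
hypothetical and serve only as an illustration. The sketch also
assumes that two parts are connected in the smaller graph whenever at
least one original edge runs between them.

```python
def coarsen(edges, part_of):
    """Collapse a graph into one vertex per part.

    edges:   iterable of (u, v) pairs of the original graph
    part_of: dict mapping each original vertex to its part identifier
    Returns (members, coarse_edges), where members maps each part to
    the original vertices it encapsulates.
    """
    members = {}
    for vertex, part in part_of.items():
        members.setdefault(part, []).append(vertex)

    coarse_edges = set()
    for u, v in edges:
        pu, pv = part_of[u], part_of[v]
        if pu != pv:
            # Store each undirected pair once, in a canonical order.
            coarse_edges.add((pu, pv) if pu <= pv else (pv, pu))
    return members, sorted(coarse_edges)


# Hypothetical example: four animals grouped into two species.
example_edges = [("lion_a", "lion_b"), ("lion_a", "zebra_a"),
                 ("zebra_a", "zebra_b"), ("lion_b", "zebra_b")]
example_parts = {"lion_a": "lion", "lion_b": "lion",
                 "zebra_a": "zebra", "zebra_b": "zebra"}
example_members, example_coarse = coarsen(example_edges, example_parts)
```

The members mapping is what lets a display navigate from a vertex of
the smaller graph back to the part of the original graph it stands for.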

Figure 4: Multilevel graph visualization approach.

Visualization of large networks, i.e., graphs with millions of
entities or more, is very challenging. This is due to the constraints
of screen displays and the limitations of current graph drawing
algorithms. To solve this problem, we implement a multilevel
approach, where the network is partitioned into smaller graphs that
hold different parts of the larger graph. Figure 4 illustrates the
multilevel approach to graph visualization.

A network can be partitioned in many different ways. It can be
partitioned by labeled categories in the dataset, using weights
associated with the vertices, or using a user-defined parameter
present in the data. For example, if a dataset consists of a list of
interactions between different animals, the data can be partitioned
by grouping together animals that belong to the same species. This
way, we can visualize a higher-level view, where each species is
represented by a new vertex that belongs to a smaller graph. We can
then navigate to a specific species to view the animals that belong
to that category.

There are many software tools currently
used to visualize small graphs. Gephi [2] is a desktop application
for interactive visualization and exploration of networks and
complex systems. It can be used for social network analysis,
exploratory data analysis, and biological network analysis. It
provides tools for people to explore and understand graphs through
graphical visualization. Sigma Js [9], D3 Js, and Processing Js are
all browser-based JavaScript libraries that can be used for graph
drawing. JavaScript is a dynamic computer programming language used
to develop browser-based applications. These JavaScript libraries
can be used to simplify network visualization in a browser, and
allow application developers to integrate network exploration. We
chose the Sigma Js library because it is the most lightweight of
the three aforementioned libraries, and allows more user
interaction with the display. We are creating a web user interface
application, where users can upload a formatted large graph with
multiple connections. Sigma Js takes a specific input format that
labels the vertices and edges and lists properties such as color
and size. We are developing a PHP script for preprocessing, to
convert the user's input to the format that Sigma Js recognizes.
The end goal of this project is to enable users to upload their
generated networks consisting of millions of vertices and billions
of edges, and visualize them in a multilevel manner.

Methodology

The process begins with a formatted graph that
consists of multiple vertices and edges. The graph is split into
smaller ones according to their connections. This creates multiple
layers of the different parts of the graph. The formatted
description of each vertex of the smaller graphs holds the
identifier of the lower-level network it represents. When the
user wants to navigate to a certain part of the graph, we use the
identifier to locate that part of the graph and zoom the display
onto it. The vertex zoom functionality will be created using
JavaScript. Mouse click functionality will also be implemented.
The user can use the mouse to navigate through the network by
zooming onto specific layers of the graph or directly onto a vertex.
The Sigma Js library uses the force-directed method for drawing.
The specific plug-in of the library that uses the force-directed
method is called force atlas. When the user's network is ready for
display, the force atlas plug-in is called to calculate the
positions of the vertices. The algorithm positions the vertices so
that the edges are close to equal length and crossing edges are
reduced as much as possible.

During the first four weeks of the eight-week research
term, we worked on creating the user interface and building example
networks to display. The goal of the application is to allow users
to better visualize and interact with their large networks. The
user interface is designed to allow the user to move vertices around
the screen, zoom in and out of specific items, and display
textual information about a vertex. We also added functionality
to change the color of the vertices. Most importantly, the user
interface comes with a search bar where the user can search for
particular items. The user interface was designed using HTML, a
hypertext markup language used to create the graphical view of a
web page. The user interface consists of input boxes and button
selections with which the user can interact using a mouse and a
keyboard. Using JavaScript, we connected the user's actions to
specific aspects of the network display, thereby making the display
interactive. We tested networks of different sizes (small, large,
and very large) to analyze the visualization, interactivity, and
performance of the displays. We found that Sigma Js can process
networks with up to 1000 vertices at an acceptable performance
level; when the vertex count exceeds that amount, performance
begins to degrade. This finding is acceptable for the multilevel
approach we use to solve our problem. If a network with a million
vertices is chosen for visualization, it can be scaled down to a
network with 1000 vertices, where each vertex holds another network
with 1000 vertices.

The last four weeks of
the research term were dedicated to partitioning the large graphs
into their smaller scaled representations. To test the multilevel
approach, we chose a network with 1000 vertices and partitioned it
into 10 different parts. We partitioned it by vertex number: 0 to
99, 100 to 199, and so on. First we used C++ to write the code for
breaking up the larger graph. We wrote the code following the
format of the datasets downloaded from the large network databases.
The different partitions were written to new files, and another file
was created with vertices linked to the partitions. The files are
in the JSON format that Sigma Js reads to create the display of
the vertices and edges.

Findings

We tested many different networks
from two main sources, KONECT - The Koblenz Network Collection
[10], and Stanford Large Network Dataset Collection [11]. We also
tested many randomly generated graphs with arbitrary sizes and
positions. Here are some of the results from displaying the networks
using Sigma Js. Figure 5(a) shows a display of a randomly generated
graph using Sigma Js. Figure 5(b) shows the same graph display with
the force-directed plug-in from Sigma Js applied to it. As
mentioned before, the vertices of the network are moved so that the
edges are close to equal length when the force-directed algorithm
is applied.

Figure 5: a) Randomly generated graph with Sigma Js.
Figure 5: b) Force-directed plug-in applied.
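
A random test graph of the kind shown in Figure 5(a) could be
generated along the following lines. This is a sketch under
assumptions rather than the generator used in the project: the
function names, the seed parameter, the fixed color, and the size
range are hypothetical. The field names (nodes and edges, with id,
label, x, y, size, source, and target) are assumed to match the JSON
input that Sigma Js reads.

```python
import json
import random


def random_sigma_graph(n_vertices, n_edges, seed=0):
    """Build a random graph as a dict in an assumed Sigma-style layout."""
    rng = random.Random(seed)
    nodes = [{"id": "n%d" % i,
              "label": "Vertex %d" % i,
              "x": rng.random(),           # arbitrary starting position
              "y": rng.random(),
              "size": rng.randint(1, 5),   # arbitrary vertex size
              "color": "#666666"}
             for i in range(n_vertices)]

    # Never ask for more distinct edges than the graph can hold.
    target = min(n_edges, n_vertices * (n_vertices - 1) // 2)
    seen = set()
    edges = []
    while len(edges) < target:
        u, v = rng.sample(range(n_vertices), 2)  # two distinct vertices
        key = (min(u, v), max(u, v))
        if key in seen:
            continue                             # skip duplicate edges
        seen.add(key)
        edges.append({"id": "e%d" % len(edges),
                      "source": "n%d" % u,
                      "target": "n%d" % v})
    return {"nodes": nodes, "edges": edges}


def to_json(graph):
    """Serialize the graph so it can be saved to a file for the display."""
    return json.dumps(graph, indent=2)
```

Fixing the seed makes a test graph reproducible, which helps when
comparing the display before and after the force-directed plug-in is
applied.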

Figures 6(a), 6(b), and 6(c) show examples of networks
visualized using Sigma Js. These networks were downloaded from
the Stanford large network database. The format of the dataset was
defined by its creators and therefore had to be converted to the
format required by Sigma Js. After converting the Stanford graph
data format to the Sigma Js JSON format, we displayed the graph
along with its properties. We also tested the effects of the user
interface dialog box on these networks. We found that the vertices
responded to the mouse and keyboard actions designed in the program.
The vertices move accordingly and change color when the option to
change a vertex color is selected through the user interface.
Figure 6(a) displays a network with 1000 vertices, Figure 6(b) a
network with 5000 vertices, and Figure 6(c) a network with 10,000
vertices. As we can see in the displays, the network becomes
cluttered with vertex points and very hard to visualize. It is not
easy to interpret the type of information being conveyed by the
graph. It also takes very long to navigate through the graph to
find a specific item.

Figure 6: a) 1000 vertices.
Figure 6: b) 5000 vertices.
Figure 6: c) 10,000 vertices.

The initial screen of the user interface (Figure 7) allows the user
to interact directly with the network displayed. Interactivity also
plays an important role in the visualization of the networks,
especially when implementing the multilevel approach. The user
interface makes the information within the network easily
accessible through a navigable display. The display screen can be
repositioned, along with specific items, to visualize specific parts
of the network or to maneuver onto certain vertices. The user
interface allows the user to search for specific items within the
dataset, change the color of the vertices and edges, and change how
the edges are drawn. The user can also fit the network to the screen
if they have navigated too far into the display. We defined the
number of iterations of the force-directed algorithm that run when
the network is first loaded onto the web browser screen. The user
interface has an option for the user to continue iterating through
the algorithm to get a better display of the network.

Figure 7: User interface dialog box.
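
The idea of running the layout for a fixed number of iterations on
first load and continuing only on request can be sketched as below.
The class name BoundedLayout, its resume method, and the toy step
function are hypothetical. In the application itself the step would
be one iteration of the force-directed plug-in, which is not
reproduced here.

```python
class BoundedLayout:
    """Run a layout step function for a fixed budget, then stop."""

    def __init__(self, step_fn, initial_iterations=100):
        self._step_fn = step_fn
        self.iterations_run = 0
        # First display: run the predefined number of iterations.
        self.resume(initial_iterations)

    def resume(self, extra_iterations):
        """Run more iterations on request; return the total run so far."""
        for _ in range(extra_iterations):
            self._step_fn()
        self.iterations_run += extra_iterations
        return self.iterations_run


def make_oscillating_step(state):
    """Toy step that never settles: the vertex flips sides every call."""
    def step():
        state["x"] = -state["x"]
    return step


# The toy vertex would flip forever; the iteration budget brings it to a halt.
toy_state = {"x": 1.0}
toy_layout = BoundedLayout(make_oscillating_step(toy_state),
                           initial_iterations=5)
```

A budget of this kind guarantees that the first display appears after
a bounded amount of work, even for a layout that would otherwise keep
moving.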

For testing the multilevel approach to visualizing large
networks, we chose to partition the network from Figure 6(a). Our
goal was to partition it into 10 parts and create a new display that
links the partitioned parts to the files they are stored in. When we
loaded Sigma Js, the new display was drawn onto the screen and
followed the interactivity of the user interface. The color,
position, and style of the vertices in those displays can be
changed. We can now navigate to specific parts of the network we
want to display. We added two animation processes that display
either the part of the graph the user wants to navigate to, or a
specific item. If the user searches for an item in the search box,
the first animation zooms in to the part of the graph that item
belongs to. Then that part of the graph is loaded onto the screen.
The second animation zooms onto the item searched for and displays
it along with its attributes.

Figure 8 shows the display of the scaled-down version of the test
network (Figure 6(a)). The vertices are color-coded to match the
colors of the parts of the larger network they are linked to. The
graph is displayed with the force-directed algorithm applied to it.
Figure 9 shows the different partitions that were created. The
display of each part consists of the items that belong to it and the
items from other parts that are linked to them. They follow the same
color code. If there is at least one connection between two parts of
the graph, an edge is drawn in Figure 8 to connect those two parts.

Figure 8: Scaled display result from partitioning the network in
Figure 6(a).
Figure 9: The different parts of the larger graph that the vertices
in Figure 8 are linked to.
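
The partitioning step behind Figures 8 and 9 can be sketched as
follows. The project's partitioning code was written in C++; this
Python version is only an illustration under stated assumptions. It
assumes an edge-list input with one "source target" pair of integer
vertex numbers per line and comment lines starting with "#". It
splits vertices into consecutive blocks of vertex numbers (0 to 99,
100 to 199, and so on). The function names, the PALETTE colors, and
the exact fields of the output are hypothetical. Positions and sizes
are left out and would have to be assigned before display.

```python
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
           "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]


def parse_edge_list(text):
    """Parse 'source target' lines; blank and '#' lines are skipped."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split()[:2]
        edges.append((int(source), int(target)))
    return edges


def block_of(vertex, block_size):
    """Part identifier of a vertex: 0 for 0..99, 1 for 100..199, ..."""
    return vertex // block_size


def build_part_documents(edges, block_size=100):
    """Return {part id: {"nodes": [...], "edges": [...]}}.

    Each part holds its own vertices plus the vertices of other parts
    that are linked to them, colored by the part they belong to.
    """
    vertex_sets = {}
    edge_lists = {}
    for u, v in edges:
        # An edge is stored in every part that owns one of its ends.
        for part in {block_of(u, block_size), block_of(v, block_size)}:
            vertex_sets.setdefault(part, set()).update((u, v))
            edge_lists.setdefault(part, []).append((u, v))

    documents = {}
    for part in sorted(vertex_sets):
        nodes = [{"id": "n%d" % v,
                  "label": str(v),
                  "color": PALETTE[block_of(v, block_size) % len(PALETTE)]}
                 for v in sorted(vertex_sets[part])]
        part_edges = [{"id": "e%d" % i,
                       "source": "n%d" % u,
                       "target": "n%d" % v}
                      for i, (u, v) in enumerate(edge_lists[part])]
        documents[part] = {"nodes": nodes, "edges": part_edges}
    return documents
```

Each document could then be written to its own file, for example with
json.dump, so that the scaled display only needs to load the file of
the part the user navigates to.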

Discussion

The results of the tests run on network displays using Sigma Js
confirmed our assumptions about the multilevel approach. When the
network is scaled down to a smaller size compared to its larger
representation, we are able to analyze the larger network very
easily. For our tests, we chose to use a network consisting of 1000
vertices. Given the results from these tests, we believe that
partitioning and visualizing networks with over a million vertices
will follow the same process and produce similar results. We have
set goals to test our partition algorithm on these much larger
networks. The next step is to create multiple stages when
partitioning the networks. For example, if a network has a million
vertices, we can partition it into 1000 different parts, each
consisting of 1000 vertices from the larger network. We can then
partition further by splitting the new display, which consists of
1000 vertices linked to the 1000 different parts of the larger
graph, into 10 parts.

We encountered several
challenges while conducting this research. The primary concern when
designing the application was to do as much processing as possible
on the client side and as little as possible on the server side. In
web browser applications, server-side processes are those handled
on the computer of the host, and client-side processes are handled
on the computer of the user accessing the application. We aim to
process the partitioning of the graph on the user end. However, in
the current approach, this is done on the server. We faced another
problem with the use of the force-directed plug-in provided by
Sigma Js. We saw that in some displays, the algorithm ran
continuously without stopping. We saw some vertices constantly
moving, sometimes back and forth around the same position. We
resolved this by iterating through the plug-in only a fixed number
of times and then bringing it to a halt for the first display. As
mentioned earlier, we provided an option in the user interface for
the user to continue iterating through the plug-in if they wanted a
better display than the one provided.

We are now developing this work
further to possibly include an improved user interface dialog box
and parallel partitioning of larger networks. We will test
different partitioning algorithms on networks consisting of
millions of vertices and billions of edges. The goal is to minimize
the time it takes to partition the items in the larger datasets and
to display the results. If the time to partition the dataset is
minimized, we will add a function in the user interface to allow
users to partition the network in real time. They will be able to
define how they want the data to be separated in accordance with the
format provided and see the end results on the display screen. We
will also design a better iteration scheme for the force-directed
plug-in so that the first display of the network is satisfactory. We
believe that the findings of the research project will greatly
benefit those interested in analyzing and interpreting their large
datasets through visualization. The end result of the research will
be a website with user access to the application. The website will
allow anyone to upload their datasets and easily visualize and
interact with the information conveyed by the dataset. The website
will also support multiple formats of the dataset, and will provide
a guideline for the user to follow so that they upload a correctly
formatted document.

References

1. Rhishikesh S. Fansalkar, "Graph Theory Origin and Seven Bridges
   of Königsberg", New York University, 2007.
2. The Gephi team, "Gephi", http://gephi.github.io/, last accessed
   August 2014.
3. The Google Team, "Inside Search",
   http://www.google.com/insidesearch/features/search/knowledge.html,
   last accessed August 2014.
4. "Graph layout",
   http://goblin2.sourceforge.net/refman/pageGraphLayout.html,
   last accessed August 2014.
5. Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, "A Tour
   Through the Visualization Zoo",
   http://homes.cs.washington.edu/~jheer/files/zoo/, last accessed
   August 2014.
6. John Howse, Peter Rodgers, and Gem Stapleton, "VL/HCC Tutorial
   2009: Automated Diagram Drawing",
   http://www.eulerdiagrams.com/tutorial/AutomatedDiagramDrawing.html,
   last accessed August 2014.
7. Yifan Hu, "Current and Future Challenges in the Visualization of
   Large Networks", Encyclopedia of Social Network Analysis and
   Mining, 2013.
8. Yifan Hu, "Efficient, High-Quality Force-Directed Graph Drawing",
   The Mathematica Journal 10(1), 2006.
9. Alexis Jacomy, "Sigma js library", http://sigmajs.org/, last
   accessed August 2014.
10. Jérôme Kunegis, "KONECT - The Koblenz Network Collection",
    http://konect.uni-koblenz.de/networks/, last accessed August
    2014.
11. Jure Leskovec, "Stanford Large Network Dataset Collection",
    http://snap.stanford.edu/data/index.html, last accessed August
    2014.