Dendrogramix: a Hybrid Tree-Matrix Visualization Technique to Support Interactive Exploration of Dendrograms
Renaud Blanch (IIHM), Rémy Dautriche (IIHM), Gilles Bisson (AMA)
Université Grenoble Alpes, CNRS, Laboratoire d’Informatique de Grenoble
April 14, 2015, PacificVis 2015
Figure 3: From similarity matrix to Dendrogramix: (a) graphic encoding of the similarity; (b) matrix reordered with an (optimal) order compatible with a traversal of the tree of clusters; (c) graphic encoding of clusters; (d) Dendrogramix compared to (e) classic dendrogram showing the same hierarchical clustering of the same data.
2 RELATED WORK
The clustering itself uses the well-known AHC method [16]. Dendrogramix is not the first attempt at showing the clustering result together with details about the items, but previous works in this area all rely on the juxtaposition of two visualizations: the classical dendrogram and another visualization for the individual items. Dendrogramix is not a juxtaposition of two visualizations but rather a mix of the two. It is thus closer to recent works in the InfoVis community on hybrid visualizations. The interaction techniques provided by Dendrogramix are also comparable to recent works focused on interaction with visualizations, especially interactions that exploit the structure of the visualized data to guide the user.
2.1 Hierarchical Clustering Visualization
Several visualizations have been proposed to display the information about items together with the clustering result (see the Wilkinson & Friendly survey [17] for a comprehensive overview of those techniques). The raw information on items can be shown with a heat map [15] placed side by side with the dendrogram. Such visualizations have been improved with interactions and overview+detail views to cope with the scaling issues encountered when dealing with large data sets [5].
Another way to display information about items is to show their similarity matrix rather than their raw vectors, as proposed by Gower & Digby [7]. This saves space when the item vectors have more dimensions than there are items, as the similarity matrix is square and symmetric. It also has the advantage of showing the similarity between items as seen by the system, rather than letting the user reconstruct it from the visual similarity of the row vectors of the heat map, which can be misleading. For those reasons we have also chosen to use the similarity matrix as a starting point to provide information on items.
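As a rough illustration of the space argument, the sketch below builds a pairwise matrix from item vectors. It is only a minimal example under stated assumptions: the function name `distance_matrix`, the sample items, and the choice of the Euclidean distance as the (dis)similarity measure are ours, not taken from the tool described here.

```python
import math

def distance_matrix(items):
    """Return the symmetric n x n matrix of pairwise Euclidean distances."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(items[i], items[j])
            # Symmetry: only the upper triangle needs to be computed.
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Three hypothetical items described by six features each.
items = [
    (0, 0, 0, 0, 0, 0),
    (1, 0, 0, 0, 0, 0),
    (0, 3, 4, 0, 0, 0),
]
m = distance_matrix(items)
```

Here a heat map of the raw vectors would need 3 x 6 cells, whereas the matrix has 3 x 3 cells, of which only one triangle carries information.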
Our main contribution regarding the graphic representation is that Dendrogramix embeds the information about items into the visualization of the tree representing the clustering result.
2.2 Hybrid Visualizations
Another way to cope with scaling issues is to use different representations for sub-parts of a data set (e.g., node-link and treemap for trees, as in elastic hierarchies [18], or node-link and adjacency matrix for graphs, as in MatLink [8], NodeTrix [9], or TreeMatrix [12]). The choice of a specific representation for a part of the data can be made interactively by the user, as in NodeTrix, or take advantage of theoretical results about the space-efficiency of various representations (e.g., for trees [10]) to automatically switch from one representation to another at various levels of aggregation.
Such hybrid visualizations have been used to display large dendrograms. In Stacked Trees [3], the classical dendrogram visualization is used for the largest clusters, but above a homogeneity threshold the visualization switches to a stack of leaves rather than going on with the tree representation. This saves space at the expense of losing structural information.
Dendrogramix is also hybrid, but in a different way: it combines the visualizations of two data sets that do not have the same structure, namely the input of the AHC (a matrix storing the similarities between items) and the output of the AHC (a binary tree depicting the clusters and their homogeneity).
2.3 Interaction Techniques
The interaction techniques designed to explore the Dendrogramix follow the principles of direct manipulation [14], especially the fact that they should be rapid, reversible, and incremental. The main challenge here is that the data structures manipulated are inherently discrete. Making the interaction continuous requires the use of animated transitions, controlled either by the system or by the user, to switch between coherent states of the visualization. Zoomable Treemaps [4] is a good example of previous work that provides a continuous interaction with the discrete data structure of a tree visualized as a treemap. We borrowed from Zoomable Treemaps the idea of using a crossing-based interaction [1] to perform an advanced selection (of two clusters simultaneously in our case; of an arbitrary internal node of the treemap in their case).
Most of the interaction techniques proposed by Dendrogramix fit in the framework of interactions with hierarchical aggregation proposed by Elmqvist & Fekete [6]. However, that framework does not consider interactions that alter the layout (such as node reordering).
3 DENDROGRAMIX
A Dendrogramix is a hybrid visualization that mixes the similarity matrix of items with the binary tree of clusters resulting from their AHC. We first describe how the visualization is built and what kind of observations can be made using it. We then describe the interaction techniques that allow its exploration.
3.1 Visualization
Figure 2.a shows the data set (five points of the plane) used below to illustrate how a Dendrogramix is built. Figure 2.b shows the result of an AHC of this data set, using the Euclidean distance to measure the similarity between two points and the minimum distance between the elements of two clusters to generalize this similarity measure to clusters (i.e., the single linkage method).
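The clustering step described above can be sketched as follows. This is a deliberately naive single linkage AHC written for readability, not the implementation used for the figures; the function name `single_linkage`, the tuple-based tree encoding, and the five sample points are assumptions made for this example (they are not the points of Figure 2.a).

```python
import math
from itertools import combinations

def single_linkage(points):
    """Naive agglomerative clustering with single linkage.

    Returns a binary tree: leaves are point indices, internal nodes are
    (left, right, merge_distance) tuples.
    """
    # Each active cluster is a pair (tree, list of member indices).
    clusters = [(i, [i]) for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for (a, (_, ma)), (b, (_, mb)) in combinations(enumerate(clusters), 2):
            # Single linkage: distance between the two closest members.
            d = min(math.dist(points[i], points[j]) for i in ma for j in mb)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        (ta, ma), (tb, mb) = clusters[a], clusters[b]
        merged = ((ta, tb, d), ma + mb)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0]

# Five hypothetical points of the plane.
points = [(0, 0), (1, 0), (5, 0), (5, 2), (10, 0)]
root = single_linkage(points)
```

The merge distance stored in each internal node is the quantity a dendrogram conventionally shows as the height of the corresponding cluster.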
3.1.1 Construction
Using the tree of clusters and the similarity matrix as inputs, the Dendrogramix is built:
1. by encoding the similarity matrix using the size of the circular dots to denote similarity (large dots mean similar items; Figure 3.a);
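One possible reading of this encoding step is sketched below. The linear mapping from distance to radius, the function name `dot_radii`, and the `max_radius` parameter are all assumptions made for illustration; the text only states that larger dots denote more similar items.

```python
def dot_radii(distances, max_radius=0.5):
    """Map a distance matrix to dot radii: the closer two items, the larger the dot."""
    # Guard against an all-zero matrix to avoid dividing by zero.
    largest = max(max(row) for row in distances) or 1.0
    return [[max_radius * (1.0 - d / largest) for d in row] for row in distances]

# Hypothetical distance matrix for three items.
distances = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
radii = dot_radii(distances)
```

With this choice the most distant pair gets a dot of radius zero and identical items get the full radius; other monotonic mappings (for instance, on the dot area) would serve the same purpose.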
left (resp. right) child of its parent cluster, dragging it to the right (resp. left) is simple: it corresponds to a swap with its sibling. Otherwise, one has to walk up the tree to find the first ancestor cluster for which the current cluster is included in the left (resp. right) child, and then swap the children of that ancestor.
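The rule above can be sketched on a plain binary tree. The `Cluster` class and the `move` function are hypothetical names introduced for this example, and the sketch covers only the tree update, not the drag-and-drop gesture or its animation.

```python
class Cluster:
    """Node of the binary tree of clusters (leaves carry an item label)."""

    def __init__(self, label=None, left=None, right=None):
        self.label, self.left, self.right = label, left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self

    def leaves(self):
        """Return the item labels in left-to-right order."""
        if self.left is None:
            return [self.label]
        return self.left.leaves() + self.right.leaves()

def move(cluster, direction):
    """Move `cluster` toward "right" or "left" by swapping the children of
    the nearest ancestor that contains it on the opposite side.

    Returns False when no such ancestor exists (the cluster is already at
    that end of the ordering).
    """
    node = cluster
    while node.parent is not None:
        parent = node.parent
        on_left = parent.left is node
        # Moving right needs a left child; moving left needs a right child.
        if on_left == (direction == "right"):
            parent.left, parent.right = parent.right, parent.left
            return True
        node = parent
    return False

# Hypothetical tree ((a, b), c).
a, b, c = Cluster("a"), Cluster("b"), Cluster("c")
ab = Cluster(left=a, right=b)
root = Cluster(left=ab, right=c)
moved = move(b, "right")  # b is a right child: the swap happens at the root
order = root.leaves()
```

Because only the children of one node are exchanged, the resulting order remains compatible with a traversal of the tree of clusters.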
Figure 4: Drag-and-drop interaction to swap clusters: (a) initial state; (b) and (c) intermediate states; (d) final state after the swap.
During the interaction (Figure 4 (b) and (c)), the dots conveying the individual similarity information are not displayed, for performance reasons. The position of the manipulated cluster is constrained so that it stays inside its parent and does not overlap its sibling. Its movement thus takes the form of a jump: it climbs toward the root of its parent; when it reaches it, its sibling passes underneath; and finally it can come down on the other side. If the user interrupts the drag-and-drop before it completes, the end of the movement is animated: the cluster returns to its starting point if it had not yet passed over its sibling (Figure 4 (b)); otherwise (Figure 4 (c)), it continues on its way to the other side.
2.2.3 Cluster Aggregation
Clusters can be annotated: pressing the Enter key switches the current cluster to edit mode, which makes it possible to enter or edit a label for that cluster. This label is displayed near the root of the cluster (Figure 5 (a): the current cluster of 5 items has been labeled “MSR”). When a cluster is clicked, it is collapsed and the label is then used to show it among the items (Figure 5 (b)). The rows and columns
Figure 5: Cluster aggregation: (a) labeling a cluster; and (b) collapsing that cluster.
data set: 189 permanent researchers of the lab, characterized by the 2688 co-authors of their 3245 publications over the last 4 years.
participants: 6 senior researchers from the lab (deputy director or group heads).
two visualizations: classical dendrogram vs. Dendrogramix.
• North, Saraiya, Duca. A comparison of benchmark task and insight evaluation methods for information visualization. Information Visualization 10(3), 162–181. 2011.