Postacademic course Big Data
Joris Klerkx, Research Expert, PhD (@jkofmsk)
Erik Duval, Professor (@erikduval)
Visualisation - part 2
Big Data - module 3
IVPV - Instituut voor Permanente Vorming, 28-05-2015
Visualisation - techniques, interaction dynamics, big data
To research, design, create and evaluate useful tools that augment the human intellect
By ‘augmenting human intellect’ we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems (Douglas Engelbart, 1962).
Augment group - HCI research lab, Dept. of Computer Science, KU Leuven
https://augmenthuman.wordpress.com
Music
Technology Enhanced Learning
e-health
Research 2.0
Health
Media (Consumption)
Technology Enhanced Learning
Science 2.0
https://augmenthuman.wordpress.com/
http://eng.kuleuven.be/datavislab/
Recap
Humans have advanced perceptual abilities
Humans have little short term memory
Externalize data by using interactive, visual encodings
Our brains make us extremely good at recognizing visual patterns
Our brains remember relatively little of what we perceive
Information Visualisation is the use of interactive visual representations to amplify cognition [Card et al.]
Definition
Information visualization significantly improves insight generation and user productivity
Accelerates time to insight
Today
• Tour through the visualisation zoo
• Interactive Dynamics
• Is there too much data to visualize?
• Tools?
A tour through the visualization zoo
Heer, J., Bostock, M., & Ogievetsky, V. (2010, May). A Tour through the Visualization Zoo - A survey of powerful visualisation techniques, from the obvious to the obscure. ACM Queue - Graphics, 8 (5), https://queue.acm.org/detail.cfm?id=1805128
The horizon graph is a technique for increasing the data density of a time-series view while preserving resolution.
Sizing the Horizon: The Effects of Chart Size and Layering on the Graphical Perception of Time Series Visualizations. Jeffrey Heer, Nicholas Kong, Maneesh Agrawala. ACM Human Factors in Computing Systems (CHI), 2009, pp. 1303-1312. Best Paper Award.
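A minimal sketch of the banding step behind a horizon graph, assuming the usual construction: negative values are mirrored onto the positive side and the value range is cut into equal bands that are drawn on top of each other. The function name `horizon_bands` and its parameters are hypothetical and not taken from the cited paper.

```python
def horizon_bands(values, num_bands, max_abs=None):
    """Split a series into layered bands for a horizon graph.

    Returns (positive_layers, negative_layers). Each is a list of
    num_bands lists holding, per data point, how much of that band is
    filled (between 0 and the band height).
    """
    if max_abs is None:
        max_abs = max(abs(v) for v in values)
    band_height = max_abs / num_bands

    def layers(magnitudes):
        # Band k covers the interval [k * band_height, (k + 1) * band_height].
        return [
            [min(max(m - band * band_height, 0.0), band_height) for m in magnitudes]
            for band in range(num_bands)
        ]

    positive = layers([max(v, 0.0) for v in values])
    # Negative values are mirrored so they share the same vertical space.
    negative = layers([max(-v, 0.0) for v in values])
    return positive, negative


pos, neg = horizon_bands([1.0, 5.0, -3.0, 6.0], num_bands=3)
```

Each layer would be drawn in the same strip with a darker colour for higher bands, so the chart needs only one band height of vertical space while the layers still add up to the original magnitudes.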
Statistical Distributions
Reveal how a set of numbers is distributed and thus help an analyst better understand the statistical properties of the data
Histograms show the prevalence of values grouped into bins
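As an illustration of the binning step, the following sketch counts values in equal-width bins. The function name `histogram` and the equal-width choice are assumptions made for the example; other binning rules are possible.

```python
def histogram(values, num_bins):
    """Count values in num_bins equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land in bin num_bins, so clamp it.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


edges, counts = histogram([1, 2, 2, 3, 5, 8, 9, 10], num_bins=3)
```

Changing `num_bins` interactively is essentially what a histogram slider does: the same data is re-binned at a coarser or finer resolution.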
Histogram sliders
A stem-and-leaf plot bins numbers according to the first significant digit, and then stacks the values within each bin by the second significant digit.
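The binning rule can be sketched as follows, under the simplifying assumption that all values are integers between 10 and 99, so that the tens digit is the first significant digit (the stem) and the units digit the second (the leaf). The function names are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit integers by tens digit (stem) and list units digits (leaves)."""
    bins = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        bins[stem].append(leaf)
    return dict(bins)


def render_stem_and_leaf(plot):
    """Format the plot as one text row per stem."""
    rows = []
    for stem in sorted(plot):
        leaves = " ".join(str(leaf) for leaf in plot[stem])
        rows.append(f"{stem} | {leaves}")
    return "\n".join(rows)


plot = stem_and_leaf([12, 15, 21, 27, 27, 34, 19])
print(render_stem_and_leaf(plot))
```

Because every leaf is printed, the row lengths form a histogram while the individual values remain readable.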
Maps
Mostly based upon a cartographic projection: a mathematical function that maps the three-dimensional geometry of the Earth to a two-dimensional image
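To make "a mathematical function" concrete, here is a sketch of one well-known projection, the spherical Mercator projection on a unit sphere. It is chosen only as an example; the slides do not single out a particular projection, and the function name `mercator` is an assumption.

```python
import math


def mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to planar (x, y) on a unit sphere.

    Latitudes must lie strictly between -90 and 90 degrees, because the
    poles map to infinity under this projection.
    """
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = lam
    y = math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


x, y = mercator(0.0, 45.0)
```

The growth of `y` towards the poles is the distortion that makes high-latitude regions look larger than they are, which is one reason other maps deliberately use different projections.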
Other maps knowingly distort or abstract geographic features to tell a richer story or highlight specific data.
• Google Maps - Well rounded, established mapping solution, especially for non-developers to get a basic map on the web, along with all the powers that Google is (in)famous for.
• OpenLayers - For situations when other mapping frameworks can’t solve your spatial analysis problems.
• Leaflet - Currently, easily the best mapping framework for general mapping purposes, especially if you don’t need the additional services that MapBox or CartoDB provide.
• MapBox - Fast growing and market changing mapping solution for when you want more control over map styling or have a need for services that others are not providing, such as detailed satellite images, geocoding or directions.
• Unfolding - to create interactive maps and geovisualizations in Processing and Java
http://www.toptal.com/web/the-roadmap-to-roadmaps-a-survey-of-the-best-online-mapping-tools
Hierarchies
Most data can be organised into natural hierarchies
Special visualization techniques exist to leverage hierarchical structure, allowing rapid multiscale inferences: micro-observations of individual elements and macro-observations of large groups
A node-link diagram laid out with the Reingold-Tilford algorithm
The adjacency diagram is a space-filling variant of the node-link diagram; rather than drawing a link between parent and child in the hierarchy, nodes are drawn as solid areas (either arcs or bars), and their placement relative to adjacent nodes reveals their position in the hierarchy
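A minimal sketch of a bar-style adjacency layout (often called an icicle layout), assuming a tree stored as nested dictionaries with hypothetical `name`, `children` and `value` keys and positive leaf values. Each node receives a depth and a horizontal extent; children split their parent's extent in proportion to their subtree totals.

```python
def subtree_value(node):
    """Sum of leaf values below a node (a leaf without a value counts as 1)."""
    children = node.get("children", [])
    if not children:
        return node.get("value", 1)
    return sum(subtree_value(child) for child in children)


def icicle_layout(node, x0=0.0, x1=1.0, depth=0, out=None):
    """Return a list of (name, depth, x0, x1) rectangles, parents before children."""
    if out is None:
        out = []
    out.append((node["name"], depth, x0, x1))
    total = subtree_value(node)
    cursor = x0
    for child in node.get("children", []):
        share = (x1 - x0) * subtree_value(child) / total
        icicle_layout(child, cursor, cursor + share, depth + 1, out)
        cursor += share
    return out


tree = {
    "name": "root",
    "children": [
        {"name": "A", "children": [{"name": "A1", "value": 1},
                                   {"name": "A2", "value": 1}]},
        {"name": "B", "value": 2},
    ],
}
rectangles = icicle_layout(tree)
```

Drawing each tuple as a bar at row `depth` spanning `x0` to `x1` gives the bar variant; mapping the same extents to angles would give the arc variant.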
T. Nagel, M. Maitan, E. Duval, A. Vande Moere, J. Klerkx, K. Kloeckl, and C. Ratti. Touching transport - a case study on visualizing metropolitan public transit on interactive tabletops. In AVI2014: 12th ACM International Working Conference on Advanced Visual Interfaces, pages 281–288, 2014.
Chord diagrams show directed relationships among a group of entities. Relationships can be quantitative or binary.
http://bl.ocks.org/mbostock/4062006
Ye L, Amberg J, Chapman D et al. 2013. Fish gut microbiota analysis differentiates physiology and behavior of invasive Asian carp and indigenous American fish. The ISME journal.
Choices of representation (e.g., matrix diagram) and interactive parameterization (e.g., default sort order) can be critical to unearthing data quality issues that can otherwise undermine accurate analysis.
Navigate to examine high-level patterns & low-level detail
• Overview first, zoom & filter, then details-on-demand
• Start with what you know, then grow
• Search, show context, expand on demand
• Focus + Context
• Semantic Zooming
• Magical lenses
are just by visually looking for the largest number of connected nodes. These larger clusters can be a first indication of where high profile authors are located. However, in this state, neither the names of the authors nor the titles of the papers are visible yet.
When the user wants to look into more details, he can zoom in to a specific part of the publication space. This is what Figure 3 depicts. The author names become clearly visible, so that the user can identify a particular author. The user can also click on paper nodes to get more information on the paper. To make it easier to identify which authors are more prolific in the field, the node size of the author is directly proportional to his number of publications. In Figure 3, for example, author Martin Wolpers has the largest number of publications and is a good candidate to use as a landmark in the exploration process.
4. EVALUATION
In this section, we describe how we have evaluated our first iteration. Subsections 4.1 and 4.2 elaborate on the setup of the evaluation. Subsection 4.3 discusses the results of the evaluation and finally, in subsection 4.4, we draw our conclusions from this evaluation.
4.1 Description
To evaluate the application, we deployed our tabletop in the main hall of the ECTEL 2010 conference [42]. This room was the main location for coffee breaks and figure 4 illustrates the tabletop setup.
The evaluation was conceived as a formative evaluation, in order to gather feedback on the design and implementation of the application from real users in a real life scenario. We followed the think aloud method, where the participants verbally describe their thoughts during the evaluation. In this way, the participants reveal their view on the system and possibly their misconceptions [28]. It started off with general questions (age, gender, profession, vision and left or right handed) about the participants together with their backgrounds. The participants were introduced to the application by asking them if they could explain what they saw. We also asked them one basic content-related question to get them started: “Find author x and find out how many papers he wrote in ECTEL 2007”. When needed, the participants were given extra explanation about the application. After this, the evaluation continued with tasks they had to perform. For each task, we noted whether the task succeeded, how fluently the task was performed and whether the participant needed help or not. Finally, the participants were asked for some general feedback and they filled in a small questionnaire about usefulness and ease of use. Each evaluation took between 20 and 30 minutes.
4.2 Participants
There was a total of 11 participants, aged between 27 and 60. All participants were researchers, right handed and all but one had corrected vision. Only 3 of the participants considered that they had a bit of experience with multitouch interaction, the other 8 said they had a lot of experience. Regarding experience towards tabletops or multitouch walls however, only one person described himself as experienced. To find out how experienced the participants were in the research area, they were asked about their years of experience in the Technology Enhanced Learning (TEL) research area, the number of papers published and how many of them published in TEL. Half of the participants claimed to have up to 3 years of experience and the other half claimed to have many years of experience. On average, the participants have published around 32 papers, from which 16 in the TEL area. Three participants have published more than 60 papers, from which 20 or more in the TEL area. Figure 5 shows in detail the number of published papers per participant.
Figure 4: Setting of the evaluation.
Figure 5: An overview of the number of papers the participants have written
4.3 Results
In this section, we describe the results of the evaluation. These results are grouped in three parts. First, we report on the tasks the participants had to perform, second, we summarize the most important feedback, and third, we take a look at the results from the questionnaire.
4.3.1 Tasks
B. Vandeputte, E. Duval, and J. Klerkx. Interactive sensemaking in authorship networks. Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces, ITS11, pp. 246–247, 2011.
Overview first, zoom and filter, details on demand
B. Vandeputte, E. Duval, and J. Klerkx. Applying design principles in authorship networks-a case study. In CHI EA’12: Proceedings of the 2012 ACM annual conference extended abstracts on Human Factors in Computing Systems, pages 741–744, 2012. (https://www.youtube.com/watch?v=R5CeTEejdBA)
Start with what you know, then grow
Search, show context, expand on demand
C. Tominski, S. Gladisch, U. Kister, R. Dachselt, and H. Schumann. A Survey on Interactive Lenses in Visualization. EuroVis State-of-the-Art Reports, Swansea, UK, Eurographics Association, 2014.
Fisheye
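One common way to implement a fisheye lens is a distortion function of the form g(t) = (d + 1)t / (dt + 1), applied to the normalised distance t from the focus. The sketch below applies it in one dimension; it is an illustration under that assumption rather than the specific lens shown on the slide, and the names `fisheye`, `focus` and `distortion` are hypothetical.

```python
def fisheye(x, focus, distortion):
    """Map a position x in [0, 1] through a 1-D fisheye centred on focus.

    distortion = 0 leaves positions unchanged; larger values magnify the
    neighbourhood of the focus and compress the periphery.
    """
    if x == focus:
        return x
    # Distance from the focus to the display boundary on the side of x.
    span = (1.0 - focus) if x > focus else focus
    t = abs(x - focus) / span
    g = (distortion + 1.0) * t / (distortion * t + 1.0)
    return focus + span * g if x > focus else focus - span * g


positions = [fisheye(x / 10, focus=0.5, distortion=3.0) for x in range(11)]
```

Points near the focus are pushed apart while the display boundaries stay fixed, which is what keeps the surrounding context visible.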
Coordinate views for linked, multi-dimensional exploration
• Tiled approaches (different widgets) let analysts see all information and selectors at once, minimizing distracting scrolling or window operations so they can concentrate on extracting and reporting insights.
• Layout organization tools will become decisive factors in creating an effective user experience
Orchestrate attention and mentally integrate patterns among views
Heer, J., & Shneiderman, B. (2012, February). Interactive Dynamics for Visual Analysis. ACM Queue - Microprocessors, 10 (2), p. 30. http://queue.acm.org/detail.cfm?id=2146416
Heer, J. & Kandel, S. (2012), Interactive Analysis of Big Data, XRDS, 19 (1)
Cluttered displays
Binned density scatterplot
Hexagonal bins instead of rectangular ones
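A minimal sketch of hexagonal binning, assuming hexagon centres on a triangular lattice with centre-to-centre distance `size`. That lattice is the union of two offset rectangular grids, so each point is assigned to the nearer of its closest centre in each grid. The function names and the doubled integer cell keys are choices made for this example.

```python
import math
from collections import Counter


def hexbin(points, size):
    """Count points per hexagonal cell; keys are (column, row) in half-steps."""
    row_height = size * math.sqrt(3)
    counts = Counter()
    for x, y in points:
        u, v = x / size, y / row_height
        # Nearest centre on grid A (integer steps) and on grid B (half-step offset).
        cell_a = (2 * round(u), 2 * round(v))
        cell_b = (2 * math.floor(u) + 1, 2 * math.floor(v) + 1)

        def squared_distance(cell):
            cx, cy = hex_centre(cell, size)
            return (x - cx) ** 2 + (y - cy) ** 2

        counts[min((cell_a, cell_b), key=squared_distance)] += 1
    return counts


def hex_centre(cell, size):
    """Convert a (column, row) key back to the centre coordinates of the hexagon."""
    column, row = cell
    return column * size / 2, row * size * math.sqrt(3) / 2


bins = hexbin([(0.1, 0.1), (0.5, 0.8), (0.45, 0.9), (2.0, 0.05)], size=1.0)
```

The number of cells depends on `size` and the extent of the data, not on how many points are binned, so the display stays readable as the data grows; colour each hexagon by its count to obtain the density plot.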
Perceptual scalability of a display should be limited by the chosen resolution of the data, not the number of records (Heer & Kandel, 2012)
Multi-variate data with 100s to 1000s of variables
“Wide” data
Visualizations might help reveal multidimensional patterns
Use the power of the machine to find a proxy in the data that predicts the selected variables
Depending on their specific questions, domain experts might select a subset of variables they are interested in
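One simple way to let the machine propose a proxy is to rank all other variables by the strength of their linear correlation with the variable the expert selected. The sketch below uses Pearson correlation as a stand-in for whatever predictive model is actually used; the function names and the column dictionary are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return covariance / (spread_x * spread_y)


def rank_proxies(columns, target):
    """Rank every other column by absolute correlation with the target column."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


columns = {
    "target": [1, 2, 3, 4, 5],
    "a": [2, 4, 6, 8, 10],
    "b": [5, 3, 4, 1, 2],
    "c": [1, 1, 1, 1, 1],
}
ranking = rank_proxies(columns, "target")
```

With hundreds or thousands of variables, only the top few of such a ranking would be visualised next to the selected variables.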
http://www.perceptualedge.com/blog/?p=2046
In this day of so-called Big Data, organizations are scrambling to implement new software and hardware to increase the amount of data that they collect and store. In so doing they are unwittingly making it harder to find the needles of useful information in the rapidly growing mounds of hay. If you don’t know how to differentiate signals from noise, adding more noise only makes matters worse.
Monday, June 1st, 2015
When we rely on data for decision making, how do we tell what qualifies as a signal and what is merely noise? In and of itself, data is neither. It is merely a collection of facts. When a fact is true, useful, and deserves a response, only then is it a signal. When it isn’t, it’s noise. It’s that simple (Few, http://www.perceptualedge.com/blog/?p=2046, 2015)
Overview First, Zoom and Filter, Details-on-Demand (Ben Shneiderman)
Analyze First, Show the Important, Zoom and Analyse Further, Details-on-Demand (Daniel Keim)
Interactive analysis tools can help quell “big data” by augmenting our ability to manipulate and reason about it (Heer & Kandel, 2012)
In the face of a data deluge, what remains relatively constant is our own cognitive ability to make sense of the data and reach reliable, informed decisions. Big data is of little help when decoupled from sound judgment.
J. Heer
Large data is a wild beast and you’d better treat it with the right tools. Visualization is a great tool to convey what automatic data analysis algorithms discover. And often it is a very challenging task! What the algorithms spit is exciting new complex data that requires creativity and knowledge as well.
▪ Interactive visualization of a million items. J.D. Fekete and C. Plaisant.
▪ Random Sampling as a Clutter Reduction Technique to Facilitate Interactive Visualisation of Large Datasets. G. Ellis (part of it in collab. with yours truly).
▪ A Sampling Approach to Deal with Cluttered Information Visualizations. E. Bertini (my phd thesis).
▪ TreeJuxtaposer: Scalable Tree Comparison using Focus+Context with Guaranteed Visibility. T. Munzner, F. Guimbretiere, S. Tasiran, L. Zhang, and Y. Zhou.
▪ Beyond visual acuity: the perceptual scalability of information visualizations for large displays. B. Yost, Y. Haciahmetoglu, and C. North.
▪ Extreme visualization: squeezing a billion records into a million pixels. B. Shneiderman.
▪ Measuring Data Abstraction Quality in Multiresolution Visualization. Q. Cui, M. O. Ward, E. A. Rundensteiner, and J. Yang.
▪ imMens: Real-time Visual Querying of Big Data. Zhicheng Liu, Biye Jiang, Jeffrey Heer.