Intro to Data Vis for the Humanities, Nov 2013

Post on 11-Sep-2014

Category:

Education

DESCRIPTION

An extensive but high-level look at principles, methods, and tools, drawing on a couple of case studies in the use of data visualisation for humanities research.

Transcript

Data Visualisation in the Digital Arts and Humanities

Tools, Methods & Techniques to Put Your Research Data to Work

Shawn Day
Queen’s University Library

Objective
‣ To appreciate the rich variety of techniques and tools available to digital humanities scholars for data visualisation and analysis. This workshop will introduce the varied use of data visualisation in the humanities through examples and case studies, and will hopefully inspire you to some hands-on fun.

The beginning of a conversation …

Upcoming Seminars and Workshops
‣ 18 November - A Survey of Digital Humanities
‣ 2 December - Engaging Your Audience with Your Research Data (Exhibit)
‣ 9 December - Telling Stories with Data – Collections Visualisation for Arts and Humanities Scholars (OMEKA)
‣ January - Digital Project Management
‣ February - Hands On Workshop – Data Visualisation for Presentation
‣ February - Social Scholarship – Tools for Collaborative Research
‣ March - Data Visualisation for Textual and Spatial Analysis
‣ More to come: http://qubdh.co.uk

Agenda
‣ Introduction
‣ What is Data Visualisation
‣ Why Visualise Data?
‣ Case Studies
‣ Things to Visualise
‣ Ways to Visualise
‣ Tools for Visualisation

Breakpoint
‣ One of the keys to good visualisation is understanding what your immediate (and longer term) goals are.
‣ Are you visualising data to understand what’s in it, or are you trying to communicate meaning to others?
‣ You - Visualisation for Data Analysis
‣ Share with Others - Visualisation for Presentation

Why Visualise? The Basics
‣ Open Up Large Datasets
‣ Increase Density of Observable Data
‣ Reduce Complexity
‣ Aestheticise Data
‣ Illustrate an Interpretation
‣ Make an Argument

Why Visualise? The Psychology and Physiology
‣ Bypass language centres to tap directly into the visual cortex;
‣ Leverage ability to recognise patterns - what they call visual sense-making;
‣ Powerful graphics engines now allow for live data processing and sophisticated animations and interactive research environments.

Why Visualise? From a Data Perspective
‣ Can link different formats
‣ Can share more easily with others
‣ Can see new meanings and connections
‣ Sort and reorganise in an automated fashion
‣ Manage larger amounts of information
‣ Visualise your results

Why Visualise? For Humanities Research
‣ Work with new data to create new knowledge
‣ Explore data to discover things that used to be unknown, unknowable or impractical to know
‣ Take a new perspective on the familiar to reveal previously hidden insights

Data Visualisation has definitely hit the big-time
‣ Guardian Awards
‣ New York Times
‣ Why?

Tourists vs Locals, Eric Fischer, 2010 - Flickr

Visualise New Information

Red - Tourists Blue - Locals Yellow- NA

Areas of Interest

Crowdsourcing

Visualising New Information

Flickr Flow, Martin Wattenberg and

The Familiar through New Eyes

The London Times Atlas

Joanna Kamradt and Christian Tate

How Could You Use Data Analysis?
‣ “In the Lab” - for your own analysis
‣ Online as part of collaborative groups
‣ Through dissemination for extension of own work - crowdsourcing
‣ Others?

Case Study: The Time Strip

Visualisation Objective
‣ Exploring the ‘ordinary’ lives of rural pioneers/farmers in nineteenth-century Ontario

Canada

Ontario

South Western Ontario

Farm Journal Raw Materials
‣ 100s of pages
‣ Varying hands
‣ Varying quality
‣ Columns
‣ No Context

William Sunter Farm Diary, 1858

Medical Diary by BlueChillies

Example: Medical Diary

History flow by Martin Wattenberg and Fernanda Viegas

Example: History Flow

Mechanics of the Process
‣ Generate word frequency (Voyant, TAPoR)
‣ Isolate known farm activities (NLP - LanguageWare)
‣ Collocate to link activity references to time, duration, and resources (Voyant)
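
The first and last steps above (word frequency, then collocation) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Voyant, TAPoR or LanguageWare work internally; the diary entries, the function names and the window size are all assumptions made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a diary entry and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(entries):
    """Count how often each word occurs across all entries."""
    counts = Counter()
    for entry in entries:
        counts.update(tokenize(entry))
    return counts


def collocates(entries, keyword, window=3):
    """Count words that appear within `window` tokens of `keyword`."""
    counts = Counter()
    for entry in entries:
        tokens = tokenize(entry)
        for i, token in enumerate(tokens):
            if token == keyword:
                start = max(0, i - window)
                neighbours = tokens[start:i] + tokens[i + 1:i + window + 1]
                counts.update(neighbours)
    return counts


# Hypothetical diary lines, invented for this sketch.
diary = [
    "Cut hay in the north field all day",
    "Drew hay to the barn with the team",
    "Rain, mended harness in the morning",
]

freq = word_frequencies(diary)
near_hay = collocates(diary, "hay", window=2)
print(freq.most_common(3))
print(near_hay.most_common(3))
```

Words that repeatedly turn up near an activity term ("day", "team", "barn") are the candidates for the time, duration and resource links that the time strip then plots.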

The Result/ New Patterns

The Result / New Patterns
‣ Less time haying
‣ The impact of technology
‣ More tasks faster

How Else Could this be done?

What is the Value of this Visualisation?
‣ Easier to compare over intervals
‣ Multiple vectors with greater granularity in a compressed space
‣ The challenge is to find rich enough source materials to yield substantive datasets

Case Study: The Tree Map

Example: Newsmap

http://newsmap.jp/

Example: Panopticon

Ben Shneiderman and Hard Drive Space

Example: Bachelor’s Degrees 2011

Ben Schmidt, 2013 http://benschmidt.org/Degrees/2011Overview/

Case Study: Occupations of Politicians
‣ What are we studying?
  • Self-declared occupations of politicians
‣ Why?
  • What bias might they bring to their job?
‣ How?
  • Visualising past occupation and mapping it to the political platform of the affiliated party
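
To make the tree map idea behind this case study concrete, here is a minimal sketch of the simplest tree map layout, "slice and dice": a rectangle is cut into strips whose areas are proportional to each category's count. The occupation counts are hypothetical, and real tree map tools generally use more refined layouts (for example, ones that keep tiles closer to square), so this is an illustration rather than the method used in the slides.

```python
def slice_and_dice(weights, x, y, width, height, vertical=True):
    """Split a rectangle into strips whose areas are proportional to `weights`.

    Returns {label: (x, y, width, height)}.
    """
    total = sum(weights.values())
    rects = {}
    offset = 0.0
    for label, weight in weights.items():
        share = weight / total
        if vertical:
            # Strips laid side by side along the x axis.
            rects[label] = (x + offset, y, width * share, height)
            offset += width * share
        else:
            # Strips stacked along the y axis.
            rects[label] = (x, y + offset, width, height * share)
            offset += height * share
    return rects


# Hypothetical counts of self-declared occupations in one parliament.
occupations = {"lawyer": 50, "farmer": 30, "merchant": 20}

layout = slice_and_dice(occupations, 0, 0, 100, 60)
for label, rect in layout.items():
    print(label, rect)
```

Nesting is obtained by calling the same function again inside one of the returned rectangles with `vertical` flipped, for instance to split "lawyer" by party.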

Occupations of MPs in the 2nd Canadian Parliament

Occupations of MPs in the 37th Canadian Parliament

Occupations of TDs in the 30th Dáil Éireann

The Result / New Patterns
‣ The emergence of the professional politician with no private sector experience
‣ Occupational continuity across changes in governing party
‣ http://dev.dho.ie/~sday/dail/index.html

How Else Could this be Done?

The Value of Data Vis for Analysis
‣ New ways of presenting allow new ways of seeing
‣ Hidden patterns become evident
‣ Suggest other hypotheses to test for
‣ Good research raises more questions than answers

People demanding more…
‣ Interactivity
‣ Involvement
‣ Action
‣ Participation
‣ Web 2.0 … 3.0 ….

General Steps in Data Vis for DH
1. Discovery / Acquisition
2. Cleaning / ‘Munging’
3. Analysis / Exploratory Vis
4. Presentation

Types of Data to Visualise
‣ Audio Data
‣ Categorical Data
‣ Cartographic Data
‣ Collections
‣ Image Data
  • Still
  • Moving
‣ Metadata
‣ Multimedia Data
‣ Network Data
  • Social
  • Other
‣ Numerical Data
‣ Temporal Data
‣ Textual Data
  • Narrative
  • Qualitative
‣ ????

Audio Data
‣ Spectrogram
‣ Wave forms
‣ Notes
‣ Frequency
‣ Beats

Visualisation of "Canada is Really Big" by The Arrogant Worms, http://www.sonicvisualiser.org/

Audio Data
‣ What does sound look like?

Audio Data: The Shape of Song
‣ http://www.turbulence.org/Works/song/index.html
‣ Measuring Musical Patterns using Translucent Arcs
‣ Repetition

Philip Glass, Candyman 2

Madonna, Like a Prayer

Audio Data: IBM ‘Glass Engine’

http://www.philipglass.com/glassengine/

Categorical Data
‣ Data is grouped into categories based on a qualitative trait,
‣ The resulting data represents the labels of these groups.
‣ Nominal, Ordinal and/or Binary

Cartographic Data
‣ Communicate spatial information

Cartographic Visualisation

http://maps.stamen.com/watercolor/#13/53.3355/-6.2181

Digital Collections
‣ Collections of data, images, movies, sound … etc
  • Visualise the object in context as part of collection
  • Represent the structure of the collection

Google Art Project: Visualising Museum Collections

Digital Collection Visualisation

Digital Still Image Data
‣ Colour
‣ Texture
‣ Shape
‣ Content
‣ Format
‣ Metadata
‣ Luminosity/Hue/Saturation/Range

Digital Moving Image Data
‣ Adding Data on:
  • Narrative
  • Length
  • Frame rate
  • Sound/Image
  • Key Frames
  • Storyboard

Metadata

Numerical/Quantitative Data
‣ Does anyone really need me to tell them about this?
  • Analysed using statistical methods
  • Displayed using tables, charts, histograms and graphs…

Social Network Data
‣ Nodes and Edges
‣ Representing relations between objects, and quantifying and qualifying those relations
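
A minimal sketch of what "nodes and edges" means as data, assuming a small, invented correspondence network: each edge carries a kind (qualifying the relation) and a weight (quantifying it). The names, the counts and the helper functions are hypothetical; a tool such as Gephi would import the same information as an edge list.

```python
from collections import defaultdict

# Hypothetical edge list: (person, person, kind of tie, letters exchanged).
EDGES = [
    ("Ada", "Bram", "family", 12),
    ("Ada", "Cora", "business", 3),
    ("Bram", "Cora", "business", 5),
    ("Cora", "Dev", "family", 1),
]


def build_graph(edges):
    """Build an undirected graph as {node: {neighbour: edge attributes}}."""
    graph = defaultdict(dict)
    for a, b, kind, weight in edges:
        graph[a][b] = {"kind": kind, "weight": weight}
        graph[b][a] = {"kind": kind, "weight": weight}
    return graph


def degree(graph):
    """Number of distinct neighbours per node."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def strength(graph):
    """Sum of edge weights per node (here: letters exchanged)."""
    return {
        node: sum(edge["weight"] for edge in nbrs.values())
        for node, nbrs in graph.items()
    }


graph = build_graph(EDGES)
print(degree(graph))
print(strength(graph))
```

Degree and strength can disagree: in this toy data Cora has the most contacts, while Bram exchanges the most letters.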

Temporal Data
‣ Show changes over time
‣ Show temporal clusters

Xerox PARC, Stuart K. Card, George G. Robertson, Jock D. Mackinlay

http://www.itc.nl/personal/kraak/

Different Ways of Seeing Time

http://www.edwardtufte.com/tufte/posters

Combining Time and Space

Quantitative Textual Visualisation

Textual - Qualitative
‣ Textual attributes graphically represented
  • Frequency
  • Collocation
  • Adjacency

Textual - Narrative

Time, Space, Narrative: MythEngine

http://www.bbc.co.uk/blogs/researchanddevelopment/2010/03/the-mythology-engine-represent.shtml

General Steps in Data Vis for DH
1. Discovery / Acquisition
2. Cleaning / ‘Munging’
3. Analysis / Exploratory Vis
4. Presentation

Step 1 Discovery / Acquisition

An Iterative Process

ACQUIRE → PARSE → FILTER → MINE → REPRESENT → REFINE → INTERACT
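
As a rough illustration of how the first five stages chain together, here is a small Python sketch. The CSV content, the column names and every function name are assumptions invented for the example; the refine and interact stages are left out because they depend on the presentation tool.

```python
import csv
import io
from collections import Counter

# Hypothetical raw data: one diary activity per row, one row missing its year.
RAW = """year,activity
1858,haying
1858,ploughing
,haying
1861,haying
1872,threshing
"""


def acquire():
    """Acquire: in practice a download or a file read; here an inline string."""
    return RAW


def parse(text):
    """Parse: turn raw text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows):
    """Filter: keep only rows with a usable year."""
    return [row for row in rows if row["year"].strip().isdigit()]


def mine(rows):
    """Mine: count entries per decade."""
    return Counter((int(row["year"]) // 10) * 10 for row in rows)


def represent(counts):
    """Represent: a plain-text bar chart, one line per decade."""
    return "\n".join(
        f"{decade}s {'#' * n}" for decade, n in sorted(counts.items())
    )


chart = represent(mine(filter_rows(parse(acquire()))))
print(chart)
```

Because each stage is a separate function, the process stays iterative: a surprise in the chart usually sends you back to adjust the filter or the parsing rather than the drawing.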

Visualising What?
‣ Basic types of content that we are used to dealing with:
  • Text
  • Numbers
  • Image
  • Video
‣ Other, more “complex” stuff:
  • Relations, connections, links - a genealogy
  • Time and space coords - the path of migratory birds
  • Animations – a piece of courseware
  • 3D models – the plan of your house

Acquisition: Public Data Sources
‣ CSO: Data Formats
‣ The Data Hub: Linked Data

Acquisition: Public Data Sources

Cleaning / Munging (Normalisation, Format Conversion)
‣ Tools:
  • Data Wrangler
  • Google Refine
  • Mr. Data Converter
‣ Data Wrangler
  • Performs simple split, clear, and fold/unfold transforms on data
  • See example --> Data and Script
‣ Google Refine
  • Works with larger datasets
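
The fold/unfold transforms mentioned above are the usual "wide to long" and "long to wide" reshapes. A minimal sketch in plain Python, assuming a small hypothetical census table; this shows the general idea only and is not Data Wrangler's own implementation or script syntax.

```python
def fold(rows, key, variable_name="variable", value_name="value"):
    """Wide to long: one output row per (key, column) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column != key:
                long_rows.append(
                    {key: row[key], variable_name: column, value_name: value}
                )
    return long_rows


def unfold(long_rows, key, variable_name="variable", value_name="value"):
    """Long to wide: values of `variable_name` become column headers again."""
    wide = {}
    for row in long_rows:
        record = wide.setdefault(row[key], {key: row[key]})
        record[row[variable_name]] = row[value_name]
    return list(wide.values())


# Hypothetical population counts per parish and census year.
table = [
    {"parish": "A", "1851": 120, "1861": 140},
    {"parish": "B", "1851": 80, "1861": 95},
]

long_table = fold(table, "parish")
round_trip = unfold(long_table, "parish")
print(long_table[0])
```

Many charting tools expect the long form (one observation per row), so a fold is often the last cleaning step before visualising.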

Open Data/Linked Data

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Munging Tool: Data Wrangler
‣ http://vis.stanford.edu/wrangler/app/

Cleaning Exercise

Alternate: Google Refine
‣ http://code.google.com/p/google-refine/

Alternate: Mr Data Converter
‣ http://shancarter.com/data_converter/

Now You’ve Got Data ...
‣ What’s Next?
‣ Data Visualisation in the Analysis Process
‣ Data Visualisation for Presentation

General Steps in Data Vis for DH
1. Discovery / Acquisition
2. Cleaning / ‘Munging’
3. Analysis / Exploratory Vis
4. Presentation

Breakpoint
‣ Are you visualising data to understand what’s in it, or are you trying to communicate meaning to others?
‣ You - Visualisation for Data Analysis
‣ Share with Others - Visualisation for Presentation

Google NGram Viewer
‣ Examine word frequency in digitised books
‣ Currently about 4% of books ever published
‣ In English, Chinese, French, German, Hebrew, Russian, and Spanish
‣ Changes in word usage
‣ Trends
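
The quantity an n-gram chart plots is a relative frequency: how often a phrase occurs in a given year divided by the number of same-length phrases counted for that year. A minimal sketch of that calculation follows, using a tiny invented corpus; Google's own pipeline is not described in these slides, so the function names and data here are assumptions.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(corpus, phrase):
    """Share of n-grams per year that match `phrase`.

    `corpus` is a list of (year, text) pairs.
    """
    n = len(phrase.split())
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in corpus:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += grams.count(phrase.lower())
    return {
        year: hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# Hypothetical two-"book" corpus, invented for this sketch.
corpus = [
    (1850, "the steam engine and the steam plough"),
    (1900, "the motor car and the motor engine"),
]

trend = relative_frequency(corpus, "steam engine")
print(trend)
```

Dividing by the yearly total matters because far more books survive from later years; raw counts alone would make almost every word appear to rise.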

Google NGram Viewer

http://books.google.com/ngrams/graph

The Value of Data Vis for Analysis
‣ New ways of presenting allow new ways of seeing
‣ Hidden patterns become evident
‣ Suggest other hypotheses to test for
‣ Good research raises more questions than answers

Data Analysis Principles
1. Process is a Way of Thinking, not a Substitute for Thinking
2. Data needs to be considered and reported in Context
3. Look Before you Leap - Get to Know Your Data
4. Question Everything - Collection Process, Bias, etc.
5. Do a Gut Check
6. Coincidence is Not the Same as Causality
7. Just Because Data Exists Doesn’t Mean it’s Relevant

Fern Halper - Seven Guiding Principles

Analysis / Exploratory Visualisation

Orange

http://orange.biolab.si/

Text Analysis: Voyant

http://voyeurtools.org

Gephi: Analysis and Discovery of Networks

Where to Keep up with the Community
‣ Highbrow: http://osc.hul.harvard.edu/highbrow
‣ http://chronicle.com/blogs/profhacker
‣ Flowing Data: http://flowingdata.com
‣ Perceptual Edge: http://www.perceptualedge.com
‣ Info is Beautiful: http://www.informationisbeautiful.net
‣ Visualising Data: http://www.visualisingdata.com
‣ Infosthetics: http://infosthetics.com
‣ Datavisualisation.ch: http://datavisualization.ch
‣ Dig Hum Specialist: https://dhs.stanford.edu/the-digital-humanities-as

New Perspectives on Old Data

Presenting Your Data Visually

Objectives
‣ Consider best practices in sharing research findings using visualisation tools;
‣ Identify and judge between publicly available tools to create and deploy humanities visualisation research products;
‣ Consider data visualisation as part of a larger research discussion.

General Steps in Data Vis for DH
‣ Discovery / Acquisition
‣ Cleaning / ‘Munging’
‣ Analysis / Exploratory Vis
‣ Presentation

Academic Visualisation?
There are lots of published papers out there

http://www.autodeskresearch.com/projects/citeology

The Life of an Idea through Citations

Data Visualisation Lessons from Tufte
‣ Show the Data
‣ Provoke Thought about the Subject at Hand
‣ Avoid Distorting the Data
‣ Present Many Numbers in a Small Space
‣ Make Large Datasets Coherent
‣ Encourage Eyes to Compare Data
‣ Reveal Data at Several Levels of Detail
‣ Serve a Reasonably Clear Purpose
‣ Be Closely Integrated with Statistical and Verbal Descriptions of the Dataset

What Visual Techniques Exist?
‣ Connecting your data with the right visualisation
‣ What is your message?
‣ How do we know what we might use?
‣ Start with your Exploratory/Research/Analytical Environment
‣ How do visuals fit into your narrative?

Connecting your data with the right visualisation

What Visual Techniques Exist?

Visual Everything

Connecting your data with the right visualisation

Structured Data Presentation Tools (a tiny subset)
‣ Webservices
  • Temporal: TimeFlow
  • Google Fusion Tables
  • Textual, Spatial and Numeric: Many Eyes
  • Temporal: Dipity
  • Infographics: Visual.ly
‣ Frameworks
  • GraphViz
  • Gephi
  • Prefuse
  • D3
  • Processing
  • Exhibit (Exercise)

TimeFlow
‣ Journalism
‣ Getting the flow of events and facts straight
‣ http://flowingmedia.com/timeflow.html
‣ Great for historians

Google Fusion Tables
‣ Initially Exploratory and useful for ‘Munging’
‣ Allows for Embedding
‣ And for User Interaction
‣ Transparency
‣ Experimental (Good)
‣ http://www.google.com/fusiontables/Home/

Many Eyes
‣ http://www-958.ibm.com
‣ Rich, Varied and Accessible
‣ Free Rapid Prototyping

Visual.ly

Visual.ly
‣ Well crafted Infographics gaining credibility
‣ The new poster presentation
‣ Data-driven narrative in words and pictures
‣ Visual.ly currently driven by social media

Dipity

Frameworks and Languages
‣ GraphViz
‣ R Programming Language
‣ JIT (JavaScript Infovis Toolkit)
‣ Protovis
‣ D3
‣ Processing
‣ Tableau
‣ Prefuse
‣ Gephi
‣ WEAVE (http://www.oicweave.org/)
‣ Exhibit (Exercise)

Graphviz
‣ An Open Source Framework
‣ Mature (1988)
‣ AT&T Labs
‣ Used as a basis for subsequent
‣ A great prototyping and starting point
‣ http://www.graphviz.org/
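
Graphviz reads graph descriptions written in its plain-text DOT language, which is part of why it works well for prototyping. The sketch below writes an undirected DOT graph from an edge list; the node names and the `to_dot` helper are hypothetical, and only the basic `graph { "a" -- "b"; }` form of DOT is used.

```python
def to_dot(edges, name="G"):
    """Render an undirected edge list as Graphviz DOT text."""
    lines = [f"graph {name} {{"]
    for a, b in edges:
        # Quote node names so spaces and punctuation are safe.
        lines.append(f'    "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical ties between people mentioned in a farm diary.
edges = [
    ("Farmer", "Miller"),
    ("Farmer", "Blacksmith"),
    ("Miller", "Blacksmith"),
]

dot_text = to_dot(edges)
print(dot_text)
```

Saving the output as, say, `ties.dot` and running `dot -Tpng ties.dot -o ties.png` (with Graphviz installed) should produce a drawing of the network.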

R Programming Language
‣ Geared towards statistical analysis
‣ More recently has had some powerful graphics frameworks added
‣ Open Source
‣ Typically Command Line but a variety of GUI editors available
‣ Jeff Rydberg-Cox: R for the Digital Humanities

JavaScript InfoVis Toolkit (JIT)
‣ JIT Demos (http://thejit.org/demos/)
‣ The JavaScript InfoVis Toolkit is a complete set of tools to create Interactive Data Visualizations for the Web. It includes JSON loading, animation, 2D point and graph classes and some predefined tree visualization methods.
‣ Smaller datasets in a clean form
‣ Related and Aggregated/Categorised Data

JavaScript InfoVis Toolkit (JIT)

JavaScript InfoVis Toolkit (JIT)

ProtoVis
‣ Protovis is a visualization toolkit for JavaScript using SVG. It takes a graphical approach to data visualization, composing custom views of data with simple graphical primitives like bars and dots. These primitives are called marks, and each mark encodes data visually through dynamic properties such as color and position.
‣ Jerome Cukier: ProtoVis Tutorial
‣ Development shifted to D3
‣ ProtoVis still very accessible and usable

http://mbostock.github.com/protovis/ex/crimea-rose.html

ProtoVis

http://mbostock.github.com/protovis/ex/napoleon.html

ProtoVis

D3
‣ D3 allows you to bind arbitrary data to a Document Object Model (DOM), and then apply data-driven transformations to the document. As a trivial example, you can use D3 to generate a basic HTML table from an array of numbers. Or, use the same data to create an interactive SVG bar chart with smooth transitions and interaction.
‣ Open Source

http://www.visualizing.org/full-screen/16266

D3

Processing
‣ Now we are getting serious...
‣ Ben Fry
‣ Like R, has a serious statistical bent
‣ Has a client and development environment, but deploys easily to the web using processing.js
‣ Large and very large datasets
‣ Good with related data
‣ Serious support for aesthetics
‣ Modelling Environment
‣ http://processing.org/
‣ http://www.openprocessing.org/

Processing.js

http://nytlabs.com/projects/cascade.html

Processing.JS

Tableau
‣ Commercial
‣ Offers a Free Public Application
‣ Encourages sharing and focusses on building a narrative around visualisation of your research data
‣ Education and Non-Commercial Licenses available
‣ Mature and evolving rapidly to demonstrate the newest and most exciting visualisation types

http://www.tableausoftware.com/public

Tableau

Prefuse
‣ flare.prefuse
‣ Flash-based
‣ Great transitions and very approachable
‣ Beware of Datalocking
‣ http://flare.prefuse.org/demo

Gephi
‣ Open Source
‣ Mapping and Visualising Relationships and Networks
‣ An outstanding Visual Development Environment
‣ Multiplatform
‣ Extensible!!
‣ https://gephi.org/

Gephi

Gephi

Thank You
Shawn Day - s.day@qub.co.uk - @iridium

The Library/Institute for Collaborative Research in the Humanities
18 University Square
Ground Floor
http://qubdh.co.uk
