Defining a Digital Storytelling Discipline: Learning, Skills, and Knowledge
John Wihbey, Northeastern University
@wihbey
Case study: Northeastern University undergrads working on Boston Police Department data, “as is”, in a general digital skills course
Murder data from the 1960s
City of Boston - Homicide data obtained through public records request
Murder data from the 1990s
City of Boston - Homicide data obtained through public records request
Murder data from the 2000s
City of Boston - Homicide data obtained through public records request
(http://the-accidental-housewife.blogspot.com/)
29 problems. 1 assignment.
List of problems/errors in structure and format of homicide data
1. Inconsistencies in case column, e.g. “01/06” vs “ ’09/06 ”
2. No indication of meaning of red text
3. No key for case column IDs
4. Different text formats/styles for entire rows and cells
5. Inconsistent descriptions of intersection addresses, e.g. “Washington @ Cedar St” vs “Willowood & Woodrow Ave” vs “Shawmut Ave. / Dwight”
6. No key for weapon codes
7. Race and gender are collapsed into single column
8. No codes for race/gender (race: “W”, “B”; gender: “M”, “F”, “H”)
9. Some R/G codes are “W/H/M”, “B/N/H”, making it impossible to systematically split columns into 2 using the only delimiting character (/)
10. Some R/G codes have NO delimiter (e.g., 2009 sheet), so cannot split at all
11. Data for 2007 and later have two additional columns not in 2006: “defendant” and “DOA” (no indication of what DOA means)
12. Some rows have merged cells
13. Some merged cells have multiple values
14. Missing data/empty cells – what do these mean?
List of problems/errors in structure and format of homicide data (cont.)
15. Location data is incomplete – no zip code information, and Boston is assumed as the city (except in cases where “Dor” is appended at end of address)
16. Only the first couple of sheets have column headers; for the remaining sheets, headers have to be assumed to follow those of the labeled sheets
17. Mysterious unexplained extra characters in date columns (e.g., (w) and xxx)
18. Inconsistent syntax for times: 12:00am, 7:10pm, 02:16am, 2:56hrs, 15oo hrs (letter “o” instead of number 0), 1:49 AM, 7:24 PM, 21:25, 21:32 PM, 12:39P.M., 2:22p.m., 1:49:pm, 12;24AM, :14 am
19. Inconsistent syntax for dates: “07/24/2006” vs “2006/7/24” vs “6/31/64” vs “08/31/06”
20. Inconsistent syntax for age: “1’7”
21. Sheets for 2012/13/14 have new columns not in previous sheets
22. Motive/Relation columns look identical but are not both present in all sheets; impossible to know which is which in sheets without column headers
23. Simple spelling errors: “Tauma”
24. Inconsistent coding: Unk, UNK, unk
25. Unexplained “DV” column that only appears in 2013
26. No explanation for meaning of row breaks – are these separating data rows into groups of some sort? Are these data that once existed but were removed?
27. Multiple columns with same (non-unique) headers – “R/G”, “Age”, “DOB” for both VICTIM and DEFENDANTS
28. Inconsistent district labels and squadron personnel names
29. For merged cells holding multiple values/names, have to assume the respective values in adjacent cells are provided in the same order
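A minimal sketch of how two of the problems above (18, inconsistent time syntax, and 24, inconsistent "unknown" coding) might be handled in Python. This is an illustration, not what the students actually did. The function names `normalize_time` and `normalize_unknown` are hypothetical, and the rules are assumptions: a value with no hour (":14 am") is returned as None for manual review rather than guessed, and a PM marker on an already 24-hour value ("21:32 PM") is ignored.

```python
import re


def normalize_time(raw):
    """Return a 24-hour 'HH:MM' string, or None if the value is ambiguous."""
    s = raw.strip().lower()
    # "12;24AM" uses a semicolon; "p.m." / "P.M." carry stray periods.
    s = s.replace(";", ":").replace(".", "")
    # Letter "o" typed for zero, fixed only in the leading digit run ("15oo hrs").
    s = re.sub(r"^[\do:]+", lambda m: m.group().replace("o", "0"), s)
    # Optional hour, optional colon, two-digit minute, optional stray colon,
    # optional suffix (am, pm, or hrs).
    m = re.fullmatch(r"(\d{1,2})?:?(\d{2}):?\s*(am|pm|hrs)?", s)
    if m is None or m.group(1) is None:
        return None  # unreadable, or hour missing (":14 am"): do not guess
    hour, minute, suffix = int(m.group(1)), int(m.group(2)), m.group(3)
    if hour > 23 or minute > 59:
        return None
    if suffix == "am" and hour == 12:
        hour = 0  # 12:00am is midnight
    elif suffix == "pm" and hour < 12:
        hour += 12  # 7:10pm becomes 19:10; 21:32 pm is left at 21:32
    return f"{hour:02d}:{minute:02d}"


def normalize_unknown(value):
    """Collapse 'Unk', 'UNK', 'unk' into one code; leave other values alone."""
    return "UNK" if value.strip().lower() == "unk" else value


# Example: run the time strings quoted in problem 18 through the cleaner.
samples = ["12:00am", "15oo hrs", "12:39P.M.", "1:49:pm", "12;24AM", ":14 am"]
cleaned = [normalize_time(t) for t in samples]
```

Returning None instead of a best guess keeps the #GIGO risk visible: rows that cannot be read safely stay flagged for a clarification request rather than silently entering the analysis.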
Existential experience of: #datafail & #GIGO risk
Good student requests for clarification
Many noble student attempts at cleaning, analysis, exploration, viz:
Data viz using Plot.ly
Data viz using Carto
Google Maps
Fun with line graphs - an attempt to look at time-of-day patterns
Experiments in viz for exploratory purposes
D'où Venons Nous / Que Sommes Nous / Où Allons Nous (Where Do We Come From / What Are We / Where Are We Going) - Paul Gauguin, 1897
2000
2005
2016
Wikimedia
1774
(Nielsen 2006 via www.nngroup.com)
G. Chaucer, The Canterbury Tales (courtesy: library.arizona.edu)
1400s
https://projects.propublica.org/docdollars/
Fatal Force https://www.washingtonpost.com/graphics/national/police-shootings/
http://www.nytimes.com/interactive/2012/01/15/business/one-percent-map.html
http://www.npr.org/news/graphics/2011/10/toxic-air/#4.00/39.00/-84.00
https://offshoreleaks.icij.org/#_ga=1.76851094.2020983486.1475355003
http://projects.latimes.com/value-added/
Labnol.org
Pew Research Center
Polarized Crowd: Two large dense groups with little interconnection
Tight Crowd: Highly interconnected group with few isolated participants
Brand clusters: Products, services, celebrities discussed by disparate persons
Community clusters: Popular topics attracting multiple smaller groups
Broadcast networks: Media-centric, with audience proliferating information
Support network: Customer complaints, with hub-and-spoke dynamics
Six degrees - Wikimedia
Facebook friends network
Nebraska local politicians; network graph - Matt Waite
http://www.poynter.org/2013/how-to-visually-explore-local-politics-with-network-graphs/218543/
Chicago community - homicide network (Andrew Papachristos et al.)
NYTimes.com
Gary King, et al, IQSS, Harvard
Global supply chain, Sourcemap.com
Opte Project
D'où Venons Nous / Que Sommes Nous / Où Allons Nous (Where Do We Come From / What Are We / Where Are We Going) - Paul Gauguin, 1897