Figures of the Many - Quantitative Concepts for Qualitative Thinking
Post on 20-Nov-2014
Figures of the Many: Quantitative Concepts for Qualitative Thinking
Bernhard Rieder, Universiteit van Amsterdam, Mediastudies Department
Context
Terms like "big data", "computational social science", "digital humanities", "digital methods", etc. are receiving a lot of attention.
They point to a set of practices for knowledge production: data analysis, visualization, modeling, etc.
Instead of a totalizing search for a "logic" of data analysis, we could inquire into the vocabulary of analytical gestures that constitute the practice of data analysis.
A twofold approach to methods:
☉ Engagement, development, application => digital methods
☉ Conceptual, historical, and political analysis and critique => software studies
This presentation
How do we talk about data? How do we analyze them? What is our frame of thought? How do we go further in terms of imagination, expressivity?
☉ 1 / Confronting "the many"
☉ 2 / Two kinds of mathematics
☉ Objects and their properties => Statistics
☉ Objects and their relations => Graph theory
Engage the theory of knowledge (epistemology) mobilized in data analysis, but through the actual techniques and not generalizing concepts.
What styles of reasoning?
Hacking (1991) builds the concept of "style of reasoning" on A. C. Crombie’s (1994) "styles of scientific thinking":
☉ postulation and deduction
☉ experiment and empirical research
☉ reasoning by analogy
☉ ordering by comparison and taxonomy
☉ statistical analysis of regularities and probabilities
☉ genetic development
What kind of reasoning are we mobilizing in data analysis?
Is the history of styles of reasoning simply intellectual progress, or adaptation to a changing world, or co-constitutive of that world?
What is our world like?
"It is hard to believe that we still have to absorb the same types of actors, the same number of entities, the same profiles of beings, and the same modes of existence into the same types of collectives as Comte, Durkheim, Weber, or Parson [sic], especially after science and technology have massively multiplied the participants to be cooked in the melting pot." (Latour 2005, 260)
The proliferation of actors and the facilitation of transversal connectivity have led to large and complex forms of socio-technical grouping and structuring.
Forms of organization take the shape of (multi-sided) markets based around technological platforms that facilitate transactions.
Social media use simple but flexible grammars of connectivity (combination of point to point and list forms), exchange, and aggregation that accommodate various practices and levels of scale.
The diversity of practices, contents, geographies, topologies, intensities, motivations, etc. makes it hard to generalize and theorize dynamics of use.
1 / The many
Platforms like Twitter boost opportunities for connectivity between various types of actors.
At the same time, they produce detailed data traces that are highly centralized and searchable.
Quality / quantity
"One of my favorite fantasies is a dialogue between Mills and Lazarsfeld in which the former reads to the latter the first sentence of The Sociological Imagination: 'Nowadays men often feel that their private lives are a series of traps.' Lazarsfeld immediately replies: 'How many men, which men, how long have they felt this way, which aspects of their private lives bother them, do their public lives bother them, when do they feel free rather than trapped, what kinds of traps do they experience, etc., etc., etc.' If Mills succumbed, the two of them would have to apply to the National Institute of Mental Health for a million-dollar grant to check out and elaborate that first sentence. They would need a staff of hundreds, and when finished they would have written Americans View Their Mental Health rather than The Sociological Imagination, provided that they finished at all, and provided that either of them cared enough at the end to bother writing anything." (Maurice Stein, cit. in Gitlin 1978)
Theory vs. empiricism, macro vs. micro, qualitative vs. quantitative, inductive vs. deductive, associative vs. formalistic, etc.
The promise of data analysis tools, applied to exhaustive (and cheap) data, is to bridge the gap, to allow zooming, "quali-quanti" (Latour 2010).
Define: data
“facts and statistics collected together for reference or analysis. See also datum.
- Computing: the quantities, characters, or symbols on which operations are performed by a computer, being stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
- Philosophy: things known or assumed as facts, making the basis of reasoning or calculation.” (Oxford American Dictionary)
Reasoning (OAD): "think rationally", "use one's mind", "calculate", "make sense of", "come to the conclusion", "judge", "persuade", etc.
Reasoning as "giving reasons" – what counts as a good reason? What counts as a good argument? As a proof? What is "good" knowledge?
Reasoning as a series of techniques, e.g. science, engineering, etc.
Why does the astronaut step into the space shuttle?
A short history of reasoning the "more"
Commercial Capitalism (13th +): calculating for trade, arithmetic, sharing risk and profit in long-distance commerce
Rise of the Nation State (17th +): "art of the state", mercantilism, scientific revolution
Industrialization (19th +): urbanization, scientific management, large bureaucracies
☉ Fibonacci, "Liber Abaci", Calculating with Arabic numerals (Pisa, 1202)
☉ Unknown, "Arte dell'Abbaco", Practical arithmetic (Venice, 1478)
☉ Pacioli, "Summa de arithmetica, geometria, proportioni et proportionalità", Double entry bookkeeping (Venice, 1494)
☉ William Petty & John Graunt, Political Arithmetick (17th century)
☉ Hermann Conring & Gottfried Achenwall, Statistik (17th & 18th century)
☉ Adolphe Quetelet, Statistical regularities and the "average man" (19th century)
☉ Francis Galton & Karl Pearson, Public health and eugenics (late 19th century)
Liber Abaci, Fibonacci, 1202
Calculation for accounting, money-changing, insurance, lending, measurement, etc.
"Having proved that there die about 3,506 persons at Paris unnecessarily, to the damage of France, we come next to compute the value of the said damage, and of the remedy thereof, as follows, viz., the value of the said 3,506 at 60 livres sterling per head, being about the value of Algier slaves (which is less than the intrinsic value of people at Paris), the whole loss of the subjects of France in that hospital seems to be 60 times 3,506 livres sterling per annum, viz., 210,360 livres sterling, equivalent to about 2,524,320 French livres." (Petty 1655)
The Assurance of Lives, Charles Babbage, 1826
The first life tables were assembled in the 17th century by John Graunt.
Babbage builds a machine to produce tables faster.
Essai sur la statistique de la population française, Adolphe d'Angeville, 1836
population census, tax register, house numbers, etc.
modern statistics, large bureaucracies, quantitative social sciences, etc.
Over the last centuries, scientific thinking has become the dominant way of producing knowledge and making decisions in most societies.
Scientific thinking implies various styles of reasoning, different ways of "giving reasons", different analytical gestures, etc.
Styles are intrinsically connected to our "lifeworld" (Husserl 1936).
Two diagnoses:
☉ Our lifeworld is changing in significant ways => "the many"
☉ We need new ways of making sense of it => data analysis
What is the style of data analysis? Its epistemology? One or many?
What are its techniques, its analytical gestures?
Some conclusions for part 1
2 / Two kinds of mathematics
Can there be data analysis without math? No.
Does this imply epistemological commitments? Yes.
But there are choices, e.g. between:
☉ Confirmatory data analysis => deductive
☉ Exploratory data analysis (Tukey 1962) => inductive
There is a fast growing variety of analytical gestures focusing on large numbers of formalized and classed objects.
2 / Two kinds of mathematics
Statistics
Observed: objects and properties
Inferred: relations
Data representation: the table
Visual representation: quantity charts
Grouping: class (similar properties)
Graph theory
Observed: objects and relations
Inferred: structure
Data representation: the matrix
Visual representation: network diagrams
Grouping: clique (dense relations)
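The contrast between the two data representations can be sketched in a few lines of Python. This is only an illustration: the posts, users, and values below are invented and are not taken from any dataset discussed in this presentation.

```python
# Hypothetical toy data, invented for illustration only.

# Statistics: objects and their properties, stored as a table
# (one row per object, one column per property).
posts = [
    {"id": "p1", "type": "photo", "likes": 120},
    {"id": "p2", "type": "status", "likes": 15},
    {"id": "p3", "type": "photo", "likes": 80},
]

# Grouping by class: objects that share a property value.
classes = {}
for row in posts:
    classes.setdefault(row["type"], []).append(row["id"])

# Graph theory: objects and their relations, stored as an adjacency matrix.
users = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

index = {name: i for i, name in enumerate(users)}
matrix = [[0] * len(users) for _ in users]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # the relation is treated as undirected


def is_clique(members):
    """Return True if every pair of members is directly connected."""
    return all(
        matrix[index[u]][index[v]] == 1
        for i, u in enumerate(members)
        for v in members[i + 1:]
    )


print(classes)
print(is_clique(["a", "b", "c"]))
print(is_clique(["a", "b", "d"]))
```

In the table, grouping means comparing property values; in the matrix, grouping means checking which cells are filled.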
Facebook Page "ElShaheeed", June 2010 – June 2011, (Poell / Rieder, forthcoming)
7K posts, 700K users, 3.6M comments, 10M likes (tool: netvizz), work in progress!
New media platforms funnel practices into reduced and largely formal "grammars of action" (Agre 1989); data is therefore very clean, very complete, and very detailed.
Data can be imported with great ease into standard packages that come with many analytical gestures built in (R, Excel, SPSS, RapidMiner, etc.).
Tools are easy, concepts are hard.
Statistics
Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter
Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter, log10 y scale
Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter, log10 y scale, likes on comments
Facebook Page "ElShaheeed", June 2010 – June 2011: comment timeline, per day
Facebook Page "ElShaheeed", June 2010 – June 2011: comment timeline, per month
Facebook Page "ElShaheeed", June 2010 – June 2011: page posts by type, per month
Facebook Page "ElShaheeed", June 2010 – June 2011: comparison timeline: comments, posts, comments per post
Facebook Page "ElShaheeed", June 2010 – June 2011: histogram of comment lengths in characters
Facebook Page "ElShaheeed", June 2010 – June 2011: histogram of like count
Calculating relationships between variables
Quetelet 1827, Galton 1885, Pearson 1901
"Erosion of determinism" (Hacking 1991)
Facebook Page "ElShaheeed", June 2010 – June 2011: scatterplot comments / likes, with standard error
Facebook Page "ElShaheeed", June 2010 – June 2011: scatterplot comments / likes, per post type
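One common way of calculating the relationship between two variables, such as comments and likes per post, is Pearson's correlation coefficient. The sketch below is a minimal pure-Python version; the function name and the sample values are assumptions made for illustration and do not come from the "ElShaheeed" data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: comments and likes for five posts.
comments = [2, 5, 9, 14, 30]
likes = [10, 22, 35, 60, 118]

r = pearson_r(comments, likes)
print(round(r, 3))
```

A value close to 1 or -1 indicates a strong linear relationship, a value close to 0 a weak one. The coefficient describes a tendency across many cases, not a rule for any single post.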
3 / The mathematics of structure
Graph theory has a long prehistory; social network analysis starts in the 1930s with Jacob Moreno's work.
Graph theory is "a mathematical model for any system involving a binary relation" (Harary 1969); it makes relational structure calculable.
Three different force-based layouts of my FB profile
OpenOrd, ForceAtlas, Fruchterman-Reingold
Non force-based layouts
Circle diagram, parallel bubble lines, arc diagram
Network statistics
betweenness centrality
degree
Relational elements of graphs can be represented as tables (nodes have properties) and analyzed through statistics.
Network statistics bridge the gap between individual units and the structural forms they are embedded in.
This is currently an extremely prolific field of research.
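The idea that network statistics turn relational structure into node properties can be sketched as follows. The toy graph is invented for illustration. Betweenness is computed with Brandes' algorithm for unweighted, undirected graphs, which is one standard choice and not necessarily what the tools used for this presentation implement.

```python
from collections import deque


def degree(graph):
    """Number of neighbors per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    `graph` maps each node to the set of its neighbors and is assumed
    to be undirected and unweighted.
    """
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical graph: a triangle (a, b, c) with a tail (c - d - e).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

deg = degree(graph)
btw = betweenness(graph)

# The relational measures become columns of an ordinary table.
table = [
    {"node": node, "degree": deg[node], "betweenness": btw[node]}
    for node in sorted(graph)
]
for row in table:
    print(row)
```

Once the measures sit in a table, each node can be analyzed like any other object with properties, for example by plotting degree against betweenness.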
Twitter 1% sample, 24 hours: 4.3M tweets, 3.4M users, 2M accounts mentioned, 227K unique hashtags
Helpful: baseline sampling
Twitter's API offers a random 1% statuses/sample endpoint that does not require privileged access.
It provides datasets for researching certain types of questions and allows us to "contextualize" (baseline) other collections.
We (Gerlitz / Rieder 2013) explored 24 hours of the 1% sample and captured 4,376,230 tweets, sent from 3,370,796 accounts, at an average rate of 50.65 tweets per second, leading to about 1.3GB of uncompressed and unindexed MySQL tables.
A baseline provides reference points
Beware of averages in non-normal distributions! But the 1% sample is sufficiently large to allow representative exploration of subsamples.
With the help of statistics and graph theory, we can qualify both structures and individual elements.
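The reported average rate follows directly from the tweet count and the 24-hour window. The warning about averages can be illustrated with a skewed toy distribution. The per-account counts below are invented for illustration and are not taken from the sample.

```python
import statistics

# Figures reported above for the 24-hour 1% sample.
tweets = 4_376_230
seconds = 24 * 60 * 60

rate = tweets / seconds
print(round(rate, 2))  # 50.65 tweets per second, as reported

# Hypothetical, heavily skewed distribution of tweets per account:
# most accounts tweet once, a few tweet a lot.
tweets_per_account = [1] * 90 + [2] * 8 + [50, 400]

mean = statistics.mean(tweets_per_account)
median = statistics.median(tweets_per_account)

print(mean)    # pulled upward by two very active accounts
print(median)  # what the typical account does
```

In a distribution like this, the mean describes almost no actual account, which is why medians, quantiles, or log-scaled histograms are often more informative reference points.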
Twitter 1% sample, co-hashtag analysis
227,029 unique hashtags, 1627 displayed (freq >= 50)
Size: frequency
Color: modularity
Size: frequency
Color: user diversity
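A co-hashtag network of this kind can be built by linking two hashtags whenever they appear in the same tweet. The sketch below is one plausible construction, not the procedure actually used for these figures. The function name, the lowercasing of tags, and counting frequency as "number of tweets containing the tag" are all assumptions.

```python
from collections import Counter
from itertools import combinations


def co_hashtag_graph(tweets, min_freq=1):
    """Build a weighted co-hashtag graph.

    `tweets` is a list of hashtag lists (one list per tweet).
    Returns (freq, edges): tag frequencies and co-occurrence counts
    for tags that appear in at least `min_freq` tweets.
    """
    # Assumption: case-insensitive tags, counted once per tweet.
    normalized = [sorted({tag.lower() for tag in tags}) for tags in tweets]

    freq = Counter()
    for tags in normalized:
        freq.update(tags)

    kept = {tag for tag, count in freq.items() if count >= min_freq}

    edges = Counter()
    for tags in normalized:
        present = [tag for tag in tags if tag in kept]
        # Every pair of tags in the same tweet adds weight to one edge.
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return freq, edges


# Hypothetical tweets, reduced to their hashtags.
tweets = [
    ["news", "egypt"],
    ["Egypt", "jan25"],
    ["news", "egypt", "jan25"],
    ["cats"],
]

freq, edges = co_hashtag_graph(tweets, min_freq=2)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

With the real sample, the threshold would correspond to the "freq >= 50" filter mentioned in the caption. Node size could then be mapped to `freq` and edge weight to the co-occurrence count.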
Twitter 1% sample, co-hashtag analysis
227,029 unique hashtags, 1627 displayed (freq >= 50)
Size: frequency
Color: degree
Twitter 1% sample, co-hashtag analysis
227,029 unique hashtags, 1627 displayed (freq >= 50)
Nine measures of centrality (Freeman 1979)
Twitter 1% sample
Co-hashtag analysis
Degree vs. wordFrequency
Degree vs. userDiversity
Twitter 1% sample
Co-hashtag analysis
Facebook Page "ElShaheeed"
700K nodes, 11M connections
Color: type
Facebook Page "ElShaheeed"
700K nodes, 11M connections
Color: outdegree
Conclusions
There is a lot of excitement about data analysis, but our understanding of styles and analytical gestures is still very poor.
We need interrogation and critiques of methodology that are developed from engagement and historical/conceptual investigation.
We need analytical gestures that are more closely tied to concepts from the humanities and social sciences; exploration rather than confirmation.
Visualization and simpler tools are very interesting but require technical and conceptual literacy to deliver more than illustrations.
This is probably not a fad.
"Incite, induce, deviate, make easy or difficult, enlarge or limit, render more or less probable… These are the categories of power." (Deleuze 1986, 77)
Thank You
rieder@uva.nl
https://www.digitalmethods.net
http://thepoliticsofsystems.net
"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (Tukey 1962)