Open for Research: A Demonstration of Text Analysis Applications and a Discussion of Library Collaboration Opportunities Nat Gustafson-Sundell Journal Acquisitions/ Reference Librarian Assistant Professor Minnesota State University, Mankato See OpenResearch.Weebly.Com for my notes

Open Research

Oct 22, 2014


Learn about Open Access, Digital Humanities, and Text Analysis. Focus on Voyant, Topic Modeling Tool, and CATMA.
Transcript
Page 1: Open Research

Open for Research: A Demonstration of Text Analysis Applications and a Discussion of Library Collaboration Opportunities

Nat Gustafson-Sundell

Journal Acquisitions/ Reference Librarian

Assistant Professor

Minnesota State University, Mankato

See OpenResearch.Weebly.Com for my notes

Page 2: Open Research

Section 1: 60-70 minutes

I. Introduction to Workshop (5-10)

II. Overview: A Functional Description of Open Access (15)

III. Overview: Digital Humanities, Humanities Computing, eResearch: Background and Theory (20-25)

IV. Overview: Text Analysis & Topic Modeling: Background and Theory (20, possibly broken by coffee break)

*Coffee Break 2:30-2:45*

Section 2: 65-70 minutes (Start around 2:45-50)

V. Demonstration: Text Discovery/ Text Preparation/ Text Analysis/ Topic Modeling (30)

VI. Group Exploration: Text Analysis/ Topic Modeling (20, less if we need to make up some time)

VII. Overview: Content Analysis (15-20, start by 3:40)

*Breather* (5 minutes)

Section 3: 55-60 minutes (Start around 4:00)

VIII. Demonstration: Text Annotation Tools & Relational Databases (20)

IX. Group Exploration: Projects (20-25)

X. Group Exploration: Collaboration (eResearch Centers and Libraries) (15, less if we need to make up some time)

Schedule

2

Page 3: Open Research

Overview: Open Access . 1

http://openresearch.weebly.com/open-access.html

Key Terms:
• Gratis/ Libre
• Green/ Gold/ “Platinum”
• Working Papers, pre-prints, e-prints, etc.
• Subject and Institutional Repositories

3

Page 4: Open Research

http://www.opendoar.org/onechart.php?cID=&ctID=&rtID=&clID=&lID=&potID=&rSoftWareName=&search=&groupby=rt.rtHeading&orderby=Tally%20DESC&charttype=pie&width=600&height=300&caption=Open%20Access%20Repository%20Types%20-%20Worldwide

Overview: Open Access . 2

4

Page 5: Open Research

https://creativecommons.org/licenses/

Overview: Open Access . 3

• Do not sign away your copyright (unless you have a REALLY good reason)

5

Page 6: Open Research

Overview: Open Access . 4

“We are all, or are all soon to become, nineteenth centuryists.”

- Matthew Jockers

6

Page 7: Open Research

Overview: Digital Humanities . 1

http://openresearch.weebly.com/digital-humanities-eresearch.html

“In 1949, an Italian Jesuit priest, Father Roberto Busa, began what even today is a monumental task: to make an index verborum of all the words in the works of St. Thomas Aquinas and related authors, totaling some 11 million words of medieval Latin. Father Busa imagined that a machine might be able to help him, and, having heard of computers, went to visit Thomas J. Watson at IBM in the United States in search of support … The entire texts were gradually transferred to punched cards and a concordance program written for the project.” (Hockey online)

“The History of Humanities Computing,” Susan Hockey, 2004

7

Page 8: Open Research

Overview: Digital Humanities . 2

“The real origin of that term [digital humanities] was in conversation with Andrew McNeillie, the original acquiring editor for the Blackwell Companion to Digital Humanities. We started talking with him about that book project in 2001, in April, and by the end of November we’d lined up contributors and were discussing the title, for the contract. Ray [Siemens] wanted ‘A Companion to Humanities Computing’ as that was the term commonly used at that point; the editorial and marketing folks at Blackwell wanted ‘Companion to Digitized Humanities.’ I suggested ‘Companion to Digital Humanities’ to shift emphasis away from simple digitization.” (John Unsworth quoted in Kirschenbaum 2-3)

“Twitter, along with blogs and other online outlets, has inscribed the digital humanities as a network topology, that is to say, lines drawn by aggregates of affinities, formally and functionally manifest in who follows whom, who friends whom, who tweets whom, and who links to what.” (Kirschenbaum 5)

“What is Digital Humanities and What’s It Doing in English Departments?” Matthew G. Kirschenbaum, 2010

8

Page 9: Open Research

Overview: Digital Humanities . 3

“…the digital humanities can be … a nexus of fields within which scholars use computing technologies to investigate the kinds of questions that are traditional to the humanities, or… who ask traditional kinds of humanities-oriented questions about computing technologies.” (Fitzpatrick online)

“Reporting from the Digital Humanities 2010 Conference,” Kathleen Fitzpatrick, 2010

9

Page 10: Open Research

(Meeks online, 2011) See https://dhs.stanford.edu/comprehending-the-digital-humanities/

Overview: Digital Humanities . 4

10

Image Removed to avoid possibility of copyright infringement

Page 11: Open Research

Overview: Digital Humanities . 5

“The Landscape of Digital Humanities,” Patrik Svensson, 2010

5 “paradigmatic modes of engagement between the humanities and information technology: information technology as a tool, an object of study, an exploratory laboratory, an expressive medium and an activist venue.” (Svensson online)

“…the digital humanities comprise a field in a loose sense.” (Svensson online)

“…it seems quite unlikely that the digital humanities would ever become a fully separate field.” (Svensson online)

“The complexity of digital humanities as a ‘field’ comes partly from its disciplinary and institutional diversity, and its multiple modes of engagement with information technology.” (Svensson online)

“…one interesting question is whether the digital calls for other modes of investigation, collaboration and making that may be partially incompatible with the epistemic commitments of the established discipline or field.” (Svensson online)

11

Page 12: Open Research

Overview: Digital Humanities . 6

“A Genealogy of Digital Humanities,” Marija Dalbello, 2010

“This paper explores the history of digital humanities that grappled with the epistemological status of technology, as a major field of contemplation of the effects of new media on writing, reading, and interpretation.” (Dalbello 482)

Previous to the TLG, according to Karen Ruhleder, “gaining familiarity with the corpus was a life’s work” (1995), but given the TLG, graduate students could “ask questions that formerly could only be answered through comprehensive reading and experience in the field.” (Dalbello 488)

The TLG “had a profound restructuring effect on knowledge production in the field of classics … a broader range of ‘legitimate’ research queries could produce a sense of ‘doing complete work’” so that the “researcher could be free to do more intellectually interesting work rather than focusing on learning the corpus” (Dalbello 489)

“Seeing patterns and connections out of context liberates them from an archive, as a signifier that attaches itself to a new meaning; searches through digital corpora can produce radical readings and undermine existing interpretations, thus contributing to a program of critical interpretation,” (Dalbello 493)

12

Page 13: Open Research

Overview: Digital Humanities . 7

“The term post-human … suggests that our sense of what it is to be human has changed – as Katherine Hayles puts it, the post-human is a state of mind, a realization that mankind has finally understood that it is definitely not the centre of the universe. My concern here is to consider the implications of this post-human state of mind for our understanding and practice of the digital humanities.” (Prescott online)

“Making the Digital Human: Anxieties, Possibilities, Challenges,” Andrew Prescott, 2012

“For all the rhetoric about digital technologies changing the humanities, the overwhelming picture presented by the activities of digital humanities centres in Great Britain is that they are busily engaged in turning back the intellectual clock and reinstating a view of the humanities appropriate to the 1950s...” (Prescott online)

“…as far as the digital humanities are concerned, interdisciplinarity is just a cover for the lack of a distinctive intellectual agenda … Another major obstacle preventing the digital humanities developing its own scholarly identity is our interest in method. If we focus on modelling methods used by other scholars, we will simply never develop new methods of our own.” (Prescott online)

“We might start by seeking closer contact with our colleagues in Cultural and Media Studies.” (Prescott online)

“We should be seeking to provide new perspectives on the way in which technology interacts with text.” (Prescott online)

13

Page 14: Open Research

Overview: Digital Humanities . 8

“The state of the digital humanities: A report and a critique,” Alan Liu, 2011

“A purely economic rationale for the digital humanities might … be that they re-engineer higher education for knowledge work by providing ever smarter tools for working with increasingly global-scale knowledge resources, all the while trimming the need to invest proportionally in the traditional facilities…” (Liu 2011)

“I offer a report on the current state of the digital humanities … I will define it with unusual breadth. ‘Digital humanities’ will here have a supervening sense that combines ‘humanities computing’ or ‘text-based’ digital humanities … and new media studies …” (Liu 10)

“Currently, I fear, the digital humanities are not ready to take up their full responsibility because the field does not possess an adequate critical awareness of the larger social, economic, and cultural issues at stake … the whole amounts to the lack of a mental and policy firewall against postindustrial takeovers of the digital idea …” (Liu 11)

“The digital humanities are on the threshold of a new interpretive paradigm. The old paradigm, especially on the text-oriented side of the field, was constraining. That paradigm was empirical … about ‘hypothesis testing and empirical validation’. …The new paradigm allows computers and humans to share responsibility for the full act of interpretation, including the component acts of hypothesis-framing, observation, discovery, analysis, testing, reiterative hypothesis-framing, etc.” (Liu 21)

“…the text-oriented side of the digital humanities has been almost wholly uninterested in any social, political, economic, or cultural inquiry into the contexts and implications of information technology.” (Liu 30)

14

Page 15: Open Research

Overview: Digital Humanities . 9

“…attempts to work toward a theoretical foundation for humanities computing surfaced at the outset of scholarly publication in the field and have been in progress ever since – with no consensus in sight … the considerable variety in how humanities computing is … conceived is a sign of health rather than decay.” (McCarty 1228)

“…humanities computing … need not wait on the emergence of a theoretical framework … its semidirected, semicoherent activities are no discredit, rather the norm for an experimental field.” (McCarty 1233)

“…the crafted objects of humanities computing are … primary ‘metatheoretical statements’ produced by those who ‘think in things rather than words.’ A new definition of scholarship, demanding new abilities, would seem to follow.” (McCarty 1227)

“The question of reinventing our scholarly forms takes us beyond any list of projects to what we might consider the fundamental ‘project’ of humanities computing … This is, in simple but far-reaching terms, epistemological: to ask, in the context of computing, what can (and must) be known of our artifacts, how we know what we know about them, and how new knowledge is made.” (McCarty 1231)

“The socioacademic function of humanities computing can be understood as an elaboration of the Galisonian trading zone, which the field establishes as a methodological commons and within it, takes the role of merchant trader among mutually divergent academic cultures.” (McCarty 1232)

“Play, apparently without conscious direction, is a recognized factor in scientific discovery.” (McCarty 1232)

“Humanities Computing,” Willard McCarty, 2003

15

Page 16: Open Research

Overview: Digital Humanities . 10

(McCarty 1225)

16

Image Removed to avoid possibility of copyright infringement

Page 17: Open Research

Overview: Text Analysis . 1

http://openresearch.weebly.com/text-analysis.html

Some Terms:
• Corpus
• Concordance
• Key Words in Context (KWIC)
• Stopwords
• Collocation
• Frequency (and Frequency Distribution)
• Lemmatization
• Hapax legomena (dis, tris, tetrakis)
• Etc.
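Two of the terms above, Key Words in Context (KWIC) and hapax legomena, can be illustrated with a few lines of Python. This is only a minimal sketch: the function names, the simple `\w+` tokenizer, and the window size are assumptions chosen for illustration, not the behavior of any tool shown in this workshop.

```python
import re
from collections import Counter


def tokenize(text):
    # Very rough tokenizer (an assumption): lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    tokens = tokenize(text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


def legomena(text, n=1):
    """Words occurring exactly n times: n=1 hapax, 2 dis, 3 tris, 4 tetrakis."""
    counts = Counter(tokenize(text))
    return sorted(word for word, freq in counts.items() if freq == n)


sample = "I sing the body electric and I sing the soul"
for left, word, right in kwic(sample, "sing", width=2):
    print(f"{left:>10} [{word}] {right}")
print(legomena(sample))
```

A concordance in the traditional sense is essentially the KWIC listing run for every word in the corpus; stopword removal and lemmatization would be applied to the token list before counting.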

“Text-analysis tools have their roots in the print concordance. The concordance is a standard research tool in the humanities that goes back to the thirteenth century.…The challenge before us is to question our procedural habits and presuppositions as to what are legitimate recombinations – to forget the concordance and ask anew how we can analyse text with a computer and whether such computer-assisted interpretations are interesting in and of themselves. We need to play again and make playpens available to our colleagues.…I therefore want to propose a very different image of what a concordance is … I call it a hybrid (or monster) because it is authored not just by the original author, but also by the user’s choices and the procedures used to generate it … it is neither the work of the original author nor that entirely of the provoker of the concordance.” (Rockwell 210,213)

17

Page 18: Open Research

Overview: Text Analysis . 2

Macroanalysis, Matthew Jockers, 2013

“The literary scholar of the twenty-first century can no longer be content with anecdotal evidence, with random ‘things’ generated from a few, even ‘representative’ texts. We must strive to understand these things in the context of everything else, including a mass of possibly ‘uninteresting’ texts.” (Jockers 8)

“Today’s student of literature must be adept at reading and gathering evidence from individual texts and equally adept at accessing and mining digital-text repositories.” (Jockers 9)

“At the macro scale, we see evidence of time and gender influences on theme and style. By superimposing these two network snapshots in our minds, we can begin to imagine a larger context in which to read and study nineteenth-century literature. What is clear is that the books we have traditionally studied are not isolated books. The canonical greats are not even outliers: they are books that are similar to other books…” (Jockers 168)

“…macroscopic investigation is contextualization on an unprecedented scale.” (Jockers 27-8)

“It is the exact interplay between the macro and micro scale that promises a new, enhanced, and perhaps even better understanding of the literary record. The two approaches work in tandem and inform each other. Human interpretation of the ‘data,’ whether it be mined at the macro or micro level, remains essential … The most fundamental and important difference in the two approaches is that the macroanalytic approach reveals details about texts that are for all intents and purposes unavailable to close-readers of the texts.” (Jockers online)

See Example Project

18

Page 19: Open Research

Overview: Text Analysis . 3

Distant Reading, Franco Moretti, 2013

“Writing about comparative social history, Marc Bloch once coined a lovely ‘slogan,’ as he himself called it: ‘years of analysis for a day of synthesis’; and if you read Braudel or Wallerstein you immediately see what Bloch had in mind. The text which is strictly Wallerstein’s, his ‘day of synthesis’, occupies one-third of a page … the rest are quotations … Years of analysis; other people’s analysis, which Wallerstein’s page synthesizes into a system.

Now, if we take this model seriously, the study of world literature will somehow have to reproduce this ‘page’ – which is to say: this relationship between analysis and synthesis – for the literary field. But in that case, literary history will quickly become very different from what it is now: it will become ‘second hand’: a patchwork of other people’s research, without a single direct textual reading. Still ambitious, and actually even more so than before (world literature!); but the ambition is now directly proportional to the distance from the text: the more ambitious the project, the greater must the distance be.” (Moretti 47-8, 2000)

“Distant reading: where distance … is a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes – or genres and systems. And if, between the very small and the very large, the text itself disappears, well, it is one of those cases when one can justifiably say, Less is more. If we want to understand the system in its entirety, we must accept losing something…” (Moretti 48-9, 2000)

19

Page 20: Open Research

Overview: Text Analysis . 4

Distant Reading, Franco Moretti, 2013

(Moretti 221) See also http://litlab.stanford.edu/LiteraryLabPamphlet2B.Figures.pdf

20

Image Removed to avoid possibility of copyright infringement

Page 21: Open Research

Overview: Text Analysis . 5

Reading Machines, Stephen Ramsay, 2011

“…literary criticism operates at a register in which understanding, knowledge, and truth occur outside of the narrower denotative realm in which scientific statements are made. It is not merely the case that literary criticism is concerned with something other than the amassing of verified knowledge. Literary criticism operates within a hermeneutical framework in which the specifically scientific meaning of fact, metric, verification, and evidence simply do not apply … ‘evidence’ stands as a metaphor for the delicate building blocks of rhetorical persuasion … ‘Verification’ occurs in a social community of scholars whose agreement or disagreement is almost never put forth without qualification.” (Ramsay 7, 2011)

“If text analysis is to participate in literary critical endeavor in some manner beyond fact-checking, it must endeavor to assist the critic in the unfolding of interpretive possibilities. We might say that its purpose should be to generate further ‘evidence,’ though we do well to bracket the association that term holds in the context of less methodologically certain pursuits. The evidence we seek is not definitive, but suggestive of grander arguments and schemes.” (Ramsay 10, 2011)

“Critics often use the word ‘pattern’ to describe what they’re putting forth, and that word aptly connotes the fundamental nature of the data upon which literary insight relies. The understanding promised by the critical act arises not from a presentation of facts, but from the elaboration of a gestalt, and it rightfully includes the vague reference, the conjectured similitude, the ironic twist, and the dramatic turn. In the spirit of inventio, the critic freely employs the rhetorical tactics of conjecture – not so that a given matter might be definitely settled, but in order that the matter might become richer, deeper, and ever more complicated.” (Ramsay 16, 2011)

21

Page 22: Open Research

“Although it is true that we do not typically ask [students] to cast yarrow stalks or choose things at random, we do ask them to find some pattern beyond the apparent pattern of the text … We ask them to select, isolate, notice – to consider a small group of sub-patterns from among the infinity of patterns that make up the text. Having done this, we then ask them to re-articulate those patterns in narrative form as elucidations of the texts in which they occur. We call those articulations ‘meanings’, and we call the act of embedding them in a narrative framework ‘interpretation’. …With algorithmic criticism, one would not ask how the ends of interpretation were or were not justified by means of the algorithms imposed, but rather, how successful the algorithms were in provoking thought and allowing insight.” (Ramsay 171, 173, 2003)

Overview: Text Analysis . 6

“…the real message of our technology is something entirely unexpected – a writerly, anarchic text that is more useful than the readerly, institutional text … This is, if you like, the basis of the Screwmeneutical Imperative. There are so many books. There is so little time. Your ethical obligation is neither to read them all nor to pretend that you have read them all, but to understand each path through the vast archive as an important moment in the world’s duration – as an invitation to community, relationship, and play.” (Ramsay online, 2010)

Reading Machines, Stephen Ramsay, 2011, and other essays

“… As with any text-analytical result, we can weave a narrative through the gaps. For this reason, we would do better to say it carves a new path through the document space, which in turn allows us to reread and rethink…” (Ramsay 80, 2011)

22

Page 23: Open Research

Overview: Text Analysis . 7

“Tampering with the Text to Increase Awareness of Poetry’s Art,” Estelle Irizarry, 1996

“Computer-Assisted Reading: Reconceiving Text Analysis,” Stefan Sinclair, 2003

“What is Text Analysis, Really?” Geoffrey Rockwell, 2003

“The value of the computer-mediated exercises is that they enable readers to readily perceive and appreciate features that are not obvious in a conventional reading of a printed text.” (Irizarry 155)

“The computer is, among other things, an instrument uniquely suited to play activities ...” (Irizarry 156)

“By thinking more about process than outcomes, about multiplying meanings (not data) rather than converging on answers, we can consider how to make the computer an extension of the reading and interpretive practices in which humanists are already engaged.” (Sinclair 176)

“Playful experimentation is a pragmatic approach of trying something, seeing if you obtain interesting results, and if you do, then trying to theorize why those results are interesting rather than starting from articulated principles.” (Rockwell 214, 2003)

“Assembling and disassembling a text, like playing with blocks of Lego, may not necessarily contribute immediately to its understanding, but it is likely to contribute to the aggregate experience of the text in valuable ways. … I am suggesting that play is an integral part of a humanist’s interpretive activities…” (Sinclair 181)

“…we should rethink our tools on a principle of research as disciplined play.” (Rockwell 213)

23

Page 24: Open Research

Overview: Text Analysis . 8

“Between Language and Literature: Digital Text Exploration,” Geoffrey Rockwell and Stefan Sinclair, 2009

“Just because we can’t extract the same meaning(s) from a representation in the way we might from a traditional text does not mean that representations can’t be read.…As everyone should know by now, looking at visualizations of texts is a form of exploring and should be taken not as analysis, but exploration.…We can (and must) learn new ways of reading texts, and to embrace mathematical abstraction and visualization as interpretive allies rather than black-box enemies.” (Gibbs online, 2013)

“We have found that students enjoy submitting their own texts to these types of analysis tools, where they discover aspects of their writing of which they were not aware (like a propensity for repeating a given phrase). An engaging activity can be to have students try to find texts on the web that most closely resemble the data profile of their own texts. Doing so can provoke interesting results and awaken the curiosity of the students for the relationship between text analysis and linguistic proficiency.” (Rockwell and Sinclair, 2009)

“Learning to Read Again,” Fred Gibbs, 2013

24

Page 25: Open Research

Overview: Topic Modeling . 1

http://openresearch.weebly.com/topic-modeling.html

(Blei 78)

25

Image Removed to avoid possibility of copyright infringement

Page 26: Open Research

Overview: Topic Modeling . 2

Time, Years, Sing, Past, Land, Songs, Long, Things, Divine, Blood…

45%

Man, Body, Soul, Poems, Woman, Make, True, Large, Beauty, Times…

23%

Thee, Thy, Soul, Joy, Life, Ship, Space, Joys, Long…

10%

Earth, Men, Face, Strong, Young, Love, Cities, Children, Women, Fill…

10%

World, Life, States, War, America, Great, Present, Future, Real, Today…

5%

26

Page 27: Open Research

Overview: Topic Modeling . 3

“Topic modeling gives us a way to infer the latent structure behind a collection of documents. In principle, it could work at any scale, but I tend to think human beings are already pretty good at inferring the latent structure in (say) a single writer’s oeuvre. I suspect this technique becomes more useful as we move toward a scale that is too large to fit into human memory.” (Underwood online)

“Topic modeling made just simple enough,” Ted Underwood, 2012

“…I’m not sure how much value they will have as evidence. For one thing, they require you to make a series of judgment calls that deeply shape the results you get (from choosing stopwords, to the number of topics produced, to the scope of the collection). The resulting model ends up being tailored in difficult-to-explain ways by a researcher’s preferences.” (Underwood online)

See a variety of Topic Model visualizations

“…excitement about the use of topic models for discovery needs to be tempered with skepticism about how often the unexpected juxtapositions LDA creates will be helpful, and how often merely surprising. A poorly supervised machine learning algorithm is like a bad research assistant. It might produce some unexpected constellations that show flickers of deeper truths; but it will also produce tedious, inexplicable, or misleading results.” (Schmidt 50)

“Words Alone: Dismantling Topic Models in the Humanities,” Benjamin M. Schmidt, 2012

27

Page 28: Open Research

Demonstration: Text Analysis: Text Discovery . 1

http://openresearch.weebly.com/tools.html

http://chroniclingamerica.loc.gov/

28

Page 29: Open Research

Demonstration: Text Analysis: Text Discovery . 2

29

Page 30: Open Research

Demonstration: Text Analysis: Text Discovery . 3

30

Page 31: Open Research

Demonstration: Text Analysis: Text Discovery . 4

http://www.gutenberg.org/

31

Page 32: Open Research

Demonstration: Text Analysis: Text Discovery . 5

32

Page 33: Open Research

Demonstration: Text Analysis: Text Discovery . 6

33

Page 34: Open Research

Demonstration: Text Analysis: Text Preparation . 1

http://openresearch.weebly.com/tools.html

34

Page 35: Open Research

Steps:
1. Reg Ex: ] → □
2. Reg Ex: \n → ◊
3. Reg Ex: \r → ◊
4. Normal: ; → ;□
5. Normal: , → ,□
6. Normal: . → .□
7. Normal: ? → ?□
8. Normal: ! → !□
9. Normal: : → :□
10. Reg Ex: (\s{2,}) → □
11. Normal: -□ → ◊
12. Normal: - → ◊
13. Add carriage returns for every “document”
14. Encode in UTF-8
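Steps 1 through 12 above could be scripted rather than run by hand in a text editor. The following Python sketch applies them in the same order. It rests on two assumptions the slide does not state: that □ stands for a single space and that ◊ stands for "nothing" (an empty replacement). The names `SPACE`, `EMPTY`, and `prepare` are illustrative only.

```python
import re

SPACE = " "  # assumed meaning of the slide's □ symbol
EMPTY = ""   # assumed meaning of the slide's ◊ symbol


def prepare(text):
    """Apply find-and-replace steps 1-12 in the order listed on the slide."""
    text = re.sub(r"\]", SPACE, text)         # step 1: closing bracket
    text = re.sub(r"\n", EMPTY, text)         # step 2: line feeds
    text = re.sub(r"\r", EMPTY, text)         # step 3: carriage returns
    for mark in ";,.?!:":                     # steps 4-9: pad punctuation
        text = text.replace(mark, mark + SPACE)
    text = re.sub(r"\s{2,}", SPACE, text)     # step 10: collapse whitespace runs
    text = text.replace("-" + SPACE, EMPTY)   # step 11: hyphen plus space
    text = text.replace("-", EMPTY)           # step 12: remaining hyphens
    return text


print(prepare("I sing,\r\nthe  body"))
```

Steps 13 and 14 would follow separately: join the prepared documents with one line break per document, then write the file with `open(path, "w", encoding="utf-8")`. Under these assumptions the order matters, because removing line breaks first fuses line-final punctuation to the next word, and steps 4 to 9 then restore the spacing.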

Demonstration: Text Analysis: Text Preparation . 2

35

Page 36: Open Research

Demonstration: Text Analysis: Text Preparation . 3

36

Page 37: Open Research

Steps:
1. Reg Ex: \(.+?\) → □

Find every instance of parentheses enclosing any text.
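The `?` in this pattern makes the match non-greedy, which is what keeps each parenthetical separate. A short Python sketch of the difference, assuming again that □ means a single space (the function names are hypothetical):

```python
import re


def strip_parentheticals(text):
    # Non-greedy: stops at the first closing parenthesis after each opening one.
    return re.sub(r"\(.+?\)", " ", text)


def strip_greedy(text):
    # Greedy variant, shown only for contrast: it runs from the first "("
    # to the last ")" on the line and swallows the text in between.
    return re.sub(r"\(.+\)", " ", text)


line = "a (b) c (d) e"
print(repr(strip_parentheticals(line)))
print(repr(strip_greedy(line)))
```

In Python, `.` does not match a line break by default, so a parenthetical that spans two lines would be left alone unless line breaks have already been removed; other editors and regex engines may differ.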

Demonstration: Text Analysis: Text Preparation . 4

37

Image Removed to avoid possibility of copyright infringement

Page 38: Open Research

Demonstration: Text Analysis: Voyant . 1

http://openresearch.weebly.com/tools.html

http://openresearch.weebly.com/texts.html

http://voyant-tools.org

38

Page 39: Open Research

Demonstration: Text Analysis: Voyant . 2

39

Page 40: Open Research

Demonstration: Text Analysis: Voyant . 3

40

Page 41: Open Research

Demonstration: Text Analysis: Voyant . 4

41

Page 42: Open Research

Demonstration: Text Analysis: Voyant . 5

42

Page 43: Open Research

Demonstration: Text Analysis: Voyant . 6

43

Page 44: Open Research

<ctrl> A

<ctrl> C

Demonstration: Text Analysis: Voyant . 7

44

Page 45: Open Research

Demonstration: Text Analysis: Voyant . 8

45

Page 46: Open Research

Demonstration: Text Analysis: Voyant . 9

46

Image Removed to avoid possibility of copyright infringement

Page 47: Open Research

Demonstration: Text Analysis: Topic Modeling Tool . 1

http://openresearch.weebly.com/tools.html

http://code.google.com/p/topic-modeling-tool/

47

Page 48: Open Research

Demonstration: Text Analysis: Topic Modeling Tool . 2

48

Page 49: Open Research

Demonstration: Text Analysis: Topic Modeling Tool . 3

1. Reg Ex: \n → ◊
2. Reg Ex: \r → ◊

49

Page 50: Open Research

Demonstration: Text Analysis: Topic Modeling Tool . 4

50

Page 51: Open Research

Demonstration: Text Analysis: Topic Modeling Tool . 5

51

Page 52: Open Research

Demonstration: Text Analysis: Topic Modeling Tool . 6

52

Page 53: Open Research

Demonstration: Text Analysis: Topic Modeling Tool . 7

53

Page 54: Open Research

Concept?

Language Type?

Sentiment Cluster?

Demonstration: Text Analysis: Topic Modeling Tool . 8

54

Page 55: Open Research

Group Exploration: Text Analysis . 1 http://openresearch.weebly.com/tools.html

???

55

Page 56: Open Research

Leaves of Grass 1867

Group Exploration: Text Analysis . 2

56

Page 57: Open Research

Leaves of Grass 1892

Group Exploration: Text Analysis . 3

57

Page 58: Open Research

Leaves of Grass 1892

Group Exploration: Text Analysis . 4

58

Page 59: Open Research

There is 1 document in this corpus with a total of 130,506 words and 14,594 unique words.Most frequent words in the corpus: old (303), shall (265), life (261), love (261), soul (245). More…
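A summary like the one above (total words, unique words, most frequent words) can be approximated in a few lines of Python. This is a rough sketch, not Voyant's actual method: the tokenizer and the tiny stopword set are assumptions, so the numbers it produces for a real text will not match Voyant's exactly.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; real lists are much longer.
STOPWORDS = {"the", "and", "of", "i", "to", "a", "in"}


def summarize(text, top_n=5):
    """Return (total words, unique words, top_n most frequent non-stopwords)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tok for tok in tokens if tok not in STOPWORDS)
    return len(tokens), len(set(tokens)), counts.most_common(top_n)


total, unique, top = summarize("The soul and the body. The soul!", top_n=1)
print(f"{total} words, {unique} unique; most frequent: {top}")
```

Changing the stopword list changes the "most frequent words" line, which is one reason results from different tools are not directly comparable.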

Leaves of Grass 1892

Group Exploration: Text Analysis . 5

59

Page 60: Open Research

Word    1892 Count  1892 Ratio  1867 Count  1867 Ratio
old     303         0.00232173  157         0.00157965
shall   265         0.00203056  218         0.0021934
life    261         0.00199991  161         0.0016199
love    261         0.00199991  218         0.0021934
soul    245         0.00187731  160         0.00160984
long    236         0.00180835  154         0.00154947
earth   229         0.00175471  199         0.00200223
night   210         0.00160912  168         0.00169033
man     206         0.00157847  190         0.00191168
day     188         0.00144055  140         0.00140861
men     185         0.00141756  182         0.00183119
know    179         0.00137158  146         0.00146898
death   178         0.00136392  131         0.00131805
come    175         0.00134093  131         0.00131805
time    175         0.00134093  111         0.00111682
great   174         0.00133327  159         0.00159977
world   157         0.00120301  78          0.0007848
sea     153         0.00117236  124         0.00124762
hear    137         0.00104976  115         0.00115707
like    137         0.00104976  94          0.00094578
face    132         0.00101145  111         0.00111682
hand    128         0.0009808   101         0.00101621
good    126         0.00096547  100         0.00100615
body    124         0.00095015  114         0.00114701
young   122         0.00093482  101         0.00101621

Total   130506                  99389

Sorted by 1892 Ratio

Same list re-sorted by 1867 Ratio
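The Ratio columns appear to be each word's count divided by the total word count of that edition (for example, 303 / 130,506 ≈ 0.00232173 for "old" in 1892). A minimal Python sketch of that normalization, assuming this reading is correct; the counts and totals are copied from the slide, while the names `TOTALS`, `COUNTS`, `ratio`, and `compare` are illustrative:

```python
# Totals and counts are taken from the slide; only three words are shown here.
TOTALS = {"1892": 130506, "1867": 99389}
COUNTS = {
    "old": {"1892": 303, "1867": 157},
    "world": {"1892": 157, "1867": 78},
    "love": {"1892": 261, "1867": 218},
}


def ratio(count, total):
    # Relative frequency: occurrences per word of running text.
    return count / total


def compare(word):
    """Return the (1892 ratio, 1867 ratio) pair for one word."""
    r92 = ratio(COUNTS[word]["1892"], TOTALS["1892"])
    r67 = ratio(COUNTS[word]["1867"], TOTALS["1867"])
    return r92, r67


for word in COUNTS:
    r92, r67 = compare(word)
    trend = "more" if r92 > r67 else "less"
    print(f"{word}: 1892={r92:.8f} 1867={r67:.8f} ({trend} frequent in 1892)")
```

Normalizing matters because the 1892 edition is longer: "love" has a higher raw count in 1892 (261 against 218) but a lower ratio, so raw counts alone would suggest the opposite trend.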

Leaves of Grass 1867, 1892

Group Exploration: Text Analysis . 6

60

Page 61: Open Research

Overview: Content Analysis . 1

http://openresearch.weebly.com/content-analysis.html

“Content Analysis” or “message analysis”?
• Rhetorical Analysis
• Narrative Analysis
• Discourse Analysis
• Structuralist (or Semiotic) Analysis
• Interpretative Analysis
• Conversation Analysis
• Critical Analysis
• Normative Analysis

(Neuendorf 5-8)

“Content analysis is a summarizing, quantitative analysis of messages that relies on the scientific method and is not limited as to the types of the variables that may be measured or the context in which the messages are created or presented.” (Neuendorf 10)

“Content analysis is any technique for making inferences by objectively and systematically identifying specified characteristics of messages.” (Holsti 25)

61

Page 62: Open Research

Overview: Content Analysis . 2

Concepts and Design:
• Quantitative/ Qualitative
• Deductive/ Inductive
• Manifest/ Latent Content
• Content/ Form
• Reliability, Validity, Generalizability, Replicability
• Unitizing
• Etc.

Content analysis “‘learned its methods from cryptography, from the subject classification of library books, and from biblical concordances, as well as from standard guides to legal precedents’” (Marvick, quoted in Rogers 214, 1994, quoted in Neuendorf 31)


Page 63: Open Research

Overview: Content Analysis . 3

[Figure: Trends in communication content. Messages produced by source A at time t1 and at time t2 are each measured on content variable X (AX t1, AX t2). Adapted from Figure 2-2, Holsti 28. Image removed to avoid possibility of copyright infringement.]
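The trend design can be sketched in a few lines of Python: measure one content variable X on a message from source A at time t1 and again at time t2, then compare the two values. Everything specific here is an assumption for illustration: the choice of X (category terms per 1,000 words), the hypothetical `war_terms` category, and the two toy messages. None of it comes from Holsti or from the workshop data.

```python
import re


def content_variable_x(message, terms):
    """Occurrences of category terms per 1,000 words of the message."""
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in terms)
    return 1000 * hits / len(words)


# Hypothetical category and toy messages, for illustration only.
war_terms = {"war", "battle", "army"}
message_t1 = "The army marched to battle and the war went on"
message_t2 = "The harvest was good and the town was at peace"

ax_t1 = content_variable_x(message_t1, war_terms)  # source A, time t1
ax_t2 = content_variable_x(message_t2, war_terms)  # source A, time t2
trend = ax_t2 - ax_t1                              # change in X over time

print(ax_t1, ax_t2, trend)
```

The same measure could be applied to messages from two different sources instead of two different times, which would correspond to comparing communicators rather than describing a trend.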

Page 64: Open Research

Overview: Content Analysis . 4

[Figure: Relationship of content variables to each other. Messages produced by source A are measured on content variable X (AX) and on content variable Y (AY). Adapted from Figure 2-5, Holsti 30. Image removed to avoid possibility of copyright infringement.]

Page 65: Open Research

Overview: Content Analysis . 5

[Figure: Differences between communicators. A message produced by source A and a message produced by source B are each measured on content variable X (AX, BX). Adapted from Figure 2-6, Holsti 30. Image removed to avoid possibility of copyright infringement.]

Page 66: Open Research

Content Analysis can be used:

• To describe trends in communication content
• To relate known characteristics of sources to the messages they produce
• To audit communication content against standards
• To analyze techniques of persuasion
• To analyze style
• To relate known characteristics of the audience to messages produced for them
• To describe patterns of communication
• To secure political and military intelligence
• To analyze psychological traits of individuals
• To infer aspects of culture and cultural change
• To provide legal evidence
• To answer questions of disputed authorship
• To measure readability
• To analyze the flow of information
• To assess responses to communication

Adapted from Table 2-1, Holsti 26

Overview: Content Analysis . 6


Page 67: Open Research

Overview: Content Analysis . 7

A 1935 study by Edgar Dale analyzed the themes and emphases of American motion pictures. (Neuendorf 33-4)

A World War II study “systematically analyzed radio broadcasts from Axis powers …. Allied forces were able to estimate the concentration of German troops in various locations by comparing music played on German radio stations with music played elsewhere in occupied Europe.” (Neuendorf 37)

A 1994 study by Kathleen Carley tracked “the representation of robots in science fiction” from “three different time periods – pre-1950s, the 1950s and 1960s, and the 1970s and 1980s.” (Neuendorf 184-6) See visualization p. 185

A 1963 study by E.S. Shneidman analyzed political rhetoric “to infer personality traits of the speaker from logical and cognitive characteristics of his verbal production.” The study categorized “Idiosyncrasies of reasoning and cognitive maneuvers in the rhetoric of John F. Kennedy, Richard M. Nixon, and Nikita Krushchev:”

• Idiosyncrasies of reasoning:
o Irrelevant premise
o Argumentum ad populum
o Complex question
o Derogation
o Stranded predicate
o Truth-type confusion

• Cognitive Maneuvers:
o To enlarge or elaborate the preceding
o To smuggle debatable point into alien context
o To be irrelevant
o To allege but not substantiate
o To introduce new notion

Adapted from Holsti 72-4

Page 68: Open Research

Overview: Content Analysis . 8

ArticleManager

Page 69: Open Research

This is a typical hierarchy of forms linked from the main form. The Group and Actor forms are more complexly related, so possibly more interesting to look at, but would not fit so nicely on a single page.

Overview: Content Analysis . 9

ArticleManager

Page 70: Open Research

Overview: Content Analysis . 10

ArticleManager

Page 71: Open Research

Overview: Content Analysis . 11

ArticleManager

Page 72: Open Research

Demonstration: Text Annotation: CATMA . 1

http://openresearch.weebly.com/tools.html

http://www.catma.de/

Page 73: Open Research

Demonstration: Text Annotation: CATMA . 2


Page 74: Open Research

Demonstration: Text Annotation: CATMA . 3


Page 75: Open Research

Demonstration: Text Annotation: CATMA . 4


Page 76: Open Research

Demonstration: Text Annotation: CATMA . 5

What tags should I use?
- Bottom-up (inductive)
- Top-down (deductive)
What’s most appropriate for the text?
What is the unit of analysis?
What do I want to analyze for: what am I wondering about? What do I think I might find?
How is play a part of this process?
How might hypothesis testing enter the process?
How might a collaborative project proceed?

Page 77: Open Research

Demonstration: Text Annotation: CATMA . 6


Page 78: Open Research

Demonstration: Text Annotation: CATMA . 7

If there are too many tags, it is not possible to close-read the tags with the text. The tags only have value for analysis.

As tags proliferate, it can be difficult to stop adding tags for each new nuance. Should I tag the last line? Is this the “Object’s Action Property”? How to note sarcasm?

Seek out appropriate models:
• See Narratology
• See Content Analysis

Page 79: Open Research

Demonstration: Text Annotation: CATMA . 8


Page 80: Open Research

Demonstration: Text Annotation: CATMA . 9


Page 81: Open Research

Demonstration: Text Annotation: CATMA . 10


Page 82: Open Research

Example Projects . 1

http://openresearch.weebly.com/example-projects.html

Page 83: Open Research

Example Projects . 2

http://www.oldbaileyonline.org/obapi//

Page 84: Open Research

Example Projects . 3


Page 86: Open Research

Example Projects . 5


Page 87: Open Research

Example Projects . 5


Page 88: Open Research

Example Services . 1

Google Spreadsheet of Digital Humanities Centers

http://openresearch.weebly.com/eresearch-services-examples.html

To your knowledge, does your university or library support Digital Humanities research or eResearch more broadly?

If no, would you feel comfortable approaching your library to find out if the library might help out? Or someone in your department, or your college/ university, or through an association?

Would you feel comfortable continuing to explore the possibilities on your own?

What kinds of project(s) can you imagine pursuing? (or have you pursued?)

Would there be any need for, or value in, seeking collaboration? What kind of collaboration (or support) would be valuable?

What kind of budget environment might factor into what services are available or could be made available to you?

What kinds of services would be most valuable to you? (Ex: Videos, Link Lists, Project Descriptions, Classes, One on One Consultations, Project Development support, etc.)

