Analyzing Text at the Middle Distance between the Close Read and Culturomics Marti A. Hearst U.C Berkeley Joint Work with Aditi Muralidharan


Dec 16, 2015


Eugene Stafford
Transcript

Analyzing Text at the Middle Distance between the Close Read and Culturomics

Marti A. Hearst, U.C. Berkeley

Joint Work with Aditi Muralidharan


Foreground: The Close Read

Middle Distance: Sensemaking

Background: Culturomics (Text Mining)


Definition: “Close Read”

“Close reading describes, in literary criticism, the careful, sustained interpretation of a brief passage of text. Such a reading places great emphasis on the particular over the general, paying close attention to individual words, syntax, and the order in which sentences and ideas unfold as they are read.”

-English Wikipedia, 6/4/2012


“Power and Passion in Shakespeare’s Pronouns: Interrogating ‘you’ and ‘thou’”, Penelope Freedman, 2007, MPG Books, 280 pp.

Scene from “As You Like It” by Daniel Maclise (1806-70)


Conclusions (“Power and Passion in Shakespeare’s Pronouns”)

“The subtleties of the use of ‘you’ and ‘thou’ that have emerged … can seem, at worst, random or, at best, unfathomable. …

A set of oppositions has been revealed here: … These oppositions are complex and slippery: they may operate in parallel, may converge or diverge. Each pronoun choice has to be seen in a highly specific context.”


Definition: “Culturomics”

Narrower than “digital humanities” and broader than “corpus linguistics”.

(Loose interpretation of definitions at culturomics.org)


“Culturomics” example: middle distance vs. middle ground


As an NLP Researcher, where do your ideas come from?

Can HCI improve your work?


Sensemaking

• A vague information need

• Iteratively refine it by

• Searching

• Reading

• Analyzing

• Reach understanding

Pirolli and Card 2005, Pirolli and Russell 2011


Sensemaking for Literature Study


WordSeer (version 1)

The North American Pre-Civil-War Slave Narratives


The North American Slave Narratives

• Stories of the lives of former slaves

• Published by white abolitionist sponsors

• About 3000 narratives survive

• ~300 in prototype

Do the North American slave narratives all conform to the same stereotypes?


A “Master Plan” for the slave narratives

“... conventions so early and firmly established that one can imagine a sort of master outline drawn from the great narratives and guiding the lesser ones”

-- Olney, J. “‘I Was Born’: Slave Narratives, Their Status as Autobiography and as Literature”, Callaloo, 1984


Our approach

• Phase 1: Support searching for instances of conventions

• Phase 2: Support visualizing their occurrence in the collection


Searching for stereotypes

• Keyword search is not enough

• Search words: “cruel”, “harsh”, “overseer”, “master”, “mistress”

• Instead: “overseer”, “master”, “mistress” described as “cruel”, “harsh”

• Also want the entire picture, for comparison

• “overseer”, “master”, “mistress” described as ____?_____

• ___?_____ described as “cruel”


Natural language processing

The cruel overseer beat us severely.

Relations labeled on the slide: subject, object, modifier

(automatically-extracted structure)
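
One way to picture how extracted structure supports the “described as” queries is the minimal sketch below. It is an illustration only, not WordSeer's implementation: the `Relation` tuple, the `search` helper, and the hand-written relations for the example sentence are all assumptions. A real system would obtain the relations from a parser.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    kind: str       # e.g. "subject", "object", "modifier"
    head: str       # the word the relation attaches to
    dependent: str  # the word filling the relation


# Hand-written (assumed) relations for "The cruel overseer beat us severely."
RELATIONS = [
    Relation("modifier", "overseer", "cruel"),
    Relation("subject", "beat", "overseer"),
    Relation("object", "beat", "us"),
    Relation("modifier", "beat", "severely"),
]


def search(relations, kind, head=None, dependent=None):
    """Return relations of the given kind; a None slot acts as a wildcard."""
    return [
        r for r in relations
        if r.kind == kind
        and (head is None or r.head == head)
        and (dependent is None or r.dependent == dependent)
    ]


# "overseer" described as ____?
described_as = [r.dependent for r in search(RELATIONS, "modifier", head="overseer")]

# ____ described as "cruel"?
who_is_cruel = [r.head for r in search(RELATIONS, "modifier", dependent="cruel")]
```

Leaving either slot open yields the “entire picture” queries, which a plain keyword match on “cruel” and “overseer” cannot express.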


Grammatical search


Part 2: visualizing stereotypes

• Prevalence

• Position of occurrence within a document

• Across the entire collection
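
The two quantities named on this slide, prevalence and position within a document, can be computed along the following lines. This is a rough sketch under stated assumptions: the function names are hypothetical and the three-document collection is invented for illustration, not drawn from the narratives.

```python
def relative_positions(text, phrase):
    """Return each occurrence of phrase as a fraction (0.0 to 1.0) of the way through text."""
    lowered, needle = text.lower(), phrase.lower()
    positions = []
    start = lowered.find(needle)
    while start != -1:
        positions.append(start / len(lowered))
        start = lowered.find(needle, start + 1)
    return positions


def prevalence(collection, phrase):
    """Fraction of documents in the collection that contain the phrase at least once."""
    hits = sum(1 for text in collection.values() if relative_positions(text, phrase))
    return hits / len(collection)


# Invented toy collection (hypothetical text, not quotations from the corpus).
COLLECTION = {
    "narrative_a": "I was born in Maryland. My master was cruel. At last I escaped.",
    "narrative_b": "My mother was sold away. I was born a slave in Virginia.",
    "narrative_c": "We fled north by night and reached freedom.",
}

positions = {name: relative_positions(text, "I was born") for name, text in COLLECTION.items()}
share = prevalence(COLLECTION, "I was born")
```

Plotting `positions` per document, one row per narrative, would show whether a convention tends to appear early, late, or anywhere.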


“I was born”


Results (presented at MLA 2012)

• Prevalent stereotypes

• “I was born”

• Separation from parents

• Cruel treatment

• Escape

• A ‘missed’ stereotype

• Parents’ death

• Not as strictly ordered as implied by Olney’s master plan.


Problems

• Vocabulary

• Same concept expressed with many different wordings

• Needed to see synonyms, nearby words, suggestions on searches

• Comparison and curation

• Couldn’t isolate and compare results on sub-collections of documents
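
As an illustration of the “nearby words” idea in the vocabulary bullet, the sketch below counts the words that occur within a small window around a target word. The `nearby_words` function, the window size, and the sample sentences are assumptions made for this example; they do not describe how WordSeer generates its suggestions.

```python
import re
from collections import Counter


def nearby_words(text, target, window=2):
    """Count words appearing within `window` tokens of each occurrence of target."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every token in the window except the target occurrence itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


# Invented sample text (hypothetical, for illustration only).
SAMPLE = "The cruel overseer beat us. The harsh overseer whipped us. A kind mistress fed us."
suggestions = nearby_words(SAMPLE, "overseer")
```

Words such as “cruel” and “harsh” surface as candidates even if the searcher only thought to type “overseer”. A real tool would also filter out function words like “the”.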


WordSeer (version 1.5)

wordseer.berkeley.edu


The complete works of Shakespeare

• 42 documents -- plays and sonnet collections

• 1589 -- 1612

Analyze Hamlet.

How does the portrayal of men and women in Shakespeare change in different circumstances?

(CHI ’12 works in progress)

English 203: Hamlet in the Humanities Lab, Spring 2012, University of Calgary


The Vocabulary Problem

Which words embody the concept of female beauty?

261 results


Collection and Comparison

Does the treatment of love vary between the comedies and tragedies?


Collection and Comparison

Step 2. Compare word usage
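
A comparison of word usage between two sub-collections might look like the sketch below, which contrasts relative word frequencies. The function names and the two tiny “collections” are hypothetical (the lines are invented, not Shakespeare), and a simple rate difference is only one of several possible measures.

```python
import re
from collections import Counter


def word_rates(texts):
    """Relative frequency of each word across a non-empty list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def compare_usage(group_a, group_b):
    """Map each word to rate_in_a minus rate_in_b; positive means more typical of group_a."""
    rates_a, rates_b = word_rates(group_a), word_rates(group_b)
    vocabulary = set(rates_a) | set(rates_b)
    return {w: rates_a.get(w, 0.0) - rates_b.get(w, 0.0) for w in vocabulary}


# Invented stand-ins for the two sub-collections (hypothetical text).
COMEDIES = ["sweet love and merry love", "love makes fools merry"]
TRAGEDIES = ["love turns to grief", "grief and death follow"]

diff = compare_usage(COMEDIES, TRAGEDIES)
most_comic = max(diff, key=diff.get)
```

Sorting `diff` from most positive to most negative gives two word lists, one per sub-collection, that can be read side by side.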


Panels: comedies, tragedies


“in love” (panels: comedies, tragedies)


Results

• WordSeer 1.5 being successfully used (so far) in Hamlet class

• How does the relationship between Hamlet and his mother change over the course of the play?

• How does Act 1 portray the character of Horatio?

• Investigated changing language use around men and women

• Unknowingly replicated and extended previous findings by other Shakespeare scholars


How Does This Apply to Social Media Language?


As an NLP Researcher, where do your ideas come from?

Can HCI improve your work?


Sentiment Analysis?

Sarcasm?


Summary

• We suggest enhancing NLP research with sensemaking tools to help with hypothesis formation

• Midway between reading the text and blind statistics.

• Helps with hypothesis formulation, verification, and refinement.

• This is clearly useful for literature analysis.

• It remains to be seen if it can help with social media analysis.


Thank you!