
Counts, comparisons, collocations, contestations: Towards a dictionary of the future

Aug 15, 2015


Idibon1

TOWARDS A DICTIONARY OF THE FUTURE

COUNTS, COMPARISONS, COLLOCATIONS, CONTESTATIONS


DICTIONARY OF THE FUTURE?


SOME OTHER PLACES TO CHECK OUT

• The Google Ngram Viewer helps you understand trends across a bazillion books that Google has digitized. It’s an amazing resource.
• So is the Corpus of Historical American English (COHA): http://corpus.byu.edu/coha/
• And the Corpus of Contemporary American English (COCA): http://corpus.byu.edu/coca/


TO COHA!


TO COCA!


TAKING CARE WITH COUNTS

• The counts in the last two slides are too small to be anything more than interesting
• The next slide shows us tracking the collocates of future
• Collocates are the words that appear near a given word: one of the chief collocates of salt is pepper, for example
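To make "words that appear near a given word" concrete, here is a minimal sketch of counting collocates inside a fixed window. The window size of 2, the `collocates` helper, and the sample text are all assumptions made up for illustration; COHA and COCA compute collocates with their own settings.

```python
import re
from collections import Counter


def collocates(text, target, window=2):
    """Count the words that occur within `window` tokens of `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts


# Hypothetical toy text, invented for this example only.
sample = (
    "Pass the salt and pepper. She added salt and pepper to the soup. "
    "A pinch of salt helps. Salt and vinegar chips are popular."
)
salt_collocates = collocates(sample, "salt", window=2)
print(salt_collocates.most_common(5))
```

With real corpora you would normally also filter out function words like "and", or score collocates with an association measure, since raw counts favor frequent words.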


COUNTS COUNT


DISCUSSIONS, DEMOCRACIES AND DICTIONARIES


What’s going on in Urban Dictionary?
• Identity
• Play
• Politics


KEYWORDS

• What are the words that are most contested?
• How do they change?
• Who controls the future?
• Liberty vs. Freedom


JACK GRIEVE FINDING WOTYs (WORDS OF THE YEAR)

• See also http://idibon.com/quantifying-word-year/


• p.s.—in my ideal Dictionary of the Future, we understand the geography of how a word is used


MEANING IS IN THE USE

• “For a large class of cases of the employment of the word ‘meaning’ – though not for all – this word can be explained in this way: the meaning of a word is its use in the language” (Wittgenstein, Philosophical Investigations)


MEANING IN THE USE

• Tumblr moms use over 4 times as many [blue heart] and [purple heart] emoji as Twitter peeps
• What are the collocates?
  • Blue: his he him
  • Purple: she’s she
  • No pink heart option!

• See also http://www.washingtonpost.com/sf/opinions/2015/02/12/why-moms-love-emoji/ and http://idibon.com/emomji-emoji-new-moms-use/


CO-OCCURRENCES MATTER (MOVIE REVIEW RATINGS AND WORDS)

• The idea here is that if you’re writing a review and use the word wow, you’re being very positive or very negative. You don’t say Wow, I have a balanced and neutral opinion on this very often.

• If you’re using however, however, you’re likely to be in the middle of your movie review rating or travel summary—not at the very positive/negative extremes.

• See also http://web.stanford.edu/~cgpotts/manuscripts/potts-schwarz-exclamatives08.pdf and http://web.stanford.edu/~cgpotts/papers/constant-davis-potts-schwarz-expressives.pdf
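The wow/however contrast can be checked by asking, for each word, how the reviews that contain it are spread across the rating scale. The sketch below assumes a 1 to 5 star scale and uses a handful of invented reviews; `rating_profile` is a hypothetical helper, not something from the cited papers.

```python
from collections import Counter

# Hypothetical (rating, review text) pairs on a 1-5 scale, invented for illustration.
reviews = [
    (1, "wow this was terrible"),
    (1, "awful and boring"),
    (2, "slow however the ending was fine"),
    (3, "good acting however the plot drags"),
    (3, "fine however nothing special"),
    (4, "enjoyable however a bit long"),
    (5, "wow just wow a masterpiece"),
    (5, "loved every minute"),
]


def rating_profile(word, reviews):
    """Share of reviews at each rating, among the reviews that contain `word`."""
    hits = Counter(rating for rating, text in reviews if word in text.split())
    total = sum(hits.values())
    return {rating: hits[rating] / total for rating in sorted(hits)} if total else {}


wow_profile = rating_profile("wow", reviews)
however_profile = rating_profile("however", reviews)
print("wow:", wow_profile)
print("however:", however_profile)
```

In this toy data wow only shows up at the extremes and however only in the middle, which is the U-shape versus hump-shape pattern the bullets describe. On real data you would also normalize by how many reviews exist at each rating.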


WHOLESOMENESS
http://idibon.com/wholesome-branding-campaign-effectiveness/


BRANDS LOVE WORDS


DEEP HISTORY

• The first uses of wholesome tended to be about ‘virtuous teachings’.
• In Wycliffe’s Bible way back in 1382:

The..holsum wordis of oure Lord Jhesu Crist. (1 Timothy 6:3)

(Modern versions treat wordis as ‘words’, ‘teachings’, or ‘instructions’.)


“WHOLESOME” [NOUN] OVER TIME


HOW ABOUT IN SOCIAL MEDIA?

• You have to deal with spam (11% of data in this case; another 36% of data is “Wholesome Radio”, which is probably irrelevant)
• In 2014 tweets:
  • Food: 23% (but mostly not about Honey Maid)
  • Humans: 23% (and how they can/should live; church-related mentions are prominent)
  • Entertainment: 13% (movies, TV)
• Now let’s compare this to 2011 tweet uses:
  • Humans: 32%
  • Entertainment: 12%
  • Food: 9%


WORDS ARE CONTESTED


MORE ON CONTESTED WORDS

• In the next slide, you’ll see an image from Monroe et al. (2008)

• This is work that takes the basic thing we know: Republicans and Democrats speak about the same issue differently.

• In the next slide, they are showing methods that can pull out how the parties speak about abortion when they take the floor.

• The words at the top are the Democratic party words, the ones at the bottom are the Republican party words.

• http://languagelog.ldc.upenn.edu/myl/Monroe.pdf
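One of the methods discussed in Monroe et al. (2008) is a log-odds ratio smoothed with a Dirichlet prior and scaled by its estimated variance. The sketch below follows that idea but makes a simplifying assumption: a flat prior of `prior` pseudo-counts per word, where the paper also considers informative priors. The two toy "groups" are invented and borrow the hella/wicked contrast instead of real floor speeches.

```python
import math
from collections import Counter


def weighted_log_odds(counts_a, counts_b, prior=0.01):
    """z-scored log-odds ratio of each word in group A relative to group B.

    Assumes a flat Dirichlet prior (`prior` pseudo-counts per vocabulary word).
    Positive scores lean toward group A, negative scores toward group B.
    """
    counts_a, counts_b = Counter(counts_a), Counter(counts_b)
    vocab = set(counts_a) | set(counts_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    prior_total = prior * len(vocab)
    scores = {}
    for word in vocab:
        y_a = counts_a[word] + prior
        y_b = counts_b[word] + prior
        delta = (math.log(y_a / (n_a + prior_total - y_a))
                 - math.log(y_b / (n_b + prior_total - y_b)))
        variance = 1.0 / y_a + 1.0 / y_b
        scores[word] = delta / math.sqrt(variance)
    return scores


# Hypothetical toy data, invented for illustration.
group_a = Counter("that show was hella good and the food was hella cheap".split())
group_b = Counter("that show was wicked good and the food was wicked cheap".split())

scores = weighted_log_odds(group_a, group_b)
ranked = sorted(scores, key=scores.get, reverse=True)
print("most A-like:", ranked[0], "| most B-like:", ranked[-1])
```

Words used equally by both groups score zero, so only the words that actually separate the groups rise to the top or sink to the bottom, like the two ends of the plot described above.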


ENTREPRENEUR
http://idibon.com/entrepreneurs-french-spanish-english/


ENTREPRENEUR IN ENGLISH, FRENCH, SPANISH

• Tycoon, mogul, industrialist
  • A flavor of ‘ill-gotten gains’
• Entrepreneur doesn’t seem to have this (in English, right now)
• Collocates have to do with:
  • Advice
  • Success
  • Investors
  • Marketing
  • Social (media/services/topics/techniques)
  • Failure (especially fear-of)
  • Lots of named entities (SXSW, Dubai, #KSA, Twitter, Google, LinkedIn, Etsy)
• The people using entrepreneur identify themselves as
  • Authors, speakers, writers, bloggers, strategists, (life) coaches, consultants, moms, wives, husbands, fathers, food-lovers, music-lovers


KEY: GET COMPARISON SETS

Group/Context A
Group/Context B


INTERCONNECTED AXES OF DIFFERENCE

• Genre (State of the Unions vs. Reddit comments)
• Time (1940s vs. the last ten years)
• Geography (hella vs. wicked)
• Traditional demographics (age, gender, education)
• Personal identity/style (nerd, goth, bro, mom)


BECAUSE X
http://idibon.com/innovating-innovation/


INNOVATIONS AND THEIR COMMUNITIES

• Because X’ers disproportionately like:
  • YouTube
  • Tumblr
  • One Direction (especially Harry)
  • Justin Bieber
  • Ariana Grande
  • “bands”
  • pizza
  • sex
  • cats
  • books
• They are decidedly less likely to talk about
  • software
  • basketball
  • NASCAR
  • business
  • words associated with African-American Vernacular English


THE X IN BECAUSE X


Part of speech               Word counts ≥ 50
Noun (people, spoilers)      32.02%
Compressed clause (ilysm)    21.78%
Adjective (ugly, tired)      16.04%
Interjection (sweg, omg)     14.71%
Agreement (yeah, no)         12.97%
Pronoun (you, me)            2.45%

PART OF SPEECH TAGGERS ARE GOOD

• There’s even a pretty good one for Twitter POS
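Before anything can be tagged, the X has to be pulled out of each post. The sketch below is one hypothetical heuristic, not the method behind the table: it treats "because" followed by exactly one word and then punctuation or the end of the post as the innovative construction, which skips ordinary uses such as "because of the rain" or "because I was hungry". The posts are invented.

```python
import re
from collections import Counter

# "because" + exactly one word + clause-ending punctuation or end of post.
BECAUSE_X = re.compile(r"\bbecause\s+(\w+)\s*(?:[.!?,;]|$)", re.IGNORECASE)

# Hypothetical posts, invented for illustration.
posts = [
    "Skipping the gym because tired.",
    "can't talk because spoilers",
    "Staying in because of the rain.",
    "I left because I was hungry.",
    "late again because people",
]

x_counts = Counter(
    match.group(1).lower()
    for post in posts
    for match in BECAUSE_X.finditer(post)
)
print(x_counts)
```

The extracted words could then be handed to a part-of-speech tagger to get a breakdown by category. A one-word heuristic like this will miss multi-word X's, so treat it as a starting point.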


INNOVATIONS CLUMP


#BLACKLIVESMATTER
http://idibon.com/blacklivesmatter-events-change-conversations/


TOPIC MODELING

• In the previous sections, I’ve been noting what you can do when you have two or more comparison sets
  • How is wholesome used in time x vs. time y vs. time z?
  • What are the differences between English speakers talking about entrepreneurship vs. French speakers and Spanish speakers?
  • How are people who use the innovative Because X construction different from people who don’t use it?

• In this section, we talk about topic modeling, which is a way to automatically identify clusters within a data set, even if you don’t have a comparison set.

• We’ll use this to explore conversations around #blacklivesmatter, but we’ll also see how these conversations shift before/after a particular moment in time
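"Topic modeling" covers several algorithms, and the slides do not say which one was used. As an illustration only, here is a tiny collapsed Gibbs sampler for latent Dirichlet allocation (LDA), a common choice. The function name, the hyperparameters (`alpha`, `beta`, `iters`), and the four toy documents are all assumptions of this sketch.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA over tokenized documents.

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    # Resample each token's topic given all the other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, doc_topic


# Hypothetical toy documents, invented for illustration.
docs = [
    "pizza cheese oven dough pizza".split(),
    "dough oven bake cheese".split(),
    "goal match team score goal".split(),
    "team score match win".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(topic_word):
    print(k, [w for w, c in words.most_common(3) if c > 0])
```

Nothing here needs a comparison set: the sampler only sees which words co-occur in which documents. To look at a before/after shift, you could compare the `doc_topic` counts of documents from each side of the date in question.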


TIME MATTERS


TOPICS (EVEN WHEN YOU DON’T HAVE AN A PRIORI COMPARISON SET)


UNKNOWN UNKNOWNS

• In general, topic modeling is a way of addressing the limits of our knowledge. If you’re asking a question about data, you probably know something about the data going in.
• But what we hear from people is that they are keenly aware that they don’t know what they don’t know.
• Topic modeling is meant to help with that.

• In the next slides, another use of topic modeling: identifying the themes of Martin Luther King Jr.’s major speeches and sermons


• Topic modeling Dr. King’s major speeches and sermons gets these topics
• Which change over time
• See also http://idibon.com/topic-detection-mlk/
