How computers understand text content: a presentation for the Auckland content strategy meetup by Anna Divoli (@annadivoli). Ph.D. in Biomedical Text Mining | Text Analytics Researcher | Head of R&D at Pingar
Page 1: How computers understand text content - by Anna Divoli

How computers understand text content

a presentation for the Auckland content strategy meetup

by Anna Divoli @annadivoli

Ph.D. in Biomedical Text Mining | Text Analytics Researcher | Head of R&D at Pingar

Page 2: How computers understand text content - by Anna Divoli

Who am I?

• 14 years in academia + 4 years in industry
• academically exposed to different disciplines:

biomedicine, bioinformatics, computational linguistics, information retrieval, information extraction, semantic technologies, human-computer interaction, search user interface usability, knowledge acquisition, visualizations

• lived in different countries: Greece, UK, US, NZ

• learned English as a second language (hint: I empathize with computer systems)

Anna Divoli Auckland content strategy meetup Aug 2015

Page 3: How computers understand text content - by Anna Divoli

Who are you?

• Marketing?
• Digital content?
• Information Architecture?
• Journalists?
• UX?
• Business Analysis?
• Software Development?
• CS research (incl. “text” people)?
• Other?


Page 4: How computers understand text content - by Anna Divoli

What is “text”? Where is it?

[Image sources: www.nailingit.com/images/websites.jpg, www.bu.edu/today/files/2012/10/t_journals1.jpg, web.clarku.edu/offices/its/images/filepile.jpg, www.flickr.com/photos/jlconfor/14191286471]

Page 5: How computers understand text content - by Anna Divoli

Human – Text Content Interaction

Humans: slow, inconsistent, expensive

Text content: growing overwhelmingly fast, disseminated across multiple sources


Page 6: How computers understand text content - by Anna Divoli

NLP ∈ Artificial Intelligence

[Diagram: Machine Learning; NLP; Computational Linguistics; Applied Text Analytics; Storage; Memory; Security; Friendly UIs; Visualizations]


Page 7: How computers understand text content - by Anna Divoli

So, what’s in the text?

• Entities
• Facts
• Relations
• Themes/topics
• Opinions & sentiment
• …

+ Time/Location dimensions:
• Trends & paradigm shifts
• Networks
• …


Page 8: How computers understand text content - by Anna Divoli

Named Entity Recognition

Find and classify names…

S. Arlington initiated partnership discussions during his visit to Eureka’s Ltd offices last month.

John Smith went to Washington to see the Smithsonian and also met up with Virginia for a coffee.


Page 9: How computers understand text content - by Anna Divoli

Named Entity Recognition

Find and classify names…

S. Arlington initiated partnership discussions during his visit to Eureka’s Ltd offices last month.

John Smith went to Washington to see the Smithsonian and also met up with Virginia for a coffee.

Entity types: People, Locations, Organizations

Methods:
lexicon-based (gazetteers)
grammar-based (rule-based)

✓ statistical models (machine learning: algorithms + features)

✓ hybrids
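A minimal sketch of the lexicon-based (gazetteer) method, in Python, assuming a tiny hand-made gazetteer. GAZETTEER and gazetteer_ner are hypothetical names, not part of any library; the function simply tags every exact match with the type stored in the lexicon:

```python
import re

# Hypothetical toy gazetteer (surface form -> entity type), for illustration only.
# Real gazetteers are large lists, e.g. of place names or company names.
GAZETTEER = {
    "S. Arlington": "PERSON",
    "John Smith": "PERSON",
    "Virginia": "PERSON",  # a lexicon alone cannot tell the person from the US state
    "Washington": "LOCATION",
    "Smithsonian": "ORGANIZATION",
    "Eureka’s Ltd": "ORGANIZATION",
}

# Longest names first, so a longer entry wins when two could start at the same place.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in _NAMES) + r")\b")


def gazetteer_ner(text):
    """Return (surface form, entity type, start offset) for every lexicon match."""
    return [(m.group(), GAZETTEER[m.group()], m.start()) for m in _PATTERN.finditer(text)]


if __name__ == "__main__":
    sentence = "John Smith went to Washington to see the Smithsonian and also met up with Virginia for a coffee."
    for name, label, start in gazetteer_ner(sentence):
        print(f"{start:>3}  {label:<12} {name}")
```

On the John Smith sentence this tags John Smith and Virginia as PERSON, Washington as LOCATION and Smithsonian as ORGANIZATION. The weakness is visible too: "Virginia" is a person here only because the toy lexicon says so, and the lookup ignores context entirely, which is where statistical and hybrid methods can help.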

Page 10: How computers understand text content - by Anna Divoli

Named Entity Recognition

Find and classify names…

S. Arlington initiated partnership discussions during his visit to Eureka’s Ltd offices last month.

John Smith went to Washington to see the Smithsonian and also met up with Virginia for a coffee.

Entity types: People, Dates, Locations, Organizations

Who? Where?

When?
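The "When?" question can be sketched with a few hand-written rules. This is a minimal illustration assuming a hypothetical DATE_PATTERNS list that covers only relative expressions ("last month"), month names ("in July") and numeric dates; real temporal taggers handle far more forms:

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hypothetical hand-written rules covering only a few kinds of date expression.
DATE_PATTERNS = [
    # Relative expressions: "last month", "next week", "this year"
    re.compile(r"\b(?:last|next|this) (?:week|month|year)\b", re.IGNORECASE),
    # Month names, optionally preceded by "in" and followed by a year: "in July", "August 2015"
    re.compile(r"\b(?:in )?(?:" + MONTHS + r")(?: \d{4})?\b"),
    # Numeric dates such as 15/08/2015
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
]


def find_dates(text):
    """Return (start, end, matched text) for every rule match, sorted by position."""
    spans = []
    for pattern in DATE_PATTERNS:
        spans.extend((m.start(), m.end(), m.group()) for m in pattern.finditer(text))
    return sorted(spans)
```

On the S. Arlington sentence it finds "last month". A bare month name is itself ambiguous ("May" is also a verb), so rules like these usually need context checks.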


Page 11: How computers understand text content - by Anna Divoli

Disambiguation & Normalization: Word Sense Disambiguation & Text Normalization

S. Arlington initiated partnership discussions during his visit to Eureka’s Ltd offices last month.

John Smith went to Washington to see the Smithsonian and also met up with Virginia for a coffee.

Word Sense Disambiguation: identifying which sense (meaning) of a word is used in a sentence when the word has multiple meanings. Synonyms & homonyms. Use context!

Text normalization: transforming text into a single canonical form that it might not have had before.
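One classic way to "use context" is the simplified Lesk algorithm: pick the sense whose description shares the most words with the surrounding sentence. A minimal sketch, assuming a hypothetical hand-written sense inventory (SENSES) with a few signature words per sense; real systems draw sense descriptions from dictionaries such as WordNet or learn them from data:

```python
import re

# Hypothetical sense inventory: a few "signature" words per sense.
# Real systems take sense descriptions from dictionaries (e.g. WordNet glosses)
# or learn them from annotated text.
SENSES = {
    "Washington": {
        "Washington DC (city)": {"went", "visit", "city", "capital", "museum", "smithsonian", "dc"},
        "George Washington (person)": {"president", "general", "born", "first"},
    },
    "Virginia": {
        "Virginia (US state)": {"state", "richmond", "drove", "county"},
        "Virginia (person)": {"met", "with", "coffee", "she", "her", "friend"},
    },
}


def context_words(sentence, target):
    """Lower-cased word set of the sentence, without the ambiguous word itself."""
    return set(re.findall(r"[a-z]+", sentence.lower())) - {target.lower()}


def simplified_lesk(target, sentence):
    """Choose the sense whose signature shares the most words with the context."""
    context = context_words(sentence, target)
    # max() keeps the first sense listed when overlaps tie.
    return max(SENSES[target], key=lambda sense: len(SENSES[target][sense] & context))
```

For the John Smith sentence, "Washington" resolves to the city (the context shares "went" and "smithsonian") and "Virginia" to the person ("met", "with", "coffee"). With "She drove to Richmond, Virginia." the same function picks the state.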


Page 12: How computers understand text content - by Anna Divoli

Word Sense Disambiguation & Text Normalization

S. Arlington initiated partnership discussions during his visit to Eureka’s Ltd offices last month.

Sam Arlington initiated partnership discussions during his visit to Eureka offices in July.

John Smith went to Washington to see the Smithsonian and also met up with Virginia for a coffee.

J. Smith went to Washington DC to see the Smithsonian Institution and also met up with Virginia Peterson for a coffee.

Page 13: How computers understand text content - by Anna Divoli

S. Arlington initiated partnership discussions during his visit to Eureka’s Ltd offices last month.

Sam Arlington initiated partnership discussions during his visit to Eureka offices in July.

John Smith went to Washington to see the Smithsonian and also met up with Virginia for a coffee.

J. Smith went to Washington DC to see the Smithsonian Institution and also met up with Virginia Peterson for a coffee.

Word Sense Disambiguation & Text Normalization
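The rewrite from "S. Arlington … last month" to "Sam Arlington … in July" can be sketched as an alias table plus one date rule. This assumes a hypothetical ALIASES table and that the document's own date is known (here August 2015, the month of the meetup, chosen only for illustration); "last month" is then resolved relative to that date:

```python
import calendar
import datetime
import re

# Hypothetical alias table: surface variant -> canonical form.
ALIASES = {
    "S. Arlington": "Sam Arlington",
    "Eureka’s Ltd": "Eureka",
}


def normalize(text, doc_date):
    """Map known aliases to canonical names and resolve "last month" against doc_date."""
    for variant, canonical in ALIASES.items():
        text = text.replace(variant, canonical)
    # The month before doc_date; January wraps around to December.
    previous_month = 12 if doc_date.month == 1 else doc_date.month - 1
    return re.sub(r"\blast month\b", "in " + calendar.month_name[previous_month], text)


if __name__ == "__main__":
    sentence = "S. Arlington initiated partnership discussions during his visit to Eureka’s Ltd offices last month."
    # Assumed document date: August 2015 (illustrative only).
    print(normalize(sentence, datetime.date(2015, 8, 15)))
```

With that assumed date, the first example sentence becomes exactly the normalized version shown on slide 12. Without a known document date, "last month" cannot be normalized at all.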


Page 14: How computers understand text content - by Anna Divoli

Fact & Relationship extraction

S. Arlington initiated partnership discussions during his visit to Eureka’s Ltd offices last month.

John Smith went to Washington to see the Smithsonian and also met up with Virginia for a coffee.

What?
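Fact and relationship extraction answers "What?" by turning a sentence into (subject, relation, object) triples. A minimal pattern-based sketch, assuming hypothetical surface patterns and relation labels (VISITED, MET, INITIATED); production systems usually rely on syntactic parsing or trained models rather than regexes like these:

```python
import re

# A capitalised name of one or more words, optionally starting with an initial:
# "John Smith", "S. Arlington", "Washington".
NAME = r"(?:[A-Z]\. )?[A-Z]\w*(?: [A-Z]\w*)*"

# Hypothetical surface patterns and relation labels, for illustration only.
PATTERNS = [
    (re.compile(rf"(?P<subj>{NAME}) went to (?P<obj>{NAME})"), "VISITED"),
    (re.compile(rf"(?P<subj>{NAME}) .*?\bmet up with (?P<obj>{NAME})"), "MET"),
    (re.compile(rf"(?P<subj>{NAME}) initiated (?P<obj>[\w ]+?) during\b"), "INITIATED"),
]


def extract_relations(sentence):
    """Return (subject, relation, object) triples for every pattern match."""
    return [(m.group("subj"), label, m.group("obj"))
            for pattern, label in PATTERNS
            for m in pattern.finditer(sentence)]
```

On the two example sentences this yields (John Smith, VISITED, Washington), (John Smith, MET, Virginia) and (S. Arlington, INITIATED, partnership discussions). The patterns stop matching as soon as the wording changes ("visited the Eureka’s Ltd offices"), one reason hybrid approaches are used.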


Page 15: How computers understand text content - by Anna Divoli

Deeper knowledge & Sentiment

S. Arlington initiated partnership discussions during his visit to Eureka’s Ltd offices last month.

John Smith went to Washington to see the Smithsonian and also met up with Virginia for a coffee.

How? Why? How do we feel about it?

S. Arlington visited the Eureka’s Ltd offices last month to initiate partnership discussions.

John Smith was delighted to go to Washington to see the Smithsonian and also met up with Virginia for a coffee.


Page 16: How computers understand text content - by Anna Divoli

Sentiment analysis & opinion mining

• Dictionary-based (e.g., LIWC)
• Statistical
• Hybrid

• Polarity & strength: Pos | Neu | Neg & score
• Feelings: angry, sad…
• Mood: happy, depressed…
• Aspects: location, cleanliness…
• Who has this sentiment (source): employees, customers…
• What is the target of the sentiment: product, event, person…
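A minimal sketch of the dictionary-based approach, producing a Pos | Neu | Neg label and a score. It assumes a tiny hypothetical LEXICON and a two-word negation window; this is not LIWC or any other published dictionary:

```python
import re

# Hypothetical toy lexicon (word -> polarity weight). This is not LIWC or any
# other published dictionary; it only illustrates the lookup idea.
LEXICON = {
    "delighted": 2.0, "love": 2.0, "happy": 1.5, "good": 1.0,
    "sad": -1.5, "angry": -2.0, "crap": -1.5, "upset": -1.5,
}
NEGATORS = {"not", "never", "no"}


def sentiment(text):
    """Return (label, score): label is 'pos', 'neu' or 'neg'; score sums lexicon weights."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0.0)
        # Simple negation handling: flip the weight if a negator occurs
        # among the two preceding tokens ("not happy", "not very happy").
        if weight and NEGATORS & set(tokens[max(0, i - 2):i]):
            weight = -weight
        score += weight
    label = "pos" if score > 0 else "neg" if score < 0 else "neu"
    return label, score
```

The "delighted" version of the John Smith sentence scores ("pos", 2.0), while "George W Bush. Love him!" also comes out positive: a word list cannot detect sarcasm.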


Page 17: How computers understand text content - by Anna Divoli

So, what’s in the text?

• Entities
• Facts
• Relations
• Themes/topics: no training or ontologies needed; can utilize web resources (e.g., Wikipedia)
• Opinions & sentiment
• …

+ Time/Location dimensions:
• Trends & paradigm shifts
• Networks
• …
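Themes can be approximated without labelled training data. The sketch below swaps in plain TF-IDF keyword ranking, a different and simpler technique than drawing on web resources such as Wikipedia: a term scores high when it is frequent in one document but rare across the collection. STOPWORDS, content_words and top_terms are hypothetical names:

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "for", "with", "his", "up", "also"}


def content_words(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_terms(docs, k=3):
    """For each document, return its k highest TF-IDF terms (ties broken alphabetically)."""
    tokenized = [content_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        if not tokens:
            results.append([])
            continue
        tf = Counter(tokens)
        scores = {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        results.append(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return results
```

Terms shared by every document ("covered", "budget" in a two-document test) get an IDF of zero, so each document's list is led by what distinguishes it.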

Page 18: How computers understand text content - by Anna Divoli

So, what ELSE is in the text?

• Ambiguity: I want an apple.
• Metaphors: He drowned in a sea of grief.
• Sarcasm: George W Bush. Love him!
• Colloquialism/Slang: I slept like crap last night.
• Negation: I am not sure I want to go to NYC.
• Hedging: The results indicate this.
• Conditional statements: When it rains I feel sad.
• Inconsistencies/Bad grammar: I think your smart.
• Text speak: C u l8r @Jacks
• Anaphora: John met with Nick. He was upset.
• Humor: Did you take a bath today? No. Is one missing?


Page 19: How computers understand text content - by Anna Divoli

So, what ELSE is in the text?

• Ambiguity: I want an apple.
• Metaphors: He drowned in a sea of grief.
• Sarcasm: George W Bush. Love him!
• Colloquialism/Slang: I slept like crap last night.
• Negation: I am not sure I want to go to NYC.
• Hedging: The results indicate this.
• Conditional statements: When it rains I feel sad.
• Inconsistencies/Bad grammar: I think your smart.
• Text speak: C u l8r @Jacks
• Anaphora: John met with Nick. He was upset.
• Humor: Did you take a bath today? No. Is one missing?

Consider: distributed information (dialogue), technical/scientific text, legal text, creative/poetry…
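Some of these phenomena can at least be flagged with cue words, the idea behind rule-based negation detectors such as NegEx. A minimal sketch with hypothetical cue lists; unlike real detectors, it does not work out which part of the sentence a cue applies to (its scope):

```python
import re

# Hypothetical cue (trigger) patterns; real detectors use curated lists and
# also compute each cue's scope, which this sketch does not attempt.
CUES = {
    "negation": [r"\bnot\b", r"\bno\b", r"\bnever\b", r"n't\b"],
    "hedging": [r"\bnot sure\b", r"\bindicates?\b", r"\bsuggests?\b",
                r"\bmay\b", r"\bmight\b", r"\bpossibly\b"],
    "conditional": [r"^\s*(?:if|when)\b", r"\bunless\b"],
}


def detect_cues(sentence):
    """Return, sorted, the phenomena whose cue patterns occur in the sentence."""
    text = sentence.lower()
    return sorted(kind for kind, patterns in CUES.items()
                  if any(re.search(p, text) for p in patterns))
```

It flags "I am not sure I want to go to NYC." as both hedging and negation, "The results indicate this." as hedging, and "When it rains I feel sad." as conditional. Sarcasm, metaphor and humor have no comparably reliable surface cues.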


Page 20: How computers understand text content - by Anna Divoli

Human language!

Eye drops off shelf.

Include your children when baking cookies.

Turn right here.

John saw the man on the mountain with a telescope.

He gave her cat food.

They are hunting dogs.

Page 21: How computers understand text content - by Anna Divoli

Examples: Biology…

Looking for: interactions between SAF and viral LTR elements (SAF is a transcription factor; LTR stands for ‘long terminal repeat’) (Also: SAF = single and free, LTR = long-term relationship)

Gene names: tinman, lilliputian, dreadlocks, lush, cheap date, methuselah, Van Gogh, maggie, brainiac, grim, reaper, cleopatra, swiss cheese, fucK, out cold, ken and barbie, kenny, lava lamp, hamlet, sonic hedgehog, werewolf, half pint, drop dead, chardonnay, agnostic, I’m not dead yet…


Page 22: How computers understand text content - by Anna Divoli

Current State of NLP

• Rule-based systems for high precision results

• Hybrid systems for more robust performance (rules + dictionaries/ontologies + statistical models)

• Limitation: specialized systems perform better (much like humans!)

• Workflows offer a workaround for more generic systems, e.g., check language → check category → choose model
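The check language → check category → choose model workflow can be sketched as three small steps. Everything here is a stand-in: the stopword and keyword lists and the model names in MODELS are hypothetical, and real systems would use trained language and category classifiers:

```python
import re

# Hypothetical resources: tiny stopword lists for language identification and
# keyword lists for category detection.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "with"},
    "fr": {"le", "la", "et", "est", "de", "avec"},
}
CATEGORY_KEYWORDS = {
    "biomedical": {"gene", "protein", "transcription", "ltr", "viral"},
    "business": {"partnership", "offices", "revenue", "customers"},
}
# Hypothetical registry of specialized models, keyed by (language, category).
MODELS = {
    ("en", "biomedical"): "en-biomed-ner",
    ("en", "business"): "en-business-ner",
}
FALLBACK_MODEL = "generic-ner"


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def best_match(tokens, options):
    """Return the key whose word set overlaps most with tokens, or None if nothing overlaps."""
    key, overlap = max(((k, len(tokens & v)) for k, v in options.items()), key=lambda kv: kv[1])
    return key if overlap > 0 else None


def choose_model(text):
    tokens = words(text)
    language = best_match(tokens, STOPWORDS)           # step 1: check language
    category = best_match(tokens, CATEGORY_KEYWORDS)   # step 2: check category
    return MODELS.get((language, category), FALLBACK_MODEL)  # step 3: choose model
```

The SAF/LTR query is routed to the hypothetical biomedical model, the S. Arlington sentence to the business model, and text with no specialized model falls back to the generic one.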


Page 23: How computers understand text content - by Anna Divoli

Examples of applications

(some are very specialized!)


Page 24: How computers understand text content - by Anna Divoli
Page 25: How computers understand text content - by Anna Divoli
Page 26: How computers understand text content - by Anna Divoli
Page 27: How computers understand text content - by Anna Divoli
Page 28: How computers understand text content - by Anna Divoli
Page 29: How computers understand text content - by Anna Divoli
Page 30: How computers understand text content - by Anna Divoli

Content Enrichment

Content Inventory

Content Intelligence

Page 31: How computers understand text content - by Anna Divoli
Page 32: How computers understand text content - by Anna Divoli
Page 33: How computers understand text content - by Anna Divoli

pingar.com/discoveryone/

www.youtube.com/watch?v=i9FnMylGQxw

Page 34: How computers understand text content - by Anna Divoli

Take home messages

• Machines can do a lot of consistent, fast information extraction

• Specialization is needed in several fields, but systems can have internal workflows

• Big data + statistics = magic!

• Always room for improvement

• Information management AND decisions AND predictions

Page 35: How computers understand text content - by Anna Divoli

Time for questions and discussion!

https://xkcd.com/1263/


@annadivoli.