PhD defense Koen Deschacht


Weakly supervised methods for information extraction


Supervisors: Prof. Marie-Francine Moens, Prof. Danny De Schreye


Overview


Information extraction

Detect and classify structures in unstructured data:
Text
Images / video

Examples

Word sense disambiguation (WSD)
Semantic role labeling (SRL)
Visual entity detection


WSD: Determine meaning of a word

He kicked the ball in the goal.
At a formal ball attendees wear evening attire.
He stood on the balls of his feet.




SRL: Who is doing what, where ?

John broke the window with a stone.
John broke the window with little doubt.
The window broke.




Who/what is present in the image?

Hillary Clinton
Bill Clinton




Common approach:

Word sense disambiguation
Semantic role labeling
Visual entity detection

and many, many more...



Supervised machine-learning methods


Supervised machine learning

Statistical methods that are trained on many annotated examples
SRL: 113,000 verbs
WSD: 250,000 words
Learn soft rules from the data


Example: WSD

Ball = round object
1. He kicked the ball in the goal.
2. Ricardo blocks the ball as Benzema tries to shoot.
3. Patrice Evra almost kicked the ball in his own goal.
…

Ball = formal dance
1. Obama and his wife danced at the inaugural ball.
2. Casey Gillis was dressed in a white ball gown.
3. Dance Unlimited's Spring Ball takes place tomorrow.
…


Example: WSD

Machine learning methods can combine many complementary and/or contradictory rules

Soft rules:
If “kicked”, if “goal”, ... → ball = “round object”
If “dance”, if “gown”, ... → ball = “formal dance”
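One way to picture how soft rules are combined is a naive Bayes classifier, in which every context word contributes a weighted vote for each sense. The sketch below is only an illustration trained on the six example sentences from the previous slide; the helper names (`tokenize`, `train`, `classify`) are assumptions and this is not necessarily the classifier used in the thesis.

```python
import math
from collections import Counter, defaultdict

# Toy training set: the example sentences from the previous slide.
TRAIN = [
    ("round object", "He kicked the ball in the goal."),
    ("round object", "Ricardo blocks the ball as Benzema tries to shoot."),
    ("round object", "Patrice Evra almost kicked the ball in his own goal."),
    ("formal dance", "Obama and his wife danced at the inaugural ball."),
    ("formal dance", "Casey Gillis was dressed in a white ball gown."),
    ("formal dance", "Dance Unlimited's Spring Ball takes place tomorrow."),
]

def tokenize(text):
    """Lowercase the text and strip simple punctuation from each token."""
    return [w.strip(".,").lower() for w in text.split()]

def train(examples):
    """Count how often each word occurs with each sense."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    for sense, sentence in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokenize(sentence))
    return sense_counts, word_counts

def classify(sentence, sense_counts, word_counts):
    """Sum the log-weights of all soft rules and return the best sense."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(sense_counts.values())
    scores = {}
    for sense in sense_counts:
        n = sum(word_counts[sense].values())
        score = math.log(sense_counts[sense] / total)
        for w in tokenize(sentence):
            # Add-one smoothing: unseen words weaken a sense, never veto it.
            score += math.log((word_counts[sense][w] + 1) / (n + len(vocab)))
        scores[sense] = score
    return max(scores, key=scores.get)

SENSE_COUNTS, WORD_COUNTS = train(TRAIN)
print(classify("She kicked the ball toward the goal", SENSE_COUNTS, WORD_COUNTS))
```

No single word decides the outcome: “kicked” and “goal” each add evidence for one sense, and words seen with both senses partly cancel out.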


Supervised machine learning

Current state-of-the-art machine learning methods

Machine learning method often independent of task

Successful for many tasks
Flexible, fast development for new tasks
Only some expert knowledge needed

Manually annotated corpus needed for every new task, language or domain

Features need to be manually engineered

High variation of language limits performance even with large training corpora


Solution: use unlabeled data

Unlabeled data: cheap, available for many domains and languages

Semi-supervised learning
Optimize a single function that incorporates labeled and unlabeled data
Violations of the assumptions cause results to deteriorate when more unlabeled data is added

Unsupervised learning
First learn a model on unlabeled data, then use that model in a supervised machine learning method


Distributional hypothesis

It is possible to determine the meaning of a word by investigating its occurrence in a corpus.

Example:

What does “pulque” mean?

“It takes a maguey plant twelve years before it is mature enough to produce the sap for pulque.”
“The consumption of pulque peaked in the 1800’s.”
“After the Conquest, pulque lost its sacred character, and both indigenous and Spanish people began to drink it.”
“In this way, the making of pulque passed from being a homemade brew to one commercially produced.”
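The distributional hypothesis can be made concrete by comparing the contexts in which words occur. The sketch below builds bag-of-words context vectors and compares them with cosine similarity. The three corpus sentences are invented for illustration and are not from the slides, and `context_vector` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, made up for this illustration.
CORPUS = [
    "People began to drink pulque after the harvest.",
    "Farmers drink beer after the harvest.",
    "The mechanic repaired the truck in the garage.",
]

def context_vector(target, sentences, window=3):
    """Count the words that occur within `window` positions of `target`."""
    vec = Counter()
    for sentence in sentences:
        tokens = [t.strip(".,").lower() for t in sentence.split()]
        for i, token in enumerate(tokens):
            if token == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                vec.update(left + right)
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

pulque = context_vector("pulque", CORPUS)
sim_beer = cosine(pulque, context_vector("beer", CORPUS))
sim_truck = cosine(pulque, context_vector("truck", CORPUS))
print(sim_beer, sim_truck)
```

In this toy corpus “pulque” shares most of its context with “beer” and almost none with “truck”, which is the kind of evidence a reader uses to guess that pulque is a drink.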


Latent words language model

Directed Bayesian model that models likely synonyms of a word, depending on context.
Automatically learns synonyms and related words.



Original sentence:
We    hope     there  is   an    increasing  need       for   reform

Automatically learned synonyms:
I     believe  this   was  the   enormous    chance     of    restructuring
They  think    that   's   no    important   demand     to    change
You   feel     it     are  some  increased   potential  that  peace
...   ...      ...    ...  ...   ...         ...        ...   ...


Time to compute all possible combinations: very, very long
Approximation (consider only the most likely combinations): pretty fast
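The trade-off between enumerating every combination and keeping only the most likely ones can be illustrated with beam search. This is a sketch under stated assumptions: the candidate words come from the slide, but their probabilities are invented, the pairwise score is a placeholder, and the thesis may use a different approximation.

```python
import heapq
import math

def beam_search(candidates, pair_score, beam_width=2):
    """Keep only the `beam_width` best partial sequences at each position.

    candidates: one list per position of (word, log_probability) pairs.
    pair_score: function(previous_word, word) returning a log-score.
    """
    beam = [(0.0, [])]
    for options in candidates:
        expanded = []
        for score, sequence in beam:
            for word, log_prob in options:
                link = pair_score(sequence[-1], word) if sequence else 0.0
                expanded.append((score + log_prob + link, sequence + [word]))
        beam = heapq.nlargest(beam_width, expanded, key=lambda item: item[0])
    return beam

# Candidate latent words per position; the probabilities are hypothetical.
CANDIDATES = [
    [("we", math.log(0.6)), ("i", math.log(0.4))],
    [("hope", math.log(0.7)), ("believe", math.log(0.3))],
    [("there", math.log(0.8)), ("this", math.log(0.2))],
]

def no_pair_score(previous_word, word):
    """Placeholder: a real model would score how well two words fit together."""
    return 0.0

best = beam_search(CANDIDATES, no_pair_score, beam_width=2)
exhaustive = math.prod(len(options) for options in CANDIDATES)
print(best[0][1], "kept", len(best), "of", exhaustive, "combinations")
```

The number of full combinations grows exponentially with sentence length, while the beam evaluates at most `beam_width` times the number of candidates at each position.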


LWLM: quality

Measure how well the model can predict new, previously unseen texts in terms of perplexity (lower is better)

The interpolated LWLM (int. LWLM) outperforms the other language models on all three corpora

Model      Reuters  APNews  EnWiki
ADKN       114.96   134.42  161.41
IBM        108.38   125.65  149.21
LWLM       108.78   124.57  151.98
int. LWLM   96.45   112.81  138.03
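Perplexity is the inverse geometric mean of the probabilities a model assigns to the tokens of a held-out text. A minimal sketch of the computation follows; the function name and the toy probabilities are assumptions for illustration.

```python
import math

def perplexity(token_probs):
    """Perplexity of a text given the model probability of each token."""
    n = len(token_probs)
    log_sum = sum(math.log2(p) for p in token_probs)
    return 2 ** (-log_sum / n)

# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among four words.
uniform_four = perplexity([0.25, 0.25, 0.25, 0.25])
# Assigning higher probabilities to the observed tokens lowers perplexity.
better_model = perplexity([0.5, 0.5, 0.25, 0.5])
print(uniform_four, better_model)
```

A model with lower perplexity assigns higher probability to the unseen text.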


LWLM for information extraction

Word sense disambiguation

Semantic role labeling

Latent words: help with underspecification and ambiguity

standard   + cluster features   + hidden words
66.32%     66.97%               67.61%

[Plot: x-axis 5%, 20%, 50%, 100%; y-axis 40% to 90%; series: standard, + clusters, + hidden words]


Automatic annotation of images & video

Texts describe content of images
Extract information in structured format

Entities
Attributes
Actions
Locations


Annotation of entities in images

Extract entities from descriptive news text that are present in the image.

Former President Bill Clinton, left, looks on as an honor guard folds the U.S. flag during a graveside service for Lloyd Bentsen in Houston, May 30, 2006. Bentsen, a former senator and former treasury secretary, died last week at the age of 85.

Present in the image: Bill Clinton, guard, flag
Not present in the image: service, Lloyd Bentsen, Houston, age, ...



Assumption: Entity is present in image if important in descriptive text and possible to perceive visually.

Salience:
Dependent on text
Combines analysis of discourse and syntax

Visualness:
Independent of text
Extracted from semantic database


Salience

Is the entity important in descriptive text?

Discourse model
Important entities are referred to by other entities and terms.
Graph models entities, coreferents and other terms
Eigenvectors find most important entities

Syntactic model
Important entities appear high in parse tree
Important entities have many children in tree
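The idea that eigenvectors find the most important entities in a graph can be sketched with eigenvector centrality, computed by power iteration. The graph below is hypothetical: the node names come from the news caption, but the edges are invented for illustration, and the thesis builds its graph from coreference and discourse analysis that is not reproduced here.

```python
def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration on an undirected graph given as {node: set(neighbours)}."""
    nodes = list(adjacency)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Each node keeps its own score (a self-loop, which avoids oscillation)
        # and collects the scores of its neighbours.
        updated = {n: score[n] + sum(score[m] for m in adjacency[n]) for n in nodes}
        norm = sum(updated.values())
        score = {n: v / norm for n, v in updated.items()}
    return score

# Hypothetical reference graph; the edges are assumptions, not thesis data.
GRAPH = {
    "Bill Clinton": {"guard", "flag", "service"},
    "guard": {"Bill Clinton", "flag"},
    "flag": {"Bill Clinton", "guard"},
    "service": {"Bill Clinton", "Lloyd Bentsen"},
    "Lloyd Bentsen": {"service"},
}

centrality = eigenvector_centrality(GRAPH)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(ranking)
```

An entity scores high when it is linked to other high-scoring entities, so a well-connected entity ends up at the top of the ranking.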


Visualness

Can the entity be perceived visually?

Similarity measure on entities in WordNet
s(“car”, “truck”) = 0.88
s(“car”, “horse”) = 0.38
s(“horse”, “cow”) = 0.79
s(“thought”, “house”) = 0.23
s(“house”, “building”) = 0.91
s(“car”, “house”) = 0.40

Visual seeds: “person”, “vehicle”, “animal”, ...
Nonvisual seeds: “thought”, “power”, “air”, …

Visualness: combine similarity measure and seeds
“entities close to visual seeds will be visual”
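One simple way to combine a similarity measure with seed words is to compare an entity's total similarity to the visual seeds against its total similarity to all seeds. The sketch below assumes that combination rule; it is not necessarily the formula used in the thesis. Only s(“thought”, “house”) = 0.23 is taken from the slide, and every other similarity value is a hypothetical placeholder.

```python
# Similarity lookup: only ("house", "thought") = 0.23 comes from the slide;
# all other values are hypothetical placeholders.
SIM = {
    ("house", "thought"): 0.23,
    ("house", "person"): 0.35,
    ("house", "vehicle"): 0.40,
    ("idea", "thought"): 0.90,
    ("idea", "person"): 0.20,
    ("idea", "vehicle"): 0.15,
}

VISUAL_SEEDS = ["person", "vehicle"]
NONVISUAL_SEEDS = ["thought"]

def similarity(a, b):
    """Symmetric lookup; unknown pairs count as completely dissimilar."""
    return SIM.get((a, b), SIM.get((b, a), 0.0))

def visualness(entity, visual_seeds=VISUAL_SEEDS, nonvisual_seeds=NONVISUAL_SEEDS):
    """Share of the entity's seed similarity that goes to visual seeds."""
    visual = sum(similarity(entity, s) for s in visual_seeds)
    nonvisual = sum(similarity(entity, s) for s in nonvisual_seeds)
    total = visual + nonvisual
    return visual / total if total else 0.5

print(visualness("house"), visualness("idea"))
```

With these placeholder values “house” lands closer to the visual seeds than “idea”, which matches the intuition that entities close to visual seeds will be visual.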


Annotation of entities: Results

Appearance model: combine visualness and salience

Appearance model dramatically increases accuracy!

All entities   + visualness   + salience   + salience + visualness
26.66%         62.78%         59.56%       69.39%


Scene location annotation

Annotate location of every scene in sitcom series
Input: video and transcript

Shot of Buffy opening the refrigerator and taking out a carton of milk. Buffy sniffs the milk and puts it on the counter. In the background we see Dawn opening a cabinet to get out a box of cereal. Buffy turns away.


Dawn's room, the kitchen, the living room, the street


Scene segmentation

Segment transcript and video into scenes
Scene cut classifier in text
Shot cut detector in video

Transcript

Shot of Buffy opening the refrigerator and taking out a carton of milk. Buffy sniffs the milk and puts it on the counter. In the background we see Joyce drinking coffee and Dawn opening a cabinet to get out a box of cereal. ...

Buffy & Riley move into the living room. They sit on the sofa. Buffy nods in resignation. Smooch. Riley gets up.

Cut to a shot of a bright red convertible driving down the street. Giles is at the wheel, Buffy beside him and Dawn in the back. Classical music plays on the radio. ....

Scene cuts


Shot of Buffy opening the refrigerator and taking out a carton of milk. ...
Buffy & Riley move into the living room. They sit on the sofa. …
Cut to a shot of a bright red convertible driving down the street. ....


Location detection and propagation

Detect locations in text

Propagate locations to other scenes
Latent Dirichlet allocation: learn correlations between locations and other objects (“refrigerator” → “kitchen”)
Visual reweighting: visually similar scenes should be in the same location

Shot of Buffy opening the refrigerator and taking out a carton of milk. ...Buffy & Riley move into the living room. They sit on the sofa. Cut to a shot of a bright red convertible driving down the street.
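The propagation step can be pictured with a much simpler stand-in for latent Dirichlet allocation: count which words co-occur with which detected locations, then let the words of an unlabeled scene vote. This is plain co-occurrence counting, not LDA, and the word lists are hypothetical tokens loosely based on the transcript excerpt.

```python
from collections import Counter, defaultdict

def learn_word_location_counts(labeled_scenes):
    """Count in how many labeled scenes each word occurs with each location."""
    counts = defaultdict(Counter)
    for words, location in labeled_scenes:
        for word in set(words):
            counts[word][location] += 1
    return counts

def propagate(words, counts):
    """Assign the location that the scene's words vote for most, if any."""
    votes = Counter()
    for word in set(words):
        votes.update(counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None

# Scenes in which the location detector found an explicit location (toy data).
LABELED = [
    (["buffy", "refrigerator", "milk", "counter"], "kitchen"),
    (["dawn", "cabinet", "cereal", "counter"], "kitchen"),
    (["riley", "sofa", "television"], "living room"),
]

COUNTS = learn_word_location_counts(LABELED)
print(propagate(["joyce", "coffee", "refrigerator"], COUNTS))
print(propagate(["giles", "convertible"], COUNTS))
```

A scene that mentions a refrigerator is labeled “kitchen” even though the location is never stated, while a scene with no known words stays unlabeled. A topic model such as LDA generalizes this by sharing evidence across related words.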


Location annotation results

Scene cut classifier
precision   recall   F1-measure
91.71%      97.48%   85.16%

Location detector
precision   recall   F1-measure
68.75%      75.54%   71.98%

Location annotation
episode   only text   text + LDA   text + LDA + vision
2         54.72%      58.89%       57.39%
3         60.11%      65.87%       68.57%
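For reference, the F1-measure is the harmonic mean of precision and recall. The sketch below applies that standard definition to the location detector figures reported above; the function name is an assumption.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Location detector: precision 68.75%, recall 75.54%.
location_f1 = f1_measure(68.75, 75.54)
print(round(location_f1, 2))
```

The result is roughly 71.99, which agrees with the reported 71.98% up to rounding of the inputs.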


Contributions 1/2

The latent words language model
Best n-gram language model
Unsupervised learning of word similarities
Unsupervised disambiguation of words

Using the latent words for WSD
Best WSD system

Using the latent words for SRL
Improvement of state-of-the-art classifier


Contributions 2/2

Image annotation: First full analysis of entities in descriptive texts
Visualness: capture knowledge from WordNet
Salience: capture knowledge from syntactic properties

Location annotation: Automatic annotation of locations from transcripts
Including new locations
Including locations that are not explicitly mentioned


Thank you!

Questions?

Comments?
