Transcript
Weakly Supervised
Machine Reading
Isabelle Augenstein
University College London
October 2016
What is Machine Reading?
• Automatic reading (i.e. encoding of text)
• Automatic understanding of text
• Useful ingredients for machine reading
• Representation learning
• Structured prediction
• Generating training data
Machine Reading
[Figure: machine reading example — supporting text: "RNNs are a popular method for machine reading"; question: "What is a good method for machine reading?"; query: method_for(MR, XXX)]
Machine Reading Tasks
• Word Representation Learning
• Output: vector for each word
• Learn relations between words, learn to distinguish words from one
another
• Unsupervised objective: word embeddings
• Sequence Representation Learning
• Output: vector for each sentence / paragraph
• Learn how likely a sequence is given a corpus, and which word is
most likely to come next given a preceding sequence
• Unsupervised objective: unconditional language models, natural
language generation
• Supervised objective: sequence classification tasks
Machine Reading Tasks
• Pairwise Sequence Representation Learning
• Output: vector for pairs of sentences / paragraphs
• Learn how likely a sequence is given another sequence and a
corpus
• Pairs of sequences can be encoded independently or encoded
conditioned on one another
• Unsupervised objective: conditional language models
• Supervised objective: stance detection, knowledge base slot filling,
question answering
Talk Outline
• Learning emoji2vec Embeddings from their Description
– Word representation learning, generating training data
• Numerically Grounded and KB Conditioned Language Models
– (Conditional) sequence representation learning
• Stance Detection with Bidirectional Conditional Encoding
– Conditional sequence representation learning, generating training data
Machine Reading: Word Representation
Learning
emoji2vec
• Emoji use has increased
• Emoji carry sentiment, which could be useful for
e.g. sentiment analysis
emoji2vec
emoji2vec
• Task: learn representations for emojis
• Problem: many emojis are used infrequently, and
typical word representation learning methods (e.g.
word2vec) require them to be seen several times
• Solution: learn emojis from their description
emoji2vec
• Method: emoji embedding is sum of word
embeddings of words in description
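The summing method above can be sketched as follows (toy 4-dimensional vectors stand in for the pre-trained word2vec vectors the paper sums; the vocabulary here is illustrative):

```python
import numpy as np

# Toy word-embedding table (the paper uses 300-d word2vec vectors).
rng = np.random.default_rng(0)
vocab = ["face", "with", "tears", "of", "joy"]
word_vectors = {w: rng.standard_normal(4) for w in vocab}

def emoji_embedding(description):
    """Embed an emoji as the sum of its description's word vectors."""
    return np.sum([word_vectors[w] for w in description.split()], axis=0)

# Description of U+1F602 in the Unicode emoji list.
vec = emoji_embedding("face with tears of joy")
print(vec.shape)  # (4,)
```

Because the embedding is built from the description rather than from usage statistics, it exists even for emojis never observed in a corpus.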
emoji2vec
• Results
– Emoji vectors are useful in addition to GoogleNews
vectors for sentiment analysis task
– Analogy task also works for emojis
emoji2vec
• Conclusions
– Alternative source for learning representations
(descriptions) very useful, especially for rare words
Machine Reading: Sequence Representation
Learning (Unsupervised)
Numerically Grounded + KB Conditioned
Language Models
Semantic Error Correction with Language Models
Numerically Grounded + KB Conditioned
Language Models
• Problem: clinical data contains many numbers,
many are unseen at test time
• Solution: concatenate RNN input embeddings with
numerical representations
• Problem: in addition to the report, clinical data contains an
incomplete and inconsistent KB entry for each patient; how can
this entry be used?
• Solution: lexicalise KB and condition on it
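A minimal sketch of the numerical-grounding idea, assuming each token embedding is extended with the token's float value and an is-number indicator (the `<num>` placeholder symbol and all dimensions are illustrative, not the paper's exact feature set):

```python
import numpy as np

rng = np.random.default_rng(1)
embed_dim = 8
vocab = {"sodium": 0, "is": 1, "<num>": 2}
embeddings = rng.standard_normal((len(vocab), embed_dim))

def grounded_input(token):
    """Concatenate the token embedding with numeric features:
    the token's float value if it is a number, plus an indicator."""
    try:
        value, is_num = float(token), 1.0
        idx = vocab["<num>"]  # all numbers share one input symbol
    except ValueError:
        value, is_num = 0.0, 0.0
        idx = vocab[token]
    return np.concatenate([embeddings[idx], [value, is_num]])

x = grounded_input("135.0")  # e.g. a lab value unseen in training
print(x.shape)  # (10,)
```

The point of the extra dimensions is that a number unseen at test time still gets a meaningful input representation through its magnitude, rather than a generic out-of-vocabulary embedding.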
Numerically Grounded + KB Conditioned
Language Models
Numerically Grounded + KB Conditioned
Language Models
Model MAP P R F1
Random 27.75 5.73 10.29 7.36
Base LM 64.37 39.54 64.66 49.07
Cond 62.76 37.46 62.20 46.76
Num 68.21 44.25 71.19 54.58
Cond+Num 69.14 45.36 71.43 55.48
Semantic Error Correction Results
Numerically Grounded + KB Conditioned
Language Models
• Conclusions
– Accounting for out-of-vocabulary tokens at test time
increases performance
– Conditioning on the lexicalised KB helps further, even though
it duplicates information already present in the report
Machine Reading: Pairwise Sequence
Representation Learning (Supervised)
Stance Detection with Conditional Encoding
“@realDonaldTrump is the only honest voice of the
@GOP”
• Task: classify the attitude of a text towards a given
target as "positive", "negative", or "neutral"
• Example tweet is positive towards Donald Trump,
but (implicitly) negative towards Hillary Clinton
Stance Detection with Conditional Encoding
• Challenges
– Learn a model that interprets the tweet stance towards
a target that might not be mentioned in the tweet itself
– Learn model without labelled training data for the target
with respect to which we are predicting the stance
Stance Detection with Conditional Encoding
• Challenges
– Learn a model that interprets the tweet stance towards
a target that might not be mentioned in the tweet itself
• Solution: bidirectional conditional model
– Learn model without labelled training data for the target
with respect to which we are predicting the stance
• Solution 1: use training data labelled for other targets (domain
adaptation setting)
• Solution 2: automatically label training data for target, using a
small set of manually defined hashtags (weakly labelled setting)
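The bidirectional conditional model can be sketched as below, using a plain tanh RNN in place of the paper's LSTMs and random weights in place of learned ones (all names, dimensions, and inputs are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6  # toy hidden/embedding size

def rnn(tokens, h0, Wx, Wh):
    """Run a plain tanh RNN over token vectors, starting from h0."""
    h = h0
    for x in tokens:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

Wx, Wh = rng.standard_normal((d, d)), rng.standard_normal((d, d))
target = [rng.standard_normal(d) for _ in range(3)]  # e.g. "Donald Trump"
tweet = [rng.standard_normal(d) for _ in range(8)]

def conditional_encode(target, tweet):
    # Encode the target first; its final state initialises the tweet
    # encoder, so the tweet representation is conditioned on the target.
    h_target = rnn(target, np.zeros(d), Wx, Wh)
    return rnn(tweet, h_target, Wx, Wh)

# Bidirectional: run a second conditioned pass over the reversed
# sequences and concatenate the two final states.
h = np.concatenate([conditional_encode(target, tweet),
                    conditional_encode(target[::-1], tweet[::-1])])
print(h.shape)  # (12,)
```

The concatenated state `h` would then feed a softmax classifier over the three stance labels; conditioning lets the same tweet encode differently for different targets, which matters when the target is not mentioned in the tweet.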
Stance Detection with Conditional Encoding
Stance Detection with Conditional Encoding
• Domain Adaptation Setting
– Train on Legalization of Abortion, Atheism, Feminist
Movement, Climate Change is a Real Concern and
Hillary Clinton, evaluate on Donald Trump tweets
Model   Stance   P       R       F1
Concat  FAVOR    0.3145  0.5270  0.3939
Concat  AGAINST  0.4452  0.4348  0.4399
Concat  Macro                    0.4169
BiCond  FAVOR    0.3033  0.5470  0.3902
BiCond  AGAINST  0.6788  0.5216  0.5899
BiCond  Macro                    0.4901
Stance Detection with Conditional Encoding
• Weakly Supervised Setting
– Weakly label Donald Trump tweets using hashtags,
evaluate on Donald Trump tweets
Model   Stance   P       R       F1
Concat  FAVOR    0.5506  0.5878  0.5686
Concat  AGAINST  0.5794  0.4883  0.5299
Concat  Macro                    0.5493
BiCond  FAVOR    0.6268  0.6014  0.6138
BiCond  AGAINST  0.6057  0.4983  0.5468
BiCond  Macro                    0.5803
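The hashtag-based weak labelling can be sketched as follows (the hashtag sets here are illustrative examples, not the paper's actual list):

```python
# Illustrative hashtag sets; the paper uses a small manually defined set.
favor_tags = {"#makeamericagreatagain", "#trump2016"}
against_tags = {"#dumptrump", "#nevertrump"}

def weak_label(tweet):
    """Assign a stance label from hashtags, then remove them so the
    model cannot simply memorise the labelling rule."""
    words = tweet.lower().split()
    if any(w in favor_tags for w in words):
        label = "FAVOR"
    elif any(w in against_tags for w in words):
        label = "AGAINST"
    else:
        return None  # tweet stays unlabelled
    cleaned = " ".join(w for w in words if w not in favor_tags | against_tags)
    return cleaned, label

print(weak_label("He speaks for us #Trump2016"))
# ('he speaks for us', 'FAVOR')
```

Stripping the labelling hashtags from the text is the step that makes the setting non-trivial: the classifier has to learn the stance from the remaining words rather than from the rule that produced the label.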
Stance Detection with Conditional Encoding
• Other findings
– Pre-training word embeddings on a large in-domain corpus
with an unsupervised objective, then continuing to optimise
them towards the supervised objective, works well
• Better than pre-training without further optimisation, or random
initialisation, or Google News embeddings
– LSTM encoding of tweets and targets works better than
sum of word embeddings baseline, despite small
training set (7k – 14k instances)
– Almost all instances in which the target is mentioned in the
tweet have a non-neutral stance
Stance Detection with Conditional Encoding
• Conclusions
– Modelling sentence pair relationship is important
– Automatic labelling of in-domain tweets is even more
important
– Learning sequence representations is also a good approach
for small datasets
Thank you!
isabelleaugenstein.github.io
i.augenstein@ucl.ac.uk
@IAugenstein
github.com/isabelleaugenstein
References
Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak,
Sebastian Riedel. emoji2vec: Learning Emoji Representations from
their Description. SocialNLP at EMNLP 2016.
https://arxiv.org/abs/1609.08359
Georgios Spithourakis, Isabelle Augenstein, Sebastian Riedel.
Numerically Grounded Language Models for Semantic Error
Correction. EMNLP 2016. https://arxiv.org/abs/1608.04147
Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina
Bontcheva. Stance Detection with Bidirectional Conditional
Encoding. EMNLP 2016. https://arxiv.org/abs/1606.05464
Collaborators
Kalina Bontcheva, University of Sheffield
Andreas Vlachos, University of Sheffield
George Spithourakis, UCL
Matko Bošnjak, UCL
Sebastian Riedel, UCL
Tim Rocktäschel, UCL
Ben Eisner, Princeton