Transcript
1. Weakly Supervised Machine Reading. Isabelle Augenstein, University College London, October 2016.
2. What is Machine Reading? Automatic reading (i.e. encoding of text) and automatic understanding of text. Useful ingredients for machine reading: representation learning, structured prediction, generating training data.
3. Machine Reading [Figure: supporting text "RNNs are a popular method for machine reading"; question "What is a good method for machine reading?"; structured answer method_for(MR, XXX); encodings r(s), u(q), g(x)]
4. Machine Reading Tasks. Word Representation Learning: output is a vector for each word; learn relations between words and learn to distinguish words from one another; unsupervised objective: word embeddings. Sequence Representation Learning: output is a vector for each sentence / paragraph; learn how likely a sequence is given a corpus, and what the most likely next word is given a sequence of words; unsupervised objectives: unconditional language models, natural language generation; supervised objective: sequence classification tasks.
5. Machine Reading Tasks. Pairwise Sequence Representation Learning: output is a vector for each pair of sentences / paragraphs; learn how likely a sequence is given another sequence and a corpus; the two sequences can be encoded independently or conditioned on one another; unsupervised objective: conditional language models; supervised objectives: stance detection, knowledge base slot filling, question answering.
6. Talk Outline. Learning emoji2vec Embeddings from their Description (word representation learning, generating training data). Numerically Grounded and KB Conditioned Language Models ((conditional) sequence representation learning). Stance Detection with Bidirectional Conditional Encoding (conditional sequence representation learning, generating training data).
7. Machine Reading: Word Representation Learning [Figure: supporting text "RNNs are a popular method for machine reading"; question "What is a good method for machine reading?"; structured answer method_for(MR, XXX)]
8. emoji2vec. Emoji use has increased. Emoji carry sentiment, which could be useful e.g. for sentiment analysis.
9. emoji2vec
10. emoji2vec. Task: learn representations for emojis. Problem: many emojis are used infrequently, and typical word representation learning methods (e.g. word2vec) require a token to be seen several times. Solution: learn emoji representations from their descriptions.
11. emoji2vec. Method: the emoji embedding is the sum of the word embeddings of the words in its description, as sketched below.
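A minimal sketch of the summation described on this slide, assuming a hypothetical dictionary of pretrained word vectors (e.g. from the GoogleNews word2vec model); the trained emoji2vec model involves more than this sum, which is all that is shown here:

```python
import numpy as np

# Hypothetical pretrained word vectors (word -> 300-d array); random values
# stand in for real word2vec embeddings in this sketch.
word_vectors = {w: np.random.randn(300)
                for w in "face with tears of joy".split()}

def emoji_vector(description, word_vectors):
    """Emoji embedding as the sum of the word embeddings of its description."""
    words = [w for w in description.lower().split() if w in word_vectors]
    return np.sum([word_vectors[w] for w in words], axis=0)

v_joy = emoji_vector("face with tears of joy", word_vectors)  # vector for the joy emoji
```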
12. emoji2vec. Results: emoji vectors are useful in addition to GoogleNews vectors for a sentiment analysis task; the analogy task also works for emojis (see the sketch below).
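The analogy test is the usual vector-offset search run directly on the learned vectors; the helper below is a generic sketch, and the candidate set and any specific analogies are illustrative rather than taken from the paper:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c, vectors):
    """Return the key whose vector is closest to v(b) - v(a) + v(c),
    excluding the three query items themselves."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = {k: v for k, v in vectors.items() if k not in {a, b, c}}
    return max(candidates, key=lambda k: cosine(candidates[k], target))
```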
13. emoji2vec. Conclusions: an alternative source for learning representations (descriptions) is very useful, especially for rare words.
14. Machine Reading: Sequence Representation Learning (Unsupervised) [Figure: supporting text "RNNs are a popular method for machine reading"; question "What is a good method for machine reading?"; structured answer method_for(MR, XXX)]
15. Numerically Grounded + KB Conditioned Language Models: Semantic Error Correction with Language Models.
16. Numerically Grounded + KB Conditioned Language Models. Problem: clinical data contains many numbers, and many of them are unseen at test time. Solution: concatenate the RNN input embeddings with numerical representations. Problem: clinical data also contains, in addition to the report, an incomplete and inconsistent KB entry for each patient; how can it be used? Solution: lexicalise the KB and condition on it (see the sketch below).
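A minimal sketch of how such a grounded input could be assembled per token, assuming a lookup table of word embeddings and some fixed vector kb_vector encoding the lexicalised KB entry; using the token's own value as the numerical representation is one simple choice for illustration, not necessarily the one used in the paper:

```python
import numpy as np

EMB_DIM = 50  # illustrative embedding size

def numeric_feature(token):
    """Scalar grounding: the token's value if it parses as a number, else 0."""
    try:
        return np.array([float(token)])
    except ValueError:
        return np.array([0.0])

def grounded_input(token, word_vectors, kb_vector):
    """RNN input = [word embedding ; numeric representation ; lexicalised KB context]."""
    emb = word_vectors.get(token, np.zeros(EMB_DIM))  # unseen numbers get a zero embedding
    return np.concatenate([emb, numeric_feature(token), kb_vector])
```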
17. Numerically Grounded + KB Conditioned Language Models
18. Numerically Grounded + KB Conditioned Language Models. Semantic Error Correction Results:

Model     MAP    P      R      F1
Random    27.75  5.73   10.29  7.36
Base LM   64.37  39.54  64.66  49.07
Cond      62.76  37.46  62.20  46.76
Num       68.21  44.25  71.19  54.58
Cond+Num  69.14  45.36  71.43  55.48
19. Numerically Grounded + KB Conditioned Language Models. Conclusions: accounting for out-of-vocabulary tokens at test time increases performance; the duplicate information from lexicalising the KB can help further.
20. Machine Reading: Pairwise Sequence Representation Learning (Supervised) [Figure: supporting text "RNNs are a popular method for machine reading"; question "What is a good method for machine reading?"; structured answer method_for(MR, XXX); encodings r(s), u(q), g(x)]
21. Stance Detection with Conditional Encoding. Example tweet: "@realDonaldTrump is the only honest voice of the @GOP". Task: classify the attitude of a text towards a given target as positive, negative, or neutral. The example tweet is positive towards Donald Trump, but (implicitly) negative towards Hillary Clinton.
22. Stance Detection with Conditional Encoding. Challenges: (1) learn a model that interprets the stance of a tweet towards a target that might not be mentioned in the tweet itself; (2) learn the model without labelled training data for the target with respect to which we are predicting the stance.
23. Stance Detection with Conditional Encoding. Challenge 1: learn a model that interprets the stance of a tweet towards a target that might not be mentioned in the tweet itself. Solution: a bidirectional conditional model. Challenge 2: learn the model without labelled training data for the target with respect to which we are predicting the stance. Solution 1: use training data labelled for other targets (domain adaptation setting). Solution 2: automatically label training data for the target using a small set of manually defined hashtags (weakly labelled setting), as sketched below.
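A minimal sketch of the hashtag-based weak labelling in Solution 2; the hashtag sets below are illustrative placeholders, not the ones used in the paper:

```python
# Small, manually defined hashtag sets (illustrative only).
FAVOR_TAGS = {"#makeamericagreatagain", "#trump2016"}
AGAINST_TAGS = {"#dumptrump", "#nevertrump"}

def weak_label(tweet):
    """Assign a stance label from hashtags; return None if ambiguous or unmatched."""
    tags = {t.lower() for t in tweet.split() if t.startswith("#")}
    favor, against = bool(tags & FAVOR_TAGS), bool(tags & AGAINST_TAGS)
    if favor == against:          # neither tag set matched, or both did
        return None
    return "FAVOR" if favor else "AGAINST"
```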
24. Stance Detection with Conditional Encoding
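Below is a minimal PyTorch sketch of the conditional-encoding idea behind the model on this slide, with illustrative sizes; only one encoding direction is shown, whereas the BiCond model of the talk conditions in both directions:

```python
import torch
import torch.nn as nn

class ConditionalEncoder(nn.Module):
    """One-directional sketch: encode the target with an LSTM, then initialise
    a second LSTM over the tweet with the target's final (h, c) state."""

    def __init__(self, vocab_size, emb_dim=100, hidden_dim=64, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.target_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.tweet_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim, num_classes)  # FAVOR / AGAINST / NONE

    def forward(self, target_ids, tweet_ids):
        _, target_state = self.target_lstm(self.embed(target_ids))
        # Condition the tweet encoder on the target by reusing its final state.
        _, (h, _) = self.tweet_lstm(self.embed(tweet_ids), target_state)
        return self.classify(h[-1])

# Toy usage with random token ids (batch of 2, hypothetical vocabulary of 5000).
model = ConditionalEncoder(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 4)), torch.randint(0, 5000, (2, 20)))
```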
25. Stance Detection with Conditional Encoding. Domain Adaptation Setting: train on Legalization of Abortion, Atheism, Feminist Movement, Climate Change is a Real Concern, and Hillary Clinton; evaluate on Donald Trump tweets.

Model   Stance    P       R       F1
Concat  FAVOR     0.3145  0.5270  0.3939
Concat  AGAINST   0.4452  0.4348  0.4399
Concat  Macro                     0.4169
BiCond  FAVOR     0.3033  0.5470  0.3902
BiCond  AGAINST   0.6788  0.5216  0.5899
BiCond  Macro                     0.4901
26. Stance Detection with Conditional Encoding. Weakly Supervised Setting: weakly label Donald Trump tweets using hashtags, evaluate on Donald Trump tweets.

Model   Stance    P       R       F1
Concat  FAVOR     0.5506  0.5878  0.5686
Concat  AGAINST   0.5794  0.4883  0.5299
Concat  Macro                     0.5493
BiCond  FAVOR     0.6268  0.6014  0.6138
BiCond  AGAINST   0.6057  0.4983  0.5468
BiCond  Macro                     0.5803
27. Stance Detection with Conditional Encoding. Other findings: pre-training word embeddings on a large in-domain corpus with an unsupervised objective and continuing to optimise them towards the supervised objective works well (see the sketch below); this is better than pre-training without further optimisation, random initialisation, or Google News embeddings. LSTM encoding of tweets and targets works better than a sum-of-word-embeddings baseline, despite the small training set (7k-14k instances). Almost all instances in which the target is mentioned in the tweet have a non-neutral stance.
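The pre-training finding amounts to initialising the embedding layer with the in-domain vectors and leaving it trainable under the supervised objective; a minimal PyTorch sketch, where the pretrained matrix and sizes are placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical pretrained in-domain embedding matrix (vocab_size x emb_dim),
# e.g. from word2vec trained on a large unlabelled tweet corpus.
pretrained = torch.randn(5000, 100)

# Initialise from the pretrained vectors and keep them trainable, so the
# supervised stance objective continues to optimise them; freeze=True would
# correspond to the "pre-training without further optimisation" baseline.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
```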
28. Stance Detection with Conditional Encoding. Conclusions: modelling the sentence pair relationship is important; automatic labelling of in-domain tweets is even more important; learning sequence representations is also a good approach for small data.
30. References
Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, Sebastian Riedel. emoji2vec: Learning Emoji Representations from their Description. SocialNLP at EMNLP 2016. https://arxiv.org/abs/1609.08359
Georgios Spithourakis, Isabelle Augenstein, Sebastian Riedel. Numerically Grounded Language Models for Semantic Error Correction. EMNLP 2016. https://arxiv.org/abs/1608.04147
Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina Bontcheva. Stance Detection with Bidirectional Conditional Encoding. EMNLP 2016. https://arxiv.org/abs/1606.05464
31. Collaborators: Kalina Bontcheva (University of Sheffield), Andreas Vlachos (University of Sheffield), George Spithourakis (UCL), Matko Bošnjak (UCL), Sebastian Riedel (UCL), Tim Rocktäschel (UCL), Ben Eisner (Princeton).