Weakly Supervised Machine Reading
Isabelle Augenstein, University College London, October 2016

Transcript
Page 1: Weakly Supervised Machine Reading

Weakly Supervised Machine Reading

Isabelle Augenstein
University College London
October 2016

Page 2: Weakly Supervised Machine Reading

What is Machine Reading?

• Automatic reading (i.e. encoding of text)
• Automatic understanding of text
• Useful ingredients for machine reading:
  – Representation learning
  – Structured prediction
  – Generating training data

Page 3: Weakly Supervised Machine Reading

Machine Reading

[Figure: machine reading example with supporting text “RNNs are a popular method for machine reading”, question “What is a good method for machine reading?”, and structured query method_for(MR, XXX); the supporting text and question are encoded as r(s) and u(q), with g(x) shown combining them.]

Page 4: Weakly Supervised Machine Reading

Machine Reading Tasks

• Word Representation Learning
  – Output: vector for each word
  – Learn relations between words; learn to distinguish words from one another
  – Unsupervised objective: word embeddings
• Sequence Representation Learning
  – Output: vector for each sentence / paragraph
  – Learn how likely a sequence is given a corpus; learn which word is most likely to come next given a sequence of words
  – Unsupervised objective: unconditional language models, natural language generation (a minimal language model sketch follows below)
  – Supervised objective: sequence classification tasks
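
As a concrete illustration of the unsupervised sequence objective (this sketch is not from the slides; dimensions and names are made up), a minimal recurrent language model is trained to predict the next word given the words so far:

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Minimal LSTM language model: predicts the next word given the prefix."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden)                  # (batch, seq_len, vocab_size)

# Training objective: cross-entropy between predicted and actual next words.
model = RNNLanguageModel(vocab_size=10000)
tokens = torch.randint(0, 10000, (2, 12))        # toy batch of token ids
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
```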

Page 5: Weakly Supervised Machine Reading

Machine Reading Tasks

• Pairwise Sequence Representation Learning
  – Output: vector for each pair of sentences / paragraphs
  – Learn how likely a sequence is given another sequence and a corpus
  – Pairs of sequences can be encoded independently or encoded conditioned on one another (both options are sketched below)
  – Unsupervised objective: conditional language models
  – Supervised objective: stance detection, knowledge base slot filling, question answering
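
A rough sketch of the two encoding options mentioned above (illustrative PyTorch code with made-up dimensions, not a specific model from the talk): either encode the two sequences independently and combine their vectors, or condition the second encoder on the final state of the first.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10000, 64)
enc_a = nn.LSTM(64, 128, batch_first=True)
enc_b = nn.LSTM(64, 128, batch_first=True)

seq_a = torch.randint(0, 10000, (1, 10))   # e.g. the question
seq_b = torch.randint(0, 10000, (1, 20))   # e.g. the supporting text

# Option 1: independent encoding, then combine (here by concatenation)
_, (h_a, _) = enc_a(emb(seq_a))
_, (h_b, _) = enc_b(emb(seq_b))
pair_vec = torch.cat([h_a[-1], h_b[-1]], dim=-1)

# Option 2: conditional encoding; the second encoder starts
# from the final hidden/cell state of the first
_, state_a = enc_a(emb(seq_a))
_, (h_cond, _) = enc_b(emb(seq_b), state_a)
pair_vec_cond = h_cond[-1]
```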

Page 6: Weakly Supervised Machine Reading

Talk Outline

• Learning emoji2vec Embeddings from their Description
  – Word representation learning, generating training data
• Numerically Grounded and KB Conditioned Language Models
  – (Conditional) Sequence representation learning
• Stance Detection with Bidirectional Conditional Encoding
  – Conditional sequence representation learning, generating training data

Page 7: Weakly Supervised Machine Reading

Machine Reading: Word Representation Learning

[Figure: the machine reading example from Page 3 (supporting text, question, and structured query), revisited here for word representation learning.]

Page 8: Weakly Supervised Machine Reading

emoji2vec

• Emoji use has increased
• Emoji carry sentiment, which could be useful e.g. for sentiment analysis

Page 9: Weakly Supervised Machine Reading

emoji2vec

Page 10: Weakly Supervised Machine Reading

emoji2vec

• Task: learn representations for emojis
• Problem: many emojis are used infrequently, and typical word representation learning methods (e.g. word2vec) require them to be seen several times
• Solution: learn emoji representations from their descriptions

Page 11: Weakly Supervised Machine Reading

emoji2vec

• Method: the emoji embedding is the sum of the word embeddings of the words in its description (sketched below)
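
A minimal sketch of this idea (assuming pre-trained word vectors are available as a dict from word to NumPy array; the actual emoji2vec training objective is more elaborate):

```python
import numpy as np

def emoji_vector(description, word_vectors, dim=300):
    """Sum the pre-trained word embeddings of the words in the emoji's description."""
    vec = np.zeros(dim)
    for word in description.lower().split():
        if word in word_vectors:          # skip out-of-vocabulary words
            vec += word_vectors[word]
    return vec

# e.g. word_vectors loaded from GoogleNews word2vec,
# description taken from the Unicode emoji name:
# v = emoji_vector("face with tears of joy", word_vectors)
```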

Page 12: Weakly Supervised Machine Reading

emoji2vec

• Results
  – Emoji vectors are useful in addition to GoogleNews vectors for a sentiment analysis task
  – The analogy task also works for emojis (sketched below)
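
The analogy evaluation uses the usual vector-arithmetic formulation; a hypothetical sketch over a dict of vectors (the helper and names are illustrative, not from the paper):

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Return the item whose vector is closest to vec(b) - vec(a) + vec(c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -1.0
    for item, vec in vectors.items():
        if item in (a, b, c):
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = item, sim
    return best
```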

Page 13: Weakly Supervised Machine Reading

emoji2vec

• Conclusions
  – An alternative source for learning representations (descriptions) is very useful, especially for rare words

Page 14: Weakly Supervised Machine Reading

Machine Reading: Sequence Representation Learning (Unsupervised)

[Figure: the machine reading example from Page 3, revisited here for unsupervised sequence representation learning.]

Page 15: Weakly Supervised Machine Reading

Numerically Grounded + KB Conditioned Language Models

Semantic Error Correction with Language Models

Page 16: Weakly Supervised Machine Reading

Numerically Grounded + KB Conditioned Language Models

• Problem: clinical data contains many numbers, many of which are unseen at test time
• Solution: concatenate RNN input embeddings with numerical representations (sketched below)
• Problem: in addition to the report, clinical data contains an incomplete and inconsistent KB entry for each patient; how can it be used?
• Solution: lexicalise the KB and condition on it
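
A minimal sketch of the numerical-grounding idea (illustrative only; the full model also encodes the lexicalised KB entry and conditions the language model on that representation): each token embedding is concatenated with a numeric feature, e.g. the token's value if it is a number and 0.0 otherwise.

```python
import torch
import torch.nn as nn

class NumericallyGroundedLM(nn.Module):
    """LSTM language model whose inputs are [word embedding ; numeric value]."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + 1, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, numeric_values):
        # token_ids: (batch, seq_len); numeric_values: (batch, seq_len)
        # numeric_values holds the float value of number tokens and 0.0 elsewhere,
        # so numbers unseen at training time still carry a usable signal at test time.
        x = torch.cat([self.embed(token_ids), numeric_values.unsqueeze(-1)], dim=-1)
        hidden, _ = self.lstm(x)
        return self.out(hidden)
```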

Page 17: Weakly Supervised Machine Reading

Numerically Grounded + KB Conditioned Language Models

Page 18: Weakly Supervised Machine Reading

Numerically Grounded + KB Conditioned Language Models

Semantic Error Correction Results

Model      MAP     P       R       F1
Random     27.75    5.73   10.29    7.36
Base LM    64.37   39.54   64.66   49.07
Cond       62.76   37.46   62.20   46.76
Num        68.21   44.25   71.19   54.58
Cond+Num   69.14   45.36   71.43   55.48

Page 19: Weakly Supervised Machine Reading

Numerically Grounded + KB Conditioned Language Models

• Conclusions
  – Accounting for out-of-vocabulary tokens at test time increases performance
  – Duplicate information from lexicalising the KB can help further

Page 20: Weakly Supervised Machine Reading

Machine Reading: Pairwise Sequence Representation Learning (Supervised)

[Figure: the machine reading example from Page 3, revisited here for supervised pairwise sequence representation learning.]

Page 21: Weakly Supervised Machine Reading

Stance Detection with Conditional Encoding

“@realDonaldTrump is the only honest voice of the @GOP”

• Task: classify the attitude of a text towards a given target as “positive”, “negative”, or “neutral”
• The example tweet is positive towards Donald Trump, but (implicitly) negative towards Hillary Clinton

Page 22: Weakly Supervised Machine Reading

Stance Detection with Conditional Encoding

• Challenges
  – Learn a model that interprets the tweet’s stance towards a target that might not be mentioned in the tweet itself
  – Learn a model without labelled training data for the target with respect to which we are predicting the stance

Page 23: Weakly Supervised Machine Reading

Stance Detection with Conditional Encoding

• Challenges
  – Learn a model that interprets the tweet’s stance towards a target that might not be mentioned in the tweet itself
    • Solution: bidirectional conditional model (sketched below)
  – Learn a model without labelled training data for the target with respect to which we are predicting the stance
    • Solution 1: use training data labelled for other targets (domain adaptation setting)
    • Solution 2: automatically label training data for the target, using a small set of manually defined hashtags (weakly labelled setting)
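
A simplified sketch of the bidirectional conditional model (dimensions are illustrative and details differ from the paper): the target is encoded first with a bidirectional LSTM, and its final states initialise the tweet encoder, so the tweet representation is conditioned on the target.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(20000, 100)
target_enc = nn.LSTM(100, 64, batch_first=True, bidirectional=True)
tweet_enc  = nn.LSTM(100, 64, batch_first=True, bidirectional=True)
classifier = nn.Linear(2 * 64, 3)   # three stance classes, as on Page 21

target_ids = torch.randint(0, 20000, (1, 4))    # e.g. "Donald Trump"
tweet_ids  = torch.randint(0, 20000, (1, 25))

# Encode the target, then condition the tweet encoder on its final states.
_, target_state = target_enc(emb(target_ids))
_, (h_n, _) = tweet_enc(emb(tweet_ids), target_state)

# Concatenate last forward and backward hidden states of the tweet encoder.
tweet_vec = torch.cat([h_n[0], h_n[1]], dim=-1)
logits = classifier(tweet_vec)
```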

Page 24: Weakly Supervised Machine Reading

Stance Detection with Conditional Encoding

Page 25: Weakly Supervised Machine Reading

Stance Detection with Conditional Encoding

• Domain Adaptation Setting
  – Train on Legalization of Abortion, Atheism, Feminist Movement, Climate Change is a Real Concern, and Hillary Clinton; evaluate on Donald Trump tweets

Model   Stance    P        R        F1
Concat  FAVOR     0.3145   0.5270   0.3939
Concat  AGAINST   0.4452   0.4348   0.4399
Concat  Macro     -        -        0.4169
BiCond  FAVOR     0.3033   0.5470   0.3902
BiCond  AGAINST   0.6788   0.5216   0.5899
BiCond  Macro     -        -        0.4901

Page 26: Weakly Supervised Machine Reading

Stance Detection with Conditional Encoding

• Weakly Supervised Setting
  – Weakly label Donald Trump tweets using hashtags, evaluate on Donald Trump tweets (labelling sketched below the table)

Model   Stance    P        R        F1
Concat  FAVOR     0.5506   0.5878   0.5686
Concat  AGAINST   0.5794   0.4883   0.5299
Concat  Macro     -        -        0.5493
BiCond  FAVOR     0.6268   0.6014   0.6138
BiCond  AGAINST   0.6057   0.4983   0.5468
BiCond  Macro     -        -        0.5803
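
The weak labels used in this setting could be assigned along these lines (a hypothetical sketch; the hashtag lists are illustrative, not the exact ones from the paper):

```python
FAVOR_TAGS = {"#makeamericagreatagain", "#trump2016"}   # illustrative
AGAINST_TAGS = {"#dumptrump", "#nevertrump"}            # illustrative

def weak_label(tweet):
    """Assign a stance label from hashtags; return None if there is no clear signal."""
    tags = {tok.lower() for tok in tweet.split() if tok.startswith("#")}
    if tags & FAVOR_TAGS and not (tags & AGAINST_TAGS):
        return "FAVOR"
    if tags & AGAINST_TAGS and not (tags & FAVOR_TAGS):
        return "AGAINST"
    return None
```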

Page 27: Weakly Supervised Machine Reading

Stance Detection with Conditional Encoding

• Other findings
  – Pre-training word embeddings on a large in-domain corpus with an unsupervised objective, then continuing to optimise them towards the supervised objective, works well (sketched below)
    • Better than pre-training without further optimisation, random initialisation, or Google News embeddings
  – LSTM encoding of tweets and targets works better than a sum-of-word-embeddings baseline, despite the small training set (7k – 14k instances)
  – Almost all instances in which the target is mentioned in the tweet have a non-neutral stance
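
The first finding is the standard initialise-then-fine-tune recipe; a minimal sketch (the pre-trained matrix here is a random stand-in for vectors actually trained on an in-domain corpus):

```python
import torch
import torch.nn as nn

# pretrained: (vocab_size, emb_dim) tensor of unsupervisedly trained word vectors
pretrained = torch.randn(20000, 100)            # stand-in for real vectors

# freeze=False keeps the embeddings trainable, so the supervised objective
# can continue to optimise them (the setting reported to work best above).
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
```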

Page 28: Weakly Supervised Machine Reading

Stance Detection with Conditional Encoding

• Conclusions
  – Modelling the sentence pair relationship is important
  – Automatic labelling of in-domain tweets is even more important
  – Learning sequence representations is also a good approach for small data

Page 29: Weakly Supervised Machine Reading

Thank you!

isabelleaugenstein.github.io

[email protected]

@IAugenstein

github.com/isabelleaugenstein

Page 30: Weakly Supervised Machine Reading

References

Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, Sebastian Riedel. emoji2vec: Learning Emoji Representations from their Description. SocialNLP at EMNLP 2016. https://arxiv.org/abs/1609.08359

Georgios Spithourakis, Isabelle Augenstein, Sebastian Riedel. Numerically Grounded Language Models for Semantic Error Correction. EMNLP 2016. https://arxiv.org/abs/1608.04147

Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina Bontcheva. Stance Detection with Bidirectional Conditional Encoding. EMNLP 2016. https://arxiv.org/abs/1606.05464

Page 31: Weakly Supervised Machine Reading

Collaborators

Kalina Bontcheva, University of Sheffield
Andreas Vlachos, University of Sheffield
George Spithourakis, UCL
Matko Bošnjak, UCL
Sebastian Riedel, UCL
Tim Rocktäschel, UCL
Ben Eisner, Princeton