Transcript
Weakly Supervised
Machine Reading
Isabelle Augenstein
University College London
October 2016
What is Machine Reading?
• Automatic reading (i.e. encoding of text)
• Automatic understanding of text
• Useful ingredients for machine reading
• Representation learning
• Structured prediction
• Generating training data
Machine Reading
[Figure: machine reading example — supporting text: "RNNs are a popular method for machine reading"; question: "What is a good method for machine reading?"; query: method_for(MR, XXX)]
Machine Reading Tasks
• Word Representation Learning
• Output: vector for each word
• Learn relations between words, learn to distinguish words from one
another
• Unsupervised objective: word embeddings
• Sequence Representation Learning
• Output: vector for each sentence / paragraph
• Learn how likely a sequence is given a corpus, and which word is
most likely to come next given a preceding sequence
• Unsupervised objective: unconditional language models, natural
language generation
• Supervised objective: sequence classification tasks
Machine Reading Tasks
• Pairwise Sequence Representation Learning
• Output: vector for pairs of sentences / paragraphs
• Learn how likely a sequence is given another sequence and a
corpus
• Pairs of sequences can be encoded independently or encoded
conditioned on one another
• Unsupervised objective: conditional language models
• Supervised objective: stance detection, knowledge base slot filling,
question answering
Talk Outline
• Learning emoji2vec Embeddings from their Description
– Word representation learning, generating training data
• Numerically Grounded and KB Conditioned Language Models
– (Conditional) sequence representation learning
• Stance Detection with Bidirectional Conditional Encoding
– Conditional sequence representation learning, generating training data
Machine Reading: Word Representation
Learning
emoji2vec
• Emoji use has increased
• Emoji carry sentiment, which could be useful for
e.g. sentiment analysis
emoji2vec
emoji2vec
• Task: learn representations for emojis
• Problem: many emojis are used infrequently, and
typical word representation learning methods (e.g.
word2vec) require them to be seen several times
• Solution: learn emojis from their description
emoji2vec
• Method: emoji embedding is sum of word
embeddings of words in description
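The summing method above can be sketched as follows (toy 4-dimensional vectors stand in for the pre-trained word2vec vectors the paper sums; the vocabulary here is illustrative):

```python
import numpy as np

# Toy word-embedding table (the paper uses 300-d word2vec vectors).
rng = np.random.default_rng(0)
vocab = ["face", "with", "tears", "of", "joy"]
word_vectors = {w: rng.standard_normal(4) for w in vocab}

def emoji_embedding(description):
    """Embed an emoji as the sum of its description's word vectors."""
    return np.sum([word_vectors[w] for w in description.split()], axis=0)

# Description of U+1F602 in the Unicode emoji list.
vec = emoji_embedding("face with tears of joy")
print(vec.shape)  # (4,)
```

Because the embedding is built from the description rather than from usage statistics, it exists even for emojis never observed in a corpus.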
emoji2vec
• Results
– Emoji vectors are useful in addition to GoogleNews
vectors for sentiment analysis task
– Analogy task also works for emojis
emoji2vec
• Conclusions
– Alternative source for learning representations
(descriptions) very useful, especially for rare words
Machine Reading: Sequence Representation
Learning (Unsupervised)
Numerically Grounded + KB Conditioned
Language Models
Semantic Error Correction with Language Models
Numerically Grounded + KB Conditioned
Language Models
• Problem: clinical data contains many numbers,
many are unseen at test time
• Solution: concatenate RNN input embeddings with
numerical representations
• Problem: in addition to the report, clinical data contains an
incomplete and inconsistent KB entry for each patient; how can
this entry be used?
• Solution: lexicalise KB and condition on it
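A minimal sketch of the numerical-grounding idea, assuming each token embedding is extended with the token's float value and an is-number indicator (the `<num>` placeholder symbol and all dimensions are illustrative, not the paper's exact feature set):

```python
import numpy as np

rng = np.random.default_rng(1)
embed_dim = 8
vocab = {"sodium": 0, "is": 1, "<num>": 2}
embeddings = rng.standard_normal((len(vocab), embed_dim))

def grounded_input(token):
    """Concatenate the token embedding with numeric features:
    the token's float value if it is a number, plus an indicator."""
    try:
        value, is_num = float(token), 1.0
        idx = vocab["<num>"]  # all numbers share one input symbol
    except ValueError:
        value, is_num = 0.0, 0.0
        idx = vocab[token]
    return np.concatenate([embeddings[idx], [value, is_num]])

x = grounded_input("135.0")  # e.g. a lab value unseen in training
print(x.shape)  # (10,)
```

The point of the extra dimensions is that a number unseen at test time still gets a meaningful input representation through its magnitude, rather than a generic out-of-vocabulary embedding.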
Numerically Grounded + KB Conditioned
Language Models
Numerically Grounded + KB Conditioned
Language Models
Model MAP P R F1
Random 27.75 5.73 10.29 7.36
Base LM 64.37 39.54 64.66 49.07
Cond 62.76 37.46 62.20 46.76
Num 68.21 44.25 71.19 54.58
Cond+Num 69.14 45.36 71.43 55.48
Semantic Error Correction Results
Numerically Grounded + KB Conditioned
Language Models
• Conclusions
– Accounting for out-of-vocabulary tokens at test time
increases performance
– Conditioning on the lexicalised KB helps further, even though
it duplicates information already present in the report
Machine Reading: Pairwise Sequence
Representation Learning (Supervised)
Stance Detection with Conditional Encoding
“@realDonaldTrump is the only honest voice of the
@GOP”
• Task: classify the attitude of a text towards a given
target as "positive", "negative", or "neutral"
• Example tweet is positive towards Donald Trump,
but (implicitly) negative towards Hillary Clinton
Stance Detection with Conditional Encoding
• Challenges
– Learn a model that interprets the tweet stance towards
a target that might not be mentioned in the tweet itself
– Learn model without labelled training data for the target
with respect to which we are predicting the stance
Stance Detection with Conditional Encoding
• Challenges
– Learn a model that interprets the tweet stance towards
a target that might not be mentioned in the tweet itself
• Solution: bidirectional conditional model
– Learn model without labelled training data for the target
with respect to which we are predicting the stance
• Solution 1: use training data labelled for other targets (domain
adaptation setting)
• Solution 2: automatically label training data for target, using a
small set of manually defined hashtags (weakly labelled setting)
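The bidirectional conditional model can be sketched as below, using a plain tanh RNN in place of the paper's LSTMs and random weights in place of learned ones (all names, dimensions, and inputs are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6  # toy hidden/embedding size

def rnn(tokens, h0, Wx, Wh):
    """Run a plain tanh RNN over token vectors, starting from h0."""
    h = h0
    for x in tokens:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

Wx, Wh = rng.standard_normal((d, d)), rng.standard_normal((d, d))
target = [rng.standard_normal(d) for _ in range(3)]  # e.g. "Donald Trump"
tweet = [rng.standard_normal(d) for _ in range(8)]

def conditional_encode(target, tweet):
    # Encode the target first; its final state initialises the tweet
    # encoder, so the tweet representation is conditioned on the target.
    h_target = rnn(target, np.zeros(d), Wx, Wh)
    return rnn(tweet, h_target, Wx, Wh)

# Bidirectional: run a second conditioned pass over the reversed
# sequences and concatenate the two final states.
h = np.concatenate([conditional_encode(target, tweet),
                    conditional_encode(target[::-1], tweet[::-1])])
print(h.shape)  # (12,)
```

The concatenated state `h` would then feed a softmax classifier over the three stance labels; conditioning lets the same tweet encode differently for different targets, which matters when the target is not mentioned in the tweet.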
Stance Detection with Conditional Encoding
Stance Detection with Conditional Encoding
• Domain Adaptation Setting
– Train on Legalization of Abortion, Atheism, Feminist
Movement, Climate Change is a Real Concern and
Hillary Clinton, evaluate on Donald Trump tweets
Model   Stance   P       R       F1
Concat  FAVOR    0.3145  0.5270  0.3939
Concat  AGAINST  0.4452  0.4348  0.4399
Concat  Macro                    0.4169
BiCond  FAVOR    0.3033  0.5470  0.3902
BiCond  AGAINST  0.6788  0.5216  0.5899
BiCond  Macro                    0.4901
Stance Detection with Conditional Encoding
• Weakly Supervised Setting
– Weakly label Donald Trump tweets using hashtags,
evaluate on Donald Trump tweets
Model   Stance   P       R       F1
Concat  FAVOR    0.5506  0.5878  0.5686
Concat  AGAINST  0.5794  0.4883  0.5299
Concat  Macro                    0.5493
BiCond  FAVOR    0.6268  0.6014  0.6138
BiCond  AGAINST  0.6057  0.4983  0.5468
BiCond  Macro                    0.5803
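The hashtag-based weak labelling can be sketched as follows (the hashtag sets here are illustrative examples, not the paper's actual list):

```python
# Illustrative hashtag sets; the paper uses a small manually defined set.
favor_tags = {"#makeamericagreatagain", "#trump2016"}
against_tags = {"#dumptrump", "#nevertrump"}

def weak_label(tweet):
    """Assign a stance label from hashtags, then remove them so the
    model cannot simply memorise the labelling rule."""
    words = tweet.lower().split()
    if any(w in favor_tags for w in words):
        label = "FAVOR"
    elif any(w in against_tags for w in words):
        label = "AGAINST"
    else:
        return None  # tweet stays unlabelled
    cleaned = " ".join(w for w in words if w not in favor_tags | against_tags)
    return cleaned, label

print(weak_label("He speaks for us #Trump2016"))
# ('he speaks for us', 'FAVOR')
```

Stripping the labelling hashtags from the text is the step that makes the setting non-trivial: the classifier has to learn the stance from the remaining words rather than from the rule that produced the label.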
Stance Detection with Conditional Encoding
• Other findings
– Pre-training word embeddings on a large in-domain corpus
with an unsupervised objective, then continuing to optimise
them towards the supervised objective, works well
• Better than pre-training without further optimisation, or random
initialisation, or Google News embeddings
– LSTM encoding of tweets and targets works better than
sum of word embeddings baseline, despite small
training set (7k – 14k instances)
– Almost all instances in which the target is mentioned in the
tweet have a non-neutral stance
Stance Detection with Conditional Encoding
• Conclusions
– Modelling sentence pair relationship is important
– Automatic labelling of in-domain tweets is even more
important
– Learning sequence representations is also a good approach
for small datasets
Thank you!
isabelleaugenstein.github.io
i.augenstein@ucl.ac.uk
@IAugenstein
github.com/isabelleaugenstein
References
Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak,
Sebastian Riedel. emoji2vec: Learning Emoji Representations from
their Description. SocialNLP at EMNLP 2016.
https://arxiv.org/abs/1609.08359
Georgios Spithourakis, Isabelle Augenstein, Sebastian Riedel.
Numerically Grounded Language Models for Semantic Error
Correction. EMNLP 2016. https://arxiv.org/abs/1608.04147
Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina
Bontcheva. Stance Detection with Bidirectional Conditional
Encoding. EMNLP 2016. https://arxiv.org/abs/1606.05464
Collaborators
Kalina Bontcheva, University of Sheffield
Andreas Vlachos, University of Sheffield
George Spithourakis, UCL
Matko Bošnjak, UCL
Sebastian Riedel, UCL
Tim Rocktäschel, UCL
Ben Eisner, Princeton