Word embeddings and neural language modeling
AACIMP 2015, Sergii Gavrylov
Posted Jan 22, 2017

Transcript
Page 1: (Kpi summer school 2015) word embeddings and neural language modeling

Word embeddings and neural language modeling

AACIMP 2015, Sergii Gavrylov

Page 2:

Overview

● Natural language processing
● Word representations
● Statistical language modeling
● Neural models
● Recurrent neural network models
● Long short-term memory RNN models

Page 3:

Natural language processing

● NLP mostly works with text data (but its methods can also be applied to music, bioinformatics, speech, etc.)
● From the perspective of machine learning, natural language is a collection of variable-length sequences of high-dimensional vectors.

Page 4:

Word representation

Page 5:

One-hot encoding
V = {zebra, horse, school, summer}

Page 6:

One-hot encoding
V = {zebra, horse, school, summer}

v(zebra) = [1, 0, 0, 0]
v(horse) = [0, 1, 0, 0]
v(school) = [0, 0, 1, 0]
v(summer) = [0, 0, 0, 1]

Page 7:

One-hot encoding
V = {zebra, horse, school, summer}

v(zebra) = [1, 0, 0, 0]
v(horse) = [0, 1, 0, 0]
v(school) = [0, 0, 1, 0]
v(summer) = [0, 0, 0, 1]

(+) Pros: simplicity
(-) Cons: one-hot encoding can be memory inefficient; the notion of word similarity is undefined with one-hot encoding
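The encoding on this slide can be sketched in a few lines; the function and variable names here are illustrative, not from the slides:

```python
# A minimal sketch of one-hot encoding for the toy vocabulary above.
vocab = ["zebra", "horse", "school", "summer"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

print(one_hot("zebra"))   # [1, 0, 0, 0]
print(one_hot("school"))  # [0, 0, 1, 0]
```

The memory cost is visible here: every vector has |V| entries, and all pairs of distinct words are equally (dis)similar.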

Page 8:

Distributional representation

Is there a representation that preserves the similarities of word meanings?

d(v(zebra), v(horse)) < d(v(zebra), v(summer))

Page 9:

Distributional representation

Is there a representation that preserves the similarities of word meanings?

d(v(zebra), v(horse)) < d(v(zebra), v(summer))

“You shall know a word by the company it keeps” - John Rupert Firth

Page 10:

Distributional representation

clic.cimec.unitn.it/marco/publications/acl2014/lazaridou-etal-wampimuk-acl2014.pdf

“A cute, hairy wampimuk is sitting on the hands.”

Page 11–12:

Distributional representation

www.cs.ox.ac.uk/files/6605/aclVectorTutorial.pdf

Page 13:

Distributional representation

(+) Pros: simplicity; has a notion of word similarity
(-) Cons: distributional representations can be memory inefficient
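Following Firth's "company it keeps" idea from the earlier slide, a distributional representation can be built by counting context words. A minimal sketch, where the toy corpus and the ±1-word window are illustrative assumptions:

```python
# Sketch of a distributional representation: co-occurrence counts within
# a +/-1 window. Each word's representation is its row of counts.
from collections import Counter

corpus = "a cute wampimuk is sitting on the tree".split()
window = 1
cooc = {w: Counter() for w in set(corpus)}
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooc[w][corpus[j]] += 1

print(cooc["wampimuk"])  # counts for 'cute' and 'is'
```

Words appearing in similar contexts get similar count vectors, but each vector is still |V|-dimensional, which is the memory cost noted above.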

Page 14:

Distributed representation

V is a vocabulary
w_i ∈ V
v(w_i) ∈ R^n

v(w_i) is a low-dimensional, learnable, dense word vector

Page 15:

Distributed representation

colah.github.io/posts/2014-07-NLP-RNNs-Representations

Page 16:

Distributed representation

(+) Pros: has a notion of word similarity; is memory efficient (low-dimensional)
(-) Cons: is computationally intensive

Page 17:

Distributed representation as a lookup table

W is a matrix whose rows are v(w_i) ∈ R^n
v(w_i) returns the i-th row of W
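The lookup table is just row indexing into a matrix. A minimal sketch, where the vocabulary, embedding dimension, and random initialization are illustrative assumptions (in a real model W is learned):

```python
# Sketch: a distributed representation as a lookup table. W's rows are
# the word vectors; looking up word i simply selects row i of W.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["zebra", "horse", "school", "summer"]
n = 3                                   # embedding dimension (toy value)
W = rng.normal(size=(len(vocab), n))    # one learnable row per word

def v(word):
    return W[vocab.index(word)]         # v(w_i) is the i-th row of W

print(v("horse"))                       # a dense 3-dimensional vector
```

Because the rows are dense and low-dimensional, similarity between words can be measured directly, e.g. by cosine distance between rows.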

Page 18:

Statistical language modeling

A sentence s = (x_1, x_2, …, x_T)

How likely is s?

p(x_1, x_2, …, x_T) = p(x_1) p(x_2 | x_1) ⋯ p(x_T | x_1, …, x_{T-1})

according to the chain rule of probability

Page 19:

n-gram models
n-th order Markov assumption

Page 20:

n-gram models
n-th order Markov assumption

bigram model of s = (a, cute, wampimuk, is, on, the, tree, .)
1. How likely is 'a' to follow '<S>'?
2. How likely is 'cute' to follow 'a'?
3. How likely is 'wampimuk' to follow 'cute'?
4. How likely is 'is' to follow 'wampimuk'?
5. How likely is 'on' to follow 'is'?
6. How likely is 'the' to follow 'on'?
7. How likely is 'tree' to follow 'the'?
8. How likely is '.' to follow 'tree'?
9. How likely is '<\S>' to follow '.'?

Page 21:

n-gram models
n-th order Markov assumption

bigram model of s = (a, cute, wampimuk, is, on, the, tree, .)

the counts are obtained from a training corpus
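Counting-based bigram estimation can be sketched directly; the one-sentence "corpus" here is an illustrative assumption, standing in for a real training corpus:

```python
# Sketch of bigram probability estimation by counting:
# p(x_t | x_{t-1}) = count(x_{t-1}, x_t) / count(x_{t-1}).
from collections import Counter

corpus = ["<S>", "a", "cute", "wampimuk", "is", "on", "the", "tree", ".", "<\\S>"]
contexts = Counter(corpus[:-1])                 # how often each word is a context
bigrams = Counter(zip(corpus, corpus[1:]))      # how often each pair occurs

def p(word, prev):
    return bigrams[(prev, word)] / contexts[prev]

print(p("cute", "a"))   # 1.0 in this tiny corpus
```

The data-sparsity issue on the next slide follows immediately: any bigram absent from the corpus gets probability 0, however plausible it is.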

Page 22:

n-gram models

Issues:
● Data sparsity
● Lack of generalization: seeing [ride a horse] and [ride a llama] tells the model nothing about the unseen [ride a zebra]

Page 23–29:

Neural language model

[figure: the context words 'ride' and 'a' are one-hot encoded, mapped through a lookup table to dense vectors, concatenated, and fed to a neural network that outputs a probability distribution over the next word]

zebra should have a representation similar to horse and llama

Now we can generalize to the unseen n-grams

[ride a zebra]
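The forward pass these slides build up can be sketched end to end. All sizes, the tiny vocabulary, and the random weights are illustrative assumptions; a real model would learn the parameters by gradient descent:

```python
# Sketch of a feedforward neural LM forward pass: look up embeddings for
# the context ("ride", "a"), concatenate, apply one hidden layer, and
# softmax over the vocabulary to score possible next words.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["ride", "a", "horse", "llama", "zebra"]
n, h = 4, 8                                   # embedding / hidden sizes
E = rng.normal(size=(len(vocab), n))          # lookup table
W1 = rng.normal(size=(2 * n, h))
W2 = rng.normal(size=(h, len(vocab)))

def next_word_probs(context):
    x = np.concatenate([E[vocab.index(w)] for w in context])
    hidden = np.tanh(x @ W1)
    logits = hidden @ W2
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

probs = next_word_probs(["ride", "a"])
print(dict(zip(vocab, probs.round(3))))
```

Generalization comes from the lookup table: if training pulls the rows for horse, llama, and zebra close together, contexts seen with one transfer to the others.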

Page 30–31:

Recurrent neural network models

There is no Markov assumption

arxiv.org/pdf/1503.04069v1.pdf
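The reason no Markov assumption is needed: the hidden state is updated from the full prefix, one word at a time. A minimal sketch of a vanilla RNN step, with illustrative sizes and random weights:

```python
# Sketch of a vanilla RNN: the hidden state summarizes the entire prefix
# read so far, so the next-word prediction can depend on all of it.
import numpy as np

rng = np.random.default_rng(0)
n, h = 4, 6                             # input / hidden sizes (toy values)
Wxh = rng.normal(size=(n, h))
Whh = rng.normal(size=(h, h))

def rnn_step(h_prev, x):
    return np.tanh(x @ Wxh + h_prev @ Whh)

state = np.zeros(h)
for x in rng.normal(size=(5, n)):       # five stand-in "word vectors"
    state = rnn_step(state, x)
print(state)                            # summary of the whole sequence
```

This is the loop the following animation slides unroll: each new word updates the state, and the state after "yesterday we were riding a" is what predicts "horse".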

Page 32–45:

Recurrent neural network models

[animation: the RNN reads the sentence word by word ('yesterday', 'yesterday we', 'yesterday we were', …, 'yesterday we were riding a'), updating its hidden state at each step and finally predicting the next word: 'horse']
Page 46:

Recurrent neural network models

Vanishing/exploding gradient problem
www.jmlr.org/proceedings/papers/v28/pascanu13.pdf

A naïve transition function has difficulty handling long-term dependencies
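The vanishing case can be demonstrated in a few lines: backpropagation through T steps multiplies the gradient by the recurrent Jacobian T times, so if its largest singular value is below 1 the gradient norm shrinks exponentially with T. The matrix below is a toy assumption chosen to make this visible:

```python
# Illustrative demo of the vanishing-gradient effect in a linear recurrence.
import numpy as np

W = 0.5 * np.eye(4)                 # recurrent matrix, spectral norm 0.5 < 1
g = np.ones(4)                      # "gradient" arriving at the last step
norms = []
for _ in range(20):                 # 20 steps of backpropagation through time
    g = W.T @ g
    norms.append(np.linalg.norm(g))

print(norms[0], norms[-1])          # shrinks from 1.0 toward 0
```

With a spectral norm above 1 the same loop explodes instead; either way, long-range dependencies are hard to learn, which motivates the LSTM on the next slides.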

Page 47–53:

Long short-term memory RNN models

arxiv.org/pdf/1503.04069v1.pdf
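A single LSTM step in the style of the linked paper (Greff et al.) can be sketched as follows; gates control an additive cell state, which is what lets gradients survive over long spans. Sizes and random weights are illustrative assumptions, and bias terms are omitted for brevity:

```python
# Sketch of one LSTM step: input, forget, and output gates modulate a
# cell state c that is updated additively rather than by a full rewrite.
import numpy as np

rng = np.random.default_rng(0)
n, h = 4, 5                             # input / hidden sizes (toy values)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Wi, Wf, Wo, Wg = (rng.normal(size=(n + h, h)) for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(z @ Wi)                 # input gate
    f = sigmoid(z @ Wf)                 # forget gate
    o = sigmoid(z @ Wo)                 # output gate
    g = np.tanh(z @ Wg)                 # candidate cell update
    c = f * c_prev + i * g              # additive memory path
    return o * np.tanh(c), c

h_t, c_t = lstm_step(rng.normal(size=n), np.zeros(h), np.zeros(h))
print(h_t)
```

When the forget gate stays near 1, the term f * c_prev passes the cell state (and its gradient) through nearly unchanged, avoiding the exponential shrinkage shown for the naïve transition.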

Page 55:

Image captioning

cs.stanford.edu/people/karpathy/cvpr2015.pdf

Page 57:

Conclusion

CS224d: Deep Learning for Natural Language Processing
cs224d.stanford.edu

● Neural methods provide us with a powerful set of tools for embedding language.
● They provide better ways of tying language learning to extra-linguistic contexts (images, knowledge bases, cross-lingual data).