Ivan Lobov, Data-Centric Alliance, "Current Trends in Deep Learning Research"

DL research trends based on AAAI 2016 proceedings

Attention & Memory

Ilya Sutskever, Research Director at OpenAI

Problem - limited memory for sequences

Source: WildML blog post

http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/

Solution - use direct weighted connections

Source: WildML blog post

http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
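The weighted-connection idea can be sketched in a few lines. This is a minimal illustration, assuming dot-product scoring and a softmax over encoder states; published models often score with a small learned network instead. The names `attend`, `states`, and `query` and the toy vectors are illustrative and do not come from the cited post.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, states):
    """Return (weights, context) for a single decoder step.

    Each encoder state is scored by its dot product with the query;
    the context vector is the weighted sum of all encoder states.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(dim)]
    return weights, context

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoder states
query = [0.0, 2.0]                              # toy decoder state
weights, context = attend(query, states)
```

Because every encoder state feeds the context directly, the decoder no longer depends on one fixed-size vector to remember the whole sequence.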

Text attention - better long-term memory

Source: Karl Moritz Hermann et al, arXiv

http://arxiv.org/pdf/1506.03340v3.pdf

Text attention - better translation

Source: Dzmitry Bahdanau et al, arXiv


Text attention - better Q&A

Source: Ming Tan et al, arXiv

http://arxiv.org/pdf/1511.04108.pdf

Memory - Context + Q&A

Source: Sainbayar Sukhbaatar et al, arXiv


Visual Attention - better caption generation

Source: Kelvin Xu et al, arXiv


Adversarial Networks

Problem - MCMC-based sampling is hard

● Backprop is good, but it requires direct feedback / gradient

● Hard to train anything non-backprop

Source: Deep Learning Book

http://www.deeplearningbook.org/

Solution - two-player game

Sources: Torch blog post

1. Take samples from original distribution

2. Generative model tries to create new images

3. Discriminative model tries to distinguish between them

http://torch.ch/blog/2015/11/13/gan.html
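The three steps above can be sketched as a pair of losses, one per player. This is a toy 1-D illustration only: the linear `generator`, the logistic `discriminator`, and all parameter values are hypothetical, and no training loop is shown. The generator loss uses the non-saturating form (maximize log D(G(z))), which is one common choice rather than the only one.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator(x, w, b):
    """Toy discriminator: estimated probability that sample x is real."""
    return sigmoid(w * x + b)

def generator(z, scale, shift):
    """Toy generator: maps a noise value z to a fake sample."""
    return scale * z + shift

def gan_losses(real, fake, w, b):
    """Cross-entropy losses for both players on one batch."""
    eps = 1e-12  # guards against log(0)
    # The discriminator wants real samples scored near 1 and fakes near 0.
    d_loss = -(
        sum(math.log(discriminator(x, w, b) + eps) for x in real) / len(real)
        + sum(math.log(1.0 - discriminator(x, w, b) + eps) for x in fake) / len(fake)
    )
    # The generator wants its fakes scored as real (non-saturating form).
    g_loss = -sum(math.log(discriminator(x, w, b) + eps) for x in fake) / len(fake)
    return d_loss, g_loss

rng = random.Random(0)
real = [rng.gauss(4.0, 0.5) for _ in range(64)]                       # step 1
fake = [generator(rng.gauss(0.0, 1.0), 1.0, 0.0) for _ in range(64)]  # step 2
d_loss, g_loss = gan_losses(real, fake, w=2.0, b=-4.0)                # step 3
```

In training, the two losses would be minimized in alternation, each with respect to its own player's parameters; here the untrained generator is easy to spot, so its loss is the larger of the two.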

Solution - two-player game

Source: Ian J. Goodfellow et al., arXiv


After 1 training epoch

Source: Alec Radford et al., arXiv


After 5 training epochs



Smooth transitions in latent space

Sources: Torch blog post

http://www.youtube.com/watch?v=PmC6ZOaCAOs

http://torch.ch/blog/2015/11/13/gan.html
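A smooth transition of this kind is usually produced by walking between two latent vectors and decoding every intermediate point. The sketch below assumes plain linear interpolation; `lerp`, `interpolation_path`, and the toy vectors are illustrative names, and the generator call that would turn each point into an image is omitted.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors, with t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

def interpolation_path(z_a, z_b, steps):
    """Evenly spaced latent points from z_a to z_b, endpoints included."""
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]

z_start = [0.0, 0.0]  # toy latent vector
z_end = [1.0, 2.0]    # toy latent vector
path = interpolation_path(z_start, z_end, steps=5)
# Each point in `path` would be passed through the generator to render a frame.
```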

Vector arithmetic for visual concepts
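The arithmetic itself is elementwise addition and subtraction on latent vectors. The sketch below assumes that each concept is represented by the average of a few exemplar vectors; the helper names and toy numbers are hypothetical and do not come from the cited work.

```python
def mean_vector(vectors):
    """Average several latent vectors into one concept vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def concept_arithmetic(a, b, c):
    """Return a - b + c, computed elementwise."""
    return [x - y + z for x, y, z in zip(a, b, c)]

# Toy exemplars: concept A has an attribute, B lacks it, C is a different base.
concept_a = mean_vector([[2.0, 1.0], [4.0, 1.0]])
concept_b = mean_vector([[2.0, 0.0], [4.0, 0.0]])
concept_c = mean_vector([[0.0, 0.0], [0.0, 0.0]])
result = concept_arithmetic(concept_a, concept_b, concept_c)
# `result` would be decoded by the generator to render the combined concept.
```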



Conditional Generative Adversarial Nets

Source: Mehdi Mirza et al., arXiv


Char-level text comprehension

Problem - word-level models ignore chars

● Word-level models cannot gracefully deal with new words

● Every new form of a word is a new embedding to learn (unless stemmed or lemmatized)

● Char-level LSTMs have difficulty learning high-level features (sentences, meaning, etc.)

Solution - CharCNN (n-grams at the core)

Sources: Rafal Jozefowicz et al, arXiv; Yoon Kim et al, arXiv
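A rough sketch of the CharCNN idea: each filter spans a character n-gram, and max-over-time pooling keeps the strongest response per filter, so any word gets a fixed-size vector. The parameters here are random, where a real model would learn them. The alphabet, dimensions, filter widths, and function names are assumptions made for illustration, not the settings of the cited papers.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
EMB_DIM = 4

rng = random.Random(0)
# Hypothetical randomly initialised character embeddings.
char_emb = {c: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for c in ALPHABET}

def make_filter(width):
    """One convolution filter covering `width` consecutive characters."""
    return [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(width)]

filters = [make_filter(w) for w in (2, 3, 3, 4)]  # char n-gram detectors

def word_vector(word):
    """Build a word vector from characters: convolve, tanh, max-pool."""
    chars = [char_emb[c] for c in word.lower() if c in char_emb]
    features = []
    for filt in filters:
        width = len(filt)
        # Pad short words with zero vectors so every filter fits at least once.
        padded = chars + [[0.0] * EMB_DIM] * max(0, width - len(chars))
        responses = []
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            total = sum(w * x
                        for f_row, c_row in zip(filt, window)
                        for w, x in zip(f_row, c_row))
            responses.append(math.tanh(total))
        features.append(max(responses))  # max-over-time pooling
    return features

vec = word_vector("unfriendliness")  # works even if the word was never seen
```

Since the vector is computed from characters, a new word or word form needs no embedding of its own.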



Out-of-vocabulary examples

Source: Yoon Kim et al, arXiv


Char-level text comprehension

On February 7, 2016, Google set a new record in language modeling, improving on the previous best result by 41.5% in terms of perplexity

Key tricks:

● Very large network

● Importance sampling

● Char-level CNNs

Source: Rafal Jozefowicz et al, arXiv


Why does perplexity matter?

Perplexity is a technical metric for evaluating language models: it measures how well a model predicts held-out text, and lower is better.

But it influences all language tasks:

● Better grammar checking

● Better machine translation

● Better text generation from chatbots

● Better document compression

Source: Coursera, Stanford NLP

https://class.coursera.org/nlp/lecture/129
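For reference, perplexity is the exponential of the average negative log-probability a model assigns to each token. A minimal sketch, assuming the per-token probabilities are already available; the function name and sample probabilities are illustrative:

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    n = len(token_probs)
    avg_neg_log = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log)

uniform_guess = perplexity([0.25] * 8)           # model unsure among 4 options
confident = perplexity([0.9, 0.8, 0.95, 0.85])   # model mostly right
```

A model that gives every token probability 0.25 has perplexity 4 (up to floating-point rounding), as if it were choosing uniformly among four options at each step; a more confident and correct model scores closer to 1.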

Questions?

