Page 1:

Recurrent Neural Networks
11-785 / 2020 Spring / Recitation 7

Vedant Sanil, David Park

“Drop your RNN and LSTM, they are no good!”
The fall of RNN / LSTM, Eugenio Culurciello

Wise words to live by indeed

Page 2:

Content

• 1 Language Model

• 2 RNNs in PyTorch

• 3 Training RNNs

• 4 Generation with an RNN

• 5 Variable length inputs

Page 3:

A recurrent neural network and the unfolding in time of the computation involved in its forward computation.

Page 4:

RNNs Are Hard to Train
What isn’t? I had to spend a week training an MLP :(

Page 5:

Different Tasks

Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN's state (more on this later).

Page 6:

Different Tasks

One-to-One: A simple MLP, no recurrence

One-to-Many: Example, image captioning: given a single image, generate a sequence of words

Page 7:

Different Tasks

Many-to-One: Example, sentiment analysis: given a sentence, classify its sentiment as positive or negative

Many-to-Many: Example, machine translation: given an input sentence in one language, convert it to another language

A history of machine translation from the Cold War to deep learning: https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/

Page 8:

Language Models

• Goal: predict the “probability of a sentence” P(E)

• How likely it is to be an actual sentence

Credits to cs224n

A lot of jargon, to basically say:
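The chain-rule factorization behind this (standard, though not written out in this transcript) is:

P(E) = P(w_1) · P(w_2 | w_1) · ... · P(w_T | w_1, ..., w_{T-1})

so the model only ever has to answer one question: how likely is the next word given everything seen so far? That is exactly what the RNN language model on the next slide computes at each time step.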

Page 9:

An RNN Language Model

Page 10:

RNN module in PyTorch
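A minimal sketch of how torch.nn.RNN is typically constructed and called (the sizes below are illustrative, not the values used in the recitation code):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=2, hidden_size=5, num_layers=2)   # batch_first=False by default

x = torch.randn(4, 3, 2)      # (seq_len, batch, input_size)
h0 = torch.zeros(2, 3, 5)     # (num_layers * num_directions, batch, hidden_size)
y, h_n = rnn(x, h0)           # y: (4, 3, 5), h_n: (2, 3, 5)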

Page 11:

RNN modules in PyTorch

• Important: the outputs are exactly the hidden states of the final layer. Hence if the model returns y, h:

• y: (seq_len, batch, num_directions * hidden_size)

• h: (num_layers * num_directions, batch, hidden_size)

• If num_directions = 1, then y[-1] == h[-1]

• LSTM: returns (h_n, c_n) = (hidden state, memory cell); GRU, like the plain RNN, returns only h_n
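A quick sketch that checks these shape claims (unidirectional case; the sizes are arbitrary):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=2, hidden_size=5, num_layers=3)
x = torch.randn(4, 3, 2)                  # (seq_len, batch, input_size)
y, h = rnn(x)
print(y.shape)                            # torch.Size([4, 3, 5]): (seq_len, batch, hidden_size)
print(h.shape)                            # torch.Size([3, 3, 5]): (num_layers, batch, hidden_size)
print(torch.allclose(y[-1], h[-1]))       # True: last output step == final layer's last hidden state

lstm = nn.LSTM(input_size=2, hidden_size=5, num_layers=3)
y, (h_n, c_n) = lstm(x)                   # LSTM additionally returns the memory cell c_n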

Page 12:

Can be better visualized as,

Page 13:

RNN cell
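For reference, nn.RNNCell is the single-time-step counterpart of nn.RNN; a rough sketch of the manual unrolling it implies (sizes are illustrative):

import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=2, hidden_size=5)
x = torch.randn(4, 3, 2)              # (seq_len, batch, input_size)
h = torch.zeros(3, 5)                 # (batch, hidden_size)

outputs = []
for t in range(x.size(0)):            # one recurrence step per time step
    h = cell(x[t], h)
    outputs.append(h)
y = torch.stack(outputs)              # (seq_len, batch, hidden_size), like nn.RNN's output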

Page 14:

Embedding Layers
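An embedding layer turns integer word indices into dense vectors before they reach the RNN; a minimal sketch (vocabulary size and dimensions are made up):

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10000, embedding_dim=256)

tokens = torch.tensor([[1, 5, 42], [7, 7, 0]])   # (batch, seq_len) word indices
vectors = embedding(tokens)                      # (batch, seq_len, 256)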

Page 15:

Training a Language Model
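A compressed, self-contained sketch of one training step; the recitation's actual model and data pipeline are larger, and every name and size here (TinyLM, the random "corpus") is a toy stand-in:

import torch
import torch.nn as nn

class TinyLM(nn.Module):
    # embedding -> LSTM -> linear layer projecting back onto the vocabulary
    def __init__(self, vocab_size=100, embed_dim=32, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_size, batch_first=True)
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):                     # x: (batch, seq_len) word indices
        out, _ = self.rnn(self.embed(x))      # (batch, seq_len, hidden_size)
        return self.proj(out)                 # (batch, seq_len, vocab_size)

model = TinyLM()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Targets are the inputs shifted by one: the model predicts the next word at every position.
tokens = torch.randint(0, 100, (8, 21))       # fake batch: (batch, seq_len + 1)
inputs, targets = tokens[:, :-1], tokens[:, 1:]

logits = model(inputs)
loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()
optimizer.step()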

Page 16:

Evaluate your model
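The usual metric here is perplexity, the exponential of the average per-word cross-entropy on held-out text (assuming the loss is in nats, as with PyTorch's CrossEntropyLoss):

import torch

val_loss = torch.tensor(4.2)        # illustrative mean cross-entropy over the validation words
perplexity = torch.exp(val_loss)    # ≈ 66.7; lower perplexity means a better language model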

Page 17:

Generation

Page 18:

Greedy Search, Random Search and Beam Search

1. Greedy search: select the most likely word

2. Random search: sample a word from the distribution

3. Beam search: keep the n best words at each step, where n is the beam size (greedy and random sampling are sketched below)
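A sketch of the first two strategies, assuming logits is the model's score vector over the vocabulary for the next word (here just random numbers for illustration):

import torch
import torch.nn.functional as F

logits = torch.randn(100)                        # in practice: the model's output at the last step

greedy_word = torch.argmax(logits)               # 1. greedy: always the single most likely word

probs = F.softmax(logits, dim=-1)
sampled_word = torch.multinomial(probs, 1)       # 2. random: draw one word from the distribution

# 3. Beam search instead keeps the n highest-scoring partial sentences at every step
#    and only picks the best complete one at the end (not shown here).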

Page 19:

Are we done?

Page 20:

• No.

Page 21:

How to train a LM: fixed length
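In the fixed-length setup the whole corpus is treated as one long token stream and cut into equal-length chunks; a rough sketch of that chunking (sizes are made up, and the recitation's own code may slice differently):

import torch

corpus = torch.randint(0, 100, (1000,))    # one long stream of word indices
seq_len = 20
n = (corpus.size(0) - 1) // seq_len        # number of full chunks (drop the leftover tail)

inputs = corpus[:n * seq_len].reshape(n, seq_len)          # (num_chunks, seq_len)
targets = corpus[1:n * seq_len + 1].reshape(n, seq_len)    # the same chunks shifted by one word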

Page 22:

Limits of fixed-length inputs

Page 23:

How to train a LM: Variable Length

• Your dataset is now a list of N sequences of different lengths

• The input has a fixed dimension (seq_len, batch, input_size)

• How could we deal with this situation?

1. pad_sequence

2. Packed sequence

Page 24:

pad_sequence

Notice how seq_len differs across sequences, but the input size stays the same

Page 25:

pad_sequence

torch.Size([4, 3, 2])
(seq_len, batch, input_size)
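A sketch of the call that produces that shape: three sequences of lengths 4, 3 and 1 (consistent with the batch_sizes tensor shown a few slides later), each step a 2-dimensional vector with arbitrary values, zero-padded up to the longest length:

import torch
from torch.nn.utils.rnn import pad_sequence

x1 = torch.randn(4, 2)                 # length 4
x2 = torch.randn(3, 2)                 # length 3
x3 = torch.randn(1, 2)                 # length 1

padded = pad_sequence([x1, x2, x3])    # zero-pads the shorter sequences
print(padded.shape)                    # torch.Size([4, 3, 2]) = (seq_len, batch, input_size)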

Page 26:

pack_sequence

from torch.nn.utils.rnn import pack_sequence

# enforce_sorted=True assumes the sequences are already sorted by decreasing length;
# enforce_sorted=False lets pack_sequence sort them internally.
packed_2 = pack_sequence([x1, x2, x3], enforce_sorted=True)
packed = pack_sequence([x1, x2, x3], enforce_sorted=False)

Page 27:

pack_padded_sequence and pad_packed_sequence
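A sketch of the round trip these two functions give (standard torch.nn.utils.rnn calls; the variable names and the lengths 4, 3, 1 just continue the toy example above):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

x1, x2, x3 = torch.randn(4, 2), torch.randn(3, 2), torch.randn(1, 2)
padded = pad_sequence([x1, x2, x3])                   # (4, 3, 2)
lengths = torch.tensor([4, 3, 1])

packed = pack_padded_sequence(padded, lengths)        # padded batch + lengths -> PackedSequence

rnn = nn.RNN(input_size=2, hidden_size=5)
packed_out, h_n = rnn(packed)                         # RNN modules accept a PackedSequence directly

unpacked, out_lengths = pad_packed_sequence(packed_out)
print(unpacked.shape)                                 # torch.Size([4, 3, 5])
print(out_lengths)                                    # tensor([4, 3, 1])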

Page 28:

Why is batch_sizes = tensor([3, 2, 2, 1]) here?

batch_sizes (Tensor): tensor of integers holding information about the batch size at each sequence step. For instance, given data ``abc`` and ``x``, the PackedSequence would contain data ``axbc`` with ``batch_sizes=[2,1,1]``.

Here the three sequences have lengths 4, 3 and 1, so all 3 are active at step 0, 2 remain at steps 1 and 2, and only the longest remains at step 3.

Page 29:

Packed Sequences and RNNs

• Packed sequences are on the same device as the padded sequence

• Packed sequences let your RNN know the true length of each instance, so it does not process the padding as real input
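One consequence, sketched under the same toy setup (lengths 4, 3, 1): when an RNN consumes a PackedSequence, h_n holds each sequence's state at its own last real time step, not at the padded length.

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

seqs = [torch.randn(4, 2), torch.randn(3, 2), torch.randn(1, 2)]
packed = pack_padded_sequence(pad_sequence(seqs), torch.tensor([4, 3, 1]))

rnn = nn.RNN(input_size=2, hidden_size=5)
packed_out, h_n = rnn(packed)
out, lengths = pad_packed_sequence(packed_out)        # out is zero beyond each sequence's length

# h_n[0, i] matches out at each sequence's true final step, not at the padded final step
for i, length in enumerate(lengths):
    assert torch.allclose(h_n[0, i], out[length - 1, i])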

Page 30:

MLP, RNN, CNN, Transformer

• All these layers are just feature extractors

• Temporal convolutional networks (TCNs) “outperform canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory”

(An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling)

Page 31:

Page 32:

Is that enough?

• RNN, LSTM, GRU

• Transformer (will be covered in future lectures and recitations)

• CNN

Credits: Attention is all you need

Page 33:

Papers

• Regularizing and Optimizing LSTM Language Models (for more tricks): https://arxiv.org/pdf/1708.02182.pdf

• Attention Is All You Need

• The fall of RNN / LSTM: https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0

• LSTM song!!!