Recurrent Neural Networks
11-785 / 2020 Spring / Recitation 7
Vedant Sanil, David Park
“Drop your RNN and LSTM, they are no good!”
The fall of RNN / LSTM, Eugenio Culurciello
Wise words to live by indeed
Content
• 1 Language Model
• 2 RNNs in PyTorch
• 3 Training RNNs
• 4 Generation with an RNN
• 5 Variable length inputs
A recurrent neural network and the unfolding in time of the computation involved in its forward computation.
RNNs Are Hard to Train
What isn’t? I had to spend a week training an MLP :(
Different Tasks
Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue and green vectors hold the RNN's state (more on this later)
One-to-One: a simple MLP, no recurrence
One-to-Many: e.g. image captioning. Given a single image, generate a sequence of words
Many-to-One: e.g. sentiment analysis. Given a sentence, classify its sentiment as positive or negative
Many-to-Many: e.g. machine translation. Given an input sentence in one language, convert it to another language
A history of machine translation from the Cold War to deep learning: https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/
Language Models
• Goal: predict the “probability of a sentence” P(E)
• How likely it is to be an actual sentence
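One common way to write this probability is the chain-rule factorization that an RNN language model estimates one word at a time. The symbols here are assumptions for illustration and are not taken from the slides: e_t is the t-th word of the sentence E and T is its length.

```latex
P(E) = \prod_{t=1}^{T} P(e_t \mid e_1, \dots, e_{t-1})
```

Each factor is a distribution over the vocabulary for the next word given the words so far, which is what the network outputs at each time step.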
Credits to cs224n
A lot of jargon, to basically say:
An RNN Language Model
RNN modules in PyTorch
• Important: the outputs are exactly the hidden states of the final layer. Hence if the model returns y, h:
• y: (seq_len, batch, num_directions * hidden_size)
• h: (num_layers * num_directions, batch, hidden_size)
• If num_directions = 1, then y[-1] == h[-1]
• LSTM and GRU:
• LSTM returns y, (h_n, c_n): (hidden state, memory cell); GRU returns y, h_n like a plain RNN
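The shapes above can be checked with a small sketch. The sizes chosen here (seq_len = 5, batch = 3 and so on) are arbitrary assumptions for illustration, and the default batch_first=False layout is used.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary illustrative sizes (assumptions, not from the slides)
seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 4, 6, 2

# Unidirectional RNN, so num_directions = 1
rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers)
x = torch.randn(seq_len, batch, input_size)

y, h = rnn(x)
y_shape = tuple(y.shape)  # (seq_len, batch, num_directions * hidden_size)
h_shape = tuple(h.shape)  # (num_layers * num_directions, batch, hidden_size)

# Last time step of the output equals the final hidden state of the last layer
last_step_matches = torch.allclose(y[-1], h[-1])

# An LSTM additionally returns the memory cell state c_n
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers)
y_lstm, (h_n, c_n) = lstm(x)

print(y_shape, h_shape, last_step_matches, tuple(h_n.shape), tuple(c_n.shape))
```

For a bidirectional model the equality y[-1] == h[-1] no longer holds, since the backward direction finishes at the first time step.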
Can be better visualized as,
RNN cell
Embedding Layers
Training a Language Model
Evaluate your model
Generation
Greedy Search, Random Search and Beam Search
1. Greedy search: select the most likely word
2. Random Search: sample a word from the distribution
3. Beam Search: keep the n best words at each step, n is the beam size
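The three strategies differ only in how the next word is picked from the model's output distribution. Below is a minimal sketch of a single decoding step; the logits are made-up values over a hypothetical 5-word vocabulary, and a full beam search would repeat the top-n step while accumulating log-probabilities per hypothesis.

```python
import torch

torch.manual_seed(0)

# Hypothetical next-word logits over a 5-word vocabulary (made-up values)
logits = torch.tensor([1.0, 3.0, 0.5, 2.0, -1.0])
probs = torch.softmax(logits, dim=-1)

# 1. Greedy search: take the single most likely word
greedy_word = int(torch.argmax(probs))

# 2. Random search: sample one word from the distribution
sampled_word = int(torch.multinomial(probs, num_samples=1))

# 3. Beam search (one step only): keep the n best words, n = beam size
beam_size = 2
top_log_probs, top_words = torch.topk(torch.log(probs), k=beam_size)
beam_words = top_words.tolist()

print(greedy_word, sampled_word, beam_words)
```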
Are we done?
• No.
How to train a LM: fixed length
Limits of fixed-length inputs
How to train a LM: Variable Length
• Your dataset is now a list of N sequences of different lengths
• The input has a fixed dimension (seq_len, batch, input_size)
• How could we deal with this situation?
1. pad_sequence
2. Packed sequence
pad_sequence
Notice how seq_len is different, but input size is same
torch.Size([4, 3, 2])
(seq_len, batch, input_size)
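A minimal sketch that reproduces a padded tensor of this shape. The three sequences and their lengths (4, 3 and 1) are assumptions chosen to match the printed size; the original inputs are not shown in the extracted slides.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three sequences with different seq_len but the same input_size = 2
x1 = torch.ones(4, 2)      # length 4
x2 = torch.ones(3, 2) * 2  # length 3
x3 = torch.ones(1, 2) * 3  # length 1

# Defaults: batch_first=False, padding_value=0
padded = pad_sequence([x1, x2, x3])
padded_shape = tuple(padded.shape)  # (seq_len, batch, input_size)

print(padded.shape)
```

Shorter sequences are filled with zeros up to the longest length, so padded[1:, 2] contains only padding.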
pack_sequence
packed_2 = pack_sequence([x1, x2, x3], enforce_sorted=True)
packed = pack_sequence([x1, x2, x3], enforce_sorted=False)
pack_padded_sequence and pad_packed_sequence
Why is batch_sizes = tensor([3, 2, 2, 1]) here?
batch_sizes (Tensor): Tensor of integers holding information about the batch size at each sequence step. For instance, given data ``abc`` and ``x``, the PackedSequence would contain data ``axbc`` with ``batch_sizes=[2,1,1]``.
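The following sketch illustrates where these numbers come from. The sequence lengths 4, 3 and 1 are an assumption that reproduces tensor([3, 2, 2, 1]), and the characters of the documentation example are encoded as integer codes purely for illustration.

```python
import torch
from torch.nn.utils.rnn import pack_sequence

# Assumed lengths 4, 3, 1 (sorted by decreasing length, as enforce_sorted=True requires)
x1 = torch.ones(4, 2)
x2 = torch.ones(3, 2) * 2
x3 = torch.ones(1, 2) * 3

packed = pack_sequence([x1, x2, x3], enforce_sorted=True)
# Step 0 has 3 live sequences, steps 1 and 2 have 2, step 3 has 1
batch_sizes = packed.batch_sizes.tolist()

# The 'abc' and 'x' example, with characters stored as integer codes
abc = torch.tensor([ord(c) for c in "abc"])
x = torch.tensor([ord("x")])
packed_chars = pack_sequence([abc, x])
chars = "".join(chr(c) for c in packed_chars.data.tolist())
char_batch_sizes = packed_chars.batch_sizes.tolist()

print(batch_sizes, chars, char_batch_sizes)
```

The packed data is stored time step by time step: all first elements, then all second elements of the sequences that are still running, and so on.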
Packed Sequences and RNNs
• Packed sequences are on the same device as the padded sequence
• Packed sequences tell the RNN the true length of each sequence, so padded time steps are skipped
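A hedged sketch of the usual round trip: pad, pack, run the RNN, then unpack. The sequence lengths and the LSTM sizes are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import (
    pad_sequence,
    pack_padded_sequence,
    pad_packed_sequence,
)

torch.manual_seed(0)

# Unsorted sequences of lengths 1, 4, 3 with input_size = 2 (assumed values)
seqs = [torch.randn(1, 2), torch.randn(4, 2), torch.randn(3, 2)]
lengths = torch.tensor([len(s) for s in seqs])

padded = pad_sequence(seqs)  # (4, 3, 2)
# enforce_sorted=False lets PyTorch sort internally and restore the order afterwards
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)

lstm = nn.LSTM(input_size=2, hidden_size=5)
packed_out, (h_n, c_n) = lstm(packed)

# Back to a padded tensor of shape (seq_len, batch, hidden_size)
out, out_lengths = pad_packed_sequence(packed_out)

print(out.shape, out_lengths.tolist())
```

Because the LSTM sees the true lengths, h_n holds the hidden state at each sequence's last real step rather than at a padded position.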
MLP, RNN, CNN, Transformer
• All these layers are just feature extractors
• Temporal convolutional networks (TCNs) “outperform canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory” (An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling)
Is that enough?
• RNN, LSTM, GRU
• Transformer (covered in more detail in future lectures and recitations)
• CNN
Credits: Attention is all you need
Papers
• Regularizing and Optimizing LSTM Language Models (for more tricks): https://arxiv.org/pdf/1708.02182.pdf
• Attention is all you need!
• https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0
• LSTM song!!!