Sequence Models - UMD


Sequence Models

Machine Learning: Jordan Boyd-Graber, University of Maryland
LSTM Variants

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 1 / 5

GRU simplifies slightly

- No explicit memory cell
- Only one gate where the LSTM has an input/forget pair
- Slightly fewer parameters
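The simplification above can be sketched as a single GRU step. This is a minimal NumPy illustration, not code from the slides: the weight names and toy dimensions are assumptions, and it follows the standard GRU formulation with an update gate z and reset gate r acting directly on the hidden state, with no separate cell memory.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W_z, U_z, W_r, U_r, W_h, U_h):
    """One GRU step: gated interpolation of the hidden state,
    with no explicit memory cell (illustrative sketch)."""
    z = sigmoid(W_z @ x + U_z @ h_prev)               # update gate
    r = sigmoid(W_r @ x + U_r @ h_prev)               # reset gate
    h_tilde = np.tanh(W_h @ x + U_h @ (r * h_prev))   # candidate state
    return (1 - z) * h_prev + z * h_tilde             # mix old and new

# Toy dimensions and random weights, for illustration only.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = [rng.standard_normal((d_h, d)) for d in (d_in, d_h) * 3]
h = np.zeros(d_h)
for _ in range(5):
    h = gru_step(rng.standard_normal(d_in), h, *params)
```

Because the new state is a convex combination of the previous state and a tanh candidate, each entry of `h` stays strictly inside (-1, 1).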

What’s the most important part of the LSTM?

Greff et al. explore:

- No Input Gate (NIG)
- No Forget Gate (NFG)
- No Output Gate (NOG)
- No Input Activation Function (NIAF)
- No Output Activation Function (NOAF)
- No Peepholes (NP)
- Coupled Input and Forget Gate (CIFG): as in the GRU, f_t = 1 − i_t
- Full Gate Recurrence (FGR): as in the original LSTM paper


Bi-directional LSTMs

A simple extension that often slightly improves performance (but doesn’t always make sense for the task)
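The extension can be sketched with a plain recurrent cell: run one pass left-to-right, one pass right-to-left, and concatenate the two states at each position. This is an illustrative NumPy sketch (a vanilla tanh RNN stands in for the LSTM; names and dimensions are assumptions, not from the slides).

```python
import numpy as np

def rnn_step(x, h, W, U):
    """One vanilla recurrent step (stand-in for an LSTM cell)."""
    return np.tanh(W @ x + U @ h)

def bidirectional(xs, W_f, U_f, W_b, U_b, d_h):
    """Forward and backward passes over the sequence, states concatenated."""
    h, fwd = np.zeros(d_h), []
    for x in xs:                      # left-to-right pass
        h = rnn_step(x, h, W_f, U_f)
        fwd.append(h)
    h, bwd = np.zeros(d_h), []
    for x in reversed(xs):            # right-to-left pass
        h = rnn_step(x, h, W_b, U_b)
        bwd.append(h)
    bwd.reverse()                     # realign with forward order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# Toy usage with random weights, for illustration only.
rng = np.random.default_rng(2)
d_in, d_h = 4, 3
W_f, W_b = rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_in))
U_f, U_b = rng.standard_normal((d_h, d_h)), rng.standard_normal((d_h, d_h))
xs = [rng.standard_normal(d_in) for _ in range(5)]
out = bidirectional(xs, W_f, U_f, W_b, U_b, d_h)
```

Each output vector sees both left and right context, which is why the variant can fail to make sense for tasks where the future is unavailable, such as online prediction.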


Comparing architectures

- GRUs seem competitive
- LSTM seems to be a good tradeoff
- Bi-directional often offers a slight improvement
