Reinforcement Learning for NLP
Advanced Machine Learning for NLP
Jordan Boyd-Graber
REINFORCEMENT OVERVIEW, POLICY GRADIENT
Adapted from slides by David Silver, Pieter Abbeel, and John Schulman
Source: users.umiacs.umd.edu/~jbg/teaching/CSCI_7000/11a.pdf
• I used to say that RL wasn’t used in NLP . . .
• Now it’s all over the place
• It is part of much of the ML hype
• But what is reinforcement learning?
◦ RL is a general-purpose framework for decision-making
◦ RL is for an agent with the capacity to act
◦ Each action influences the agent’s future state
◦ Success is measured by a scalar reward signal
◦ Goal: select actions to maximise future reward
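To make "future reward" concrete, here is a minimal sketch that sums the scalar rewards an agent collects from some step onward. The function name `future_reward`, the list-of-rewards representation, and the optional discount factor `gamma` are assumptions for illustration and do not come from the slides; with the default `gamma=1.0` it is a plain sum.

```python
def future_reward(rewards, t=0, gamma=1.0):
    """Sum the rewards from index t onward.

    rewards: list of scalar rewards, one per step (hypothetical layout).
    gamma:   optional discount factor (an assumption; 1.0 means no discount).
    """
    return sum(gamma ** k * r for k, r in enumerate(rewards[t:]))


# Example: three steps of reward, viewed from the start and from step 1.
episode_rewards = [1.0, 0.0, 2.0]
print(future_reward(episode_rewards))        # everything from step 0
print(future_reward(episode_rewards, t=1))   # only what is still to come
```

An agent that "maximises future reward" is one that picks actions so that this quantity is as large as possible, on average, from wherever it currently is.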
• At each step $t$ the agent:
◦ Executes action $a_t$
◦ Receives observation $o_t$
◦ Receives scalar reward $r_t$
• The environment:
◦ Receives action $a_t$
◦ Emits observation $o_{t+1}$
◦ Emits scalar reward $r_{t+1}$
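The agent and environment steps above can be sketched as a simple loop. This is only an illustration under stated assumptions: `CoinFlipEnv`, its `reset`/`step` methods, the `done` flag, and `run_episode` are hypothetical names invented here, not an API from the slides or from any particular library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a biased coin."""

    def __init__(self, p_heads=0.8, horizon=10, seed=0):
        self.p_heads = p_heads
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start an episode and return a placeholder first observation."""
        self.t = 0
        return 0

    def step(self, action):
        """Receive a_t; emit o_{t+1}, r_{t+1}, and an end-of-episode flag."""
        coin = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        return coin, reward, self.t >= self.horizon


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)                    # agent executes a_t
        observation, reward, done = env.step(action)    # env emits o_{t+1}, r_{t+1}
        total_reward += reward
    return total_reward


# A fixed policy that always guesses heads (1), ignoring the observation.
total = run_episode(CoinFlipEnv(), lambda observation: 1)
print(total)
```

The policy here ignores its observation; a learning agent would instead use the observations and rewards it receives to change how it picks actions.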