Performant deep reinforcement learning: latency, hazards, and pipeline stalls in the GPU era… and how to avoid them

Mark Hammond, Co-founder / CEO
GTC 2017 presentation: on-demand.gputechconf.com/gtc/2017/presentation/s7359... · 2017-05-11
Transcript
Latency (n): The time elapsed (typically in clock cycles) between a stimulus and the response to it
Hazard (n): A problem in a CPU's instruction pipeline that prevents the next instruction from executing in the following clock cycle
Without forwarding:
1. CPU feeds registers and addition instruction to the ALU
2. ALU performs operation and stores to temporary register
3. CPU directs memory controller to write result to memory
4. CPU retrieves memory at indicated location and feeds contents + register4 and subtraction instruction to ALU
5. ALU performs operation and stores to register3

With forwarding:
1. CPU feeds registers and addition instruction to the ALU
2. ALU performs operation and stores to temporary register
3. Concurrently:
   a. CPU directs memory controller to write result to memory
   b. CPU forwards temporary register, register4, and subtraction instruction to ALU
4. ALU performs operation and stores to register3

(Michael Abrash, Zen of Code Optimization)
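The saving from forwarding can be sketched with a toy cycle-count model (an illustration only; real pipelines and step costs differ). Each list element costs one cycle, and steps grouped in a tuple execute concurrently in a single cycle:

```python
def cycles(steps):
    """Count cycles for a sequence where each element is either one step
    (one cycle) or a tuple of steps executed concurrently (one cycle)."""
    return len(steps)

# The two sequences above, as data. Step names are just labels.
without_forwarding = [
    "feed registers + add instruction to ALU",
    "ALU add -> temporary register",
    "write temporary register to memory",
    "read memory, feed contents + register4 + sub instruction to ALU",
    "ALU sub -> register3",
]

with_forwarding = [
    "feed registers + add instruction to ALU",
    "ALU add -> temporary register",
    # forwarding lets the memory write and the next ALU feed share a cycle
    ("write temporary register to memory",
     "forward temp + register4 + sub instruction to ALU"),
    "ALU sub -> register3",
]

print(cycles(without_forwarding))  # 5
print(cycles(with_forwarding))     # 4
```

Forwarding hides the round trip through memory, removing one cycle of latency that the dependent subtraction would otherwise stall on.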
From the CPU to the GPU
The Essence of Machine Learning
Traditional programming: the programmer authors f(); users and data supply the inputs x; running the program produces the desired outputs f(x).

Machine learning: the inputs x and the outputs f(x) are observed; the machine learns f().
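A minimal sketch of "machine learned f()": given observed inputs x and observed outputs f(x), fit the function rather than hand-authoring it. The underlying line f(x) = 3x + 1 and the noise level are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)                            # observed inputs
y = 3.0 * x + 1.0 + rng.normal(scale=0.01, size=x.shape)  # observed outputs of f(x) = 3x + 1

# "Learn" f() by fitting a line to the observations
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # close to 3.0 and 1.0, up to the tiny noise
```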
Even with simple one-dimensional functions, you have to worry about things like overfitting.

[Figure: the same data fit three ways: underfit, generalized fit, overfit]
Challenges to Learning the Underlying Function
Real data is multi-dimensional and often entails crafting features
Title: The Triumph of the Nerds: The Rise of Accidental Empires
Release date: April 14, 1996
Genre: Documentary
Synopsis: Three-part documentary that takes an inside look at the history of computers, from their rise in the 1970s to the beginning of the Dot-com boom of the late 1990s.
Writers: Robert X. Cringely (book), Robert X. Cringely (screenplay)
Stars: Robert X. Cringely, Douglas Adams, Sam Albert
Running time: 150 minutes
Reviewers: 1,102
Rating: 8.5/10
What movies will someone enjoy watching?

An engineered feature: topic area(s), derived by applying natural language processing techniques to the synopsis.
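One way such a feature could be engineered is sketched below with a simple keyword lookup over the synopsis. This is an assumption-laden stand-in for real NLP (a production system would use something like topic modeling); the `TOPIC_KEYWORDS` table is entirely made up:

```python
# Hypothetical keyword-to-topic table; a real feature pipeline would learn
# or curate this, e.g. via topic models over many synopses.
TOPIC_KEYWORDS = {
    "computing": {"computers", "software", "dot-com", "internet"},
    "history": {"history", "rise", "1970s", "documentary"},
}

def topic_areas(synopsis: str) -> set:
    """Return the set of topic areas whose keywords appear in the synopsis."""
    words = {w.strip(".,").lower() for w in synopsis.split()}
    return {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}

synopsis = ("Three-part documentary that takes an inside look at the history "
            "of computers, from their rise in the 1970s to the beginning of "
            "the Dot-com boom of the late 1990s.")
print(sorted(topic_areas(synopsis)))  # ['computing', 'history']
```

The derived topic set then becomes one column of the multi-dimensional feature vector alongside genre, rating, running time, and so on.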
ANN image from Wikimedia Commons - Mcstrother
For a good overview of neural network types see:
http://www.asimovinstitute.org/neural-network-zoo/
http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/
Deep learning for large-scale flexibility
How do we make this performant?
1) Make sure the data pipeline can keep the GPUs populated and processing
2) Optimize the efficiency of the neural network architecture by exploiting structural aspects of the data and problem
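Point (1) can be sketched as a producer/consumer prefetch pipeline: a background thread keeps a bounded queue of ready batches so the GPU-side consumer never stalls waiting on data. Batch loading here is simulated with a placeholder; a real pipeline would decode and augment on the CPU and pin memory for asynchronous host-to-device copies:

```python
import queue
import threading

BATCHES = 8
prefetch = queue.Queue(maxsize=4)  # bounded: the producer runs ahead, but not unboundedly

def producer():
    for i in range(BATCHES):
        batch = [i] * 32     # stand-in for loading/augmenting one batch
        prefetch.put(batch)  # blocks only when the queue is full
    prefetch.put(None)       # sentinel: no more data

threading.Thread(target=producer, daemon=True).start()

consumed = 0
while (batch := prefetch.get()) is not None:
    consumed += 1            # stand-in for one GPU training step
print(consumed)  # 8
```

The bounded queue is the key design choice: it decouples data loading from compute so transient loading latency is absorbed by the buffer instead of stalling the GPU.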
Continuous Deep Q-Learning with Model-based Acceleration – Gu et al. – https://arxiv.org/abs/1603.00748
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates – Gu et al. – https://arxiv.org/abs/1610.00633
• Parallelizes NAF using a parameter server approach
• Off-policy
• Does not require the careful balancing of resources required by GA3C
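The parameter-server pattern referenced above can be sketched as follows: several asynchronous workers push gradient-like updates to a shared parameter store. This is a schematic toy, not the paper's algorithm; the update value and worker counts are arbitrary:

```python
import threading

lock = threading.Lock()
params = [0.0]  # shared "parameter server" state

def worker(n_updates):
    for _ in range(n_updates):
        with lock:            # the server applies each update atomically
            params[0] += 0.1  # stand-in for applying one gradient step

# Four asynchronous workers, each contributing 100 updates
threads = [threading.Thread(target=worker, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(round(params[0], 1))  # 40.0
```

Because updates are applied whenever a worker finishes one, no worker waits on the others; this is what makes the approach off-policy-friendly and frees it from the careful actor/learner resource balancing that GA3C requires.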