Optimizing Memory Management for Optimistic Simulation with Reinforcement Learning Alessandro Pellegrini Sapienza, University of Rome HPCS 2016
Context
• Simulation is a powerful technique to explore complex scenarios
• Parallel Discrete Event Simulation (PDES) has been applied to a large set of research fields
• Speculative Simulation (Time Warp-based) has proven effective at delivering high-performance simulations
• Ensuring consistency of speculative simulation requires effort
• Transparency towards the application-model developer is critical
Organization of a Time Warp Kernel
[Figure: several machines connected by a communication network. Each machine has multiple CPUs and runs simulation kernels, and each kernel hosts a set of LPs.]
The Synchronization Problem
[Figure: event timelines of LPi, LPj and LPk over execution time. A straggler message with timestamp 12 triggers a rollback that recovers the state at LVT 10. The resulting anti-message with timestamp 17, once received, triggers a second rollback that recovers the state at LVT 7.]
Memory Management and Rollbacks
• How can a runtime environment restore a state?
• It has to know the complete memory map of each LP
• It should take “sometimes” a snapshot of that map
• The snapshot could be either full or incremental
• Memory management is fundamental to Time Warp systems
  ◦ Too many snapshots: memory/latency inefficiency
  ◦ Too few: rollbacks are long!
  ◦ Full vs incremental: how to decide?
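The trade-off between snapshot frequency and rollback length can be illustrated with a minimal sketch. This is not the platform's implementation: the `LogicalProcess` class, its dictionary state, and the `checkpoint_interval` parameter are hypothetical names chosen for illustration, and only full snapshots are taken.

```python
import copy


class LogicalProcess:
    """Toy LP (hypothetical): its whole state is a dict with one counter."""

    def __init__(self, checkpoint_interval):
        self.state = {"counter": 0}
        self.lvt = 0  # local virtual time
        self.checkpoint_interval = checkpoint_interval
        self.processed = []  # (timestamp, increment), in timestamp order
        # Each snapshot: (lvt, number of events processed so far, state copy)
        self.snapshots = [(0, 0, copy.deepcopy(self.state))]

    def _apply(self, timestamp, increment):
        self.state["counter"] += increment
        self.lvt = timestamp

    def process(self, timestamp, increment):
        """Execute one event; take a full snapshot every checkpoint_interval events."""
        self._apply(timestamp, increment)
        self.processed.append((timestamp, increment))
        if len(self.processed) % self.checkpoint_interval == 0:
            self.snapshots.append(
                (self.lvt, len(self.processed), copy.deepcopy(self.state))
            )

    def rollback(self, straggler_ts):
        """Recover the state preceding straggler_ts.

        Returns how many events had to be replayed after restoring the snapshot.
        """
        # Discard snapshots taken at or after the straggler timestamp.
        while len(self.snapshots) > 1 and self.snapshots[-1][0] >= straggler_ts:
            self.snapshots.pop()
        lvt, n_events, saved = self.snapshots[-1]
        self.state = copy.deepcopy(saved)
        self.lvt = lvt
        # Keep only events that precede the straggler, then replay the ones
        # that were executed after the restored snapshot was taken.
        keep = [e for e in self.processed if e[0] < straggler_ts]
        replay = keep[n_events:]
        for ts, inc in replay:
            self._apply(ts, inc)
        self.processed = keep
        return len(replay)


lp = LogicalProcess(checkpoint_interval=2)
for ts in (5, 7, 10, 15, 20):
    lp.process(ts, 1)
replayed = lp.rollback(12)  # straggler with timestamp 12
```

With a checkpoint interval of 1 no event needs to be replayed, at the cost of one snapshot per event; with a very large interval the rollback replays every event that precedes the straggler.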
Our memory management organization
• Transparency
  ◦ Interception of memory-related operations (no platform APIs)
  ◦ No application-level procedure for (incremental) log/restore tasks
• Optimism-Aware Runtime Supports
  ◦ Recoverability of generic memory operations: allocation, deallocation, and updating
• Incrementality
  ◦ Cope with memory “abuse” of speculative rollback-based synchronization schemes
  ◦ Enhance memory locality
Our memory management organization
• Lightweight software instrumentation
  ◦ Optimized memory-write access tracing and logging
  ◦ Arbitrary-granularity memory-write tracing
  ◦ Concentration of most of the instrumentation tasks at a pre-running stage: no costly runtime dynamic disassembling
• Standard API wrappers
  ◦ Code can call standard malloc services
  ◦ Memory map transparently managed by the simulation platform
Our memory management organization
[Figure: each malloc_area holds a block status bitmap, a dirty bitmap, and a sequence of chunks.]
• Memory (for each LP) is pre-allocated
• Requests are served on a chunk basis
• Explicit avoidance of per-chunk metadata
  ◦ Block status bitmap: tracks used chunks
  ◦ Dirty bitmap: tracks updated chunks since last log
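The two-bitmap layout can be sketched as follows. This is a simplified illustration, not the platform's allocator: the `MallocArea` class and its methods are hypothetical, and `write()` sets the dirty bit directly, whereas in the approach described here that role is played by the instrumented memory-update instructions.

```python
class MallocArea:
    """Pre-allocated area of fixed-size chunks with no per-chunk metadata."""

    def __init__(self, chunk_size, num_chunks):
        self.chunk_size = chunk_size
        self.num_chunks = num_chunks
        self.memory = bytearray(chunk_size * num_chunks)
        self.use_bitmap = 0    # bit i set: chunk i is allocated
        self.dirty_bitmap = 0  # bit i set: chunk i updated since last log

    def _slice(self, chunk):
        start = chunk * self.chunk_size
        return slice(start, start + self.chunk_size)

    def alloc(self):
        """Serve a request with the first free chunk and mark it dirty."""
        for i in range(self.num_chunks):
            if not (self.use_bitmap >> i) & 1:
                self.use_bitmap |= 1 << i
                self.dirty_bitmap |= 1 << i
                return i
        raise MemoryError("malloc_area is full")

    def free(self, chunk):
        self.use_bitmap &= ~(1 << chunk)

    def write(self, chunk, offset, data):
        assert offset + len(data) <= self.chunk_size
        start = chunk * self.chunk_size + offset
        self.memory[start:start + len(data)] = data
        self.dirty_bitmap |= 1 << chunk  # stands in for the write tracker

    def read(self, chunk, length):
        start = chunk * self.chunk_size
        return bytes(self.memory[start:start + length])

    def _log(self, bitmap):
        chunks = {
            i: bytes(self.memory[self._slice(i)])
            for i in range(self.num_chunks)
            if (bitmap >> i) & 1
        }
        self.dirty_bitmap = 0  # a new logging period starts
        return (self.use_bitmap, chunks)

    def full_log(self):
        """Copy every allocated chunk."""
        return self._log(self.use_bitmap)

    def incremental_log(self):
        """Copy only allocated chunks updated since the last log."""
        return self._log(self.use_bitmap & self.dirty_bitmap)

    def restore(self, logs):
        """Apply a full log followed by later incremental logs, in order."""
        for use_bitmap, chunks in logs:
            for i, data in chunks.items():
                self.memory[self._slice(i)] = data
            self.use_bitmap = use_bitmap
        self.dirty_bitmap = 0


area = MallocArea(chunk_size=8, num_chunks=4)
a, b = area.alloc(), area.alloc()
area.write(a, 0, b"AAAA")
area.write(b, 0, b"BBBB")
full = area.full_log()
area.write(b, 0, b"bbbb")
inc = area.incremental_log()   # contains chunk b only
area.write(a, 0, b"XXXX")      # speculative update, later undone
area.restore([full, inc])
```

An incremental log is smaller than a full one when few chunks change between logs, but restoring requires walking the chain back to the last full log.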
Our memory management organization
• We use Hijacker to track memory-update instructions
[Figure: Instrumentation Process, from original executable to final executable. The original memory update (mov $3, x) is paired with a push struct / call track sequence. The indirect branch (jmp *%eax) becomes a regular jump (jmp .Jump) into a new writeable section, which contains push struct, call corrector, and a regular jump (jmp 0xXXXX) modified by branch_corrector.]
• Multi-code packs two different versions of the program
Memory management self-optimization
• To optimize the memory manager we have to determine:
  ◦ When to take a snapshot
  ◦ Its mode (full vs incremental)
• But to take an incremental log, tracing must be active
• Traditional approaches are based on analytic models
  ◦ Periodic recomputation (e.g., checkpoint interval)
  ◦ Non-responsive if dynamics change fast
  ◦ Might not capture secondary effects
• We use reinforcement learning
Reinforcement Learning-based self-optimization
• An agent takes an action in the environment depending on the current state
• An a-posteriori reward tells whether the choice was good
• Previous decisions affect future ones (we learn from history!)
• With some random probability, we ignore history and explore
• We take an action after the execution of each event
• After some knowledge has been acquired, the system can become very responsive
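The loop described above can be sketched with tabular Q-learning and epsilon-greedy exploration. This is an illustrative sketch under assumptions: the action names mirror the ones used in this talk, while the state label `"s"`, the reward values, and the parameters `alpha`, `discount` and `epsilon` are hypothetical.

```python
import random

ACTIONS = ("monitored", "unmonitored", "checkpoint")


class QAgent:
    """Tabular Q-learning agent with epsilon-greedy action selection."""

    def __init__(self, alpha=0.1, discount=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha        # learning rate (assumed value)
        self.discount = discount  # weight of future rewards (assumed value)
        self.epsilon = epsilon    # probability of ignoring history
        self.q = {}               # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def value(self, state, action):
        return self.q.get((state, action), 0.0)

    def choose(self, state):
        """Pick an action for the current state."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)  # explore
        # Exploit: best known action for this state.
        return max(ACTIONS, key=lambda a: self.value(state, a))

    def learn(self, state, action, reward, next_state):
        """Fold the a-posteriori reward into the value estimate."""
        best_next = max(self.value(next_state, a) for a in ACTIONS)
        target = reward + self.discount * best_next
        old = self.value(state, action)
        self.q[(state, action)] = old + self.alpha * (target - old)


agent = QAgent(epsilon=0.0)
# Hypothetical rewards: negative values stand for lost computation time.
agent.learn("s", "checkpoint", -1.0, "s")
agent.learn("s", "monitored", -0.5, "s")
best = agent.choose("s")
```

In a simulation kernel, `choose` would be called after each event and `learn` once the cost of the previous decision is known; a nonzero `epsilon` keeps the agent exploring when the workload changes.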
States and actions
Actions: Monitored, Unmonitored, Checkpoint
The reward
• We want to reduce the time spent in non-necessary tasks
• We define the expected computation loss:
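$$\Gamma = E\left[\int_0^T X(t)\,dt\right]$$
• where:
$$X(t) = \begin{cases}
0 & \text{if } x = \text{Non-Incremental} \\
\dfrac{\delta_M}{\delta_e + \delta_M} & \text{if } x = \text{Incremental} \\
1 - \gamma & \text{if } x = \mathrm{CKPT}_I \\
1 & \text{if } x = \mathrm{CKPT}_F \\
1 & \text{if } x = \text{Rollback}
\end{cases}$$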
Experimental Setup
• 64-bit NUMA machine, 24 cores, 32GB of RAM
• SuSe Enterprise 11, Linux 2.6.32.13
• GSM coverage simulation model
• High fidelity model (fading, power regulation, meteorological conditions)
• Ring highway coverage
• 1000 channels per cell
• Variable call interarrival (simulation of one week of traffic)
Experimental Results
[Figure: Overall Execution Speed. Cumulated committed events vs. wall-clock time (sec) for Q-Learning, Incremental, and Full.]
Experimental Results
[Figure: Total execution time, RL vs. Genetic.]
Experimental Results
[Figure: Checkpoint Interval. Events committed between checkpoints vs. logical processes.]
Thanks for your attention
Questions?
http://www.dis.uniroma1.it/~pellegrini
https://github.com/HPDCS/ROOT-Sim