Armed Bandits, Machine Learning and Fast Java: Practical Advice for Real-Time APIs
Breandan Considine, OneSpot, Inc.
Dec 28, 2015
Why Real-Time?
● The world is full of hard problems
● Types of real time applications
Hard (nuclear reactor control)
Firm (auction bidding)
Soft (train scheduling)
● Real-time is a good thing
● Real world applications
● Performance over scalability
Benefits of Real-Time Processing
● Forces us to narrow our priorities
● Focus on constant, stable solutions rather than time-varying, exact solutions
● Abundance of data but scarce processing power
Lifespan of actionable data extremely short
Tradeoff between optimality and throughput
● Speed and parallelism will come over time
● Upfront investment with long-term benefits
Real-time Interactive Tasks (RITs)
● Online auctions: DSPs, SSPs
● Multivariate testing
● Inventory control, SCM
● Scheduling, navigation, routing
● Recommendation systems
● High frequency trading
● Fraud prevention
Common Thread
● Agent offered a context and set of choices
● Each choice has an unknown payoff distribution
● Choose an option, measure the outcome
● Goal: Maximize cumulative payoff
Many instances
Time sensitive
Nontrivial features
Challenges
● Impractical to test every action in context
Computationally intractable to consider
Cost of full survey outweighs benefit
● Exploration-Exploitation Tradeoff
Opportunity cost for suboptimal choices
Local extrema conceal optimal solutions
● Latency comes at the cost of throughput
Every clock cycle must count
Firm real-time characteristics
Traditional Supervised Learning Cycle
Reinforcement Learning (RL)
Dis/advantages
● Starts from scratch, training is expensive
● Credit assignment problem & reward structure
● Issues with non-stationary systems
● Continuously integrates feedback
● Adapts to real-time decisions
● No assumptions about data
● Follows signal on-line
● Similar to how we learn
Non-blocking Algorithms
● Critical for high performance I/O
● Relatively difficult to implement correctly
● Offers large speedup over lock-based variants
● Types of non-blocking guarantees
Wait-freedom
Lock-freedom
Obstruction-freedom
Lock-Freedom
● Guarantees progress for at least one thread
● Does not guarantee starvation-freedom
● May be slower overall, see Amdahl's law
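As a rough illustration of the lock-freedom guarantee, here is a minimal sketch of a Treiber-style stack. The class and method names (LockFreeStack, push, pop) are hypothetical and do not come from the slides. A thread retries only when its CAS fails, and a CAS fails only because another thread succeeded, so at least one thread always makes progress. Individual threads can still starve.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free stack (Treiber-style); all names are illustrative.
class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead);
            // The CAS fails only if another thread changed head, i.e. made progress.
        } while (!head.compareAndSet(oldHead, newHead));
    }

    // Returns null when the stack is empty.
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) return null;
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value;
    }
}
```

Because each push allocates a fresh node and the garbage collector keeps reachable nodes alive, this sketch avoids node reuse; structures that recycle nodes need extra care.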
Java Memory Model
● happens-before relation
Threaded operations follow a partial order
Ensures JVM does not reorder ops arbitrarily
● Sequential consistency is guaranteed for race-free programs
● Does not prevent threads from having different visibility on operations, unless explicitly declared
The volatile keyword
● Mechanics governed by two simple rules
Each action within a thread happens in program order
volatile writes happen before all subsequent reads on that same field
● Reads from and writes to main memory
● Syntactic shorthand for lock on read, unlock on write – incurs similar performance toll
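A minimal sketch of the visibility rule, assuming a hypothetical StoppableWorker class: because the flag is volatile, the write in stop() happens before any later read in the loop, so the worker is guaranteed to observe it. Without volatile, the JVM would be free to keep reading a stale value.

```java
// Hypothetical worker that polls a volatile stop flag; names are illustrative.
class StoppableWorker implements Runnable {
    // A volatile write happens-before every subsequent read of this field.
    private volatile boolean running = true;
    // Written only by the worker thread; read by others only after join().
    private long iterations = 0;

    @Override
    public void run() {
        while (running) {
            iterations++;
        }
    }

    void stop() { running = false; }

    long iterations() { return iterations; }
}
```

A caller would start the worker on a Thread, call stop() from another thread, and then join(). Note that volatile gives visibility and ordering only; a compound update such as count++ on a shared volatile field is still not atomic.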
Java Concurrency
● ConcurrentHashMap, ConcurrentLinkedQueue
● Need to carefully benchmark
Can be significantly slower depending on implementation
Avoid using default hash map constructor
Faster implementations exist, lock-free
Java 8 improvements in the pipeline
● Prone to atomicity violations
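ConcurrentHashMap<String, Data> map;

// Each map call is thread-safe, but the get-then-put sequence is not atomic.
Data updateAndGet(String key) {
    Data d = map.get(key);
    if (d == null) { // Atomicity violation: another thread may insert here
        d = new Data();
        map.put(key, d); // may overwrite the other thread's Data
    }
    return d;
}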
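One way to close the check-then-act gap above is to let the map perform the whole operation atomically. The sketch below uses ConcurrentHashMap.computeIfAbsent (Java 8+); the DataCache wrapper, the getOrCreate name, and the empty Data placeholder are assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper; Data stands in for the type used on the slide.
class DataCache {
    static final class Data { }

    private final ConcurrentHashMap<String, Data> map = new ConcurrentHashMap<>();

    // The lookup and the insert happen as one atomic step per key,
    // so every caller for a given key sees the same Data instance.
    Data getOrCreate(String key) {
        return map.computeIfAbsent(key, k -> new Data());
    }
}
```

On older Java versions, putIfAbsent followed by a check of its return value achieves a similar effect, at the cost of sometimes constructing a Data that is then discarded.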
Java Atomics
● Guarantees lock-free thread safety
● Uses CAS primitives to ensure atomic execution
● Better performance than locks under low to moderate contention; must be tested in a production setting
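// Illustrates CAS semantics only; real atomics use a hardware instruction, not a lock.
class SimulatedCAS<T> {
    private T current;

    // Returns the value seen before the call; the swap succeeded iff it is `expected`.
    // (AtomicReference.compareAndSet returns a boolean instead.)
    public synchronized T compareAndSet(T expected, T newValue) {
        T previous = current;
        if (previous == expected) // reference equality, as in AtomicReference
            current = newValue;
        return previous;
    }
}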
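In application code the CAS primitive is usually reached through the java.util.concurrent.atomic classes and wrapped in a retry loop. A minimal sketch, assuming a hypothetical MaxTracker that records the largest sample seen across threads:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical running-maximum tracker; names are illustrative.
class MaxTracker {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    void record(long sample) {
        long seen;
        do {
            seen = max.get();
            if (sample <= seen) return; // nothing to update
            // Retry if another thread changed max between get() and the CAS.
        } while (!max.compareAndSet(seen, sample));
    }

    long max() { return max.get(); }
}
```

The loop reads, computes, and attempts the swap; a failed CAS simply means the read was stale, so the thread re-reads and tries again.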
ABA Problem
● Direct equality testing is not sufficient
● A full A-B-A sequence can complete just before the CAS executes, so the comparison succeeds even though the structure has changed
● Solution: generate a unique tag whenever value changes, then CAS against value-tag pair
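The value-tag idea can be sketched with java.util.concurrent.atomic.AtomicStampedReference, which pairs a reference with an int stamp and compares both in one CAS. The StampedCell wrapper and its method names are hypothetical; a monotonically increasing stamp is one possible tagging scheme, not necessarily the one the speaker had in mind.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// Hypothetical cell whose stamp is bumped on every change; names are illustrative.
class StampedCell {
    private final AtomicStampedReference<String> cell;

    StampedCell(String initial) {
        cell = new AtomicStampedReference<>(initial, 0);
    }

    // Unconditionally replaces the value and increments the stamp.
    void set(String newValue) {
        int[] stamp = new int[1];
        String old;
        do {
            old = cell.get(stamp); // reads reference and stamp together
        } while (!cell.compareAndSet(old, newValue, stamp[0], stamp[0] + 1));
    }

    // Succeeds only if both the value and the stamp are unchanged.
    boolean casIfUnchanged(String expected, int expectedStamp, String newValue) {
        return cell.compareAndSet(expected, newValue, expectedStamp, expectedStamp + 1);
    }

    int stamp() { return cell.getStamp(); }

    String value() { return cell.getReference(); }
}
```

A thread that read value "A" at stamp 0 will fail its CAS after an A-B-A sequence, because the stamp has moved on even though the value looks the same.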
False Sharing
● Can be prevented by padding out fields
● Java 8 addresses this problem with @Contended
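A minimal sketch of manual padding, assuming 64-byte cache lines and 8-byte longs. The class names are hypothetical. The padding is spread across a small class hierarchy because superclass fields are laid out before subclass fields, which makes it harder for the JVM to move the padding away from the hot field. Actual layout is JVM-specific, so treat this as an illustration and verify with a layout tool or a benchmark.

```java
// Hypothetical padded counter; assumes 64-byte cache lines (7 longs = 56 bytes of padding).
class LeftPadding {
    long p1, p2, p3, p4, p5, p6, p7; // padding before the hot field
}

class CounterValue extends LeftPadding {
    volatile long value; // the hot field, meant to sit alone on its cache line
}

class PaddedCounter extends CounterValue {
    long q1, q2, q3, q4, q5, q6, q7; // padding after the hot field

    void set(long v) { value = v; } // intended for a single writer thread

    long get() { return value; }
}
```

Two PaddedCounter instances updated by different threads should then not invalidate each other's cache lines, at the cost of roughly 112 extra bytes per counter.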
Multi-Armed Bandit Problems
● N choices, each with hidden payoff distributions
● What strategy maximizes cumulative payoff?
● Observation: Choose randomly from a distribution representing observed probability, return ARGMAX
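The observation above resembles what is commonly called Thompson sampling. Below is a minimal sketch for binary (success/failure) payoffs, assuming a uniform Beta(1, 1) prior per arm; the BernoulliBandit class and its methods are hypothetical. Each arm's posterior is sampled once and the arm with the largest draw is chosen. To stay dependency-free, the Beta draw is built from sums of exponentials, which works only for integer parameters and costs time proportional to the number of observations.

```java
import java.util.Random;

// Hypothetical Beta-Bernoulli bandit; names and prior are illustrative assumptions.
class BernoulliBandit {
    private final int[] successes;
    private final int[] failures;
    private final Random rng;

    BernoulliBandit(int arms, long seed) {
        successes = new int[arms];
        failures = new int[arms];
        rng = new Random(seed);
    }

    // Gamma(k, 1) for integer k >= 1, as a sum of k exponential draws.
    private double sampleGamma(int k) {
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            sum -= Math.log(1.0 - rng.nextDouble()); // 1 - U lies in (0, 1]
        }
        return sum;
    }

    // Beta(a, b) = X / (X + Y) with X ~ Gamma(a), Y ~ Gamma(b).
    private double sampleBeta(int a, int b) {
        double x = sampleGamma(a);
        double y = sampleGamma(b);
        return x / (x + y);
    }

    // Draw one sample per arm from its posterior and return the argmax.
    int chooseArm() {
        int best = 0;
        double bestDraw = -1.0;
        for (int arm = 0; arm < successes.length; arm++) {
            double draw = sampleBeta(successes[arm] + 1, failures[arm] + 1);
            if (draw > bestDraw) {
                bestDraw = draw;
                best = arm;
            }
        }
        return best;
    }

    void update(int arm, boolean reward) {
        if (reward) successes[arm]++; else failures[arm]++;
    }

    int pulls(int arm) { return successes[arm] + failures[arm]; }
}
```

Arms with little data have wide posteriors and are still explored occasionally, while arms with strong evidence of high payoff are chosen most of the time. A production version would use a constant-time Beta sampler from a statistics library.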
Bayesian Bandits
*http://camdp.com/blogs/multi-armed-bandits
Adaptive Control Problems
● Parameter estimation for real time processes
● Uses continuous feedback to adjust output
Pacing Techniques
PID Controller
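A minimal sketch of a textbook discrete PID controller, to show how continuous feedback adjusts an output. The PidController class, its gains, and the update signature are illustrative assumptions, not the speaker's implementation. In a pacing setting, the setpoint might be a target spend rate and the measurement the observed rate.

```java
// Hypothetical discrete PID controller; names and gains are illustrative.
class PidController {
    private final double kp;
    private final double ki;
    private final double kd;
    private double integral = 0.0;
    private double previousError = 0.0;
    private boolean firstUpdate = true;

    PidController(double kp, double ki, double kd) {
        this.kp = kp;
        this.ki = ki;
        this.kd = kd;
    }

    // Returns the control output for one time step of length dt (dt > 0).
    double update(double setpoint, double measured, double dt) {
        double error = setpoint - measured;
        integral += error * dt;
        // No derivative term on the first call, since there is no previous error yet.
        double derivative = firstUpdate ? 0.0 : (error - previousError) / dt;
        firstUpdate = false;
        previousError = error;
        return kp * error + ki * integral + kd * derivative;
    }
}
```

The proportional term reacts to the current error, the integral term removes steady-state offset, and the derivative term damps rapid changes. Real deployments typically add output clamping and integral anti-windup.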
Counting/Filtering Problems
● Large domain of inputs (IPs, emails, strings)
● Need to maintain online, streaming aggregates
● See Hadoop libraries for good implementations
● Observation: Fast hashing is key.
Bloom Filters
● Fast probabilistic membership testing
● Guarantees no false negatives, low space overhead
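A minimal Bloom filter sketch, assuming string keys. The SimpleBloomFilter class is hypothetical, and the hashing scheme (String.hashCode combined with a MurmurHash3-style integer mixer via double hashing) was chosen for brevity rather than quality. An added item always tests positive; an item never added usually tests negative but may occasionally collide.

```java
import java.util.BitSet;

// Hypothetical Bloom filter over strings; sizing and hashing are illustrative.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Integer mixer (MurmurHash3 32-bit finalizer) used to derive a second hash.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Double hashing: the i-th index is (h1 + i * h2) mod size.
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = mix(h1);
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(item, i));
        }
    }

    // false means definitely absent; true means possibly present.
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(item, i))) return false;
        }
        return true;
    }
}
```

The false-positive rate depends on the bit-array size, the number of hash functions, and the number of inserted items, so those parameters should be sized for the expected load.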
Special thanks to
Ian Clarke
Matt Cohen
References
http://mechanical-sympathy.blogspot.ie/
http://camdp.com/blogs/multi-armed-bandits
http://blog.locut.us/2011/09/22/proportionate-ab-testing
http://blog.locut.us/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/
http://www.cl.cam.ac.uk/research/srg/netos/lock-free/
https://github.com/edwardw/high-scale-java-lib
M. Michael, et al. Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms.
P. Tsigas, et al. Wait-Free Queue Algorithms for the Real-Time Java Specification.