Architectural Support for Efficient Large-Scale Automata Processing

Hongyuan Liu*, Mohamed Ibrahim*, Onur Kayiran†, Sreepathi Pai‡, Adwait Jog*
*College of William & Mary  †AMD  ‡University of Rochester

5- Summary
▪ Observation: Repeated configurations and executions on the AP cause inefficiency
▪ Goal: Accelerate large-scale NFA processing on the AP
+ Demonstrate that a large number of NFA states are Cold during execution but are still configured to the AP
+ Predict whether a state is Cold or Hot at compile time using a small profiling input
+ Propose topological-order based NFA partitioning into Predicted Cold and Predicted Hot states
+ Develop SparseAP to handle mispredictions efficiently using our proposed Enable and Jump operations
▪ Results
+ 2.1x speedup (up to 47x)
[Figure: Performance per STE (*10000), legend: Baseline AP/SpAP; Speedup, legend: 0.1%, 1%]

1- Automata Processing
▪ Used widely in different areas
▪ Von Neumann architectures are not efficient at FSM processing
- Irregular memory accesses
- Limited parallelism
▪ Solution: Use the Automata Processor (AP)
- Enables in-memory processing
- Exploits state parallelism of NFAs

2- Challenges & Opportunities
▪ Applications are getting bigger
▪ AP capacity is limited
▪ Challenge: Repeated executions!
[Figure: execution timeline (axis: Time)]
▪ Opportunity: Underutilization of the AP
- Pattern mismatch
- Many unused states are configured to the AP
[Figure: Percentage of States (0% to 100%) that are Hot (Enabled) vs. Cold (Never-enabled), per benchmark: CAV4k, DS, CAV, DS03, DS06, Snort_L, DS09, Snort, HM1500, HM500, HM1000, HM, PEN, TCP, Rg1, EM, ER, Rg05, Fermi, Pro, Brill, LV, Bro217, SPM, RF1, RF2]

3- Potential Benefits & Research Questions
▪ Potential Solution
+ Remove Cold states from the NFAs
+ Configure ONLY the Hot states to the AP
[Figure: execution timeline (axis: Time)]
▪ Oracular knowledge of input. Question#1: How to predict Cold states?
▪ Arbitrary states partitioning. Question#2: How to partition NFAs?
▪ Mispredictions. Question#3: How to handle mispredictions efficiently?
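
The Hot/Cold distinction and the repeated-execution cost described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the poster's implementation: the toy NFA, the names `hot_states` and `batches`, the convention that start states are enabled on every cycle, and the simplified model in which the number of batches is the state count divided by the AP capacity, rounded up.

```python
import math

# Hypothetical toy NFA (not from the poster): two chains matching "ab" and "xyz".
# Each state accepts one symbol, loosely mirroring one STE per state.
SYMBOLS = {"a0": "a", "b1": "b", "x0": "x", "y1": "y", "z2": "z"}
EDGES = {"a0": ["b1"], "b1": [], "x0": ["y1"], "y1": ["z2"], "z2": []}
STARTS = {"a0", "x0"}  # assumption: start states are enabled on every cycle


def hot_states(data):
    """Return the states enabled at least once while consuming `data`.

    This uses the whole input, i.e. it plays the role of the oracle.
    """
    hot = set(STARTS)
    enabled = set(STARTS)
    for ch in data:
        # A state becomes active when it is enabled and matches the symbol.
        active = {s for s in enabled if ch in SYMBOLS[s]}
        # Successors of active states are enabled for the next cycle.
        enabled = STARTS | {t for s in active for t in EDGES[s]}
        hot |= enabled
    return hot


def batches(num_states, capacity):
    """Simplified model: each batch configures at most `capacity` states
    and re-executes the whole input stream."""
    return math.ceil(num_states / capacity)


hot = hot_states("aab")
cold = set(SYMBOLS) - hot
all_batches = batches(len(SYMBOLS), capacity=3)   # configure every state
hot_batches = batches(len(hot), capacity=3)       # configure Hot states only
```

In this toy case the "xyz" chain is never advanced past its start state, so its later states are Cold, and dropping them lowers the batch count under the assumed capacity of 3.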
[Figure annotation: Decrease Batches]

MICRO 2018
We acknowledge the support of the National Science Foundation (NSF) grants (#1657336, #1717532, #1750667).

4- Efficient Automata Processing on AP
[Figure annotations: 47x, 1.8x, 2.1x, 32.1%]

Q1: How to predict Cold states?
▪ Problem: Oracular knowledge of input
▪ Solution: Use a small profiling input to predict the Hot/Cold states
[Figure: input split into Training and Testing portions]

% from Input    % from Training    Accuracy
50%             100%               97%
10%             20%                93%
1%              2%                 90%
0.1%            0.2%               87%

Q2: How to partition NFAs?
▪ Problem: Arbitrary states partitioning
▪ Solution: Partition using Topological Order
+ Correlates with Cold and Hot states
+ Makes transition unidirectional
[Figure: Percentage of States (0% to 100%) that are Shallow, Medium, or Deep, per benchmark: CAV4k, DS, CAV, DS03, DS06, Snort_L, DS09, Snort, HM1500, HM500, HM1000, HM, PEN, TCP, Rg1, EM, ER, Rg05, Fermi, Pro, Brill, LV, Bro217, SPM, RF1, RF2]

Q3: How to handle mispredictions efficiently?
▪ Problem: Input stream execution on the predicted Cold set is too expensive
▪ Solution: SparseAP Execution Mode
+ SpAP = Jump Op + Enable Op
[Figure: Input; Predicted Hot Set and Predicted Cold Set; states u, v, v'; Generated Intermediate Report(s) (v; Cycle x); axis: Time]
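
The three answers above (profiling-based prediction, topological-order partitioning, and Jump plus Enable handling of mispredictions) can be tied together in one small sketch. It is one possible reading of the poster, not the authors' implementation. Assumptions: the toy NFA is acyclic, start states are enabled on every cycle, the cut is placed at the deepest topological layer reached by the profiling input, and an intermediate report is a pair (Cold state, cycle at which to enable it). All function and variable names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy NFA (not from the poster): chains matching "ab" and "xyz".
SYMBOLS = {"a0": "a", "b1": "b", "x0": "x", "y1": "y", "z2": "z"}
EDGES = {"a0": ["b1"], "b1": [], "x0": ["y1"], "y1": ["z2"], "z2": []}
STARTS = {"a0", "x0"}      # assumption: enabled on every cycle
REPORTING = {"b1", "z2"}   # states that report a match when active


def profile_hot(data):
    """Q1: states enabled at least once on the (small) profiling input."""
    hot, enabled = set(STARTS), set(STARTS)
    for ch in data:
        active = {s for s in enabled if ch in SYMBOLS[s]}
        enabled = STARTS | {t for s in active for t in EDGES[s]}
        hot |= enabled
    return hot


def layers():
    """Topological layer of each state (longest distance from a source)."""
    preds = {s: set() for s in EDGES}
    for s, succs in EDGES.items():
        for t in succs:
            preds[t].add(s)
    depth = {}
    for s in TopologicalSorter(preds).static_order():  # predecessors first
        depth[s] = 1 + max((depth[p] for p in preds[s]), default=-1)
    return depth


def partition(profiling_input):
    """Q2: cut at the deepest layer the profiling input reached, so every
    edge that crosses the cut goes from Predicted Hot to Predicted Cold."""
    depth, hot = layers(), profile_hot(profiling_input)
    cut = max(depth[s] for s in hot)
    pred_hot = {s for s in depth if depth[s] <= cut}
    return pred_hot, set(depth) - pred_hot


def run_hot(data, pred_hot):
    """Run only the Predicted Hot set; a transition into the Predicted Cold
    set is recorded as an intermediate report instead of being followed."""
    matches, intermediate = [], []
    enabled = set(STARTS)
    for cycle, ch in enumerate(data):
        active = {s for s in enabled if ch in SYMBOLS[s]}
        matches += [(s, cycle) for s in sorted(active & REPORTING)]
        enabled = set(STARTS)
        for s in active:
            for t in EDGES[s]:
                if t in pred_hot:
                    enabled.add(t)
                else:
                    intermediate.append((t, cycle + 1))  # enable t next cycle
    return matches, intermediate


def run_cold_sparse(data, intermediate):
    """Q3: run the Predicted Cold set only where a report demands it."""
    pending = {}
    for state, cycle in intermediate:
        pending.setdefault(cycle, set()).add(state)
    matches, processed = [], 0
    enabled, cycle = set(), 0
    while enabled or pending:
        if not enabled:
            cycle = min(pending)              # Jump: skip idle input
        if cycle >= len(data):
            break
        enabled |= pending.pop(cycle, set())  # Enable: seed reported states
        active = {s for s in enabled if data[cycle] in SYMBOLS[s]}
        matches += [(s, cycle) for s in sorted(active & REPORTING)]
        enabled = {t for s in active for t in EDGES[s]}
        cycle, processed = cycle + 1, processed + 1
    return matches, processed
```

A possible usage: call `partition` on a short prefix of the input, pass the resulting Predicted Hot set to `run_hot` on the full input, then feed the returned intermediate reports to `run_cold_sparse`. Because the cut follows topological layers, no edge leads from the Predicted Cold set back into the Predicted Hot set, so the second pass never has to re-run the Hot partition.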