Resource-Efficient FPGA Pseudorandom Number Generation

Hüsrev Cılasun*, Ivy Peng†, Maya Gokhale†
†Lawrence Livermore National Laboratory
*University of Minnesota, Twin Cities

Introduction

- Probability distributions play a critical role in diverse application domains.
  - In simulations, they model physical properties of materials, processes, or behaviors.
  - For instance, molecular dynamics codes often use the Maxwell-Boltzmann distribution to model temperature.
- We introduce a resource-efficient hardware pseudo-random number generator (RNG) and two optimizations:
  - Alias table partitioning: separates a target distribution into multiple sub-ranges and enables local optimizations in each sub-range to improve overall resource utilization.
  - Adaptive threshold resolution: adjusts the bit size used to represent threshold values to the precision required by the underlying partition.
- Our main contributions:
  - An analytic study driven by the dual considerations of improving accuracy and optimizing the hardware mapping.
  - Automated HDL generation of both simulation and synthesis scripts.
  - Diverse use cases: emulating a Gaussian delay profile in the FPGA-based LiME memory system emulator [1]; a random number server for HPC applications.

Methodology

- Walker's Alias Method [2] is an efficient algorithm for FPGA hardware implementation. It generates arbitrary discrete distributions from uniformly generated random numbers. For a target distribution E(·), this method generates and uses a table of real threshold values F(·) and a table of alternative index values A(·), where F(·), A(·), and E(·) are of the same length. Each output sample Y is generated as

\[
Y =
\begin{cases}
X, & U \le F(X) \\
A(X), & U > F(X),
\end{cases}
\]

  where U is a real uniform random number and X is a uniform random integer. The output quality is a function of the precision of U, i.e., increasing the bit size or representing U as a floating-point number [3] improves the quality.
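The sampling rule above can be sketched in software. The following Python model builds the tables F and A for a small discrete distribution and draws samples with Y = X if U ≤ F(X), otherwise A(X). It is an illustrative sketch only, not the poster's HDL: the table construction uses the common small/large worklist approach (often attributed to Vose), which is an assumption because the poster does not say which construction it uses, and the names `build_alias_table`, `implied_distribution`, and `sample` are hypothetical.

```python
import random
from collections import Counter


def build_alias_table(probs):
    """Return (F, A): one threshold in [0, 1] and one alias index per entry."""
    n = len(probs)
    total = sum(probs)
    scaled = [p * n / total for p in probs]  # scaled weights average to 1
    F = [1.0] * n          # default: always keep the drawn index
    A = list(range(n))     # default: alias to self
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        F[s], A[s] = scaled[s], l       # column s keeps s with prob F[s], else l
        scaled[l] -= 1.0 - scaled[s]    # l donated the remainder of column s
        (small if scaled[l] < 1.0 else large).append(l)
    # Entries left over (rounding effects) keep F = 1.0 and alias to themselves.
    return F, A


def implied_distribution(F, A):
    """Exact output probabilities produced by sampling from tables F and A."""
    n = len(F)
    out = [0.0] * n
    for i in range(n):
        out[i] += F[i] / n
        out[A[i]] += (1.0 - F[i]) / n
    return out


def sample(F, A, rng):
    """Draw one output Y from the tables using the rule in the text."""
    x = rng.randrange(len(F))   # X: uniform random integer index
    u = rng.random()            # U: uniform real number in [0, 1)
    return x if u <= F[x] else A[x]


target = [0.1, 0.2, 0.3, 0.4]   # example target distribution E (assumed)
F, A = build_alias_table(target)
rng = random.Random(1)
counts = Counter(sample(F, A, rng) for _ in range(100_000))
print([round(p, 6) for p in implied_distribution(F, A)])
print([counts[i] / 100_000 for i in range(len(target))])
```

Here U is a double-precision float; a hardware version would compare fixed-point or floating-point words of limited width, which is where the precision of U and F starts to affect output quality.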
- We target the Planck distribution (Eq. 1), whose PDF is a function of temperature T, and the Maxwell-Boltzmann distribution (Eq. 2), which is parameterized by the factor a.

\[
f(x) = \frac{2hc^{2}}{x^{5}} \exp\!\left(-\frac{hc}{xkT}\right) \tag{1}
\]

\[
f(x) = \sqrt{\frac{2}{\pi}} \, \frac{x^{2} \exp\!\left(-\frac{x^{2}}{2a^{2}}\right)}{a^{3}} \tag{2}
\]

Figure 1: An automated flow of customization and testing. (Diagram labels: Desired Distribution, Range, Resolution, Sample Count, Walker's Algorithm, Alias Table Sampling, Integration with MATLAB, MATLAB/Octave, Python Wrapper, Boilerplate Text, HDL, Tcl, Vivado, Elaboration, Simulation, .csv, Tests.)

PwCLT Architecture

Figure 2: PwCLT-8 architecture [3] for LiME [1] integration. (Blocks shown: URNG-119, ROM with 7-bit address and 38-bit data, FP Comparator, FP Cast.)

Alias Table Partitioning

- We improve the resource utilization of alias tables by separating the target distribution into multiple subranges (Fig. 3 shows an example with four subranges).
  - In each subrange, the standard alias table implementation is used.
  - This separation allows each table to be optimized locally, i.e., alias tables whose target distribution is smoother can be configured with fewer threshold bits per entry in the F(·) table.
  - Consequently, the alias tables can be selected based on their relative probability range and lifted accordingly.
- We propose adaptive threshold resolution to adjust the threshold bit size while maintaining statistical accuracy.
  - The quality of the generated samples is determined by the threshold resolution.
  - When alias table partitioning is employed, partitions with higher variance require a larger bit size, while partitions with lower variance need a smaller bit size.
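The two optimizations described above can be sketched together in a small Python model. It splits a discretized Maxwell-Boltzmann PDF (a = 1, 64 bins on [0, 6]) into four contiguous partitions, builds one alias table per partition, and picks for each partition the smallest threshold bit width whose quantization error stays within a budget. Everything beyond the poster's description is an assumption made for illustration: the error-budget rule (`abs_tol` divided by the partition's probability weight), the unsigned fixed-point rounding, the worklist table construction, and all function names. The poster's actual bit-width selection criterion is not specified and may differ.

```python
import bisect
import math
import random


def build_alias_table(probs):
    """Thresholds F and aliases A for one table (small/large worklist construction)."""
    n, total = len(probs), sum(probs)
    scaled = [p * n / total for p in probs]
    F, A = [1.0] * n, list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        F[s], A[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return F, A


def quantize(F, bits):
    """Round thresholds to unsigned fixed point with `bits` bits (assumed format)."""
    scale = (1 << bits) - 1
    return [round(f * scale) / scale for f in F]


def max_error(probs, F, A):
    """Largest per-entry gap between the table's output distribution and probs."""
    n, total = len(F), sum(probs)
    out = [0.0] * n
    for i in range(n):
        out[i] += F[i] / n
        out[A[i]] += (1.0 - F[i]) / n
    return max(abs(o - p / total) for o, p in zip(out, probs))


def min_threshold_bits(probs, tol, max_bits=32):
    """Smallest bit width whose quantized thresholds keep the error within tol."""
    F, A = build_alias_table(probs)
    for bits in range(1, max_bits + 1):
        if max_error(probs, quantize(F, bits), A) <= tol:
            return bits
    return max_bits


def build_partitioned(probs, k, abs_tol):
    """One (base, bits, F, A) tuple per partition plus the boundary CDF values."""
    n, total = len(probs), sum(probs)
    bounds = [n * j // k for j in range(k + 1)]
    tables, cdf, acc = [], [], 0.0
    for j in range(k):
        part = probs[bounds[j]:bounds[j + 1]]
        weight = sum(part) / total          # assumed nonzero for every partition
        bits = min_threshold_bits(part, abs_tol / weight)
        F, A = build_alias_table(part)
        tables.append((bounds[j], bits, quantize(F, bits), A))
        acc += weight
        cdf.append(acc)
    return tables, cdf


def sample(tables, cdf, rng):
    """Select a partition by comparing a uniform value with the CDF, then sample it."""
    j = min(bisect.bisect_right(cdf, rng.random()), len(cdf) - 1)
    base, _bits, F, A = tables[j]
    x = rng.randrange(len(F))
    return base + (x if rng.random() <= F[x] else A[x])  # offset by partition base


def maxwell_boltzmann_pdf(x, a=1.0):
    return math.sqrt(2.0 / math.pi) * x * x * math.exp(-x * x / (2.0 * a * a)) / a ** 3


probs = [maxwell_boltzmann_pdf((i + 0.5) * 6.0 / 64) for i in range(64)]
tables, cdf = build_partitioned(probs, k=4, abs_tol=1e-4)
for base, bits, F, A in tables:
    print(f"partition starting at bin {base}: {bits} threshold bits")
rng = random.Random(0)
print([sample(tables, cdf, rng) for _ in range(5)])
```

Under this particular budget rule, a partition that carries little probability mass tolerates coarser thresholds, so its table can be stored with fewer bits per entry. This is one plausible way to realize a per-partition bit width, not necessarily the variance-based criterion the poster describes.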
Figure 3: An illustration of the alias table partitioning scheme, which selectively combines sub-distributions by comparing a uniform random variable with the CDF values of each distribution at the partition boundaries. (Diagram elements: URNG, four ROMs with comparators, boundary comparisons against c1 through c4, an encoder, and offsets 0, N/4, N/2, 3N/4.)

Validation and Evaluation

Figure panels:
(a) Maxwell-Boltzmann distribution, a=1. Axes: x versus normalized sample count (×10^-5). Series: MATLAB alias table (floating-point threshold, size=65536); FPGA simulated alias table (fixed-point threshold, size=65536); ideal double-precision.
(b) Planck distribution, T=700K. Axes: x (×10^-5) versus normalized sample count (×10^-5). Series: MATLAB alias table (floating-point threshold, size=65536); FPGA simulated alias table (fixed-point threshold, size=65536); ideal double-precision.
(c) Gaussian latency histogram in LiME.
(d) Memory savings from various partitioning schemes. Axes: output bits (11 to 16) versus memory saving (%). Series: single alias table, 2-partitioned, 4-partitioned, 8-partitioned.

Conclusion

- We introduced a resource-efficient hardware RNG whose accuracy is validated by a χ² test.
- We proposed an alias table partitioning technique for optimizing resource utilization.
- Our RNG is evaluated in three use cases for memory emulations and scientific simulations.

References

[1] A. K. Jain, S. Lloyd, and M. Gokhale. Microscope on memory: MPSoC-enabled computer memory system assessments. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 173–180, 2018.
[2] A. J. Walker. An efficient method for generating discrete random variables with general distributions. ACM Transactions on Mathematical Software (TOMS), 3(3):253–256, 1977.
[3] D. B. Thomas. FPGA Gaussian random number generators with guaranteed statistical accuracy. In 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines, pages 149–156, 2014.

Acknowledgments

This work was supported by LLNL LDRD 19-ERD-004. LLNL-ABS-813772.