Quality Programmable Vector Processors for Approximate Computing

Swagath Venkataramani¹, Vinay Chippa¹, Srimat Chakradhar², Kaushik Roy¹, Anand Raghunathan¹
¹ Integrated Systems Laboratory, School of ECE, Purdue University
² Systems Architecture Department, NEC Laboratories America

Approximate Computing - Motivation

Application domains: Recognition, Synthesis, Mining, Video, Search, Vision

Intrinsic Application Resilience: the ability of applications to produce outputs of acceptable quality even when the underlying computations are executed imprecisely. The notion of correctness is relaxed: good-enough answers!

Sources of Resilience
• 'Noisy' real-world inputs
• Self-healing / iterative algorithms
• Redundant input data
• No golden output
• Perceptual limitations
• Statistical / probabilistic computations

Need programmable platforms for approximate computing!

Summary
• Intrinsic application resilience: a new dimension to optimize HW and SW
• Objective: an energy-efficient and programmable processor for approximate computing
• Quality programmable processors: quality codified as part of the instruction set
• QUORA: quality programmable 1D/2D vector processor
  • Quality programmable ISA and microarchitecture

Acknowledgement
National Science Foundation; NEC Laboratories America
Quality Programmable Processors

[Figure: An abstract model for programmable approximate processors. Application Program + Application Quality Requirement → Program with QP-ISA → HW/SW interface → quality programmable microarchitecture (Inst. Fetch, Decode & Control, Register File, Quality Configurable Execution Unit, Quality Control Logic, Instruction accuracy monitor, Software visible Error Registers)]
• Quality programmability: the ability to specify the desired accuracy requirement to HW
• The notion of quality is explicitly built into the instruction set

Quality Programmable ISA
• Quality fields in instructions, e.g., qpADD dest, op1, op2, MAG, 1% (a quality-programmable add whose error magnitude must be < 1%)
• Purely based on instruction semantics

Quality Programmable Microarchitecture
• Capable of executing instructions with different quality levels
• The micro-architecture guarantees instruction-level quality bounds
• Software-visible error registers give feedback about the actual error, which software uses to determine the quality levels of future instructions

[Figure: % loss in output quality vs. % of approximate instructions, for error constraints: arbitrary, < 50%, < 25%, < 7.5%]
Constraining errors enables 25-100X more approximate instructions compared to allowing arbitrary errors.

QUORA: Quality Programmable 1D/2D Vector Processor

[Figure: Block diagram of the QUORA micro-architecture: a 2D array of APEs, MAPEs and SM elements selected by row and column (SM_row_sel, SM_col_sel, MAPE_row_sel, MAPE_col_sel), a CAPE (instruction memory, instruction decode & control unit, program counter, scalar register file, ALU), data memory, and the Quality Control Unit & Quality Monitors]

Processing Element Hierarchy
[Figure: APE, MAPE, and CAPE tiers compared by PE count, complexity, energy, and approximation scope (annotated > 90% and > 70%)]
• The 3-tiered PE hierarchy enables larger energy benefits from approximate computing

Quality Programmable Instruction Set
• 47 instructions: 9 APE, 22 MAPE, 13 CAPE, 3 SM
• Instructions are extended with 2 quality fields, e.g., qpMAC R_length, R_row_enb, R_col_enb, R_q_type, R_q_amt (R_q_type and R_q_amt are the quality fields)
  • Type of error: one of 3 quality metrics (e.g., error magnitude, |accurate value − approximate value|)
  • Amount of error

Quality Configurable Execution
• Quality is specified at vector instruction outputs
• The precision of input operands is scaled
• Key idea: compensate errors across many scalar operations
  • Track positive and negative errors
  • Modulate the threshold for round-off

[Figure: Precision Scaling (PSc.) unit and array-level view, with row (R.PSc) and column (C.PSc) PSc. units feeding the APE/MAPE array and a gated clock]
• Quality Control Unit: sets precision values
• PSc. units are shared across rows/columns and consume < 1% of energy

Experimental Methodology and Results

Micro-architectural Parameters
• Array dimensions: 16 x 16
• No. of PEs (APEs + MAPEs + CAPE): 289 (256 + 32 + 1)
• No. of SM elements: 32
• Depth of SM elements: 64
• Operating frequency: 250 MHz

Metrics
• Feature size: 45 nm
• Area: 2.6 mm²
• Power: 367.8 mW
• Gate count: 502,042

Component breakdown (pie chart): APEs 51%, MAPEs 28%, CAPE 1%, SMs 19%, PScE 0%, Misc. 1%

Benchmarks (Application: Dataset)
Quality metric: percentage classification accuracy
• Handwritten Digit Recognition (SVM-MNIST): MNIST
• Object Recognition (SVM-NORB): NORB
• Digit Classification (CNN): MNIST
• Eye Detection (GLVQ): NEC labs.
• Optical Character Recognition (k-NN): OCR digits
• Census Data Analysis (ANN): Adult
Quality metric: no. correct in top 25 results
• Document Search (SSI): Subset of Wikipedia
Quality metric: mean distance of clustered points from respective centroids
• Image Segmentation (K-Means-Seg): Berkeley dataset
• Optical Character Clustering (K-Means-OCR): OCR digits

Energy Benefits
[Figure: Normalized energy per benchmark at quality-loss levels No Approx., < 0.5%, ~2.5%, and ~7.5%]
• 1.05-1.7X savings for no quality loss
• 1.18-2.1X savings for < 2.5% quality loss
• > 2.5X savings for < 7.5% quality loss

QP-Instructions
[Figure: Dynamic instruction count and energy, split into QP-APE, QP-MAPE, and Accurate instructions (pie values: 88%, 2%, 10%; 1%, 3%, 96%)]
• 90% of energy is in QP instructions

Precision Scaling Mechanisms
[Figure: APE energy vs. average error (%) for QP-MAC with three mechanisms: Trunc, Up/Down, Err. Comp.]
• Error compensation provides a superior quality-energy (Q-E) trade-off
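The poster states the key idea of quality configurable execution (scale operand precision, track positive and negative errors, and modulate the round-off threshold so that errors compensate across many scalar operations) but not the exact hardware algorithm. The Python sketch below models it for a MAC under assumptions that are ours, not QUORA's: operands are non-negative integers, precision scaling drops the k least-significant bits of each x operand, "Up/Down" is taken to mean round-to-nearest, and error compensation is modeled as greedy error feedback that rounds each operand down or up, whichever keeps the running signed MAC error closer to zero. The names truncate, round_nearest, mac, and POLICIES are hypothetical.

```python
"""Hypothetical sketch of a precision-scaled MAC with three round-off policies.

Assumptions (not taken from the QUORA design): non-negative integer x
operands, precision scaled by dropping the k least-significant bits of x,
and "comp" modeled as greedy error feedback over the whole MAC.
"""
import random

POLICIES = ("trunc", "nearest", "comp")


def truncate(x: int, k: int) -> int:
    """Drop the k LSBs of x (rounds down for non-negative x)."""
    return (x >> k) << k


def round_nearest(x: int, k: int) -> int:
    """Round x to the nearest multiple of 2**k (ties round up)."""
    if k == 0:
        return x
    half = 1 << (k - 1)
    return ((x + half) >> k) << k


def mac(xs, ws, k, policy):
    """Return (approximate, exact) dot product of xs and ws.

    Each x is reduced to a multiple of 2**k before multiplication; ws stay exact.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    exact = sum(x * w for x, w in zip(xs, ws))
    acc = 0
    err = 0  # running signed error (approximate - exact); positive or negative
    for x, w in zip(xs, ws):
        if policy == "trunc":
            q = truncate(x, k)
        elif policy == "nearest":
            q = round_nearest(x, k)
        else:
            lo = truncate(x, k)
            hi = lo + (1 << k)
            # Modulated round-off threshold: round down or up depending on which
            # choice leaves the accumulated error closer to zero, so errors from
            # earlier scalar operations are compensated by later ones.
            q = lo if abs(err + (lo - x) * w) <= abs(err + (hi - x) * w) else hi
        acc += q * w
        err += (q - x) * w
    return acc, exact


if __name__ == "__main__":
    random.seed(0)
    xs = [random.randrange(0, 1 << 12) for _ in range(256)]
    ws = [random.randrange(1, 1 << 8) for _ in range(256)]
    for policy in POLICIES:
        approx, exact = mac(xs, ws, k=6, policy=policy)
        rel = abs(approx - exact) / exact
        print(f"{policy:>8}: relative error {100 * rel:.4f}%")
```

With positive weights, truncation biases every product downward, so its error grows with vector length; the compensated policy keeps the final error within half a quantization step times the largest weight, (2**k) * max(ws) / 2, however long the vector is. This is one plausible reading of why error compensation gives the better quality-energy trade-off, not a reproduction of the poster's measurements.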