Page 1

FPGA-accelerated machine learning inference as a service for particle physics computing

Jennifer Ngadiuba, Maurizio Pierini (CERN) Javier Duarte, Burt Holzman, Ben Kreis, Kevin Pedro, Mia Liu, Nhan Tran, Aris Tsaris (FNAL)

Phil Harris, Dylan Rankin (MIT) Zhenbin Wu (UIC)

ACAT, 11-15 March 2019, Saas Fee, Switzerland

Page 2

ACAT 2019 - FPGA-accelerated machine learning inference, 12.03.2019

The LHC big data problem

The High-Luminosity LHC will pose major challenges:
- instantaneous luminosity: x 5-7
- particles per collision: x 5
- more data: x 15
- more granular detectors, with x 10 readout channels

→ event rates & datasets will increase to unprecedented levels!

[Timeline: LHC today → HL-LHC, starting 2026]

Page 3

The LHC big data problem

[Figure: LHC today → HL-LHC (from 2026), with growth factors for event complexity, processing time, and computing resources (x 5, x 20, x 50 quoted on the slide)]

Page 4

The LHC big data problem

Moore's law still valid… but Dennard scaling is no longer maintained!

Current data processing paradigms will not be sustainable with flat budget!

New technologies needed: machine learning & heterogeneous computing

Page 5

The success of ML in HEP

[Figure: CMS H→bb observation, S/(S+B)-weighted entries vs m(jj) [GeV], 77.2 fb⁻¹ (13 TeV): Data, VH(H→bb), VZ(Z→bb), S+B uncertainty]

e.g., neutrino event reconstruction with GoogLeNet @ NOvA

e.g., Higgs boson observation @ ATLAS, CMS

Page 6

Heterogeneous computing

�6

•Offload the computationally heavy parts from the CPU to an "accelerator" → co-processor system
- CPU+FPGA / CPU+GPU / CPU+ASIC / …
- high parallelization and data throughput
- optimal for ML algorithms

•Increasing popularity of co-processor systems in industry
- exploit trends in new devices optimized for ML to speed up inference

[Logos: Microsoft Brainwave (cloud FPGAs), plus other vendors' cloud ASICs and cloud FPGAs]

Page 7

Solving computing challenges


Compute-intensive physics problems can benefit from co-processor systems, e.g., particle track reconstruction

Option 1

rewrite physics algorithms for new hardware

Languages: OpenCL, OpenMP, TBB, VHDL, …

Hardware: FPGA, GPU

Challenge: difficult to adapt to new and changing hardware

Option 2

recast the physics problem as a machine learning problem

Languages: C++, Python, …

Hardware: FPGA, GPU, ASIC

Challenge: how to map physics ↔ ML

Page 8

Solving computing challenges


Compute-intensive physics problems can benefit from co-processor systems, e.g., particle track reconstruction

THIS TALK! Proof-of-concept: particle physics computing with Brainwave

Option 2

recast physics problem as a machine learning problem

Languages: C++, Python, …

Hardware: FPGA, GPU, ASIC

Challenge: how to map physics ↔ ML

Page 9

Event processing @ LHC


Reduce data rates to manageable levels for offline processing by filtering events through multiple stages:

Absorbs 100s of TB/s; trigger decision to be made in O(μs); latencies require an all-FPGA design

[Embedded slide: Javier Duarte, hls4ml]

[Diagram: CMS Trigger — L1 Trigger (40 MHz in → 100 kHz out) → High-Level Trigger (→ 1 kHz, 1 MB/evt)]

• Level-1 Trigger (hardware)

• 99.75% rejected

• decision in ~4 μs

• High-Level Trigger (software)

• 99% rejected

• decision in ~100s of ms

• After the trigger, 99.9975% of events are gone forever

[Diagram: latency scale 1 ns → 1 μs → 100 ms → 1 s across L1 Trigger (40 MHz → 100 kHz), High-Level Trigger (→ 1 kHz, 1 MB/event), and Offline]

Analysis of the full event runs on commercial computers (30k CPU cores), latency O(100 ms)
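The quoted rejection fractions follow directly from the rates above; a quick arithmetic check (rates taken from the slide):

```python
# Back-of-envelope check of the CMS trigger rejection figures quoted above.
# The input/output rates are taken from the slide; this is arithmetic only.
l1_in, l1_out, hlt_out = 40e6, 100e3, 1e3  # Hz

l1_rejected = 1 - l1_out / l1_in     # fraction dropped by the L1 trigger
hlt_rejected = 1 - hlt_out / l1_out  # fraction dropped by the HLT
total_kept = hlt_out / l1_in         # events surviving both stages

print(f"L1 rejects  {l1_rejected:.2%}")                       # 99.75%
print(f"HLT rejects {hlt_rejected:.2%}")                      # 99.00%
print(f"overall {1 - total_kept:.4%} of events are dropped")  # 99.9975%
```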

Page 10

Event processing @ LHC

[Diagram repeated from the previous slide: CMS trigger chain (L1 Trigger 40 MHz → 100 kHz; High-Level Trigger → 1 kHz, 1 MB/event) and latency scale]

Reduce data rates to manageable levels for offline processing by filtering events through multiple stages.

Absorbs 100s of TB/s; trigger decision to be made in O(μs); latencies require an all-FPGA design.

Improvements at this stage are a little tricky → see dedicated talk

Page 11

Event processing @ LHC

[Diagram repeated from the previous slides: CMS trigger chain (L1 Trigger 40 MHz → 100 kHz; High-Level Trigger → 1 kHz, 1 MB/event) and latency scale]

Reduce data rates to manageable levels for offline processing by filtering events through multiple stages.

Analysis of the full event runs on commercial computers (30k CPU cores), latency O(100 ms)

HLT/offline processing is the right place to explore heterogeneous computing!

Page 12

Co-processors as a service with Brainwave

•On-site co-processors are an interesting solution for the HLT computing farm, where latency is the bottleneck

•For offline, a better solution is using co-processors as a service in the cloud

- not feasible to buy specialized hardware for each T1, T2, T3 computing center

•Project Brainwave provides a fully scalable real-time AI service on the Azure cloud (more than just a single co-processor)

- Multi-FPGA+CPU fabric accelerating both computing and network

- Caveat: currently supports only selected off-the-shelf computer vision networks

Page 13

Proof-of-concept: SONIC


Service for Optimized Network Inference on Co-processors: a framework to exploit heterogeneous resources for on-demand ML inference

How to integrate FPGA co-processor into current multithreaded paradigms?

[Diagram: the experimental software sends the network input over the gRPC protocol to a heterogeneous CPU+FPGA resource and receives the prediction back. Option 1: cloud service (datacenter CPU farm → heterogeneous cloud resource). Option 2: edge service (heterogeneous "edge" resource on site).]

The cloud service has extra latency due to data transfer

→ Option 2: also explore the "edge" or "on-prem" case: run CMS software on an Azure cloud machine to simulate an on-site installation of FPGAs. This provides a test of "HLT-like" performance.
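The integration question above — using a remote co-processor without blocking a multithreaded framework — can be sketched as follows. This is an illustration only, not the actual SONIC/CMSSW code; the function and field names are hypothetical, and the remote gRPC call is replaced by a stand-in.

```python
# Illustrative sketch of the non-blocking request pattern: the framework
# thread launches the remote inference, is freed to run other modules,
# and picks up the result when it is ready. Names are hypothetical.
from concurrent.futures import ThreadPoolExecutor
import time

def remote_inference(jet_image):
    """Stand-in for the gRPC request to the FPGA service."""
    time.sleep(0.002)            # ~2 ms FPGA inference time from the slides
    return {"top_score": 0.87}   # hypothetical network output

executor = ThreadPoolExecutor(max_workers=4)

def acquire(event):
    # launch the request without blocking the framework thread
    event["future"] = executor.submit(remote_inference, event["jet_image"])

def produce(event):
    # runs once the response is back; attach the result to the event
    event["top_score"] = event["future"].result()["top_score"]

evt = {"jet_image": [[0.0] * 224 for _ in range(224)]}
acquire(evt)   # the CPU is free to do other work here
produce(evt)
print(evt["top_score"])   # 0.87
```

The point of the pattern is that the expensive remote call overlaps with other per-event modules, so the CPU is not idle while the FPGA works.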

Page 14

Physics case: top tagging with ResNet50

•Brainwave allows the use of custom weights for fixed architectures

•Train ResNet-50 on 2D jet images to distinguish two types of jets

CASE STUDY: JET SUBSTRUCTURE

Just an illustrative example, lessons are generic! Might not be the best application, but a familiar one

ML in substructure is well-studied

[Figure: top-quark jets versus QCD jets (top quark, other quarks, gluon)]

Page 15

Physics case: top tagging with ResNet50

ResNet-50: 25M parameters, 7B operations

Examples of large networks used in CMS:
•DeepAK8: 500K parameters, 15M operations (CMS-DP-2017-049)
•DeepDoubleB: 40K parameters, 700K operations (CMS-DP-2018-046)

ResNet-50 is made of two components:

•featurizer: several convolutional layers to extract image features → computationally intensive, accelerated on the FPGA

•classifier: a few fully connected layers → inference performed on the CPU
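A schematic of this split, with toy stand-ins rather than the real ResNet-50 weights or the Brainwave interface:

```python
# Toy sketch of the two-stage split: a heavy convolutional "featurizer"
# (here a stand-in for the FPGA-accelerated stage) produces a fixed-length
# feature vector, and a small fully connected "classifier" runs on the CPU.
import math
import random

def featurizer(image):
    # stand-in for the convolutional stack; real ResNet-50 emits 2048 features
    rng = random.Random(0)
    return [rng.random() for _ in range(2048)]

def classifier(features, weights, bias):
    # single fully connected layer + sigmoid, run locally on the CPU
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # score in (0, 1), e.g. P(top-quark jet)

features = featurizer(image=None)       # the image is unused in this toy version
score = classifier(features, [0.001] * 2048, bias=-1.0)
assert 0.0 < score < 1.0
```

Only the featurizer call would cross the network in the real setup; the small classifier stays local, which is why the FPGA time (~2 ms) is only part of the end-to-end latency.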

Page 16

Testing SONIC

Measure the performance of the SONIC package via the total end-to-end latency of an inference request to Brainwave within CMSSW

remote test: from CPU @ Fermilab, Illinois to Azure @ Virginia → ⟨time⟩ = 60 ms (limited by distance and the speed of light)

on-prem test: run CMSSW on an Azure VM → ⟨time⟩ = 10 ms (~2 ms on the FPGA, the rest is classifier and I/O)
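A rough consistency check of these two numbers; the ~1000 km Fermilab-to-Virginia path length is an assumption, not a measured route:

```python
# Rough latency budget for the two tests above. The path length is an
# assumed round number; fiber routes are longer than great-circle distance.
distance_km = 1000.0            # assumed Fermilab (IL) -> Azure (VA) path
c_fiber_km_per_ms = 200.0       # light in fiber travels at roughly 2/3 c
rtt_floor_ms = 2 * distance_km / c_fiber_km_per_ms
print(f"propagation floor: ~{rtt_floor_ms:.0f} ms round trip")   # ~10 ms
# the measured remote mean of 60 ms additionally includes routing,
# serialization and queuing on top of this propagation floor
on_prem_ms, fpga_ms = 10.0, 2.0
print(f"on-prem: {fpga_ms:.0f} ms on the FPGA, "
      f"{on_prem_ms - fpga_ms:.0f} ms in classifier and I/O")
```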

[Plots: latency distributions, shown in log and linear scale]

Page 17

Testing SONIC at scale

Test a large-scale deployment of cloud co-processors in a production environment

Here a "worst-case" scenario: each process only executes the inference on the cloud

•more realistic case: inference running alongside many other modules → reduced probability of simultaneous requests

Only a 1.8% failure rate, and only for the largest number of simultaneous requests

Page 18

Testing SONIC at scale

Test a large-scale deployment of cloud co-processors in a production environment

•Test with each simultaneous process completing serial processing of 5000 jet images

•Populate the pipeline of data streaming into the service → the number of inferences per second (throughput) increases with the number of simultaneous requests

•Plateau at ∼650 inferences/s, limited by the FPGA inference time (∼2 ms)
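A simple serial-service model shows where such a plateau comes from; the model (not the measurement) is the sketch here:

```python
# If a single FPGA served requests strictly back to back, throughput would
# cap at 1 / (inference time); the measured plateau sits somewhat above,
# suggesting some pipelining/overlap in the service.
fpga_ms = 2.0                        # per-inference FPGA time from the slide
serial_cap = 1000.0 / fpga_ms        # 500 inferences/s if fully serialized
measured_plateau = 650.0             # observed plateau from the slide
print(f"serialized cap: {serial_cap:.0f}/s, measured: {measured_plateau:.0f}/s")
```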

Page 19

Comparison with CPU

•Above plots are for a standalone Python benchmark using an i7 @ 3.6 GHz, TensorFlow v1.10
- inference time ∼180-500 ms, i.e. 2-5 images per second

•Also ran a local test with CMSSW on a cluster @ FNAL:
- Xeon @ 2.6 GHz, TensorFlow v1.06
- 1.75 s/inference
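The speedup factors implied by these numbers, matching the x175 (x30) figure quoted in the backup summary:

```python
# Speedups of the Brainwave service over the local CMSSW CPU test,
# computed from the per-inference times quoted on the slides.
cmssw_cpu_s = 1.75                   # local CMSSW CPU test, per inference
on_prem_s, remote_s = 0.010, 0.060   # Brainwave on-prem / remote means
print(f"on-prem speedup: x{cmssw_cpu_s / on_prem_s:.0f}")   # x175
print(f"remote speedup:  x{cmssw_cpu_s / remote_s:.0f}")    # x29 (~x30)
```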

Page 20

Comparison with GPU

[Plot legend: TensorFlow ResNet-50, Brainwave quantized ResNet-50, super-optimized ResNet-50; on-prem FPGA points]

•Tested GPU: NVIDIA GTX 1080 Ti, connected directly to the CPU with PCIe (no gRPC)

•A locally connected GPU gives similar performance to the on-prem/remote FPGA co-processors, but:
- GPUs need a large batch size (how to batch?)
- PCIe vs network


Page 21

Conclusions

•Current HEP computing paradigms are not sustainable for future requirements:

- HL-LHC here as an example, but large-scale neutrino experiments (e.g., DUNE) share similar challenges

•Possible solution: recast physics problems as machine learning problems

- high physics performance, highly parallelizable, and highly supported in industry

•ML algorithms can be accelerated on ML-oriented hardware: GPUs, FPGAs, ASICs

•Presented proof-of-concept for acceleration on cloud FPGAs (Microsoft Brainwave)

- for large computing tasks, there is > x100 benefit over CPU-only computations

- closer clouds and edge solutions also suitable for latency-limited tasks (HLT)

•Work in progress: benchmark other platforms (Google/AWS/IBM) and continue R&D on ML algorithms for computationally intensive physics problems …

Page 22

Backup slides

Page 23

Summary


A factor x175 (x30) speedup for Brainwave on-prem (remote) over current CMSSW CPU performance.

Page 24

ResNet-50 performance

Page 25

Particle physics computing model
