Top Banner
55

MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

May 17, 2020

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 2: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 3: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 4: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

世界最大のインフラストラクチャーhttps://azure.microsoft.com/ja-jp/regions/

Norway East / West 開設発表

China North 2 / East 2 提供開始

West Europe でのAvailability Zones 提供開始

2018.06 updates

Page 5: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

54REGIONS WORLDWIDE

100K+MILES OF FIBER AND SUBSEA CABLE

130+EDGE SITES

200+ExpressRoute Partners

Page 6: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 7: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 8: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

The world’s first globally distributed database service

Page 9: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 10: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

A car is running A man is cutting a piece of meat

A man is performing on a stage

A man is riding a bike

A man is singing A panda is walking A woman is riding a horse A man is flying in a field

Page 11: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 12: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 13: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 14: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

データ通信

変わるシステムアーキテクチャー

• アルゴリズムをもとにした開発からデータとAIによる

学習・モデル化による実装への変革

データ収集・分析を前提としたサービスの増加

• 単一データから多種多様なデータの総合的活用へ

課題は・・

• シングルデータリポジトリの不在

• セキュリティとデータ通信の避けられない連携

Page 15: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

リアルタイムAI

深層学習により得られた大きな進歩

• 画像処理

• 機械翻訳

• 音声認識

• QA

• 他

課題は・・• オンラインサービスにおける運用、展開、そしてスケーラビリティ

Convolutional Neural Networks

ht-1 ht ht+1

xt-1 xt xt+1

ht-1 ht ht+1

yt-1 yt yt+1

Recurrent Neural Networks

Page 16: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 17: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

17

Page 18: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 19: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

Training

Inference

Client Cloud

Humans

ASICs

GPUs

?

Page 20: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

Efficiency(ASICS)

Homogeneity

Page 21: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

DRAMコントローラ

USB コントローラ

イーサネットコントローラ

DSPスライス

RAM

RAM

DSPスライス

CPU

CPU

Page 22: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

Catapult v0

Catapult v1

スケール v1

Catapult v2

2011 2012 2013 2014 2015 2016 ...

Ignite

本番展開

Page 23: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

23

• 1U rack-mounted

• 2 x 10Ge ports

• 3 x16 PCIe slots

• 12 Intel Westmere

cores (2 sockets)

Page 24: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

Stratix V

8GB DDR3

PCIe Gen3 x8

Page 25: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

Server 1FPGA

Server 48

FPGA

Top Of Rack Switch (TOR)

Server 2FPGA

Server 47

FPGA

Server 3FPGA

Server 46

FPGA

Server 4FPGA

Server 45

FPGA

Server 23FPGA

Server 26

FPGA

Server 24FPGA

Server 25

FPGA

… …DTWS DTWS

DTWS DTWS

DTWS DTWS

DTWS DTWS

DTWS DTWS

DTWS DTWS

S0 S0

S0 S0

S0 S0

S1 S1

S2 S2

S2 S2

10G

b E

thern

et

Lin

ks

FPGA Torus

Page 26: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

Complex ALULn, ÷, div

Basic Tile

Basic Tile

Basic Tile

Basic Tile

Registers

Constants

FFE 1 Inst.

FFE nInst.

Compression Thresholds

Local ALU

DSP

DSP

Sched

ulin

g Logic

Distribution latches

Control/Data Tokens

FeatureTransmission

Network

Stream Preprocessing

FSM

FE FFE DTS

FE0

89 Non-BodyBlockFeatures

34 State Machines55 % Utilization

FE1

55 BodyBlockFeatures

20 State Machines45 % Utilization

DTT [3][7]

DTT [3][6]

DTT [3][5]

DTT [3][4]

DTT [3][3]

DTT [3][2]

DTT [3][1]

DTT [3][0]

DTT [3][11]

DTT [3][10]

DTT [3][9]

DTT [3][8]

DTT [2][7]

DTT [2][6]

DTT [2][5]

DTT [2][4]

DTT [2][3]

DTT [2][2]

DTT [2][1]

DTT [2][0]

DTT [2][11]

DTT [2][10]

DTT [2][9]

DTT [2][8]

DTT [1][7]

DTT [1][6]

DTT [1][5]

DTT [1][4]

DTT [1][3]

DTT [1][2]

DTT [1][1]

DTT [1][0]

DTT [1][11]

DTT [1][10]

DTT [1][9]

DTT [1][8]

DTT [0][7]

DTT [0][6]

DTT [0][5]

DTT [0][4]

DTT [0][3]

DTT [0][2]

DTT [0][1]

DTT [0][0]

DTT [0][11]

DTT [0][10]

DTT [0][9]

DTT [0][8]

FFE [1][3]

FFE [1][2]

FFE [1][1]

FFE [1][0]

FFE [0][3]

FFE [0][2]

FFE [0][1]

FFE [0][0]

FFE: 64 cores / chip

256-512 threads

DTT: 48 DTT tiles/chip

240 tree processors

2880 trees/chip

Page 27: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 28: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

WCS Gen 4.1 ブレードと NIC と Catapult FPGA

Catapult v2 Mezzanine カード

[ISCA’14, HotChips’14, MICRO’16]

Page 29: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 30: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

マイクロソフトは、FPGAに対して世界最大のクラウド投資

人工知能の容量を複数のExa-Ops で

FPGA ファブリック上で強力な DNN サービスプラットフォーム

急速に進化する ML への対応

CNN、LSTM、MLP、強化学習、特徴抽出、デシジョンツリーなど

推論に最適化された数値精度

カスタム値、複数化、極小精密ネット

より大きい、より速いモデルのための更なる圧縮

数万TOPでの、低ロットサイズでの効果的な推論スループット

超低遅延による最新の DNN の提供

CPUやGPU と比較して10倍以上

単一の DNN サービスで多くの FPGA に拡張

パフォーマンス

柔軟性

スケール

Page 31: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

f f f

l0

l1

f f f

l0

Pretrained DNN モデルCNTK などで

スケーラブルな DNNハードウェア マイクロサービス

BrainWaveSoft DPU

Instruction

Decoder & Ctrl

Neural FU

ネットワークスイッチ

FPGA

Page 32: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

Web search

ranking

Traditional software (CPU) server plane

QPICPU

QSFP

40Gb/s ToR

FPGA

CPU

40Gb/s

QSFP QSFP

Hardware acceleration plane

Web search

ranking

Deep neural

networks

SDN offload

SQL

CPUs

FPGAs

Routers

Page 33: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

FPGA0 FPGA1

Add500

1000-dim ベクトル

1000-dim ベクトル

分割

500x500マトリックス

MatMul500

500x500マトリックス

MatMul500 MatMul500 MatMul500

500x500マトリックス

Add500Add500

Sigmoid500 Sigmoid500

分割

Add500

500 500

concat

500 500

500x500マトリックス

ターゲット

コンパイラ

FPGA

ターゲット

コンパイラ

CPU-CNTK

フロント

ポータブル IR

ターゲット

コンパイラ

CPU-カフェ

トランスフォーム IRs

グラフスプリッタとオプティマイザ

展開パッケージ

Caffeモデル

FPGA ハードウェアマイクロサービス

CNTKモデル

Tensorflowモデル

Page 34: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

=

O(N2) dataO(N2) compute

入力アクティベーション

出力前のアクティベーション

N ウェイトカーネル

O(N3) dataO(N4K2) compute

=

Page 35: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

FFPGA2xCPU

DRAM で初期化されたモデルパラメータ

Page 36: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

FPGA2xCPU

DRAM で初期化されたモデルパラメータ

Page 37: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

バッチサイズ

ハードウェア

利用

(%)

FPGA

Page 38: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

バッチサイズ

99回目

待ち時

最大許可遅延

バッチサイズ

ハードウェア

利用

(%)

バッチ処理により HW の使用率が向上するが、待ち時間は増加

Page 39: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

バッチサイズ

99回目

の待ち

時間

最大許可遅延

バッチサイズ

ハードウェア

利用

(%)

バッチ処理により HW の使用率が向上するが、待ち時間が増加

Page 40: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

FPGA2xCPU

Page 41: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

2xCPU

観測

Page 42: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

2xCPU

Page 43: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

2xCPU

Page 44: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 45: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

Page 46: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

Page 47: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

Page 48: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

Page 49: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

2CPU

Page 50: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

FPGA MVU カーネル

+

+

×

×+

×

×+

+

×

×+

×

×+

Page 51: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

「Wikipedia」 英→西訳

「戦争と平和」 露→英訳

Microsoft製FPGAボード

Page 52: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

Configurable CloudCPU compute layer

Reconfigurable compute layer

Converged network

Page 53: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock
Page 54: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock

• 1 VM 10Gbps時代(Hostは>100Gbps以上)

Host

CPU

SmartNIC

◼ SDN◼ Security◼ AI/MLS

FPGANICASIC

CPU

Page 55: MPLS JAPAN 2019 - PPT Template...2019/11/19  · ALU DSP DSP ic Distribution latches Control/Data Tokens Feature Transmission Network Stream Preprocessing FSM FE FFE DTS FE0 89 Non-BodyBlock