INSTITUTE OF COMPUTING TECHNOLOGY Micro Benchmarks Wanling Gao ICT, Chinese Academy of Sciences HPCA 2019, Washington D.C., USA

MicroBenchmarks - BenchCouncil · For a layer with d-dimensional input x = (x(1) ... x(d)) · Data input: image datasets (CIFAR, ImageNet) · Software stacks: TensorFlow, Caffe2, PyTorch, Pthread

Jul 31, 2020

Page 1

INSTITUTE OF COMPUTING TECHNOLOGY

Micro Benchmarks

Wanling Gao

ICT, Chinese Academy of Sciences

HPCA 2019, Washington D.C., USA

Page 2

BigDataBench HPCA 2019

A scalable big data and AI benchmark suite

- Treat big data, AI, and Internet service workloads as a pipeline of units of computation that handle (input or intermediate) data
- Target: find the main abstractions of time-consuming units of computation (data motifs)
- Complex workloads are combinations of data motifs
  • Similar to relational algebra
- Data motif-based scalable benchmarking methodology

Wanling Gao, Jianfeng Zhan, Lei Wang, et al. Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads. PACT 2018.

Page 3

BigDataBench Publications

- Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads. PACT'18.
- BigDataBench: a Scalable and Unified Big Data and Artificial Intelligence Benchmark Suite. Technical Report.
- Understanding Big Data Analytics Workloads on Modern Processors. TPDS'16.
- Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT. PACT'16.
- BigDataBench: a Big Data Benchmark Suite from Internet Services. HPCA'14.
- CVR: Efficient Vectorization of SpMV on X86 Processors. CGO'18.
- BOPS, Not FLOPS! A New Metric, Measuring Tool, and Roofline Performance Model For Datacenter Computing. Technical Report.
- Data Motif-based Proxy Benchmarks for Big Data and AI Workloads. IISWC'18.

Page 4

Micro Benchmark Target

- Capture one class of unit of computation in big data and AI
- Be easily ported to a new computer system or architecture at an early stage

Page 5

Outline

- Summary of micro benchmarks
- Micro benchmark characterization
- Conclusion

Page 6

Summary

- 27 micro benchmarks
- Covering 6 workload types
  • Offline analytics, graph analytics
  • Streaming, NoSQL, data warehouse
  • AI
- Covering 8 data motifs
  • Transform, Graph, Set, Sort, Matrix, Logic, Sampling, Basic statistics
- Covering 5 application domains
  • Internet service (social network, search engine, e-commerce)
  • Recognition science
  • Medical science

Page 7

Micro Benchmarks

[Figure: micro benchmarks grouped by workload type: AI, NoSQL, offline analytics, graph analytics, streaming, data warehouse]

Page 8

Sort

- Sort key-value records according to a given order
- Data input
  • Wikipedia entries
- Software stacks
  • Hadoop, Spark, Flink, MPI
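The slide does not say how the distributed stacks implement the sort. A common pattern in MapReduce-style systems is to range-partition records by key during the shuffle and then sort each partition locally, so that concatenating the partitions in order yields a globally sorted result. The single-process sketch below illustrates that pattern only; the function name `range_partition_sort` and the split points are hypothetical, not part of BigDataBench.

```python
from bisect import bisect_right

def range_partition_sort(records, split_points):
    """Sort (key, value) pairs by range-partitioning on key, then sorting each partition.

    split_points must be sorted. Partition i receives keys k with
    split_points[i-1] <= k < split_points[i] (open-ended at both extremes).
    """
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key, value in records:
        # "Shuffle" step: route each record to the partition that owns its key range.
        partitions[bisect_right(split_points, key)].append((key, value))
    # "Reduce" step: each partition is sorted independently (in parallel on a cluster).
    for part in partitions:
        part.sort()
    # Partitions cover increasing key ranges, so concatenation is globally sorted.
    return [kv for part in partitions for kv in part]
```

On a real cluster each partition would live on a different reducer, and the split points would usually be chosen by sampling the keys so that partitions are roughly equal in size.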

Page 9

Grep

- Extract strings that match a pattern from text files and count how many times they occur
- Data input
  • Wikipedia entries
- Software stacks
  • Hadoop, Spark, Flink, MPI
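A minimal sketch of the Grep logic, assuming the match criterion is a regular expression and that "count" means the number of occurrences of each distinct matching string. The function name `grep_count` is illustrative only.

```python
import re
from collections import Counter

def grep_count(lines, pattern):
    """Count occurrences of every substring matching `pattern` across all lines.

    Use non-capturing groups (?:...) in the pattern: with capturing groups,
    re.findall returns the groups instead of the whole match.
    """
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        counts.update(regex.findall(line))
    return counts
```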

Page 10

WordCount

- Count the number of occurrences of each word in a document
- Data input
  • Wikipedia entries
- Software stacks
  • Hadoop, Spark, Flink, MPI
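WordCount is the classic MapReduce example: the map phase emits a (word, 1) pair per token and the reduce phase sums the pairs per word. The sketch below runs both phases in one process. The tokenization rule (lowercase runs of letters and apostrophes) is an assumption; real benchmark implementations may split words differently.

```python
import re
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) for every token; tokens are lowercase letter/apostrophe runs (assumed rule)."""
    for word in re.findall(r"[a-z']+", document.lower()):
        yield word, 1

def reduce_phase(pairs):
    """Sum the emitted counts per word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(pairs)
```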

Page 11

MD5

- A widely used hash function producing a 128-bit hash value
- The input message is split into 512-bit blocks
- Data input
  • Wikipedia entries
- Software stacks
  • Hadoop, Spark, MPI
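To make the 512-bit block structure concrete, the sketch below reproduces MD5's message padding (a single 1 bit, zero bits up to 448 mod 512, then the original length in bits as a 64-bit little-endian integer) and delegates the compression rounds to Python's standard `hashlib`. The helper names are hypothetical.

```python
import hashlib
import struct

BLOCK_BYTES = 64  # 512 bits per MD5 block

def md5_padded_blocks(message: bytes) -> list:
    """Pad a message as MD5 does and split it into 512-bit (64-byte) blocks."""
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF  # length modulo 2^64
    padded = message + b"\x80"  # a single 1 bit followed by zero bits
    # Append zero bytes until the length is 56 mod 64, leaving 8 bytes for the length field.
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", bit_length)  # 64-bit little-endian bit length
    return [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]

def md5_hex(message: bytes) -> str:
    """128-bit MD5 digest as 32 hex characters, computed by hashlib."""
    return hashlib.md5(message).hexdigest()
```

A 55-byte message still fits in one block, while a 56-byte message needs two, because the padding byte and the 8-byte length field no longer fit after it.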

Page 12

Connected Component

- A connected component is a maximal subgraph in which any two vertices are connected to each other by paths
- Easily computed in linear time using either breadth-first search or depth-first search
- Data input
  • Facebook social network
- Software stacks
  • Hadoop, Spark, Flink, GraphLab, MPI
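The linear-time claim holds because breadth-first search visits every vertex and every edge a constant number of times, giving O(V + E) overall. A minimal BFS-based sketch for an undirected graph given as an edge list (the input format and function name are assumptions):

```python
from collections import deque

def connected_components(num_vertices, edges):
    """Label each vertex 0..num_vertices-1 with the index of its connected component."""
    adjacency = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    component = [-1] * num_vertices  # -1 marks "not yet visited"
    label = 0
    for start in range(num_vertices):
        if component[start] != -1:
            continue
        # BFS from an unvisited vertex discovers exactly one new component.
        component[start] = label
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if component[v] == -1:
                    component[v] = label
                    queue.append(v)
        label += 1
    return component
```

Distributed implementations often use iterative label propagation instead (each vertex repeatedly adopts the smallest label among its neighbors), which maps more naturally onto Spark or Pregel-style supersteps than a single BFS.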

Page 13

RandSample

- Select a subset of samples randomly
- A random number generator determines whether each record is selected
- Data input
  • Wikipedia entries
- Software stacks
  • Hadoop, Spark, MPI
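A minimal sketch of the selection rule described above, assuming Bernoulli sampling: each record is kept independently when a uniform random draw falls below the sampling fraction. The `fraction` and `seed` parameters are illustrative assumptions; a fixed seed makes runs reproducible, which helps when comparing benchmark runs.

```python
import random

def rand_sample(records, fraction, seed=None):
    """Keep each record independently with probability `fraction` (Bernoulli sampling)."""
    rng = random.Random(seed)  # seed=None draws a fresh seed from the OS
    # rng.random() is uniform on [0, 1), so fraction=1.0 keeps everything and 0.0 keeps nothing.
    return [record for record in records if rng.random() < fraction]
```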

Page 14

FFT

- Cooley–Tukey algorithm
- Radix-2 decimation-in-time (DIT) FFT
- Data input
  • Two-dimensional matrix
- Software stacks
  • Hadoop, Spark, MPI
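The radix-2 DIT variant recursively splits the input into even- and odd-indexed halves, transforms each half, and combines them with twiddle factors e^(−2πik/N); it requires the length N to be a power of two. Because the data input is a two-dimensional matrix, the sketch also applies the 1-D transform to every row and then to every column. This is a textbook recursive version, not the benchmark's actual code.

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle           # butterfly: upper output
        result[k + n // 2] = even[k] - twiddle  # butterfly: lower output
    return result

def fft2d(matrix):
    """2-D FFT: 1-D FFT over every row, then over every column."""
    rows = [fft(row) for row in matrix]
    cols = [fft(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```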

Page 15

Matrix Multiply

- Compute the product of two matrices
- Data input
  • Two-dimensional matrix
- Software stacks
  • Hadoop, Spark, MPI
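For reference, the product C = AB of an n×m matrix A and an m×p matrix B is C[i][j] = Σ_k A[i][k]·B[k][j]. The plain-Python sketch below computes it with the i-k-j loop order that is often preferred in C implementations because the inner loop walks rows of B and C sequentially; it illustrates the arithmetic, not how the Hadoop, Spark, or MPI versions partition the work.

```python
def mat_mul(a, b):
    """Multiply an n x m matrix `a` by an m x p matrix `b` (lists of row lists)."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions must match")
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a_ik = a[i][k]  # reused across the whole inner loop
            for j in range(p):
                c[i][j] += a_ik * b[k][j]
    return c
```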

Page 16

NoSQL: Read, Write, Scan

- Benchmarks
  • Read: read records randomly
  • Write: write new records
  • Scan: scan records in order
- Data input
  • ProfSearch resumes
    • A semi-structured dataset from a vertical search engine for scientists
- Software stacks
  • HBase, MongoDB
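A toy in-memory model of the three operations, loosely inspired by stores such as HBase that keep rows sorted by row key so that scans return records in key order. The class and method names are hypothetical and do not correspond to the HBase or MongoDB client APIs.

```python
import bisect

class SortedKVStore:
    """Toy store: random reads/writes by key plus ordered range scans."""

    def __init__(self):
        self._keys = []    # kept sorted so scans can walk keys in order
        self._values = {}  # key -> value for O(1) point reads

    def write(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert new key at its sorted position
        self._values[key] = value

    def read(self, key):
        return self._values.get(key)  # None if the key is absent

    def scan(self, start_key, limit):
        """Return up to `limit` (key, value) pairs with key >= start_key, in key order."""
        i = bisect.bisect_left(self._keys, start_key)
        return [(k, self._values[k]) for k in self._keys[i:i + limit]]
```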

Page 17

OrderBy

- Order the data according to a specified column
- Data input
  • E-commerce transactions
- Software stacks
  • Hive, Spark SQL, Impala
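In SQL terms OrderBy corresponds to a query such as `SELECT * FROM orders ORDER BY amount DESC`. The sketch below expresses the same operation over in-memory rows; the `orders` table and its columns (`order_id`, `buyer_id`, `amount`) are hypothetical stand-ins for the e-commerce transaction data.

```python
from operator import itemgetter

def order_by(rows, column, descending=False):
    """Return rows sorted on one column; Python's sort is stable, so ties keep input order."""
    return sorted(rows, key=itemgetter(column), reverse=descending)

# Hypothetical sample rows.
orders = [
    {"order_id": 1, "buyer_id": "b2", "amount": 30.0},
    {"order_id": 2, "buyer_id": "b1", "amount": 75.5},
    {"order_id": 3, "buyer_id": "b2", "amount": 12.0},
]
```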

Page 18

Aggregation

- Gather records and aggregate them into a summary form
- Data input
  • E-commerce transactions
- Software stacks
  • Hive, Spark SQL, Impala
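A typical aggregation query is `SELECT buyer_id, COUNT(*), SUM(amount) FROM orders GROUP BY buyer_id`. The hash-aggregation sketch below computes the same result in one pass; the table, its columns, and the function name are hypothetical.

```python
from collections import defaultdict

def aggregate(rows, group_col, value_col):
    """COUNT(*) and SUM(value_col) per distinct value of group_col (hash aggregation)."""
    acc = defaultdict(lambda: [0, 0])  # group key -> [count, running sum]
    for row in rows:
        entry = acc[row[group_col]]
        entry[0] += 1
        entry[1] += row[value_col]
    return {key: (count, total) for key, (count, total) in acc.items()}

# Hypothetical sample rows.
orders = [
    {"order_id": 1, "buyer_id": "b2", "amount": 30.0},
    {"order_id": 2, "buyer_id": "b1", "amount": 75.5},
    {"order_id": 3, "buyer_id": "b2", "amount": 12.0},
]
```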

Page 19

Project

- Retrieve specified attributes (columns)
- Data input
  • E-commerce transactions
- Software stacks
  • Hive, Spark SQL, Impala
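Projection keeps only the requested columns, as in `SELECT order_id, amount FROM orders`. A minimal sketch over hypothetical in-memory rows:

```python
def project(rows, columns):
    """Keep only the named columns of each row (relational projection without deduplication)."""
    return [{col: row[col] for col in columns} for row in rows]

# Hypothetical sample rows.
orders = [
    {"order_id": 1, "buyer_id": "b2", "amount": 30.0},
    {"order_id": 2, "buyer_id": "b1", "amount": 75.5},
]
```

Like SQL's default `SELECT`, this keeps duplicate rows; relational algebra's projection would remove them.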

Page 20

Filter

- Select the subset of records that match certain criteria
- Data input
  • E-commerce transactions
- Software stacks
  • Hive, Spark SQL, Impala
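Filtering corresponds to a `WHERE` clause, for example `SELECT * FROM orders WHERE amount > 20`. A minimal sketch in which the criterion is passed as a predicate function (the rows and threshold are hypothetical):

```python
def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

# Hypothetical sample rows.
orders = [
    {"order_id": 1, "amount": 30.0},
    {"order_id": 2, "amount": 75.5},
    {"order_id": 3, "amount": 12.0},
]
```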

Page 21

Select

- Select a set of records from one or more tables
- Data input
  • E-commerce transactions
- Software stacks
  • Hive, Spark SQL, Impala
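When a SELECT reads from more than one table it implies a join, as in `SELECT o.order_id, o.buyer_id, i.goods_id FROM orders o JOIN items i ON o.order_id = i.order_id`. The sketch below uses a hash join (build an index on one table, probe it with the other), which is one common strategy in SQL-on-Hadoop engines; the tables, columns, and function name are hypothetical.

```python
from collections import defaultdict

def select_join(left, right, key, columns):
    """Equi-join two row lists on `key` and return only the requested columns."""
    # Build phase: index the right table by join key.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Probe phase: stream the left table and emit one merged row per match.
    result = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            merged = {**left_row, **right_row}
            result.append({col: merged[col] for col in columns})
    return result

# Hypothetical sample tables.
orders = [{"order_id": 1, "buyer_id": "b2"}, {"order_id": 2, "buyer_id": "b1"}]
items = [
    {"order_id": 1, "goods_id": "g7"},
    {"order_id": 1, "goods_id": "g9"},
    {"order_id": 3, "goods_id": "g1"},
]
```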

Page 22

Union

- Combine the results of two or more SELECT statements
- Data input
  • E-commerce transactions
- Software stacks
  • Hive, Spark SQL, Impala
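In SQL, `UNION` removes duplicate rows from the combined result while `UNION ALL` keeps them. A minimal sketch of both behaviours over result sets represented as lists of tuples (the function name and sample data are hypothetical):

```python
def union(*results, distinct=True):
    """Concatenate result sets; with distinct=True drop duplicates like SQL UNION."""
    combined = [row for result in results for row in result]
    if not distinct:
        return combined  # UNION ALL semantics
    seen = set()
    out = []
    for row in combined:
        if row not in seen:  # rows must be hashable, e.g. tuples
            seen.add(row)
            out.append(row)
    return out
```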

Page 23

Convolution

- The general expression
- Data input
  • Image datasets: CIFAR, ImageNet
  • Convolution kernel
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread

Note: g(x,y) is the filtered image, f(x,y) is the original image, and ω is the filter kernel.
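The formula itself did not survive extraction. Assuming the standard form that matches the note, g(x,y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} ω(s,t) f(x−s, y−t) for a (2a+1)×(2b+1) kernel, the sketch below evaluates it directly with zero padding so the output has the same size as the input. The function name and padding policy are assumptions.

```python
def conv2d(image, kernel):
    """g(x, y) = sum_{s=-a..a} sum_{t=-b..b} w(s, t) * f(x - s, y - t), zero padding outside f.

    The kernel is assumed to have odd height and width so it has a center.
    """
    h, w = len(image), len(image[0])
    a, b = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            acc = 0.0
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    xi, yj = x - s, y - t
                    if 0 <= xi < h and 0 <= yj < w:  # pixels outside the image count as 0
                        acc += kernel[s + a][t + b] * image[xi][yj]
            out[x][y] = acc
    return out
```

CNN frameworks such as TensorFlow and PyTorch actually compute cross-correlation, f(x+s, y+t), without flipping the kernel; for a learned kernel the distinction does not change what the layer can represent.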

Page 24

Fully Connected

- Has connections to all neurons in the previous layer
- Matrix multiplication followed by a bias offset
- Data input
  • Image datasets: CIFAR, ImageNet
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread
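In matrix form a fully connected layer computes y = Wx + b, where W holds one row of weights per output neuron. A minimal sketch for a single input vector (the function name and argument layout are assumptions):

```python
def fully_connected(x, weights, bias):
    """y[j] = sum_i weights[j][i] * x[i] + bias[j]; one weight row per output neuron."""
    if len(weights) != len(bias):
        raise ValueError("need one bias per output neuron")
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the input length")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
```

Processing a batch stacks the input vectors into a matrix, so the layer becomes a single matrix multiplication, which is why the slide describes it as matrix multiplication followed by a bias offset.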

Page 25

ReLU

- Abbreviation of rectified linear unit
- Defined as the positive part of its argument: f(x) = max(0, x)
- Data input
  • Image datasets: CIFAR, ImageNet
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread

where x is the input to a neuron

Page 26

Sigmoid

- Sigmoid activation function: σ(x) = 1 / (1 + e^(−x))
- Data input
  • Image datasets: CIFAR, ImageNet
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread

Page 27

Tanh

- Tanh activation function: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
- Data input
  • Image datasets: CIFAR, ImageNet
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread
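The ReLU, Sigmoid, and Tanh benchmarks all apply an element-wise function; framework kernels apply the same formula to every tensor element. The scalar sketch below shows the three functions. The sign branch in `sigmoid` is an implementation choice to avoid overflow in `math.exp`, not something the slides specify.

```python
import math

def relu(x: float) -> float:
    """Rectified linear unit: the positive part of x."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), branching on sign so exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in (0, 1)
    return z / (1.0 + z)

def tanh(x: float) -> float:
    """Hyperbolic tangent (e^x - e^(-x)) / (e^x + e^(-x)) via the library routine."""
    return math.tanh(x)
```

The two saturating functions are related by tanh(x) = 2σ(2x) − 1, so tanh is a rescaled, zero-centered sigmoid.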

Page 28

MaxPooling

- Non-linear down-sampling
- Divides the input image into a set of non-overlapping rectangles
- Outputs the maximum of each sub-rectangle
- Data input
  • Image datasets: CIFAR, ImageNet
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread

Page 29

AvgPooling

- Down-sampling
- Divides the input image into a set of non-overlapping rectangles
- Outputs the average value of each sub-rectangle
- Data input
  • Image datasets: CIFAR, ImageNet
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread
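Max pooling and average pooling differ only in how each non-overlapping window is reduced, so the sketch below shares one windowing routine between them. The window size and the policy of dropping trailing rows or columns that do not fill a full window are assumptions.

```python
def pool2d(image, size, reducer):
    """Apply `reducer` to each non-overlapping size x size window of a 2-D list."""
    out = []
    # Windows start every `size` pixels; incomplete windows at the edges are dropped.
    for r in range(0, len(image) - size + 1, size):
        row_out = []
        for c in range(0, len(image[0]) - size + 1, size):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row_out.append(reducer(window))
        out.append(row_out)
    return out

def max_pool(image, size=2):
    return pool2d(image, size, max)

def avg_pool(image, size=2):
    return pool2d(image, size, lambda window: sum(window) / len(window))
```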

Page 30

Batch Normalization

- A normalization method/layer for neural networks
- For a layer with d-dimensional input x = (x(1) ... x(d)), each dimension is normalized as x̂(k) = (x(k) − E[x(k)]) / √Var[x(k)]
- Data input
  • Image datasets: CIFAR, ImageNet
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread

Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
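The training-time transform from Ioffe and Szegedy computes, per dimension k, the mini-batch mean and (biased) variance, normalizes with a small ε for numerical stability, then applies a learned scale γ(k) and shift β(k). The sketch below follows those steps; the default ε = 1e-5 is an assumption (a common framework default), and inference-time running averages are not modeled.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over a mini-batch of d-dimensional vectors."""
    m, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(m)]
    for k in range(d):
        column = [x[k] for x in batch]
        mean = sum(column) / m
        var = sum((v - mean) ** 2 for v in column) / m  # biased (1/m) mini-batch variance
        for i, v in enumerate(column):
            x_hat = (v - mean) / math.sqrt(var + eps)   # normalize dimension k
            out[i][k] = gamma[k] * x_hat + beta[k]      # learned scale and shift
    return out
```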

Page 31

Cosine Normalization

- Uses cosine similarity instead of the dot product in neural networks
- Data input
  • Image datasets: CIFAR, ImageNet
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread

where net_norm is the normalized pre-activation, w⃗ is the incoming weight vector, x⃗ is the input vector, (·) indicates the dot product, and f is the nonlinear activation function

Luo C, Zhan J, Xue X, Wang L, Ren R, Yang Q. Cosine normalization: Using cosine similarity instead of dot product in neural networks. In International Conference on Artificial Neural Networks 2018 Oct 4 (pp. 382-391).
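Using the symbols defined above, the cosine-normalized pre-activation is net_norm = cos θ = (w⃗ · x⃗) / (|w⃗| |x⃗|) and the neuron outputs f(net_norm), so the pre-activation is bounded in [−1, 1] regardless of the scale of w⃗ or x⃗. A minimal sketch; the ε guard against zero-length vectors and the choice of ReLU as f are assumptions.

```python
import math

def cosine_norm_preactivation(w, x, eps=1e-12):
    """net_norm = (w . x) / (|w| |x|); eps (an added assumption) avoids division by zero."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norms = math.sqrt(sum(wi * wi for wi in w)) * math.sqrt(sum(xi * xi for xi in x))
    return dot / (norms + eps)

def cosine_norm_neuron(w, x, activation=lambda z: max(0.0, z)):
    """Neuron output f(net_norm); ReLU is used here only as an example nonlinearity."""
    return activation(cosine_norm_preactivation(w, x))
```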

Page 32

Dropout

- A regularization technique for reducing overfitting in neural networks
- Data input
  • Image datasets: CIFAR, ImageNet
- Software stacks
  • TensorFlow, Caffe2, PyTorch, Pthread
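During training, dropout zeroes each activation independently with probability p. The sketch below uses the "inverted dropout" convention common in frameworks, scaling surviving activations by 1/(1−p) during training so that inference is a plain identity; the original formulation instead rescales weights at test time. The function signature is hypothetical.

```python
import random

def dropout(activations, p, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p, scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # identity at inference time
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)  # keeps the expected value of each activation unchanged
    return [a * scale if rng.random() >= p else 0.0 for a in activations]
```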

Page 33

Software Stacks

Page 34

Outline

- Summary of micro benchmarks
- Micro benchmark characterization
- Conclusion

Page 35

Experimental Setup

- Three-node cluster

Page 36

Data Configuration

- Data sizes are chosen to fully utilize the memory resources
- Big data micro benchmarks
  • 100 GB text data
  • 2^26-vertex graph data
  • 65536 two-dimensional matrix data
- AI micro benchmarks
  • Input dimension 224×224, 64 channels
  • 100K images from ImageNet

Page 37

System Behaviors

- CPU utilization & I/O wait
  • Hadoop has higher CPU utilization and less I/O wait than Spark
  • AI micro benchmarks have lower I/O wait than big data micro benchmarks
  • Some AI micro benchmarks are CPU-intensive
  • Pthread benchmarks generally have lower CPU utilization and I/O wait

Page 38

I/O Behaviors

- Disk I/O bandwidth & network I/O bandwidth
- The Spark stack has much larger network I/O pressure than the Hadoop stack
  • Spark performs more data shuffles, so it frequently transfers data from one node to another

Page 39

Execution Performance

- The overall running efficiency of the workloads
- Instruction-level parallelism (ILP)
  • Retired instructions per cycle (IPC)
- Memory-level parallelism (MLP)
  • Computed by dividing L1D_PEND_MISS.PENDING by L1D_PEND_MISS.PENDING_CYCLES
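Both metrics are simple ratios of hardware counter totals. L1D_PEND_MISS.PENDING accumulates the number of outstanding L1D misses every cycle, and L1D_PEND_MISS.PENDING_CYCLES counts cycles with at least one miss outstanding, so their ratio is the average number of misses in flight while any miss is pending. The sketch assumes IPC is taken from the Intel events INST_RETIRED.ANY and CPU_CLK_UNHALTED.THREAD; the counter values in the usage line are made up.

```python
def ipc(inst_retired_any, cpu_clk_unhalted_thread):
    """Instructions per cycle: retired instructions / unhalted core cycles."""
    return inst_retired_any / cpu_clk_unhalted_thread

def mlp(l1d_pend_miss_pending, l1d_pend_miss_pending_cycles):
    """Average outstanding L1D misses per cycle in which at least one miss is pending."""
    return l1d_pend_miss_pending / l1d_pend_miss_pending_cycles

# Hypothetical counter totals for one run.
print(ipc(3_000_000, 2_000_000), mlp(4_500, 1_500))
```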

Page 40

Execution Performance

- ILP & MLP
- The micro benchmarks cover a wide range of ILP and MLP behaviors
  • Distinct computation and memory access patterns
- The software stack changes computation and memory access patterns
  • Hadoop FFT vs. Spark FFT

Page 41

Top-Down Method

- The issue point serves as the dividing point

From “A Top-Down Method for Performance Analysis and Counters Architecture”

[Figure: Top-Down breakdown tree. Node labels: “Is the micro-operation retired?”, “Not ready with more uops”, “Only retiring is ‘useful work’”]

Page 42

Pipeline Efficiency

- Top-Down methodology
- Retiring, frontend bound, backend bound, bad speculation
  • Hadoop: notable stalls due to frontend bound and bad speculation
  • Spark: higher backend bound
  • AI micro benchmarks reflect different bottlenecks
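The level-1 Top-Down categories are fractions of total pipeline issue slots. The sketch below uses the formulas from Yasin's Top-Down paper for 4-wide Intel cores; event names and issue width differ on other microarchitectures, so treat both as assumptions to check against the target CPU's documentation. Backend bound is computed as the remainder.

```python
def top_down_level1(counters, issue_width=4):
    """Level-1 Top-Down breakdown (fractions of issue slots) from raw counter totals."""
    slots = issue_width * counters["CPU_CLK_UNHALTED.THREAD"]
    # Slots where the frontend delivered no uop although the backend could accept one.
    frontend_bound = counters["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    # Issued-but-never-retired uops, plus slots lost while recovering from mispredictions.
    bad_speculation = (
        counters["UOPS_ISSUED.ANY"]
        - counters["UOPS_RETIRED.RETIRE_SLOTS"]
        + issue_width * counters["INT_MISC.RECOVERY_CYCLES"]
    ) / slots
    retiring = counters["UOPS_RETIRED.RETIRE_SLOTS"] / slots  # the only "useful work"
    backend_bound = 1.0 - (frontend_bound + bad_speculation + retiring)
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }
```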

Page 43

Frontend Bound

- Frontend latency bound > frontend bandwidth bound
- Latency bound: the frontend delivers no uops, for example because of instruction cache misses or branch resteers
- Bandwidth bound: the frontend delivers fewer uops than the theoretical value

Page 44

Data Motif – Frontend Bound

- Frontend bound breakdown
- Top 3: branch resteers, instruction cache misses, MS switches
  • The first reason (branch resteers) reflects delays in obtaining the correct instructions
  • MS switch (switching to the microcode sequencer): big data and AI systems use many CISC instructions that cannot be decoded by the default decoders

Page 45

Data Motif – Backend Bound

- Memory bound (data movement delays) > core bound
- Memory bound: L1, L2, L3, and external memory bound
- Core bound: lack of hardware resources or port under-utilization

Page 46

Overview

- Looking back at history
- What is a data motif
- Characterization of data motifs
- Impact of data input
- Conclusion

Page 47

Impact of Data Input

- Size
- Pattern
- Type & source

Page 48

Similarity Analysis

- Three data configurations
  • Small, medium, large
- Sixty metrics spanning the system and micro-architecture levels
- Measuring similarity
  • PCA
  • Hierarchical clustering

Page 49

Page 50

Size Impact on I/O Behaviors

- I/O bandwidth
  • Using the I/O bandwidth of the Small data size as the baseline, we normalize the I/O bandwidth of the Medium and Large data sizes

Page 51

Size Impact on Pipeline Behavior

- As data size increases, frontend bound decreases and backend bound increases

Page 52

Impact of Data Input

- Size
- Pattern
- Type & source

Page 53

Impact of Data Pattern

- Dense matrix vs. sparse matrix
  • I/O bandwidth: sparse < dense
  • Frontend stalls: sparse > dense

Page 54

Impact of Data Input

- Size
- Pattern
- Type & source

Page 55

Impact of Data Type and Source

- Unstructured text data & semi-structured sequence data
  • System: 1.12-7.29 differences
  • Architecture: the text format incurs more backend bound

Page 56

Outline

- Summary of micro benchmarks
- Micro benchmark characterization
- Conclusion

Page 57

Conclusion

- Website:
  • http://www.benchcouncil.org/benchmarks.html
  • http://www.benchcouncil/BigDataBench
  • http://prof.ict.ac.cn/BigDataBench
- Micro benchmarks
  • Single data motif implementations
