INSTITUTE OF COMPUTING TECHNOLOGY How to Use BigDataBench 4.0 Jianfeng Zhan, Chen Zheng, and Wanling Gao http://prof.ict.ac.cn ICT, Chinese Academy of Sciences ASPLOS 2018, Williamsburg, VA, USA
BigDataBench ASPLOS2018
General Steps to Use BigDataBench
• Current release
  • Version 4.0, available at http://prof.ict.ac.cn
• General steps to run the benchmarks
  • Prepare the BigDataBench package
  • Prepare the environment of the selected software stack
  • Generate datasets as needed
    • Each benchmark directory contains a genData* or prepare* shell script
  • Run the scripts or commands (see the User Manual!)
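The data-generation step above can be sketched as a small lookup over an unpacked benchmark tree. This is a minimal sketch, assuming the helper scripts follow the genData* / prepare* naming mentioned above; list_datagen_scripts is a hypothetical helper and not part of BigDataBench.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with BigDataBench): list the
# data-generation scripts under a benchmark directory.
# Assumption: the scripts are named genData* or prepare*.
list_datagen_scripts() {
    root="${1:-.}"   # default to the current directory
    find "$root" -type f \( -name 'genData*' -o -name 'prepare*' \) | sort
}

# Usage: list_datagen_scripts /path/to/unpacked/benchmark
```

Listing the scripts first makes it easier to see which workloads need generated input before any run script is started.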
Directory Structure
Root directory
  • MicroBenchmark
  • ComponentBenchmark
  • Data Generator (BDGS)
Workload categories and their software stacks
  • AI: TensorFlow, Caffe2
  • Offline analytics: Hadoop, Spark, Flink, MPI
  • Graph analytics: Hadoop, Spark, Flink, GraphLab, MPI
  • NoSQL: HBase, MongoDB
  • Online service: Xapian
  • Data warehouse: Hive, SparkSQL, Impala
  • Streaming: Spark Streaming, JStorm
BDGS - Text
• Text_datagen
  • Wikipedia generator: 3 trained models
    • lda_wiki1w, wiki_1w5, wiki_noSW_90_Sampling
  • Amazon movie review generator: 2 models
    • amazonMR1, AMR1_noSW_95_Sampling
  • Use "gen_text_data.sh"
    • Wiki example arguments: e.g. lda_wiki1w, e.g. 10, e.g. 100, e.g. 10000
    • Amazon example arguments: e.g. amazonMR1, e.g. 10, e.g. 100, e.g. 10000
BDGS - Graph
• Graph_datagen
  • Kronecker model
    • Weighted graph
    • Un-weighted graph
  • e.g. Kronecker model parameter; Vertex: 2^16
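When the vertex setting is expressed as a power of two, as in the 2^16 example, the resulting vertex count can be checked with shell arithmetic. This is a minimal sketch; kronecker_vertices is a hypothetical helper, and treating the parameter as an exponent k that yields 2^k vertices is an assumption rather than something the slide states.

```shell
#!/bin/sh
# Hypothetical helper (not part of BDGS): number of vertices for an
# exponent k, assuming the generator produces 2^k vertices.
kronecker_vertices() {
    echo $((1 << $1))
}

kronecker_vertices 16   # prints 65536
```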
BDGS - Table
• Table_datagen
  • E-commerce data generation
    • PDGF: uses XML configuration files for data description and distribution
  • Personal resume generation
Micro Benchmark
• Offline analytics & Graph analytics
• Streaming
Offline Analytics - RandSample
• Target: run the RandSample micro benchmark
• General steps:
  • Prepare the Hadoop environment
  • Prepare input data
    • Use the Wikipedia text data generator
  • ./run_RandSample.sh
    • hadoop jar RandSample.jar RandSample <input> <output> <sample_ratio>
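The command line above can be assembled and sanity-checked before it is submitted to the cluster. This is a minimal dry-run sketch: randsample_cmd is a hypothetical helper, and the check that sample_ratio is a fraction in (0, 1] is an assumption about the parameter, not something the slide specifies.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the RandSample command instead of
# running it. Assumption: sample_ratio is a fraction in (0, 1].
randsample_cmd() {
    input="$1"; output="$2"; ratio="$3"
    # POSIX sh only has integer arithmetic, so awk does the range check.
    if ! awk -v r="$ratio" 'BEGIN { exit !(r + 0 > 0 && r + 0 <= 1) }'; then
        echo "sample_ratio must be in (0, 1], got: $ratio" >&2
        return 1
    fi
    echo "hadoop jar RandSample.jar RandSample $input $output $ratio"
}

# Example with illustrative paths:
randsample_cmd /data/wiki /out/sample 0.1
```

Printing the command first is a deliberate choice: it lets the arguments be reviewed without needing a running Hadoop cluster.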
Offline Analytics – FFT example
• Target: run the "FFT" micro benchmark using Hadoop
• General steps:
  • Prepare the Hadoop environment
  • Prepare matrix data
    • cd /BigDataBench_V4.0_Hadoop/MicroBenchmark/OfflineAnalytics/FFT
    • sh genData_FFT.sh
      sh generate-matrix <mat_row> <mat_col> <sparsity>
  • Run FFT:
    • sh run_FFT.sh
      hadoop jar fft.jar org.fft.fft <inputfile> <outputfile1> <outputfile2> <log2_col> <log2_co>: (auto-generated by run_FFT.sh)
Streaming – Grep example
• Target: run the grep benchmark using Spark Streaming
• General steps:
  • Prepare the Spark Streaming environment
  • cd /BigDataBench_V4.0_Streaming/MicroBenchmark/Streaming/Grep
  • ./run-sparkstreaming-grep.sh
Micro Benchmark
• AI
AI – Conv2d example
• Target: run the conv2d micro benchmark using TensorFlow
• General steps:
  • Prepare the TensorFlow environment
  • Prepare image data
  • Configure the image directory in conv2d.py
  • python conv2d.py
Micro Benchmark
• NoSQL
NoSQL – Write example
• Target: run "write" operations using HBase
• General steps:
  • Prepare HBase according to the official guide
    • sh /hbase-0.94.5/bin/hbase shell
    • create 'usertable','f1','f2','f3'
  • Prepare YCSB as the workload generator
    • YCSB is in the directory BasicDatastoreOperations/ycsb-0.1.4
  • Run YCSB commands like this:
    • sh bin/ycsb load hbase -P workloads/workloadc -p threads=<thread-numbers> -p columnfamily=<family> -p recordcount=<recordcount-value> -p hosts=<hostip> -s > load.dat
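The YCSB load command above has four placeholders, so it can help to build it from named settings. This is a minimal sketch that only prints the command; ycsb_load_cmd is a hypothetical helper, and the example values (4 threads, 1000 records, localhost) are illustrative assumptions. The column family f1 is one of the families created in the HBase step.

```shell
#!/bin/sh
# Hypothetical helper: assemble the YCSB load command from its four
# placeholders. It prints the command rather than executing it.
ycsb_load_cmd() {
    threads="$1"; family="$2"; records="$3"; hosts="$4"
    echo "sh bin/ycsb load hbase -P workloads/workloadc" \
         "-p threads=$threads -p columnfamily=$family" \
         "-p recordcount=$records -p hosts=$hosts -s"
}

# Illustrative values only: 4 threads, family f1, 1000 records.
ycsb_load_cmd 4 f1 1000 localhost
```

To run the printed command from the YCSB directory and capture its status output as on the slide, append `> load.dat` when executing it.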
Component Benchmark
• AI
AI – Alexnet Example
• Target: run the "Alexnet" component benchmark using TensorFlow
• General steps:
  • Prepare the TensorFlow environment
  • Run Alexnet:
    • cd /BigDataBench_V4.0_Tensorflow/ComponentBenchmark/AI/Alexnet
    • python alexnet_cifar10.py
    • Choose the CPU or GPU environment
Component Benchmark
• Offline analytics & Graph analytics
• Streaming
Offline Analytics – SIFT example
• Target: run the "SIFT" component benchmark using Hadoop
• General steps:
  • Prepare the Hadoop environment
  • Prepare SIFT data
    • cd /BigDataBench_V4.0_Hadoop/ComponentBenchmark/OfflineAnalytics/SIFT
    • Put the image data under the SIFT directory
    • sh genData_SIFT.sh
      hadoop jar $jarFile/hibImport.jar -h /testimage /out.hib
  • Run SIFT:
    • sh run_SIFT.sh
      hadoop jar sift.jar <out.hib> <outsif>
      <out.hib>: the data generated by genData_SIFT.sh
      <outsif>: the path where the result is saved
Streaming – Kmeans example
• Target: run the kmeans benchmark using Spark Streaming
• General steps:
  • Prepare the Spark Streaming environment
  • cd /BigDataBench_V4.0_Streaming/ComponentBenchmark/Streaming/Kmeans
  • ./run-sparkstreaming-kmeans.sh
Graph Analytics – PageRank
• Target: run the "PageRank" component benchmark using Hadoop
• General steps:
  • Prepare the Hadoop environment
  • Run the data generation script
    • cd /BigDataBench_V4.0_Hadoop/ComponentBenchmark/GraphAnalytics/PageRank
    • sh genData_PageRank.sh
  • Run PageRank:
    • sh run_PageRank.sh
      hadoop jar pegasus.PagerankNaive <inputfile> pr_tempmv pr_output <Internation> <reducers> <1024> <makesym> <new>
Online Service – Xapian
• Target: run searching using Xapian
• General steps:
  • 1) Install Xapian according to the user manual
    • ./build.sh to install the harness (gcc version > 4.8)
    • xapian/build.sh to install Xapian
  • 2) Configuration
    • vim xapian/run_networked.sh
  • 3) Online searching
    • Run xapian/run_networked.sh
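Because the harness build for Xapian requires gcc newer than 4.8, a quick version check before running ./build.sh can save a failed build. This is a minimal sketch; version_gt is a hypothetical helper that compares only the MAJOR.MINOR parts, so it treats 4.8.x as not newer than 4.8, which is an assumption about how the requirement should be read.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when version $1 is strictly
# greater than version $2, comparing MAJOR.MINOR only.
version_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        # Compare major numbers first, then minor (a missing minor is 0).
        if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 > y[1] + 0)
        exit !(x[2] + 0 > y[2] + 0)
    }'
}

# gcc -dumpversion prints the compiler version; fall back to 0 if gcc
# is not installed so the check reports "too old" instead of failing.
gcc_version=$(gcc -dumpversion 2>/dev/null || echo 0)
if version_gt "$gcc_version" 4.8; then
    echo "gcc $gcc_version is new enough for the harness build"
else
    echo "gcc $gcc_version is too old or missing (need > 4.8)"
fi
```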
Component Benchmark
• Data warehouse
Data Warehouse – Select example
• Target: run the "Select" benchmark using Hadoop and Hive
• General steps:
  • Prepare the Hadoop and Hive environment
  • Run the data generation script
    • cd /BigDataBench_V4.0_Hadoop/ComponentBenchmark/Datawarehouse/Select/
    • sh genData_Select.sh
  • Run Select like this:
    • sh run_Select.sh
Conclusion
• Website: http://prof.ict.ac.cn
• Please refer to the user manual for more details!