INSTITUTE OF COMPUTING TECHNOLOGY DCBench: a Data Center Benchmark Suite Zhen Jia (贾禛) http://prof.ict.ac.cn/zhenjia/ Institute of Computing Technology, Chinese Academy of Sciences 2nd BPOE workshop in conjunction with CCF HPC China 2013 October 31, 2013, Guilin
DCBench: a Data Center Benchmark Suite
Zhen Jia (贾禛)
http://prof.ict.ac.cn/zhenjia/
Institute of Computing Technology, Chinese Academy of Sciences
2nd BPOE workshop in conjunction with CCF HPC China 2013
October 31, 2013, Guilin
HPC China 2013, 2nd BPOE
Workload Spectrum
CPU intensive, Memory intensive, I/O intensive
Figure from Intel
Workload Spectrum
Data Centers
Why Benchmarking?
• Sometimes there is a solution.
Why Benchmarking?
• What about the solution when …
Benchmark’s Role in Computer Science
“Benchmarking is the quantitative foundation of computer system and architecture research; benchmarks are used to experimentally determine the benefits of new designs.”
-- C. Bienia, S. Kumar, J. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. PACT 2008
[2] Zhen Jia et al., “Characterizing Data Analysis Workloads in Data Centers,” IISWC 2013 (Best Paper)
Compared Benchmarks
Field:      Scale-out workloads | HPC          | CPU           | Web
Workloads:  CloudSuite v1       | HPCC         | SPEC CPU 2006 | SPEC Web 2005
            Web search          | HPL          | SPEC INT      | TPC-W
            Data serving        | STREAM       | SPEC FP       |
            Web serving         | PTRANS       | PARSEC        |
            Media streaming     | RandomAccess |               |
            Software testing    | DGEMM        |               |
                                | FFT          |               |
                                | Comm         |               |
• Scale-out service workloads share many characteristics with traditional service workloads.
• So we refer to both simply as service workloads.
Breakdown of Executed Instructions
• Analysis workloads have more application-level instructions
• The service workloads have higher percentages of kernel-level instructions
[Figure: percentage of executed instructions (0% to 100%) per workload, split into kernel and application. Group labels: Data analysis, service. Workloads on the x-axis: Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM, avg, Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPECWeb, TPC-W, SPECFP, SPECINT, PARSEC, HPCC-DGEMM, HPCC-FFT, HPCC-HPL, HPCC-PTRANS, HPCC-RandomAccess, HPCC-STREAM]
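The kernel/application split on this slide is a ratio of two instruction counts. A minimal sketch of that arithmetic follows, assuming hypothetical per-workload counts of instructions executed in kernel mode and in application (user) mode. The function name and the sample numbers are illustrative only and are not measurements from the talk.

```python
def instruction_breakdown(kernel_instructions: int, application_instructions: int) -> dict:
    """Return the percentage of executed instructions at each level."""
    total = kernel_instructions + application_instructions
    if total == 0:
        raise ValueError("no instructions counted")
    return {
        "kernel": 100.0 * kernel_instructions / total,
        "application": 100.0 * application_instructions / total,
    }


# Illustrative counts only (not data from the slides).
shares = instruction_breakdown(kernel_instructions=250, application_instructions=750)
print(shares)
```

A workload with a larger "kernel" share spends more of its instruction stream in the operating system, which is the pattern the slide attributes to service workloads.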
Architecture Block Diagram
Figure from Intel
Pipeline Stalls
• The service workloads have more RAT (Register Allocation Table) stalls
• The data analysis workloads have more RS (Reservation Station) and ROB (ReOrder Buffer) full stalls
• Front end stalls!
[Figure panels: Data analysis, Service]
Main reason for pipeline stalls: the memory wall
Figure from: The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms
Reasons for Front End Stalls
• High Icache misses and ITLB misses cause front end stalls
[Figure: L1 Icache misses per K-instruction (y-axis 0 to 100). Group labels: Data analysis, service]
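The cache charts in this deck report misses per K-instruction, that is, misses normalized to every 1,000 executed instructions. A minimal sketch of the normalization follows, assuming raw miss and instruction counts are already available from hardware counters. The function name and sample values are hypothetical and are not figures from the talk.

```python
def misses_per_kilo_instruction(miss_count: int, instruction_count: int) -> float:
    """Normalize a raw miss count to misses per 1,000 executed instructions."""
    if instruction_count <= 0:
        raise ValueError("instruction_count must be positive")
    return 1000.0 * miss_count / instruction_count


# Illustrative values only: 30,000 misses over 1,000,000 instructions.
mpki = misses_per_kilo_instruction(miss_count=30_000, instruction_count=1_000_000)
print(f"{mpki:.1f} misses per K-instruction")
```

Normalizing by instruction count lets workloads of very different lengths be compared on one axis.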
L2 Cache Behaviors
• Data analysis workloads have good L2 cache behaviors
[Figure: L2 cache misses per K-instruction (y-axis 0 to 100). Group labels: Data analysis, service]
LLC behaviors
• Data center workloads
  – Have good LLC behaviors
  – Better than most of the HPC workloads
[Figure: percentage of L2 misses satisfied by L3 (y-axis 0% to 100%)]
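The LLC chart plots the percentage of L2 misses that are satisfied by L3. A minimal sketch of that ratio follows, assuming two hypothetical counter values: the number of L2 misses and the number of those misses that hit in L3. The names and the sample numbers are illustrative and are not taken from the talk.

```python
def l3_satisfied_percentage(l2_misses: int, l3_hits: int) -> float:
    """Percentage of L2 misses that were served by the L3 (last-level) cache."""
    if l2_misses <= 0:
        raise ValueError("l2_misses must be positive")
    if not 0 <= l3_hits <= l2_misses:
        raise ValueError("l3_hits must be between 0 and l2_misses")
    return 100.0 * l3_hits / l2_misses


# Illustrative values only: 300 of 400 L2 misses hit in L3.
pct = l3_satisfied_percentage(l2_misses=400, l3_hits=300)
print(f"{pct:.1f}% of L2 misses satisfied by L3")
```

A higher percentage means fewer requests go on to main memory, which is what the slide means by good LLC behavior.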
Branch Prediction
• Data analysis workloads have pretty good branch behaviors
• Branches of service workloads are hard to predict
[Figure: branch misprediction ratio (y-axis 0.00% to 12.00%). Group labels: Data analysis, service]
Some Observations
• Analysis workloads are different from scale-out service workloads and traditional workloads
• For data analysis workloads, more app-level instructions are executed
• High Icache and ITLB misses
  – Impact: High percentage of front end stalls
  – Cause: Massive scale of software infrastructure, high-level languages, third-party libraries
  – Rethink the design of Icache or ITLB, or simplify the SW stack
• Low-level caches are good for data analysis workloads
  – Pay more attention to area and energy of caches
• The branch predictor is quite effective
More information: http://prof.ict.ac.cn/DCBench/
Backup
Data Center vs. Big Data
[Figure: diagram relating the Data center and Big Data domains. Labels: Big Data Analytic, Scale-out Service, VM Operation, Data Intensive, HPC]
Each Algorithm’s Application Scenarios
Algorithm | Application Scenarios
• Sort
  – Ranking pages according to their importance (PageRank)
  – Sorting pages by ID (Web storage in database)
• Wordcount
  – Calculating TF-IDF base information, such as term frequency
  – Counting user operations to analyze their social behavior (in Wolfram Alpha)
• Grep
  – Log analysis
  – Web information extraction
  – Fuzzy search
• Naïve Bayes
  – Spam recognition (Spam Filtering with Naive Bayes)
  – Bioinformatics (Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy)
• Support Vector Machine
  – Classification (Question Classification)
  – Image Processing (Image annotation)
  – Text Categorization
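The Wordcount scenario above mentions term frequency as base information for TF-IDF. A minimal single-machine sketch of that step follows, assuming simple lowercase tokenization. It is not the distributed implementation used in DCBench, and the function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def word_count(text: str) -> Counter:
    """Count occurrences of each lowercase word in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def term_frequency(counts: Counter) -> dict:
    """Convert raw counts to relative frequencies (count / total words)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


# Illustrative input only.
counts = word_count("big data needs big benchmarks")
tf = term_frequency(counts)
print(counts["big"], tf["big"])
```

In a data center setting the same counting is typically split across many machines and merged, but the per-document arithmetic is the same.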
Each Algorithm’s Application Scenarios (Cont’d)
• K-means