INSTITUTE OF COMPUTING TECHNOLOGY
Benchmarking Datacenter and Big Data Systems
Wanling Gao, Zhen Jia, Lei Wang, Yuqing Zhu, Chunjie Luo, Yingjie Shi, Yongqiang He, Shiming Gong, Xiaona Li, Shujie Zhang, Bizhu Qiu, Lixin Zhang, Jianfeng Zhan
http://prof.ict.ac.cn/ICTBench
Big Data Benchmarking Workshop
Acknowledgements
This work is supported by the Chinese 973 project (Grant No. 2011CB302502), the Hi-Tech Research and Development (863) Program of China (Grant No. 2011AA01A203, No. 2013AA01A213), the NSFC project (Grant No. 60933003, No. 61202075), the BNSF project (Grant No. 4133081), and Huawei funding.
Publications
BigDataBench: a Big Data Benchmark Suite from Web Search Engines. Wanling Gao, et al. The Third Workshop on Architectures and Systems for Big Data (ASBD 2013), in conjunction with ISCA 2013.
Characterizing Data Analysis Workloads in Data Centers. Zhen Jia, et al. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013).
Characterizing OS Behavior of Scale-out Data Center Workloads. Chen Zheng, et al. Seventh Annual Workshop on the Interaction amongst Virtualization, Operating Systems and Computer Architecture (WIVOSCA 2013), in conjunction with ISCA 2013.
Characterization of Real Workloads of Web Search Engines. Huafeng Xi, et al. 2011 IEEE International Symposium on Workload Characterization (IISWC 2011).
The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems. Zhen Jia, et al. Second Workshop on Big Data Benchmarking (WBDB 2012, India) & Lecture Notes in Computer Science (LNCS).
CloudRank-D: Benchmarking and Ranking Cloud Computing Systems for Data Processing Applications. Chunjie Luo, et al. Front. Comput. Sci. (FCS) 2012, 6(4): 347-362.
Content
Background and Motivation
Our ICTBench
Case studies
Question One
The gap between industry and academia keeps growing, in two respects:
• Code
• Data sets
Question Two
Different communities have different benchmark requirements
Architecture communities
• Simulation is very slow
• Need small data and code sets
System communities
• Large-scale deployment is valuable
Users
• "There are three kinds of lies: lies, damned lies, and benchmarks"
• Want real-world applications
Data Centers in the World
Emerson December 2011 http://www.emersonnetworkpower.com/en-US/About/NewsRoom/Pages/2011DataCenterState.aspx
DCBench
DCBench: typical data center workloads
Different from scientific computing, which is FLOPS-centric
Covers applications in important domains
• Search engine, electronic commerce, etc.
Each benchmark = a single application
Purposes
Architecture and (small-to-medium-scale) system research
BigDataBench
Characterizes big data applications
Does not include data-intensive supercomputing
Synthetic data sets varying from 10 GB to PB scale
Each benchmark = a single big application
Purposes
Large-scale system and architecture research
An incremental approach
Release a start-up benchmark suite first
• Workloads in the search engine system
Then cover other important domains
CloudRank
Cloud computing
Elastic resource management
Consolidating different workloads
Cloud benchmarks
Each benchmark = a group of consolidated data center workloads
Three benchmarks: services / data processing / desktop
Purposes
Capacity planning, system evaluation, and research
Users can customize their own benchmarks
Benchmarking Methodology
Decide and rank the main application domains according to a publicly available metric, e.g. page views and daily visitors
Single out the main applications from those domains
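The ranking step above can be sketched in a few lines. The traffic shares below are illustrative placeholders (not Alexa data), and `rank_domains` is a hypothetical helper, not part of any ICTBench tool:

```python
# Sketch: ranking application domains by a publicly available metric
# (here, page-view share). The numbers are made-up placeholders.
page_view_share = {
    "Search Engine": 0.40,
    "Social Network": 0.25,
    "Electronic Commerce": 0.15,
    "Media Streaming": 0.05,
    "Others": 0.15,
}

def rank_domains(shares):
    """Return domain names sorted by the metric, highest first."""
    return sorted(shares, key=shares.get, reverse=True)

print(rank_domains(page_view_share))
```

Once domains are ranked, the most representative applications are picked from the top domains, which is how the suite arrives at search-engine workloads first.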
Top Sites on the Web
More details at http://www.alexa.com/topsites/global;0
[Pie chart: Search Engine 40%, Social Network 25%, Electronic Commerce 15%, Media Streaming 5%, Others 15%]
Front End Stall Reasons
For DC workloads, high instruction cache miss and instruction TLB miss rates make the front end inefficient
[Charts: L1 I-cache misses per K-instruction (0-100); ITLB page walks per K-instruction (0-0.35)]
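The per-K-instruction metrics on these charts are simple normalizations of raw hardware performance counters. A minimal sketch, with made-up counter values (not measured results from the slides):

```python
# Sketch: normalizing raw performance-counter values to the
# events-per-1,000-instructions metrics plotted on the charts.
def per_k_instruction(event_count, instructions):
    """Normalize a raw event count to events per 1,000 retired instructions."""
    return event_count * 1000.0 / instructions

instructions = 5_000_000_000   # retired instructions (placeholder)
l1i_misses = 150_000_000       # L1 I-cache misses (placeholder)
itlb_walks = 1_000_000         # ITLB page walks (placeholder)

print(per_k_instruction(l1i_misses, instructions))  # L1I misses per K-inst
print(per_k_instruction(itlb_walks, instructions))  # ITLB walks per K-inst
```

Normalizing per kilo-instruction makes workloads with very different run lengths directly comparable, which is why the charts use it rather than raw counts.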
MLC Behaviors
DC workloads have more mid-level cache (L2) misses than HPC
Data analysis workloads show better locality (fewer L2 cache misses)
[Chart: L2 cache misses per K-instruction (0-100) for data analysis, service, and HPCC workloads]
LLC Behaviors
The LLC is good enough for DC workloads
Most L2 cache misses can be satisfied by the LLC
[Chart: ratio of L2 cache misses satisfied by the L3 cache (0%-100%), across Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM, avg, Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPECFP, SPECINT, SPECWeb, HPCC-COMM, HPCC-DGEMM, HPCC-FFT, HPCC-HPL, HPCC-PTRANS, HPCC-RandomAccess, and HPCC-STREAM]
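The plotted ratio can be derived from two counters: an L2 miss that does not also miss in the L3 was satisfied by the L3. A minimal sketch with placeholder counter values:

```python
# Sketch: fraction of L2 cache misses served by the last-level (L3) cache,
# computed from L2-miss and L3-miss counters. Values are placeholders.
def l3_satisfaction_ratio(l2_misses, l3_misses):
    """Fraction of L2 misses that hit in (are satisfied by) the L3."""
    return (l2_misses - l3_misses) / l2_misses

print(l3_satisfaction_ratio(l2_misses=40_000_000, l3_misses=2_000_000))
```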
DTLB Behaviors
DC workloads incur more DTLB misses than HPC
Most data analysis workloads have fewer DTLB misses
[Chart: DTLB page walks per K-instruction (0-2.5) for data analysis, service, and HPCC workloads]
Branch Prediction
DC:
Data analysis workloads have fairly good branch behavior
Service workloads' branches are hard to predict
[Chart: branch misprediction ratio (0%-8%) for data analysis, service, and HPCC workloads]
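The misprediction ratio on the chart is mispredicted branches divided by all retired branches. A minimal sketch; the counter values are placeholders chosen only to contrast a hard-to-predict service-like workload with a well-behaved data-analysis-like one:

```python
# Sketch: branch misprediction ratio from two counters. Placeholder values.
def mispredict_ratio(mispredicted, branches):
    """Fraction of retired branch instructions that were mispredicted."""
    return mispredicted / branches

print(mispredict_ratio(7_000_000, 100_000_000))  # service-like (placeholder)
print(mispredict_ratio(1_000_000, 100_000_000))  # data-analysis-like (placeholder)
```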
DC Workloads Characteristics
Data analysis workloads behave differently from service workloads
Instruction execution level: service workloads execute more kernel-level instructions
Cache behaviors: data analysis workloads exhibit better locality
Branch prediction: service workloads are hard to predict
Front-end inefficiency
ITLB misses
L1 I-cache misses
Diverse workloads are needed
Different workloads have different characteristics
No one-size-fits-all solution
Use Case 2: System Evaluation
Using BigDataBench 1.0 Beta
Data scale: 10 GB - 2 TB
Hadoop configuration: 1 master, 14 slave nodes
System Evaluation
There is a threshold for each workload (100 MB ~ 1 TB)
The system is fully loaded when the data volume exceeds the threshold
Sort is an exception
An inflection point (10 GB ~ 1 TB): the data processing rate decreases after this point
Global data access requirements
• I/O and network bottleneck
System performance depends on both the application and the data volume
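Finding that inflection point can be sketched as follows: compute the processing rate (volume / execution time) for each run, and report the volume after which the rate starts to fall. The (volume, seconds) pairs are illustrative, not measured results from the slides:

```python
# Sketch: locating the inflection point where data processing rate
# starts to decrease. Run timings below are placeholders.
def processing_rates(runs):
    """runs: list of (data_volume_gb, exec_seconds) -> (volume, GB/s) pairs."""
    return [(gb, gb / sec) for gb, sec in runs]

def inflection_volume(runs):
    """Return the data volume after which the rate starts to fall, or None."""
    rates = processing_rates(runs)
    for (gb, rate), (_, next_rate) in zip(rates, rates[1:]):
        if next_rate < rate:
            return gb
    return None

runs = [(10, 100), (100, 500), (1000, 4000), (2000, 10000)]  # placeholders
print(inflection_volume(runs))
```

With these placeholder timings the rate peaks at the 1000 GB run, matching the pattern the slide describes for Sort, where I/O and network become the bottleneck beyond the inflection point.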
Use Case 3: Architecture Research
Using BigDataBench 1.0 Beta
Data scale: 10 GB - 2 TB
Hadoop configuration: 1 master, 14 slave nodes
Use Case 3: Architecture Research
Some micro-architectural events tend to stabilize once the data volume grows beyond a certain point
Cache and TLB behaviors show different trends with increasing data volume for different workloads
L1I misses per K-instruction: increase for Sort, decrease for Grep
Search Engine Service Experiments
The same phenomenon is observed: micro-architectural events tend to stabilize once the index size grows beyond a certain point
Big data imposes challenges for architecture research, since large-scale simulation is time-consuming
Index size: 2 GB ~ 8 GB
Segment size: 4.4 GB ~ 17.6 GB
Conclusion
ICTBench: DCBench, BigDataBench, CloudRank
An open-source project on datacenter and big data benchmarking
http://prof.ict.ac.cn/ICTBench