INSTITUTE OF COMPUTING TECHNOLOGY
Benchmarking Datacenter and Big Data Systems
Wanling Gao, Zhen Jia, Lei Wang, Yuqing Zhu, Chunjie Luo, Yingjie Shi, Yongqiang He, Shiming Gong, Xiaona Li, Shujie Zhang, Bizhu Qiu, Lixin Zhang, Jianfeng Zhan
http://prof.ict.ac.cn/ICTBench
Big Data Benchmarking Workshop
Acknowledgements
This work is supported by the Chinese 973 project (Grant No. 2011CB302502), the Hi-Tech Research and Development (863) Program of China (Grant No. 2011AA01A203, No. 2013AA01A213), the NSFC project (Grant No. 60933003, No. 61202075), the BNSF project (Grant No. 4133081), and Huawei funding.
Publications
BigDataBench: a Big Data Benchmark Suite from Web Search Engines. Wanling Gao, et al. The Third Workshop on Architectures and Systems for Big Data (ASBD 2013), in conjunction with ISCA 2013.
Characterizing Data Analysis Workloads in Data Centers. Zhen Jia, et al. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013).
Characterizing OS Behavior of Scale-out Data Center Workloads. Chen Zheng, et al. Seventh Annual Workshop on the Interaction amongst Virtualization, Operating Systems and Computer Architecture (WIVOSCA 2013), in conjunction with ISCA 2013.
Characterization of Real Workloads of Web Search Engines. Huafeng Xi, et al. 2011 IEEE International Symposium on Workload Characterization (IISWC 2011).
The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems. Zhen Jia, et al. Second Workshop on Big Data Benchmarking (WBDB 2012, India) & Lecture Notes in Computer Science (LNCS).
CloudRank-D: Benchmarking and Ranking Cloud Computing Systems for Data Processing Applications. Chunjie Luo, et al. Front. Comput. Sci. (FCS) 2012, 6(4): 347-362.
Content
Background and Motivation
Our ICTBench
Case studies
Question One
The gap between industry and academia keeps growing, in two respects:
• Code
• Data sets
Question Two
Different communities have different benchmark requirements
Architecture communities
• Simulation is very slow
• Need small data and code sets
System communities
• Large-scale deployment is valuable
Users
• "There are three kinds of lies: lies, damned lies, and benchmarks"
• Want real-world applications
Data Centers in the World
Emerson December 2011 http://www.emersonnetworkpower.com/en-US/About/NewsRoom/Pages/2011DataCenterState.aspx
DCBench
DCBench: typical data center workloads
Different from scientific computing, which is FLOPS-centric
Covers applications in important domains
• Search engine, electronic commerce, etc.
Each benchmark = a single application
Purposes
Architecture and (small-to-medium-scale) system research
BigDataBench
Characterizes big data applications
Does not include data-intensive supercomputing
Synthetic data sets varying from 10 GB to PB scale
Each benchmark = a single big application
Purposes
Large-scale system and architecture research
An incremental approach
Release a start-up benchmark suite first
• Workloads in the search engine system
Then cover other important domains
CloudRank
Cloud computing
Elastic resource management
Consolidating different workloads
Cloud benchmarks
Each benchmark = a group of consolidated data center workloads
Three benchmarks: services / data processing / desktop
Purposes
Capacity planning, system evaluation, and research
Users can customize their own benchmarks
Benchmarking Methodology
Decide and rank the main application domains according to a publicly available metric, e.g. page views and daily visitors
Single out the main applications from those domains
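The ranking step above can be sketched in a few lines. The traffic shares below are illustrative placeholders (not Alexa data), and `rank_domains` is a hypothetical helper, not part of any ICTBench tool:

```python
# Sketch: ranking application domains by a publicly available metric
# (here, page-view share). The numbers are made-up placeholders.
page_view_share = {
    "Search Engine": 0.40,
    "Social Network": 0.25,
    "Electronic Commerce": 0.15,
    "Media Streaming": 0.05,
    "Others": 0.15,
}

def rank_domains(shares):
    """Return domain names sorted by the metric, highest first."""
    return sorted(shares, key=shares.get, reverse=True)

print(rank_domains(page_view_share))
```

Once domains are ranked, the most representative applications are picked from the top domains, which is how the suite arrives at search-engine workloads first.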
Top Sites on the Web
More details at http://www.alexa.com/topsites/global;0
[Pie chart: Search Engine 40%, Social Network 25%, Electronic Commerce 15%, Media Streaming 5%, Others 15%]
Front End Stall Reasons
For DC workloads, high instruction cache miss and instruction TLB miss rates make the front end inefficient
[Charts: L1 I-cache misses per K-instruction (0-100); ITLB page walks per K-instruction (0-0.35)]
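The per-K-instruction metrics on these charts are simple normalizations of raw hardware performance counters. A minimal sketch, with made-up counter values (not measured results from the slides):

```python
# Sketch: normalizing raw performance-counter values to the
# events-per-1,000-instructions metrics plotted on the charts.
def per_k_instruction(event_count, instructions):
    """Normalize a raw event count to events per 1,000 retired instructions."""
    return event_count * 1000.0 / instructions

instructions = 5_000_000_000   # retired instructions (placeholder)
l1i_misses = 150_000_000       # L1 I-cache misses (placeholder)
itlb_walks = 1_000_000         # ITLB page walks (placeholder)

print(per_k_instruction(l1i_misses, instructions))  # L1I misses per K-inst
print(per_k_instruction(itlb_walks, instructions))  # ITLB walks per K-inst
```

Normalizing per kilo-instruction makes workloads with very different run lengths directly comparable, which is why the charts use it rather than raw counts.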
MLC Behaviors
DC workloads have more mid-level cache (L2) misses than HPC
Data analysis workloads show better locality (fewer L2 cache misses)
[Chart: L2 cache misses per K-instruction (0-100) for data analysis, service, and HPCC workloads]
LLC Behaviors
The LLC is good enough for DC workloads
Most L2 cache misses can be satisfied by the LLC
[Chart: ratio of L2 cache misses satisfied by the L3 cache (0%-100%), across Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM, avg, Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPECFP, SPECINT, SPECWeb, HPCC-COMM, HPCC-DGEMM, HPCC-FFT, HPCC-HPL, HPCC-PTRANS, HPCC-RandomAccess, and HPCC-STREAM]
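The plotted ratio can be derived from two counters: an L2 miss that does not also miss in the L3 was satisfied by the L3. A minimal sketch with placeholder counter values:

```python
# Sketch: fraction of L2 cache misses served by the last-level (L3) cache,
# computed from L2-miss and L3-miss counters. Values are placeholders.
def l3_satisfaction_ratio(l2_misses, l3_misses):
    """Fraction of L2 misses that hit in (are satisfied by) the L3."""
    return (l2_misses - l3_misses) / l2_misses

print(l3_satisfaction_ratio(l2_misses=40_000_000, l3_misses=2_000_000))
```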
DTLB Behaviors
DC workloads incur more DTLB misses than HPC
Most data analysis workloads have fewer DTLB misses
[Chart: DTLB page walks per K-instruction (0-2.5) for data analysis, service, and HPCC workloads]
Branch Prediction
DC:
Data analysis workloads have fairly good branch behavior
Service workloads' branches are hard to predict
[Chart: branch misprediction ratio (0%-8%) for data analysis, service, and HPCC workloads]
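The misprediction ratio on the chart is mispredicted branches divided by all retired branches. A minimal sketch; the counter values are placeholders chosen only to contrast a hard-to-predict service-like workload with a well-behaved data-analysis-like one:

```python
# Sketch: branch misprediction ratio from two counters. Placeholder values.
def mispredict_ratio(mispredicted, branches):
    """Fraction of retired branch instructions that were mispredicted."""
    return mispredicted / branches

print(mispredict_ratio(7_000_000, 100_000_000))  # service-like (placeholder)
print(mispredict_ratio(1_000_000, 100_000_000))  # data-analysis-like (placeholder)
```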
DC Workloads Characteristics
Data analysis workloads behave differently from service workloads
Instruction execution level: service workloads execute more kernel-level instructions
Cache behaviors: data analysis workloads exhibit better locality
Branch prediction: service workloads are hard to predict
Front-end inefficiency
ITLB misses
L1 I-cache misses
Diverse workloads are needed
Different workloads have different characteristics
No one-size-fits-all solution
Use Case 2: System Evaluation
Using BigDataBench 1.0 Beta
Data scale: 10 GB - 2 TB
Hadoop configuration: 1 master, 14 slave nodes
System Evaluation
There is a threshold for each workload (100 MB ~ 1 TB)
The system is fully loaded when the data volume exceeds the threshold
Sort is an exception
An inflection point (10 GB ~ 1 TB): the data processing rate decreases after this point
Global data access requirements
• I/O and network bottleneck
System performance depends on both the application and the data volume
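Finding that inflection point can be sketched as follows: compute the processing rate (volume / execution time) for each run, and report the volume after which the rate starts to fall. The (volume, seconds) pairs are illustrative, not measured results from the slides:

```python
# Sketch: locating the inflection point where data processing rate
# starts to decrease. Run timings below are placeholders.
def processing_rates(runs):
    """runs: list of (data_volume_gb, exec_seconds) -> (volume, GB/s) pairs."""
    return [(gb, gb / sec) for gb, sec in runs]

def inflection_volume(runs):
    """Return the data volume after which the rate starts to fall, or None."""
    rates = processing_rates(runs)
    for (gb, rate), (_, next_rate) in zip(rates, rates[1:]):
        if next_rate < rate:
            return gb
    return None

runs = [(10, 100), (100, 500), (1000, 4000), (2000, 10000)]  # placeholders
print(inflection_volume(runs))
```

With these placeholder timings the rate peaks at the 1000 GB run, matching the pattern the slide describes for Sort, where I/O and network become the bottleneck beyond the inflection point.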
Use Case 3: Architecture Research
Using BigDataBench 1.0 Beta
Data scale: 10 GB - 2 TB
Hadoop configuration: 1 master, 14 slave nodes
Use Case 3: Architecture Research
Some micro-architectural events tend to stabilize once the data volume grows beyond a certain point
Cache and TLB behaviors show different trends with increasing data volume for different workloads
L1I misses per K-instruction: increase for Sort, decrease for Grep
Search Engine Service Experiments
The same phenomenon is observed: micro-architectural events tend to stabilize once the index size grows beyond a certain point
Big data imposes challenges for architecture research, since large-scale simulation is time-consuming
Index size: 2 GB ~ 8 GB
Segment size: 4.4 GB ~ 17.6 GB
Conclusion
ICTBench: DCBench, BigDataBench, CloudRank
An open-source project on datacenter and big data benchmarking
http://prof.ict.ac.cn/ICTBench