
Performance Annotations for Cloud Computing

Daniele Rogora∗ Steffen Smolka% Antonio Carzaniga∗ Amer Diwan$

Robert Soulé∗∼

presented by

Daniele Rogora

∗Università della Svizzera italiana %Cornell University $Google ∼Barefoot Networks

HotCloud 2017

1 / 27

Data Centers Are Everywhere...

Data centers provide
◮ Services for end users
   ◮ Google, Facebook, Dropbox
◮ Services for companies and universities
   ◮ AWS for SAP, cloud mail services
◮ Raw processing power (IaaS, PaaS)
   ◮ Amazon EC2, Microsoft Azure

Data centers are widespread
◮ Microsoft has more than 100 data centers
   ◮ they account for more than 1M servers
◮ Amazon has more than 30 data centers
   ◮ they account for more than 1.5M servers
◮ Google has 15 data centers scattered around the world
   ◮ in 2013 they accounted for around 900k servers

... But They Are Complex

[Figure: a data center of 15 machines, five attached to each of 3 switches. Each machine stacks several layers: hardware (CPU, HDD, NET), an operating system, a virtual network, virtual machines, and software. In the example, one machine runs AUTH and a web server, another runs a DBMS, and the two communicate through RPC.]

Understanding the performance of a data center is difficult:
◮ many layers
◮ limited scope of tools and of people’s knowledge

Three real-world questions from a data center operator:
◮ How much load increase can we support with the current setup? Where will the bottleneck be? What would break first?
◮ How much would it help to move the database server to faster hardware, or directly onto bare metal?
◮ Can we understand and explain unexpected behaviors?

Goal: Creating a New Model

We want to build a dynamic performance model for data centers that is
◮ comprehensive
◮ live
◮ interactive

[Figure: logs feed a statistical model that answers queries. Asking "How is it performing now?" returns the current performance; asking "What if we improve the hardware?" returns a predictive result.]

Performance Annotations

Example: a Web Service

[Figure: a request enters the system, which returns a response.]

The response time grows quadratically with the size of the body of the request, where |in| denotes that size:

[Plot: time (ms) vs. body size (MB)]

@Time ∼ a ∗ |in|^2 + b ∗ |in| + c

The memory used grows linearly with the size of the body of the request:

[Plot: memory usage (kB) vs. body size (kB)]

@Mem ∼ a ∗ |in| + b

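The annotations above leave the coefficients a, b, c to be estimated from logged measurements. A minimal sketch, assuming the logs provide (body size, metric) pairs, fits such a polynomial by ordinary least squares through the normal equations. The names `fit_poly` and `solve` and the sample data are hypothetical, and the slides do not say which fitting procedure the authors use.

```python
from typing import List, Sequence


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poly(sizes: Sequence[float], values: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit of values ~ sum(w[k] * size**k); returns coefficients, highest degree first."""
    k = degree + 1
    # Normal equations (X^T X) w = X^T y, with X[i][j] = sizes[i] ** j.
    xtx = [[sum(s ** (i + j) for s in sizes) for j in range(k)] for i in range(k)]
    xty = [sum(v * s ** i for s, v in zip(sizes, values)) for i in range(k)]
    return list(reversed(solve(xtx, xty)))


# Synthetic measurements that follow @Time ~ 0.8*|in|^2 + 2*|in| + 5 exactly.
sizes = [1, 2, 3, 4, 5, 6]
times = [0.8 * s * s + 2 * s + 5 for s in sizes]
a, b, c = fit_poly(sizes, times, degree=2)  # a ~ 0.8, b ~ 2, c ~ 5
```

The same call with `degree=1` fits the linear memory annotation. The normal equations become ill-conditioned when sizes span several orders of magnitude, so rescaling the sizes (or using a QR-based solver) is advisable on real logs.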
Example: a Web Service

[Figure: the same request/response system, opened up. It spans Machine 1 and Machine 7 and involves a VM, an Apache web server (Apache WS) running OwnCloud, and PostgreSQL. Serving a request calls functions such as getFile(string f), normalizePath(string p), parseQuery(string q), and execute(query q), each with its own measured behavior: time (ms) vs. body size (MB) for getFile, memory usage (kB) vs. body size (kB) for normalizePath, memory (kbytes) vs. string length for parseQuery, and time (ms) vs. query size for execute.]

[Figure: measured behavior of individual functions, each with the annotation that describes it:
◮ memory (kbytes) vs. string length: @Mem ∼ Constant(8kB)
◮ memory (kbytes) vs. path length, with a line fit: @Mem ∼ Norm(5kB ∗ path_len(path), 5kB)
◮ memory (kbytes) vs. string length: @Mem ∼ Constant(4.2kB) and @Mem ∼ Constant(0.5kB)
◮ memory (bytes) vs. string length: @Mem ∼ ?]

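Each annotation is a small statistical model of one metric, possibly parameterized by an input feature. The sketch below encodes the two forms on the slide, Constant(v) and Norm(mean, sd), and checks whether a measurement is consistent with an annotation. It reads the path-length annotation in kB, the unit of that plot's axis. The class names, the `explains` method, and the acceptance rules (10% relative tolerance for Constant, three standard deviations for Norm) are assumptions for illustration, not the authors' definitions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Constant:
    """@Mem ~ Constant(value): the metric does not depend on the input."""
    value: float
    rel_tol: float = 0.1  # assumed tolerance: within 10% of the constant

    def explains(self, measured: float, feature: float = 0.0) -> bool:
        return abs(measured - self.value) <= self.rel_tol * abs(self.value)


@dataclass
class Norm:
    """@Mem ~ Norm(mean(feature), sd): normally distributed around a feature-dependent mean."""
    mean: Callable[[float], float]
    sd: float
    k: float = 3.0  # assumed acceptance band: within k standard deviations

    def explains(self, measured: float, feature: float = 0.0) -> bool:
        return abs(measured - self.mean(feature)) <= self.k * self.sd


# @Mem ~ Constant(8kB) and @Mem ~ Norm(5kB * path_len(path), 5kB), values in kB.
const_8kb = Constant(8.0)
per_path = Norm(mean=lambda path_len: 5.0 * path_len, sd=5.0)

print(const_8kb.explains(8.3))             # True: within 0.8 kB of 8 kB
print(per_path.explains(40.0, feature=4))  # False: 20 kB above the 20 kB mean, beyond 3 sd
```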
How To Create Annotations

Instrumentation

For every call of every function in the system, we need:
◮ metrics of interest
   ◮ execution time
   ◮ dynamic memory allocation
   ◮ lock holding time
◮ relevant features of the input parameters
   ◮ string length
   ◮ collection size

Also, the instrumentation must be:
◮ cross-layer
◮ cross-platform

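For a single Python component, the per-call record described above could be collected by a decorator. This is a minimal sketch using only the standard library (`time.perf_counter` and `tracemalloc`, whose `reset_peak` needs Python 3.9+). The record layout, the in-memory `LOG` list, and the example function are hypothetical. It covers execution time, memory allocation, and the two input features, but not lock holding time, and it is neither cross-layer nor cross-platform.

```python
import functools
import time
import tracemalloc

LOG = []  # hypothetical sink; a real deployment would ship records to a log store


def input_features(args):
    """Extract the simple features named on the slide: string length and collection size."""
    feats = {}
    for i, arg in enumerate(args):
        if isinstance(arg, str):
            feats[f"arg{i}.str_len"] = len(arg)
        elif isinstance(arg, (list, tuple, set, dict)):
            feats[f"arg{i}.coll_size"] = len(arg)
    return feats


def instrument(fn):
    """Wrap fn so every call logs execution time, peak extra memory, and input features."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        # Nested instrumented calls reset the peak too, so outer figures are approximate.
        tracemalloc.reset_peak()
        before, _ = tracemalloc.get_traced_memory()
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            _, peak = tracemalloc.get_traced_memory()
            LOG.append({
                "function": fn.__qualname__,
                "time_ms": elapsed_ms,
                "mem_bytes": peak - before,
                **input_features(args),
            })
    return wrapper


@instrument
def normalize_path(parts):  # hypothetical example function
    return "/" + "/".join(p for p in parts if p)
```

Calling `normalize_path(["home", "", "alice"])` returns `"/home/alice"` and appends one record whose `arg0.coll_size` is 3.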
Automatic Annotation Inference

From the logs, each function's metric is plotted against every candidate input feature, and their Pearson correlation coefficient (pcc) is computed:
◮ correctFolderSize, memory (kbytes)
   ◮ vs. collection size: pcc 0.5610
   ◮ vs. string length: pcc 0.78673
   ◮ vs. path length: pcc 0.9473, the strongest correlation, modeled by a line fit (regression)
◮ broadcastEvent, time (ms)
   ◮ vs. collection size: pcc -0.1155
   ◮ vs. string length: pcc -0.2764
   ◮ with no feature: the time values, plotted against a scalar value, are modeled as clusters

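The inference step above picks, for each function, the feature whose pcc is strongest and fits a line to it, or falls back to clusters when no feature correlates. A minimal sketch of that recipe follows. The 0.9 threshold, the gap-based 1-D clustering, and all function names are assumptions; the slides show the ingredients but not the exact rules.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient (pcc); 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def line_fit(xs, ys):
    """Ordinary least-squares line y ~ slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def gap_clusters(values, gap):
    """1-D clustering: split the sorted values wherever two neighbours differ by more than gap."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > gap:
            clusters.append([v])
        else:
            clusters[-1].append(v)
    return clusters


def infer_annotation(samples, metric, features, min_pcc=0.9, gap=0.5):
    """Fit a line on the most correlated feature if |pcc| >= min_pcc; otherwise cluster the metric."""
    ys = [s[metric] for s in samples]
    scored = [(abs(pearson([s[f] for s in samples], ys)), f) for f in features]
    if scored:
        best_pcc, best = max(scored)
        if best_pcc >= min_pcc:
            slope, intercept = line_fit([s[best] for s in samples], ys)
            return {"kind": "linear", "feature": best, "pcc": best_pcc,
                    "slope": slope, "intercept": intercept}
    return {"kind": "clusters", "centers": [mean(c) for c in gap_clusters(ys, gap)]}


# Hypothetical logs: memory tracks path length exactly; time ignores collection size.
folder = [{"path_len": p, "str_len": s, "mem_kb": 3 * p + 2}
          for p, s in [(1, 10), (2, 7), (3, 12), (4, 9), (5, 8)]]
events = [{"coll": c, "time_ms": t}
          for c, t in [(1, 1.0), (2, 3.0), (3, 1.1), (4, 3.1), (5, 0.9)]]
```

On these samples, `infer_annotation(folder, "mem_kb", ["str_len", "path_len"])` selects `path_len` with slope 3 and intercept 2. `infer_annotation(events, "time_ms", ["coll"])` returns two clusters centered near 1.0 ms and 3.05 ms.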
Annotations Uses

Uses

◮ Documentation
   ◮ how do functions behave?
◮ Anomaly/failure detection
   ◮ is the system behaving normally? Is there a performance regression?
◮ Extrapolation
   ◮ can we scale up?
◮ Composition
   ◮ can we infer the behavior of the caller from the annotations of the callees?

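The slides leave composition as an open question. One simple model, offered only as an illustration, treats a caller that invokes each callee once, in sequence, and assumes all costs are independent and normally distributed. Under those assumptions the caller's cost is again normal, with means adding and variances adding. The `Norm` class and `compose_sequential` are hypothetical names. Loops, conditional calls, and correlated callees, which real callers have, are ignored.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Norm:
    """A Norm(mu, sigma) annotation for one metric, e.g. time in ms."""
    mu: float
    sigma: float


def compose_sequential(callees: Iterable[Norm], own: Norm = Norm(0.0, 0.0)) -> Norm:
    """Annotation of a caller that runs each callee once, in sequence, plus its own work.

    Assumes all terms are independent and normal: means add, and variances (sigma**2) add.
    """
    terms = [own, *callees]
    mu = sum(t.mu for t in terms)
    sigma = math.sqrt(sum(t.sigma ** 2 for t in terms))
    return Norm(mu, sigma)


print(compose_sequential([Norm(3.0, 3.0), Norm(4.0, 4.0)]))  # Norm(mu=7.0, sigma=5.0)
```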
Case Study: Setup

The System: Application Level

[Figure: a client talks WebDAV to HAProxy, which forwards WebDAV requests to nginx with OwnCloud; OwnCloud uses PostgreSQL and NFS over TCP. The server components run in VMs.]

The System: Computing Resources

Twelve hosts, Host01 to Host12, connected by Ethernet with VLANs:
◮ Host01: test
◮ Host02: OpenStack Neutron
◮ Compute nodes
   ◮ Host03: Ceph master, DBMS
   ◮ Host04: Ceph master, NFS server
   ◮ Host05: Ceph master, web server
   ◮ Host06: sync server, DBMS
   ◮ Host07: load balancer, LDAP
◮ Storage nodes
   ◮ Host08 to Host12: one Ceph OSD each

PHP Instrumentation

In OwnCloud (PHP), each function foo is renamed foo_inner, and a new foo wraps it, timing the call and logging the elapsed time:

    function foo_inner() {...}    // original body of foo

    function foo() {
        start = time()
        result = foo_inner()
        end = time()
        log(end - start)
        return result
    }

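Cross-Platform Instrumentation

A trace ID created in OwnCloud (PHP) travels with the request into PostgreSQL (C). The marshaller puts it into the message and the unmarshaller gets it back, so measurements on both sides can be linked.

    OwnCloud (PHP)
    function write_to_db() {
        ...
        trace_id = rnd_string()
        ...
    }

    Marshaller (C)
    marshall() {
        ...
        put(trace_id)
        ...
    }

    Unmarshaller (C)
    unmarshall() {
        ...
        get(trace_id)
        ...
    }

    PostgreSQL (C)
    function execute_query() {
        ...
        get(trace_id)
        ...
    }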

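On the slide, the marshaller puts the trace ID into the message and the unmarshaller reads it back, so database-side records can be joined with the OwnCloud-side call that caused them. The sketch mimics that flow inside one Python process. The JSON wire format, the `TRACE_LOG` list, and the function signatures are assumptions; in the real system the two sides are PHP and C code running in different processes.

```python
import json
import secrets

TRACE_LOG = []  # (trace_id, function, feature) records from every layer, joinable by trace_id


def rnd_string(nbytes: int = 8) -> str:
    """Random trace ID, standing in for rnd_string() on the slide."""
    return secrets.token_hex(nbytes)


def marshall(query: str, trace_id: str) -> bytes:
    """Serialize the request and put the trace ID alongside it (assumed wire format)."""
    return json.dumps({"trace_id": trace_id, "query": query}).encode("utf-8")


def unmarshall(data: bytes):
    """Deserialize the request and get the trace ID back out."""
    msg = json.loads(data.decode("utf-8"))
    return msg["query"], msg["trace_id"]


def execute_query(query: str, trace_id: str) -> None:
    # Database side: tag this layer's measurement with the caller's trace ID.
    TRACE_LOG.append((trace_id, "execute_query", len(query)))


def write_to_db(query: str) -> str:
    # Application side: start a new trace and tag this layer's measurement.
    trace_id = rnd_string()
    TRACE_LOG.append((trace_id, "write_to_db", len(query)))
    wire = marshall(query, trace_id)  # bytes that would cross the network
    received_query, received_id = unmarshall(wire)
    execute_query(received_query, received_id)
    return trace_id
```

Filtering `TRACE_LOG` by the ID that `write_to_db` returns yields both the application-side and the database-side record for the same request.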
Case Study: Annotations

It Works!

Annotations inferred in the case study:
◮ memory (kbytes) vs. collection size:
   @Mem ∼ 721.362B ∗ |in| + 8851.16B
◮ \OC\Files\View::getOwner(), memory (kbytes) vs. string length:
   @Mem ∼ 8.240kB
◮ \Sabre\DAV\Server::generateMultiStatus(), time (ms) vs. collection size:
   @Time ∼ 3.6e−03B ∗ |in|^2 + 3.8e−05B ∗ |in| + 6.6e−06B
◮ \OC\Files\Cache\Cache::normalize(), memory (bytes) vs. string length:
   @Mem ∼ Norm(411.9, 15.35) ∨ Norm(435.3, 24.41) ∨ Norm(448, 0) ∨ Norm(459.6, 19.48) ∨ Norm(477.6, 23.18) ∨ Norm(502.0, 18.58)

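The annotation for \OC\Files\Cache\Cache::normalize() is a disjunction (∨) of normal distributions, so a measurement fits if any one component explains it. A minimal sketch follows. It assumes "explains" means lying within three standard deviations of a component's mean, and it treats a zero standard deviation, as in Norm(448, 0), as requiring an exact match. The acceptance rule and the function name `matching_component` are assumptions, not the authors' test.

```python
from typing import List, Optional, Tuple

# @Mem ~ Norm(411.9,15.35) v ... v Norm(502.0,18.58), memory in bytes, as on the slide.
NORMALIZE_MEM: List[Tuple[float, float]] = [
    (411.9, 15.35), (435.3, 24.41), (448.0, 0.0),
    (459.6, 19.48), (477.6, 23.18), (502.0, 18.58),
]


def matching_component(measured: float,
                       components: List[Tuple[float, float]],
                       k: float = 3.0) -> Optional[int]:
    """Index of the component closest in standard deviations, if within k sd; else None."""
    best, best_z = None, None
    for i, (mu, sd) in enumerate(components):
        if sd == 0.0:
            z = 0.0 if measured == mu else float("inf")  # degenerate component: exact value only
        else:
            z = abs(measured - mu) / sd
        if z <= k and (best_z is None or z < best_z):
            best, best_z = i, z
    return best


print(matching_component(448.0, NORMALIZE_MEM))  # 2
print(matching_component(700.0, NORMALIZE_MEM))  # None
```

A `None` result is the case that matters for anomaly detection: no component of the annotation accounts for the observed value.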
Case Study: Anomaly Detection

Anomaly Injection

[Figure: the application-level setup (Client, HAProxy, nginx with OwnCloud, PostgreSQL, NFS), with latency injected into the system.]

Added latency: 1 to 10 ms

Annotations Catch The Anomaly

[Plot: % vs. annotation robustness (%), one curve per injected delay: no delay, 1ms, 2ms, 5ms, 10ms.]

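The slides do not define annotation robustness. Under one plausible reading, assumed here, it is the percentage of observed calls that an annotation still explains, and injected latency should push measurements outside the annotation and lower that percentage. The sketch computes this percentage for a hypothetical time annotation Norm(2.0 ms, 0.15 ms) on made-up measurements, before and after adding a fixed delay. The function names and the three-standard-deviation band are assumptions.

```python
from typing import List, Sequence


def robustness(times_ms: Sequence[float], mu: float, sd: float, k: float = 3.0) -> float:
    """Percentage of measurements within k standard deviations of the annotated mean."""
    if not times_ms:
        raise ValueError("no measurements")
    inside = sum(1 for t in times_ms if abs(t - mu) <= k * sd)
    return 100.0 * inside / len(times_ms)


def with_delay(times_ms: Sequence[float], delay_ms: float) -> List[float]:
    """Simulate an injected delay added to every call."""
    return [t + delay_ms for t in times_ms]


baseline = [1.8, 2.0, 2.1, 1.9, 2.2, 2.0]  # hypothetical call times in ms
print(robustness(baseline, mu=2.0, sd=0.15))                   # 100.0
print(robustness(with_delay(baseline, 1.0), mu=2.0, sd=0.15))  # 0.0
```

Small delays comparable to the annotation's spread only partly reduce the percentage. Delays well beyond it take it to zero, which is the kind of separation the plot compares across the injected delays.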
Future

Ongoing Work - Discussion

◮ Fine-tuning of the machine learning techniques
◮ Annotation composition
   ◮ stack analysis
◮ Workload creation
   ◮ can we forge workloads that expose specific behaviors?
◮ Feature selection
   ◮ Is a heuristically built set of basic features enough?
   ◮ Can we exploit programmers’ knowledge of the system?
◮ Extensive testing
   ◮ Java, DaCapo benchmarks
