DieCast: Testing Distributed Systems with an Accurate Scale Model. Diwaker Gupta, Kashi V. Vishwanath, Amin Vahdat. University of California, San Diego

Slides: cseweb.ucsd.edu/~dgupta/slides/nsdi08-diecast.pdf


DieCast: Testing Distributed Systems with an Accurate Scale Model

Diwaker Gupta

Kashi V. Vishwanath

Amin Vahdat

University of California, San Diego


Alice: high performance filesystem

June 7, 2008 NSDI 2008 | DieCast 2

Limited testing infrastructure

Diverse deployment environments

Use smaller infrastructure to test a much larger system


Goals

• Fidelity

– How closely can we replicate the target system?

• Reproducibility

– Can we do controlled experiments?

• Efficiency

– Use fewer resources


DieCast can scale up a test infrastructure by an order of magnitude


DieCast Overview

✓ Replicate target system using fewer machines

✓ Resource equivalence: perceived CPU capacity, disk and network characteristics

✓ Preserve application performance

× Not scaled

× Physical memory: mitigating solutions

× Secondary storage: cheap


Original System

[Figure: original system with a load balancer, web servers, application servers, database servers, and switches]

• Fidelity

• Reproducibility

• Efficiency


Server Consolidation (VMs)


Network emulation

• Fidelity

• Reproducibility

• Efficiency


Multiplexing Leads to Resource Partitioning

3 GHz CPU, 1 Gbps N/W, 15 Mbps disk I/O, 2 GB RAM


Split equally among 5 VMs

~ 600 MHz CPU, 200 Mbps N/W, 3 Mbps disk I/O, 400 MB RAM each
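The partitioning arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration of an equal split: the `HostResources` type and `split_equally` helper are hypothetical names, not part of DieCast, and 2 GB is treated as 2000 MB to match the slide's rounded 400 MB figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostResources:
    """Capacity of one physical machine (hypothetical container type)."""
    cpu_mhz: float
    net_mbps: float
    disk_mbps: float
    ram_mb: float


def split_equally(host: HostResources, num_vms: int) -> HostResources:
    """Return the share each VM gets when the host is divided evenly."""
    if num_vms < 1:
        raise ValueError("num_vms must be at least 1")
    return HostResources(
        cpu_mhz=host.cpu_mhz / num_vms,
        net_mbps=host.net_mbps / num_vms,
        disk_mbps=host.disk_mbps / num_vms,
        ram_mb=host.ram_mb / num_vms,
    )


# Values from the slide: 3 GHz CPU, 1 Gbps N/W, 15 Mbps disk I/O, 2 GB RAM.
host = HostResources(cpu_mhz=3000, net_mbps=1000, disk_mbps=15, ram_mb=2000)
per_vm = split_equally(host, 5)
print(per_vm)
```

Each of the 5 VMs ends up with roughly one fifth of every resource, which is the gap that time dilation is meant to close.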


Time Dilation [NSDI 2006]

Key idea: time is also a resource!

• Slow down passage of time within the OS

• CPU, network, disk – all appear faster

• Experiments take longer

Real time (no dilation): 10 Mb in 1 sec, perceived bandwidth = 10 Mb/s

Dilated time: 10 Mb in 100 msec, perceived bandwidth = 100 Mb/s

Time Dilation Factor (TDF) = Real time/Virtual time

In this example, TDF = 1 sec/100 ms = 10
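The TDF definition and the perceived-bandwidth example can be checked with a short Python sketch. The function names are illustrative assumptions; only the formula TDF = real time / virtual time and the 10 Mb, 1 sec, 100 ms figures come from the slide.

```python
def time_dilation_factor(real_seconds: float, virtual_seconds: float) -> float:
    """TDF = real time / virtual time, as defined on the slide."""
    return real_seconds / virtual_seconds


def perceived_bandwidth(megabits: float, real_seconds: float, tdf: float) -> float:
    """Bandwidth seen by a guest whose clock advances real_seconds / tdf."""
    virtual_seconds = real_seconds / tdf
    return megabits / virtual_seconds


def real_duration(virtual_seconds: float, tdf: float) -> float:
    """Wall-clock time needed to cover a given span of virtual time."""
    return virtual_seconds * tdf


tdf = time_dilation_factor(real_seconds=1.0, virtual_seconds=0.1)
print(tdf)                                   # TDF for the slide's example
print(perceived_bandwidth(10.0, 1.0, tdf))   # Mb/s as seen inside the guest
print(real_duration(60.0, tdf))              # real seconds for 60 virtual seconds
```

The last call illustrates the cost noted on the slide: resources appear faster, but experiments take longer in real time.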


Multiplexing Under Time Dilation

3 GHz CPU, 1 Gbps N/W, 15 Mbps disk I/O, 2 GB RAM


Split among 5 VMs: ~ 600 MHz CPU, 200 Mbps N/W, 3 Mbps disk I/O, 400 MB RAM each

With TDF 5, each VM perceives: ~ 3 GHz CPU, 1 Gbps N/W, 15 Mbps disk I/O?, 400 MB RAM
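A minimal sketch of how multiplexing and time dilation combine, assuming that rate-based resources (CPU, network, disk) scale with the TDF while memory does not. The dictionary keys and the `perceived_per_vm` helper are hypothetical. The slide marks disk I/O with a question mark, so treating it as a plain multiple of the TDF here is an optimistic simplification.

```python
# Resources measured per unit time appear faster under dilation (assumption:
# disk is included here even though the slide flags it with a question mark).
RATE_RESOURCES = ("cpu_mhz", "net_mbps", "disk_mbps")


def perceived_per_vm(host: dict, num_vms: int, tdf: float) -> dict:
    """Per-VM resources as seen from inside a guest running at the given TDF."""
    perceived = {}
    for name, capacity in host.items():
        share = capacity / num_vms
        # Memory is a capacity, not a rate, so dilation does not scale it.
        perceived[name] = share * tdf if name in RATE_RESOURCES else share
    return perceived


host = {"cpu_mhz": 3000, "net_mbps": 1000, "disk_mbps": 15, "ram_mb": 2000}
print(perceived_per_vm(host, num_vms=5, tdf=5))
```

With 5 VMs and TDF 5 the rate-based resources return to the host's original figures, while RAM stays at the per-VM share.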


Time Dilation: External Interactions

Dilated time frame

Network

External systems running in the real time frame


Disk I/O Scaling

• Invariant: perceived disk characteristics are preserved

– Seek time

– Read/write throughput

• Issues

– Low level functionality in firmware

– Different I/O models

– Per request scaling is difficult


Implementation Details

• Supported platforms

– Xen 2.0.7, 3.0.4, 3.1

– Can be ported to non-virtualized systems

• Support for unmodified guest OSes

• Disk I/O scaling for different I/O models

– Fully virtualized: integration with DiskSim

– Paravirtualized: scaling in device driver


Disk I/O Scaling: Fully Virtualized VMs

[Figure: two VMs (unmodified OS) running on Xen. The guest OS filesystem issues requests that go through I/O emulation (ioemu) in Domain-0, which uses disksim, the disk device driver, and the VM disk image.]

Guest OS unaware that no real disk exists

Request completion time in simulated disk


Disk I/O Scaling: Fully Virtualized VMs

[Figure: two VMs (unmodified OS) running on Xen; Domain-0 runs ioemu, disksim, and the disk device driver over the VM disk image]

• Service time in simulated disk: Tsim

• Required perceived time: Tsim ⇒ Total real time Treal = TDF*Tsim

• DiskSim running time: Tdisksim

• Actual time to service: Tioemu

• Delay: Delay

Treal = Tioemu + Delay + Tdisksim

⇒ Delay = (TDF*Tsim) – Tdisksim – Tioemu
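The delay formula above can be exercised with a small Python sketch. The function name, the millisecond units, and the sample timings are assumptions for illustration. Clamping a negative delay to zero is also an assumption, covering the case where the real work already took longer than the dilated service time; the slides do not say how that case is handled.

```python
def disk_delay(tdf: float, t_sim: float, t_disksim: float, t_ioemu: float) -> float:
    """Extra delay to add so that Treal = TDF * Tsim.

    Follows the slide's relation Treal = Tioemu + Delay + Tdisksim.
    All times must use the same unit (milliseconds in the example below).
    """
    delay = tdf * t_sim - t_disksim - t_ioemu
    # Assumption: if the request already overran its budget, add no delay.
    return max(delay, 0.0)


# Hypothetical timings: the simulated disk reports an 8 ms service time,
# DiskSim itself took 1 ms to run, and ioemu took 4 ms to service the request.
delay = disk_delay(tdf=10, t_sim=8.0, t_disksim=1.0, t_ioemu=4.0)
print(delay)
print(4.0 + delay + 1.0)  # total real time, which should equal TDF * Tsim
```

A guest running at TDF 10 then observes the request completing after 8 ms of its own (virtual) time.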


Network I/O Scaling

Invariant: Perceived network characteristics (bandwidths and latencies) must be preserved

10 Mb/s, 20 ms RTT

                          Real Configuration    Perceived Configuration
Original system (TDF 1)   10 Mb/s, 20 ms        10 Mb/s, 20 ms
Time Dilation (TDF 5)     10 Mb/s, 20 ms        50 Mb/s, 4 ms
DieCast (TDF 5)           2 Mb/s, 100 ms        10 Mb/s, 20 ms

Network emulation: ModelNet, Dummynet
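The three rows of the configuration table follow from two inverse mappings, sketched here in Python. The function names are hypothetical, and the sketch only reproduces the arithmetic. It does not configure ModelNet or Dummynet.

```python
def perceived_link(real_mbps: float, real_rtt_ms: float, tdf: float) -> tuple:
    """What a dilated guest perceives for a given real link."""
    return real_mbps * tdf, real_rtt_ms / tdf


def real_link_for(target_mbps: float, target_rtt_ms: float, tdf: float) -> tuple:
    """Real link settings needed so the guest perceives the target link."""
    return target_mbps / tdf, target_rtt_ms * tdf


# Time dilation alone (TDF 5): the unchanged real link looks faster.
print(perceived_link(10, 20, tdf=5))

# DieCast (TDF 5): throttle the real link so the perceived link matches the original.
real = real_link_for(10, 20, tdf=5)
print(real)
print(perceived_link(*real, tdf=5))
```

Applying `perceived_link` to the output of `real_link_for` returns the original 10 Mb/s, 20 ms link, which is the invariant the slide states.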


Recap

• Multiplex VMs for efficiency

• Time dilation to scale resources

• Disk I/O scaling

• Network I/O scaling

At this point, the scaled system looks almost like the original system!


• Fidelity

• Reproducibility

• Efficiency


Validation

• How well does DieCast scaled performance match the original system?

– Application specific metrics

• Can a smaller system be configured to match the resources of a larger system?

– Resource utilization profiles

• Applications: RUBiS, BitTorrent, Isaac

• RUBiS

– eBay like e-Commerce service

– Ships with workload generator


RUBiS: Topology

[Figure: RUBiS topology. Labels: 4 DB and 8 web servers (shown twice), 16 workload generators, wide area links]


Experimental Setup

Baseline configuration: 40 physical machines

DieCast scaled configuration: 4 physical machines, 10 VMs each

• Xen 3.1, fully virtualized VMs

• Debian Etch, Linux 2.6.17, 256 MB RAM

• DiskSim emulating Seagate ST3217

• Network emulation using ModelNet
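As a rough planning aid, the consolidation in this setup can be sketched as below. The `scaled_setup` helper is hypothetical, and setting the TDF equal to the number of VMs per host is an assumption drawn from the earlier 5 VMs / TDF 5 example. This slide does not state the TDF used in the experiment.

```python
import math


def scaled_setup(baseline_machines: int, vms_per_host: int, virtual_seconds: float) -> tuple:
    """Return (physical hosts needed, assumed TDF, wall-clock seconds)."""
    hosts = math.ceil(baseline_machines / vms_per_host)
    # Assumption: one unit of TDF per VM sharing a host, so each VM
    # perceives roughly a full machine's rate-based resources.
    tdf = vms_per_host
    return hosts, tdf, virtual_seconds * tdf


# 40 baseline machines at 10 VMs per host; a 60 s run is a made-up duration.
print(scaled_setup(baseline_machines=40, vms_per_host=10, virtual_seconds=60))
```

Under that assumption the 40-machine baseline fits on 4 hosts, at the price of a tenfold longer wall-clock run.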


RUBiS: Throughput


RUBiS: Response Time


RUBiS: Resource Usage

[Figure panels: CPU, Memory, Network]


Validation Recap

• Evaluated

– RUBiS

– BitTorrent

– Isaac

• Demonstrated

– Match application specific metrics

– Preserve resource utilization profile

Many more details in the paper


Case study: Panasas

• Panasas builds scalable storage systems for high performance computing

– http://www.panasas.com

• Caters to a variety of clients

• Difficult or even impossible to replicate deployment environment of all clients

• Limited resources for testing


DieCast in Panasas

• Custom OS

• Integrated hw/sw offering

• Not runnable on Xen

• Porting DieCast to non-virtualized environments

Clients run Linux, can be virtualized

Dummynet for network scaling

[Figure labels: Clients, Storage cluster]


Panasas: Evaluation Summary

Baseline

DieCast scaled: 1 physical machine (PM), 10 VMs

• Validation

– Two benchmarks from standard test suite: IOZone, MPI-IO; varying block sizes

– Match performance metrics

Scaling: Used 100 machines to scale to 1000 clients


Limitations

• Memory scaling

• Long running workloads

• Specialized hardware appliances

• Fine grained timing


Summary

• DieCast: scalable testing

– Fidelity, Reproducibility, Efficiency

• Contributions

– Support for unmodified operating systems

– Implement disk I/O scaling (DiskSim integration)

– CPU scheduler enhancements for time dilation

– Comprehensive evaluation, including a commercial storage system


Thanks!

Questions?

[email protected]