Measuring and Benchmarking Personal Clouds Advisors: Dr. Pedro García López Dr. Marc Sanchez Artigas M. Sc. Thesis Presentation Cristian Cotes González
Page 1: Thesis presentation

Measuring and Benchmarking Personal Clouds

Advisors: Dr. Pedro García López Dr. Marc Sanchez Artigas

M. Sc. Thesis Presentation

Cristian Cotes González

Page 2: Thesis presentation

Contents

1. Introduction

2. Background and Related Work

3. Measuring Personal Cloud Services

4. Benchmarking Personal Clouds Synchronization

5. Conclusions and Future Work

Page 3: Thesis presentation

Introduction

Page 4: Thesis presentation

Introduction

● 3S Personal Cloud definition:

The Personal Cloud is a unified digital locker for our personal data offering three key services: Storage, Synchronization and Sharing.

● Well-known Personal Clouds: Dropbox, Box, Google Drive...

Page 5: Thesis presentation

Introduction

● Little is known about the architecture of commercial Personal Cloud solutions.

● Open source solutions do not meet all the Personal Cloud requirements.

● Solution: We developed StackSync, an open source Personal Cloud.

Page 6: Thesis presentation

Motivation

● Very little is known about the Quality of Service (QoS) of Personal Clouds.

● After two years developing StackSync, we wanted to compare it with private and commercial solutions to understand how it performs.

● To make this comparison we needed to:

○ Simulate user behavior.

○ Use a benchmarking framework for Personal Clouds.

Page 7: Thesis presentation

Contributions

● Analysis of the state-of-the-art of Personal Clouds.

● Measurement of Personal Cloud services (QoS).

● Improvement of an existing benchmarking framework.

● Generation of realistic traces to simulate user behavior.

● Benchmarking of Personal Cloud synchronization protocols.

Page 8: Thesis presentation

Publications

● Raúl Gracia Tinedo, Marc Sánchez Artigas, Adrián Moreno Martínez, Cristian Cotes and Pedro García López. "Actively Measuring Personal Cloud Storage". In the 6th IEEE International Conference on Cloud Computing. 2013, Santa Clara Marriott, CA, USA.

● Pedro García López, Marc Sánchez Artigas, Sergi Toda and Cristian Cotes. "StackSync: Bringing Elasticity to Dropbox-like File Synchronization". (Submitted to the 15th International Middleware Conference, December 2014, Bordeaux, France.)

Page 9: Thesis presentation

Background and Related Work

Page 10: Thesis presentation

Open source Personal Clouds

ownCloud

● Uses a pull strategy to synchronize files.

● Uses the WebDAV protocol to discover new changes.

SparkleShare

● Built on top of Git.

● Push notifications.

● Not prepared to process large binary files.

Syncany

● Discovers changes by polling the server.

● Metadata stored in files.

Page 11: Thesis presentation

StackSync

None of the current open source solutions fits the Personal Cloud definition well. For this reason we developed StackSync.

StackSync is an open source Personal Cloud that synchronizes, stores and shares files.

Page 12: Thesis presentation

StackSync Architecture

● StackSync can be divided into four main blocks:

○ Clients: Synchronize file data and metadata (file size, filename...).

○ Sync service: Receives and processes client metadata, and notifies clients of new changes.

○ Storage backend: Stores data files.

○ Communication middleware: Used to exchange metadata between clients and the sync service.

Page 13: Thesis presentation

Related Work

● Measurements and Benchmarks

○ Performance evaluation of Cloud services is a current hot topic.

○ Few works have measured the performance of Cloud storage services.

● Synchronization Algorithms

○ Little is known about the design and implementation of commercial sync protocols.

○ Recent work by Idilio Drago et al. characterizes Dropbox: "Inside Dropbox: Understanding Personal Cloud Storage Services".

Page 14: Thesis presentation

Measuring Personal Cloud Services

Page 15: Thesis presentation

Methodology and Platform

● Measure the performance of Dropbox, Box and SugarSync.

● Based on the REST API of each service.

● Two different platforms:

○ University laboratories: 30 machines.

○ PlanetLab: 40 nodes divided into two geographic regions (Western Europe and North America).

Page 16: Thesis presentation

Workload Model

● Up/Down Workload:

○ Objective: Measure up/down transfer speed.

○ Upload files until the account is full.

○ If the account is full: download and delete all files.

● Service Variability Workload:

○ Objective: Maintain every node with a continuous transfer flow to analyze the variability of the service over time.

○ Each node had two threads:

■ Upload thread: Upload files continuously and delete some files when the account is full.

■ Download thread: Download files continuously.

Page 17: Thesis presentation

Transfer Speed: Download

● Dropbox and Box offer faster download speeds than SugarSync.

● Dropbox exhibits the best download speed.

● SugarSync download transfer speed is constant and low.

● Small range of download bandwidth ([200,1300] KB/sec)

Page 18: Thesis presentation

Transfer Speed: Upload

● As with downloads, Dropbox and Box offer faster upload speeds than SugarSync.

● Distributions present irregular shapes.

● Box presents the fastest upload.

● Upload transfer capacity is better than download capacity due to the pricing policies of Cloud providers (inbound traffic is free while outbound traffic is not).

Page 19: Thesis presentation

Transfer & Geographic location

● Results obtained over 3 weeks of executing the up/down workload in PlanetLab.

● Better QoS in North America than in European countries due to datacenter location.

Page 20: Thesis presentation

Variability over time: SugarSync and Box

● Results obtained from the Service Variability workload.

● Box exhibits a stable service for downloads but upload transfer speed varies significantly.

● SugarSync exhibits a stable service for uploads and downloads.

● Downloads are more reliable and predictable.

Page 21: Thesis presentation

Variability over time: Dropbox

● Dropbox exhibits daily upload speed patterns.

● Upload transfer speed at night is between 15% and 35% higher than during daytime hours.

Page 22: Thesis presentation

Benchmarking Personal Clouds Synchronization

Page 23: Thesis presentation

Traces

● As there were no public traces containing files and their history of modifications, we developed a trace generator.

● File size: We use the distribution presented in the article "Understanding data characteristics and access patterns in a cloud storage system".

● 90% of the files are smaller than 4MB.

● To imitate the real behavior of users, we create three different actions:

○ ADD: File creation.

○ UPDATE: File modification.

○ REMOVE: File removal.

Page 24: Thesis presentation

Traces

● To determine the action to be performed, we applied the Markov model proposed in "Generating realistic datasets for deduplication analysis".

● We use the probabilities from the “Homes” dataset proposed in the same article.

● To modify a file, the tool supports 3 modification types:

○ B: Beginning of the file

○ E: End of the file

○ M: Middle of the file

● Only files smaller than 4MB are modified.
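
A Markov-driven trace generator of this kind could look roughly like the sketch below. The transition probabilities and the file-size distribution are placeholders chosen for illustration only; they are not the values of the “Homes” dataset or of the cited file-size study, and the function and variable names are hypothetical.

```python
import random

# Placeholder transition probabilities (assumed for illustration;
# NOT the "Homes" dataset values).
TRANSITIONS = {
    "ADD":    {"ADD": 0.7, "UPDATE": 0.1, "REMOVE": 0.2},
    "UPDATE": {"ADD": 0.6, "UPDATE": 0.2, "REMOVE": 0.2},
    "REMOVE": {"ADD": 0.8, "UPDATE": 0.1, "REMOVE": 0.1},
}
MAX_MODIFIABLE = 4 * 1024 * 1024  # only files smaller than 4 MB are modified


def generate_trace(steps, seed=0):
    """Return a list of (action, file name, detail) tuples."""
    rng = random.Random(seed)
    files = {}      # live files: name -> size in bytes
    trace = []
    state = "ADD"   # the first action always creates a file
    next_id = 0
    for _ in range(steps):
        if state == "ADD":
            name = f"f{next_id}"
            next_id += 1
            size = rng.randint(1, 8 * 1024 * 1024)  # placeholder size distribution
            files[name] = size
            trace.append(("ADD", name, size))
        elif state == "UPDATE":
            candidates = sorted(n for n, s in files.items() if s < MAX_MODIFIABLE)
            if candidates:
                # B, E or M: modify the beginning, end or middle of the file.
                trace.append(("UPDATE", rng.choice(candidates), rng.choice("BEM")))
        elif files:  # REMOVE
            name = rng.choice(sorted(files))
            del files[name]
            trace.append(("REMOVE", name, None))
        # Draw the next action from the current state's transition row.
        row = TRANSITIONS[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
    return trace
```

Fixing the seed makes a trace reproducible, which is useful when the same sequence of actions has to be replayed against several services.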

Page 25: Thesis presentation

Traces

● The trace used for these experiments contains:

○ 940 ADDs that generate a total of 535 MB of data.

○ 72 UPDATEs

○ 228 REMOVEs

● The average file size is 583 KB.
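
The average file size follows from the ADD figures on this slide, as the short check below shows. It assumes that 1 MB is counted as 1024 KB, which the slides do not state explicitly.

```python
total_mb = 535   # data generated by the ADD actions
adds = 940       # number of ADD actions

# Assumes 1 MB = 1024 KB.
avg_kb = total_mb * 1024 / adds
print(round(avg_kb))  # 583
```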

Page 26: Thesis presentation

Benchmarking Framework

● Proposed by Drago et al. in the article “Benchmarking Personal Cloud Storage”

● As the initial tool was too simple, we implemented new functionalities to capture traffic while executing the generated trace.

● The test measures the overhead of the different file syncing protocols.

Page 27: Thesis presentation

Protocol Overhead

● In this test we compared the protocol overhead of StackSync with other commercial services.

● StackSync has a low overhead compared with Dropbox and Google Drive, the two services with the most overhead.

● Dropbox exhibits the highest overhead, sending up to 150 MB of additional traffic.

Page 28: Thesis presentation

StackSync vs Dropbox

● For a deeper understanding of the overhead, we ran further experiments with Dropbox and StackSync only.

● This test grouped all the actions of the same type to generate 3 separate traces.

● The figure depicts the overhead ratio of the storage traffic.

● For ADDs, StackSync transferred a total of 565 MB while Dropbox needed 660 MB.

● For UPDATEs, StackSync is negatively affected by static chunking mechanisms.
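
Why static chunking penalizes UPDATEs can be illustrated with a small sketch. With fixed-size chunks, inserting bytes at the beginning of a file shifts every chunk boundary, so all chunks look new, whereas appending at the end only adds one chunk. The chunk size, the SHA-1 fingerprints and the helper names below are assumptions made for the example, not details of the StackSync client.

```python
import hashlib
import os

CHUNK = 4096  # hypothetical chunk size


def chunk_hashes(data, chunk_size):
    """Fingerprint each fixed-size chunk of the data."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(old, new, chunk_size):
    """Count chunks of the new version whose fingerprint is not already known."""
    known = set(chunk_hashes(old, chunk_size))
    return sum(1 for h in chunk_hashes(new, chunk_size) if h not in known)


original = os.urandom(10 * CHUNK)   # a 10-chunk file of random bytes
patch = b"x" * 10                   # a 10-byte modification

at_end = original + patch           # modification type E
at_beginning = patch + original     # modification type B

print(chunks_to_upload(original, at_end, CHUNK))        # 1
print(chunks_to_upload(original, at_beginning, CHUNK))  # 11
```

In this sketch a 10-byte change at the beginning forces the whole file to be re-uploaded, which matches the intuition behind the higher UPDATE overhead reported above.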

Page 29: Thesis presentation

StackSync vs Dropbox

● The figure depicts the amount of control traffic, in MB.

● Dropbox produces a large amount of control traffic when adding new files: 25 MB.

● StackSync only needs 3.2 MB to add all the files.

● For UPDATE and REMOVE actions, Dropbox also generates more control traffic than StackSync.

Page 30: Thesis presentation

StackSync vs ownCloud

● Unlike StackSync, ownCloud uses a pull-based synchronization protocol.

● In this test, we used 2 PCs:

○ Uploader: Executes the trace.

○ Downloader: Synchronizes the files uploaded by the uploader.

● StackSync (Push):

○ Uploader: 20 KB/min

○ Downloader: 10 KB/min

● ownCloud (Pull):

○ Uploader: 600-800 KB/min

○ Downloader: 100-300 KB/min

Page 31: Thesis presentation

StackSync: Synchronization time

● We analyzed StackSync's synchronization time in depth.

● ADD and REMOVE actions follow a normal distribution.

● UPDATE actions have a median of 2.75 seconds, but sync times are often higher due to static chunking.

● Files > 2 MB: Sync time increases linearly.

● Files < 2 MB: Sync time is constant, dominated by the processing time of the synchronization server.

Page 32: Thesis presentation

Conclusions and Future Work

Page 33: Thesis presentation

Conclusions

In this Thesis, we have examined central aspects of Personal Cloud storage services to characterize their performance in two different ways:

○ Data transfers

○ Synchronization protocols

Data transfers

● Transfer performance of commercial Personal Clouds varies from one provider to another.

● The variability of transfers depends on:

○ Traffic type: Upload or download.

○ Hour of the day.

Page 34: Thesis presentation

Conclusions

Synchronization Protocols

● Personal Clouds generate overhead depending on their synchronization features and mechanisms (chunking, delta encoding, pull or push synchronization...)

● StackSync implements an efficient synchronization protocol.

Page 35: Thesis presentation

Future Work

● CPU and RAM monitoring for the benchmarking tool. This will provide information about the computing power needed by the desktop clients to process user actions.

● Generate realistic files. Currently, the benchmark synchronizes random binary files.

● Improvements in the StackSync desktop client. Try to reduce the overhead of file updates by using advanced synchronization mechanisms.
