CLOUD SERVICES, NETWORKING, AND MANAGEMENT


IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board
Tariq Samad, Editor in Chief

George W. Arnold    Vladimir Lumelsky    Linda Shafer
Dmitry Goldgof      Pui-In Mak           Zidong Wang
Ekram Hossain       Jeffrey Nanzer       MengChu Zhou
Mary Lanzerotti     Ray Perez            George Zobrist

Kenneth Moore, Director of IEEE Book and Information Services (BIS)


CLOUD SERVICES, NETWORKING, AND MANAGEMENT

Edited by

Nelson L. S. da Fonseca

Raouf Boutaba


Copyright © 2015 by The Institute of Electrical and Electronics Engineers, Inc.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data

Fonseca, Nelson L. S. da.
  Cloud services, networking, and management / Nelson L. S. da Fonseca, Raouf Boutaba.
    pages cm
  ISBN 978-1-118-84594-3 (cloth)
  1. Cloud computing. I. Boutaba, Raouf. II. Title.
  QA76.585.F66 2015
  004.67′82–dc23

2014037179

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1


For our families


CONTENTS

Preface
Contributors

PART I  BASIC CONCEPTS AND ENABLING TECHNOLOGIES

1  CLOUD ARCHITECTURES, NETWORKS, SERVICES, AND MANAGEMENT
   1.1  Introduction
   1.2  Part I: Introduction to Cloud Computing
   1.3  Part II: Research Challenges—The Chapters in This Book
   1.4  Conclusion
   References

2  VIRTUALIZATION IN THE CLOUD
   2.1  The Need for Virtualization Management in the Cloud
   2.2  Basic Concepts
   2.3  Virtualized Elements
   2.4  Virtualization Operations
   2.5  Interfaces for Virtualization Management
   2.6  Tools and Systems
   2.7  Challenges
   References

3  VIRTUAL MACHINE MIGRATION
   3.1  Introduction
   3.2  VM Migration
   3.3  Virtual Network Migration without Packet Loss
   3.4  Security of Virtual Environments
   3.5  Future Directions
   3.6  Conclusion
   References

PART II  CLOUD NETWORKING AND COMMUNICATIONS

4  DATACENTER NETWORKS AND RELEVANT STANDARDS
   4.1  Overview
   4.2  Topologies
   4.3  Network Expansion
   4.4  Traffic
   4.5  Routing
   4.6  Addressing
   4.7  Research Challenges
   4.8  Summary
   References

5  INTER-DATA-CENTER NETWORKS WITH MINIMUM OPERATIONAL COSTS
   5.1  Introduction
   5.2  Inter-Data-Center Network Virtualization
   5.3  IDC Network Design with Minimum Electric Bills
   5.4  Inter-Data-Center Network Design with Minimum Downtime Penalties
   5.5  Overcoming Energy versus Resilience Trade-Off
   5.6  Summary and Discussions
   References

6  OPENFLOW AND SDN FOR CLOUDS
   6.1  Introduction
   6.2  SDN, Cloud Computing, and Virtualization Challenges
   6.3  Software-Defined Networking
   6.4  Overview of Cloud Computing and OpenStack
   6.5  SDN for Cloud Computing
   6.6  Combining OpenFlow and OpenStack with OpenDaylight
   6.7  Software-Defined Infrastructures
   6.8  Research Trends and Challenges
   6.9  Concluding Remarks
   References

7  MOBILE CLOUD COMPUTING
   7.1  Introduction
   7.2  Mobile Cloud Computing
   7.3  Risks in MCC
   7.4  Risk Management for MCC
   7.5  Conclusions
   References

PART III  CLOUD MANAGEMENT

8  ENERGY CONSUMPTION OPTIMIZATION IN CLOUD DATA CENTERS
   8.1  Introduction
   8.2  Energy Consumption in Data Centers: Components and Models
   8.3  Energy Efficient System-Level Optimization of Data Centers
   8.4  Conclusions and Open Challenges
   References

9  PERFORMANCE MANAGEMENT AND MONITORING
   9.1  Introduction
   9.2  Background Concepts
   9.3  Related Work
   9.4  X-Cloud Application Management Platform
   9.5  Implementation
   9.6  Experiments and a Case Study
   9.7  Challenges in Management on Heterogeneous Clouds
   9.8  Conclusion
   References

10  RESOURCE MANAGEMENT AND SCHEDULING
   10.1  Introduction
   10.2  Basic Concepts
   10.3  Applications
   10.4  Problem Definition
   10.5  Resource Management and Scheduling in Clouds
   10.6  Challenges and Perspectives
   10.7  Conclusion
   References

11  CLOUD SECURITY
   11.1  Introduction
   11.2  Technical Background
   11.3  Existing Solutions
   11.4  Transforming to the New IDPS Cloud Security Solutions
   11.5  FlowIPS: Design and Implementation
   11.6  FlowIPS vs. Snort/Iptables IPS
   11.7  Network Reconfiguration
   11.8  Performance Comparison
   11.9  Open Issues and Future Work
   11.10  Conclusion
   References

12  SURVIVABILITY AND FAULT TOLERANCE IN THE CLOUD
   12.1  Introduction
   12.2  Background
   12.3  Failure Characterization in Cloud Environments
   12.4  Availability-Aware Resource Allocation Schemes
   12.5  Conclusion
   References

PART IV  CLOUD APPLICATIONS AND SERVICES

13  SCIENTIFIC APPLICATIONS ON CLOUDS
   13.1  Introduction
   13.2  Background Information
   13.3  Related Work
   13.4  IWIR Workflow Model
   13.5  Amazon SWF Background
   13.6  RainCloud Workflow
   13.7  IWIR-to-SWF Conversion
   13.8  Experiments
   13.9  Open Challenges
   13.10  Conclusion
   References

14  INTERACTIVE MULTIMEDIA APPLICATIONS ON CLOUDS
   14.1  Introduction
   14.2  Delivery Models for Interactive Multimedia Services
   14.3  Cloud Gaming
   14.4  UGC Live Streaming
   14.5  Time-Shifting Video Streaming
   14.6  Open Challenges
   14.7  Conclusion
   References

15  BIG DATA ON CLOUDS (BDOC)
   15.1  Introduction
   15.2  Historical Perspective and State of the Art
   15.3  Clouds—Supply and Demand of Big Data
   15.4  Emerging Business Applications
   15.5  Cloud and Service Availability
   15.6  BDOC Security Issues
   15.7  BDOC Legal Issues
   15.8  Enabling Future Success—Stem Cultivation and Outreach
   15.9  Open Challenges and Future Directions
   15.10  Conclusions
   References

Index


PREFACE

With the wide availability of high-bandwidth, low-latency network connectivity, the Internet has enabled the delivery of rich services such as social networking, content delivery, and e-commerce at unprecedented scales. This technological trend has led to the development of cloud computing, a paradigm that harnesses the massive capacities of data centers to support the delivery of online services in a cost-effective manner. The National Institute of Standards and Technology (NIST) provided a relatively complete and widely accepted definition of cloud computing as follows: “cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” NIST further defined five essential characteristics as follows: (1) on-demand self-service, which states that a consumer can acquire resources based on service demand; (2) broad network access, which states that cloud services can be accessed remotely from heterogeneous client platforms (e.g., mobile phones); (3) resource pooling, where resources are pooled and shared by consumers in a multitenant fashion; (4) rapid elasticity, which states that cloud resources can be rapidly provisioned and released with minimal human involvement; and (5) measured service, which states that resources are controlled (and possibly priced) by leveraging a metering capability (e.g., pay per use) that is appropriate to the type of the service.

These characteristics provide a relatively accurate picture of what cloud computing systems should look like. Furthermore, in a cloud computing environment, the traditional role of service providers is divided into two: cloud providers, who own the physical data centers and lease resources (e.g., virtual machines) to service providers; and service providers, who use resources leased from cloud providers to execute applications. By leveraging the economies of scale of data centers, cloud computing can provide significant reduction in operational expenditure. At the same time, it also supports new applications such as big data analytics (e.g., MapReduce) that process massive volumes of data in a scalable and efficient fashion. The rise of cloud computing has made a profound impact on the development of the IT industry in recent years. While large companies like Google, Amazon, Facebook, and Microsoft have developed their own cloud platforms and technologies, many small companies are also embracing cloud computing by leveraging open-source software and deploying services in public clouds.

This wide adoption of cloud computing is largely driven by the successful deployment of a number of enabling technologies currently subject to extensive research, including data center virtualization, cloud networking, data storage and management, the MapReduce programming model, resource management, energy management, security, and privacy.

Data Center Virtualization—One of the main characteristics of cloud computing is that the infrastructure (e.g., data centers) is often shared by multiple tenants (e.g., service providers) running applications with different resource requirements and performance objectives. Hence, there is an emerging trend toward virtualizing physical infrastructures, that is, virtualizing not only servers but also data center networks. Similar to server virtualization, network virtualization aims at creating multiple virtual networks on top of a shared physical network, allowing each tenant to implement and manage its virtual network independently of the others. This raises the question of how virtualized data center resources should be allocated and managed by each tenant.

Cloud Networking—To ensure predictable performance over the cloud, it is of utmost importance to design efficient networks that are able to provide guaranteed performance and to scale with the ever-growing traffic volumes in the cloud. Therefore, extensive research work is needed on designing new data center network architectures that enhance performance, fault tolerance, and scalability. Furthermore, the advent of software-defined networking (SDN) technology brings new opportunities to redesign cloud networks. Thanks to the programmability offered by this technology, it is now possible to dynamically adapt the configuration of the network based on the workload in order to achieve cloud providers’ objectives in terms of performance, utilization, survivability, and energy efficiency.

Data Storage and Management—As mentioned previously, one of the key driving forces for cloud computing is the need to process large volumes of data in a scalable and efficient manner. As cloud data centers typically consist of commodity servers with limited storage and processing capacities, it is necessary to develop distributed storage systems that support efficient retrieval of desired data. At the same time, as failures are common in commodity machine-based data centers, the distributed storage system must also be resilient to failures. This usually implies that each file block must be replicated on multiple machines. This raises challenges regarding how the distributed storage system should be designed to achieve availability and high performance while ensuring file replicas remain consistent over time.

MapReduce Programming Model—Cloud computing has become the most cost-effective technology for hosting Internet-scale applications. Companies like Google and Facebook generate enormous volumes of data on a daily basis that need to be processed in a timely manner. To meet this requirement, cloud providers use computational models such as MapReduce. However, despite its success, the adoption of MapReduce has implications on the management of cloud workload and cluster resources, which is still largely unstudied. In particular, many challenges pertaining to MapReduce job scheduling, task and data placement, resource allocation, and sharing require further exploration.

Resource Management—Resource management has always been a central theme of cloud computing. Given the large variety of applications running in the cloud, it is a challenging problem to determine how each application should be scheduled and managed in a scalable and dynamic manner. The scheduling of individual application components can be formulated as a variant of the multidimensional vector bin-packing problem, which is NP-hard in the general case. Furthermore, different applications may have different scheduling needs. Therefore, finding a scheduling scheme that satisfies diverse application scheduling requirements is a challenging problem.

Energy Management—Data centers consume tremendous amounts of energy, not only for powering up the servers and network devices but also for cooling down these components to prevent overheating. It has been reported that energy cost accounts for 15% of the average data center operation expenditure. At the same time, such large energy consumption also raises environmental concerns regarding the carbon emissions of energy generation. As a result, improving data center energy efficiency has become a primary challenge for today’s data center operators.

Security and Privacy—Security is another major concern of cloud computing. While security is not a critical concern in many private clouds, it is often a key barrier to the adoption of cloud computing in public clouds. Specifically, since service providers typically do not have access to the physical security system of data centers, they must rely on cloud providers to achieve full data security. The cloud provider, in this context, must provide solutions to achieve the following objectives: (1) confidentiality, for secure data access and transfer, and (2) auditability, for attesting whether the security settings of applications have been tampered with or not.

Despite the wide adoption of cloud computing in industry, the current cloud technologies are still far from unleashing their full potential. In fact, cloud computing was known as a buzzword for several years, and many IT companies were uncertain about how to make successful investments in cloud computing. With the recent adoption in industry and academia, cloud computing is evolving rapidly, with advancements in almost all aspects, ranging from data center architectural design, scheduling and resource management, server and network virtualization, data storage, programming frameworks, energy management, pricing, and service connectivity to security and privacy.

The goal of this book is to provide a general introduction to cloud services, networking, and management. We first provide an overview of cloud computing, describing its key driving forces, characteristics, and enabling technologies. Then we focus on the different characteristics of cloud computing systems and the key research challenges that are covered in the subsequent fourteen chapters of this book. Specifically, the chapters delve into several topics related to cloud services, networking, and management, including virtualization and SDN technologies, intra- and inter-data-center network architectures, resource, performance, and energy management in the cloud, survivability, fault tolerance, and security, mobile cloud computing, and cloud applications, notably big data, scientific, and multimedia applications. We hope that readers find this journey through Cloud Services, Networking, and Management inspirational and informative.

Nelson L. S. da Fonseca
Raouf Boutaba


CONTRIBUTORS

Hadi Bannazadeh, Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada

Marinho P. Barcellos, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

Joseph Betser, The Aerospace Corporation, El Segundo, CA, USA

Luiz F. Bittencourt, Institute of Computing, State University of Campinas, Campinas, São Paulo, Brazil

Raouf Boutaba, D.R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada

Pascal Bouvry, Faculty of Science, Technology and Communications, University of Luxembourg, Luxembourg City, Luxembourg

Otto Carlos M. B. Duarte, Grupo de Teleinformática e Automação (GTA/UFRJ), PEE/COPPE - DEL/Poli, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil

Rafael Pereira Esteves, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

Thomas Fahringer, Institute for Computer Science, University of Innsbruck, Innsbruck, Austria

Lyno Henrique G. Ferraz, Grupo de Teleinformática e Automação (GTA/UFRJ), PEE/COPPE - DEL/Poli, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil

Nelson L. S. da Fonseca, Institute of Computing, State University of Campinas, Campinas, São Paulo, Brazil

Luciano P. Gaspary, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

Fabrizio Granelli, Department of Information Engineering and Computer Science, University of Trento, Trento, Trentino, Italy

Lisandro Zambenedetti Granville, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

Simon Gwendal, Telecom Bretagne, Institut Mines-Telecom, Paris, France

Myron Hecht, The Aerospace Corporation, El Segundo, CA, USA

Dijiang Huang, School of Information Technology and Engineering, Arizona State University, Tempe, AZ, USA

Matthias Janetschek, Institute for Computer Science, University of Innsbruck, Innsbruck, Austria

B. Kantarci, Department of Electrical and Computer Engineering, Clarkson University, Potsdam, New York, USA

Dzmitry Kliazovich, Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg City, Luxembourg

Alberto Leon-Garcia, Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada

Marin Litoiu, School of Information Technology, York University, Toronto, Ontario, Canada

Seng W. Loke, Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia

Hongbin Lu, Department of Computer Science and Engineering, York University, Toronto, Ontario, Canada

Edmundo R. M. Madeira, Institute of Computing, State University of Campinas, Campinas, São Paulo, Brazil

Daniel S. Marcon, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

Diogo M. F. Mattos, Grupo de Teleinformática e Automação (GTA/UFRJ), PEE/COPPE - DEL/Poli, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil

Deep Medhi, Computer Science and Electrical Engineering Department, University of Missouri-Kansas City, Kansas City, MO, USA

H. T. Mouftah, School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada

Rodrigo R. Oliveira, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

Simon Ostermann, Institute for Computer Science, University of Innsbruck, Innsbruck, Austria

Karine Pires, Telecom Bretagne, Institut Mines-Telecom, Paris, France

Radu Prodan, Institute for Computer Science, University of Innsbruck, Innsbruck, Austria

Haiyang Qian, China Mobile Technology, Milpitas, CA, USA

Karl Reed, Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia

Javeria Samad, Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia

Mark Shtern, Department of Computer Science and Engineering, York University, Toronto, Ontario, Canada

Bradley Simmons, School of Information Technology, York University, Toronto, Ontario, Canada

Michael Smit, School of Information Management, Dalhousie University, Halifax, Nova Scotia, Canada

Juliano Araujo Wickboldt, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

Tianyi Xing, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA

Zhengyang Xiong, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA

Qi Zhang, Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada

Mohamed Faten Zhani, Department of Software and IT Engineering, École de technologie supérieure, University of Quebec, Montreal, Canada


PART I

BASIC CONCEPTS AND ENABLING TECHNOLOGIES


1 CLOUD ARCHITECTURES, NETWORKS, SERVICES, AND MANAGEMENT

Raouf Boutaba1 and Nelson L. S. da Fonseca2

1D.R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada

2Institute of Computing, State University of Campinas, Campinas, São Paulo, Brazil

1.1 INTRODUCTION

With the wide availability of high-bandwidth, low-latency network connectivity, the Internet has enabled the delivery of rich services such as social networking, content delivery, and e-commerce at unprecedented scales. This technological trend has led to the development of cloud computing, a paradigm that harnesses the massive capacities of data centers to support the delivery of online services in a cost-effective manner. In a cloud computing environment, the traditional role of service providers is divided into two: cloud providers, who own the physical data center and lease resources (e.g., virtual machines or VMs) to service providers; and service providers, who use resources leased from cloud providers to execute applications. By leveraging the economies of scale of data centers, cloud computing can provide significant reduction in operational expenditure. At the same time, it also supports new applications such as big-data analytics (e.g., MapReduce [1]) that process massive volumes of data in a scalable and efficient fashion. The rise of cloud computing has made a profound impact on the development of the IT industry in recent years. While large companies like Google, Amazon, Facebook, and Microsoft have developed their own cloud platforms and technologies, many small companies are also embracing cloud computing by leveraging open-source software and deploying services in public clouds.

However, despite the wide adoption of cloud computing in the industry, the current cloud technologies are still far from unleashing their full potential. In fact, cloud computing was known as a buzzword for several years, and many IT companies were uncertain about how to make successful investments in cloud computing. Fortunately, with significant attention from both industry and academia, cloud computing is evolving rapidly, with advancements in almost all aspects, ranging from data center architectural design, scheduling and resource management, server and network virtualization, data storage, programming frameworks, energy management, and pricing to service connectivity, security, and privacy.

The goal of this chapter is to provide a general introduction to cloud networking, services, and management. We first provide an overview of cloud computing, describing its key driving forces, characteristics, and enabling technologies. Then, we focus on the different characteristics of cloud computing systems and the key research challenges that are covered in the subsequent 14 chapters of this book. Specifically, the chapters delve into several topics related to cloud services, networking, and management, including virtualization and software-defined networking technologies, intra- and inter-data-center network architectures, resource, performance, and energy management in the cloud, survivability, fault tolerance, and security, mobile cloud computing, and cloud applications, notably big data, scientific, and multimedia applications.

1.2 PART I: INTRODUCTION TO CLOUD COMPUTING

1.2.1 What Is Cloud Computing?

Despite being widely used in different contexts, a precise definition of cloud computing is rather elusive. In the past, there were dozens of attempts to provide an accurate yet concise definition of cloud computing [2]. However, most of the proposed definitions only focus on particular aspects of cloud computing, such as the business model and technology (e.g., virtualization) used in cloud environments. Due to this lack of consensus, for years cloud computing was considered a buzzword or marketing hype aimed at getting businesses to invest more in their IT infrastructures. The National Institute of Standards and Technology (NIST) provided a relatively standard and widely accepted definition of cloud computing as follows: “cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” [3]

NIST further defined five essential characteristics, three service models, and four deployment models for cloud computing. The five essential characteristics include the following:

1. On-demand self-service, which states that a consumer (e.g., a service provider) can acquire resources based on service demand;


2. Broad network access, which states that cloud services can be accessed remotely from heterogeneous client platforms (e.g., mobile phones);

3. Resource pooling, where resources are pooled and shared by consumers in a multitenant fashion;

4. Rapid elasticity, which states that cloud resources can be rapidly provisioned and released with minimal human involvement;

5. Measured service, which states that resources are controlled (and possibly priced) by leveraging a metering capability (e.g., pay-per-use) that is appropriate to the type of the service.

These characteristics provide a relatively accurate picture of what cloud computing systems should look like. It should be mentioned that not every cloud computing system exhibits all five characteristics listed earlier. For example, in a private cloud, where the service provider owns the physical data center, the metering capability may not be necessary because there is no need to limit resource usage of the service unless it is reaching data center capacity limits. However, despite the definition and aforementioned characteristics, cloud computing can still be realized in a large number of ways, and hence one may argue the definition is still not precise enough. Today, cloud computing commonly refers to a computing model where services are hosted using resources in data centers and delivered to end users over the Internet. In our opinion, since cloud computing technologies are still evolving, seeking the precise definition of cloud computing at the current moment may not be the right approach. Perhaps once the technologies have reached maturity, the true definition will naturally emerge.

1.2.2 Why Cloud Computing?

In this section, we present the motivation behind the development of cloud computing. We will also compare cloud computing with other parallel and distributed computing models and highlight their differences.

1.2.2.1 Key Driving Forces. There are several driving forces behind the success of cloud computing. The increasing demand for large-scale computation and big data analytics, together with economics, are the most important ones, but other factors such as easy access to computation and storage, flexibility in resource allocation, and scalability also play important roles.

Large-scale computation and big data: Recent years have witnessed the rise of Internet-scale applications. These applications range from social networks (e.g., Facebook, Twitter), video applications (e.g., Netflix, YouTube), and enterprise applications (e.g., Salesforce, Microsoft CRM) to personal applications (e.g., iCloud, Dropbox). These applications are commonly accessed by large numbers of users over the Internet. They are extremely large scale and resource intensive. Furthermore, they often have high performance requirements such as response time. Supporting these applications requires extremely large-scale infrastructures. For instance, Google has hundreds of compute clusters deployed worldwide with hundreds of thousands of servers. Another salient characteristic is that these applications also require access to huge volumes of data. For instance, Facebook stores tens of petabytes of data and processes over a hundred terabytes per day. Scientific applications (e.g., brain image processing, astrophysics, ocean monitoring, and DNA analysis) are also increasingly deployed in the cloud. Cloud computing emerged in this context as a computing model designed for running large applications in a scalable and cost-efficient manner by harnessing massive resource capacities in data centers and by sharing data center resources among applications in an on-demand fashion.

Economics: To support large-scale computation, cloud providers rely on inexpensive commodity hardware offering better scalability and a better performance/price ratio than supercomputers. By deploying a very large number of commodity machines, they leverage economies of scale, bringing per-unit cost down and allowing for incremental growth. On the other hand, cloud customers such as small and medium enterprises, which outsource their IT infrastructure to the cloud, avoid upfront infrastructure investment costs and instead benefit from a pay-as-you-go pricing and billing model. They can deploy their services in the cloud and make them quickly available to their own customers, resulting in short time to market. They can start small, scale their infrastructure up and down based on their customers’ demand, and pay based on usage.

Scalability: By harnessing huge computing and storage capabilities, cloud computing gives customers the illusion of infinite resources on demand. Customers can start small and scale resources up and down as needed.

Flexibility: Cloud computing is highly flexible. It allows customers to specify their resource requirements in terms of CPU cores, memory, storage, and networking capabilities. Customers are also offered the flexibility to customize the resources in terms of operating systems and possibly network stacks.

Easy access: Cloud resources are accessible from any device connected to the Internet. These devices can be traditional workstations and servers or less traditional devices such as smart phones, sensors, and appliances. Applications running in the cloud can be deployed or accessed from anywhere at any time.

1.2.2.2 Relationship with Other Computing Models. Cloud computing is not a completely new concept and has many similarities with existing distributed and parallel computing models such as Grid computing and cluster computing. But cloud computing also has some distinguishing properties that explain why existing models are not used and justify the need for a new one. These can be explained along two dimensions: scale and service orientation. Both parallel computing and cloud computing are used to solve large-scale problems, often by subdividing these problems into smaller parts and carrying out the calculations concurrently on different processors. In the cloud, this is achieved using computational models such as MapReduce. However, while parallel computing relies on expensive supercomputers and massively parallel multiprocessor machines, cloud computing uses cheap, easily replaceable commodity hardware. Grid computing uses supercomputers but can also use commodity hardware, all accessible through open, general-purpose protocols and interfaces, and distributed management and job scheduling middleware. Cloud computing differs from Grid computing in that it provides high bandwidth between machines, which is more suitable for I/O-intensive applications such as log analysis, Web crawling, and big-data analytics. Cloud computing also differs from Grid computing in that resource management and job scheduling are centralized under a single administrative authority (the cloud provider) and, unless this evolves differently in the future, provides no standard application programming interfaces (APIs). But perhaps the most distinguishing feature of cloud computing compared to previous computing models is its extensive reliance on virtualization technologies, which allow for efficient sharing of resources while guaranteeing isolation between multiple cloud tenants. Regarding the second dimension, unlike other computing models, which are designed for supporting applications and are mainly application-oriented, cloud computing extensively leverages service orientation, providing everything (infrastructure, development platforms, software, and applications) as a service.

1.2.3 Architecture

Generally speaking, the architecture of a cloud computing environment can be divided into four layers: the hardware/data center layer, the infrastructure layer, the platform layer, and the application layer, as shown in Figure 1.1. We describe each of them in detail in the text that follows:

The hardware layer: This layer is responsible for managing the physical resources of the cloud, including physical servers, routers, switches, and power and cooling systems. In practice, the hardware layer is typically implemented in data centers. A data center usually contains thousands of servers that are organized in racks and interconnected through switches, routers, or other fabrics. Typical issues at the hardware layer include hardware configuration, fault tolerance, traffic management, and power and cooling resource management.

[Figure 1.1 illustrates the resources managed at each layer: data centers at the hardware layer; computation (VM) and block storage at the infrastructure layer, offered as infrastructure as a service (IaaS) by providers such as Amazon EC2, GoGrid, and Flexiscale; software frameworks (Java/Python/.Net) and DB/file storage at the platform layer, offered as platform as a service (PaaS) by providers such as Microsoft Azure, Google App Engine, and Amazon SimpleDB/S3; and business applications, web services, and multimedia at the application layer, offered as software as a service (SaaS) to end users by providers such as Google Apps, Facebook, YouTube, and Salesforce.com.]

Figure 1.1. Typical architecture in a cloud computing environment.


The infrastructure layer: Also known as the virtualization layer, the infrastructure layer creates a pool of storage and computing resources by partitioning the physical resources using virtualization technologies such as Xen [4], KVM [5], and VMware [6]. The infrastructure layer is an essential component of cloud computing, since many key features, such as dynamic resource assignment, are only made available through virtualization technologies.

The platform layer: Built on top of the infrastructure layer, the platform layer consists of operating systems and application frameworks. The purpose of the platform layer is to minimize the burden of deploying applications directly into VM containers. For example, Google App Engine operates at the platform layer to provide API support for implementing the storage, database, and business logic of typical Web applications.

The application layer: At the highest level of the hierarchy, the application layer consists of the actual cloud applications. Different from traditional applications, cloud applications can leverage automatic scaling to achieve better performance, availability, and lower operating cost.

Compared to traditional service hosting environments such as dedicated server farms, the architecture of cloud computing is more modular. Each layer is loosely coupled with the layers above and below, allowing each layer to evolve separately. This is similar to the design of the protocol stack model for network protocols. This architectural modularity allows cloud computing to support a wide range of application requirements while reducing management and maintenance overhead.

1.2.4 Cloud Services

Cloud computing employs a service-driven business model. In other words, hardware and platform-level resources are provided as services on an on-demand basis. Conceptually, every layer of the architecture described in the previous section can be implemented as a service to the layer above. Conversely, every layer can be perceived as a customer of the layer below. However, in practice, clouds offer services that can be grouped into three categories: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS).

1. Infrastructure as a service: IaaS refers to on-demand provisioning of infrastructural resources, usually in terms of VMs. The cloud owner who offers IaaS is called an IaaS provider.

2. Platform as a service: PaaS refers to providing platform layer resources, including operating system support and software development frameworks.

3. Software as a service: SaaS refers to providing on-demand applications over the Internet.

The business model of cloud computing is depicted in Figure 1.2. According to the layered architecture of cloud computing, it is entirely possible that a PaaS provider runs its cloud on top of an IaaS provider’s cloud. However, in current practice, IaaS and PaaS providers are often parts of the same organization (e.g., Google). This is why PaaS and IaaS providers are often called cloud providers [7].

[Figure 1.2 depicts this business model: end users access the service provider (SaaS) through a Web interface, and the service provider in turn obtains utility computing from the infrastructure provider (IaaS, PaaS).]

Figure 1.2. Cloud computing business model.

1.2.4.1 Types of Clouds. There are many issues to consider when moving an enterprise application to the cloud environment. For example, some enterprises are mostly interested in lowering operation cost, while others may prefer high reliability and security. Accordingly, there are different types of clouds, each with its own benefits and drawbacks:

• Public clouds: A cloud in which cloud providers offer their resources as services to the general public. Public clouds offer several key benefits to service providers, including no initial capital investment on infrastructure and shifting of risks to cloud providers. However, current public cloud services still lack fine-grained control over data, network, and security settings, which hampers their effectiveness in many business scenarios.

• Private clouds: Also known as internal clouds, private clouds are designed for exclusive use by a single organization. A private cloud may be built and managed by the organization or by external providers. A private cloud offers the highest degree of control over performance, reliability, and security. However, private clouds are often criticized for being similar to traditional proprietary server farms and for not providing benefits such as the absence of up-front capital costs.

• Hybrid clouds: A hybrid cloud is a combination of the public and private cloud models that tries to address the limitations of each approach. In a hybrid cloud, part of the service infrastructure runs in private clouds while the remaining part runs in public clouds. Hybrid clouds offer more flexibility than both public and private clouds. Specifically, they provide tighter control and security over application data compared to public clouds, while still facilitating on-demand service expansion and contraction. On the downside, designing a hybrid cloud requires carefully determining the best split between its public and private cloud components.

• Community clouds: A community cloud refers to a cloud infrastructure that is shared between multiple organizations that have common interests or concerns. Community clouds rely on this common interest and the limited set of participants to achieve an efficient, reliable, and secure design of the cloud infrastructure.

The private cloud has always been the most popular type of cloud. Indeed, the development of cloud computing was largely due to the need to build data centers for hosting large-scale online services owned by large private companies, such as Amazon and Google. Subsequently, realizing that the cloud infrastructure could be leased to other companies for profit, these companies developed public cloud services. This development has also led to the creation of hybrid clouds and community clouds, which represent different alternatives for sharing cloud resources among service providers. In the future, it is believed that the private cloud will remain the dominant cloud computing model. This is because, as online services continue to grow in scale and complexity, it becomes increasingly beneficial to build private cloud infrastructure to host these services. In this case, private clouds not only provide better performance and manageability than public clouds but also reduced operation cost. As the initial capital investment on a private cloud can be amortized across a large number of machines over many years, in the long term a private cloud typically has lower operational cost than a public cloud.

1.2.4.2 SMEs’ Survey on Cloud Computing. The European Network and Information Security Agency (ENISA) has conducted a survey on the adoption of the cloud computing model by small and medium enterprises (SMEs). The survey provides an excellent overview of the benefits and limitations of today’s cloud technologies. In particular, the survey found that the main reason for adopting cloud computing is to reduce total capital expenditure on software and hardware resources. Furthermore, most of the enterprises prefer a mixture of cloud computing models (public cloud, private cloud), which comes as no surprise since each type of cloud has its own benefits and limitations. Regarding the type of cloud services, IaaS, PaaS, and SaaS all received similar scores, even though SaaS is slightly favored compared to the other two. Lastly, data availability, privacy, and confidentiality are the main concerns of all the surveyed enterprises. As a result, it is not surprising that most of the enterprises prefer to have a disaster recovery plan when considering migration to the cloud. Based on these observations, cloud providers should focus more on improving the security and reliability aspects of cloud infrastructures, as they represent the main obstacles to the adoption of the cloud computing model by today’s enterprises.

1.2.5 Enabling Technologies

The success of cloud computing is largely driven by the successful deployment of its enabling technologies. In this section, we provide an overview of cloud enabling technologies and describe how they contribute to the development of cloud computing.


1.2.5.1 Data Center Virtualization. One of the main characteristics of cloud computing is that the infrastructure (e.g., data centers) is often shared by multiple tenants (e.g., service providers) running applications with different resource requirements and performance objectives. This raises the question of how data center resources should be allocated and managed by each service provider. A naive solution, implemented in the early days, is to allocate dedicated servers for each application. While this “bare-metal” strategy certainly worked in many scenarios, it also introduced many inefficiencies. In particular, if the server resource is not fully utilized by the application running on the server, the resource is wasted, as no other application has the right to acquire the resource for its own execution. Motivated by this observation, the industry has adopted virtualization in today’s cloud data centers. Generally speaking, virtualization aims at partitioning physical resources into virtual resources that can be allocated to applications in a flexible manner. For instance, server virtualization is a technology that partitions the physical machine into multiple VMs, each capable of running applications just like a physical machine. By separating logical resources from the underlying physical resources, server virtualization enables flexible assignment of workloads to physical machines. This not only allows workloads running on multiple VMs to be consolidated on a single physical machine, but also enables a technique called VM migration, which is the process of dynamically moving a VM from one physical machine to another. Today, virtualization technologies are widely used by cloud providers such as Amazon EC2, Rackspace, and GoGrid. By consolidating workload onto fewer machines, server virtualization can deliver higher resource utilization and lower energy consumption compared to allocating dedicated servers for each application.
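
As an illustration of how VM migration is exposed in practice, the short Python sketch below triggers a live migration through the libvirt bindings; the VM name, destination host, and connection URIs are hypothetical, and a real orchestrator would add capacity checks and error handling before moving a guest.

import libvirt

SRC_URI = "qemu:///system"            # hypervisor currently hosting the VM
DST_URI = "qemu+ssh://host2/system"   # destination hypervisor (hypothetical)

def live_migrate(vm_name):
    src = libvirt.open(SRC_URI)
    dst = libvirt.open(DST_URI)
    try:
        dom = src.lookupByName(vm_name)
        # VIR_MIGRATE_LIVE copies memory pages while the guest keeps
        # running, so downtime is limited to the final switch-over.
        dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)
    finally:
        dst.close()
        src.close()

live_migrate("web-vm-01")             # hypothetical VM name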

Another type of data center virtualization that has been largely overlooked in the past is network virtualization. Cloud applications today are becoming increasingly data-intensive. As a result, there is a pressing need to determine how data center networks should be shared by multiple tenants with diverse performance, security, and manageability requirements. Motivated by these limitations, there is an emerging trend toward virtualizing data center networks in addition to server virtualization. Similar to server virtualization, network virtualization aims at creating multiple virtual networks (VNs) on top of a shared physical network substrate, allowing each VN to be implemented and managed independently. By separating logical networks from the underlying physical network, it is possible to implement network resource guarantees and to introduce customized network protocols, security, and management policies. Combined with server virtualization, a fully virtualized data center supports allocation in the form of virtual infrastructures, or VIs (also known as virtual data centers, VDCs), which consist of VMs interconnected by virtual networks. The scheduling and management of VIs has been studied extensively in recent years. Commercial cloud providers are also pushing in this direction. For example, the Amazon Virtual Private Cloud (VPC) already provides limited features to support network virtualization in addition to server virtualization.

1.2.5.2 Cloud Networking. To ensure predictable performance over the cloud, it is of utmost importance to design efficient networks that are able to provide guaranteed performance and to scale with the ever-growing traffic volumes in the cloud. Traditional data center network architectures suffer from many limitations that may hinder the performance of large-scale cloud services. For instance, the widely used tree-like topology does not provide multiple paths between the nodes, and hence limits the scalability of the network and the ability to mitigate node and link congestion and failures. Moreover, current technologies like Ethernet and VLANs are not well suited to support cloud computing requirements like multitenancy or performance isolation between different tenants/applications. In recent years, several research works have focused on designing new data center network architectures to overcome these limitations and enhance performance, fault tolerance, and scalability (e.g., VL2 [8], PortLand [9], NetLord [10]). Furthermore, the advent of software-defined networking (SDN) technology brings new opportunities to redesign cloud networks [11]. Thanks to the programmability offered by this technology, it is now possible to dynamically adapt the configuration of the network based on the workload. It also makes it easy to implement policy-based network management schemes in order to achieve cloud providers’ objectives in terms of performance, utilization, survivability, and energy efficiency.
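
As a toy illustration of the workload-driven reconfiguration that SDN enables, the sketch below picks the least congested of several candidate paths between two racks; the path names and per-link load figures are invented, and an actual controller would then install the corresponding forwarding rules (e.g., via OpenFlow) rather than print a result.

# Candidate paths between two racks and the current utilization (0..1)
# of each link along them; names and numbers are hypothetical.
PATHS = {
    "spine-1": [0.82, 0.91, 0.78],
    "spine-2": [0.35, 0.44, 0.29],
    "spine-3": [0.55, 0.61, 0.40],
}

def least_congested(paths):
    # A path is only as good as its most loaded link (its bottleneck),
    # so pick the path whose bottleneck utilization is lowest.
    return min(paths, key=lambda name: max(paths[name]))

best = least_congested(PATHS)
print(best)   # -> "spine-2"; a controller would now steer flows here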

1.2.5.3 Data Storage and Management. As mentioned previously, one of the key driving forces for cloud computing is the need to process large volumes of data in a scalable and efficient manner. As cloud data centers typically consist of commodity servers with limited storage and processing capacities, it is necessary to develop distributed storage systems that support efficient retrieval of desired data. At the same time, as failures are common in commodity machine-based data centers, the distributed storage system must also be resilient to failures, which usually implies that each file block must be replicated on multiple machines. This raises challenges regarding how the distributed storage system should be designed to achieve availability and high performance while ensuring file replicas remain consistent over time. Unfortunately, the famous CAP theorem [12] states that simultaneously achieving all three objectives (consistency, availability, and robustness to network partitions) is not possible. As a result, many storage systems, such as the Google File System [13], Amazon Dynamo [14], and Cassandra [15], explore various trade-offs among the three objectives based on applications’ needs. For example, Amazon Dynamo adopts an eventual consistency model that allows replicas to be temporarily out of sync. By sacrificing consistency, Dynamo is able to achieve significant improvement in server response time. These storage systems provide the foundations for building the large-scale data-intensive applications commonly found in today’s cloud data centers.
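
The minimal sketch below illustrates the quorum flavor of this trade-off (requiring W write acknowledgments and R read acknowledgments with R + W > N replicas), in the spirit of Dynamo-style stores; the in-memory replica dictionaries and version counters are purely illustrative and not the API of any real system.

N, W, R = 3, 2, 2   # N replicas; W write acks; R read acks; R + W > N

# each "replica" is a plain dict mapping key -> (version, value)
replicas = [dict() for _ in range(N)]

def put(key, value, version):
    # return as soon as a write quorum of W replicas has acknowledged;
    # the remaining replicas converge later (eventual consistency)
    for rep in replicas[:W]:
        rep[key] = (version, value)

def get(key):
    # read R replicas and keep the freshest version seen; because
    # R + W > N, any read quorum overlaps the latest write quorum
    answers = [rep.get(key, (0, None)) for rep in replicas[:R]]
    return max(answers)[1]

put("user:42", "alice", version=1)
print(get("user:42"))   # -> 'alice'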

1.2.5.4 MapReduce Programming Model. Cloud computing has become the most cost-effective technology for hosting Internet-scale applications. Companies like Google and Facebook generate enormous volumes of data on a daily basis that need to be processed in a timely manner. To meet this requirement, cloud providers use computational models such as MapReduce [1] and Dryad [16]. In these models, a job spawns many small tasks that can be executed concurrently on multiple machines, resulting in significant reduction in job completion time. Furthermore, to cope with software and hardware exceptions frequent in large-scale clusters, these models provide built-in fault tolerance features that automatically restart failed tasks when exceptions occur. As a result, these computational models are very attractive not only for running data-intensive jobs but also for computation-intensive applications. The MapReduce model, in particular, is largely used nowadays in cloud infrastructures for supporting a wide range of applications and has been adapted to several computing and cluster environments. Despite this success, the adoption of MapReduce has implications on the management of cloud workload and cluster resources, which is still largely unstudied. In particular, many challenges pertaining to MapReduce job scheduling, task and data placement, resource allocation, and sharing are yet to be addressed.
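
To make the model concrete, here is a minimal single-process word-count sketch written in the MapReduce style; a real framework would run many map and reduce tasks in parallel across machines and transparently restart any that fail.

from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield (word, 1)              # emit an intermediate (key, value) pair

def reduce_phase(pairs):
    groups = defaultdict(list)       # "shuffle": group values by key
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

docs = ["the cloud", "the data center"]
intermediate = [kv for doc in docs for kv in map_phase(doc)]
print(reduce_phase(intermediate))    # {'the': 2, 'cloud': 1, 'data': 1, 'center': 1}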

1.2.5.5 Resource Management. Resource management has always been a central theme of cloud computing. Given the large variety of applications running in the cloud, it is a challenging problem to determine how each application should be scheduled and managed in a scalable and dynamic manner. The scheduling of individual application components can be formulated as a variant of the multidimensional vector bin-packing problem, which is already NP-hard in the general case. Furthermore, different applications may have different scheduling needs. For example, individual tasks of a single MapReduce job can be scheduled independently over time, whereas the servers of a three-tier Web application must be scheduled simultaneously to ensure service availability. Therefore, finding a scheduling scheme that satisfies diverse application scheduling requirements is a challenging problem. The recent work on multi-framework scheduling (e.g., Mesos [17]) provides a platform that allows various scheduling frameworks, such as MapReduce, Spark, and MPI, to coexist in a single cloud infrastructure. The work on distributed schedulers (e.g., Omega [18] and Sparrow [19]) also aims at improving the scalability of schedulers by having multiple schedulers perform scheduling in parallel. These technologies provide the functionality to support a wide range of workloads in cloud data center environments.
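
As an illustration of the bin-packing view of scheduling, the sketch below applies the classic first-fit-decreasing heuristic to two-dimensional (CPU, memory) VM demands; the host capacity and demand figures are made-up numbers, and production schedulers layer many more constraints on top of such heuristics.

HOST_CAP = (16.0, 64.0)   # hypothetical host capacity: (CPU cores, memory GB)

def first_fit_decreasing(vms):
    # sort VMs by total normalized demand, largest first
    order = sorted(vms, key=lambda v: v[0] / HOST_CAP[0] + v[1] / HOST_CAP[1],
                   reverse=True)
    free = []         # remaining (cpu, mem) on each opened host
    placement = {}    # VM demand -> host index
    for cpu, mem in order:
        for i, (fc, fm) in enumerate(free):
            if cpu <= fc and mem <= fm:          # first host with room wins
                free[i] = (fc - cpu, fm - mem)
                placement[(cpu, mem)] = i
                break
        else:                                    # nothing fits: open a new host
            free.append((HOST_CAP[0] - cpu, HOST_CAP[1] - mem))
            placement[(cpu, mem)] = len(free) - 1
    return placement, len(free)

demands = [(8, 32), (4, 16), (8, 16), (2, 8)]    # (cores, GB) per VM
print(first_fit_decreasing(demands))             # two hosts suffice here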

1.2.5.6 Energy Management. Data centers consume tremendous amounts of energy, not only for powering the servers and network devices but also for cooling these components to prevent overheating. It has been reported that energy costs account for 15% of the average data center's operating expenditure. At the same time, such large energy consumption also raises environmental concerns regarding the carbon emissions of energy generation. As a result, improving data center energy efficiency has become a primary concern for today's data center operators. A widely used metric for measuring the energy efficiency of data centers is power usage effectiveness (PUE), which is computed as the ratio of the total power consumed by the data center to the power consumed by the computing infrastructure alone. Even though no existing data center can achieve the ideal PUE value of 1.0, many cloud data centers today have become very energy efficient, with PUE values below 1.1.
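Stated as a formula (the standard definition of the metric):

$$ \mathrm{PUE} \;=\; \frac{P_{\text{total facility}}}{P_{\text{IT equipment}}} \;\geq\; 1 $$

For example, a facility that draws 1.1 MW in total while its servers, storage, and network gear consume 1.0 MW has a PUE of 1.1/1.0 = 1.1; the closer the ratio is to 1.0, the less power is lost to cooling and power distribution overhead.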

There are many techniques for improving data center energy efficiency. At the infrastructure level, many cloud providers leverage nearby renewable energy sources (e.g., solar and wind) to reduce energy costs and carbon footprint. At the same time, it is also possible to leverage environmental conditions (e.g., low ambient temperatures) to reduce cooling costs. For example, Facebook recently announced the construction of a cloud data center in Sweden, right on the edge of the Arctic Circle, mainly because the low air temperature can reduce cooling costs. The Net-Zero Energy Data Center developed by HP Labs leverages locally generated renewable energy and workload demand management techniques to significantly reduce the energy required to operate data centers. We believe the rapid development of cloud energy management techniques will continue to push data center energy efficiency towards the ideal PUE value of 1.0.

1.2.5.7 Security and Privacy. Security is another major concern in cloud computing. While security is not a critical concern in many private clouds, it is often a key barrier to the adoption of public clouds. Specifically, since service providers typically do not have access to the physical security systems of data centers, they must rely on cloud providers to achieve full data security. The cloud provider, in this context, must achieve the following objectives: (1) confidentiality, for secure data access and transfer, and (2) auditability, for attesting whether the security settings of applications have been tampered with. Confidentiality is usually achieved using cryptographic protocols, whereas auditability can be achieved using remote attestation techniques. Remote attestation typically requires a trusted platform module (TPM) to generate a nonforgeable system summary (i.e., the system state signed using a TPM private key) as proof of system security. However, in a virtualized environment like the cloud, VMs can dynamically migrate from one location to another, so directly using remote attestation is not sufficient. In this case, it is critical to build trust mechanisms at every architectural layer of the cloud. First, the hardware layer must be trusted using a hardware TPM. Second, the virtualization platform must be trusted using secure VM monitors. VM migration should only be allowed if both the source and destination servers are trusted. Recent work has been devoted to designing efficient protocols for trust establishment and management.
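As a conceptual illustration of this attestation flow, the toy sketch below has an attester produce a keyed digest of its system state that a verifier checks against an expected measurement. It is only a stand-in: a real TPM signs PCR measurements with an asymmetric attestation key, whereas the sketch uses an HMAC to stay short, and all names and values are invented:

```python
import hashlib
import hmac

# Toy illustration of remote attestation: the attester returns a keyed digest
# ("quote") of its system state; the verifier recomputes and compares it.
# A real TPM signs PCR values with an asymmetric attestation key; HMAC is
# used here only to keep the sketch short. All names are illustrative.

TPM_KEY = b"secret-key-held-in-hardware"      # never leaves the TPM in reality

def attest(system_state: bytes, nonce: bytes) -> bytes:
    digest = hashlib.sha256(nonce + system_state).digest()
    return hmac.new(TPM_KEY, digest, hashlib.sha256).digest()

def verify(expected_state: bytes, nonce: bytes, quote: bytes) -> bool:
    return hmac.compare_digest(attest(expected_state, nonce), quote)

nonce = b"fresh-verifier-nonce"               # prevents replay of old quotes
quote = attest(b"kernel=v5.4,vmm=xen-4.14", nonce)
print(verify(b"kernel=v5.4,vmm=xen-4.14", nonce, quote))   # True
print(verify(b"kernel=tampered", nonce, quote))            # False
```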

1.3 PART II: RESEARCH CHALLENGES—THE CHAPTERS IN THIS BOOK

This book covers the fundamentals of cloud services, networking, and management, and focuses on the most prominent research challenges that have drawn the attention of the IT community in the past few years. Each of the 14 chapters of this book provides an overview of some of the key architectures, features, and technologies of cloud services, networking, and management systems, and highlights state-of-the-art solutions and possible research gaps. The chapters of the book are written by knowledgeable authors who were carefully selected based on their expertise in the field. Each chapter went through a rigorous review process, involving external reviewers, the book editors Raouf Boutaba and Nelson Fonseca, and the series editors Tom Plevyak and Veli Sahin. In the following, we briefly describe the topics covered by the different chapters of this book.

1.3.1 Virtualization in the Cloud

Virtualization is one of the key enabling technologies that made the cloud computing model a reality. Initially, virtualization technologies allowed partitioning a physical server into multiple isolated environments called VMs, which may host different operating systems and be used by different users or applications. As cloud computing evolved, virtualization technologies have matured and have been extended to consider not only the partitioning of servers but also the partitioning of networking resources (e.g., links, switches, and routers). Hence, it is now possible to provide each cloud user with a VI encompassing VMs, virtual links, and virtual routers and switches. In this context, several challenges arise, especially regarding the management of the resulting virtualized environment, where different types of resources are shared among multiple users.

In this chapter, the authors outline the main characteristics of these virtualized infrastructures and shed light on the different management operations that need to be implemented in such environments. They then summarize the ongoing efforts towards defining open standard interfaces to support virtualization and interoperability in the cloud. Finally, the chapter provides a brief overview of the main open-source cloud management platforms that have recently emerged.

1.3.2 VM Migration

One of the powerful features brought by virtualization is the ability to easily migrate VMs within the same data center or even between geographically distributed data centers. This feature provides unprecedented flexibility to network and data center operators, allowing them to perform several management tasks such as dynamically optimizing resource allocations, improving fault tolerance, consolidating workloads, avoiding server overload, and scheduling maintenance activities. Despite all these benefits, VM migration induces several costs, including higher utilization of computing and networking resources, inevitable service downtime, security risks, and more complex management challenges. As a result, a large number of migration techniques have recently been proposed in the literature to minimize these costs and make VM migration a more effective and secure tool in the hands of cloud providers.

This chapter starts by providing an overview of VM migration techniques. It then presents XenFlow, a tool based on Xen and OpenFlow that allows deploying, isolating, and migrating VIs. Finally, the authors discuss potential security threats that can arise when using VM migration.

1.3.3 Data Center Networks and Relevant Standards

Today’s cloud data centers are housing hundreds of thousands of machines that con-tinuously need to exchange tremendous amounts of data with stringent performancerequirements in terms of bandwidth, delay, jitter, and loss rate. In this context, the datacenter network plays a central role to ensure a reliable and efficient communicationbetween machines, and thereby guarantee continuous operation of the data center andeffective delivery of the cloud services. A data center network architecture is typicallydefined by the network topology (i.e., the way equipment are inter-connected) as wellas the adopted switching, routing, and addressing schemes and protocols (e.g., Ethernetand IP).


Traditional data center network architectures suffer from several limitations and are not able to satisfy the new application requirements spawned by the cloud computing model in terms of scalability, multitenancy, and performance isolation. For instance, the widely used tree-like topology does not provide multiple paths between nodes, and hence limits the ability to survive node and link failures. Also, current switches have limited forwarding table sizes, making it difficult for traditional data center networks to handle the large number of VMs that may exist in virtualized cloud environments. Another issue is performance isolation between tenants, as there is no bandwidth allocation mechanism in place to ensure predictable network performance for each of them.

To cope with these limitations, a lot of attention has been devoted in the past few years to studying the performance of existing architectures and to designing better solutions. This chapter dwells on these solutions, covering the data center network architectures, topologies, routing protocols, and addressing schemes that have been recently proposed in the literature.

1.3.4 Interdata Center Networks

In recent years, cloud providers have largely relied on large-scale cloud infrastructures to support Internet-scale applications efficiently. Typically, these infrastructures are composed of several geographically distributed data centers connected through a backbone network (i.e., an inter-data center network). In this context, a key challenge facing cloud providers is to build cost-effective backbone networks while taking into account several considerations and requirements, including scalability, energy efficiency, resilience, and reliability. To address this challenge, many factors should be considered. The scalability requirement stems from the fact that the volume of data exchanged between data centers is growing exponentially with the ever-increasing demand in cloud environments. The energy efficiency requirement concerns how to minimize the energy consumption of the infrastructure; meeting it is not only crucial to making the infrastructure greener and more environmentally friendly but also essential to cutting down operational expenses. Finally, the resilience of the inter-data center network is fundamental to maintaining continuous and reliable cloud services.

This chapter investigates the different possible alternatives to design and manage cost-efficient cloud backbones. It then presents mathematical formulations and heuristic solutions that could be adopted to achieve the desired objectives in terms of energy efficiency, resilience, and reliability. Finally, the authors discuss open issues and key research directions related to this topic.

1.3.5 OpenFlow and SDN for Clouds

The past few years have witnessed the rise of SDN, a technology that makes it possible to dynamically configure and program networking elements. Combined with cloud computing technologies, SDN enables the design of highly dynamic, efficient, and cost-effective shared application platforms that can support the rapid deployment of Internet applications and services.


This chapter discusses the challenges faced when integrating SDN technology into cloud application platforms. It first provides a brief overview of the fundamental concepts of SDN, including OpenFlow technology and tools like Open vSwitch. It also introduces the cloud platform OpenStack with a focus on its Networking Service (i.e., the Neutron project), and shows how cloud computing environments can benefit from SDN technology to provide guaranteed networking resources within a data center and to interconnect data centers. The authors also review major open source efforts that attempt to integrate SDN technology into cloud management platforms (e.g., the OpenDaylight open source project) and discuss the notion of software-defined infrastructure (SDI).

1.3.6 Mobile Cloud Computing

Mobile cloud computing has recently emerged as a new paradigm that combines cloud computing with mobile network technology, with the goal of putting the scalability and limitless resources of the cloud into the hands of mobile service and application providers. However, despite its potential benefits, the growth of mobile cloud computing in recent years has been hampered by several technical challenges and risks. These challenges and risks are mainly due to the inherent limitations of mobile devices, such as the scarcity of resources, the limited energy supply, and the intermittent connectivity in wireless networks, as well as security and legal/environmental risks.

This chapter starts by providing an overview of mobile cloud computing application models and frameworks. It also defines risk management and identifies and analyzes prevalent risk factors found in mobile cloud computing environments. The authors also present an analysis of mobile cloud frameworks from a risk management perspective and discuss the effectiveness of traditional risk approaches in addressing mobile cloud computing risks.

1.3.7 Resource Management and Scheduling

Resource allocation and scheduling are two crucial functions in cloud computing environments. Generally speaking, cloud providers are responsible for allocating resources (e.g., VMs) with the goal of satisfying the promised service-level agreements (SLAs) while increasing their profit. This can be achieved by reducing operational costs (e.g., energy costs) and sharing resources among the different users. On the other side, cloud users are responsible for application scheduling, which aims at mapping tasks from applications submitted by users to computational resources in the system. The goals of scheduling include maximizing the usage of the leased resources and minimizing costs by dynamically adjusting the leased resources to the demand, while maintaining the required quality of service.

Resource allocation and scheduling are both vital to cloud users and providers, but each has its own specifics, challenges, and potentially conflicting objectives. This chapter starts with a review of the different cloud types and service models and then discusses the typical objectives of cloud providers and their clients. The chapter also provides mathematical formulations of the VM allocation and application scheduling problems. It surveys some of the existing solutions and discusses their strengths and weaknesses. Finally, it points out the key research directions pertaining to resource management in cloud environments.

1.3.8 Autonomic Performance Management for Multi-Clouds

The growing popularity of the cloud computing model has led to the emergence of multiclouds, or clouds of clouds, where multiple cloud systems are federated together to further improve and enhance cloud services. Multiclouds have several benefits, ranging from improving availability to reducing lock-in and optimizing costs beyond what can be achieved within a single cloud. At the same time, multiclouds bring new challenges in terms of the design, development, deployment, monitoring, and management of multi-tier applications able to capitalize on the advantages of such distributed infrastructures. As a matter of fact, the responsibility for addressing these challenges is shared among cloud providers and cloud users depending on the type of service (i.e., IaaS, PaaS, and SaaS) and the SLAs. For instance, from an IaaS cloud provider's perspective, management focuses mainly on maintaining the infrastructure, allocating resources requested by clients, and ensuring their high availability. By contrast, cloud users are responsible for implementing, deploying, and monitoring applications running on top of resources that are possibly leased from several providers. In this context, a compelling challenge that is currently attracting a lot of attention is how to develop sophisticated tools that simplify the process of deploying, managing, monitoring, and maintaining large-scale applications over multiclouds.

This chapter focuses on this particular challenge and provides a detailed overview of the design and implementation of XCAMP, the X-Cloud Application Management Platform, which automates application deployment and management in multitier clouds. It also highlights key research challenges that require further investigation in the context of performance management and monitoring in distributed cloud environments.

1.3.9 Energy Management

Cloud computing environments mainly consist of data centers where thousands of servers and other systems (e.g., power distribution and cooling equipment) consume tremendous amounts of energy. Recent reports have revealed that energy costs represent more than 12% of total data center operational expenditures, which translates into millions of dollars. More importantly, high energy consumption is usually synonymous with a high carbon footprint, raising serious environmental concerns and pushing governments to put in place more stringent regulations to protect the environment. Consequently, reducing energy consumption has become one of the key challenges facing today's data center managers. Recently, a large body of work has been dedicated to investigating possible techniques to achieve more energy-efficient and environmentally friendly infrastructures. Many solutions have been proposed, including dynamic capacity provisioning and optimal usage of renewable sources of energy (e.g., wind and solar power).


This chapter further details the trends in energy management solutions for cloud data centers. It first surveys energy-aware resource scheduling and allocation schemes aimed at improving energy efficiency, and then provides a detailed description of GreenCloud, an energy-aware cloud data center simulator.

1.3.10 Survivability and Fault Tolerance in the Cloud

Despite the success of cloud computing, its widespread and full-scale adoption has been hampered by the lack of strict guarantees on the reliability and availability of the offered resources and services. Indeed, outages, failures, and service disruptions can be fatal for many businesses. Not only do they incur significant revenue loss (as much as hundreds of thousands of dollars per minute for some services), but they may also hurt the business's reputation in the long term and affect customer loyalty and satisfaction. Unfortunately, major cloud providers such as Amazon, Google, and Rackspace are not yet able to satisfy the high availability and reliability levels required for such critical services.

Consequently, a growing body of work has attempted to address this problem and to propose solutions that improve the reliability of cloud services and eventually provide more stringent guarantees to cloud users. This chapter provides a comprehensive literature survey on this particular topic. It first lays out cloud computing and survivability-related concepts, and then covers recent studies that analyzed and characterized the types of failures found in cloud environments. Subsequently, the authors survey and compare the solutions put forward to enhance cloud services' fault tolerance and to guarantee high availability of cloud resources.

1.3.11 Cloud Security

Security has always been a key issue for cloud-based services, and several solutions have been proposed to protect the cloud from malicious attacks. In particular, intrusion detection systems (IDS) and intrusion prevention systems (IPS) have been widely deployed to improve cloud security and have recently been empowered with new technologies like SDN to further enhance their effectiveness. For instance, SDN technology has been leveraged to dynamically reconfigure the cloud network and services and better protect them from malicious traffic. In this context, this chapter introduces FlowIPS, an OpenFlow-based IPS solution for intrusion prevention in cloud environments. FlowIPS implements SDN-based control functions based on Open vSwitch (OVS) and provides novel network reconfiguration (NR) features by programming POX controllers. Finally, the chapter presents a performance evaluation of FlowIPS that demonstrates its efficiency compared to traditional IPS solutions.

1.3.12 Big Data on Clouds

Big data has emerged as a term that describes the challenges related to the manipulation of large amounts of data, including data collection, storage, processing, analysis, and visualization.


This chapter articulates some of the success enablers for deploying Big Data on Clouds (BDOC). It starts by providing some historical perspectives and by describing emerging Internet services and applications. It then describes some legal issues related to big data on clouds. In particular, it highlights emerging hybrid big data management roles, such as development and operations (DevOps) and site reliability engineering (SRE). Finally, the chapter discusses science, technology, engineering, and mathematics (STEM) talent cultivation and engagement as an enabler of technical succession and future success for global enterprises running big data on clouds.

1.3.13 Scientific Applications on Clouds

To cope with the requirements of scientific applications, cloud providers have recently proposed new coordination and management tools and services (e.g., Amazon Simple Workflow, or SWF) to automate and streamline the task processes executed by cloud applications. Such services allow specifying the dependencies between tasks and their order of execution, and make it possible to track the progress and current state of each task. In this context, a compelling challenge is to ensure compatibility between existing workflow systems and to enable the reuse of scientific legacy code.

This chapter presents a software engineering solution that allows the scientific workflow community to use the Amazon cloud via a single front-end converter. In particular, it describes a wrapper service for executing legacy code using Amazon SWF. The chapter also presents experimental results demonstrating that the SWF application automatically generated by the wrapper provides performance comparable to a native, manually optimized workflow.

1.3.14 Interactive Multimedia Applications on Clouds

The booming popularity of cloud computing has led to the emergence of a large array of new applications such as social networking, gaming, live streaming, TV broadcasting, and content delivery. For instance, cloud gaming allows direct on-demand access to games whose content is stored in the cloud and streamed directly to end users through thin clients. As a result, less powerful game consoles or computers are needed, as most of the processing is carried out in the hosting cloud, leveraging its seemingly unlimited resources. Another prominent cloud application is massive user-generated content (UGC) live streaming, which allows any Internet user to become a TV or content provider. A similar application that has become extremely popular is time-shifting on-demand TV, as services like catch-up TV (i.e., the content of a TV channel is recorded for many days and can be requested on demand) and TV surfing (i.e., the possibility of pausing, forwarding, or rewinding a video stream) have recently become widely demanded. Naturally, the cloud is the ideal platform to host such services, as it provides the processing and storage capacity required to ensure a high quality of service. However, several challenges remain unaddressed, especially given the stringent performance requirements (e.g., delay) of such multimedia applications and the increasing amounts of traffic they generate.


This chapter discusses the deployment of these applications over the cloud. It starts by laying out content delivery models in general, and then provides a detailed study of the performance of three prominent multimedia cloud applications, namely cloud gaming, massive user-generated content live streaming, and time-shifting on-demand TV.

1.4 CONCLUSION

Editing and preparing a book on such an important topic is a challenging task requiring a lot of effort and time. As the editors of this book, we are grateful to the many individuals who contributed to its successful completion. We would like to thank the chapters' authors for their high-quality contributions, the reviewers for their insightful comments and feedback, and the book series editors for their support and guidance. Finally, we hope that the reader finds the topics and discussions presented in this book informative, interesting, and inspiring, and that they pave the way for designing new cloud platforms able to meet the requirements of future Internet applications and services.

REFERENCES

1. J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.

2. L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, "A break in the clouds: Towards a cloud definition," ACM SIGCOMM Computer Communication Review, vol. 39, no. 1, pp. 50–55, 2008.

3. P. Mell and T. Grance, "The NIST definition of cloud computing (draft)," NIST Special Publication, vol. 800, no. 145, p. 7, 2011.

4. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," ACM SIGOPS Operating Systems Review, vol. 37, no. 5, pp. 164–177, 2003.

5. A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori, "KVM: The Linux virtual machine monitor," in Proceedings of the Linux Symposium, vol. 1, Ottawa, Ontario, Canada, 2007, pp. 225–230.

6. F. Guthrie, S. Lowe, and K. Coleman, VMware vSphere Design. John Wiley & Sons, Indianapolis, IN, 2013.

7. A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, and I. Stoica, "Above the clouds: A Berkeley view of cloud computing," Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, Rep. UCB/EECS, vol. 28, p. 13, 2009.

8. A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. Maltz, P. Patel, and S. Sengupta, "VL2: A scalable and flexible data center network," in Proceedings of ACM SIGCOMM, Barcelona, Spain, August 2009.

9. R. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, "PortLand: A scalable fault-tolerant layer 2 data center network fabric," in Proceedings of ACM SIGCOMM, Barcelona, Spain, August 2009.

10. J. Mudigonda, P. Yalagandula, B. Stiekes, and Y. Pouffary, "NetLord: A scalable multi-tenant network architecture for virtualized datacenters," in Proceedings of ACM SIGCOMM, Toronto, Ontario, Canada, August 2011.

11. N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, "OpenFlow: Enabling innovation in campus networks," SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69–74, March 2008.

12. S. Gilbert and N. Lynch, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services," ACM SIGACT News, vol. 33, no. 2, pp. 51–59, 2002.

13. S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," in ACM SIGOPS Operating Systems Review, vol. 37, no. 5, pp. 29–43, 2003.

14. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's highly available key-value store," in ACM SIGOPS Operating Systems Review, vol. 41, no. 6, pp. 205–220, 2007.

15. A. Lakshman and P. Malik, "Cassandra: A decentralized structured storage system," ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010.

16. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: Distributed data-parallel programs from sequential building blocks," ACM SIGOPS Operating Systems Review, vol. 41, no. 3, pp. 59–72, 2007.

17. B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, "Mesos: A platform for fine-grained resource sharing in the data center," in Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, Boston, MA, 2011, pp. 22–22.

18. M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes, "Omega: Flexible, scalable schedulers for large compute clusters," in Proceedings of the 8th ACM European Conference on Computer Systems, Prague, Czech Republic. ACM, New York, 2013, pp. 351–364.

19. K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica, "Sparrow: Distributed, low latency scheduling," in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, Farmington, PA. ACM, New York, 2013, pp. 69–84.


2

VIRTUALIZATION IN THE CLOUD

Lisandro Zambenedetti Granville, Rafael Pereira Esteves, and Juliano Araujo Wickboldt

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

2.1 THE NEED FOR VIRTUALIZATION MANAGEMENT IN THE CLOUD

Cloud infrastructures are aggregates of computing, storage, and networking resources deployed in centralized or distributed data centers to support companies' applications or, in the case of services offered through the Internet, cloud customers' applications. Companies such as Google, Amazon, Facebook, and Microsoft rely on cloud infrastructures to support various services such as Web search, e-mail, social networking, and e-commerce. By leasing physical infrastructure to external customers, cloud providers encourage the development of novel services and, at the same time, generate revenue to cover the deployment and operation costs of clouds. Cloud resource sharing is thus critical to the cloud computing model.

To allow multiple customers, cloud providers rely on virtualization technologies to build virtual infrastructures (VIs) comprising logical instances of physical resources (e.g., servers, network, and storage). The provisioning of VIs must consider the requirements of both cloud providers and customers. While the main objective of cloud providers is to generate revenue by accommodating a large number of VIs, customers, in turn, have specific needs, such as storage capacity, high availability, processing power (usually represented by the number of leased virtual machines, or VMs), guaranteed bandwidth among VMs, and load balancing. Inefficiencies in the provisioning process can lead to negative consequences for cloud providers, including customer defection, financial penalties when service-level agreements (SLAs) are not satisfied, and low utilization of the physical infrastructure. In summary, the management of physical and virtual infrastructures is vital to enabling proper cloud resource sharing.

Current cloud provisioning systems allow customers to select among different resource configurations (e.g., CPU, memory, and disk) to build a VI. Customers are primarily responsible for choosing the resources that best fit their applications' needs. The cloud provider, in turn, either (a) allocates resources for the VI on physical data centers or (b) rejects the allocation if there are not enough resources to satisfy the customers' requirements. Cloud providers run allocation algorithms to find the best way to map VIs onto the physical substrate according to well-defined objectives, such as minimizing the allocation cost, reducing energy consumption, or maximizing the residual capacity of the infrastructure. Mapping virtual to physical resources is commonly referred to as embedding and has been extensively studied in the context of network virtualization [1–3].
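As a concrete, deliberately simplified illustration of embedding, the sketch below maps each virtual node of a VI request onto the physical host with the largest residual capacity that can still accommodate it, a common greedy baseline. It is a hypothetical one-dimensional sketch, not an algorithm from [1–3]; real embedding must also map virtual links onto substrate paths:

```python
# Minimal greedy VI-embedding sketch (illustrative only): place each virtual
# node on the host with the most spare capacity, a load-balancing objective.
# Link mapping is omitted. All names and numbers are invented.

def greedy_embed(virtual_nodes, physical_capacity):
    residual = dict(physical_capacity)
    mapping = {}
    for vnode, demand in sorted(virtual_nodes.items(), key=lambda kv: -kv[1]):
        host = max(residual, key=residual.get)     # host with most spare capacity
        if residual[host] < demand:
            return None                            # reject the VI request: no fit
        residual[host] -= demand
        mapping[vnode] = host
    return mapping

vi_request = {"web": 2, "app": 4, "db": 3}
substrate = {"hostA": 8, "hostB": 5}
print(greedy_embed(vi_request, substrate))
# {'app': 'hostA', 'db': 'hostB', 'web': 'hostA'}
```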

Embedding is an example of a network virtualization aspect that needs to be properly managed. Choosing the appropriate embedding algorithm, and deciding when it should be triggered (e.g., when new VI requests arrive or when on-the-fly optimizations of the physical substrate are needed), is a management activity that needs to be consciously performed by the cloud management operator or team. Other virtualization management aspects encompass operations such as monitoring, to detect abusive applications or customers; configuration, to tune the VI and physical substrate; and discovery, to identify collaborating VIs that would be better placed closer to one another. In addition to management operations, virtualization management requires an understanding of the diversity of target elements, because both the VIs and the physical substrate are quite heterogeneous in regard to the resources they use and offer. That impacts the management operations themselves, since, for example, monitoring and configuring a physical server can be quite different from monitoring and configuring network devices and traffic. Operations and target elements are thus two important dimensions of cloud virtualization management.

In this chapter, we cover the management of virtualization in the cloud. Our observations primarily take the perspective of cloud providers, who need to manage their substrate and hosted VIs to guarantee that the services offered to customers operate properly. Virtualization management is a quite new discipline because virtualization itself, at least as it is employed today, is also quite recent. Management is achieved by borrowing techniques from other areas, such as network management and parallel processing. We concentrate our discussion on the two management dimensions mentioned before, i.e., management operations and target elements. Although other dimensions do exist, we focus on operations and elements because they are the essential dimensions a cloud manager needs to take into account in the first place.

The remainder of this chapter is organized as follows. In Section 2.2, we review some basic concepts of virtualization in cloud computing environments. In Section 2.3, we describe the main elements of a virtualized cloud environment. In Section 2.4, we list the main virtualization-related operations that need to be supported by a cloud platform. In Section 2.5, we review some of the most important efforts towards the definition of open standard interfaces to support virtualization and interoperability in the cloud. In Section 2.6, we list some of the most important efforts currently targeted at building tools and systems for virtualization and cloud resource management. Finally, in Section 2.7, we list key challenges that can guide future developments in virtualization management and also mention some ongoing research in the field.

2.2 BASIC CONCEPTS

Clouds can be public, private, or hybrid. Public clouds offer resources to any interested tenant (e.g., Amazon EC2 and Windows Azure). Private clouds usually belong to a single organization, and only the members of that organization have access to the resources. Hybrid clouds are combinations of different types of cloud. For example, a private cloud that needs to temporarily extend its capacity can borrow resources from a public cloud, thus forming a hybrid cloud.

Cloud services are organized according to three basic business models. In infrastructure as a service (IaaS), cloud providers offer logical instances of physical resources, such as VMs, virtual storage, and virtual links, to interested tenants. In platform as a service (PaaS), tenants can request a computing platform including an operating system and a development environment. The software as a service (SaaS) model offers end applications (e.g., Google Drive and Dropbox) to customers. Other models, such as network as a service (NaaS), are also possible in the cloud but are not found as frequently in the literature [4].

Virtualization is the key technology enabling cloud computing. Virtualization abstracts the internal details of physical resources and enables resource sharing. Using virtualization, a physical resource (e.g., a server, router, or link) can be shared among different users or applications. The core of cloud computing environments is based on virtualized data centers, which are facilities consisting of computing servers, storage, network devices, cooling, and power systems.

Virtualization can be accomplished by different technologies according to the target element. Server virtualization, for example, relies on a layer of software called the hypervisor, also known as the VM monitor. The hypervisor is the component responsible for actually creating, removing, copying, migrating, and running VMs. Virtual links, in turn, can be created by configuring Ethernet VLANs between the physical nodes hosting the virtual ones. Multiprotocol label switching (MPLS) label switched paths (LSPs) and generic routing encapsulation (GRE) tunnels are other candidates for establishing virtual links.
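As a concrete illustration of the VLAN-based option, the sketch below creates an 802.1Q-tagged virtual interface on a Linux host using the standard iproute2 commands (invoked from Python for consistency with the other sketches). The interface name and VLAN ID are illustrative, and root privileges are required:

```python
import subprocess

# Create an 802.1Q-tagged virtual interface on a Linux host, one common way
# to realize a virtual link endpoint. Interface name and VLAN ID are
# illustrative; requires root privileges and the iproute2 tools.

def create_vlan_endpoint(parent_if: str, vlan_id: int) -> str:
    vif = f"{parent_if}.{vlan_id}"
    subprocess.run(["ip", "link", "add", "link", parent_if, "name", vif,
                    "type", "vlan", "id", str(vlan_id)], check=True)
    subprocess.run(["ip", "link", "set", vif, "up"], check=True)
    return vif

# Traffic sent on eth0.100 is tagged with VLAN 100, isolating it from other
# virtual links multiplexed over the same physical NIC.
print(create_vlan_endpoint("eth0", 100))
```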

Participants in the cloud play two main roles: the cloud provider, also known as the infrastructure provider, owns the physical resources that can be leased to one or more tenants, also known as service providers, who build VIs composed of virtual instances of computing, storage, and networking resources. A VI can also be referred to as a cloud slice. After the instantiation of a VI, tenants can deploy a variety of applications that rely on these virtual resources.


A cloud platform is software that allows tenants to request and instantiate VIs. Tenants can specify the amount of resources needed to build their VIs and the specific characteristics of each resource, such as CPU and memory for computing, disk size for storage, and bandwidth capacity for links. The cloud platform then interacts with the underlying virtualization software (hypervisor) to create and configure the VI. To facilitate resource management and allow interoperability, cloud platforms offer specific interfaces for applications running in VIs. Such interfaces define operations that can be executed on the cloud platform.

2.3 VIRTUALIZED ELEMENTS

As stated in previous sections, virtualization plays a key role in modern cloud computing environments by improving resource utilization and reducing costs. Typical elements that can be virtualized in a cloud computing environment include computing and storage. Recently, virtualization has also been extended to the networking domain and can overcome limitations of current cloud environments such as poor isolation and increased security risks [5]. In this section, we describe the main elements that can be virtualized in a cloud computing environment.

2.3.1 Computing

The virtualization of computing resources (e.g., CPU and memory) is achieved by server virtualization technologies (e.g., VMware, Xen, and QEMU) that allow multiple virtual machines (VMs) to be consolidated on a single physical one. The benefits of server virtualization for cloud computing include performance isolation, improved application performance, and enhanced security.

Cloud computing providers deploy their infrastructure in data centers comprising several virtualized servers interconnected by a network. In the IaaS model, VMs are instantiated and allocated to customers (i.e., tenants) on demand. Server virtualization adds flexibility to the cloud because VMs can be dynamically created, terminated, copied, and migrated to different locations without affecting existing tenants. In addition, the capacity of a VM (i.e., CPU, memory, and disk) can be adjusted to reflect changes in tenants' requirements without hardware changes.

Cloud operators have the flexibility to decide where to allocate VMs on the physical servers considering diverse criteria such as cost, energy consumption, and performance. In this regard, several VM allocation schemes that leverage VM flexibility to optimize resource utilization have been proposed in the literature [6–8, 10–12, 20].

2.3.2 Storage

Storage virtualization consists of grouping multiple (possibly heterogeneous) storage devices so that they are seen as a single virtual storage space. There are two main abstractions to represent storage virtualization in clouds: virtual volumes and virtual data objects. The virtualization of storage devices as virtual volumes is important in this context because it simplifies the task of assigning disks to VMs. Furthermore, many implementations also include the notion of virtual volume pools, which represent different sources of available virtualizable storage space from which virtual volumes can be allocated (e.g., separate local physical volumes or a remote Network File System, or NFS). On the other hand, cloud storage of virtual data objects enables the scalable and redundant creation and retrieval of data objects directly into/from the cloud. This abstraction is also often accompanied by the concept of containers, which in general serve to create a hierarchical structure of data objects similar to files and folders in any operating system.

Storage virtualization for both volumes and data objects is of utmost importance for enabling the elasticity property of cloud computing. For example, VMs can have their disk space adjusted dynamically to support changes in cloud application requirements. Such adjustment is too complex and dynamic to be performed manually; by virtualizing storage, cloud providers offer a uniform view to their users and reduce the need for manual provisioning. Also, with storage virtualization, cloud users do not need to know exactly where their data are stored. The details of which disks and partitions contain which objects or volumes are transparent to users, which also facilitates storage management for cloud providers.
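A toy object model of these two abstractions (all class and method names are invented for illustration) might look as follows:

```python
# Toy model of the two cloud storage abstractions described above. All class
# and method names are invented for illustration.

class VolumePool:
    """A source of allocatable space, e.g., a local disk or an NFS share."""
    def __init__(self, capacity_gb: int):
        self.free_gb = capacity_gb

    def allocate_volume(self, size_gb: int) -> "Volume":
        if size_gb > self.free_gb:
            raise RuntimeError("pool exhausted")
        self.free_gb -= size_gb
        return Volume(size_gb)

class Volume:
    """A virtual block device that can be attached to a VM as a disk."""
    def __init__(self, size_gb: int):
        self.size_gb = size_gb

class Container:
    """A flat namespace of data objects, similar to a folder of files."""
    def __init__(self):
        self.objects = {}                  # object name -> bytes

    def upload(self, name: str, data: bytes) -> None:
        self.objects[name] = data          # a real store would also replicate

pool = VolumePool(capacity_gb=500)
root_disk = pool.allocate_volume(40)       # e.g., a root disk for a new VM
images = Container()
images.upload("vm-image.qcow2", b"...")
```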

2.3.3 Networking

Cloud infrastructures rely on local and wide area networks to connect the physical resources (i.e., servers, switches, and routers) of their data centers. Such networks are still based on the current IP architecture, which has a number of problems. These problems are mainly related to the lack of isolation, which can allow one VI or application to interfere with another, resulting in poor performance or, even worse, in security problems. Another issue is the limited support for innovation, which hinders the development of new architectures that could better suit cloud applications.

To overcome the limitations of current network architectures, virtualization can also be extended to cloud networks. ISP network virtualization has been a hot topic of investigation in recent years [13, 14] and is now being considered in other contexts, such as cloud networking. As in virtualized ISP networks, in virtualized cloud networks multiple virtual networks (VNs) share a physical network and run isolated protocol stacks. A VN is part of a VI that comprises VN nodes (i.e., switches and routers) and virtual links.

The advantages of virtualizing cloud networks include network performance isolation, improved security, and the possibility of introducing new protocols and addressing schemes without disrupting production services. Figure 2.1 shows how virtualization can be tackled in cloud network infrastructures. In the substrate layer, physical nodes and links from different network administrative domains serve as a substrate for the deployment of VNs. Physical nodes, at the core of the physical networks, represent network devices (e.g., switches and routers) that internally run virtual (or logical) routers instantiated to serve VNs' routing needs.

[Figure 2.1. Virtualized cloud network infrastructure: a substrate layer of physical nodes and links supports a virtualization layer in which first- and second-level virtual nodes are interconnected by virtual links.]

In the virtualization layer, virtual nodes and links are created on top of the substrate and combined to build VNs. A VN can use resources from different sources, including resources from other VNs, which in this case results in a hierarchy of VNs. VNs can also be entirely placed into a single physical node (e.g., a physical end host). In this case, since virtual links are not running on top of any physical counterpart, isolation and performance guarantees should be offered, for example, through memory isolation and scheduling mechanisms. In another setup, VNs can spread across different adjacent physical infrastructures (i.e., different administrative domains). In this case, network operators, at the substrate layer, must cooperate to provide a consistent view of the underlying infrastructure used by networks from the virtualization layer.

2.3.4 Management

The management of cloud infrastructures plays a key role in allowing cloud providers to efficiently use resources and increase revenue. At the virtual level, each VI can operate its own management protocols, resource allocation schemes, and monitoring tools. For example, one tenant can use the Simple Network Management Protocol (SNMP) to manage his/her VIs, while another can use NETCONF or Web services.

Different resource allocation schemes tailored to specific cloud applications define how virtual resources are mapped in the data center. Adaptive, application-driven resource provisioning allows multiple tenants and a large diversity of applications to efficiently share a cloud infrastructure.

Monitoring is another management aspect that can be virtualized. Once a new VI is created, a set of monitoring tools needs to be configured [15] in order to start monitoring the computing resources that form the VI. The set of monitoring tool configurations and the corresponding monitored metrics is referred to as a monitoring slice. Every VI is coupled with a monitoring slice [16]. To monitor the computing resources that form VIs, cloud operators generally use, in their monitoring slices, tools with native support for cloud platforms.

2.4 VIRTUALIZATION OPERATIONS

Virtualization operations are structured according to the components described in the previous section: computing, storage, networking, and management. A non-exhaustive list of the main virtualization operations derived from existing cloud platforms [9, 17–19, 21] is given next; a minimal interface sketch follows the list.

• Computing (Virtual Machines)

° Create/Remove: defines/undefines the internal representation of a VM with its specified characteristics (e.g., CPU, memory, and guest image).

° Deploy/Undeploy: defines/undefines a VM within the hypervisor of a node of the cloud infrastructure, including the transfer/removal of the image file.

° Start/Stop/Suspend/Resume: basic operations to handle the state of the guest operating system.

° Migrate: undefines a VM on one node and defines it on another. The destination node needs to be specified.

° Modify: modifies the attributes of a VM.

° Snapshot: creates a snapshot of a VM.

° Restore: restores a VM from a snapshot.

° List: lists currently deployed VMs.

• Computing (Images)

° Create/Remove: defines/undefines a guest operating system image at the main repository of the platform, including the transfer/removal of the file.

• Storage (Virtual Volumes)

° Create/Remove: allocates/deletes chunks of storage on nodes.

° Attach Volume to VM: attaches a volume file to a given VM.

• Storage (Virtual Volume Pools)

° Create/Remove: defines/undefines a pool for storing virtual volumes (typically a local or remote/NFS directory).

° Add Volume: adds a virtual volume to a volume pool.

• Storage (Virtual Data Objects)

° Create/Remove: allocates/deletes storage for data objects on nodes.

° Upload/Download: transfers the actual data in and out of the cloud environment.

° Stream: sends the content out to the general public, sometimes even adopting massive-scale distribution employing concepts of Content Delivery Networks (CDNs).

• Storage (Virtual Containers)

° Create/Remove: defines/undefines a container for storing data objects.

° Add Data Object: adds a virtual data object to a container.

• Networking (Virtual Links)

° Create/Remove: defines/undefines the internal representation of a virtual link that connects point-to-point virtual interfaces of two virtual devices (i.e., VMs or virtual routers).

° Establish/Disable: establishes/disables the virtual link within the network, enabling/disabling traffic to flow between the connected devices.

° Configure: configures additional parameters of a virtual link (e.g., bandwidth).

• Networking (Virtual Routers)

° Create/Remove: defines/undefines the internal representation of a virtual router that has multiple virtual ports to interconnect multiple virtual interfaces of virtual devices.

° Deploy/Undeploy: deploys/undeploys the virtual router onto a node of the infrastructure.

° Add/Edit/Remove Routes: defines/modifies/undefines routes for a virtual router.

• Management (Virtual Devices)

° Monitor/Unmonitor: deploys/undeploys the monitoring infrastructure required to monitor a given virtual device.

° Get Monitoring Information: fetches monitoring information within the monitoring system for a given virtual device.

• Management (Events)

° Create/Remove: defines/undefines the internal representation of an event that belongs to a specific slice or operates in global scope.

° Deploy/Undeploy: deploys/undeploys the event on the monitoring infrastructure to be triggered on demand.

• Management (Physical)

° Discover Resources: this is actually a collection of operations to discover the nodes and network topology available on the infrastructure. This collection also retrieves information about resource allocation on these physical elements.

° Get Monitoring Information: fetches monitoring information within the monitoring system for a given physical device (e.g., a node or switch).
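To suggest how these operations might surface to application developers, the sketch below groups a few of them into an abstract Python interface. This is a hypothetical illustration, not the API of any platform cited above:

```python
from abc import ABC, abstractmethod

# Hypothetical grouping of a few of the operations listed above into an
# abstract platform-facing interface; not the API of any existing platform.

class ComputeService(ABC):
    @abstractmethod
    def create_vm(self, cpu: int, memory_mb: int, image: str) -> str: ...
    @abstractmethod
    def deploy(self, vm_id: str, node: str) -> None: ...
    @abstractmethod
    def migrate(self, vm_id: str, destination_node: str) -> None: ...
    @abstractmethod
    def list_vms(self) -> list: ...

class NetworkService(ABC):
    @abstractmethod
    def create_link(self, endpoint_a: str, endpoint_b: str) -> str: ...
    @abstractmethod
    def configure_link(self, link_id: str, bandwidth_mbps: int) -> None: ...

class MonitoringService(ABC):
    @abstractmethod
    def monitor(self, device_id: str) -> None: ...
    @abstractmethod
    def get_metrics(self, device_id: str) -> dict: ...
```

A concrete cloud platform would implement these interfaces on top of its hypervisor and network drivers; the standardization efforts discussed next aim at making such interfaces uniform across platforms.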

2.5 INTERFACES FOR VIRTUALIZATION MANAGEMENT

Today, there are many heterogeneous cloud platforms that support the provisioning of virtualized infrastructures under a plethora of different specifications and technologies. Each cloud provider chooses the platform that suits it best or designs its own platforms to provide differentiated services to its tenants. The problem with this heterogeneity is that it hinders interoperability and causes vendor lock-in for tenants. To allow the remote management of virtual elements, many platforms already offer specific interfaces (e.g., Amazon EC2/S3, Elastic Hosts, Flexiscale, Rackspace Cloud Servers, and VMware vSphere) to communicate with external applications.

To cope with this variety of technologies and support the development of platform-agnostic cloud applications, existing proposals follow basically two different approaches: (1) employing proxy-style APIs to communicate with multiple providers using a set of technology-specific adapters, and (2) creating standardized generic interfaces to be implemented by cloud platforms. The first approach has the drawback of introducing an additional layer of software in cloud systems, which results in overhead and increased latency. Nevertheless, there are libraries and tools of this kind that are widely employed, such as Apache Deltacloud and Libcloud, which are further discussed in the next section. The second approach, on the other hand, represents a more elegant solution to the problem by proposing some sort of lingua franca for communication among cloud systems. The problem with standardization is making participants agree on the same standard [22]. Ideally, a standardized interface should be open and extensible to allow widespread adoption by cloud management platforms and application developers. In this section, we review some of the most important efforts towards the definition of open standard interfaces to support virtualization and interoperability in the cloud.

2.5.1 Open Cloud Computing Interface

The Open Cloud Computing Interface (OCCI) [23] introduces a set of open, community-driven specifications to deal with cloud service resource management [24]. OCCI is supported by the Open Grid Forum and was originally conceived to create a remote management API for IaaS platforms, allowing interoperability for common tasks such as deploying, scaling, and monitoring virtual resources. Besides the definition of an open application programming interface (API), the specification also introduces a RESTful protocol for exchanging management information and actions. The current release of OCCI is no longer focused only on IaaS and includes other cloud business models, such as PaaS and SaaS.

The current version of the specification1 is designed to be modular and extensible; thus, it is split into three complementary documents. The OCCI Core document (GFD.183) provides the formal definition of the OCCI Core Model. This document also describes how the core model can be interacted with through renderings (including associated behaviors) and expanded through extensions. The second document, OCCI Infrastructure (GFD.184), contains the definition of the OCCI infrastructure extension for the IaaS domain. This document also defines additional resource types, their attributes, and the actions that each resource type can perform. The third document, OCCI HTTP Rendering (GFD.185), defines the means of interacting with the OCCI Core Model through the RESTful OCCI API. Moreover, this document defines how the OCCI Core Model can be communicated and serialized over HTTP.

1 As of the end of 2013, the current version of OCCI is v1.1 (released April 7, 2011).


The OCCI Infrastructure document describes the modeling of virtual resources in IaaS as three basic element types: (1) compute, which represents information processing resources; (2) storage, which is intended to handle information recording; and (3) network, which represents L2 networking elements (e.g., virtual switches). There is also an abstraction for creating links between resources. Links can be of two types, Network Interface or Storage Link, depending on the type of resource they connect. It is also possible to use this specification to define Infrastructure Templates, which are predefined virtual resource specifications (e.g., small, medium, and large VM configurations). Moreover, the OCCI HTTP Rendering document complements these definitions by specifying management operations, such as creating, retrieving, updating, and deleting virtual resources. The document also details general requirements for the transmission of information over HTTP, such as security and authentication.
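To give a flavor of the HTTP rendering, the sketch below shows how a client might create a compute resource over the RESTful OCCI API using Python. The endpoint URL is hypothetical and the header syntax follows our reading of the OCCI text rendering; GFD.185 remains the authoritative reference:

```python
import requests

# Sketch of creating a compute resource through the OCCI HTTP rendering.
# The endpoint and attribute values are illustrative; see GFD.185 for the
# authoritative rendering syntax.

OCCI_ENDPOINT = "https://cloud.example.org/compute/"     # hypothetical

headers = {
    "Content-Type": "text/occi",
    # The Category header identifies the kind of resource being created.
    "Category": 'compute; scheme="http://schemas.ogf.org/occi/infrastructure#"; class="kind"',
    # Attributes of the new compute resource.
    "X-OCCI-Attribute": "occi.compute.cores=2, occi.compute.memory=4.0",
}

response = requests.post(OCCI_ENDPOINT, headers=headers)
response.raise_for_status()
# On success, the server returns the URI of the new compute resource,
# typically in the Location header.
print(response.headers.get("Location"))
```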

OCCI is currently implemented in many popular cloud management platforms, such as OpenStack, OpenNebula, and Eucalyptus. There are also base implementations in several programming languages, such as rOCCI in Ruby and jclouds in Java, as well as automated compliance tests with doyouspeakOCCI. One particular effort aims to improve inter-cloud networking standardization by proposing an extension to OCCI called the Open Cloud Networking Interface (OCNI) [25]. There is also a reference implementation of OCNI called pyOCNI, written as a Python framework and including JSON serialization for resource representation.

2.5.2 Open Virtualization Format

The Open Virtualization Format (OVF) [26], currently in version 2.0.1, was introduced in late 2008 within the Virtualization Management initiative of the Distributed Management Task Force (DMTF), aiming to provide an open and extensible standard for the packaging and distribution of software to be run in VMs. Its main virtue is to allow portability of virtual appliances onto multiple platforms through so-called OVF Packages, which may contain one or more virtual systems. The OVF standard is not tied to any particular hypervisor or processor architecture. Nevertheless, it is easily extensible through the specification of vendor-specific metadata included in OVF Packages.

OVF Packages are a core concept of the OVF specification; a package consists of several files placed into one directory describing the structure of the packed virtual systems. An OVF Package includes one OVF Descriptor, which is an XML document containing metadata about the package contents, such as product details, virtual hardware requirements, and licensing. The OVF Package may also include certificates, disk image files, or ISO images to be attached to virtual systems.

Within an OVF Package, an Envelope Element describes all metadata for the VMs included in the package. Among this metadata, a detailed Virtual Hardware Description (based on CIM classes) can specify all types of virtual hardware resources required by a virtual system. This specification can be abstract or incomplete, allowing the virtualization platform to decide how to best satisfy the resource requirements, as long as the required virtual devices are deployed. Moreover, OVF Environment information can be added to define how the guest software and the deployment platform interact. This environment allows the guest software to access information about the deployment platform, such as the values specified for the properties defined in the OVF Descriptor.
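
To make the descriptor structure concrete, the sketch below parses the virtual hardware section of a toy OVF fragment using Python's standard library. The fragment is illustrative, not a complete, schema-valid OVF document; the namespaces and the CIM resource type codes (3 = processor, 4 = memory) follow the DMTF specifications.

```python
# A minimal sketch: reading the virtual hardware items of a toy OVF
# descriptor fragment (not a complete, valid OVF Package).
import xml.etree.ElementTree as ET

OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"
RASD_NS = ("http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/"
           "CIM_ResourceAllocationSettingData")

descriptor = f"""
<Envelope xmlns="{OVF_NS}" xmlns:rasd="{RASD_NS}">
  <VirtualSystem id="web-server">
    <VirtualHardwareSection>
      <Item>
        <rasd:ElementName>2 virtual CPUs</rasd:ElementName>
        <rasd:ResourceType>3</rasd:ResourceType>
        <rasd:VirtualQuantity>2</rasd:VirtualQuantity>
      </Item>
      <Item>
        <rasd:ElementName>4096 MB of memory</rasd:ElementName>
        <rasd:ResourceType>4</rasd:ResourceType>
        <rasd:VirtualQuantity>4096</rasd:VirtualQuantity>
      </Item>
    </VirtualHardwareSection>
  </VirtualSystem>
</Envelope>
"""

root = ET.fromstring(descriptor)
# Each Item is a CIM-based resource allocation (type 3 = CPU, 4 = memory)
for item in root.iter(f"{{{OVF_NS}}}Item"):
    name = item.findtext(f"{{{RASD_NS}}}ElementName")
    qty = item.findtext(f"{{{RASD_NS}}}VirtualQuantity")
    print(name, "->", qty)
```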

This standard is supported by many hypervisor implementations and has proven very useful for migrating virtual system information among hypervisors or platforms, since it allows precise description of VMs and virtual hardware requirements. However, it is not within the objectives of OVF to provide a detailed specification for complete VIs (i.e., detailing interconnections, communication requirements, and network elements).

2.5.3 Cloud Infrastructure Management Interface

The Cloud Infrastructure Management Interface (CIMI) [27] standard is another DMTF proposal, within the context of the Cloud Management initiative. This standard defines a model and a protocol for managing interactions between cloud IaaS providers and tenants. CIMI's main objective is to provide tenants with access to basic management operations on IaaS resources (VMs, storage, and networking), facilitating portability between different cloud implementations that support this standard. CIMI also specifies a RESTful protocol over HTTP using either JSON or XML formats to represent information and transmit management operations.

The model defined in CIMI includes basic types of virtualized resources, where Machine Resources are used to represent VMs, Volume Resources represent storage, and Network Resources represent VN devices and ports. Besides, CIMI also defines a Cloud Entry Point type of resource, which represents a catalog of virtual resources that can be queried by a tenant. A System Resource in this standard gathers one or more Network, Volume, or Machine Resources and can be operated as a single resource. Finally, a Monitoring Resource is also defined to track the progress of operations, metering, and monitoring of other virtual resources.

The protocol relies on basic HTTP operations (i.e., PUT, GET, DELETE, HEAD, and POST) and uses either JSON or XML to transmit the message body. To manipulate virtual resources, there are four basic create, read, update, and delete (CRUD) operations. It is also possible to extend the protocol by creating or customizing operations to manipulate the state of each particular resource. Moreover, the CIMI specification can also be integrated with OVF, in which case VMs represented as OVF Packages can be used to create Machine Resources or System Resources.
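
A hedged sketch of this protocol is shown below: creating a Machine by POSTing a JSON MachineCreate document, loosely following the CIMI 1.0 JSON rendering. The endpoint and the referenced configuration and image resources are assumptions for illustration.

```python
# A hedged sketch of creating a Machine through a CIMI provider's REST API.
# The base URL and the catalog references are illustrative placeholders.
import json
import requests

BASE = "http://cimi.example.org/cimi"  # assumed Cloud Entry Point base URL

machine_create = {
    "resourceURI": "http://schemas.dmtf.org/cimi/1/MachineCreate",
    "name": "demo-vm",
    "machineTemplate": {
        # References to pre-existing resources in the provider's catalog
        "machineConfig": {"href": BASE + "/machine_configs/small"},
        "machineImage": {"href": BASE + "/machine_images/ubuntu"},
    },
}

resp = requests.post(BASE + "/machines", data=json.dumps(machine_create),
                     headers={"Content-Type": "application/json"})
print(resp.status_code, resp.json().get("id"))  # new Machine resource id
```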

Today, implementations of the CIMI standard are not as commonly found as those of OCCI or OVF. One specific implementation that is worth noting is found within the Apache Deltacloud2 project, which exposes a CIMI REST API to communicate with external applications, supporting manipulation of the Machine and Volume Resource abstractions.

2 http://deltacloud.apache.org/cimi-rest.html

2.5.4 Cloud Data Management Interface

The Cloud Data Management Interface (CDMI) [28] is a standard specifically targeted at defining an interface to access cloud storage and to manage data objects. CDMI is
comparable to Amazon's S3 [29], with the fundamental difference that it is conceived by the Storage Networking Industry Association (SNIA) to be an open standard targeted for future ANSI and ISO certification. This standard also includes a RESTful API running over HTTP to allow accessing capabilities of cloud storage systems, allocating and managing storage containers and data objects, and handling user and group access rights, among other operations.

The CDMI standard defines a JSON-serializable interface to manage data stored in clouds based on several abstractions. Data objects are fundamental storage components analogous to files within a file system, which include metadata and value (contents). Container objects are intended to represent groupings of data, analogous to directories in regular file systems; this abstraction links together zero or more Data objects. Domain objects represent the concept of administrative ownership of data stored within cloud systems. This abstraction is very useful to facilitate billing, to restrict management operations to groups of objects, and to represent hierarchies of ownership. Queue objects provide first-in, first-out access to store or retrieve data from the cloud system. Queuing provides a simple mechanism for controlling concurrency when reading and writing Data objects in a reliable way. To facilitate interoperability, this standard also includes mechanisms for exporting data to other network storage platforms, such as iSCSI, NFS, and WebDAV.
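
The sketch below stores a Data object through CDMI's RESTful API. The endpoint, credentials, and the pre-existing "backups" Container object are assumptions; the headers and JSON body follow the CDMI 1.0.x conventions.

```python
# A hedged sketch of creating a CDMI Data object with an HTTP PUT. The
# server URL and credentials are placeholders; the "backups" container is
# assumed to already exist.
import json
import requests

BASE = "http://cdmi.example.org"  # assumed CDMI endpoint
HEADERS = {
    "X-CDMI-Specification-Version": "1.0.2",
    "Content-Type": "application/cdmi-object",
    "Accept": "application/cdmi-object",
}

# The value field carries the object contents; metadata is free-form
body = {"mimetype": "text/plain",
        "metadata": {"owner": "tenant-42"},
        "value": "hello, cloud storage"}

resp = requests.put(BASE + "/backups/hello.txt",
                    data=json.dumps(body), headers=HEADERS,
                    auth=("user", "password"))
print(resp.status_code)  # 201 indicates the Data object was created
```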

Regarding implementations, CDMI is also not commonly deployed in the most popular cloud management platforms. SNIA's Cloud Storage Technical Working Group (TWG) provides a Reference Implementation for the standard, which is currently a working draft and supports only version 1.0 of the specification. Some independent projects, such as the CDMI add-on for OpenStack Swift and CDMI-Serve in Python, have implemented basic support for the CDMI standard but show little recent activity.

Besides all the aforementioned efforts to create new standardized interfaces for virtual resource management in cloud environments, other approaches, protocols, and methods have been studied and may be of interest in particular situations [30]. Moreover, many organizations, such as OASIS, ETSI, ITU, NIST, and ISO, are currently engaged, through their cloud and virtualization related working groups, in developing standards and recommendations. We refer the interested reader to the wiki page Cloud-Standards.org3, maintained by the DMTF, to keep track of future standardization initiatives.

3 http://cloud-standards.org/

2.6 TOOLS AND SYSTEMS

In this section, we list some of the most important efforts currently targeted at building tools and systems for virtualization and cloud resource management. Initially, we describe open source cloud management platforms, which are complete solutions to deploy and operate private, public, or hybrid clouds. Afterwards, we discuss tools and libraries that perform specific operations for virtual resource management and cloud integration.

2.6.1 Open Source Cloud Management Platforms

2.6.1.1 Eucalyptus. Eucalyptus started as a research project in the Computer Science Department at the University of California, Santa Barbara, in 2007, within a project called the Virtual Grid Application Development Software Project (VGrADS) funded by the National Science Foundation. This is one of the first open source initiatives to build cloud management platforms that allow users to deploy their own private clouds [18]. Currently, Eucalyptus is in version 3.4 and comprises full integration with Amazon Web Services (AWS), including EC2, S3, Elastic Block Store (EBS), Identity and Access Management (IAM), Auto Scaling, Elastic Load Balancing (ELB), and CloudWatch, enabling both private and hybrid cloud deployments.

The Eucalyptus architecture is based on four high-level components: (1) the Node Controller executes at hosts and is responsible for controlling the execution of VM instances; (2) the Cluster Controller works as a front end at the cluster level (i.e., Availability Zone), managing VM execution and scheduling on Node Controllers, controlling cluster-level SLAs, and also managing VNs; (3) the Storage Controller exists both at the cluster level and at the cloud level (Walrus) and implements a put/get storage service based on Amazon's S3 interface, providing a mechanism for storing and accessing VM images and user data; and (4) the Cloud Controller is the entry point into the cloud for users and administrators; it implements an EC2-compatible interface and coordinates the other components to perform high-level tasks, such as authentication, accounting, reporting, and quota management.
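
Because the Cloud Controller speaks the EC2 API, ordinary EC2 client libraries can be pointed at a private Eucalyptus cloud. The sketch below uses the classic boto (2.x) library; the host, port, path, and credentials are illustrative assumptions (Eucalyptus front ends conventionally expose the EC2 API on port 8773 under /services/Eucalyptus).

```python
# A hedged sketch: using boto 2.x against Eucalyptus' EC2-compatible
# Cloud Controller. Endpoint and credentials are placeholders.
import boto
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(name="eucalyptus", endpoint="cloud.example.org")
conn = boto.connect_ec2(
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    region=region,
    is_secure=False,
    port=8773,
    path="/services/Eucalyptus",
)

# The same calls used against AWS EC2 work against the private cloud
for reservation in conn.get_all_instances():
    for instance in reservation.instances:
        print(instance.id, instance.state)
```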

For networking, Eucalyptus offers four operating modes: (1) Managed, in which the platform manages layer 2 and layer 3 VM isolation, employing a built-in DHCP service; this mode requires a switch that forwards a configurable range of VLAN-tagged packets; (2) Managed (no VLAN), in which only layer 3 VM isolation is possible; (3) Static, where there is no VM isolation and a built-in DHCP service is employed for static IP assignment; and (4) System, where there is also no VM isolation and, in this case, no automatic address handling, since Eucalyptus relies on an existing external DHCP service. In version 4.0, released in April 2014, Eucalyptus introduced new networking functionality through a new Edge Networking Mode.

The main technical characteristics of the Eucalyptus platform are the following:

• Programming Language: Written mostly in C and Java
• Compatibility/Interoperability: Fully integrated with AWS
• Supported Hypervisors: vSphere, ESXi, KVM, any AWS-compatible clouds
• Identity Management: Role-based access control mechanisms with Microsoft Active Directory or LDAP systems
• Resource Usage Control: Resource quotas for users and groups
• Networking: Basic support with four operating modes
• Monitoring: CloudWatch
• Version/Release: 3.4.1 (Released on December 16, 2013)
• License: GPL v3.0

2.6.1.2 OpenNebula. In its early days, OpenNebula was a research project at the Universidad Complutense de Madrid. The first version of the platform was released under an open source license in 2008 within the European Union's Seventh Framework Programme (FP7) project called RESERVOIR (Resources and Services Virtualization without Barriers, 2008–2011). Nowadays, OpenNebula (version 4.4 Retina, released December 3, 2013) is a feature-rich platform used mostly for the deployment of private clouds, but it is also capable of interfacing with other systems to work as a hybrid or public cloud environment.

OpenNebula is conceptually organized in a three-layered architecture [31]. At the top, the Tools layer comprises higher level functions, such as cloud-level VM scheduling, providing CLI and GUI access for both users and administrators, managing and supporting multi-tier services, elasticity and admission control, and exposing interfaces to external clouds through AWS and OCCI. At the Core layer, vital functions are performed, such as accounting, authorization, and authentication, as well as resource management for computing, storage, networking, and VM images. Also at this layer, the platform implements resource monitoring by retrieving information available from hypervisors to gather the updated status of VMs, and it manages federations, enabling access to remote cloud infrastructures, which can be either partner infrastructures governed by a similar platform or public cloud providers. At the bottom, the Drivers layer implements infrastructure and cloud drivers to provide an abstraction to communicate with the underlying devices or to enable access to remote cloud providers.
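
The Core layer is exposed to tools through an XML-RPC API (listed among the platform's characteristics below). The following sketch queries the VM pool with Python's standard library; the endpoint, credentials, and exact parameter conventions are assumptions taken from the API documentation of recent releases.

```python
# A hedged sketch of OpenNebula's XML-RPC API. The session string is
# conventionally "user:password"; endpoint and credentials are placeholders.
import xmlrpc.client

ENDPOINT = "http://frontend.example.org:2633/RPC2"  # default oned port
session = "oneadmin:password"

server = xmlrpc.client.ServerProxy(ENDPOINT)

# one.vmpool.info(session, filter, start_id, end_id, state):
# -3 = only this user's VMs, -1/-1 = whole id range, -1 = any state
success, body, _err = server.one.vmpool.info(session, -3, -1, -1, -1)
if success:
    print(body)  # an XML document describing the VM pool
```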

OpenNebula allows administrators to set up multiple zones and create federated VIs considering different federation paradigms (e.g., cloud aggregation, bursting, or brokering), in which case each zone operates its network configuration independently. From the user's viewpoint, setting up a network in the OpenNebula platform is restricted to the creation of a DHCP IP range that will be automatically configured in each VM. The administrator can change the way VMs connect to the physical ports of the host machine using one of many options, that is, VLAN 802.1Q to allow isolation, ebtables and Open vSwitch to permit the implementation of traffic filtering, and VMware VLANs, which isolate VMs running over the VMware hypervisor. It is also possible to deploy Virtual Routers from OpenNebula's Marketplace to work as an actual router, DHCP, or DNS server.

The main technical characteristics of the OpenNebula platform are the following:

• Programming Language: C++ (integration APIs in Ruby, Java, and Python)
• Compatibility/Interoperability: AWS, OCCI, and XML-RPC API
• Supported Hypervisors: KVM, Xen, and VMware
• Identity Management: Sunstone, EC2, OCCI, SSH, x509 certificates, and LDAP
• Resource Usage Control: Resource quotas for users and groups
• Networking: IP/DHCP ranges customizable by users; many options for administrators require manual configuration
• Monitoring: Internal, gathers information from hypervisors
• Version/Release: 4.4 Retina (Released on December 3, 2013)
• License: Apache v2.0

2.6.1.3 OpenStack. OpenStack started as a joint project between Rackspace Hosting and NASA around mid-2010, aiming to provide a cloud software solution to run over commodity hardware [19]. Right after the first official release (beginning of 2011), OpenStack was quickly adopted and packaged within many Linux distributions, such as Ubuntu, Debian, and Red Hat. Today, it is the cloud management platform with the most active community, counting more than 13,000 registered people from over 130 countries. OpenStack is currently developed in nine parallel core projects (plus four incubated), all coordinated by the OpenStack Foundation, which comprises 9,500 individuals and 850 different organizations.

The OpenStack architecture consists of a myriad of interconnected components, each one developed under a separate project, to deliver a complete cloud infrastructure management solution. Initially, only two components were present, Compute (Nova) and Object Storage (Swift), which respectively provide functionality for handling VMs and a scalable redundant object storage system. Adopting an incremental approach, incubated/community projects were gradually included in the core architecture, such as Dashboard (Horizon) to provide administration GUI access, Identity Service (Keystone) to support a central directory of users mapped to services, and Image Service (Glance) to allow discovery, registration, and delivery of disk and server images. The current release of OpenStack (Havana) includes advanced network configuration with Neutron, persistent block-level storage with Cinder, a single point of contact for billing systems through Ceilometer, and a service to orchestrate multiple composite cloud applications via Heat.

As for networking, a community project called Quantum started in April 2011 and was targeted at further developing the networking support of OpenStack by employing VN overlays in a Connectivity-as-a-Service perspective. From the Folsom release on, Quantum was added as a core project and renamed Neutron. Currently, this component lets administrators employ anything from basic networking configuration of IP addresses, allowing both dedicated static address assignment and DHCP, to complex configurations with software-defined networking (SDN) technology like OpenFlow. Moreover, Neutron allows the addition of plug-ins to introduce more complex functionality to the platform, such as quality of service, intrusion detection systems, load balancing, firewalls, and virtual private networks.
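
As a brief illustration, the sketch below creates a tenant network and an IPv4 subnet through Neutron's v2.0 REST API. The endpoint and token are assumptions; in a real deployment the token would be obtained from Keystone.

```python
# A hedged sketch of Neutron's v2.0 API (default port 9696). The controller
# address and the authentication token are illustrative placeholders.
import json
import requests

NEUTRON = "http://controller.example.org:9696/v2.0"
HEADERS = {"X-Auth-Token": "ADMIN_TOKEN", "Content-Type": "application/json"}

# Create an isolated tenant network
net = requests.post(NEUTRON + "/networks", headers=HEADERS,
                    data=json.dumps({"network": {"name": "demo-net"}})).json()
net_id = net["network"]["id"]

# Attach an IPv4 subnet with a DHCP-managed allocation range
subnet = {"subnet": {"network_id": net_id, "ip_version": 4,
                     "cidr": "10.0.0.0/24", "enable_dhcp": True}}
resp = requests.post(NEUTRON + "/subnets", headers=HEADERS,
                     data=json.dumps(subnet))
print(resp.status_code, resp.json())
```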

• Programming Language: Python
• Compatibility/Interoperability: Nova and Swift are feature-wise compatible with EC2 and S3 (applications need to be adapted, though); OCCI support (under development)
• Supported Hypervisors: QEMU/KVM over libvirt (fully supported), VMware and XenAPI (partially supported), many others at nonstable development stages
• Identity Management: Local database, EC2/S3, RBAC, token-based, SSL, x509 or PKI certificates, and LDAP
• Resource Usage Control: Configurable quotas per user (tenant) defined by each project
• Networking: Several options via the Neutron component, extensible with plug-ins
• Monitoring: Simple customizable dashboard relying on information provided by other components
• Version/Release: Havana (Released on October 17, 2013)
• License: Apache v2.0

2.6.1.4 CloudStack. CloudStack started as a project from a startup company called VMOps in 2008, later renamed Cloud.com, and was first released as open source in mid-2010. After Cloud.com was acquired by Citrix, CloudStack was relicensed under Apache 2.0 and incubated by the Apache Software Foundation in April 2012. Ever since, the project has developed a powerful cloud platform to orchestrate resources in highly distributed environments for both private and public cloud deployments [21].

CloudStack deployments are organized into two basic building blocks, a Management Server and a Cloud Infrastructure. The Management Server is a central point of configuration for the cloud (these servers might be clustered for reliability reasons). It provides a Web user interface and API access, manages the assignment of guest VMs to hosts, allocates public and private IP addresses to particular accounts, and manages images, among other tasks. A Cloud Infrastructure comprises distributed Zones (typically, data centers) hierarchically organized into Pods, Clusters, Hosts, and Primary and Secondary Storage. A CloudStack Cloud Infrastructure may also optionally include Regions (perhaps geographically distributed) to aggregate multiple Zones; each Region is controlled by a different set of Management Servers, turning the platform into a highly distributed and reliable system. Moreover, a separate Python tool called CloudMonkey is available to provide CLI and shell environments for interacting with CloudStack-based clouds.
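
The API access mentioned above is a query-style REST interface with signed requests. The sketch below builds a signed URL following the procedure described in the CloudStack developer documentation (sorted query string, lower-cased for signing, HMAC-SHA1, Base64); the endpoint and keys are illustrative placeholders.

```python
# A hedged sketch of signing a CloudStack API request. Fetching the
# returned URL would list the account's guest VMs as JSON.
import base64
import hashlib
import hmac
from urllib.parse import quote

API = "http://cloudstack.example.org:8080/client/api"
API_KEY = "APIKEY"
SECRET_KEY = "SECRETKEY"

def signed_url(command, **params):
    params.update({"command": command, "apiKey": API_KEY, "response": "json"})
    # Canonical form: lexicographically sorted query string
    query = "&".join("%s=%s" % (k, quote(str(params[k]), safe=""))
                     for k in sorted(params))
    # Signature: HMAC-SHA1 over the lower-cased query, then Base64
    digest = hmac.new(SECRET_KEY.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest), safe="")
    return "%s?%s&signature=%s" % (API, query, signature)

print(signed_url("listVirtualMachines"))
```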

CloudStack offers two types of networking configurations: (1) Basic, which is an AWS-style networking mode providing a single network where guest isolation can be achieved through layer 3 means, such as security groups; and (2) Advanced, where more sophisticated network topologies can be created. CloudStack also offers a variety of NaaS features, such as the creation of VPNs, firewalls, and load balancers. Moreover, this tool provides the ability to create a Virtual Private Cloud, which is a private, isolated part of CloudStack that can have its own VN topology. VMs in this VN can have any private addresses, since they are completely isolated from the others.

• Programming Language: Mostly Java
• Compatibility/Interoperability: CloudStack REST API (XML or JSON)
• Supported Hypervisors: XenServer/XCP, KVM, and/or VMware ESXi with vSphere
• Identity Management: Internal or LDAP
• Resource Usage Control: Separately installed Usage Server provides records for billing; resource limits per project
• Networking: Two operating modes, several networking-as-a-service options in advanced configurations
• Monitoring: Some performance indicators available through the API are displayed to users and administrators
• Version/Release: 4.2.0 (Released on October 1, 2013)
• License: Apache v2.0

2.6.2 Specific Tools and Libraries

The following describes some tools and libraries mainly designed to deal with the diversity of technologies involved in cloud virtualization. Unlike cloud platforms, these tools do not intend to offer a complete solution for cloud providers. Nevertheless, they play a key role in integration and allow applications to be written in a more generic manner in terms of virtual resource management.

2.6.2.1 Libcloud. Libcloud is a client Python library for interacting with the most popular cloud management platforms [32]. This library originally started being developed within Cloudkick (an extinct cloud monitoring software project, now part of Rackspace) and today is an independent free software project licensed under the Apache License 2.0. The main idea behind Libcloud is to create a programming environment that facilitates the task of building products that can be ported across a wide range of cloud environments. Therefore, much of the library is about providing a long list of drivers to communicate with different cloud platforms. Currently, Libcloud supports more than 26 different providers, including Amazon's AWS, OpenStack, OpenNebula, and Eucalyptus, just to mention a few.

Moreover, this library also provides a unified Python API, offering a set of common operations that are mapped to the appropriate calls to the remote cloud system. These operations are divided into four abstractions: (1) Compute, which enables operations for handling VMs (e.g., list/create/reboot/destroy VMs), and its extension Block Storage to manage volumes attached to VMs (e.g., create/destroy volumes, attach volume to VM); (2) Load Balancer, which includes operations for the management of load balancers as a service (e.g., create/list members, attach/detach member or compute node) and is available in some providers; (3) Object Storage, which offers operations for handling data objects in a cloud (list/create/delete containers or objects, upload/download/stream objects), and its extension for CDNs to assist providers that support these operations (e.g., enable CDN container or object, get CDN container or object URL); and (4) Domain Name System (DNS), which allows management operations for DNS as a service (e.g., list zones or records, create/update zone or record) in providers that support it, such as Rackspace Cloud DNS.
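
A minimal sketch of the Compute abstraction is shown below; the provider choice and credentials are placeholders, and the same calls work unchanged against any supported driver.

```python
# A minimal sketch of Libcloud's unified Compute API. Credentials and the
# chosen provider are illustrative assumptions.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

cls = get_driver(Provider.RACKSPACE)   # pick a provider-specific driver
driver = cls("username", "api-key")    # credential format depends on provider

# The same calls work against EC2, OpenStack, OpenNebula, Eucalyptus, ...
for node in driver.list_nodes():
    print(node.name, node.state, node.public_ips)

image = driver.list_images()[0]        # first available VM image
size = driver.list_sizes()[0]          # a hardware profile offered there
node = driver.create_node(name="demo", image=image, size=size)
```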

2.6.2.2 Deltacloud. Deltacloud follows a philosophy very similar to that of Libcloud. It is also an Apache Software Foundation project (it left incubation in October 2011 and is now a top-level project) and is similarly targeted at providing an intermediary layer to let applications communicate with several different cloud management platforms. Nevertheless, instead of providing a programming environment through a specific programming language, Deltacloud enables management of resources in different clouds by the use of one of three supported RESTful APIs [33]: (1) Deltacloud classic, (2) DMTF CIMI, and (3) Amazon's EC2.

Deltacloud implements drivers for more than 20 different providers and offers several operations divided into two main abstractions: (1) the Compute Driver, which includes operations for managing VMs, such as create/start/stop/reboot/destroy VM instances and list all/get details about hardware profiles, realms, images, and VM instances; and (2) the Storage Driver, providing operations similar to Amazon S3 to manage data objects stored in clouds, such as create/update/delete buckets (analogous to folders), create/update/delete blobs (analogous to data files), and read/write blob data and attributes.
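
The sketch below exercises the Deltacloud classic REST API. It assumes a local deltacloudd daemon on its default port (3001) running the mock driver; the URL and credentials are illustrative.

```python
# A hedged sketch against the Deltacloud classic API with HTTP Basic auth.
import requests

API = "http://localhost:3001/api"
AUTH = ("mockuser", "mockpassword")       # mock driver's sample credentials
HEADERS = {"Accept": "application/json"}

# Enumerate all VM instances visible to this account
instances = requests.get(API + "/instances", auth=AUTH, headers=HEADERS).json()
for inst in instances.get("instances", []):
    print(inst["id"], inst["state"])

# Launch a new instance from an image id advertised by the back-end cloud
resp = requests.post(API + "/instances", auth=AUTH, headers=HEADERS,
                     data={"image_id": "img1"})
print(resp.status_code)
```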

2.6.2.3 Libvirt. Libvirt is a toolkit for interacting with multiple virtualization providers/hypervisors to manage virtual compute, storage, and networking resources. It is a free collection of software available under the GNU LGPL and is not particularly targeted at cloud systems. Nevertheless, Libvirt has shown itself to be very useful for handling low-level virtualization operations and is actually used under the hood by cloud platforms like OpenStack to interface with some hypervisors. Libvirt supports several hypervisors (e.g., KVM/QEMU, Xen, VirtualBox, and VMware), the creation of VNs (e.g., bridging or NAT), and storage on IDE, SCSI, and USB disks and LVM, iSCSI, and NFS file systems. It also provides remote management using TLS encryption, x509 certificates, and authentication through Kerberos or SASL.

Libvirt provides a C/C++ API with bindings to several other languages, such as Python, Java, PHP, Ruby, and C#. This API includes operations for managing virtual resources as well as retrieving information and capabilities from physical hosts and hypervisors. Virtual resource management operations are divided into three abstractions: (1) Domains, which are common VM-related operations, such as create, start, stop, and migrate; (2) Storage, for managing block storage volumes or pools; and (3) Network, which includes operations such as creating bridges, connecting VMs to these bridges, and enabling NAT and DHCP. Note that network operations are all performed within the scope of a single physical host; that is, it is not possible, for example, to connect two VMs on separate hosts to the same bridged network.
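
The sketch below uses the Python binding to list the domains on a local QEMU/KVM hypervisor and to live-migrate one of them; the host names and the domain name are illustrative, and live migration additionally assumes shared storage between the hosts.

```python
# A minimal sketch of the libvirt Python binding: inspect domains on a
# local hypervisor and live-migrate one to another host.
import libvirt

conn = libvirt.open("qemu:///system")   # connection to the local hypervisor
for dom in conn.listAllDomains():
    state, _reason = dom.state()
    running = state == libvirt.VIR_DOMAIN_RUNNING
    print(dom.name(), "running" if running else state)

# Live migration to another physical host (same LAN, shared storage assumed)
dest = libvirt.open("qemu+ssh://host2.example.org/system")
dom = conn.lookupByName("vm1")
dom.migrate(dest, libvirt.VIR_MIGRATE_LIVE, None, None, 0)
```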

2.7 CHALLENGES

Because virtualization management in the cloud is still in its infancy, important challenges remain. In this section, we list key challenges that can guide future developments in the virtualization management area. We also mention some ongoing research in the field.

2.7.1 Scalability

Although the benefits of virtualization enable the cloud model, from the management perspective, virtualization impacts the scalability of management solutions. The transition from the traditional management of physical infrastructures to virtual ones is not smooth in terms of scale, because a few physical devices can host a much larger number of virtual devices, each one requiring management actions. The number of managed elements immediately explodes because it does not merely double but is proportional to the number of virtual devices each physical device supports. Traditional management applications have not been conceived to support such a drastic increase in the number of elements and, as a consequence, such solutions do not scale.

Novel management approaches need to be considered, or traditional approaches need to be adapted (if possible) to the cloud context and scale. The problem is also exacerbated because the managed environments (i.e., clouds) are much more dynamic, having new elements created very quickly, while older elements can be destroyed frequently too. Virtual servers can go up and down (even permanently) quite fast, which is unusual for traditional management solutions. Adaptation is then required not only because of the new scales of cloud environments but also because they are much more dynamic than traditional IT infrastructures. Highly distributed solutions have to be investigated, such as the use of peer-to-peer technologies for management [34]. Autonomic management also becomes an alternative, in order to reduce human intervention as much as possible [35].

2.7.2 Monitoring

Monitoring is a permanently challenging task in the cloud because of the large number of resources in production cloud data centers. Centralized monitoring approaches suffer from low scalability and resilience. Cooperative monitoring [36] and gossiping [37] aim to overcome these limitations by enabling distributed and robust monitoring solutions for large-scale environments. The goal is to minimize the negative impact of management traffic on the performance of the cloud. At the same time, finding a scalable solution for aggregating relevant monitoring information without hurting accuracy is a challenge that needs to be tackled by monitoring tools designed specifically for cloud data centers.

Usually, monitoring generates more management data than other activities. With virtualization in the cloud, and the aforementioned scalability issues, the overwhelming amount of monitoring data can hinder proper observation of the cloud environment. As such, monitoring based on big data techniques may be a possible path to follow. Compression of data structures [38], for example, can be convenient to find a reasonable balance between the amount of data and analysis precision.

2.7.3 Management Views

Since cloud computing creates an environment with different actors playing particular management roles, these different actors need different management views. Operators of a cloud infrastructure need to have a broader, possibly complete, view of the physical infrastructure, but should be prevented from accessing management information that is solely related to a tenant application, because of privacy issues. Cloud tenants also need to have access to management information related to their rented VI, but must also be isolated from the management information of both other tenants and the physical infrastructure operator.

Different management views are already supported in traditional solutions, but in the case of cloud environments, trust relationships between cloud provider and tenants become more apparent. Since the management software runs in the cloud itself, tenants accessing their management view need to trust the cloud provider, assuming that sensitive information is not available to the cloud operator. Tenants can also employ their own management systems operating on their local IT infrastructure. In this case, management interfaces and protocols that connect the tenant management solution and the remotely managed virtual elements need to be present.

2.7.4 Energy Efficiency

Efficient energy management aims to reduce the operational cost of cloud infrastructures. A challenge in optimal energy consumption is to design energy-proportional data center architectures, where energy consumption is determined by server and network utilization [39, 40]. ElasticTree [39], for example, attempts to achieve energy proportionality by dynamically powering off switches and links. In this respect, cloud network virtualization can further contribute to reducing power consumption through network consolidation (e.g., through VN migration [41]).

Minimizing energy consumption, however, usually comes at the price of performance degradation. Energy efficiency and performance are often in conflict, representing a tradeoff. Thus, designing energy-proportional data center architectures that factor in cloud virtualization, and finding a good balance between energy consumption and performance, are interesting research questions.

2.7.5 Fault Management

Detection and handling of failures are requirements of any cloud, especially because in cloud environments failures of a single physical resource can potentially affect multiple customers' virtual resources. Because failures also tend to propagate, the damage caused by a faulty physical cloud device impacts the cloud business much more severely. In addition, the absence of faults in physical devices does not always mean that there are no faulty virtual devices. As such, traditional fault management needs to be expanded to consider faulty virtual devices too.

Most existing architectures rely on reactive failure handling approaches. One drawback is the potentially long response time, which can negatively impact application performance. Ideally, fault management should be implemented in a proactive manner, where the management system predicts the occurrence of failures and acts before they occur. In practice, proactive fault management is often ensured by means of redundancy, for example, provisioning backup paths. As such, offering high reliability without incurring excessive costs or energy consumption is a problem requiring further exploration.

2.7.6 Security

Security issues are challenging in the context of cloud virtualization because of the complex interactions between tenants and cloud providers. Although the virtualization of both servers and networks can improve security (e.g., limiting information leakage, avoiding the existence of side channels, and minimizing performance interference attacks), today's virtualization technologies are still in their infancy in terms of security. In particular, various vulnerabilities in server virtualization technologies, such as VMware [42], Xen [43], and Microsoft Virtual PC and Virtual Server [44], have been revealed in the literature. Similar vulnerabilities are likely to occur in programmable network components too. Thus, network virtualization techniques not only give no guaranteed protection from existing attacks and threats to physical and virtual networks, but can also lead to new security vulnerabilities. For example, an attack against a VM may lead to an attack against the hypervisor of the physical server hosting the VM, to subsequent attacks against other VMs hosted on that server, and, eventually, against all VNs sharing that server [45]. This raises the issue of designing secure virtualization architectures immune to these security vulnerabilities.

In addition to mitigating security vulnerabilities related to virtualization technologies, there is a need to provide monitoring and auditing infrastructures in order to detect malicious activities from both tenants and cloud providers. It is known that data center network traffic exhibits different characteristics than the traffic of traditional data networks [46]. Thus, appropriate mechanisms may be required to detect network anomalies. On the other hand, auditability in cloud virtualization should be mutual between tenants and cloud providers to prevent malicious behaviors from either party. However, there is often an overhead associated with such infrastructures, especially in large-scale clouds. In Ref. [47], the authors showed that it is a challenge to audit Web services in cloud environments without deteriorating application performance. Much work remains to be done on designing scalable and efficient mechanisms for monitoring and auditing cloud virtualization.

2.7.7 Cloud Federations

The federation of virtualized infrastructures from multiple cloud providers enables access to larger scale infrastructures. This is already happening with virtualized network testbeds, allowing researchers to conduct realistic network experiments at large scale, which would not have been possible otherwise. ProtoGENI [48] is an example of a federation that allows cooperation among multiple organizations. However, guaranteeing predictable performance for participating entities through SLA enforcement has not been properly addressed by current solutions and remains an open issue.

Cloud federations cannot yet be considered a wide reality in the cloud marketplace. Competition possibly prevents cloud providers from cooperating with one another in federated environments, but the lack of proper technologies devoted to materializing federations of clouds certainly does not improve the current situation either. As in other areas, solutions to federate individual resources already exist, but an integrated, global solution to support federating heterogeneous resources between heterogeneous cloud providers still needs further investigation.

2.7.8 Standard Management Protocols and Information Models

The VR-MIB module [49] described a set of SNMP management variables for the management of physical routers with virtualization support. However, it did not progress in the IETF standardization track. More recently, the VMM-MIB module [50] has been progressing, but it is limited to managing the virtualization of servers (network devices are not explicitly considered); VMM-MIB is also devoted mainly to monitoring, and configuration is only weakly supported. In general, SNMP-based management solutions for cloud environments are still weak.

Other existing management protocols have also been considered. NETCONF [51], for example, would be more appropriate for configuration aspects, while NetFlow/IPFIX [52] could be expanded for virtual router monitoring. The WS-Management [53] suite, in turn, is more appropriate for server management. A myriad of proprietary solutions is also present in the market, but the large diversity of management interfaces and protocols forces cloud operators to deal with too many different technologies. Although a protocol that fits every need is unlikely to exist or to be largely accepted and adopted, there is a clear gap in this area today, which represents an interesting opportunity for research and standardization.

REFERENCES

1. M. Chowdhury, M. R. Rahman, and R. Boutaba. ViNEYard: Virtual Network Embedding Algorithms with Coordinated Node and Link Mapping. IEEE/ACM Transactions on Networking, 20(1):206–219, February 2012.

2. M. Yu, Y. Yi, J. Rexford, and M. Chiang. Rethinking Virtual Network Embedding: Substrate Support for Path Splitting and Migration. ACM Computer Communication Review, 38(2):17–29, April 2008.

3. X. Cheng, S. Su, Z. Zhang, K. Shuang, F. Yang, Y. Luo, and J. Wang. Virtual Network Embedding Through Topology Awareness and Optimization. Computer Networks, 56(6):1797–1813, 2012.

4. A. Lenk, M. Klems, J. Nimis, S. Tai, and T. Sandholm. What's Inside the Cloud? An Architectural Map of the Cloud Landscape. In Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing (CLOUD '09), pages 23–31, Washington, DC, 2009. IEEE Computer Society.

5. M. F. Bari, R. Boutaba, R. Esteves, L. Granville, M. Podlesny, M. Rabbani, Q. Zhang, and F. Zhani. Data Center Network Virtualization: A Survey. IEEE Communications Surveys and Tutorials, 15(2):909–928, 2012.

6. Q. Zhu and G. Agrawal. Resource Provisioning with Budget Constraints for Adaptive Applications in Cloud Environments. In Proceedings HPDC 2010, Chicago, IL, 2010.

7. M. E. Frincu and C. Craciun. Multi-objective Meta-heuristics for Scheduling Applications with High Availability Requirements and Cost Constraints in Multi-Cloud Environments. In Proceedings of the Fourth IEEE International Conference on Utility and Cloud Computing (UCC), pages 267–274, Victoria, NSW, December 2011.

8. J. Rao, X. Bu, K. Wang, and C.-Z. Xu. Self-adaptive Provisioning of Virtualized Resources in Cloud Computing. In Proceedings SIGMETRICS 2011, 2011.

9. OpenNebula. The Open Source Solution for Data Center Virtualization, 2008. http://www.opennebula.org. Accessed on November 2013.

10. J. Rao, Y. Wei, J. Gong, and C.-Z. Xu. DynaQoS: Model-free Self-Tuning Fuzzy Control of Virtualized Resources for QoS Provisioning. In 19th IEEE International Workshop on Quality of Service (IWQoS), pages 1–9, San Jose, CA, June 2011.

11. J. Rao, X. Bu, C.-Z. Xu, and K. Wang. A Distributed Self-learning Approach for Elastic Provisioning of Virtualized Cloud Resources. In Proceedings IEEE MASCOTS 2011, pages 45–54, Singapore, July 2011.

12. J. Z. W. Li, M. Woodside, J. Chinneck, and M. Litoiu. CloudOpt: Multi-goal Optimization of Application Deployments across a Cloud. In Proceedings CNSM 2011, pages 1–9, October 2011.

13. A. Khan, A. Zugenmaier, D. Jurca, and W. Kellerer. Network Virtualization: A Hypervisor for the Internet? IEEE Communications Magazine, 50(1):136–143, January 2012.

14. N. Chowdhury and R. Boutaba. Network Virtualization: State of the Art and Research Challenges. IEEE Communications Magazine, 47(7):20–26, July 2009.

15. J. Montes, A. Sánchez, B. Memishi, M. S. Pérez, and G. Antoniu. GMonE: A Complete Approach to Cloud Monitoring. Future Generation Computer Systems, 29(8):2026–2040, 2013.

16. M. Carvalho, R. Esteves, G. Rodrigues, L. Z. Granville, and L. M. R. Tarouco. A Cloud Monitoring Framework for Self-Configured Monitoring Slices Based on Multiple Tools. In 9th International Conference on Network and Service Management 2013 (CNSM 2013), pages 180–184, Zürich, Switzerland, October 2013.

17. Amazon. Amazon Elastic Compute Cloud (Amazon EC2), 2013. http://aws.amazon.com/ec2/. Accessed on May 2013.

18. Eucalyptus. The Open Source Cloud Platform, 2009. http://open.eucalyptus.com. Accessed on November 2013.

19. Rackspace Cloud Computing. OpenStack Cloud Software, 2010. http://openstack.org. Accessed on December 2013.

20. S. Islam, J. Keung, K. Lee, and A. Liu. Empirical Prediction Models for Adaptive Resource Provisioning in the Cloud. Future Generation Computer Systems, 28(1):155–162, January 2012.

21. Apache Software Foundation. Apache CloudStack: Open Source Cloud Computing, 2012. http://cloudstack.apache.org. Accessed on December 2013.

22. S. Ortiz. The Problem with Cloud-Computing Standardization. IEEE Computer, 44(7):13–16, 2011.

23. Open Grid Forum. Open Cloud Computing Interface, 2012. http://occi-wg.org/. Accessed on September 2012.

24. A. Edmonds, T. Metsch, A. Papaspyrou, and A. Richardson. Toward an Open Cloud Standard. IEEE Internet Computing, 16(4):15–25, 2012.

25. H. Medhioub, B. Msekni, and D. Zeghlache. OCNI—Open Cloud Networking Interface. In 22nd International Conference on Computer Communications and Networks (ICCCN), pages 1–8, Nassau, Bahamas, 2013.

26. Distributed Management Task Force (DMTF). Open Virtualization Format (OVF) Specification—Version 2.0.1, August 2013. http://dmtf.org/standards/cloud. Accessed on December 2013.

27. Distributed Management Task Force (DMTF). Cloud Infrastructure Management Interface (CIMI)—Version 1.0.0, 2013. http://dmtf.org/standards/cloud. Accessed on May 2013.

28. Storage Networking Industry Association (SNIA). Cloud Data Management Interface (CDMI)—Version 1.0.2, June 2012. http://www.snia.org/cdmi. Accessed on December 2013.

29. Storage Networking Industry Association (SNIA). S3 and CDMI: A CDMI Guide for S3 Programmers—Version 1.0, May 2013. http://www.snia.org/cdmi. Accessed on December 2013.

30. R. P. Esteves, L. Z. Granville, and R. Boutaba. On the Management of Virtual Networks. IEEE Communications Magazine, 51(7):80–88, 2013.

31. R. Moreno-Vozmediano, R. S. Montero, and I. M. Llorente. IaaS Cloud Architecture: From Virtualized Datacenters to Federated Cloud Infrastructures. IEEE Computer, 45(12):65–72, 2012.

32. Apache Software Foundation. Apache Libcloud: A Unified Interface to the Cloud, 2012. http://libcloud.apache.org/. Accessed on December 2013.

33. Apache Software Foundation. Apache Deltacloud: An API that Abstracts the Differences between Clouds, 2011. http://deltacloud.apache.org/. Accessed on December 2013.

34. L. Z. Granville, D. M. da Rosa, A. Panisson, C. Melchiors, M. J. B. Almeida, and L. M. Rockenbach Tarouco. Managing Computer Networks Using Peer-to-Peer Technologies. IEEE Communications Magazine, 43(10):62–68, 2005.

35. C. C. Marquezan and L. Z. Granville. On the Investigation of the Joint Use of Self-* Properties and Peer-to-Peer for Network Management. In 2011 IFIP/IEEE International Symposium on Integrated Network Management (IM), pages 976–981, 2011.

36. K. Xu and F. Wang. Cooperative Monitoring for Internet Data Centers. In IEEE International Performance, Computing and Communications Conference (IPCCC), pages 111–118, Austin, TX, December 2008.

37. F. Wuhib, M. Dam, R. Stadler, and A. Clemm. Robust Monitoring of Network-Wide Aggregates through Gossiping. IEEE Transactions on Network and Service Management, 6(2):95–109, 2009.

38. L. Quan, J. Heidemann, and Y. Pradkin. Trinocular: Understanding Internet Reliability Through Adaptive Probing. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM (SIGCOMM '13), pages 255–266, Hong Kong, China, 2013. ACM, New York.

39. B. Heller, S. Seetharaman, P. Mahadevan, Y. Yiakoumis, P. Sharma, S. Banerjee, and N. McKeown. ElasticTree: Saving Energy in Data Center Networks. In Proceedings USENIX NSDI, April 2010.

40. H. Yuan, C. C. J. Kuo, and I. Ahmad. Energy Efficiency in Data Centers and Cloud-based Multimedia Services: An Overview and Future Directions. In Proceedings IGCC, Chicago, IL, August 2010.

41. Y. Wang, E. Keller, B. Biskeborn, J. van der Merwe, and J. Rexford. Virtual Routers on the Move: Live Router Migration as a Network-Management Primitive. ACM Computer Communication Review, 38:231–242, August 2008.

42. VMware. VMware Shared Folder Bug Lets Local Users on the Guest OS Gain Elevated Privileges on the Host OS. VMware vulnerability. http://securitytracker.com/alerts/2008/Feb/1019493.html, 2008.

43. Xen. Xen Multiple Vulnerabilities. Xen vulnerability. http://secunia.com/advisories/26986, 2007.

44. Microsoft. Vulnerability in Virtual PC and Virtual Server Could Allow Elevation of Privilege. Virtual PC Vulnerability. http://technet.microsoft.com/en-us/security/bulletin/MS07-049, 2007.

45. J. Szefer, E. Keller, R. B. Lee, and J. Rexford. Eliminating the Hypervisor Attack Surface for a More Secure Cloud. In Proceedings of the 18th ACM Conference on Computer and Communications Security (CCS), pages 401–412, Chicago, IL, October 2011.

46. T. Benson, A. Anand, A. Akella, and M. Zhang. Understanding Data Center Traffic Characteristics. ACM SIGCOMM Computer Communication Review, 40(1):92–99, 2010.

47. A. Chuvakin and G. Peterson. Logging in the Age of Web Services. IEEE Security and Privacy, 7(3):82–85, June 2009.

48. ProtoGENI. ProtoGENI, December 2013. http://www.protogeni.net/trac/protogeni. Accessed on December 2013.

49. E. Stelzer, S. Hancock, B. Schliesser, and J. Laria. Virtual Router Management Information Base Using SMIv2. Internet-Draft draft-ietf-ppvpn-vr-mib-05 (obsolete), June 2013. http://tools.ietf.org/html/draft-ietf-ppvpn-vr-mib-05. Accessed on December 2014.

50. H. Asai, M. MacFaden, J. Schoenwaelder, Y. Sekiya, K. Shima, T. Tsou, C. Zhou, and H. Esaki. Management Information Base for Virtual Machines Controlled by a Hypervisor. Internet-Draft draft-asai-vmm-mib-05 (work in progress), October 13, 2013.

51. R. Enns. RFC 4741: NETCONF Configuration Protocol, December 2006. http://tools.ietf.org/html/rfc4741. Accessed on December 10, 2014.

52. B. Claise, B. Trammell, and P. Aitken. RFC 7011: Specification of the IPFIX Protocol for the Exchange of Flow Information, September 2013. http://tools.ietf.org/html/rfc7011. Accessed on December 2014.

53. Distributed Management Task Force (DMTF). Web Services for Management (WS-Management) Specification. DMTF, August 2012.

3

VIRTUAL MACHINE MIGRATION

Diogo M. F. Mattos, Lyno Henrique G. Ferraz, and Otto Carlos M. B. Duarte

Grupo de Teleinformática e Automação (GTA/UFRJ), PEE/COPPE - DEL/Poli, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil

3.1 INTRODUCTION

Cloud computing is experiencing extraordinary growth [1–5]. Moreover, virtualization technologies are widely adopted by companies to manage flexible computing environments and to run isolated virtual environments for each customer [6]. Virtualization also provides the means to accomplish efficient allocation of resources and to improve management, reducing operating costs, improving application performance, and increasing reliability. Virtualization logically slices physical resources into virtual environments, which have the illusion of accessing the entire available physical resource. Hence, the physical machine resources are shared among multiple VMs, each of which runs its own isolated environment with an operating system and applications. By decoupling VMs from their underlying physical realization, virtualization allows flexible allocation of VMs over physical resources. To this end, virtualization introduces a new management primitive: VM migration [7]. VM migration is the relocation of virtual machines over the underlying physical machines, even while the VM is still running.

The VM migration primitive enhances user mobility, load balancing, fault management, and system management [8]. A migration that occurs without interrupting running services is called live migration.

Virtual machine migration is similar to process migration, but it migrates a complete operating system and its applications. Process migration moves a running process from one machine to another. Process migration is very difficult, or even impossible, to accomplish, because processes are strongly bound to operating systems by means of open sockets, pointers, file descriptors, and other resources [8]. Unlike process migration, VM migration moves the entire operating system along with all the running processes. Migrating an entire operating system with its applications is a more manageable procedure and is facilitated by the hypervisor, which exposes an interface between the physical machine and the VM operating system. The details of what is happening inside the VM can be ignored during migration. VM migration also has inherent security challenges: transferring the state of a VM across physical machines and establishing a trustworthy computing environment on the destination physical machine.

In the context of virtualization, it is necessary to ensure that virtual environments are secure and trustworthy. Thus, the hypervisor, which is a software layer responsible for creating the hardware abstraction for the virtual environment, must implement a trusted computing base (TCB) [9, 10]. Indeed, the TCB is divided into two parts: the hypervisor and an administrative domain (see Figure 3.1). The hypervisor controls the hardware directly and executes at the highest privilege level of the processor. The administrative domain is a privileged VM that controls and monitors other VMs. The administrative domain has privileges to start and stop VMs, to run guest VM configuration, to use and monitor physical resources, and to run I/O operations directly on the physical devices for the virtualized domains. This common architecture for virtualized systems creates, however, security challenges, such as the lack of privacy of guest VMs. The administrative domain runs at a privileged level to inspect the state of guest VMs, such as the contents of their registers in memory and vCPUs. This privilege can be usurped by attacks on the software stack in the administrative domain and by malicious system administrators [11]. Therefore, it is necessary to establish a trusted computing base on the hypervisor and on the administrative domain to ensure the security of virtualized environments.

[Figure 3.1 (diagram): General Xen-based virtualization architecture. The hachured areas, administrative domain and hypervisor, indicate the most sensitive software modules because they run at the highest privilege level.]

Specific relevance is given to a hybrid virtualization system based on the Xen and OpenFlow platforms, called XenFlow [12], which focuses on router virtualization, especially on virtual router migration without packet losses. XenFlow provides migration of virtual topologies over the physical realization, performing both the migration of virtual routers to another physical host and the remapping of virtual links onto one or more physical links. This feature extends virtual-router migration when compared to other proposals in the literature [13–15], because routes can be remapped to any destination physical node by means of an OpenFlow network.

This chapter presents the major VM migration techniques. This work highlights the benefits, costs, and challenges for the realization of the live migration of VMs. We highlight I/O virtualization techniques and discuss how to migrate VMs even if they directly access I/O devices or use I/O virtualization techniques. The chapter sets out the main security requirements to be ensured during the migration of virtual environments. Then, we examine various schemes of VM migration and discuss research directions in virtualization security. The ultimate goal is to provide a deep understanding of the developments and the future directions regarding the migration primitive for virtualized environments.

The rest of this chapter is organized as follows. Section 3.2 sets out the background for understanding VM migration and its challenges. Virtual network migration is explained in Section 3.3, in which we also present a proposal for migrating virtual routers without packet losses. The main security requirements and proposals for virtualized environments are identified in Section 3.4. Future research directions and open challenges are discussed in Section 3.5. Section 3.6 concludes this chapter.

3.2 VM MIGRATION

The procedure of migrating the operating system and applications from one physical machine to another is an important feature in a virtualized environment. VM migration encompasses the transfer of four main resources: processor, memory, network, and storage [15]. During the migration process, the VM is paused on the source host and is resumed on the destination host only when all resources have been migrated and configured on the new host. The VM stays offline during a period of time, called downtime, which corresponds to the time from when the VM is paused until its resumption at the destination. The downtime period varies according to the resources available to the VM, the workload submitted to the VM, and the migration technique: offline or live migration.

3.2.1 Offline and Live Migration

Offline migration transfers the VM to the destination physical host while the VM is off. Offline migration introduces a great delay in the services of the VM, but it is the easiest to accomplish because it does not require preservation of the VM state. As the VM is off, there are no network connections to preserve, and it is not necessary to transfer either the processor state or the RAM contents. The offline migration procedure simply comprises shutting the VM down and restarting it at another location.

Storage migration, or disk migration, is accomplished by standard data transfer tools and is the only network traffic generated. It takes a long time and a lot of network bandwidth to transfer a whole disk. As a matter of fact, VM migration is usually accomplished within a LAN with a network-attached storage (NAS) device that allows a VM to access its disk from anywhere in the network, which makes it unnecessary to migrate the disk.

Live migration transfers the VM while it is still running. Live migration should not cause a perceptible downtime to the VM user. Assuming the source and destination physical machines are in the same LAN with a NAS, live migration only needs to transfer the processor state, the memory state, and the network connections.

Processor live migration consists of creating a virtual CPU (vCPU) for the VM at the destination and copying the vCPU state from the source to the destination physical machine. Nevertheless, this task becomes complex when the source and destination host processors are different. Migrations between different processors of the same manufacturer require the same instruction sets to work properly. In these cases, as a consequence, it is necessary to limit the instruction set of the virtual CPU to a common instruction set of both processors. This operation is called CPU masking.
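For illustration, a minimal sketch of CPU masking as a set intersection of processor feature flags; the feature names and the way the mask is derived here are hypothetical, not any hypervisor's actual API.

    # Sketch: the vCPU exposes only the features common to both hosts, so the
    # VM never executes an instruction the destination processor lacks.
    source_features = {"sse4_2", "avx", "avx2", "aes"}   # example source CPU
    destination_features = {"sse4_2", "avx", "aes"}      # example older destination CPU

    cpu_mask = source_features & destination_features    # common instruction set
    print(sorted(cpu_mask))                              # ['aes', 'avx', 'sse4_2']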

The network live migration procedure should maintain the Internet Protocol (IP) address of the source VM to preserve all open Transmission Control Protocol (TCP) connections. Keeping the same IP address at the destination is simple when the source and destination physical machines are in the same local area network (LAN). In this case, the destination physical machine generates gratuitous Address Resolution Protocol (ARP) replies to advertise the new physical location of the VM; that is, only the advertisement of the medium access control (MAC) address of the migrated VM is required. Otherwise, when the source and destination machines are not in the same LAN, network redirection mechanisms are required due to the location semantics of the IP address.
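As an illustration, the gratuitous ARP advertisement can be sketched with the Scapy packet library; the IP address, MAC address, and interface name below are placeholders.

    # Sketch: the destination host broadcasts a gratuitous ARP reply (op=2,
    # "is-at") for the migrated VM's own IP, so switches and neighbors relearn
    # the VM's location without any IP address change.
    from scapy.all import ARP, Ether, sendp

    vm_ip, vm_mac = "10.0.0.42", "00:16:3e:12:34:56"
    garp = (Ether(dst="ff:ff:ff:ff:ff:ff", src=vm_mac) /
            ARP(op=2, psrc=vm_ip, hwsrc=vm_mac, pdst=vm_ip, hwdst="ff:ff:ff:ff:ff:ff"))
    sendp(garp, iface="eth0")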

Live memory migration is the transfer of memory contents from the source to the destination host, taking into account memory changes during the migration procedure, which requires the retransmission of dirty pages. Venkatesha et al. [16] divided the memory migration into three phases:

1. Push phase: While the source VM is running, its memory pages are transferred to the destination VM. If a page is modified after being transferred, it is necessary to resend this page to avoid failures.

2. Stop-and-copy phase: As the name suggests, the source VM is stopped, and then the memory pages are transferred.


3. Pull phase: The destination VM is started and generates a page fault when it tries to access a page that has not been copied yet. This fault requests the page to be transferred from the source to the destination.

The two live migration strategies [16], pre-copy and post-copy, each use a combination of two of the above-mentioned phases.

The pre-copy live migration strategy applies the push and stop-and-copy phases. First, an empty VM is created at the destination physical host and the migrating VM's memory pages are copied to it while the VM still runs on the source host. During this process, the running VM rewrites memory pages, which must be resent to the destination host. This push phase ends when one of two conditions is reached: (1) the number of dirty pages per iteration is small enough to cause a short downtime period, or (2) the push phase reaches a maximum number of iterations. After the push phase comes the stop-and-copy phase, in which the VM is suspended at the source host, the remaining dirty pages are transferred to the destination host, and the VM is resumed on the destination host. The downtime varies according to the workload, from tens or hundreds of milliseconds to a few seconds [15]. It is important to notice that determining when to stop the push phase and start the stop-and-copy phase is not trivial. Stopping the push phase too soon can result in a longer downtime, as more data will be transferred after suspending the VM. On the other hand, stopping it too late results in a longer total migration time and higher network bandwidth occupation, as more time will be spent resending dirty pages. Therefore, there is a trade-off between total migration time and downtime. The pre-copy procedure requires the verification of memory pages to send them to the destination through the network. This CPU and bandwidth consumption should be monitored to minimize service degradation. Xen uses pre-copy as its live migration strategy [14].
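The control loop below is a minimal sketch of pre-copy; every call on the vm and dest objects (all_pages, dirty_pages, pause, and so on) is a hypothetical placeholder for a hypervisor's internal interfaces.

    # Sketch of pre-copy: iterative push phase followed by stop-and-copy.
    def pre_copy_migrate(vm, dest, max_iterations=30, dirty_threshold=50):
        dest.create_empty_vm(vm.config)          # empty VM at the destination
        dest.receive_pages(vm.all_pages())       # first full copy, VM keeps running
        for _ in range(max_iterations):          # push phase, stop condition (2)
            dirty = vm.dirty_pages()             # pages rewritten since last round
            if len(dirty) <= dirty_threshold:    # stop condition (1): short downtime
                break
            dest.receive_pages(dirty)
        vm.pause()                               # stop-and-copy phase begins
        dest.receive_pages(vm.dirty_pages())     # remaining dirty pages
        dest.receive_state(vm.vcpu_state())      # processor state
        dest.resume_vm()
        vm.destroy()                             # keep a single VM instance

Tuning dirty_threshold and max_iterations embodies the trade-off discussed above: lower thresholds shorten downtime but lengthen the total migration time.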

The post-copy live migration strategy uses the stop-and-copy phase first and then the pull phase. First, the VM is suspended at the source and a minimal VM execution state is transferred to the destination host, namely the CPU registers and non-paged memory. The VM is resumed at the destination despite the absence of many memory pages, which remain at the source host. The source host then begins to send the remaining memory pages. The destination host generates faulty memory accesses when the VM tries to access memory pages that have not been transferred yet. These faulty memory accesses are sent back to the source host, which prioritizes sending the requested memory pages. This process can degrade the performance of memory-intensive applications, but it causes minimal downtime. There are some ways of handling page fetching in order to increase performance, such as the following (a sketch after the list illustrates the pull-phase fault handling):

• Active pushing: the pages are proactively pushed from the source to the destination. Page faults are handled with priority over noncritical pages.

• Pre-paging: an estimation of the memory access pattern is generated to allow the active pushing of the pages that are most likely to generate faults.
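A minimal sketch of the pull phase on the destination side; the source, local_memory, and page-numbering objects are hypothetical placeholders.

    # Sketch of post-copy: faults for missing pages are answered on demand,
    # while a background loop actively pushes the remaining pages.
    def handle_page_fault(page_number, source, local_memory):
        if page_number not in local_memory:          # page not transferred yet
            source.prioritize(page_number)           # source sends this page first
            local_memory[page_number] = source.fetch(page_number)
        return local_memory[page_number]

    def active_push(all_pages, source, local_memory):
        for page_number in all_pages:                # background transfer
            if page_number not in local_memory:
                local_memory[page_number] = source.fetch(page_number)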

Table 3.1 compares offline and live migrations.


TABLE 3.1. Comparison of offline and live migration techniques

Offline migration (shutdown VM and restart at destination host):
• Storage migration: standard copying tools, if migrated.
• Memory migration: not transferred; loss of volatile data.
• Network migration: reconfiguration at destination host; network connections not migrated.
• Downtime: long period of time; VM and services restart (if storage is not migrated).
• Total migration time: equal to downtime (if storage is not migrated).

Live migration (transfer of the running VM to the destination host):
• Storage migration: not migrated; storage is accessible through the network.
• Memory migration: transferred; pre- or post-copy and retransmission of updates.
• Network migration: reconfiguration at destination host; transfer of network state; network connections preserved.
• Downtime: short period of time; VM pause/resume.
• Total migration time: equal to downtime plus memory transfer time; varies according to the workload, which determines the retransmission of dirty pages.


3.2.2 I/O Virtualization and Migration of Pass-Through Devices

Input/output (I/O) virtualization of network devices is challenging because current network interface controllers are unable to distinguish which specific VM is writing to or reading from the shared memory space. Therefore, a controller or hypervisor must redirect (multiplex or demultiplex) data between a specific memory area in an administrative domain and the shared memory areas of the different VMs. This procedure negatively impacts performance, since it introduces extra memory copies, centralizes interrupt handling in the administrative domain's processing time slice, and demands the execution of software multiplexing instructions in the administrative domain, such as virtual bridges, as shown in Figure 3.2a. Thus, a technique to improve I/O device performance is the use of pass-through technologies, which avoid the centralization and memory copies by providing direct I/O between the virtual domain and the physical device. Although pass-through technology improves I/O virtualization performance, a pass-through device belongs to a single VM and cannot be shared by other VMs, as shown in Figure 3.2b.

The main technique to provide direct I/O virtualization is single-root I/O virtualization (SR-IOV) for Peripheral Component Interconnect Express (PCIe) [17]. The SR-IOV specification defines how a single-root PCIe I/O device can be shared among multiple VMs. Indeed, SR-IOV-enabled hardware provides several PCIe virtual functions to the hypervisor, which can be assigned directly to VMs as pass-through devices, as shown in Figure 3.3a. Besides SR-IOV, Intel also proposes VM Device Queues (VMDq) [4] for network I/O virtualization. A VMDq-enabled network device has separate queues for VMs. The network interface classifies received packets into the queue of a VM and fairly sends the packets of all queues in a round-robin manner. As VMDq applies a paravirtualized device driver, it uses shared pages to avoid packet copying between the virtual network interface in the VM and the physical network queue.
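As a concrete illustration of how virtual functions surface on a Linux host, the sketch below enables and lists SR-IOV virtual functions through the standard Linux sysfs interface; the PCI address is an example and the operations require root privileges.

    # Sketch: create four virtual functions on a physical function (PF) and
    # list them; each virtfn can then be assigned to a VM as a pass-through device.
    from pathlib import Path

    pf = Path("/sys/bus/pci/devices/0000:03:00.0")   # example PF address
    (pf / "sriov_numvfs").write_text("4")            # instantiate 4 virtual functions
    vfs = sorted(link.resolve().name for link in pf.glob("virtfn*"))
    print("virtual functions:", vfs)                 # e.g., ['0000:03:10.0', ...]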

Figure 3.2. I/O virtualization modes. (a) Network I/O virtualization with paravirtualized drivers: the administrative domain centralizes all I/O operations. (b) Direct I/O network virtualization: a network interface card is directly connected to the virtual machine.


Figure 3.3. Hardware-assisted network I/O virtualization modes. (a) Network I/O virtualization with SR-IOV: virtual machines directly access NIC virtual functions. (b) Network I/O virtualization with VMDq: virtual machines access device queues through a paravirtualized driver.

The VM benefits from faster classification and a paravirtualized device driver, while SR-IOV technology exposes a unique device interface to the VM. The VMDq paravirtualized driver achieves better performance than conventional paravirtualized network drivers. Besides, the VMDq paravirtualized driver supports live migration in a similar way to common paravirtualized drivers [4], as illustrated in Figure 3.3b.

VMware and Intel propose the Network Plug-in Architecture (NPA/NPIA) [4] to live migrate pass-through devices. The proposal creates a new driver model for the VM, which allows online switching between SR-IOV and paravirtualized devices. This technology introduces two new software modules: a kernel shell and a plug-in for the VM. The kernel shell acts as an intermediate layer to manage pass-through devices and implements a device driver for the SR-IOV device. The plug-in, in its turn, implements the virtual functions of the device, as a device driver would, but interfaces with the kernel shell instead of directly controlling the device, exposing a virtualized network interface card to the virtual domain. The kernel shell provides a hardware abstraction layer, and the plug-in implements hardware communication through the kernel shell. The plug-in may be plugged or unplugged on the fly. To reduce migration downtime while performing plugging/unplugging actions, the hypervisor employs an emulated network interface. This technology trivially supports live migration because a virtual network interface can be unplugged while the VM is running. On the other hand, a drawback of this approach is the need for rewriting all the network device drivers, which may limit its adoption [4].

Pass-through I/O virtualization technology improves virtualized device performance by tightly coupling the VM and the hardware device. Thus, VM live migration becomes more difficult because pass-through devices are totally controlled by the VM and the hypervisor does not access the internal states of the device. Indeed, in pass-through I/O virtualization, the hypervisor does not interfere in the communication between the physical device and the VM. Therefore, the internal states of the physical device must be migrated with the VM in order to accomplish a successful live VM migration [4].


One way to migrate a VM with pass-through devices is to let the user stop everything that uses a pass-through device, and then migrate and restore the VM on the destination physical host. Although this method works, it is not generic enough to fit all operating systems, it involves a greater downtime, it requires actions inside the VM, and it demands a lot of user intervention [18]. A generic solution to suspend the VM before migrating is Advanced Configuration and Power Interface (ACPI)1 sleep state S3 [18]. Sleep state S3 stands for the sleep or suspend state of a machine, in which the operating system freezes all processes, suspends all I/O devices, and then goes to the sleep state, but the RAM remains powered. It is worth noting that in the sleep state all context is lost, except for the system's volatile memory. The major drawback of this approach is that the whole system is affected, inducing a long service downtime besides disabling the target device.

Migration of a pass-through I/O device may also be accomplished by the PCI hotplug mechanism [18]. Migrating a VM using PCI hotplug works as follows. Before the live migration, on the source host, the entity responsible for the migration triggers a hot-unplug event of the virtual PCI pass-through device against the guest VM. The migrating VM responds to the hot-unplug event and stops using the device after unloading its driver. Without any running pass-through device, the VM can be safely live migrated to the destination host. After the live migration, on the destination host, the migration entity triggers a hot-plug event of a virtual PCI pass-through device against the VM. Eventually, the guest VM loads the appropriate driver and starts using the new pass-through device. As the guest initializes a new device that has no relation to the old one, it must reconfigure the new device to match the previous one.
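A sketch of this unplug, migrate, and replug sequence using the libvirt Python bindings; the connection URIs, domain name, and virtual-function PCI address are placeholders.

    # Sketch: hot-unplug the pass-through device, live migrate the VM, and
    # hot-plug an equivalent device at the destination.
    import libvirt

    HOSTDEV_XML = """<hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/></source>
    </hostdev>"""                                    # example pass-through device

    src = libvirt.open("qemu+ssh://source/system")
    dst = libvirt.open("qemu+ssh://destination/system")
    dom = src.lookupByName("guest-vm")

    dom.detachDevice(HOSTDEV_XML)                    # guest unloads the device driver
    new_dom = dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)
    new_dom.attachDevice(HOSTDEV_XML)                # guest loads driver, reconfigures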

CompSC proposes a live migration mechanism for VMs using pass-through I/O virtualization [4]. The key idea of CompSC is to change the driver code as little as possible and to prevent the hypervisor from needing any specific knowledge about the migrating device. The hypervisor examines the list of registers of the network device and saves them into a shared memory area. The hypervisor does not know the list of registers a priori; for this reason, it also obtains this list from the shared memory area, where the device driver places it during the boot process. The device driver completes the state transfer between hosts. Every time before the driver releases a read lock, it stores enough information about the latest operations, or set of operations, to achieve a successful resume. In the resume procedure, the driver triggers the target hardware using the saved state information. The proposal also provides a self-emulation layer, which can be placed in the hypervisor or in the device driver. When placed in the hypervisor, the self-emulation layer intercepts all accesses to emulated registers and returns the correct value. When placed in the driver, it processes the fetched value and corrects it after the access. A self-emulation layer in the hypervisor requires only the list of emulated registers and few code changes to the driver, but performance degrades due to the interception of I/O operations. A self-emulation layer in the device driver imposes less overhead but produces more code changes [4]. Table 3.2 summarizes the migration proposals of the main pass-through I/O virtualization techniques.
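The fragment below is a purely hypothetical sketch of the driver-side idea behind CompSC; none of the names correspond to CompSC's real code, and it only illustrates where state is published and replayed.

    # Sketch: the driver publishes its register list at boot and records the
    # latest operations before releasing a lock, so the saved state suffices
    # to resume the device on the destination host.
    class PassThroughDriverSketch:
        def __init__(self, device, shared_mem):
            self.device = device
            self.shared = shared_mem
            self.shared["register_list"] = device.register_names()  # read by hypervisor

        def release_read_lock(self):
            self.shared["last_operations"] = self.device.pending_operations()
            self.device.unlock()

        def resume_on_destination(self):
            self.device.load_registers(self.shared["saved_registers"])
            self.device.replay(self.shared["last_operations"])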

1The Advanced Configuration and Power Interface (ACPI) specification is an open standard for device configuration and power management by the operating system. It replaces earlier standards, bringing power management under the control of the operating system instead of the BIOS, as stated by the replaced standards.


TABLE 3.2. Comparison of migrating I/O virtualization techniques

SR-IOV:
• Pros: good performance; the VM directly accesses the device.
• Cons: hard to migrate; the hypervisor cannot access the device state.
• Summary: the hardware provides multiple virtual functions; the hypervisor assigns virtual functions to VMs; VMs interact directly with the hardware device.

VMDq:
• Pros: good performance and easy to migrate; packet classification by hardware and a conventional paravirtualized driver.
• Cons: slight performance degradation; minor driver-domain participation in I/O.
• Summary: the VMDq driver writes and reads packets directly on shared pages in the driver domain, which avoids software packet classification and extra packet copies.

NPA/NPIA:
• Pros: good performance and easy to migrate; hotplug of SR-IOV virtual functions and paravirtualized drivers.
• Cons: hard to deploy; new virtual network device drivers in VMs.
• Summary: it creates a "kernel shell and plug-in" pair, which allows the plug-in to be migrated carrying all device states, while the kernel shell implements the virtual function interface of the driver.

Pause/resume:
• Pros: easy to deploy; it uses current technologies.
• Cons: hard to migrate; loss of volatile data; it depends on user interaction.
• Summary: the VM is suspended on the source host and afterwards resumed on the destination host.

PCI hotplug:
• Pros: good performance and easy to deploy; it uses current technologies.
• Cons: loss of volatile data.
• Summary: pass-through device hotplugging; the source host unplugs the virtual PCI pass-through device of the VM; after migration, a new pass-through device is loaded and reconfigured on the migrated VM.

CompSC:
• Pros: good performance and easy to migrate; the VM uses current technologies; easy to deploy.
• Cons: the hypervisor uses new live migration software; slight performance degradation during migration; it uses an emulated virtual network device driver.
• Summary: the hypervisor saves pass-through device states before migration and restores the device state after migration.


3.3 VIRTUAL NETWORK MIGRATION WITHOUT PACKET LOSS

Network virtualization is the technique that decouples network functions from their physical substrate, enabling virtual networks to run logically separated over a physical network topology [19]. The logical separation enables virtual network migration, which allows online physical topology changes while avoiding reconfiguration, traffic disruption, and long convergence delays [13]. Virtual network migration consists of migrating the virtual network element, also called a virtual router, to another physical location without packet losses or loss of connectivity. The key idea to avoid packet losses is the separation of the control and data planes, the former responsible for performing control operations, such as running routing protocols and defining QoS parameters, and the latter responsible for packet forwarding [13, 14]. As the virtual router should always forward the traffic, the data plane is copied to the physical host while the virtual router migrates. After the migration, the data plane on the source host is deactivated, so the virtual router runs completely in the new location.

Both Wang et al. and Pisa et al. use the plane-separation paradigm to migrate virtual routers without packet losses [13, 14]. They assume an external mechanism for link migration to preserve the neighborhood after migration, such as maintaining the same set of neighbors or tunneling. Pisa et al. assume that all physical routers connect to the same local area network (LAN) to facilitate link migration [14]. Flow migration on the OpenFlow platform, on the other hand, is easy. Pisa et al. present an algorithm based on the redefinition of a flow path in the OpenFlow network [14]. This proposal has zero packet losses and a low overhead of network control messages. However, this migration proposal is limited to OpenFlow-switched networks and is not applicable to router virtualization systems.

Mattos and Duarte present XenFlow [12], a hybrid network virtualization system based on the plane-separation paradigm with the Xen and OpenFlow platforms [20, 21], to migrate both virtual routers and virtual links. VMs act as the routers' control planes, running routing protocols, while the data planes of all virtual routers run centrally in the Xen administrative domain, Domain 0. Physical machines have an OpenFlow switch to connect Xen VMs to the physical network, and each Xen VM acts as a generator of rules for these switches. The remapping of the virtual topologies is orchestrated by a network controller capable of acting on the OpenFlow switches and of triggering the migration of VMs on any network node. Figure 3.4 presents this architecture. The architecture allows virtual routers to migrate beyond a local area network, because routes are remapped to any destination physical node by means of the OpenFlow network. However, the architecture forces all virtual networks to share the same data plane, violating the requirement of isolation between virtual environments. Thus, XenFlow isolates virtual networks by two mechanisms: address space isolation among virtual networks, which ensures that VMs only access VMs belonging to the same virtual network; and virtual network resource-sharing isolation, which prevents virtual networks from using the resources of other virtual networks [22]. The system also offers quality of service by mapping parameters of service-level agreements, defined as control plane directives, to parameters of the data plane. It controls the basic resources of virtual networks: processing, memory, and bandwidth, as those are the resources that can be locally controlled [23].


Figure 3.4. XenFlow architecture overview. The Xen virtual router data plane is copied to the physical host's OpenFlow switch. A network controller orchestrates virtual router and link migration.

Figure 3.5. XenFlow virtual topology migration. (1) Migration of the virtual machine and all running routing protocols. (2) Data plane reconstruction based on control plane information. (3) Link migration by sending a predefined ARP reply message.

The XenFlow routing function is performed by a flow table dynamically controlled by POX, an OpenFlow network controller [24]. The migration of virtual routers, shown in Figure 3.5, consists of three steps: migration of the control plane, reconstruction of the data plane, and migration of virtual links. The control plane is migrated between two physical network nodes through the live-migration mechanism of conventional Xen VMs [15]. Then, the reconstruction of the data plane is performed as follows. The virtual router sends all routes to Domain 0. When the virtual router detects a connection disruption caused by the migration, it reconnects to the Domain 0 of the new physical host and sends all information about the routing and ARP tables. Upon receiving such information, Domain 0 reconfigures the data plane according to the control plane of the migrated virtual router. After the migration of the control plane and the reconstruction of the data plane, the links are migrated. Link migration occurs in the OpenFlow switches instantiated in Domain 0 and in other OpenFlow hardware switches. Link migration creates a switched path from the neighbors of the migrated virtual router to its new physical host. The migrated virtual router sends an ARP reply packet with a predefined destination MAC address (AA:AA:AA:AA:AA:AA), which the network controller captures in order to reconfigure the paths. This procedure updates the location of the virtual router after the migration; meanwhile, the source physical host forwards packets until the migration is complete, which results in a migration primitive for virtual routers without packet loss or interruption of packet-forwarding services.
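A minimal POX-style sketch of the link migration trigger; remap_paths is a hypothetical stub standing in for XenFlow's actual path reconfiguration logic.

    # Sketch: a POX component that watches for the predefined ARP reply and,
    # upon seeing it, remaps the migrated router's flows to its new location.
    from pox.core import core
    from pox.lib.addresses import EthAddr

    MIGRATION_MAC = EthAddr("aa:aa:aa:aa:aa:aa")

    def remap_paths(new_dpid, new_port):
        # Hypothetical placeholder: install flow entries from the router's
        # neighbors toward the switch (new_dpid) and port of the new host.
        pass

    def _handle_PacketIn(event):
        packet = event.parsed
        if packet.dst == MIGRATION_MAC:          # migrated router announced itself
            remap_paths(event.dpid, event.port)

    def launch():
        core.openflow.addListenerByName("PacketIn", _handle_PacketIn)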

XenFlow ensures virtual router migration without packet loss, but the new path in the underlying substrate may introduce a greater or a smaller delay when compared with the original path. XenFlow does not control the delay in forwarding nodes, and the new path may also comprise non-XenFlow nodes. Therefore, during virtual network migration, packets may arrive out of order or may be received after the larger delay of the new path. We assume that this is not a constraint because transport protocols are resilient to delay variation, as currently occurs due to changes in the routing path or network congestion.

3.4 SECURITY OF VIRTUAL ENVIRONMENTS

Several vulnerabilities have been disclosed in the current live migration implementations of well-known hypervisors, such as Xen and VMware [25]. The biggest issue is that the transferred data are not encrypted during the migration procedure. Kernel memory, application state, sensitive data such as passwords and keys, and other migration data are transferred in the clear, providing no confidentiality. Other vulnerabilities are: no guarantee that the VM is migrating to a trusted destination platform, no authentication and no authorization of operations, no integrity guarantees for VM data, and bugs in the hypervisor and migration module code that introduce security vulnerabilities. In this section, we discuss the main security issues of machine virtualization and expose the main security requirements for a secure virtualization platform. We focus on securing VM migration, but we also highlight security issues that affect cloud computing environments based on machine virtualization.

3.4.1 Requirements for a Secure Virtual Environment

A secure virtualization environment must ensure that the processor, RAM, storage, and network, the main resources of a VM, are invulnerable to attacks from other VMs or from the infrastructure. Therefore, we establish six security requirements that summarize the needs of a secure virtualization environment. We also highlight that a secure live migration should provide confidentiality, to guarantee that no VM data are accessed by others while they are transferred from one host to another, and auditability, to verify that sensitive data have not been exposed or damaged [26]. The six secure virtualization requirements are the following: availability and isolation; integrity; confidentiality; access control, authentication, and authorization; nonrepudiation; and replay resistance.

Availability and isolation stand for the fact that any VM should be able neither to access nor to interfere with other VMs. Even though several VMs share the same infrastructure, one VM must not be able to access other VMs' data or change their computing results [1]. Thus, a secure hypervisor ensures strong isolation between running VMs, running each VM in a protected domain [27]. It is worth noting that isolation is achieved with confidentiality, integrity, and protection against denial of service.

Integrity means that a virtual environment must provide the means to verify and prove its integrity; therefore, it must be possible to identify whether its processing, memory, or storage has been modified. Attacks against integrity intend to modify information from virtual environments or to modify programs running in a virtual environment. The migration process should also be protected against integrity violation, because it clearly exposes the VM memory through the network to attacks, such as man-in-the-middle attacks [28]. In addition, a hardware module can run cryptographic functions to perform integrity verification and attestation. Attestation cryptographically ensures that a computing environment is trustworthy and that the running applications are not compromised [27]. Attestation may also assure that a remote environment is trustworthy because it has the same cryptographic signature as an unmodified environment. Attestation is also important to assure that, after a VM migration, the destination machine is trustworthy and the migrated VM keeps its integrity, as its cryptographic signature remains the same as the one before migration.

VM atomicity ensures that only one instance of the VM runs at a time [10]. Therefore, VM migration should neither add new VMs nor eliminate any. Thus, after a successful migration, the system removes the VM instance on the source host, and in case of migration failure, the system removes the VM instance on the target host. Atomicity is crucial to ensure the integrity of the infrastructure for disaster recovery and to avoid generating duplicated copies of the same VM.

Confidentiality ensures that an attacker is not able to intercept, access, or modify the content of the data transferred during the migration of a virtual machine. Therefore, the system should use a secure communication channel to transfer data between peer hosts. Moreover, the peers of the secure communication channel should be able to negotiate unique cryptographic keys and ensure that these keys are known only by the peers [10].
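For instance, the migration channel can be protected with mutually authenticated TLS, sketched below with Python's standard ssl module; the host names, certificate files, and payload variable are placeholders.

    # Sketch: the source host opens a TLS channel to the destination, verifies
    # it against a private CA, presents its own certificate, and only then
    # sends memory pages, which are thus confidential and integrity-protected.
    import socket, ssl

    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    context.load_verify_locations("migration-ca.pem")   # trust only the cloud's CA
    context.load_cert_chain("source-host.pem")          # client cert for mutual auth

    memory_page_batch = b"<serialized dirty pages>"     # placeholder payload
    with socket.create_connection(("destination-host", 8443)) as raw:
        with context.wrap_socket(raw, server_hostname="destination-host") as chan:
            chan.sendall(memory_page_batch)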

Access control, authentication, and authorization define that the system must ensure that a VM migration is performed between two securely authenticated platforms, both of which are authorized to perform the migration, and that there is no one else between them (man in the middle). Authentication ensures the true identity of an entity; hence, other security requirements depend on successful authentication. Authentication is a key feature because other requirements, such as authorization, build on it to distinguish legitimate and authorized participants from illegitimate ones. Authorization ensures that only authorized entities perform operations such as VM migration. Besides, the VM should be migrated neither to an unauthorized host nor from one [29].

Nonrepudiation means that the peers involved in a migration cannot deny their participation in it [10]. The system must guarantee the provision of conclusive evidence of the migration event and of the peers' participation, even when the peers do not cooperate.


Replay resistance ensures that an attacker cannot reproduce the migration procedure without being detected. Hence, all migration packets must be unique and lose their validity after the migration.
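One way to realize this, sketched below with Python's standard hmac module, is to tag every migration packet with a nonce and timestamp authenticated under a shared key; the framing and key handling are illustrative only.

    # Sketch: each packet carries nonce + timestamp + HMAC; the receiver
    # rejects stale, forged, or previously seen packets.
    import hmac, hashlib, os, time

    def seal(payload, key):
        nonce = os.urandom(16)
        ts = int(time.time()).to_bytes(8, "big")
        tag = hmac.new(key, nonce + ts + payload, hashlib.sha256).digest()
        return nonce + ts + tag + payload

    seen_nonces = set()

    def verify(message, key, max_age=30):
        nonce, ts, tag = message[:16], message[16:24], message[24:56]
        payload = message[56:]
        expected = hmac.new(key, nonce + ts + payload, hashlib.sha256).digest()
        fresh = time.time() - int.from_bytes(ts, "big") <= max_age
        if hmac.compare_digest(tag, expected) and fresh and nonce not in seen_nonces:
            seen_nonces.add(nonce)
            return payload
        raise ValueError("replayed, stale, or forged migration packet")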

3.4.2 Vulnerabilities

In a virtual environment, multiple VMs running on top of the same physical machine increase the efficiency of the system, but this also introduces software in sensitive areas of the system, which increases vulnerabilities. These vulnerabilities can be exploited by malicious users to obtain sensitive information, such as passwords and encryption keys, or to perform other types of attacks, such as denial of service.

In internal attacks, the system administrator performs attacks on the VMs. In this case, the system is completely vulnerable: because the administrator is authenticated and authorized to perform actions, neither cryptographic nor integrity techniques prevent the attacks. A malicious user who gains super-user privileges, via flaws in the authentication and authorization modules, can perform an internal attack.

In other attacks, the attacker exploits flaws in the virtualization system source code to inject malicious code and modify the system modules. This attack is possible due to the complexity of virtualization systems, which end up having security flaws [2].

The attack can also originate from an infected VM (or a legitimate machine with a malicious user) targeting other VMs sharing the same system. This type of attack requires that the attacker and the target VM be on the same physical machine. Due to the sharing of resources (e.g., the CPU data cache), the attacker can steal cryptographic keys using techniques such as covert channels. This attack is facilitated when the network infrastructure indirectly allows the user to map the virtual networks and verify co-residence with the target VM [1]. These procedures are facilitated when static IPs are used for virtual networks, associating them with the physical IPs, but co-residence can also be checked with common IP tools, such as traceroute.

A side-channel attack is any attack in which the information used to break the system is leaked by the hardware and obtained through physical measurements, as a "side" or alternative channel [30]. The attack concerns only the implementation of a cryptosystem, rather than cryptanalysis of the mathematics of the algorithm or brute force. Examples of physical measurements used to build a side channel are: the time taken to perform different computations [31], varying power consumption [32], electromagnetic radiation leaked by the hardware during computations, and even the sound produced by the hardware. Therefore, under side-channel attacks, the weakness of the security system is not the algorithm but its implementation. Brumley and Boneh [33] have shown that they succeeded in extracting private keys from an OpenSSL-based Web server running on a machine in the local network. They ran a timing attack in which an attacker machine measures the decryption query response times of an OpenSSL server in order to extract the private key stored on the server. They successfully performed the attack between two VMs; thus, their results invalidate the announced isolation provided by the hypervisor. As mentioned before, side-channel attacks concern only the crypto algorithm implementation and, thus, a virtualized system does not interfere with the weakness or strength of an implementation. Nevertheless, virtualization is a shared hardware operating environment, and actions of one VM may cause effects in another VM. Therefore, a virtualized system should not facilitate access to physical measurements and should fully isolate one virtual environment from another to prevent side-channel attacks.

A covert channel is a type of security attack that creates and conveys information through a hidden communication channel, which is able to transfer information between processes in violation of the security policy. A covert channel is not a legitimate channel and, therefore, it depends upon an ingenious mechanism, that is, a program scheme that hides the way the information is transferred from the source to the destination and requires access to the file system. Hence, unlike side-channel attacks, covert channels are illegitimate communication channels built on already compromised systems. A covert channel requires viral infection of the system or a programming effort accomplished by the administrator or another authorized user of the system. Covert channels are usually difficult to detect, and low detectability, the capacity to stay hidden, is often the assumed measure of effectiveness of a covert channel attack. The usual hardware-based security mechanisms that underlie ultra-high-assurance secure operating systems cannot detect or control covert channels because covert channels do not employ the legitimate data transfer mechanisms of the computer system, such as read and write. Thus, the covert channel must not interfere with legitimate operations so as not to be detected by security systems.

Intruders have limited options to get data out of systems secured with intrusion detection systems, packet anomaly detection systems, and firewalls [34]. In this scenario, the intruder creates a covert channel. The communication media often used are ordinary actions unnoticed by the administrator and legitimate users, such as the use of header- or payload-embedded information, altering a storage location, performing operations that modify the real response time, using packet inter-arrival times, and so on. Examples include adding data to the payload section of ping packets or encoding data in the unused fields of packet headers. The covert channel attack that is most difficult to detect uses inter-packet delay times to encode data. This means that the intruder does not necessarily have to create new traffic, because the data are encoded by modulating the time between packets of regular legitimate communication. Data exfiltration can be an indication that a computer has been compromised even when other intrusion detection schemes have failed to detect a successful attack.
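For illustration only, a sketch of such an inter-packet-delay encoder; the addresses and delay values are arbitrary, and the packets themselves look like ordinary keepalive traffic.

    # Sketch: bits are encoded in the gaps between otherwise innocuous UDP
    # packets; a receiver measuring inter-arrival times recovers the data.
    import socket, time

    DELAY_ZERO, DELAY_ONE = 0.1, 0.3                 # arbitrary symbol durations

    def send_covert(bits, dst=("192.0.2.10", 9999)):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for bit in bits:
            sock.sendto(b"keepalive", dst)           # ordinary-looking payload
            time.sleep(DELAY_ONE if bit else DELAY_ZERO)  # data lives in the gap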

During the process of live migration, vulnerabilities may be exploited by attackers. Such vulnerabilities include authorization, integrity, and isolation failures.

Inappropriate access control policy: If access control policies are not defined properly, or the module responsible for enforcing them does not act effectively, an attacker can acquire undue control of the system to perform internal attacks. When the attacker controls the migration operation, the attacker can cause a denial of service by migrating multiple VMs to one physical machine to overload the communication link and the physical machine itself. The attacker may also migrate a malicious VM to a target physical machine, or migrate a target VM to a malicious physical machine. In both cases, after the migration, the attacker gains full control of the target machine (physical or virtual).

Unprotected channel transmission: If the migration channel does not guarantee the confidentiality of the data, an attacker can steal or modify sensitive information, such as passwords and encryption keys. Attacks can be done passively (sniffing) or actively (man in the middle) using techniques such as ARP spoofing, DNS poisoning, and route hijacking. Active attacks are usually more problematic, since they violate integrity and may include modifications to the authentication services of the VM (sshd/login) and manipulation of kernel memory.

Loopholes in the migration module: Contemporary virtualization software, such as Xen, VMware, and KVM, has an extensive and complex code base, which tends to have bugs. Perez-Botero et al. identified 59 vulnerabilities in Xen and 38 in KVM up to July 15, 2012, according to reports in the CVE security vulnerability database [35]. These results confirm the existence of vulnerabilities that an attacker can exploit to obstruct or access VMs.

3.4.3 Isolation, Access Control, and Availability

Several proposals aim to improve virtualization isolation, QoS provisioning, and virtual topology migration. Besides, some proposals use software-defined networking (SDN) to manage network migrations. There are proposals for developing security applications on OpenFlow network infrastructures, as well as others that seek to ensure the security of the infrastructure itself [36].

NetLord [37] introduces a software agent on each physical server that encapsulates the packets of VMs with a new IP header, in which the semantics of the layer-2 and layer-3 addresses are overloaded to indicate which virtual network the frames belong to. Similarly, VL2 [38] encapsulates the IP packets of a virtual network with another IP header. In this case, the semantics of the IP addresses indicate both the virtual network and the location of the physical host.

Distributed Overlay Virtual Ethernet (DOVE) [39] is a network virtualization proposal that provides address space isolation by using a network identifier field in the DOVE envelope header, creating an overlay network. Address space isolation is also achieved using Virtual eXtensible Local Area Network (VXLAN) encapsulation [73]. VXLAN adds to each Ethernet frame an outer Ethernet header, followed by external IP, UDP, and VXLAN headers. Network Virtualization using Generic Routing Encapsulation (NVGRE) [41] also uses encapsulation to allow multi-tenancy in public or private clouds. Both VXLAN and NVGRE use 24 bits to identify the virtual network that a frame belongs to. These proposals thus create an overlay network that interconnects the nodes of the virtual network.
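An illustrative Scapy sketch of VXLAN encapsulation, assuming a Scapy version that ships the VXLAN layer; the addresses are examples, and UDP port 4789 is the IANA-assigned VXLAN port.

    # Sketch: a tenant VM's inner Ethernet frame is wrapped in outer
    # Ethernet/IP/UDP/VXLAN headers; the 24-bit VNI names the virtual network.
    from scapy.all import Ether, IP, UDP
    from scapy.layers.vxlan import VXLAN

    inner = Ether(src="02:00:00:00:00:01", dst="02:00:00:00:00:02") / IP(dst="10.0.0.2")
    outer = (Ether() /
             IP(src="192.0.2.1", dst="192.0.2.2") /  # tunnel endpoints
             UDP(sport=49152, dport=4789) /
             VXLAN(vni=5001) /                       # virtual network identifier
             inner)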

Houidi et al. propose an adaptive system that provides resources on demand for virtual networks [42]. It provides more resources for virtual networks as soon as it detects service degradation or after a resource failure. The system uses a distributed multi-agent mechanism in the physical infrastructure to negotiate requests, to fit the resources to the network needs, and to synchronize supplier nodes and virtual networks. Another proposal, the OpenFlow Management Infrastructure (OMNI) [43], provides QoS to OpenFlow networks [21]. OMNI manages all flows of the network and defines QoS parameters for each one. Besides, OMNI migrates flows to different paths without any packet losses. Kim et al. map the QoS parameters of virtual networks with different workloads onto resources available on OpenFlow switches, such as queues and rate limiters [44]. The proposal's main goal is to provide QoS in scenarios where the physical infrastructure belongs to a multi-tenant cloud provider. Nevertheless, the control of QoS parameters and QoS mapping are centralized on the OpenFlow controller node. McIlroy and Sventek provide QoS to virtual networks with a new router architecture [45]. The router comprises multiple VMs, called Routelets. Each Routelet is isolated from the others, and their resources are limited and guaranteed. Routelets that route QoS-sensitive flows have priority access to substrate resources. Nevertheless, packet forwarding is performed by VMs, which limits the forwarding performance of Routelets.

Wang et al. propose a load balancer based on programming low-cost OpenFlow switches to multiplex requests among different server replicas [46]. The proposed solution fragments the IP address space of clients among the server replicas in a weighted manner. Thus, according to the client IP, it identifies the replica that serves a client. The proposal, however, guarantees neither the reservation of resources nor the QoS of flows. Hao et al. present the Virtually Clustered Open Router (VICTOR) infrastructure, which is based on creating a cluster of datacenters via a virtualized network infrastructure [47]. The central idea of this approach is to use OpenFlow as the basic network infrastructure of datacenters to allow moving a virtual machine from one physical location to another, since it is possible to reconfigure the network paths. This proposal optimizes datacenter network usage by performing server migrations, but it does not guarantee the QoS of each flow, nor does it isolate the use of resources among different virtualized servers.

3.5 FUTURE DIRECTIONS

The most important performance goal in VM live migration is a short VM downtime. Current migration approaches apply a combination of the push and stop-and-copy strategies for VM live migration. The combined push and stop-and-copy strategy reduces the VM downtime at the cost of increasing the total migration time and the network traffic due to the migration. When the VM storage is transferred during migration, the total migration time is also affected. Therefore, a main research topic is to decrease the total downtime while keeping memory and storage consistency and reducing network bandwidth usage. Downtime directly impacts virtualization performance and compromises the deployment of VM migration in different scenarios.

Virtual network migration is another research topic. When a VM moves, its network connections should follow accordingly. VM migration between different local area networks demands mechanisms for IP address migration or for network traffic redirection. Migration within the same datacenter can also present performance problems when datacenters are globally distributed over a wide geographical area. Current research efforts focus on tunneling network traffic between the source and destination hosts [22]. In this direction, there are proposals, such as NVGRE [41], VXLAN [73], and DOVE (Distributed Overlay Virtual Ethernet) [39], that create tunnels to maintain virtual network connectivity even in scenarios in which sites are separated by a WAN. Moreover, NetLord [37] and VL2 [38] change IP semantics for isolating and creating virtual networks within a datacenter. Another proposal for handling VM mobility across the Internet is to use the Locator/Identifier Separation Protocol (LISP) [48, 49]. LISP uses two IP headers, one for the locator and the other for the identifier of the host. LISP maintains a globally reachable service that maps locators into identifiers, and vice versa, in order to ensure the correct location of the VM no matter where it is hosted. After the VM migration, only the locator is changed, and all services remain online and reachable. Future trends also point to OpenFlow [21] as a possible approach for managing virtual networks. Nevertheless, all the aforementioned approaches require adaptations or more sophisticated deployments to be fully functional. To achieve seamless network migration, we believe that new standards should be established to define a common way to migrate virtual networks.

Storage is an important resource for virtualized servers, because it must always be available and deliver high performance. When a VM is migrated, its storage should also be available at the migration destination. Therefore, either both the source and destination sites share the storage service, or all VM storage must be sent over the network to the destination host. EMC2, one of the world's leading enterprise storage providers, offers a storage facility focused on a distributed federation of data, which allows data to be accessed among locations over synchronous distances. The EMC2 distributed storage service is called VPLEX.2 Moreover, Ceph is an open source project that aims to provide a distributed and redundant file system [50]. There are several initiatives for providing a distributed storage service, and they are a step toward an always-available file system for VM migration. Nevertheless, these initiatives are new and immature. The proprietary ones have a higher maturity grade, but they are still expensive and demand a large infrastructure. Providing a distributed and available storage service that requires low investment in infrastructure and is backward compatible is a key research area.

Automated migration is also a key research topic, because the allocation of VMs onto physical servers is an NP-hard problem. This scenario is aggravated when considering big datacenters and multiple datacenters in a cloud provider's environment, due to the size and unmanageability of the scenario. There are proposed optimizations based on heuristics [51] and others based on system modeling [52, 53], aiming at a better use of physical resources. An important factor to be considered in the use of optimization algorithms is their convergence time, which directly interferes with the dynamics of the system. Proposals for optimizing the use of physical resources are complementary to automatic migration systems and can be used to manage migrations. Trends show that a key research theme is balancing the tradeoff between optimizing physical resource usage and limiting the number of migrations in the network.

A major research topic that arises is securing VM migration. Our studies show that there is no proposal that achieves a completely secure live migration primitive. Security must be deployed throughout the development of a virtualization system. It must be present from the hypervisor, which should be reliable and trustworthy and should provide a secure virtualized environment, to the migration procedure, which should authenticate peers, check the trustworthiness of the foreign peer, and ensure a confidential channel between peers for transferring the VM. Security must also be ensured for all resources used by a VM. Isolation is a key challenge for network virtualization, as availability is another key challenge for storage virtualization. Confidentiality is an open topic in memory virtualization. Trust assurance is a research trend, in which we identified some works proposing protocols and new approaches [54]. We believe that providing security for virtualized environments is a hot research topic, in which the proposals are still initial and immature. Therefore, trends show that new security mechanisms should be proposed to guarantee a more secure virtualization system.

2http://www.emc.com/campaign/global/vplex/index.htm.

3.6 CONCLUSION

VM migration is one of the most useful primitives introduced by the virtualization technique. VM migration stands for the relocation of virtual computing environments over the physical infrastructure. The main idea of the migration primitive is to remap virtual resources onto physical resources without disrupting the operation of the virtual resources. We consider VM migration of particular interest for cloud computing environments and for network virtualization approaches. We claim that migration is a powerful tool for fitting computing capacity to dynamic workloads, facilitating user mobility, improving energy savings, and managing failures. In a network virtualization scenario, VM migration plays the role of flexibly changing network topologies without constraining the physical realization of the virtual topology. Nevertheless, VM migration is challenging both in its realization and in its security guarantees.

In this chapter, we explained that live migration is the key migration mechanism of most current hypervisors. We identified that the key resource to migrate is the VM memory, as it is constantly updated during the migration process. We also discussed how to migrate the storage service of VMs through WANs. Moreover, we presented a network virtualization approach, called XenFlow, which focuses on migrating virtual networks without losing packets or disrupting network services. Besides the technical difficulties of migrating a VM while it is running, we also highlighted how to assure that a VM migration occurs in a secure environment.

REFERENCES

1. T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, "Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds," in Proceedings of the 16th ACM Conference on Computer and Communications Security, ser. CCS '09, 2009, pp. 199–212. [Online]. Available: http://doi.acm.org/10.1145/1653662.1653687. Accessed November 20, 2014.

2. Z. Wang and X. Jiang, "Hypersafe: A lightweight approach to provide lifetime hypervisor control-flow integrity," in 2010 IEEE Symposium on Security and Privacy (SP), May 2010, pp. 380–395.

3. M. Pearce, S. Zeadally, and R. Hunt, "Virtualization: Issues, security threats, and solutions," ACM Computing Surveys, vol. 45, no. 2, pp. 17:1–17:39, 2013. [Online]. Available: http://doi.acm.org/10.1145/2431211.2431216. Accessed November 20, 2014.

4. Z. Pan, Y. Dong, Y. Chen, L. Zhang, and Z. Zhang, "CompSC: Live migration with pass-through devices," SIGPLAN Notices, vol. 47, no. 7, pp. 109–120, 2012. [Online]. Available: http://doi.acm.org/10.1145/2365864.2151040. Accessed November 20, 2014.


5. L. H. G. Ferraz, D. M. F. Mattos, and O. C. M. B. Duarte, "A two-phase multipathing scheme with genetic algorithm for data center network," IEEE Global Communications Conference - GLOBECOM, Austin, TX, December 2014.

6. O. C. M. B. Duarte and G. Pujolle, Virtual Networks: Pluralistic Approach for the Next Generation of Internet. Hoboken, NJ: John Wiley & Sons, Inc., 2013.

7. I. M. Moraes, D. M. Mattos, L. H. G. Ferraz, M. E. M. Campista, M. G. Rubinstein, L. H. M. Costa, M. D. de Amorim, P. B. Velloso, O. C. M. Duarte, and G. Pujolle, "FITS: A flexible virtual network testbed architecture," Computer Networks, vol. 63, pp. 221–237, 2014.

8. H. T. Mouftah and B. Kantarci, Communication Infrastructures for Cloud Computing, 1st ed. Hershey, PA: IGI Global, 2013.

9. S. Berger, R. Cáceres, K. A. Goldman, R. Perez, R. Sailer, and L. van Doorn, "vTPM: Virtualizing the trusted platform module," in Proceedings of the 15th Conference on USENIX Security Symposium - Volume 15, ser. USENIX-SS'06, 2006.

10. X. Wan, X. Zhang, L. Chen, and J. Zhu, "An improved vTPM migration protocol based trusted channel," in 2012 International Conference on Systems and Informatics (ICSAI), May 2012, pp. 870–875.

11. M. Aslam, C. Gehrmann, and M. Bjorkman, "Security and trust preserving VM migrations in public clouds," in 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), June 2012, pp. 869–876.

12. D. M. F. Mattos and O. C. M. B. Duarte, "XenFlow: Seamless migration primitive and quality of service for virtual networks," IEEE Global Communications Conference - GLOBECOM, Austin, TX, December 2014.

13. Y. Wang, E. Keller, B. Biskeborn, J. van der Merwe, and J. Rexford, "Virtual routers on the move: Live router migration as a network-management primitive," in Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, ser. SIGCOMM '08, 2008, pp. 231–242. [Online]. Available: http://doi.acm.org/10.1145/1402958.1402985. Accessed November 20, 2014.

14. P. Pisa, N. Fernandes, H. Carvalho, M. Moreira, M. Campista, L. Costa, and O. Duarte, "OpenFlow and Xen-based virtual network migration," in Communications: Wireless in Developing Countries and Networks of the Future, ser. IFIP Advances in Information and Communication Technology, A. Pont, G. Pujolle, and S. Raghavan, Eds. Boston, MA: Springer, 2010, vol. 327, pp. 170–181.

15. C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machines," in Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation - Volume 2, 2005, pp. 273–286.

16. S. Venkatesha, S. Sadhu, and S. Kintali, "Survey of virtual machine migration techniques," Department of Computer Science - University of California, Santa Barbara, CA, Technical Report, March 2009.

17. J. Suzuki, Y. Hidaka, J. Higuchi, T. Baba, N. Kami, and T. Yoshikawa, "Multi-root share of single-root I/O virtualization (SR-IOV) compliant PCI Express device," in 2010 IEEE 18th Annual Symposium on High Performance Interconnects (HOTI), August 2010, pp. 25–31.

18. E. Zhai, G. D. Cummings, and Y. Dong, "Live migration with pass-through device for Linux VM," in OLS'08: The 2008 Ottawa Linux Symposium, 2008, pp. 261–268.

19. N. Fernandes, M. Moreira, I. Moraes, L. Ferraz, R. Couto, H. Carvalho, M. Campista, L. Costa, and O. Duarte, "Virtual networks: Isolation, performance, and trends," Annals of Telecommunications, vol. 66, pp. 1–17, 2010.

Page 92: Cloud Services, Networking, and Management

“9780471697558c03” — 2015/3/20 — 11:06 — page 70 — #22

70 VIRTUAL MACHINE MIGRATION

20. N. Egi, A. Greenhalgh, M. Handley, M. Hoerdt, F. Huici, and L. Mathy, “Towards high per-formance virtual routers on commodity hardware,” in Proceedings of the 2008 ACM CoNEXTConference, 2008, pp. 1–12.

21. N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker,and J. Turner, “OpenFlow: Enabling innovation in campus networks,” SIGCOMM ComputerCommunication Review, vol. 38, pp. 69–74, 2008.

22. M. Bari, R. Boutaba, R. Esteves, L. Granville, M. Podlesny, M. Rabbani, Q. Zhang, andM. Zhani, “Data center network virtualization: A survey,” Communications Surveys Tutorials,IEEE, vol. 15, no. 2, pp. 909–928, 2013.

23. R. Sherwood, G. Gibb, K. Yap, G. Appenzeller, M. Casado, N. McKeown, and G. Parulkar,“Flowvisor: A network virtualization layer,” Technical Report OPENFLOW-TR-2009-01,OpenFlow Consortium, 2009.

24. M. Casado, T. Koponen, R. Ramanathan, and S. Shenker, “Virtualizing the network forwardingplane,” in Proceedings of the Workshop on Programmable Routers for Extensible Services ofTomorrow, 2010, p. 8.

25. V. Melvin, “Dynamic load balancing based on live migration of virtual machines: Secu-rity threats and effects,” Master’s thesis, B. Thomas Golisano College of Computing andInformation Sciences (GCCIS) - Rochester Institute of Technology, Rochester, NY, 2011.

26. Q. Zhang, L. Cheng, and R. Boutaba, “Cloud computing: State-of-the-art and research chal-lenges,” Journal of Internet Services and Applications, vol. 1, no. 1, pp. 7–18, 2010. [Online].Available: http://dx.doi.org/10.1007/s13174-010-0007-6. Accessed November 20, 2014.

27. T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, “Terra: A virtual machine-based platform for trusted computing,” in Proceedings of the Nineteenth ACM Symposium onOperating Systems Principles, ser. SOSP ’03, 2003, pp. 193–206. [Online]. Available: http://doi.acm.org/10.1145/945445.945464. Accessed November 20, 2014.

28. J. Oberheide, E. Cooke, and F. Jahanian, “Empirical exploitation of live virtual machinemigration,” in Proceedings of BlackHat DC convention, 2008.

29. B. Danev, R. J. Masti, G. O. Karame, and S. Capkun, “Enabling secure VM-vTPM migra-tion in private clouds,” in Proceedings of the 27th Annual Computer Security ApplicationsConference, ser. ACSAC ’11, 2011, pp. 187–196. [Online]. Available: http://doi.acm.org/10.1145/2076732.2076759.Accessed November 20, 2014.

30. D. Agrawal, B. Archambeault, J. R. Rao, and P. Rohatgi, “The EM side—channel(s),” in Cryp-tographic Hardware and Embedded Systems - CHES 2002, ser. Lecture Notes in ComputerScience, B. S. Kaliski, c. K. Koç, and C. Paar, Eds. Springer Berlin Heidelberg, 2003, vol.2523, pp. 29–45. [Online]. Available: http://dx.doi.org/10.1007/3-540-36400-5_4. AccessedNovember 20, 2014.

31. P. C. Kocher, “Timing attacks on implementations of diffie-hellman, rsa, dss, and othersystems,” in Advances in Cryptology—CRYPTO’96, ser. Lecture Notes in Computer Sci-ence, N. Koblitz, Ed. Berlin: Springer, 1996, vol. 1109, pp. 104–113. [Online]. Available:http://dx.doi.org/10.1007/3-540-68697-5_9. Accessed November 20, 2014.

32. P. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in Advances in Cryptology—CRYPTO’99, ser. Lecture Notes in Computer Science, M. Wiener, Ed. Berlin: Springer, 1999,vol. 1666, pp. 388–397. [Online]. Available: http://dx.doi.org/10.1007/3-540-48405-1_25.Accessed November 20, 2014.

33. D. Brumley and D. Boneh, “Remote timing attacks are practical,” Computer Networks, vol. 48,no. 5, pp. 701–716, 2005. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1389128605000125. Accessed November 20, 2014.

Page 93: Cloud Services, Networking, and Management

“9780471697558c03” — 2015/3/20 — 11:06 — page 71 — #23

REFERENCES 71

34. C. Fung, D. Lam, and R. Boutaba, “RevMatch: An efficient and robust decision model for col-laborative malware detection,” in IEEE/IFIP Network Operation and Management Symposium(NOMS14), 2014.

35. D. Perez-Botero, J. Szefer, and R. B. Lee, “Characterizing hypervisor vulnerabilities in cloudcomputing servers,” in Proceedings of the 2013 International Workshop on Security in CloudComputing, ser. Cloud Computing’13, 2013, pp. 3–10. [Online]. Available: http://doi.acm.org/10.1145/2484402.2484406. Accessed November 20, 2014.

36. D. Kreutz, F. M. Ramos, and P. Verissimo, “Towards secure and dependable software-definednetworks,” in Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics inSoftware Defined Networking, ser. HotSDN ’13, 2013, pp. 55–60.

37. J. Mudigonda, P. Yalagandula, J. Mogul, B. Stiekes, and Y. Pouffary, “Netlord: A scalablemulti-tenant network architecture for virtualized datacenters,” in Proceedings of the ACMSIGCOMM 2011, ser. SIGCOMM ’11, 2011, pp. 62–73.

38. A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel,and S. Sengupta, “Vl2: A scalable and flexible data center network,” in Proceedings of theACM SIGCOMM 2009, ser. SIGCOMM’09, 2009, pp. 51–62.

39. K. Barabash, R. Cohen, D. Hadas, V. Jain, R. Recio, and B. Rochwerger, “A case for overlaysin DCN virtualization,” in Proceedings of the 3rd Workshop on Data Center—Converged andVirtual Ethernet Switching, ser. DC-CaVES ’11, 2011, pp. 30–37.

40. Y. Nakagawa, K. Hyoudou, and T. Shimizu, “A management method of IP multicast in overlaynetworks using openflow,” in Proceedings of the First Workshop on Hot Topics in SoftwareDefined Networks, ser. HotSDN ’12, 2012, pp. 91–96. [Online]. Available: http://doi.acm.org/10.1145/2342441.2342460. Accessed November 20, 2014.

41. M. Sridharan, K. Duda, I. Ganga, A. Greenberg, G. Lin, M. Pearson, and P. Thaler,“NVGRE: Network Virtualization using Generic Routing Encapsulation,” NVGRE, Inter-net Engineering Task Force, February 2013. [Online]. Available: http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-02. Accessed November 20, 2014.

42. I. Houidi, W. Louati, D. Zeghlache, P. Papadimitriou, and L. Mathy, “Adaptive virtual net-work provisioning,” in Proceedings of the Second ACM SIGCOMM Workshop on VirtualizedInfrastructure Systems and Architectures, 2010, pp. 41–48.

43. D. M. F. Mattos, N. C. Fernandes, V. T. da Costa, L. P. Cardoso, M. E. M. Campista,L. H. M. K. Costa, and O. C. M. B. Duarte, “OMNI: Openflow management infras-tructure,” in 2011 International Conference on the Network of the Future (NOF), 2011,pp. 52–56.

44. W. Kim, P. Sharma, J. Lee, S. Banerjee, J. Tourrilhes, S. Lee, and P. Yalagandula, “Automatedand scalable QoS control for network convergence,” Proceedings of the INM/WREN, April2010.

45. R. McIlory and J. Sventek, “Resource virtualisation of network routers,” in 2006 Workshopon High Performance Switching and Routing, 2006, 6 pp.

46. R. Wang, D. Butnariu, and J. Rexford, “Openflow-based server load balancing gone wild,”in Proceedings of the 11th USENIX Conference on Hot Topics in Management of Internet,Cloud, and Enterprise Networks and Services, 2011, pp. 12.

47. F. Hao, T. Lakshman, S. Mukherjee, and H. Song, “Enhancing dynamic cloud-based ser-vices using network virtualization,” in Proceedings of the 1st ACM Workshop on VirtualizedInfrastructure Systems and Architectures, 2009, pp. 37–44.

48. D. Phung, S. Secci, D. Saucez, and L. Iannone, “The openlisp control plane architecture,”IEEE Network, vol. 28, no. 2, pp. 34–40, 2014.

Page 94: Cloud Services, Networking, and Management

“9780471697558c03” — 2015/3/20 — 11:06 — page 72 — #24

72 VIRTUAL MACHINE MIGRATION

49. P. Raad, S. Secci, D. C. Phung, A. Cianfrani, P. Gallard, and G. Pujolle, “Achieving sub-second downtimes in large-scale virtual machine migrations with lisp,” IEEE Transactions onNetwork and Service Management, vol. 11, no. 2, pp. 133–143, 2014.

50. S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn, “Ceph: A scalable,high-performance distributed file system,” in Proceedings of the 7th Symposium on OperatingSystems Design and Implementation, ser. OSDI ’06., 2006, pp. 307–320. [Online]. Available:http://dl.acm.org/citation.cfm?id=1298455.1298485. Accessed November 20, 2014.

51. I. Fajjari, N. Aitsaadi, G. Pujolle, and H. Zimmermann, “Vne-ac: Virtual network embed-ding algorithm based on ant colony metaheuristic,” in 2011 IEEE International Conferenceon Communications (ICC), 2011, pp. 1–6.

52. E. Rodriguez, G. Alkmim, D. Batista, and N. da Fonseca, “Live migration in green virtual-ized networks,” in 2013 IEEE International Conference on Communications (ICC), 2013, pp.2262–2266.

53. G. P. Alkmim, D. M. Batista, and N. L. S. da Fonseca, “Mapping virtual networks onto sub-strate networks,” Journal of Internet Services and Applications, vol. 4, no. 1, 2013. [Online].Available: http://dx.doi.org/10.1186/1869-0238-4-3. Accessed November 20, 2014.

54. L. H. G. Ferraz, P. B. Velloso, and O. C. M. Duarte, “An accurate and precise malicious nodeexclusion mechanism for ad hoc networks,” Ad Hoc Networks, vol. 19, pp. 142–155, 2014.[Online]. Available: http://www.sciencedirect.com/science/article/pii/S1570870514000468.Accessed November 20, 2014.

Page 95: Cloud Services, Networking, and Management

“9780471697558part2” — 2015/3/20 — 11:08 — page 73 — #1

PART II

CLOUD NETWORKING AND COMMUNICATIONS


4

DATACENTER NETWORKS AND RELEVANT STANDARDS

Daniel S. Marcon, Rodrigo R. Oliveira, Luciano P. Gaspary, and Marinho P. Barcellos

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

4.1 OVERVIEW

Datacenters are the core of cloud computing, and their network is an essential component to allow distributed applications to run efficiently and predictably [1]. However, not all datacenters provide cloud computing. In fact, there are two main types of datacenters: production and cloud. Production datacenters are often shared by one tenant or among multiple (possibly competing) groups, services, and applications, but with a low rate of arrival and departure. They run data analytics jobs with relatively little variation in demands, and their size varies from hundreds of servers to tens of thousands of servers. Cloud datacenters, in contrast, have a high rate of tenant arrival and departure (churn) [2], run both user-facing applications and inward computation, require elasticity (since application demands are highly variable), and consist of tens to hundreds of thousands of physical servers [3]. Moreover, clouds can comprise several datacenters spread around the world. As an example, Google, Microsoft, and Amazon (three of the biggest players in the market) have datacenters on four continents, and each company has over 900,000 servers.


This chapter presents an in-depth study of datacenter networks (DCNs), relevant standards, and operation. Our goal here is threefold: (i) provide a detailed view of the networking infrastructure connecting the set of servers of the datacenter via high-speed links and commodity off-the-shelf (COTS) switches [4]; (ii) discuss the addressing and routing mechanisms employed in this kind of network; and (iii) show how the nature of traffic may impact DCNs and affect design decisions.

Providers typically have three main goals when designing a DCN [5]: scalability, fault tolerance, and agility. First, the infrastructure must scale to a large number of servers (and preferably allow incremental expansion with commodity equipment and little effort). Second, a DCN should be fault tolerant against failures of both computing and network resources. Third, a DCN ideally needs to be agile enough to assign any virtual machine (VM, which is part of a service or application) to any server [6]. As a matter of fact, DCNs should ensure that computations are not bottlenecked on communication [7].

Currently, providers attempt to meet these goals by implementing the network as a multi-rooted tree [1], using LAN technology for VM addressing and two main strategies for routing: equal-cost multipath (ECMP) and valiant load balancing (VLB). The shared nature of DCNs among a myriad of applications and tenants and the high scalability requirements, however, introduce several challenges for the architecture design, protocols, and strategies employed inside the network. Furthermore, the type of traffic in DCNs is significantly different from that of traditional networks [8]. Therefore, we also survey recent proposals in the literature that address the limitations of technologies used in today's DCNs.

We structure this chapter as follows. First, we examine the typical multi-rooted tree topology used in current datacenters and discuss its benefits and drawbacks. Then, we take a look at novel topologies proposed in the literature, and at how network expansion can be performed in a cost-efficient way for providers. After addressing the structure of the network, we look into the traffic characteristics of these high-performance, dynamic networks and discuss proposals for traffic management on top of existing topologies. Based on the aspects discussed so far, we present layer-2 and layer-3 routing, its requirements, and the strategies typically employed to perform this task. We also examine existing mechanisms used for VM addressing in the cloud platform and novel proposals to increase flexibility and isolation for tenants. Finally, we discuss the most relevant open research challenges and close this chapter with a brief summary of DCNs.

4.2 TOPOLOGIES

In this section, we present an overview of datacenter topologies. The topology describes how devices (routers, switches, and servers) are interconnected. More formally, this is represented as a graph, in which switches, routers, and servers are the nodes, and links are the edges.


4.2.1 Typical Topology

Figure 4.1 shows a canonical three-tiered multi-rooted tree-like physical topology, which is implemented in current datacenters [1, 9]. The three tiers are: (1) the access (edge) layer, comprising the top-of-rack (ToR) switches that connect servers mounted on every rack; (2) the aggregation (distribution) layer, consisting of devices that interconnect ToR switches in the access layer; and (3) the core layer, formed by routers that interconnect switches in the aggregation layer. Furthermore, every ToR switch may be connected to multiple aggregation switches for redundancy (usually 1+1 redundancy), and every aggregation switch is connected to multiple core switches. Typically, a three-tiered network is implemented in datacenters with more than 8000 servers [4]. In smaller datacenters, the core and aggregation layers are collapsed into one tier, resulting in a two-tiered datacenter topology (flat layer-2 topology) [9].

This multitiered topology has a significant amount of oversubscription, where servers attached to ToR switches have significantly more (possibly an order of magnitude more) provisioned bandwidth between one another than they do with hosts in other racks [3]. Providers employ this technique in order to reduce costs and improve resource utilization, which are key properties to help them achieve economies of scale.
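As a back-of-the-envelope illustration, the oversubscription ratio at a ToR switch is the aggregate server-facing capacity divided by the aggregate uplink capacity. The sketch below is our own; the rack configuration in it is hypothetical, not taken from a measurement study.

def oversubscription(num_servers, nic_gbps, num_uplinks, uplink_gbps):
    """Ratio of worst-case server demand to uplink capacity at a ToR switch."""
    downlink = num_servers * nic_gbps   # aggregate server-facing capacity
    uplink = num_uplinks * uplink_gbps  # capacity towards the aggregation layer
    return downlink / uplink

# A hypothetical rack: 48 servers with 1 Gbps NICs and 2 x 10 Gbps uplinks.
print(oversubscription(48, 1, 2, 10))  # 2.4, i.e., 2.4:1 oversubscribed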

This topology, however, presents some drawbacks. First, the limited bisection bandwidth¹ constrains server-to-server capacity, and resources eventually get fragmented (limiting agility) [11, 12]. Second, multiple paths are poorly exploited (e.g., only a single path is used within a layer-2 domain by the spanning tree protocol), which may potentially cause congestion on some links even though other paths exist in the network and have available capacity. Third, the rigid structure hinders incremental expansion [13]. Fourth, the topology is inherently failure-prone due to the use of many links, switches, and servers [14]. To address these limitations, novel network architectures have been recently proposed; they can be organized in three classes [15]: switch-oriented, hybrid switch/server, and server-only topologies.

Figure 4.1. A canonical three-tiered tree-like datacenter network topology.

¹ The bisection bandwidth of the network is the worst-case segmentation (i.e., with minimum bandwidth) of the network into two equally sized partitions [10].

4.2.2 Switch-Oriented Topologies

These proposals use commodity switches to perform routing functions, and follow a Clos-based design or leverage runtime-reconfigurable optical devices. A Clos network [16] consists of multiple layers of switches; each switch in a layer is connected to all switches in the previous and next layers, which provides path diversity and graceful bandwidth degradation in case of failures. Two proposals follow the Clos design: VL2 [6] and Fat-Tree [4]. VL2, shown in Figure 4.2a, is an architecture for large-scale datacenters that provides multiple uniform paths between servers and full bisection bandwidth (i.e., it is non-oversubscribed). Fat-Tree, in turn, is a folded Clos topology. The topology, shown in Figure 4.2b, is organized in a non-oversubscribed k-ary tree-like structure consisting of k-port switches. There are k two-layer pods, each containing k/2 switches per layer. Each switch in the lower layer of a pod is connected to k/2 servers, and its remaining k/2 ports are connected to the k/2 aggregation switches of the pod. Each of the (k/2)² k-port core switches has one port connected to each of the k pods. In general, a fat-tree built with k-port switches supports k³/4 hosts. Despite the high capacity offered (agility is guaranteed), these architectures increase wiring costs (because of the number of links).
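Since the entire fat-tree structure follows from the port count k, its sizes are easy to compute. The helper below is our own illustration of the counts just stated:

def fat_tree_sizes(k):
    """Sizes of a fat-tree built from k-port switches (k must be even)."""
    assert k % 2 == 0, "fat-trees are defined for even port counts"
    return {
        "pods": k,
        "edge_switches": k * (k // 2),         # k/2 per pod
        "aggregation_switches": k * (k // 2),  # k/2 per pod
        "core_switches": (k // 2) ** 2,
        "hosts": k ** 3 // 4,                  # k/2 servers per edge switch
    }

print(fat_tree_sizes(4))   # 16 hosts, 4 core switches
print(fat_tree_sizes(48))  # 27,648 hosts (a figure revisited in Section 4.3)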

Optical switching architecture (OSA) [17], in turn, uses runtime-reconfigurable optical devices to dynamically change the physical topology and one-hop link capacities (within 10 milliseconds). It employs hop-by-hop stitching of multiple optical links to provide all-to-all connectivity for the highly dynamic and variable network demands of cloud applications. This method is shown in the example of Figure 4.3. Suppose that demands change from the left table to the right table in the figure (with a new highlighted entry). The topology must be adapted to the new traffic pattern; otherwise, there will be at least one congested link. One possible approach is to increase the capacity of link F–G (by reducing the capacity of links F–D and G–C), so congestion can be avoided. Despite the flexibility achieved, OSA suffers from scalability issues, since it is designed to connect only a few thousand servers in a container, and latency-sensitive flows may be affected by link reconfiguration delays.

Figure 4.2. Clos-based topologies. (a) VL2 and (b) Fat-tree.

Figure 4.3. OSA adapts according to demands (adapted from Ref. [17]).

4.2.3 Hybrid Switch/Server Topologies

These architectures shift complexity from network devices to servers, i.e., servers perform routing, while low-end mini-switches interconnect a fixed number of hosts. They can also provide higher fault tolerance and richer connectivity and improve innovation, because hosts are easier to customize than commodity switches. Two example topologies are DCell [5] and BCube [18], which can arguably scale up to millions of servers.

DCell [5] is a recursively built structure that forms a fully connected graph using only commodity switches (as opposed to the high-end switches of traditional DCNs). DCell aims to scale out to millions of servers with few recursion levels (it can hold 3.2 million servers with only four levels and six hosts per cell). A DCell network is built as follows. A level-0 DCell (DCell0) comprises servers connected to an n-port commodity switch. A DCell1 is formed with n + 1 DCell0 cells; each DCell0 is connected to all other DCell0 cells with one bidirectional link. In general, a level-k DCell is constructed with n + 1 DCellk−1 cells in the same manner as DCell1. Figure 4.4a shows an example of a two-level DCell topology. In this example, a commodity switch is connected to four servers (n = 4) and, therefore, a DCell1 is constructed with 5 DCell0 cells. The set of DCell0 cells is interconnected in the following way: each server is represented by the tuple (a1, a0), where a1 and a0 are the level-1 and level-0 identifiers, respectively; and a link is created between the servers identified by the tuples (i, j − 1) and (j, i), for every i and every j > i.
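The wiring rule of the last sentence is compact enough to state as code; the sketch below (our own, with illustrative names) enumerates the inter-cell links of a DCell1:

def dcell1_links(n):
    """Inter-cell links of a DCell1 built from n-port switches.

    Servers are (cell, index) tuples; the rule connects (i, j - 1) to (j, i)
    for every pair of cells i < j, giving one link between every two cells.
    """
    cells = n + 1  # a DCell1 is formed with n + 1 DCell0 cells
    return [((i, j - 1), (j, i)) for i in range(cells) for j in range(i + 1, cells)]

links = dcell1_links(4)  # the n = 4 example of Figure 4.4a
print(len(links))        # 10 links fully connecting the 5 DCell0 cells
print(links[:3])         # [((0, 0), (1, 0)), ((0, 1), (2, 0)), ((0, 2), (3, 0))]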

Similarly to DCell, BCube [18] is a recursively built structure that is easy to design and upgrade. Additionally, BCube provides low latency and graceful degradation of bandwidth upon link and switch failure. In this structure, clusters (sets of servers interconnected by a switch) are interconnected by commodity switches in a hypercube-based topology. More specifically, BCube is constructed as follows: a BCube0 (level-0 BCube) consists of n servers connected by an n-port switch; a BCube1 is constructed from n BCube0 clusters and n n-port switches; and a BCubek is constructed from n BCubek−1 clusters and nᵏ n-port switches. Each server is represented by the tuple (x1, x2), where x1 is the cluster number and x2 is the server number inside the cluster. Each switch, in turn, is represented by a tuple (y1, y2), where y1 is the level number and y2 is the switch number inside the level. Links are created by connecting the level-k port of the i-th server in the j-th BCubek−1 to the j-th port of the i-th level-k switch. An example of a two-level BCube with n = 4 (4-port switches) is shown in Figure 4.4b.
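For the two-level case, the construction can be sketched as follows (our own helper; the server and switch labels follow the tuples defined above):

def bcube1_links(n):
    """Links of a two-level BCube built from n-port switches.

    ("server", j, i) is the i-th server of the j-th BCube0 cluster;
    ("switch", 0, j) is the level-0 switch of cluster j, and the level-1
    rule connects server (j, i) to port j of level-1 switch ("switch", 1, i).
    """
    links = []
    for j in range(n):      # BCube0 clusters
        for i in range(n):  # servers within a cluster
            links.append((("server", j, i), ("switch", 0, j)))  # level-0 link
            links.append((("server", j, i), ("switch", 1, i)))  # level-1 link
    return links

print(len(bcube1_links(4)))  # 32 links: 16 servers with 2 ports each (Figure 4.4b)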

Figure 4.4. Hybrid switch/server topologies. (a) Two-level DCell and (b) two-level BCube.

Despite the benefits, DCell and BCube require a high number of NIC ports at end-hosts (causing some overhead at servers) and increase wiring costs. In particular, DCell results in non-uniform multiple paths between hosts, and level-0 links are typically more utilized than other links (creating bottlenecks). BCube, in turn, provides uniform multiple paths, but uses more switches and links than DCell [18].

4.2.4 Server-Only Topology

In this kind of topology, the network comprises only servers, which perform all network functions. An example architecture is CamCube [19], which is inspired by Content Addressable Network (CAN) [20] overlays and uses a 3D torus (k-ary 3-cube) topology with k servers along each axis. Each server is connected directly to six other servers, and the edge servers are wrapped around. Figure 4.5 shows a 3-ary CamCube topology, resulting in 27 servers. The three most positive aspects of CamCube are (1) providing robust fault-tolerance guarantees (it is unlikely to partition even with 50% of servers or links failed); (2) improving innovation with key-based server-to-server routing (content is hashed to a location in space defined by a server); and (3) allowing each application to define specific routing techniques. However, it does not hide the topology from applications, has a higher network diameter of O(∛N) (increasing latency and traffic in the network), and hinders network expansion.
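The wrap-around neighborhood of the torus can be computed directly; the helper below is our own sketch:

def camcube_neighbors(x, y, z, k):
    """The six neighbors of server (x, y, z) in a k-ary 3-cube (3D torus).

    Coordinates wrap modulo k, so edge servers link to the opposite face.
    """
    deltas = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return [((x + dx) % k, (y + dy) % k, (z + dz) % k) for dx, dy, dz in deltas]

# In the 3-ary CamCube of Figure 4.5 (27 servers), even a corner server
# has exactly six neighbors thanks to the wrap-around links.
print(camcube_neighbors(0, 0, 0, 3))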

4.2.5 Summary of Topologies

Table 4.1 summarizes the benefits and limitations of these topologies by taking four properties into account: scalability, resiliency, agility, and cost. The typical DCN topology has limited scalability (even though it can support hundreds of thousands of servers), as COTS switches have restricted memory size and need to maintain an entry in their Forwarding Information Base (FIB) for each VM. Furthermore, it presents low resiliency, since it provides only 1+1 redundancy, and its oversubscribed nature hinders agility. Despite these drawbacks, it can be implemented with only commodity switches, resulting in lower costs.

Figure 4.5. Example of 3-ary CamCube topology (adapted from Ref. [21]).

TABLE 4.1. Comparison among datacenter network topologies

Proposal       Scalability   Resiliency   Agility   Cost
Typical DCN    Low           Low          No        Low
Fat-Tree       High          Average      Yes       Average
VL2            High          High         Yes       High
OSA            Low           High         No        High
DCell          Huge          High         No        High
BCube          Huge          High         Yes       High
CamCube        Low           High         No        Average


Fat-Tree and VL2 are both instances of a Clos topology with high scalability and full bisection bandwidth (guaranteed agility). Fat-Tree achieves average resiliency, as ToR switches are connected only to a subset of aggregation devices, and has average overall costs (mostly because of increased wiring). VL2 scales through packet encapsulation, maintaining forwarding state only for switches in the network, and achieves high resiliency by providing multiple shortest paths and by relying on a distributed lookup entity for handling address queries. As a downside, its deployment has increased costs (due to wiring, a significant amount of exclusive resources for running the lookup system, and the need for switch support for IP-in-IP encapsulation).


OSA was designed with flexibility in mind in order to improve resiliency (i.e., by using runtime-reconfigurable optical devices to dynamically change the physical topology and one-hop link capacities). However, it has low scalability (up to a few thousand servers), no agility (as dynamically changing link capacities may result in congested links), and higher costs (devices should support optical reconfiguration).

DCell and BCube aim at scaling to millions of servers while ensuring high resiliency (rich connectivity between end-hosts). In contrast to BCube, DCell does not provide agility, as its set of non-uniform multiple paths may be bottlenecked by links at level 0. Finally, their deployment costs may be significant, since they require a lot of wiring and more powerful servers in order to efficiently perform routing.

CamCube, in turn, is unlikely to partition even with 50% of server or link failures, thus achieving high resiliency. Its drawback, however, is related to scalability and agility; both properties can be hindered by the high network diameter, which indicates that, on average, more resources are needed for communication between VMs hosted by different servers. CamCube also has average deployment costs, mainly due to wiring and the need for powerful servers (to perform network functions).

As we can see, there is no perfect topology, since each proposal focuses on specific aspects. Ultimately, providers are cost-driven: they choose the topology with the lowest costs, even if it cannot achieve all the properties desired for a datacenter network running heterogeneous applications from many tenants.

4.3 NETWORK EXPANSION

A key challenge concerning datacenter networks is dealing with the harmful effects that their ever-growing demand causes on scalability and performance. Because current DCN topologies are restricted to 1+1 redundancy and suffer from oversubscription, they can become underprovisioned quite fast. The lack of available bandwidth, in turn, may cause resource fragmentation (since it limits VM placement) [11] and reduce server utilization (as computations often depend on the data received from the network) [2]. In consequence, the DCN can lose its ability to accommodate more tenants (or offer elasticity to the current ones); even worse, applications using the network may start performing poorly, as they often rely on strict network guarantees².

These fundamental shortcomings have stimulated the development of novel DCN architectures (seen in Section 4.2) that provide large amounts of (or full) bisection bandwidth for up to millions of servers. Despite achieving high bisection bandwidth, their deployment is hindered by the assumption of homogeneous sets of switches (with the same number of ports). For example, consider a Fat-Tree topology, where the entire structure is defined by the number of ports in switches. These homogeneous switches limit the structure in two ways: full bisection bandwidth can only be achieved with specific numbers of servers (e.g., 8,192 and 27,648), and incremental upgrades may require replacing every switch in the network [13].

² For example, user-facing applications, such as Web services, require low latency for communication with users, while inward computation (e.g., MapReduce) requires reliability and bisection bandwidth in the intra-cloud network.


In fact, most physical datacenter designs are unique; hence, expansions and upgrades must be custom-designed, and network performance (including bisection bandwidth, end-to-end latency, and reliability) must be maximized while minimizing provider costs [11, 12]. Furthermore, organizations need to be able to incrementally expand their networks to meet the growing demands of tenants [13]. These facts have motivated recent studies [7, 11–13] to develop techniques to expand current DCNs to boost bisection bandwidth and reliability with heterogeneous sets of devices (i.e., without replacing every router and switch in the network). They are discussed next.

4.3.1 Legup

Focused on tree-like networks, Legup [12] is a system that aims at maximizing network performance when designing network upgrades and expansions. It utilizes a linear model that combines three metrics (agility, reliability, and flexibility), while being subject to the cloud provider's budget and physical constraints. In an attempt to reduce costs, the authors of Legup develop the Theory of Heterogeneous Clos Networks to allow modern and legacy equipment to coexist in the network. Figure 4.6 depicts an overview of the system. Legup assumes an existing set of racks and, therefore, only needs to determine the aggregation and core levels of the network (more precisely, the set of devices, how they interconnect, and how they connect to ToR switches). It employs a branch-and-bound optimization algorithm to explore the solution space only for aggregation switches, as core switches in a heterogeneous Clos network are restricted by aggregation ones. Given a set of aggregation switches in each step of the algorithm, Legup performs three actions. First, it computes the minimum cost for mapping aggregation switches to racks. Second, it finds the minimum-cost distribution of core switches to connect to the set of aggregation switches. Third, the candidate solution is bounded to check its optimality and feasibility (by verifying whether any constraint is violated, including the provider's budget and physical restrictions).

Figure 4.6. Legup's overview (adapted from Ref. [12]).

Figure 4.7. Comparison between (a) Fat-Tree and (b) Jellyfish with identical equipment (adapted from Ref. [13]).

4.3.2 Rewire

Recent advancements in routing protocols may allow DCNs to shift from a rigid tree to a generic structure [11, 22–25]. Based on this observation, Rewire [11] is a framework that performs DCN expansion on arbitrary topologies. It has the goal of maximizing network performance (i.e., finding maximum bisection bandwidth and minimum end-to-end latency), while minimizing costs and satisfying user-defined constraints. In particular, Rewire adopts a different definition of latency: while other studies model it by the worst-case hop count in the network, Rewire also considers the speed of links and the processing time at switches (because unoptimized switches can add an order of magnitude more processing delay). Rewire uses simulated annealing (SA) [26] to search through candidate solutions and implements an approximation algorithm to efficiently compute their bisection bandwidth. The simulated annealing, however, does not take the addition of switches into account; it only optimizes the network for a given set of switches. Moreover, the process assumes uniform queuing delays for all switch ports, which is necessary because Rewire does not possess knowledge of network load.
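To make the search concrete, the skeleton below shows simulated annealing over candidate wirings. It is entirely our own sketch: random_rewiring and performance are placeholders standing in for Rewire's actual neighborhood moves and objective function.

import math
import random

def anneal(topology, random_rewiring, performance,
           steps=10_000, t0=1.0, cooling=0.999):
    """Generic simulated annealing: maximize performance(topology)."""
    best = current = topology
    temperature = t0
    for _ in range(steps):
        candidate = random_rewiring(current)
        delta = performance(candidate) - performance(current)
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature cools, to escape local optima.
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            current = candidate
        if performance(current) > performance(best):
            best = current
        temperature *= cooling
    return best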

4.3.3 Jellyfish

End-to-end throughput of a network is quantitatively proven to depend on two factors: (1) the capacity of the network and (2) the average path length (i.e., throughput is inversely proportional to the capacity consumed to deliver each byte) [13]. Furthermore, as noted earlier, rigid DCN structures hinder incremental expansion. Consequently, a degree-bounded³ random graph topology among ToR switches, called Jellyfish [13], was introduced with the goal of providing high bandwidth and flexibility. It supports device heterogeneity, different degrees of oversubscription, and easy incremental expansion (by naturally allowing the addition of heterogeneous devices). Figure 4.7 shows a comparison of Fat-Tree and Jellyfish with identical equipment and the same diameter (i.e., 6).

³ Degree-bounded, in this context, means that the number of connections per node is limited by the number of ports in switches.


Each ring in the figure contains the servers reachable within the number of hops in the labels. We see that Jellyfish can reach more servers in fewer hops, because some links are not useful from a path-length perspective in a Fat-Tree (e.g., the links marked with "x"). Despite its benefits, Jellyfish's random design brings up some challenges, such as routing and the physical layout. Routing, in particular, is a critical feature, because it allows the use of the topology's high capacity. However, results show that the commonly used ECMP does not utilize the entire capacity of Jellyfish, and the authors propose the use of k-shortest paths and MultiPath TCP [25] to improve throughput and fairness.
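A toy version of such a degree-bounded random wiring is shown below (our own sketch; the actual Jellyfish construction is an incremental procedure with link swaps that leaves no port stranded):

import random

def random_tor_wiring(num_tor, degree, seed=0):
    """Randomly pair free ports of distinct, not-yet-connected ToR switches."""
    rng = random.Random(seed)
    free = [s for s in range(num_tor) for _ in range(degree)]  # one entry per port
    links = set()
    while True:
        candidates = [(a, b) for i, a in enumerate(free) for b in free[i + 1:]
                      if a != b and (min(a, b), max(a, b)) not in links]
        if not candidates:
            break  # a few ports may remain free; Jellyfish resolves this with swaps
        a, b = rng.choice(candidates)
        links.add((min(a, b), max(a, b)))
        free.remove(a)
        free.remove(b)
    return links

print(len(random_tor_wiring(20, 4)))  # close to 20 * 4 / 2 = 40 links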

4.3.4 Random Graph-Based Topologies

Singla et al. [7] analyze the throughput achieved by random graphs for topologies with both homogeneous and heterogeneous switches, while taking optimization into account. They obtain the following results: random graphs achieve throughput close to the optimal upper bound under uniform traffic patterns for homogeneous switches, and heterogeneous networks with distinct connectivity arrangements can provide nearly identical high throughput. Then, the acquired knowledge is used as a building block for designing large-scale random networks with heterogeneous switches. In particular, they utilize the VL2 topology deployed in Microsoft's datacenters as a case study, showing that its throughput can be significantly improved (by up to 43%) by merely rewiring the same devices.

4.4 TRAFFIC

The proposals of topologies for datacenter networks presented in Sections 4.2 and 4.3 share a common goal: provide high bisection bandwidth for tenants and their applications. It is intuitive that a higher bisection bandwidth will benefit tenants, since the communication between VMs will be less prone to interference. Nonetheless, it is unclear how strong the impact of the bisection bandwidth is. This section addresses this question by surveying several recent measurement studies of DCNs. Then, it reviews proposals for dealing with related limitations. More specifically, it discusses traffic patterns (highlighting their properties and implications for both providers and tenants) and shows how the literature is using such information to help design and manage DCNs.

Traffic can be divided into two broad categories: north/south and east/west communication. North/south traffic (also known as extra-cloud) corresponds to the communication between a source and a destination host where one of the ends is located outside the cloud platform. By contrast, east/west traffic (also known as intra-cloud) is communication in which both ends are located inside the cloud. These types of traffic usually depend on the kind and mix of applications: user-facing applications (e.g., web services) typically exchange data with users and, thus, generate north/south communication, while inward computation (e.g., MapReduce) requires coordination among its VMs, generating east/west communication. Studies [27] indicate that north/south and east/west traffic correspond to around 25% and 75% of traffic volume, respectively. They also point out that both are increasing in absolute terms, but east/west traffic is growing on a larger scale [27].

Towards understanding traffic characteristics and how they influence the proposal of novel mechanisms, we first discuss the traffic properties identified by measurement studies in the literature [9, 28–30] and then examine traffic management and its most relevant proposals for large-scale cloud datacenters.

4.4.1 Properties

Traffic in the cloud network is characterized by flows; each flow is identified by a sequence of packets from a source to a destination node (i.e., a flow is defined by a set of packet header fields, such as source and destination addresses and ports, and the transport protocol). Typically, a bimodal flow classification scheme is employed, using elephant and mice classes. Elephant flows comprise a large number of packets injected in the network over a short amount of time, are usually long-lived, and exhibit bursty behavior. In comparison, mice flows have a small number of packets and are short-lived [3]; a minimal code sketch of this classification appears right after the list below. Several measurement studies [9, 28–31] were conducted to characterize network traffic and its flows. We summarize their findings as follows:

• Traffic asymmetry. Requests from users to cloud services are abundant, but small in most occasions. Cloud services, however, process these requests and typically send responses that are comparatively larger.

• Nature of traffic. Network traffic is highly volatile and bursty, with links running close to their capacity several times during a day. Traffic demands change quickly, with some transient spikes and other longer ones (possibly requiring more than half the full-duplex bisection bandwidth) [32]. Moreover, traffic is unpredictable at long time scales (e.g., 100 seconds or more). However, it can be predictable on shorter timescales (of 1 or 2 seconds). Despite the predictability over small timescales, it is difficult for traditional schemes, such as statistical multiplexing, to make a reliable estimate of bandwidth demands for VMs [33].

• General traffic location and exchange. Most traffic generated by servers (on average 80%) stays within racks. Server pairs from the same rack and from different racks exchange data with a probability of only 11% and 0.5%, respectively. Probabilities for intra- and extra-rack communication are as follows: servers either talk with fewer than 25% of servers in the same rack or with almost all of them; and servers either communicate with fewer than 10% of servers located outside their rack or do not communicate with outside servers at all.

• Intra- and inter-application communication. Most of the traffic volume (55%) represents data exchange between different applications. However, the communication matrix between them is sparse; only 2% of application pairs exchange data, with the top 5% of pairs accounting for 99% of inter-application traffic volume. Consequently, communicating applications form several highly connected components, with few applications connected to hundreds of other applications in star-like topologies. In comparison, intra-application communication represents 45% of the total traffic, with 18% of applications generating 99% of this traffic volume.


• Flow size, duration, and number. Mice flows represent around 99% of the total number of flows in the network. They usually carry less than 10 kilobytes and last only a few hundred milliseconds. Elephant flows, in turn, represent only 1% of the number of flows, but account for more than half of the total traffic volume. They may carry tens of megabytes and last for several seconds. With respect to flow duration, flows of up to 10 seconds represent 80% of flows, while flows of 200 seconds represent less than 0.1% (and contribute less than 20% of the total traffic volume). Further, flows of 25 seconds or less account for more than 50% of bytes. Finally, it has been estimated that a typical rack has around 10,000 active flows per second, which means that a network comprising 100,000 servers can have over 25,000,000 active flows.

• Flow arrival patterns. Arrival patterns can be characterized by heavy-tailed distributions with a positive skew. They best fit a log-normal curve with ON and OFF periods (at both 15- and 100-millisecond granularities). In particular, inter-arrival times at both servers and ToR switches have periodic modes spaced apart by approximately 15 milliseconds, and the tail of these distributions is long (servers may experience flows spaced apart by 10 seconds).

• Link utilization. Utilization is, on average, low in all layers but the core; in fact, in the core, a subset of links (up to 25% of all core links) often experiences high utilization. In general, link utilization varies according to temporal patterns (time of day, day of week, and month of year), but variations can be an order of magnitude higher at core links than at aggregation and access links. Due to these variations and the bursty nature of traffic, highly utilized links can occur quite often: 86% and 15% of links may experience congestion lasting at least 10 and 100 seconds, respectively, while longer periods of congestion tend to be localized to a small set of links.

• Hot spots. They are usually located at core links and can appear quite frequently, but the number of hot spots never exceeds 25% of core links.

• Packet losses. Losses occur frequently even at underutilized links. Given the bursty nature of traffic, an underutilized network (e.g., with a mean load of 10%) can experience lots of packet drops. Measurement studies found that packet losses usually occur at links with low average utilization (but with traffic bursts that go beyond 100% of link capacity); more specifically, such behavior happens at links of the aggregation layer and not at links of the access and core layers. Ideally, topologies with full bisection bandwidth (i.e., a Fat-Tree) should experience no loss, but the employed routing mechanisms cannot utilize the full capacity provided by the set of multiple paths and, consequently, there is some packet loss in such networks as well [28].
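The bimodal classification mentioned at the start of this subsection can be sketched as follows (our own illustration; the 10-megabyte threshold and field names are assumptions, chosen to match the flow sizes reported above):

from collections import defaultdict

ELEPHANT_BYTES = 10 * 1024 * 1024  # illustrative cutoff: tens of megabytes

def classify_flows(packets):
    """Aggregate packets into flows keyed by the 5-tuple, then label them.

    Each packet is (src_ip, dst_ip, src_port, dst_port, protocol, size_bytes).
    """
    volume = defaultdict(int)
    for *five_tuple, size in packets:
        volume[tuple(five_tuple)] += size
    return {flow: ("elephant" if total >= ELEPHANT_BYTES else "mice")
            for flow, total in volume.items()}

packets = [("10.0.0.1", "10.0.0.2", 4321, 80, "tcp", 1500)] * 10_000  # ~15 MB
packets += [("10.0.0.3", "10.0.0.2", 4322, 53, "udp", 120)]
print(classify_flows(packets))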

4.4.2 Traffic Management

Another set of papers [34–37] demonstrates that the available bandwidth for VMs inside the datacenter can vary by a factor of five or more in the worst-case scenario. Such variability results in poor and unpredictable network performance and reduced overall application performance [1, 38, 39], since VMs usually depend on the data received from the network to execute the subsequent computation.


The lack of bandwidth guarantees is related to two main factors. First, the canonical cloud topology is typically oversubscribed, with more bandwidth available in leaf nodes than in the core. When periods of traffic bursts happen, the lack of bandwidth up the tree (i.e., at aggregation and core layers) results in contention and, therefore, packet discards at congested links (leading to subsequent retransmissions). Since the duration of the timeout period is typically one or two orders of magnitude more than the round-trip time, latency is increased, becoming a significant source of performance variability [3]. Second, TCP congestion control does not provide robust isolation among flows. Consequently, elephant flows can cause contention in congested links shared with mice flows, leading to discarded packets from the smaller flows [2].

Recent proposals address this issue either by employing proportional sharing or by providing bandwidth guarantees. Most of them use the hose model [40] for network virtualization and take advantage of rate-limiting at hypervisors [41], VM placement [42], or virtual network embedding [43] in order to increase their robustness.

Proportional sharing. Seawall [2] and NetShare [44] allocate bandwidth at flow level based on weights assigned to entities (i.e., VMs or services running inside these VMs) that generate traffic in the network. While both assign weights based on administrator-specified policies, NetShare also supports automatic weight assignment. Both schemes are work-conserving (i.e., available bandwidth can be used by any flow that needs more bandwidth), provide max–min fair sharing, and achieve high utilization through statistical multiplexing. However, as bandwidth allocation is performed per flow, such methods may introduce substantial management overhead in large datacenter networks (with over 10,000 flows per rack per second [9]). FairCloud [45] takes a different approach and proposes three allocation policies to explore the trade-off among network proportionality, minimum guarantees, and high utilization. Unlike Seawall and NetShare, FairCloud does not allocate bandwidth along congested links at flow level, but in proportion to the number of VMs of each tenant. Despite the benefits, FairCloud requires customized hardware in switches and is designed specifically for tree-like topologies.

Bandwidth guarantees. SecondNet [46], Gatekeeper [47], Oktopus [1], Proteus [48], and Hadrian [49] provide minimum bandwidth guarantees by isolating applications in virtual networks. In particular, SecondNet is a virtualization architecture that distributes the virtual-to-physical mapping, routing, and bandwidth reservation state in server hypervisors. Gatekeeper configures each VM's virtual NIC with both minimum and maximum bandwidth rates, which allows the network to be shared in a work-conserving manner. Oktopus maps tenants' virtual network requests (with or without oversubscription) onto the physical infrastructure and enforces these mappings in hypervisors. Proteus is built on the observation that allocating the peak bandwidth requirements of applications leads to underutilization of resources. Hence, it quantifies the temporal bandwidth demands of applications and allocates each one of them in a different virtual network. Hadrian extends previous schemes by also taking inter-tenant communication into account and allocating applications according to a hierarchical hose model (i.e., per-VM minimum bandwidth for intra-application communication and per-tenant minimum guarantees for inter-tenant traffic). By contrast, a group of related proposals attempts to provide some level of bandwidth sharing among applications of distinct tenants [50–52].

The approach introduced by Marcon et al. [51] groups applications in virtual networks, taking mutually trusting relationships between tenants into account when allocating each application. It provides work-conserving network sharing, but assumes that trust relationships are determined in advance. ElasticSwitch [52] assumes there exists an allocation method in the cloud platform and focuses on providing minimum bandwidth guarantees with a work-conserving sharing mechanism (used when there is spare capacity in the network). Nevertheless, it requires two extra management layers for defining the amount of bandwidth for each flow, which may add overhead. Finally, EyeQ [50] leverages the high bisection bandwidth of DCNs to support minimum and maximum bandwidth rates for VMs. Therefore, it provides work-conserving sharing, but depends on the core of the network being congestion-free. None of these approaches can be readily deployed, as they demand modifications in hypervisor source code.

4.5 ROUTING

Datacenter networks often require specially tailored routing protocols, with requirements different from those of traditional enterprise networks. While the latter present only a handful of paths between hosts and predictable communication patterns, DCNs require multiple paths to achieve horizontal scaling of hosts with unpredictable traffic matrices [4, 6]. In fact, datacenter topologies (i.e., the ones discussed in Section 4.2) typically present path diversity, in which multiple paths exist between servers (hosts) in the network. Furthermore, many cloud applications (ranging from Web search to MapReduce) require substantial (possibly full bisection) bandwidth [53]. Thus, routing protocols must enable the network to deliver high bandwidth by using all possible paths in the structure. We organize the discussion according to the layer involved, starting with the network layer.

4.5.1 Layer 3

To take advantage of the multiple paths available between a source and its destination, providers usually employ two techniques: ECMP [54] and VLB [6, 55, 56]. Both strategies use distinct paths for different flows. ECMP attempts to load-balance traffic in the network and utilize all paths that have the same cost (calculated by the routing protocol) by uniformly spreading traffic among them using flow hashing. VLB randomly selects an intermediate router (occasionally, an L3 switch) to forward the incoming flow to its destination.
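A minimal sketch of per-flow hashing as used by ECMP is shown below (illustrative only; real switches hash in hardware with vendor-specific functions):

import hashlib

def ecmp_next_hop(flow, equal_cost_paths):
    """Pick a path by hashing the flow 5-tuple.

    All packets of a flow hash to the same path (avoiding reordering),
    while different flows spread roughly uniformly across the paths.
    """
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(equal_cost_paths)
    return equal_cost_paths[index]

paths = ["core-1", "core-2", "core-3", "core-4"]
flow = ("10.0.0.1", "10.0.1.9", 40321, 443, "tcp")
print(ecmp_next_hop(flow, paths))  # deterministic for a given flow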

Recent studies in the literature [46, 53, 57, 58] propose other routing techniques for DCNs. As a matter of fact, the static flow-to-path mapping performed by ECMP does not take flow size and network utilization into account [59]. This may result in saturating commodity switch L3 buffers and degrading overall network performance [53]. Therefore, a system called Hedera [53] is introduced to allow dynamic flow scheduling for general multi-rooted trees with extensive path diversity. Hedera is designed to maximize network utilization with low scheduling overhead on active flows.

Figure 4.8. PSSR overview (adapted from Ref. [46]).

In general, the system performs the following steps: (1) it detects large flows at ToR switches; (2) it estimates the network demands of these large flows (with a novel algorithm that considers bandwidth consumption according to a max–min fair resource allocation); (3) it invokes a placement algorithm to compute paths for them; and (4) it installs the set of new paths on switches.

Hedera uses a central OpenFlow controller⁴ [60] with a global view of the network to query devices, obtain flow statistics, and install new paths on devices after computing their routes. With the information collected from switches, Hedera treats the flow-to-path mapping as an optimization problem and uses a simulated annealing metaheuristic to efficiently look for feasible solutions close to the optimal one in the search space. SA reduces the search space by allowing only a single core switch to be used for each destination. Overall, the system delivers close-to-optimal performance and up to four times more bandwidth than ECMP.
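The max–min fairness underlying Hedera's demand estimation can be illustrated with progressive filling (our own sketch, not Hedera's actual estimator): repeatedly split the remaining capacity equally among unsatisfied flows, freezing flows once their demand is met.

def max_min_fair(capacity, demands):
    """Progressive filling of one link's capacity among competing flows."""
    allocation = {flow: 0.0 for flow in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-9:
        share = remaining / len(active)
        for flow in list(active):
            grant = min(share, demands[flow] - allocation[flow])
            allocation[flow] += grant
            remaining -= grant
            if allocation[flow] >= demands[flow] - 1e-9:
                active.remove(flow)  # satisfied; it stops competing
    return allocation

# On a 10 Gb/s link, small flows get their demand; big ones split the rest:
print(max_min_fair(10.0, {"a": 1.0, "b": 4.0, "c": 8.0}))  # a=1, b=4, c=5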

Port-switching based source routing (PSSR) [46] is proposed for the SecondNet architecture with arbitrary topologies and commodity switches. PSSR uses source routing, which requires that every node in the network knows the complete path to reach a destination. It takes advantage of the fact that a datacenter is administered by a single entity (i.e., the intra-cloud topology is known in advance) and represents a path as a sequence of output ports in switches, which is stored in the packet header. More specifically, the hypervisor of the source VM inserts the routing path in the packet header, commodity switches perform the routing process with PSSR, and the destination hypervisor removes the PSSR information from the packet header and delivers the packet to the destination VM. PSSR also introduces the use of virtual ports, because servers may have multiple neighbors reachable via a single physical port (e.g., in DCell and BCube topologies). The process performed by a switch is shown in Figure 4.8. Switches read the pointer field in the packet header to get the next output port number (step 1), look up that port number in the virtual-port table (step 2), get the physical port number (step 3) and, in step 4, update the pointer field and forward the packet. This routing method introduces some overhead (since routing information must be included in the packet header) but, according to the authors, can be easily implemented on commodity switches using Multi-Protocol Label Switching (MPLS) [61].
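The four-step pipeline of Figure 4.8 can be captured in a few lines (a behavioral sketch with hypothetical field names, not SecondNet's implementation):

def pssr_forward(packet, virtual_port_table):
    """One switch hop of port-switching source routing.

    packet = {"pointer": int, "ports": [virtual output ports along the path]}.
    """
    vport = packet["ports"][packet["pointer"]]  # step 1: read next virtual port
    pport = virtual_port_table[vport]           # steps 2-3: resolve physical port
    packet["pointer"] += 1                      # step 4: advance the pointer...
    return pport                                # ...and forward on this port

packet = {"pointer": 0, "ports": [4, 1, 2]}  # path inserted by the source hypervisor
table = {1: 1, 2: 2, 4: 1}                   # two virtual ports may share a physical port
print(pssr_forward(packet, table), packet["pointer"])  # -> 1 1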

⁴ We will not focus our discussion on OpenFlow in this chapter; it is discussed in Chapter 6.


Bounded Congestion Multicast Scheduling (BCMS) [57], introduced to efficiently route flows in fat-trees under the hose traffic model, aims at achieving bounded congestion and high network utilization. By using multicast, it can reduce traffic, thus minimizing performance interference and increasing application throughput [62]. BCMS is an online multicast scheduling algorithm that leverages OpenFlow to (1) collect bandwidth demands of incoming flows; (2) monitor network load; (3) compute routing paths for each flow; and (4) configure switches (i.e., install appropriate rules to route flows). The algorithm has three main steps, as follows. First, it checks the conditions of the uplinks out of source ToR switches (as flows are initially routed towards core switches). Second, it carefully selects a subset of core switches in order to avoid congestion. Third, it further improves traffic load balance by allowing ToR switches to connect to the core switches with the most residual bandwidth. Despite its advantages, BCMS relies on a centralized controller, which may not scale to large datacenters under highly dynamic traffic patterns such as the cloud's.

Like BCMS, Code-Oriented eXplicit multicast (COXcast) [58] also focuses on routing application flows through the use of multicasting techniques (as a means of improving network resource sharing and reducing traffic). COXcast uses source routing, so all information regarding destinations is added to the packet header. More specifically, the forwarding information is encoded into an identifier in the packet header and, at each network device, is resolved into an output port bitmap by a node-specific key. COXcast can support a large number of multicast groups, but it adds some overhead to packets (since all information regarding routing must be stored in the packet).

4.5.2 Layer 2

In the Spanning Tree Protocol (STP) [63], all switches agree on a subset of links to be used among them, which forms a spanning tree and ensures a loop-free network. Despite being typically employed in Ethernet networks, it does not scale, since it cannot use the high capacity provided by topologies with rich connectivity (e.g., Fat-Trees [24]), limiting application network performance [64]: only a single path is used between hosts, creating bottlenecks and reducing overall network utilization.

STP's shortcomings are addressed by other protocols, including Multiple Spanning Tree Protocol (MSTP) [65], Transparent Interconnect of Lots of Links (TRILL) [22] and Link Aggregation Control Protocol (LACP) [66]. MSTP was proposed in an attempt to use the path diversity available in DCNs more efficiently. It is an extension of STP that allows switches to create various spanning trees over a single topology. Therefore, different Virtual LANs (VLANs) [67] can utilize different spanning trees, enabling the use of more links in the network than with a single spanning tree. Despite this objective, implementations only allow up to 16 different spanning trees, which may not be sufficient to fully utilize the high capacity available in DCNs [68].

TRILL is a link-state routing protocol implemented on top of layer-2 technologies, but below layer-3, and is designed specifically to address the limitations of STP. It discovers and calculates shortest paths between TRILL devices (called routing bridges or, in short, RBridges), which enables shortest-path multihop routing in order to use all available paths


in networks with rich connectivity. RBridges run the Intermediate System to Intermediate System (IS-IS) routing protocol (RFC 1195) and handle frames in the following manner: the first RBridge (ingress node) encapsulates the incoming frame with a TRILL header (outer MAC header) that specifies the last TRILL node as the destination (egress node), which will decapsulate the frame.

Link Aggregation Control Protocol (LACP) is another layer-2 protocol used in DCNs. It transparently aggregates multiple physical links into one logical link known as a Link Aggregation Group (LAG). LAGs only handle outgoing flows; they have no control over incoming traffic. They provide flow-level load balancing among links in the group by hashing packet header fields. LACP can dynamically add or remove links in LAGs, but requires that both ends of a link run the protocol.
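The hashing step can be sketched as follows; the specific fields and hash function are illustrative, since implementations vary and the standard does not mandate a particular hash.

```python
import zlib

def pick_lag_member(src_mac, dst_mac, src_ip, dst_ip, num_links):
    """Map a flow to one member link of a LAG by hashing header fields.
    All packets of a flow hash to the same link (no reordering), but two
    large flows can collide on the same member link."""
    key = f"{src_mac}|{dst_mac}|{src_ip}|{dst_ip}".encode()
    return zlib.crc32(key) % num_links

link = pick_lag_member("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02",
                       "10.0.0.1", "10.0.0.2", num_links=4)
```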

There are also some recent studies that propose novel strategies for routing frames in DCNs, namely Smart Path Assignment in Networks (SPAIN) [24] and PortLand [64]. SPAIN [24] focuses on providing efficient multipath forwarding using COTS switches over arbitrary topologies. It has three components: (1) path computation; (2) path setup; and (3) path selection. The first two components run on a centralized controller with global network visibility. The controller first pre-computes a set of paths to exploit the rich connectivity of the DCN topology, in order to use all available capacity of the physical infrastructure and to support fast failover. After the path computation phase, the controller combines these multiple paths into a set of trees, each belonging to a distinct VLAN. Then, these VLANs are installed on switches. The third component (path selection) runs at end-hosts for each new flow; it selects paths for flows with the goals of spreading load across the pre-computed routes (set up by the path setup component) and minimizing network bottlenecks. With this configuration, end-hosts can select different VLANs for communication (i.e., different flows between the same source and destination can use distinct VLANs for routing). To provide these functionalities, however, SPAIN requires some modification to end-hosts, adding an algorithm to choose among pre-installed paths for each flow.

PortLand [64] is designed and built based on the observation that Ethernet/IP protocols may have some inherent limitations in large-scale arbitrary topologies, such as limited support for VM migration, difficult management and inflexible communication. It is a layer-2 routing and forwarding protocol with plug-and-play support for multi-rooted Fat-Tree topologies. PortLand uses a logically centralized controller (called the fabric manager) with global visibility that maintains soft state about network configuration. It assigns a unique hierarchical Pseudo MAC (PMAC) address to each VM to provide efficient, provably loop-free frame forwarding; VMs, however, are not aware of their PMACs and believe they use their Actual MACs (AMACs). The mapping between PMAC and AMAC and the subsequent frame header rewriting are performed by edge (ToR) switches. PMACs are structured as pod.position.port.vmid, where the fields respectively correspond to the pod number of the edge switch, its position inside the pod, the port to which the physical server is connected and the identifier of the VM inside the server. With PMACs, PortLand transparently provides location-independent addresses for VMs and requires no modification to commodity switches. However, it has two main shortcomings: (1) it requires a Fat-Tree topology (instead of the traditional multi-rooted oversubscribed tree) and (2) at least half of the


ToR switch ports should be connected to servers (which, in fact, is a limitation of Fat-Trees) [69].
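The hierarchical layout can be made concrete with a small encoder/decoder; the sketch below assumes a 16/8/8/16-bit split of the 48-bit address for the pod.position.port.vmid fields, and is an illustration rather than PortLand's implementation.

```python
def encode_pmac(pod, position, port, vmid):
    """Pack pod.position.port.vmid into a 48-bit PMAC string
    (assumed field widths: 16, 8, 8, and 16 bits)."""
    assert pod < 2**16 and position < 2**8 and port < 2**8 and vmid < 2**16
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{(value >> s) & 0xff:02x}" for s in range(40, -8, -8))

def decode_pmac(pmac):
    """Recover (pod, position, port, vmid) from a PMAC string."""
    value = int(pmac.replace(":", ""), 16)
    return (value >> 32, (value >> 24) & 0xff, (value >> 16) & 0xff, value & 0xffff)

pmac = encode_pmac(pod=2, position=5, port=1, vmid=13)   # '00:02:05:01:00:0d'
assert decode_pmac(pmac) == (2, 5, 1, 13)
```

Because the PMAC embeds the VM's location, an edge switch can forward a frame by inspecting address prefixes instead of keeping per-host state.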

4.6 ADDRESSING

Each server (or, more specifically, each VM) must be represented by a unique canonical address that enables the routing protocol to determine paths in the network. Cloud providers typically employ LAN technologies for addressing VMs in datacenters, which means there is a single address space to be sliced among tenants and their applications. Consequently, tenants have neither flexibility in designing their application layer-2 and layer-3 addresses nor network isolation from other applications.

Some isolation is achieved by the use of VLANs, usually one VLAN per tenant. However, VLANs are ill-suited for datacenters for four main reasons [51, 70–72]: (1) they do not provide flexibility for tenants to design their layer-2 and layer-3 address spaces; (2) they use the spanning tree protocol, which cannot utilize the high capacity available in DCN topologies (as discussed in the previous section); (3) they have poor scalability, since no more than 4094 VLANs can be created, which is insufficient for large datacenters; and (4) they do not provide location-independent addresses for tenants to design their own address spaces (independently of other tenants) and for performing seamless VM migration. Therefore, providers need to use other mechanisms to allow address space flexibility, isolation and location independence for tenants while multiplexing them in the same physical infrastructure. We structure the discussion in three main topics: emerging technologies, separation of name and locator, and full address space virtualization.

4.6.1 Emerging Technologies

Some technologies employed in DCNs are: Virtual eXtensible Local Area Network (VXLAN) [73], Amazon Virtual Private Cloud (VPC) [74] and Microsoft Hyper-V [75]. VXLAN [73] is an Internet draft being developed to address scalability and multipath usage in DCNs when providing logical isolation among tenants. VXLAN works by creating overlay (virtual layer-2) networks on top of the actual layer-2 or on top of UDP/IP. In fact, using MAC-in-UDP encapsulation abstracts VM location (VMs can only view the virtual layer-2) and, therefore, enables a VXLAN network to be composed of nodes within distinct domains (DCNs), increasing flexibility for tenants using multi-datacenter cloud platforms. VXLAN adds a 24-bit segment ID field to the packet header (allowing up to 16 million different logical networks), uses ECMP to distribute load along multiple paths and requires Internet Group Management Protocol (IGMP) for forwarding frames to unknown destinations, or multicast and broadcast addresses. Despite the benefits, the VXLAN header adds 50 bytes to the frame size, and multicast and network hardware may limit the usable number of overlay networks in some deployments.
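In miniature, the encapsulation looks as follows: an 8-byte VXLAN header carrying the 24-bit segment ID is prepended to the inner Ethernet frame, and the result becomes a UDP payload. The sketch follows the header layout in the draft; the outer UDP/IP headers are omitted for brevity.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08          # "I" flag: the VNI field is valid

def vxlan_header(vni):
    """Build the 8-byte VXLAN header: flags (1 B), reserved (3 B),
    VNI (3 B), reserved (1 B); the 24-bit VNI yields ~16 M segments."""
    assert 0 <= vni < 2**24
    return struct.pack("!B3s3sB", VXLAN_FLAG_VNI_VALID,
                       b"\x00" * 3, vni.to_bytes(3, "big"), 0)

def encapsulate(inner_frame, vni):
    """MAC-in-UDP: VXLAN header plus inner Ethernet frame form the UDP payload."""
    return vxlan_header(vni) + inner_frame

payload = encapsulate(b"\xff" * 60, vni=5001)
assert len(payload) == 8 + 60        # 8 B here; ~50 B total with outer headers
```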

Amazon VPC [74] provides full IP address space virtualization, allowing tenants to design logically isolated layer-3 virtual networks. However, it does not virtualize layer-2,


which means tenants cannot send multicast and broadcast frames [71]. Microsoft Hyper-V [75] is a hypervisor-based system that provides virtual networks for tenants to design their own address spaces; Hyper-V enables IP overlapping in different virtual networks without using VLANs. Furthermore, Hyper-V switches are software-based layer-2 network switches with capabilities to connect VMs among themselves, with other virtual networks and with the physical network. Hyper-V, nonetheless, tends to consume more resources than other hypervisors under the same load [76].

4.6.2 Separation of Name and Locator

VL2 [6] and Crossroads [70] focus on providing location independence for VMs, so that providers can easily grow or shrink allocations and migrate VMs inside or across datacenters. VL2 [6] uses two types of addresses: location-specific addresses (LAs), which are the actual addresses in the network, used for routing; and application-specific addresses (AAs), permanent addresses assigned to VMs that remain the same even after migrations. VL2 uses a directory system to enforce isolation among applications (through access control policies) and to perform the mapping between names and locators; each server with an AA is associated with the LA of the ToR switch it is connected to. Figure 4.9 depicts how address translation in VL2 is performed: the source hypervisor encapsulates the AA address with the LA address of the destination ToR for each packet sent; packets are forwarded in the network through shortest paths calculated by the routing protocol, using both ECMP and VLB; when packets arrive at the destination ToR switch, LAs are removed (packets are decapsulated) and the original packets are sent to the correct VMs using AAs. To provide location-independent addresses, VL2 requires that hypervisors run a shim layer (the VL2 agent) and that switches support IP-over-IP.
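A toy version of the directory lookup and encapsulation in step 1 of Figure 4.9 might look like the sketch below; the directory is a plain dictionary standing in for VL2's replicated directory system, and the addresses are invented examples.

```python
# Directory system: maps a VM's application-specific address (AA) to the
# locator address (LA) of the ToR switch its server is attached to.
directory = {
    "10.1.0.7": "20.0.3.1",   # AA of a VM -> LA of its ToR switch
    "10.1.0.9": "20.0.5.1",
}

def vl2_encap(inner_packet, dst_aa):
    """Source-hypervisor shim ('VL2 agent'): wrap the AA packet in an outer
    header addressed to the destination ToR's LA (IP-in-IP)."""
    return {"outer_dst": directory[dst_aa], "inner": inner_packet}

def vl2_decap(encapsulated):
    """Destination ToR: strip the LA header; deliver the inner packet by AA."""
    return encapsulated["inner"]

pkt = vl2_encap({"dst": "10.1.0.7", "payload": b"..."}, dst_aa="10.1.0.7")
assert vl2_decap(pkt)["dst"] == "10.1.0.7"
```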

Crossroads [70], in turn, is a network fabric developed to provide layer-agnostic and seamless VM migration inside and across DCNs. It takes advantage of

Figure 4.9. Architecture for address translation in VL2.


the Software-Defined Networking (SDN) paradigm [77] and extends an OpenFlow controller to allow VM location independence without modifications to the layer-2 and layer-3 network infrastructure. In Crossroads, each VM possesses two addresses: a PMAC and a Pseudo IP (PIP), both with location and topological information embedded in them. The first ensures that traffic originating from one datacenter and en route to a second datacenter (to which the VM was migrated) can be maintained at layer-2, while the second guarantees that all traffic destined to a migrated VM can be routed across layer-3 domains. Despite its benefits, Crossroads introduces some network overhead, as nodes must be identified by two more addresses (PMAC and PIP) in addition to the existing MAC and IP.

4.6.3 Full Address Space Virtualization

Cloud datacenters typically provide limited support for multi-tenancy, since tenants should be able to design their own address spaces (similar to a private environment) [71]. Consequently, a multi-tenant virtual datacenter architecture to enable specifically tailored layer-2 and layer-3 address spaces for tenants, called NetLord [71], is proposed. At hypervisors, NetLord runs an agent that performs Ethernet+IP (L2+L3) encapsulation over tenants' layer-2 frames and transfers them through the network using SPAIN [24] for multipathing, exploiting features of both layers. More specifically, the process of encapsulation/decapsulation is shown in Figure 4.10 and occurs in three steps, as follows: (1) the agent at the source hypervisor creates L2 and L3 headers (with the source IP being a tenant-assigned MAC address space identifier, illustrated as MAC_AS_ID) in order to direct frames through the L2 network to the correct edge switch; (2) the edge switch forwards the packet to the correct server based on the IP destination address in the virtualized layer-3 header; (3) the hypervisor at the destination server removes the virtual L2 and L3 headers and uses the IP destination address to deliver the original packet from the source VM to the correct VM. NetLord can run on commodity switches and scale the network to hundreds of thousands of VMs. However, it requires an agent running on hypervisors (which may add some overhead) and support for IP forwarding on commodity edge (ToR) switches.
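The three steps can be condensed into the following sketch; the field contents mirror the description above and Figure 4.10, while the function and parameter names are illustrative rather than NetLord's actual code.

```python
def netlord_encap(tenant_frame, tenant_id, mac_as_id, ingress_switch_mac,
                  egress_switch_mac, egress_port, spain_vlan):
    """Step 1, at the source hypervisor's agent: add outer L2+L3 headers
    around the tenant's original layer-2 frame."""
    return {
        # Outer Ethernet: carries the frame across the fabric to the egress
        # edge switch, on a path/VLAN selected by SPAIN.
        "eth": {"dst": egress_switch_mac, "src": ingress_switch_mac,
                "vlan": spain_vlan},
        # Outer IP: destination encodes <egress port, tenant id>; the source
        # carries the tenant's MAC address-space identifier (MAC_AS_ID).
        "ip": {"dst": f"{egress_port}.{tenant_id}", "src": mac_as_id},
        "payload": tenant_frame,     # the unmodified tenant L2 frame
    }

def netlord_decap(packet):
    """Step 3, at the destination hypervisor: strip the outer headers and
    deliver the original frame to the right VM."""
    return packet["payload"]
```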

Figure 4.10. NetLord’s encapsulation/decapsulation process (adapted from Ref. [71]).


4.7 RESEARCH CHALLENGES

In this section, we analyze and discuss open research challenges and future directions regarding datacenter networks. As previously mentioned, DCNs (i) present some distinct requirements compared to traditional networks (e.g., high scalability and resiliency); (ii) have significantly different (often more complex) traffic patterns; and (iii) may not be fully utilized, because of limitations in currently deployed mechanisms and protocols (for instance, ECMP). Such aspects introduce some challenges, which are discussed next.

4.7.1 Heterogeneous and Optimal DCN Design

Presently, many Internet services and applications rely on large-scale datacenters to provide availability while scaling in and out according to incoming demands. This is essential in order to offer low response times for users, without incurring excessive costs for owners. Therefore, datacenter providers must build infrastructures to support large and dynamic numbers of applications and guarantee quality of service (QoS) for tenants. In this context, the network is an essential component of the whole infrastructure, as it represents a significant fraction of investment and contributes to future revenues by allowing efficient use of datacenter resources [15]. According to Zhang et al. [78], network requirements include (i) scalability, so that a large number of servers can be accommodated (while allowing incremental expansion); (ii) high server-to-server capacity, to enable intensive communication between any pair of servers (i.e., at the full speed of their NICs); (iii) agility, so applications can use any available server when they need more resources (and not only servers located near their current VMs); (iv) uniform network utilization, to avoid bottlenecks; and (v) fault tolerance, to cope with server, switch and link failures. In fact, guaranteeing such requirements is a difficult challenge. Looking at these challenges from the provider's viewpoint makes them even more difficult to address and overcome, since reducing the cost of building and maintaining the network is seen as a key enabler for maximizing profits [15].

As discussed in Section 4.2, several topologies (e.g., Refs. [4–6, 17, 18]) have been proposed to achieve the desired requirements, with varying costs. Nonetheless, they (i) focus on homogeneous networks (all devices with the same capabilities); and (ii) do not provide theoretical foundations regarding optimality. Singla et al. [7], in turn, take an initial step towards addressing heterogeneity and optimality, as they (i) measure the upper bound on network throughput for homogeneous topologies with uniform traffic patterns; and (ii) show an initial analysis of possible gains with heterogeneous networks. Despite this, a lot remains to be investigated in order to enable the development of more efficient, robust large-scale networks with heterogeneous sets of devices. In summary, very little is known about heterogeneous DCN design, even though current DCNs are typically composed of heterogeneous equipment.

4.7.2 Efficient and Incremental Expansion

Providers need to be constantly expanding their datacenter infrastructures to accommodate ever-growing demands. For instance, Facebook has been expanding its datacenters


for some years [79–82]. This expansion is crucial for business, as the increase in demand may negatively impact scalability and performance (e.g., by creating bottlenecks in the network). When the whole infrastructure is upgraded, the network must be expanded accordingly, with a careful design plan, in order to allow efficient utilization of resources and to avoid fragmentation. To address this challenge, some proposals in the literature [7, 11–13] have been introduced to enlarge current DCNs without replacing legacy hardware. They aim at maximizing bisection bandwidth and reliability. However, they often make strong assumptions (e.g., Legup [12] is designed for tree-like networks, and Jellyfish [13] requires new mechanisms for routing). Given the importance of datacenters nowadays (as the home of hundreds of thousands of services and applications), the efficient and effective expansion of large-scale networks is a key challenge for improving provider profit, the QoS offered to tenant applications and the quality of experience (QoE) provided to users of these applications.

4.7.3 Network Sharing and Performance Guarantees

Datacenters host applications with diverse and complex traffic patterns and different performance requirements. Such applications range from user-facing ones (i.e., Web services and online gaming) that require low-latency communication to inward computation (e.g., scientific computing) that needs high network throughput. To gain a better understanding of the environment, studies [1, 9, 30, 49, 83] conducted measurements and concluded that the available bandwidth for VMs inside the cloud platform can vary by a factor of five or more during a predefined period of time. They demonstrate that such variability ends up impacting overall application execution time (resulting in poor and unpredictable performance). Several strategies (including Refs. [2, 47, 48, 52, 84]) have been proposed to address this issue. Nonetheless, they have one or more of the following shortcomings: (i) they require complex mechanisms, which, in practice, cannot be deployed; (ii) they focus on network sharing among VMs (or applications) in a homogeneous infrastructure (which simplifies the problem [85]); (iii) they perform static bandwidth reservations (resulting in underutilization of resources); or (iv) they provide proportional sharing (no strict guarantees). In fact, there is an inherent trade-off between providing strict guarantees (desired by tenants) and enabling work-conserving sharing in the network (desired by providers to improve utilization), which may be exacerbated in a heterogeneous network. We believe this challenge requires further investigation, since such high-performance networks ideally need simple and efficient mechanisms to allow fair bandwidth sharing among running applications in a heterogeneous environment.

4.7.4 Address Flexibility for Tenants

While network performance guarantees require quantitative performance isolation, address flexibility needs qualitative isolation [71]. Cloud DCNs, however, typically provide limited support for multi-tenancy, as they have a single address space divided among applications (according to their needs and number of VMs). Thereby, tenants have no flexibility in choosing layer-2 and layer-3 addresses for applications. Note that,


ideally, tenants should be able to design their own address spaces (i.e., they should have flexibility similar to a private environment), since already developed applications may necessitate a specific set of addresses to operate correctly without source code modification. Some proposals in the literature [6, 70, 71] seek to address this challenge either by identifying end-hosts with two addresses or by fully virtualizing layer-2 and layer-3. Despite adding flexibility for tenants, they introduce some overhead (e.g., hypervisors need a shim layer to manage addresses, or switches must support IP-over-IP) and require resources specifically used for address translation (in the case of VL2). This is an important open challenge, as the lack of address flexibility may hinder the migration of applications to the cloud platform.

4.7.5 Mechanisms for Load Balancing Across Multiple Paths

DCNs usually present path diversity (i.e., multiple paths between servers) to achieve horizontal scaling for unpredictable traffic matrices (generated by a large number of heterogeneous applications) [6]. Their topologies can present two types of multiple paths between hosts: uniform and non-uniform ones. ECMP is the standard technique used for splitting traffic across equal-cost (uniform) paths. Nonetheless, it cannot fully utilize the available capacity in these multiple paths [59]. Non-uniform multiple paths, in turn, complicate the problem, as mechanisms must take more factors into account (e.g., path latency and current load). There are some proposals in the literature [46, 53, 57, 58] to address this issue, but they either cannot achieve the desired response times (e.g., Hedera) [86] or are developed for specific architectures (e.g., PSSR for SecondNet). Chiesa et al. [87] have taken an initial approach towards analyzing ECMP and propose algorithms for improving its performance. Nevertheless, further investigation is required for routing traffic across both uniform and non-uniform parallel paths, considering not only tree-based topologies, but also newer proposals such as random graphs [7, 13]. This investigation should lead to novel mechanisms and protocols that better utilize the available capacity in DCNs (e.g., eliminating bottlenecks at level-0 links in DCell).

4.8 SUMMARY

In this chapter, we have presented basic foundations of datacenter networks and relevant standards, as well as recent proposals in the literature that address limitations of current mechanisms. We began by studying network topologies in Section 4.2. First, we examined the typical topology utilized in today's datacenters, which consists of a multi-rooted tree with path diversity. This topology is employed by providers to allow rich connectivity with reduced operational costs. One of its drawbacks, however, is the lack of full bisection bandwidth, which is the main motivation for proposing novel topologies. We used a three-class taxonomy to organize the state-of-the-art datacenter topologies: switch-oriented, hybrid switch/server and server-only topologies. The distinguishing characteristic is the use of switches and/or servers to perform packet routing and forwarding in the network: switches only (Fat-Tree, VL2 and OSA), switches and servers (DCell and BCube) and only servers (CamCube).


These topologies, however, usually present rigid structures, which hinders incremental network expansion (a desirable property for ever-growing cloud datacenters). Therefore, we took a look at network expansion strategies (Legup, Rewire and Jellyfish) in Section 4.3. All of these strategies have the goal of improving bisection bandwidth to increase agility (the ability to assign any VM of any application to any server). Furthermore, the design of novel topologies and expansion strategies must consider the nature of traffic in DCNs. In Section 4.4, we summarized recent measurement studies about traffic and discussed some proposals that deal with traffic management on top of a DCN topology.

Then, we discussed routing and addressing in Sections 4.5 and 4.6, respectively. Routing was divided into two categories: layer-3 and layer-2. While layer-3 routing typically employs ECMP and VLB to utilize the high capacity available in DCNs through the set of multiple paths, layer-2 routing uses the spanning tree protocol. Despite their benefits, these schemes cannot efficiently take advantage of multiple paths. Consequently, we briefly examined proposals that deal with this issue (Hedera, PSSR, SPAIN and PortLand). Addressing, in turn, is performed by using LAN technologies, which do not provide robust isolation and flexibility for tenants. Towards solving these issues, we examined the proposal of a new standard (VXLAN) and commercial solutions developed by Amazon (VPC) and Microsoft (Hyper-V). Furthermore, we discussed proposals in the literature that aim at separating name and locator (VL2 and Crossroads) and at allowing full address space virtualization (NetLord).

Finally, we analyzed open research challenges regarding datacenter networks: (i) the need to design more efficient DCNs with heterogeneous sets of devices, while considering optimality; (ii) strategies for incrementally expanding networks with general topologies; (iii) network schemes with strict guarantees and predictability for tenants, while allowing work-conserving sharing to increase utilization; (iv) address flexibility to make the migration of applications to the cloud easier; and (v) mechanisms for load balancing traffic across multiple parallel paths (using all available capacity).

Having covered the operation and research challenges of intra-datacenter networks, the next three chapters of the networking and communications part discuss the following subjects: inter-datacenter networks, an important topic related to cloud platforms composed of several datacenters (e.g., Amazon EC2); the emerging paradigm of SDN, its practical implementation (OpenFlow) and how these can be applied to intra- and inter-datacenter networks to provide fine-grained resource management; and mobile cloud computing, which seeks to enhance the capabilities of resource-constrained mobile devices using cloud resources.

REFERENCES

1. Hitesh Ballani, Paolo Costa, Thomas Karagiannis, and Ant Rowstron. Towards predictable datacenter networks. In ACM SIGCOMM, 2011.

2. Alan Shieh, Srikanth Kandula, Albert Greenberg, Changhoon Kim, and Bikas Saha. Sharing the data center network. In USENIX NSDI, 2011.

3. Dennis Abts and Bob Felderman. A guided tour of data-center networking. Communications of the ACM, 55(6):44–51, 2012.

4. Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. A scalable, commodity data center network architecture. In ACM SIGCOMM, 2008.

5. Chuanxiong Guo, Haitao Wu, Kun Tan, Lei Shi, Yongguang Zhang, and Songwu Lu. DCell: A scalable and fault-tolerant network structure for data centers. In ACM SIGCOMM, 2008.

6. Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. VL2: A scalable and flexible data center network. In ACM SIGCOMM, 2009.

7. Ankit Singla, P. Brighten Godfrey, and Alexandra Kolla. High throughput data center topology design. In USENIX NSDI, 2014.

8. Jian Guo, Fangming Liu, Xiaomeng Huang, John C. S. Lui, Mi Hu, Qiao Gao, and Hai Jin. On efficient bandwidth allocation for traffic variability in datacenters. In IEEE INFOCOM, 2014.

9. Theophilus Benson, Aditya Akella, and David A. Maltz. Network traffic characteristics of data centers in the wild. In ACM IMC, 2010.

10. Nathan Farrington, Erik Rubow, and Amin Vahdat. Data center switch architecture in the age of merchant silicon. In IEEE HOTI, 2009.

11. Andrew R. Curtis, Tommy Carpenter, Mustafa Elsheikh, Alejandro Lopez-Ortiz, and S. Keshav. Rewire: An optimization-based framework for unstructured data center network design. In IEEE INFOCOM, 2012.

12. Andrew R. Curtis, S. Keshav, and Alejandro Lopez-Ortiz. Legup: Using heterogeneity to reduce the cost of data center network upgrades. In ACM CoNEXT, 2010.

13. Ankit Singla, Chi-Yao Hong, Lucian Popa, and P. Brighten Godfrey. Jellyfish: Networking data centers randomly. In USENIX NSDI, 2012.

14. Yang Liu and Jogesh Muppala. Fault-tolerance characteristics of data center network topologies using fault regions. In IEEE/IFIP DSN, 2013.

15. Lucian Popa, Sylvia Ratnasamy, Gianluca Iannaccone, Arvind Krishnamurthy, and Ion Stoica. A cost comparison of datacenter network architectures. In ACM CoNEXT, 2010.

16. Charles Clos. A study of non-blocking switching networks. Bell System Technical Journal, 32:406–424, 1953.

17. Kai Chen, Ankit Singla, Atul Singh, Kishore Ramachandran, Lei Xu, Yueping Zhang, Xitao Wen, and Yan Chen. OSA: An optical switching architecture for data center networks with unprecedented flexibility. In USENIX NSDI, 2012.

18. Chuanxiong Guo, Guohan Lu, Dan Li, Haitao Wu, Xuan Zhang, Yunfeng Shi, Chen Tian, Yongguang Zhang, and Songwu Lu. BCube: A high performance, server-centric network architecture for modular data centers. In ACM SIGCOMM, 2009.

19. Hussam Abu-Libdeh, Paolo Costa, Antony Rowstron, Greg O'Shea, and Austin Donnelly. Symbiotic routing in future data centers. In ACM SIGCOMM, 2010.

20. Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. In ACM SIGCOMM, 2001.

21. Paolo Costa, Thomas Zahn, Ant Rowstron, Greg O'Shea, and Simon Schubert. Why should we integrate services, servers, and networking in a data center? In ACM WREN, 2009.

22. Transparent Interconnection of Lots of Links (TRILL): RFCs 5556 and 6325, 2013. Available at: http://tools.ietf.org/rfc/index. Accessed November 20, 2014.


23. Changhoon Kim, Matthew Caesar, and Jennifer Rexford. Floodless in Seattle: A scalable Ethernet architecture for large enterprises. In ACM SIGCOMM, 2008.

24. Jayaram Mudigonda, Praveen Yalagandula, Mohammad Al-Fares, and Jeffrey C. Mogul. SPAIN: COTS data-center Ethernet for multipathing over arbitrary topologies. In USENIX NSDI, 2010.

25. Damon Wischik, Costin Raiciu, Adam Greenhalgh, and Mark Handley. Design, implementation and evaluation of congestion control for multipath TCP. In USENIX NSDI, 2011.

26. Scott Kirkpatrick, C. Daniel Gelatt, and Mario P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

27. Renato Recio. The coming decade of data center networking discontinuities. Keynote speech, ICNC, August 2012.

28. Theophilus Benson, Ashok Anand, Aditya Akella, and Ming Zhang. MicroTE: Fine grained traffic engineering for data centers. In ACM CoNEXT, 2011.

29. Peter Bodík, Ishai Menache, Mosharaf Chowdhury, Pradeepkumar Mani, David A. Maltz, and Ion Stoica. Surviving failures in bandwidth-constrained datacenters. In ACM SIGCOMM, 2012.

30. Srikanth Kandula, Sudipta Sengupta, Albert Greenberg, Parveen Patel, and Ronnie Chaiken. The nature of data center traffic: Measurements & analysis. In ACM IMC, 2009.

31. Xiaoqiao Meng, Vasileios Pappas, and Li Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In IEEE INFOCOM, 2010.

32. Chi H. Liu, Andreas Kind, and Tiancheng Liu. Summarizing data center network traffic by partitioned conservative update. IEEE Communications Letters, 17(11):2168–2171, 2013.

33. Meng Wang, Xiaoqiao Meng, and Li Zhang. Consolidating virtual machines with dynamic bandwidth demand in data centers. In IEEE INFOCOM, 2011.

34. Alexandru-Dorin Giurgiu. Network performance in virtual infrastructures, February 2010. Available at: http://staff.science.uva.nl/~delaat/sne-2009-2010/p29/presentation.pdf. Accessed November 20, 2014.

35. Dave Mangot. Measuring EC2 system performance, May 2009. Available at: http://bit.ly/48Wui. Accessed November 20, 2014.

36. Jörg Schad, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings of the VLDB Endowment, 3(1–2):460–471, 2010.

37. Guohui Wang and T. S. Eugene Ng. The impact of virtualization on network performance of Amazon EC2 data center. In IEEE INFOCOM, 2010.

38. Haiying Shen and Zhuozhao Li. New bandwidth sharing and pricing policies to achieve a win-win situation for cloud provider and tenants. In IEEE INFOCOM, 2014.

39. Eitan Zahavi, Isaac Keslassy, and Avinoam Kolodny. Distributed adaptive routing convergence to non-blocking DCN routing assignments. IEEE Journal on Selected Areas in Communications, 32(1):88–101, 2014.

40. Nick G. Duffield, Pawan Goyal, Albert Greenberg, Partho Mishra, K. K. Ramakrishnan, and Jacobus E. van der Merwe. A flexible model for resource management in virtual private networks. In ACM SIGCOMM, 1999.

41. Barath Raghavan, Kashi Vishwanath, Sriram Ramabhadran, Kenneth Yocum, and Alex C. Snoeren. Cloud control with distributed rate limiting. In ACM SIGCOMM, 2007.


42. Joe W. Jiang, Tian Lan, Sangtae Ha, Minghua Chen, and Mung Chiang. Joint VM placement and routing for data center traffic engineering. In IEEE INFOCOM, 2012.

43. Minlan Yu, Yung Yi, Jennifer Rexford, and Mung Chiang. Rethinking virtual network embedding: Substrate support for path splitting and migration. SIGCOMM Computer Communication Review, 38:17–29, 2008.

44. Vinh The Lam, Sivasankar Radhakrishnan, Rong Pan, Amin Vahdat, and George Varghese. NetShare and stochastic NetShare: Predictable bandwidth allocation for data centers. ACM SIGCOMM CCR, 42(3), 2012.

45. Lucian Popa, Gautam Kumar, Mosharaf Chowdhury, Arvind Krishnamurthy, Sylvia Ratnasamy, and Ion Stoica. FairCloud: Sharing the network in cloud computing. In ACM SIGCOMM, 2012.

46. Chuanxiong Guo, Guohan Lu, Helen J. Wang, Shuang Yang, Chao Kong, Peng Sun, Wenfei Wu, and Yongguang Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In ACM CoNEXT, 2010.

47. Henrique Rodrigues, Jose Renato Santos, Yoshio Turner, Paolo Soares, and Dorgival Guedes. Gatekeeper: Supporting bandwidth guarantees for multi-tenant datacenter networks. In USENIX WIOV, 2011.

48. Di Xie, Ning Ding, Y. Charlie Hu, and Ramana Kompella. The only constant is change: Incorporating time-varying network reservations in data centers. In ACM SIGCOMM, 2012.

49. Hitesh Ballani, Keon Jang, Thomas Karagiannis, Changhoon Kim, Dinan Gunawardena, and Greg O'Shea. Chatty tenants and the cloud network sharing problem. In USENIX NSDI, 2013.

50. Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazières, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In USENIX NSDI, 2013.

51. Daniel Stefani Marcon, Rodrigo Ruas Oliveira, Miguel Cardoso Neves, Luciana Salete Buriol, Luciano Paschoal Gaspary, and Marinho Pilla Barcellos. Trust-based grouping for cloud datacenters: Improving security in shared infrastructures. In IFIP Networking, 2013.

52. Lucian Popa, Praveen Yalagandula, Sujata Banerjee, Jeffrey C. Mogul, Yoshio Turner, and Jose Renato Santos. ElasticSwitch: Practical work-conserving bandwidth guarantees for cloud computing. In ACM SIGCOMM, 2013.

53. Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, and Amin Vahdat. Hedera: Dynamic flow scheduling for data center networks. In USENIX NSDI, 2010.

54. C. Hopps. Analysis of an equal-cost multi-path algorithm, 2000. RFC 2992.

55. Albert Greenberg, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. Towards a next generation data center architecture: Scalability and commoditization. In ACM PRESTO, 2008.

56. Leslie G. Valiant and Gordon J. Brebner. Universal schemes for parallel communication. In ACM STOC, 1981.

57. Zhiyang Guo, Jun Duan, and Yuanyuan Yang. On-line multicast scheduling with bounded congestion in Fat-Tree data center networks. IEEE Journal on Selected Areas in Communications, 32(1):102–115, 2014.

58. Wen-Kang Jia. A scalable multicast source routing architecture for data center networks. IEEE Journal on Selected Areas in Communications, 32(1):116–123, 2014.


59. Sivasankar Radhakrishnan, Malveeka Tewari, Rishi Kapoor, George Porter, and Amin Vahdat. Dahu: Commodity switches for direct connect data center networks. In ACM/IEEE ANCS, 2013.

60. Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan Turner. OpenFlow: Enabling innovation in campus networks. SIGCOMM Computer Communication Review, 38:69–74, 2008.

61. E. Rosen, A. Viswanathan, and R. Callon. Multiprotocol label switching architecture, 2001. RFC 3031.

62. Dan Li, Mingwei Xu, Ying Liu, Xia Xie, Yong Cui, Jingyi Wang, and Guihai Chen. Reliable multicast in data center networks. IEEE Transactions on Computers, 63:2011–2024, 2014.

63. 802.1D - MAC Bridges, 2013. Available at: http://www.ieee802.org/1/pages/802.1D.html. Accessed November 20, 2014.

64. Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat. PortLand: A scalable fault-tolerant layer 2 data center network fabric. In ACM SIGCOMM, 2009.

65. 802.1s - Multiple Spanning Trees, 2013. Available at: http://www.ieee802.org/1/pages/802.1s.html. Accessed November 20, 2014.

66. 802.1ax - Link Aggregation Task Force, 2013. Available at: http://ieee802.org/3/axay/. Accessed November 20, 2014.

67. 802.1Q - Virtual LANs, 2013. Available at: http://www.ieee802.org/1/pages/802.1Q.html. Accessed November 20, 2014.

68. Understanding Multiple Spanning Tree Protocol (802.1s), 2007. Available at: http://www.cisco.com/en/US/tech/tk389/tk621/technologies_white_paper09186a0080094cfc.shtml. Accessed November 20, 2014.

69. Md. Faizul Bari, Raouf Boutaba, Rafael Esteves, Lisandro Z. Granville, Maxim Podlesny, Md. Golam Rabbani, Qi Zhang, and Mohamed F. Zhani. Data center network virtualization: A survey. IEEE Communications Surveys & Tutorials, 15(2):909–928, 2013.

70. Vijay Mann, Anilkumar Vishnoi, Kalapriya Kannan, and Shivkumar Kalyanaraman. CrossRoads: Seamless VM mobility across data centers through software defined networking. In IEEE/IFIP NOMS, 2012.

71. Jayaram Mudigonda, Praveen Yalagandula, Jeff Mogul, Bryan Stiekes, and Yanick Pouffary. NetLord: A scalable multi-tenant network architecture for virtualized datacenters. In ACM SIGCOMM, 2011.

72. Brent Stephens, Alan Cox, Wes Felter, Colin Dixon, and John Carter. PAST: Scalable Ethernet for data centers. In ACM CoNEXT, 2012.

73. VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks, 2013. Available at: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02. Accessed November 20, 2014.

74. Amazon Virtual Private Cloud, 2013. Available at: http://aws.amazon.com/vpc/. Accessed November 20, 2014.

75. Microsoft Hyper-V Server 2012, 2013. Available at: http://www.microsoft.com/en-us/server-cloud/hyper-v-server/. Accessed November 20, 2014.

76. Hyper-V Architecture and Feature Overview, 2013. Available at: http://msdn.microsoft.com/en-us/library/dd722833(v=bts.10).aspx. Accessed November 20, 2014.


77. Teemu Koponen, Martin Casado, Natasha Gude, Jeremy Stribling, Leon Poutievski, Min Zhu, Rajiv Ramanathan, Yuichiro Iwata, Hiroaki Inoue, Takayuki Hama, et al. Onix: A distributed control platform for large-scale production networks. In USENIX OSDI, 2010.

78. Yan Zhang and Nirwan Ansari. On architecture design, congestion notification, TCP incast and power consumption in data centers. IEEE Communications Surveys & Tutorials, 15(1):39–64, 2013.

79. Facebook to Expand Prineville Data Center, 2010. Available at: https://www.facebook.com/notes/prineville-data-center/facebook-to-expand-prineville-data-center/411605058132. Accessed November 20, 2014.

80. Tad Andersen. Facebook's Iowa expansion plan goes before council, 2014. Available at: http://www.kcci.com/news/facebook-just-announced-new-expansion-plan-in-iowa/25694956#!0sfWy. Accessed November 20, 2014.

81. David Cohen. Facebook eyes expansion of Oregon data center, 2012. Available at: http://allfacebook.com/prineville-oregon-data-center-expansion_b97206. Accessed November 20, 2014.

82. John Rath. Facebook considering Asian expansion with data center in Korea, 2013. Available at: http://www.datacenterknowledge.com/archives/2013/12/31/asian-expansion-has-facebook-looking-at-korea/. Accessed November 20, 2014.

83. Ryan Shea, Feng Wang, Haiyang Wang, and Jiangchuan Liu. A deep investigation into network performance in virtual machine based cloud environment. In IEEE INFOCOM, 2014.

84. Katrina LaCurts, Shuo Deng, Ameesh Goyal, and Hari Balakrishnan. Choreo: Network-aware task placement for cloud applications. In ACM IMC, 2013.

85. Fei Xu, Fangming Liu, Hai Jin, and A. V. Vasilakos. Managing performance overhead of virtual machines in cloud computing: A survey, state of the art, and future directions. Proceedings of the IEEE, 102(1):11–31, 2014.

86. Andrew R. Curtis, Jeffrey C. Mogul, Jean Tourrilhes, Praveen Yalagandula, Puneet Sharma, and Sujata Banerjee. DevoFlow: Scaling flow management for high-performance networks. In ACM SIGCOMM, pp. 254–265, 2011. Available at: http://doi.acm.org/10.1145/2018436.2018466. Accessed November 20, 2014.

87. Marco Chiesa, Guy Kindler, and Michael Schapira. Traffic engineering with equal-cost multipath: An algorithmic perspective. In IEEE INFOCOM, 2014.


5 INTER-DATA-CENTER NETWORKS WITH MINIMUM OPERATIONAL COSTS

B. Kantarci1 and H. T. Mouftah2

1Department of Electrical and Computer Engineering, Clarkson University, Potsdam, New York, USA

2School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada

5.1 INTRODUCTION

Cloud computing enables users to receive infrastructure/platform/software as a service (XaaS) via a shared pool of resources in a pay-as-you-go fashion [1]. Automated service provisioning, virtual machine migration, data security, reliability, and energy management have been pointed out as the challenges faced by cloud providers [2], whereas energy management and reliability appear as two important issues that impact the operational expenditures (Opex) of the operators. As data centers are the main hosts of physical resources, they play the key role in the delivery of cloud services. Hence, interconnection of data centers over a backbone network is one of the major challenges affecting the performance of the cloud system, as well as the Opex of the service providers.

As illustrated in Figure 5.1, inter-data-center (IDC) networks are considered to be accommodated within the public telecom network, which consists of heterogeneous network segments such as wireless backhaul networks, wireline local area networks (LANs), wireless sensor networks (WSNs), wireline Multiprotocol Label Switching (MPLS) networks, legacy IP networks, and so on [3]. In the Cloud era, the volume of the traffic



Figure 5.1. Heterogeneous inter-data-center network [3].

between data centers increases tremendously due to on-demand access to a shared pool of resources by a large number of users. This phenomenon increases the capacity demands of IDC networks, introducing challenges related to capacity scaling and operational expenses [4]. Furthermore, virtual machine migration within and between the data centers or a massive arrival of new cloud resource requests can lead to frequent reconfiguration of the network between the servers in data centers, as well as of the network interconnecting the data centers [5]. High bandwidth and low energy cost are reported as the two crucial requirements of IDC networks, which make optical networks the leading transport technology [6, 7].

Optical IDC networks call for intelligent design schemes that consider content replicas, as well as the location and number of data centers, in order to ensure survivability against failures [8, 9]. Furthermore, energy-efficient design of the IDC network is crucial to minimize the operational expenses of the network and data center providers, as high


energy consumption leads to increased electric bills. Here, energy efficiency denotes power-saving design and planning of the network, as well as reducing the nonrenewable energy consumption in powering the data centers and the inter-data-center network [10].

Virtualization of the network is a key concern in network design, as connectivity can be guaranteed to cloud customers by offering network as a service (NaaS) [11]. In the same study, Baroncelli et al. define virtualization as mapping the cloud services with the corresponding end addresses where cloud requests are submitted. Besides virtualization, the communication mode is another factor which affects the Opex of the network and data center operators. In conventional networks, unicast and multicast communication modes are used. However, in a virtualized cloud environment, requests can be routed toward virtual resources based on the anycast or manycast paradigm. In an IDC network consisting of N nodes, anycast is denoted by ⟨s, d ∈ D⟩, where s and d denote the source and destination addresses, respectively, and D is the set of candidate destination addresses. Thus, reaching any of the candidate destination addresses is sufficient to provision the corresponding request. On the other hand, manycast is denoted by ⟨s, D′ ⊆ D⟩, where reaching a subset of the candidate destinations is sufficient to provision a submitted request. Anycast and manycast communication modes provide the flexibility of allocating resources in different data centers; hence energy efficiency and resilience can be ensured by adopting these communication modes [12–14].
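In code, the difference between the two modes is just how many eligible data centers must be reached. A minimal sketch, assuming a per-data-center cost metric (e.g., the current electricity price) that the provider wants to minimize:

```python
def anycast(candidates, cost):
    """<s, d in D>: any one eligible data center suffices, so pick the
    cheapest (e.g., lowest electricity price at provisioning time)."""
    return min(candidates, key=lambda d: cost[d])

def manycast(candidates, cost, k):
    """<s, D' subset of D>: reach a subset of the candidates; here, the k
    cheapest. k = 1 reduces to anycast; k = |D| is effectively multicast."""
    return sorted(candidates, key=lambda d: cost[d])[:k]

price = {"DC1": 0.9, "DC2": 0.4, "DC3": 0.7}   # $/kWh, example values
assert anycast(list(price), price) == "DC2"
assert manycast(list(price), price, k=2) == ["DC2", "DC3"]
```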

As mentioned earlier, energy efficiency impacts the electric bills of the operators; therefore, design schemes considering electricity prices based on location and time are also emerging. According to recent research results, electricity price-aware design of the inter-data-center network can enable Opex savings if and only if electricity price-aware inter-data-center workload migration is enabled along with provisioning the demands in the data centers where electricity prices are low at the time of provisioning [15, 16].

An IDC network design with the objective of energy efficiency (or minimum electric bills) is different from an energy-efficient transport network design due to the difference between the energy consumption levels of the network components and the data centers. The most power-hungry components of transport networks are reported to be the IP router ports, while the power consumption of a cloud data center is at least ten to a hundred times that of a corporate data center. An IP router port consumes around 1 kW [17], whereas the total power consumption of a cloud data center can reach up to multi-mega-watts (MMW). Recent research reports that the 61.4 MMW total energy consumption of US data centers (according to the report in 2006 [18]) had dramatically increased by the end of 2013 [19].

Indeed, when designing a virtual IDC network, resilience comes at the expense of energy savings, as reported in Ref. [20]. Therefore, in order to address this trade-off, there have been proposals such as the resilient virtual infrastructure design under 1:1 protection for lightpaths and virtual servers [20] and IDC workload migration-enabling virtual network design schemes [21].

This chapter provides a reference on the design methods for operational cost-efficient design of a cloud backbone through demand profile-based network virtualization, where the data centers are located at the core nodes of a transport network. Addressing energy efficiency in a cloud backbone helps reduce the Opex of the network and data center operators. Another factor that affects the Opex of the operators


is the downtime of cloud services, which can be denoted by resiliency, availability, and/or reliability. This chapter considers two major components of the operational costs of an IDC network: (1) electric bills of the operators and (2) downtime penalties due to service unavailability, i.e., outage. Therefore, the design methods that aim at cutting the electric bills of the operators, as well as the methods that aim at reducing the outage probability, are covered in the following sections. Furthermore, the approaches that jointly consider and overcome the related challenges are studied as well. At the end of the chapter, a brief summary of the studied schemes is complemented by a comprehensive comparison in terms of various aspects of Opex and other performance parameters affecting it.

In Section 5.2, we introduce IDC network virtualization and a generic virtualization scheme. Section 5.3 introduces virtual IDC network design with the objective of minimum electric bills by presenting mixed integer linear programming (MILP)-based optimization and heuristic solutions. Section 5.4 introduces IDC network design with the objective of minimum downtime penalties. As mentioned earlier, there exists a trade-off between these two Opex elements. Therefore, Section 5.5 presents a solution to address this trade-off. The chapter is summarized along with discussions of open issues and challenges in Section 5.6.

5.2 INTER-DATA-CENTER NETWORK VIRTUALIZATION

In the cloud-dominated era, virtualization and infrastructure as a service (IaaS) enable providing several portions of the physical infrastructure as a service by different operators, where infrastructure denotes computing and storage resources in data centers, as well as the communication infrastructure interconnecting the data centers [22]. Furthermore, by taking advantage of transparent optical devices, virtualization of an optical network enables bypassing IP routers, which are the most power-hungry components in the backbone [17].

Figure 5.2 presents a minimalist illustration of the virtualization of an IDC network. Each data center is associated with a backbone node, and backbone nodes are interconnected via fiber links. If a lightpath can be established between two nodes, the two backbone nodes with the allocated resources in the associated data centers are said to be virtually linked. Thus, in the virtual infrastructure, a virtual node is denoted by the virtualized resources of a data center and its associated backbone node.

In the literature, planning of the virtual infrastructure denotes mapping the virtual network onto the physical topology. The physical infrastructure is considered to be the set of data centers, optical backbone nodes, and fiber links interconnecting them, whereas the virtual infrastructure is a subset of the physical infrastructure consisting of a set of virtualized data center resources and fiber channels [23].

The objective of network virtualization can vary: energy minimization, cost minimization, reliability maximization, and so on. In this section, we present a previously proposed energy-minimized design of an inter-data-center network [13], which adopts the multihop optical bypass-based virtualization technique in an IP over


Figure 5.2. Minimalist illustration of inter-data-center network backbone virtualization.

WDM network [17]. In Ref. [13], the authors have proposed an energy-efficient design of the IDC network backbone through MILP formulations, as well as heuristics. These schemes have been extended to address both inter- and intra-data-center network provisioning with the objective of energy efficiency [24]. However, since the scope of this chapter is limited to IDC network design, we refer the interested reader to the corresponding reference. In the next sections, the corresponding formulation will serve as a benchmark for the Opex-minimized design schemes.

For the sake of simplicity, let us assume that a single virtual infrastructure is mapped onto the physical infrastructure. Furthermore, the following assumptions hold in the design of the virtual infrastructure:

• Three types of demands are assumed in the network, namely downstream data center demands, upstream data center demands, and regular demands. An upstream demand is submitted from a backbone node, and it is destined to any or a number of data centers. A downstream data center demand originates from a few data


centers and is destined to a certain backbone node, where it is aggregated and delivered to the corresponding end users. A regular demand denotes a non-data-center unicast flow between two backbone nodes.

• Intensities of all types of demands in a certain time interval are forecasted in advance. Thus, virtualization of the backbone is performed in advance of the occurrence of the corresponding demand profile so that the virtualization objective can be met.

• For an incoming upstream demand of any size, the overhead of allocating resources in a given data center is known in terms of utilization, power consumption, and power usage efficiency.

5.2.1 Mathematical Formulation

In the formulation, the physical infrastructure is denoted by a directed graph G, whereas the virtual infrastructure is also represented by a directed graph, denoted by G′. Given an upstream data center demand originating at a source node s, the set of data centers that are capable of provisioning the corresponding demand is denoted by D. Routing the demand toward any data center out of D is referred to as anycast, whereas routing toward a subset of the eligible data centers is named manycast. Thus, a manycast demand can be denoted by the tuple ⟨s, D′ ⊆ D⟩. Here, if |D′| = 1, the communication mode becomes equivalent to anycast, while in the case of D′ = D, it becomes identical to multicast communication. In this design scheme, upstream data center demands are assumed to be provisioned based on the manycast communication mode.

Table 5.1 illustrates the notation used in explaining the virtualization framework in Ref. [13]. The mathematical formulation of the model is presented in Equations 5.1 and 5.2. Equation 5.1 presents the objective of the virtualization, which is minimized energy consumption throughout the IDC network. As seen in the equation, the total power consumption in the network is the sum of the power consumptions at each node location. Power consumption at each node location is a function of the power consumption of the associated data center (summation term 1), the active IP router ports (summation term 2), and the transponders in the directed wavelength channels along with the erbium-doped fiber amplifiers (EDFAs) in the directed fiber links (summation term 3). The number of active IP ports is calculated by the number of outgoing virtual lightpaths at the corresponding node. Besides, in the third summation term, the number of EDFAs on a physical link, $S_{ij}$, is set at $\lceil Lf_{ij}/\Delta_{span} \rceil + 1$, where $\Delta_{span}$ is the fiber span length.

$$\min \sum_{i \in V} \left( DC_i + \sum_{j \in N^v_i} P_r \cdot C_{ij} + \sum_{j \in N^p_i} \left( P_t \cdot W_{ij} + S_{ij} \cdot P_{edfa} \cdot f_{ij} \right) \right) \qquad (5.1)$$

Before proceeding with the details, it is worthwhile to provide information on the power consumption of a data center. Based on the assumptions summarized earlier, the prospective power consumption of data center d ($DC_d$) is a function of the current processing and cooling power consumption in the corresponding data center and the total additional power consumption overhead of the demands submitted from other locations and provisioned in data center d.


TABLE 5.1. The notation used in the virtualization scheme

Notation: Explanation
$P_i$: Power consumption at node i
$DC_i$: Power consumption of the data center i
$P_r$: Power consumption of an IP router port
$C_{ij}$: Number of lightpaths in the virtual link ij
$N^v_i$ ($N^p_i$): Set of neighbors of node i in the virtual (physical) topology
$P_t$: Power consumption of a transponder
$P_{edfa}$: Power consumption of an EDFA
$S_{mn}$: Number of EDFAs in the physical link mn
$W_{ij}$: Number of wavelengths in the physical link ij
$f_{ij}$: Number of fibers in the physical link ij
$\Omega^{DOWN}_{ds}$: Downstream demand from data center s to node d
$\Omega^{UP}_s$: Upstream traffic (job submission) to data centers from node s
$\gamma^{ds}_{ij,down}$: Binary variable that is one if there is downstream traffic from data center s to node d traversing the virtual link ij
$\lambda^{sd}_{ij}$: Regular traffic demand traversing the virtual link ij and destined from node s to node d
$\Lambda_{sd}$: Regular traffic demand from node s to node d
$\Upsilon^{sd}_{up}$: Possible demand from node s to data center d
$\gamma^{sd}_{ij,up}$: Binary variable that is one if there is traffic from node s to data center d traversing the virtual link ij
$D^s_{max}$: Maximum number of destinations for the upstream traffic from node s
$D^s_{min}$: Minimum number of destinations for the upstream traffic from node s
$W^{ij}_{mn}$: Number of wavelength channels of the virtual link ij traversing the physical link mn
$DC^{cool}_d$: Cooling power consumed at data center d
$DC^{proc}_d$: Processing power consumed at data center d
$\Theta_{s,d}$: Power consumption overhead introduced to data center d by the job submitted by node s
$L_{i,j}$: Shortest distance from node i to node j
$Lf_{m,n}$: Fiber length between node m and node n

Equation 5.2 formulates this expression.

$$DC_d = DC^{cool}_d + DC^{proc}_d + \sum_{s \in V} \sum_{i \neq d} \Theta_{s,d} \cdot \gamma^{sd}_{id,up}, \qquad \forall d \in V \qquad (5.2)$$
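As a quick illustration of how Equations 5.1 and 5.2 are evaluated for a candidate design, the sketch below computes the power terms for a single node location. This is a sketch under our own assumptions: the function names are ours, the wattage figures are the illustrative values quoted later in this chapter, and the actual model in Ref. [13] is of course solved as an MILP rather than evaluated procedurally.

import math

P_R, P_T, P_EDFA, SPAN_KM = 1000.0, 73.0, 8.0, 80.0  # W, W, W, km

def edfa_count(fiber_len_km):
    # S_ij = ceil(Lf_ij / span) + 1
    return math.ceil(fiber_len_km / SPAN_KM) + 1

def dc_power(cooling_w, processing_w, demand_overheads_w):
    # Eq. 5.2: current cooling + processing power, plus the overheads
    # Theta_{s,d} of the upstream demands provisioned at this data center
    return cooling_w + processing_w + sum(demand_overheads_w)

def node_power(dc_watts, virt_links, phy_links):
    # Eq. 5.1, the term for one node location
    ip_ports = sum(virt_links.values())       # one IP port per lightpath C_ij
    transport = sum(P_T * W + edfa_count(L) * P_EDFA * f
                    for (W, f, L) in phy_links.values())
    return dc_watts + P_R * ip_ports + transport

dc1 = dc_power(100e3, 168e3, [1.2e3, 0.8e3])
# virtual neighbor N2 via 4 lightpaths; physical link to N2 with
# 4 wavelengths, 1 fiber, and 650 km of fiber
print(node_power(dc1, {"N2": 4}, {"N2": (4, 1, 650.0)}))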

Three subsets of constraints form the constraint set of the design model. Virtualization is mainly based on a typical routing and fiber and wavelength assignment (RFWA) in an optical network. Therefore, flow conservation constraints dominate the rest of the constraint set. In the virtual topology, single-hop routing is performed in the IP layer, while multihop routing in the physical topology is handled by the optical layer. Hence, flow conservation constraints for both layers have to be formulated separately.


Besides, capacity constraints and manycast constraints for upstream data center traffic are needed. It is worthwhile to note that this chapter only presents the constraints related to the upstream data center (i.e., manycast) demands. Downstream data center demands can be considered as multiple unicast demands, just like the regular demands. For detailed information on the formulation of the unicast constraints, the reader is referred to Refs. [13, 17].

Flow conservation constraints in the IP layer: An upstream data center demand requires to be provisioned in at least $D^s_{min}$ and at most $D^s_{max}$ data centers. A backbone node must initiate manycast traffic whose size is less than the demand size times the maximum number of destination data centers and greater than the demand size times the minimum number of destination data centers, as shown in Equation 5.3. The manycast communication mode requires a "light-tree" in the network [25]. Thus, the total demand size on the first branches of the light-tree is the size of the manycast demand. Furthermore, an upstream data center demand has to arrive at a sufficient number of destinations, which is ensured by modifying Equation 5.3 appropriately. Besides, flow conservation at the intermediate nodes has to be met; that is, incoming and outgoing traffic volumes for a given type of demand at an intermediate node have to be equal.

$$D^s_{min} \cdot \Omega^{UP}_s \;\leq\; \sum_{d \in V} \sum_{j \in V} \Upsilon^{sd}_{up} \cdot \gamma^{sd}_{sj,up} \;-\; \sum_{d \in V} \sum_{j \in V} \Upsilon^{sd}_{up} \cdot \gamma^{sd}_{js,up} \;\leq\; D^s_{max} \cdot \Omega^{UP}_s, \qquad \forall s \in V \qquad (5.3)$$

Flow conservation constraint in the optical layer: In the optical layer, the source node of a virtual link does not have any incoming wavelength channels, as formulated in Equation 5.4, while the destination node of a virtual link does not have any outgoing wavelength channels. Besides, a virtual link does not contain any loops.

$$W^{ij}_{mn} - W^{ij}_{nm} = \begin{cases} -C_{ij} & m = i \\ C_{ij} & m = j \\ 0 & \text{else} \end{cases}, \qquad \forall m, n, i, j \in V \qquad (5.4)$$

Capacity constraints: The total channel capacity of the fibers from node m to node n sets the upper bound for the number of lightpaths traversing the physical link mn, as shown in Equation 5.5.

$$\sum_{i \in V} \sum_{j \in V} W^{ij}_{mn} - W \cdot f_{mn} \leq 0, \qquad \forall m, n \in V \qquad (5.5)$$

Furthermore, a virtual link must have sufficient capacity to accommodate the regular traffic, the downstream DC traffic, and the upstream DC traffic traversing it, as shown in Equation 5.6, where C denotes the wavelength channel capacity.

$$\sum_{s \in V} \sum_{d \in V} \left( \lambda^{sd}_{ij} + \Upsilon^{sd}_{up} \cdot \gamma^{sd}_{ij,up} + \Upsilon^{ds}_{down} \cdot \gamma^{ds}_{ij,down} \right) \leq C \cdot C_{ij}, \qquad \forall i, j \in V \qquad (5.6)$$

Manycast constraints: Equation 5.7 ensures that each DC upstream demand reaches a sufficient number of destinations. Furthermore, at most one virtual link can be utilized prior to reaching a destination, as shown in Equation 5.8. To be able to distribute the traffic over the branches of the light-tree, backbone nodes in the optical domain have to be multicast capable. Thus, an upstream data center demand can be accommodated by the same virtual links up to the node j where the demand is split into multiple virtual links. Equation 5.9 formulates this constraint.

$$D^s_{min} \leq \sum_{d \in V} \sum_{i \neq d} \Upsilon^{sd}_{up} \cdot \gamma^{sd}_{id,up} \leq D^s_{max} \cdot \Omega^{UP}_s, \qquad \forall s \in V \qquad (5.7)$$

$$\sum_{i \neq d} \gamma^{sd}_{id,up} \leq 1, \qquad \forall s, d \in V \qquad (5.8)$$

$$\sum_{d \in V} \gamma^{sd}_{ij,up} \leq 1, \qquad \forall s, i, j \in V \qquad (5.9)$$

5.2.2 Heuristic Solution for Inter-Data-Center Network Virtualization

In Ref. [13], the authors have proposed a heuristic as a scalable alternative to the aforementioned MILP formulation. The heuristic adopts the energy-efficient virtualization of an IP over WDM network [17], and introduces data center demands, as well as data center utilization and power consumption constraints. The heuristic is named Power-Minimized Provisioning (PoMiP). Figure 5.3 illustrates a generic flowchart of the virtualization steps for an IDC network. The algorithm starts with a set of demands that are sorted in decreasing order. For each demand, the algorithm first attempts to route the demand over the virtual topology G′. If the demand can be routed over the virtual topology, the remaining virtual link capacities are updated, and the algorithm proceeds with the next non-provisioned demand. If the demand cannot be routed over the virtual topology, a new virtual link is added between the source and destination nodes, which is then routed over the physical topology. The capacity of the newly added virtual link is also updated accordingly.
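The following self-contained sketch renders this loop in Python under strong simplifying assumptions of ours: virtual links are held in a plain dictionary, the cost-based routing of Equations 5.10-5.11 is replaced by a hop-count search, and the physical-layer RFWA is abstracted into a single "add a lightpath" step. All names and capacities are illustrative.

from collections import deque

LIGHTPATH_GBPS = 40.0  # assumed capacity of one lightpath / virtual link

def shortest_feasible_path(vlinks, src, dst, need):
    """Fewest-hop virtual path whose links all have >= `need` capacity left."""
    prev, seen, queue = {}, {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path, v = [], dst
            while v != src:
                path.append((prev[v], v))
                v = prev[v]
            return path[::-1]
        for (a, b), cap in vlinks.items():
            if a == u and b not in seen and cap >= need:
                seen.add(b)
                prev[b] = u
                queue.append(b)
    return None

def provision(demands, vlinks):
    # demands are handled in decreasing order of size, as in Figure 5.3
    for src, dst, size in sorted(demands, key=lambda d: d[2], reverse=True):
        path = shortest_feasible_path(vlinks, src, dst, size)
        if path is None:                         # no room on G': add a new
            vlinks[(src, dst)] = LIGHTPATH_GBPS  # virtual link (one lightpath,
            path = [(src, dst)]                  # mapped on G by the RFWA step)
        for edge in path:
            vlinks[edge] -= size                 # update remaining capacities
    return vlinks

print(provision([("N1", "N3", 25.0), ("N1", "N3", 30.0)],
                {("N1", "N2"): 40.0, ("N2", "N3"): 40.0}))

In this toy run, the 30 Gbps demand fits on the existing two-hop virtual path, while the 25 Gbps demand no longer does and triggers the creation of a new direct virtual link.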

Here, as seen in the figure, the heuristic has three distinguishing functions: (1) virtual link cost ($\varphi^v_{ij}$) assignment, (2) physical link cost ($\varphi^{phy}_{mn}$) assignment, and (3) selection of destination data centers.

Equation 5.10 formulates the virtual link cost assignment. The heuristic aims at selecting the virtual links with higher remaining capacity and lower physical link costs.

$$\varphi^v_{ij} = \begin{cases} \dfrac{\sum_{\text{link } mn \in \text{link } ij} \varphi^{phy}_{mn}}{C'_{ij}} & C'_{ij} > 0 \\ \infty & \text{else} \end{cases} \qquad (5.10)$$

Physical link cost assignment is formulated in Equation 5.11. As the power consumption of IP routers is avoided in the optical domain, the power consumption of a physical link is determined by the EDFAs ($P_{edfa}$), whose number grows with the distance between the two nodes forming the link, and by the transponders, whose power consumption grows with the number of active wavelengths in the physical link.


Figure 5.3. Generic virtualization steps for an inter-data-center network. [Flowchart: each popped demand is routed over the virtual topology G′ using the virtual link costs φv; on failure, a new virtual link is added from s to d and routed over the physical topology G; for upstream data center demands, a subset dc of the candidate data center set DC is selected and the steps are repeated until all data centers in dc are reached.]

Thus, according to Equation 5.11, the heuristic aims at selecting the least power-consuming links on a physical path.

$$\varphi^{phy}_{mn} = \begin{cases} P_{edfa} \cdot S_{mn} + P_t \cdot W_{mn} & W_{mn} > 0 \\ \infty & \text{else} \end{cases} \qquad (5.11)$$

Since PoMiP aims at minimum power consumption throughout the network, for an upstream data center demand initiated at the backbone node s, it ranks the data centers with respect to their prospective power consumption in increasing order and selects the first $D^s_{min}$ of them as the destinations. Thus, the heuristic maps the manycast flow provisioning problem onto a multiple unicast flow provisioning problem.
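The destination selection step then reduces to a sort, as sketched below with hypothetical power figures.

def select_destinations(prospective_power_w, d_min):
    """Rank data centers by the prospective power consumption of hosting the
    demand (increasing order) and keep the first d_min as destinations."""
    return sorted(prospective_power_w, key=prospective_power_w.get)[:d_min]

print(select_destinations({"DC1": 271.2e3, "DC2": 268.9e3, "DC3": 274.0e3}, 2))
# -> ['DC2', 'DC1']: the manycast demand becomes two unicast flows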

Given a physical network of N backbone nodes and their associated data centers, for any demand, if a virtual path is found on the virtual topology, the runtime of the algorithm is bounded above by O(N²), which is the complexity of a typical shortest path routing algorithm. If the demand cannot be routed over the virtual topology, the newly added virtual link is routed over the physical topology within O(N²). Furthermore, searching for a fiber and a lightpath throughout the physical path requires O(F · W) runtime on each link, where F is the number of fibers per link and W is the number of transponders (i.e., wavelengths) per node. Since the fiber and wavelength search is performed on each link under the wavelength continuity constraint, the search throughout the path can be repeated up to (N − 1)² times. Therefore, the runtime complexity of the algorithm is O(N² · F · W). Since N ≥ F and N ≥ W, the heuristic runs in O(N⁴) in the worst case.

5.3 IDC NETWORK DESIGN WITH MINIMUM ELECTRIC BILLS

Energy-efficient design of the IDC network reduces the Opex of the operators, as they are charged for their electricity consumption. Therefore, the Opex of the operators can be further reduced if energy-efficient design is consolidated with electricity price awareness. Furthermore, taking advantage of the demand response (DR) component in smart grids can help reduce the Opex of the operators while keeping the power consumption fairly distributed across the network. It is worthwhile to note that DR denotes regulating power consumption through the generation of varying price tariffs. Three approaches, namely time-of-use (ToU) pricing, real-time pricing (RTP), and critical peak pricing (CPP), are the most popular among existing time-varying tariffs. Since smart grids and dynamic pricing are new concepts, customers are not willing to join RTP-based pricing tariffs, as they are used to being charged flat rates. Moreover, the elasticity of RTP tariffs requires rapid adaptation by customers. Based on the analysis in Ref. [26], this chapter considers ToU pricing despite the several benefits of RTP.

In Ref. [15], the authors have analyzed the impact of ToU-aware virtualization of the inter-data-center network, where the network is virtualized based on the forecasted demand profile and the ToU rates in a certain timeslot, with the objective of minimum electric bills for data center and network operators. In the corresponding study, the authors report that ToU-awareness enables a reduction in the electric bills of the network and data center operators while introducing longer provisioning delays for the user demands that are submitted to the data centers. In Ref. [16], the authors have shown that ToU-aware IDC network virtualization is beneficial as long as IDC workload sharing is enabled during virtualization. To this end, the authors have proposed ToU-aware Provisioning (ToUP), which adopts and extends the virtualization scheme in Section 5.2.

Since data centers are the most power-hungry components of an IDC network, a significant reduction of the Opex by cutting the electric bills becomes possible by enabling workload sharing between data centers. Therefore, ToUP redefines the distinguishing functions of the virtualization heuristic in order to meet its objective. Furthermore, in addition to the three demand types, it also accommodates a fourth demand type, namely the IDC demands. The distinguishing functions of the heuristic are redefined as follows:

Virtual link cost assignment: Equation 5.12 formulates the virtual link cost assignment at time T. As seen in the equation, if there are sufficient remaining lightpaths on a virtual link, its cost is set at the total cost of the physical links forming the corresponding virtual link.

$$\varphi^v_{ij}(T) = \begin{cases} \sum_{\text{link } mn \in \text{link } ij} \varphi^{phy}_{mn}(T) & C'_{ij} > 0 \\ \infty & \text{else} \end{cases} \qquad (5.12)$$

Physical link cost assignment: The cost of the physical link mn is set at its contribution to the electric bill per unit time. Thus, the product of the ToU price at the location of the destination end node of the physical link mn ($Price_n(T)$) and the total energy consumption on the corresponding link per unit time is the unit contribution to the electric bill of the network operator.

$$\varphi^{phy}_{mn}(T) = \begin{cases} Price_n(T) \cdot \left[ Lf_{mn} \cdot \left( P_{edfa} \cdot S_{mn} + P_t \cdot W_{mn} \right) \right] & W_{mn} > 0 \\ \infty & \text{else} \end{cases} \qquad (5.13)$$

Data center subset selection for upstream data center demands: For an upstream data center demand, ToUP computes the prospective contribution of the corresponding demand to the electric bill of each data center in the network. As mentioned before, it is assumed that the network virtualization manager knows the power consumption and resource utilization overhead of an upstream data center demand on any data center. Therefore, the contribution of an upstream data center demand to the electric bill of the data center operator is calculated as the product of the prospective energy consumption in the data center and the ToU rate, at the time of virtualization, at the location of the corresponding data center, as seen in Equation 5.14.

$$R_i(T) = Price_i(T) \cdot \sum_{j \neq i} \Theta_{s,i} \cdot \gamma^{si}_{ji,up}, \qquad \forall i, s \in V \qquad (5.14)$$

IDC workload sharing: As mentioned before, ToUP enables IDC workload migration to accompany backbone network virtualization so that workloads are hosted in those data centers that experience lower ToU prices during the corresponding period. Here, the aim is to obtain a new workload-data center mapping. To this end, in Ref. [16], the authors have proposed a simulated annealing-based procedure, which is presented in Algorithm 5.1.

Algorithm 5.1 Inter-Data-Center Workload Migration Algorithm

Begin
  Sort demands in decreasing order
  Opx_temp ← Σ_i O^dc_i (use Map to calculate)
  Map_ii ← 100, TempMap_ii ← 100, ∀i
  O_current ← Opx_temp
  while (converge = FALSE) {
    TempMap_ij ← Map_ij, ∀i, j
    randrow ← Select a random row in Map
    candidates ← Count(Υ^{randrow,d}_{IDC} > 0)
    if (Count(Map_{randrow,i} > 0) > 0) {
      TempMap_{randrow,randrow} ← κ_MIN
      Remainder ← 1 − κ_MIN
    } else {
      randcol ← Select a random column in TempMap
      if (randcol is on the diagonal) {
        TempMap_{randrow,randrow} ← κ_MIN
        Remainder ← 1 − κ_MIN
      } else {
        Remainder ← TempMap_{randrow,randcol}
        TempMap_{randrow,randcol} ← 0
      }
    }
    dest1, dest2 ← Candidate destinations out of candidates
    share_dest1 ← Migration to dest1; share_dest1 ∈ [0, Remainder]
    share_dest2 ← Migration to dest2; share_dest2 ∈ [0, Remainder − share_dest1]
    TempMap[randrow][dest1] increment by share_dest1
    TempMap[randrow][dest2] increment by share_dest2
    TempMap[randrow][randrow] increment by Remainder − (share_dest1 + share_dest2)
    Opx_temp ← Σ_i O^dc_i (use TempMap to calculate)
    F ← e^{(O_current − Opx_temp)/(100 · B · t_cool)}
    if (F ≥ 1)
      Map[i][j] ← TempMap[i][j], O_current ← Opx_temp
    else if (F < 1)
      Map[i][j] ← TempMap[i][j], O_current ← Opx_temp with prob. F
    T ← T · t_cool
    if (T ≤ T_ground OR change in O_current ≪ 1)
      converge ← true
  }
End

Before proceeding with the details of the algorithm, it is worthwhile to see Table 5.2 for the notation as well as the settings. The algorithm aims at obtaining a new data center-workload mapping matrix, Map, and in each annealing iteration it uses a temporary mapping matrix, TempMap. Each data center is assumed to migrate a certain portion of its workload to at most $DC_{max}$ data centers, which is set at two in the pseudocode for the sake of simplicity. By the term Opex, the algorithm denotes the total electric bills of the data center operators. Initially, Map is set at 100 · I, where I is the identity matrix; thus, each data center initially hosts 100% of its original workload. Until the algorithm converges, the following iteration steps are repeated: TempMap is set equal to Map, and a random row of Map denoting the source data center is selected along with a random column denoting a candidate destination data center. If the candidate destination data center is the source data center itself, the algorithm sets the value of the corresponding cell at $\kappa_{MIN}$; otherwise, it is set at zero. Then, the remainder of the workload is distributed among two of the remaining data centers.


TABLE 5.2. The notation used in the inter-data-center workload migration algorithm

Notation: Explanation
Map: Workload distribution matrix
TempMap: A temporary workload distribution matrix
$DC_s$: Candidate set of DCs to share the workload of data center s
$\Upsilon^{sd}_{IDC}$: Possible workload migration demand from data center s to data center d
$Opx_{temp}$ ($O_{current}$): Temporary (current) Opex
T, B: Annealing temperature, Boltzmann constant
$t_{cool}$: Cooling rate of the system
dc: List of destination data centers
$\kappa_{MIN}$: Lower bound for the workload percentage not to be migrated

The new workload-data center mapping is stored in TempMap, which is used to calculate the possible electric bill contribution of the new workload distribution ($Opx_{temp}$). The newly computed workload distribution map is accepted by evaluating a function (F in the pseudocode) of the current actual Opex ($O_{current}$), the newly computed Opex ($Opx_{temp}$), the Boltzmann constant, and the cooling rate of the system. If F is greater than or equal to one, the workload distribution stored in TempMap is accepted and assigned to Map; otherwise, it is accepted with probability F.

At the end of each iteration, the system temperature is cooled by the cooling rate $t_{cool}$. If the system temperature is equal to or less than the previously defined ground temperature, or if the change in the current Opex is significantly low, the annealing system is said to have converged. At this point, the algorithm stops and accepts the new workload distribution among the data centers.
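For concreteness, the following is a compact, runnable rendering of Algorithm 5.1 under simplifying assumptions of ours: the electric-bill model of Ref. [16] is replaced by an opaque opex(...) oracle, the randcol branch is folded into a single redistribution step, and two destinations are drawn at random per iteration.

import math
import random

K_MIN, B, T_COOL, T_GROUND = 0.3, 0.01, 0.95, 0.005  # settings from Ref. [16]

def migrate(n_dcs, opex, t=1.0):
    # Map starts as 100*I: every data center hosts 100% of its own workload
    map_ = [[100.0 if i == j else 0.0 for j in range(n_dcs)] for i in range(n_dcs)]
    o_cur = opex(map_)
    while t > T_GROUND:
        tmp = [row[:] for row in map_]
        s = random.randrange(n_dcs)            # random source data center
        keep = 100.0 * K_MIN                   # share that must stay home
        remainder = tmp[s][s] - keep           # share eligible for migration
        tmp[s][s] = keep
        d1, d2 = random.sample([d for d in range(n_dcs) if d != s], 2)
        share1 = random.uniform(0.0, remainder)
        share2 = random.uniform(0.0, remainder - share1)
        tmp[s][d1] += share1                   # migrate to two destinations,
        tmp[s][d2] += share2                   # return the rest to the source
        tmp[s][s] += remainder - (share1 + share2)
        o_tmp = opex(tmp)
        delta = o_cur - o_tmp                  # positive when Opex improves
        if delta >= 0 or random.random() < math.exp(delta / (100.0 * B * t)):
            map_, o_cur = tmp, o_tmp           # accept (possibly uphill) move
        t *= T_COOL                            # cool the annealing system
    return map_, o_cur

# toy Opex oracle: workload hosted at DC0 is the most expensive
prices = [12.0, 7.0, 9.0]
best, cost = migrate(3, lambda m: sum(prices[j] * m[i][j]
                                      for i in range(3) for j in range(3)))
print(round(cost, 1))

With this toy price vector, the annealer drifts workload away from the most expensive data center while always retaining at least $\kappa_{MIN}$ of each data center's original workload at home.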

In Ref. [16], the authors evaluated the performance of ToUP in a medium-scale cloud system located in the 14-node NSFNET backbone, where each backbone node is associated with a data center that is initially loaded between 0.1 and 0.7. The backbone network [13] is considered to be an IP over WDM network with sixteen 40 Gbps wavelengths per fiber, in which EDFAs are placed every 80 km. Four time zones are assumed with the demand profile in Figure 5.4a, whereas the ToU rates have been synthetically derived for each location as shown in Figure 5.4b. It is worthwhile to note that the NSFNET topology spans four different time zones, namely the Eastern Standard Time (EST), Central Standard Time (CST), Mountain Standard Time (MST), and Pacific Standard Time (PST) zones. An entire day is partitioned into eight equal timeslots. Network equipment, namely an EDFA, a transponder, and an IP router port, is assumed to consume 8, 73, and 1000 W, respectively [17]. Besides, workload placement in a data center aims at minimizing heat recirculation [27], and a data center is assumed to consume 168 kW (100 kW) of idle IT (cooling) power and 319.2 kW (280 kW) of full-utilization IT (cooling) power. An upstream data center demand is assumed to increase the data center workload by between 0.025 and 0.2. In the IDC workload distribution algorithm, the Boltzmann constant is set at 0.01, whereas the cooling rate is 0.95. The ground temperature and the minimum temperature change are considered to be 0.005 and 0.001, respectively.


Figure 5.4. (a) Demand profile in different time zones. (b) ToU rates in different locations of the network.

Figure 5.5. (a) Opex savings in the inter-data-center network. (b) Opex of the network equipment. [Curves: DePoMiP and ToUP with no IDC traffic and with κ_MIN = 0.7, 0.5, 0.3, 0; annotation in (b): the Opex of the network operator is acceptable when κ_MIN is limited.]

In Figure 5.5, the performance of ToUP is presented in comparison to delay- and power-minimized provisioning (DePoMiP), which was previously proposed in Ref. [13]. Furthermore, ToUP is also evaluated with IDC migration disabled ($\kappa_{MIN}$ = 1). Figure 5.5a illustrates the overall Opex savings in the IDC network with respect to delay-minimized provisioning (DeMiP), where DeMiP aims at virtualizing the backbone network with the shortest lightpaths for unicast demands and the shortest light-trees for upstream data center demands. It is clearly seen that ToUP is outperformed by DePoMiP if IDC workload migration is disabled, whereas enabling IDC workload migration introduces larger Opex savings when compared to DePoMiP. Thus, lower $\kappa_{MIN}$ values lead to higher Opex savings in the entire cloud system. However, as seen in Figure 5.5b, the smaller the $\kappa_{MIN}$, the higher the Opex of the network operator. Therefore, limiting the allowable IDC workload migration seems to be viable. As the authors report in Ref. [16], under such a scenario, enforcing around 30% of the workload to be hosted in the original data center leads to the best compromise between the Opex of the data center and network operators.

5.4 INTER-DATA-CENTER NETWORK DESIGN WITH MINIMUM DOWNTIME PENALTIES

Besides energy consumption and electricity bills, another significant type of operational expense is downtime penalties. Outages of network and/or computing resources in data centers can occur due to component failures. Therefore, resilient design of the IDC network with the objective of minimum outage probability is required to reduce Opex. In Ref. [8], content placement along with path and content protection in an optical IDC network has been addressed via ILP formulations and heuristics. Similarly, in Ref. [9], the locations of data centers are determined via ILP formulations based on the anycast communication mode with the objective of maximum resilience. In Ref. [28], the authors have proposed a network virtualization-aware IDC network over an elastic optical network (EON) backbone. Although the proposed architecture is transparent to the transport technology, the authors have adopted elastic optical networking based on recent research, which reports that energy consumption, bandwidth utilization, and deployment cost are improved by elastic optical networks in comparison to conventional wavelength-switched optical transport networks [29].

An outage denotes the unavailability of the IDC network, and it can occur due to either a network component failure or a failure in the data center. Therefore, a resilient design scheme should jointly consider the availability of network components as well as the availability of data centers. The outage probability of a virtual link ($\nu\ell_{ij}$) can be formulated by Equation 5.15, where $O^i_{IP}$, $a_{edfa}$, $a_{tran}$, and $a_{rcvr}$ denote the outage probability of an IP router, the availability of an EDFA, the availability of a transmitter, and the availability of a receiver, respectively. Besides, $S_{mn}$ represents the number of EDFAs deployed in the physical link mn ($\rho\ell_{mn}$).

$$O^{ij}_{\nu\ell} = O^i_{IP} + O^j_{IP} + \sum_{m \in G} \sum_{n \in G,\; \rho\ell_{mn} \in \nu\ell_{ij}} S_{mn} \cdot (1 - a_{edfa}) + (1 - a_{tran}) + (1 - a_{rcvr}) \qquad (5.15)$$

Thus, a virtual link ij is said to be out of service if one of the following conditions holds:

• An IP router at the source or destination node of the link fails.
• An EDFA along the physical path forming the virtual link ij fails.
• The transmitter at node i fails.
• The receiver at node j fails.

Once the outage probability of the virtual link is formulated, the outage probability of a virtual path (VP) can be formulated as the sum of the outage probabilities of the virtual links forming the path. If a data center is located at the end of the path, its outage probability is also added to the outage probability of the virtual path.
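The additive outage model of Equation 5.15 and the path-level aggregation described above can be sketched as follows; the availability figures are placeholders of ours, except for the router-port outage of 10⁻⁶ quoted later in the evaluation settings.

O_IP = 1e-6                                          # router outage probability
A_EDFA, A_TRAN, A_RCVR = 0.999999, 0.99999, 0.99999  # assumed availabilities

def vlink_outage(edfa_counts):
    """edfa_counts: S_mn for each physical link the virtual link traverses."""
    return (2 * O_IP                          # IP routers at both ends
            + sum(S * (1 - A_EDFA) for S in edfa_counts)
            + (1 - A_TRAN)                    # transmitter at node i
            + (1 - A_RCVR))                   # receiver at node j

def vpath_outage(vlinks, dc_outage=0.0):
    # a virtual path's outage is the sum of its virtual-link outages, plus
    # the outage of the data center terminating the path (if any)
    return sum(vlink_outage(links) for links in vlinks) + dc_outage

# two virtual links (with S_mn = [10, 6] and [8]) ending at a Tier-3 DC
print(vpath_outage([[10, 6], [8]], dc_outage=1 - 0.9998))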


The outage probability of an upstream data center demand submitted at node s can be formulated as shown in Equation 5.16 by simply assuming that the workload is replicated in two data centers, where $D^s_{list}$, $\gamma^{sd}_{ij,wp}$, and $O^c_{dc}$ denote the list of data centers selected by node s, a binary variable denoting whether the demand destined to data center d utilizes the virtual link ij, and the outage probability of data center c, respectively. In order to ensure the resilience of a given upstream data center demand, at least one lightpath toward a destination node and its corresponding data center must be available (first summation term). In the first summation term, duplicates of the outage probabilities of the virtual links and the data centers can occur, and these are eliminated by the second summation term. Since the entire summation leads to the availability of the corresponding demand, its complement is equal to the outage probability.

$$O^s_{US} = 1 - \Bigg[ \sum_{d \in D^s_{list}} \Big( (1 - O^{sd}_{\nu P}) \cdot (1 - O^d_{dc}) \Big) - \sum_{d,c \in DC} \sum_{i \in G'} \sum_{j \in G'} \Big( (1 - O^{ij}_{\nu\ell}) \cdot \gamma^{sd}_{ij,wp} \cdot \gamma^{sc}_{ij,wp} \cdot (1 - O^c_{dc}) \cdot (1 - O^d_{dc}) \Big) \Bigg] \qquad (5.16)$$

5.4.1 Minimum Outage Probability in Cloud

In Ref. [28], the authors have proposed an IDC virtual network design scheme named minimum outage probability in cloud (MOPIC). MOPIC computes virtual paths to the data centers in $DC_s$, and $DC_{min}$ data centers are selected such that the outage probabilities of the data centers and those of the corresponding virtual paths lead to the minimum overall outage probability.

While routing the virtual links over the physical topology, MOPIC assigns the outage probability of each physical link as the link cost; thus, a virtual link is routed over the most resilient lightpath in the physical topology.

5.4.2 Resource-Saving Minimum Outage Probability in Cloud

Resilience requires additional resource usage, which in turn introduces additional energy consumption. Therefore, an efficient design scheme is expected to strike a compromise between resource usage and resilience. To this end, resource-saving minimum outage probability in cloud (RS-MOPIC) has been proposed. Although computing resource usage cannot be reduced, some savings in network resource usage are possible: reducing the length of the path traversed from the source node to the destination data center enables resource saving. In the virtualization algorithm in Figure 5.1, the virtual link cost assignment is done as follows. Each virtual link ij is assigned the product of its outage probability and the number of hops in its physical topology mapping. The same principle holds in determining the destination data centers: virtual paths to each data center in $DC_s$ are searched using the virtual link cost assignment mentioned above, and the $DC_{min}$ data centers leading to the $DC_{min}$ minimum outage probabilities are selected for the corresponding workload placement. Similarly, while routing the virtual link ij over the physical topology, each physical link mn is assigned the product of its outage probability and the number of nodes traversed from node n to node j.
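RS-MOPIC's two cost assignments then amount to weighting outage probabilities by hop counts, as in this minimal sketch (argument names and values are ours):

def rs_mopic_vlink_cost(outage_ij, phys_hops_ij):
    # virtual link ij: outage probability x hop count of its physical mapping
    return outage_ij * phys_hops_ij

def rs_mopic_plink_cost(outage_mn, hops_n_to_j):
    # physical link mn: outage probability x number of nodes still to be
    # traversed from n to the destination node j of the virtual link
    return outage_mn * hops_n_to_j

print(rs_mopic_vlink_cost(3.8e-5, 2), rs_mopic_plink_cost(1.2e-5, 3))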

The performance of MOPIC and RS-MOPIC has been evaluated against a benchmark approach called minimum resource provisioning in cloud (MRPIC). MRPIC mainly aims at designing the virtual network with minimum network resource usage. To this end, it sets the virtual link cost at the number of physical links forming the corresponding link while routing a demand over the virtual topology. In order to map a virtual link onto the physical topology, MRPIC sets the cost of a physical link at the number of hops to the destination node of the corresponding virtual link. Similarly, for the upstream data center demands, $DC_{min}$ data centers out of $DC_s$ are selected based on the locality principle.

In a medium-scale simulation scenario over the 24-node US National Backbone topology [17], the demand profile in Figure 5.4 is considered, where each 3 h timeslot is denoted by $D_i$. Fiber links interconnecting the data centers are assumed to have 1000 GHz of spectrum capacity with a data rate/bandwidth ratio of 2 bps/Hz. Besides, 1 GHz subcarriers are assumed with a guard band of 10 GHz, whereas the transponder capacity is assumed to be equal to the capacity of 50 subcarriers. $DC_{min}$ is set at two for the sake of simplicity. It is assumed that the outage probability of a router port is $10^{-6}$; further assumptions on the outage probabilities of the optical network components, such as transceivers and EDFAs, are taken from Ref. [30]. Besides, data centers are considered in four tiers with respect to their availability, namely Tier 1, Tier 2, Tier 3, and Tier 4, with availability levels of 99.67%, 99.74%, 99.98%, and 99.995%, respectively [31].
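The tier availabilities quoted above map directly to the data center outage probabilities used in Equation 5.16 ($O_{dc} = 1 -$ availability), as the following snippet verifies:

tiers = {"Tier 1": 0.9967, "Tier 2": 0.9974, "Tier 3": 0.9998, "Tier 4": 0.99995}
for tier, availability in tiers.items():
    print(f"{tier}: O_dc = {1 - availability:.5f}")
# Tier 1: O_dc = 0.00330 ... Tier 4: O_dc = 0.00005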

In Figure 5.6a, RS-MOPIC and MOPIC are compared to the benchmark scheme MRPIC in terms of the outage probability of the upstream data center demands. Introducing outage probability awareness to the RSA and destination data center selection processes reduces the outage probability of upstream data center demands dramatically. The outage probability under MRPIC is always at the level of $10^{-6}$, whereas the outage probability of an upstream data center demand is reduced to the level of $10^{-7}$ under MOPIC and RS-MOPIC.

Figure 5.6. (a) Outage probability of upstream data center demands. (b) Number of active channels in the virtual inter-data-center network.


Furthermore, joint awareness of resource consumption and outage probability does not degrade the resilience of the demands, as RS-MOPIC introduces an outage probability similar to that of MOPIC. Moreover, under heavy demand profiles (e.g., D6 and D8), RS-MOPIC slightly reduces the outage probability compared to MOPIC by selecting shorter physical lightpaths for the virtual topology mapping. The upper half of the figure shows the outage probability under anycast-based MOPIC: instead of placing the workload on multiple data centers, provisioning on a single data center increases the outage probability up to the level of $10^{-4}$.

Besides, in Figure 5.6b, the number of active channels is presented as the resource consumption of the evaluated schemes. Indeed, the anycast-based implementation of MRPIC ($DC_{min}$ = 1) introduces the least resource consumption, utilizing 13%-35% fewer network resources depending on the demand profile. Resource consumption awareness incorporated in outage probability-aware provisioning increases the resource consumption of MRPIC by 3.5%-12% depending on the demand profile, whereas purely outage probability-aware design of the virtual IDC network increases channel utilization by 10%-25%. Thus, RS-MOPIC is the more viable option for striking a compromise between resilience and resource overhead.

5.5 OVERCOMING ENERGY VERSUS RESILIENCE TRADE-OFF

Although RS-MOPIC improves on MOPIC in terms of resource consumption, it is not power-aware; hence, an energy-efficient improvement over MOPIC is required in order to ensure low Opex for the operators. To this end, in Ref. [21], the authors have proposed resilient provisioning with minimum power consumption in cloud (RPMPC), which aims at striking a compromise between power consumption and outage probability. RPMPC improves MOPIC in the following four ways:

(1) For upstream data center demands, RPMPC selects $\lceil DC_{min}/2 \rceil$ data centers out of $D_s$ based on minimum power consumption, whereas the rest are selected based on minimum outage probability (see Eq. 5.16).

(2) While routing over the virtual topology, RPMPC uses a two-piece cost function, as shown in Equation 5.17. The first summation term formulates the total cost of the physical links forming the corresponding virtual link, whereas the second term formulates the outage probability of the virtual link. In the second piece of the cost assignment function, M denotes a large number that prevents the first piece, that is, power consumption, from dominating.

$$\varphi^v_{ij} = \begin{cases} \Big( \sum_{\rho\ell_{mn} \in \nu\ell_{ij}} \varphi^{phy}_{mn} \Big) + \Big( M \cdot O^{ij}_{\nu\ell} \Big) & A_{ij} > 0 \\ \infty & \text{else} \end{cases} \qquad (5.17)$$

(3) In order to route a virtual link over the physical topology, RPMPC uses the power consumption and the outage probability of the corresponding physical link, as formulated in Equation 5.18. It is worthwhile to note that, since the backbone is considered to be an elastic optical network, the power consumption of the transponders is formulated by the term $P_t \cdot W_{mn} + \sum_{\lambda_k \in \Lambda_{mn}} P^c_t \cdot R^{mn}_k$, where $P_t$ is the fixed power consumption and $P^c_t$ is the bandwidth-variable power consumption of a transponder, respectively, whereas $R^{mn}_k$ is the current bitrate on the corresponding transponder. The selection of IP over an elastic optical network as the transport medium enables transmission at finer granularity and flexibility in spectrum allocation [32]; however, the proposed framework is adaptable to any optical transport technology. Due to limited space, the reader is referred to Ref. [33] for the details of the transmission medium.

$$\varphi^{phy}_{mn} = \begin{cases} P_{edfa} \cdot S_{mn} + P_t \cdot W_{mn} + \sum_{\lambda_k \in \Lambda_{mn}} P^c_t \cdot R^{mn}_k + M \cdot O^{mn}_{\rho\ell} & W_{mn} > 0 \\ \infty & \text{else} \end{cases} \qquad (5.18)$$
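The two-piece RPMPC costs (Eqs. 5.17 and 5.18) can be sketched as follows; M, the per-Gbps transponder power, and the traffic figures are illustrative assumptions of ours.

INF, M = float("inf"), 1e9      # M: large constant (assumed value)
P_T, P_EDFA = 73.0, 8.0         # fixed transponder / EDFA power (W)
P_T_VAR = 1.5                   # assumed bandwidth-variable power per Gbps

def phi_phy_rpmpc(S_mn, W_mn, bitrates_gbps, outage_mn):
    # Eq. 5.18: fixed + bandwidth-variable power, plus M-weighted outage
    if W_mn <= 0:
        return INF
    power = P_EDFA * S_mn + P_T * W_mn + sum(P_T_VAR * r for r in bitrates_gbps)
    return power + M * outage_mn

def phi_v_rpmpc(phy_costs, outage_ij, remaining_capacity):
    # Eq. 5.17: total physical cost plus M-weighted virtual-link outage
    if remaining_capacity <= 0:               # A_ij > 0 required
        return INF
    return sum(phy_costs) + M * outage_ij

link = phi_phy_rpmpc(S_mn=10, W_mn=4, bitrates_gbps=[40, 25], outage_mn=1.2e-5)
print(phi_v_rpmpc([link], outage_ij=3.8e-5, remaining_capacity=12.0))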

(4) RPMPC enables IDC workload sharing in order to ensure energy savings and lower outage probability. To this end, it adopts the IDC workload distribution procedure of Algorithm 5.1 and modifies it to meet both objectives. The only difference between the workload distribution algorithm of RPMPC and Algorithm 5.1 is the calculation of the temporary Opex ($Opx_{temp}$), which is now computed by Equation 5.19. The temporary Opex function consists of two pieces: the first piece denotes the power consumption overhead of the migrated workload on the destination data centers (i.e., $\Psi(Map[s][i] \cdot \Upsilon^{si}_{IDC})$), and the second piece is the outage probability of the demands destined to the selected alternate data centers. In the equation, $\Psi(\cdot)$ denotes a function that returns the additional cooling and processing power of a data center due to workload migration, whereas $\Upsilon^{sd}_{IDC}$ is the possible workload migration demand from data center s to data center d.

$$Opx_{temp} \leftarrow \sum_{i} \Big[ \Psi\big(Map[s][i] \cdot \Upsilon^{si}_{IDC}\big) + M \cdot \big( O^{id}_{\nu P} + O^i_{dc} \big) \Big] \qquad (5.19)$$

In Ref. [21], the authors have evaluated the performance of RPMPC under the same simulation settings as in Section 5.4, and compared its performance to MOPIC and PoMiP in terms of power consumption and outage probability. In Figure 5.7a, it is clearly seen that the power consumption under RPMPC is similar to that under PoMiP. Furthermore, RPMPC introduces up to a 6.7% enhancement over MOPIC, which is purely outage probability-aware. Besides, as seen in Figure 5.7b, where MOPIC demonstrates the best performance in terms of outage probability, RPMPC dramatically improves the outage probability compared with PoMiP. Therefore, the trade-off between resilience and energy efficiency can be addressed by RPMPC to ensure significant Opex savings.

5.6 SUMMARY AND DISCUSSIONS

With the advent of cloud computing, users increasingly receive XaaS via a shared pool of resources on a pay-as-you-go basis. Data centers, as the hosts of the physical servers, play the key role in the delivery of cloud services.


Figure 5.7. (a) Power consumption of the inter-data-center network under RPMPC, MOPIC, and PoMiP. (b) Outage probability of upstream data center demands under RPMPC, MOPIC, and PoMiP.

Therefore, interconnection of data centers over a backbone network is one of the major challenges affecting the performance of the cloud system, as well as the Opex of the service providers. This chapter has introduced recent approaches to the operationally cost-efficient design of virtual IDC networks. We have focused on energy efficiency (and electric bills) and on outage probability, which can also be denoted by resiliency, availability, and/or reliability, to help reduce the Opex of the network and computing services. A generic virtual IDC design framework has been introduced, which forms a basis for all of the schemes studied in this chapter. It has been followed by PoMiP, which aims at minimum power consumption throughout the network; ToU-aware provisioning, which aims at minimum electric bills for network and data center operators; MOPIC, which aims at minimum downtime for network as well as computing services; and RPMPC, which aims at meeting both objectives. All schemes have been discussed with their pros and cons in terms of energy consumption and resilience, which has also introduced the trade-off between these two factors affecting the Opex. The chapter has concluded by introducing the benefits of RPMPC, which adopts MOPIC and PoMiP to address this trade-off. In Table 5.3, these schemes are summarized and compared with respect to backbone network technology, energy efficiency, resilience, electricity price awareness, workload migration, and resource usage.

This area of research still has open issues and challenges to be addressed by researchers working in this field. Extension of RPMPC by considering the presence of differentiated SLAs in the cloud backbone is an immediate research direction. Furthermore, the impact of the intra-data-center network on the performance of the proposed policies needs further study. Future work should also investigate the impact of using different routing and spectrum/wavelength assignment schemes on the performance of the proposed frameworks in terms of energy efficiency, outage probability, and resource utilization. Last but not least, the communication overhead between the IDC network and the smart grid communication network prior to virtualization needs to be studied and addressed by future research.


TABLE 5.3. Summary of the virtual inter-data-center network design schemes studied in this chapter

Scheme          Backbone network   Energy efficiency   Resilience   Electricity price   Workload migration   Resource usage
PoMiP [13]      IP/WDM             √                   ×            ×                   ×                    ×
DePoMiP [13]    IP/WDM             √                   ×            ×                   ×                    √
ToUP [16]       IP/WDM             √                   ×            √                   √                    √
MOPIC [28]      EON                ×                   √            ×                   ×                    ×
RS-MOPIC [28]   EON                ×                   √            ×                   ×                    √
RPMPC [21]      EON                √                   √            ×                   √                    √


REFERENCES

1. Q. Zhang, L. Cheng, and R. Boutaba, "Cloud computing: State-of-the-art and research challenges," Journal of Internet Services and Applications, ED-1, 7–18 (2010).

2. R. Moreno-Vozmediano, R. S. Montero, and I. M. Llorente, "Key challenges in cloud computing to enable the future internet of services," IEEE Internet Computing, ED-17/4, 18–25 (2013).

3. S. J. B. Yoo, Y. Yin, and K. Wen, "Intra and inter datacenter networking: The role of optical packet switching and flexible bandwidth optical networking," Proceedings of the International Conference on Optical Network Design and Modeling (ONDM), ED-14, 1–6 (2012).

4. X. Zhao, V. Vusirikala, B. Koley, V. Kamalov, and T. Hofmeister, "The prospect of inter-data-center optical networks," IEEE Communications Magazine, ED-51/4, 32–38 (2013).

5. M. Gharbaoui, B. Martini, and P. Castoldi, "Anycast-based optimizations for inter-data-center interconnections," IEEE/OSA Journal of Optical Communications and Networking, ED-4/11, B168–B178 (2012).

6. Y. Li, N. Hua, H. Zhang, and X. Zheng, "Reconfigurable bandwidth service based on optical network state for inter-data center communication," IEEE International Conference on Communications in China: Optical Networks and Systems, ED-1, 282–284 (2012).

7. C. Develder, M. De Leenheer, B. Dhoedt, and M. Pickavet, "Optical networks for grid and cloud computing applications," Proceedings of the IEEE, ED-100/5, 1149–1167 (2012).

8. M. F. Habib, M. Tornatore, M. De Leenheer, F. Dikbiyik, and B. Mukherjee, "Design of disaster-resilient optical datacenter networks," IEEE/OSA Journal of Lightwave Technology, ED-30/16, 2563–2573 (2012).

9. B. Jaumard, A. Shaikh, and C. Develder, "Selecting the best locations for data centers in resilient optical grid/cloud dimensioning," Proceedings of the International Conference on Transparent Optical Networks (ICTON), ED-14, 1–4 (2012).

10. X. Dong, T. El-Gorashi, and J. M. H. Elmirghani, "Green IP over WDM networks with data centers," IEEE/OSA Journal of Lightwave Technology, ED-29/12, 1861–1880 (2011).

11. F. Baroncelli, B. Martini, and P. Castoldi, "Network virtualization for cloud computing," Annals of Telecommunications, ED-65/11-12, 713–721 (2010).

12. J. Buysse, C. Cavdar, M. De Leenheer, B. Dhoedt, and C. Develder, "Improving energy efficiency in optical cloud networks by exploiting anycast routing," Proceedings of the SPIE - Network Architectures, Management, and Applications, ED-8310, 1–6 (2011).

13. B. Kantarci and H. T. Mouftah, "Designing an energy-efficient cloud network," IEEE/OSA Journal of Optical Communications and Networking, ED-4/11, B101–B113 (2012).

14. C. Develder, M. Tornatore, M. F. Habib, and B. Jaumard, "Dimensioning resilient optical grid/cloud networks," in Communication Infrastructures for Cloud Computing, eds. H. T. Mouftah and B. Kantarci, IGI Global, Hershey, PA, 73–106 (2014).

15. B. Kantarci and H. T. Mouftah, "The impact of time of use (ToU)-awareness in energy and Opex performance of a cloud backbone," Proceedings of the IEEE Global Communications Conference (GLOBECOM), pp. 3250–3255 (2012).

16. B. Kantarci and H. T. Mouftah, "Time of use (ToU)-awareness with inter-data center workload sharing in the cloud backbone," Proceedings of the IEEE International Conference on Communications (ICC), pp. 4207–4211 (2013).

17. G. Shen and R. S. Tucker, "Energy-minimized design for IP over WDM networks," IEEE/OSA Journal of Optical Communications and Networking, ED-1/1, 176–186 (2009).

18. Environmental Protection Agency (EPA), "Report to Congress on server and data center energy efficiency," Environmental Protection Agency, Washington, DC [Online] http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf (2007).

19. G. Sun, V. Anand, D. Liao, C. Lu, X. Zhang, and N.-H. Bao, "Power-efficient provisioning for online virtual network requests in cloud-based data centers," IEEE Systems Journal, accepted to appear, DOI: 10.1109/JSYST.2013.2289584.

20. A. Tzanakaki et al., "Energy efficiency considerations in integrated IT and optical network resilient infrastructures," Proceedings of the International Conference on Transparent Optical Networks (ICTON), ED-13, 1–4 (2011).

21. B. Kantarci and H. T. Mouftah, "Minimum outage probability provisioning in an energy-efficient cloud backbone," Proceedings of the IEEE Global Communications Conference (GLOBECOM), SAC.GDC.1–5 (2013).

22. A. Pages, J. Perello, S. Spadaro, and G. Junyent, "Strategies for virtual optical network allocation," IEEE Communications Letters, ED-16/2, 268–271 (2012).

23. K. N. Georgakilas, A. Tzanakaki, M. Anastasopoulos, and J. M. Pedersen, "Converged optical network and data center virtual infrastructure planning," IEEE/OSA Journal of Optical Communications and Networking, ED-4/9, 681–691 (2012).

24. B. Kantarci, L. Foschini, A. Corradi, and H. T. Mouftah, "Design of energy-efficient cloud systems via network and resource virtualization," Wiley International Journal of Network Management, 1–16, DOI: 10.1002/nem.1838 (2013).

25. R. Lin, M. Zukerman, G. Shen, and W.-D. Zhong, "Design of light-tree based optical inter-datacenter networks," IEEE/OSA Journal of Optical Communications and Networking, ED-5/12, 1443–1455 (2013).

26. R. de Sa Ferreira, L. A. Barroso, P. R. Lino, M. M. Carvalho, and P. Valenzuela, "Time-of-use tariff design under uncertainty in price-elasticities of electricity demand: A stochastic optimization approach," IEEE Transactions on Smart Grid, ED-4/4, 2285–2295 (2013).

27. J. Moore, J. Chase, P. Ranganathan, and R. Sharma, "Making scheduling 'cool': Temperature-aware workload placement in data centers," Proceedings of the USENIX Annual Technical Conference (ATEC), 61–74 (2005).

28. B. Kantarci and H. T. Mouftah, "Resilient design of a cloud system over an optical backbone," accepted, 2014.

29. M. Klinkowski and K. Walkowiak, "On the advantages of elastic optical networks for provisioning of cloud computing traffic," IEEE Network, ED-27/6, 44–51 (2013).

30. M. Tornatore, G. Maier, and A. Pattavina, "Availability design of optical transport networks," IEEE Journal on Selected Areas in Communications, ED-23/8, 1520–1532 (2005).

31. R. Arno, A. Driedl, P. Gross, and R. J. Schuerger, "Reliability of data centers by tier classification," IEEE Transactions on Industry Applications, ED-48/2, 777–783 (2012).

32. S. Zhang and B. Mukherjee, "Energy-efficient dynamic provisioning for spectrum elastic optical networks," Proceedings of the IEEE International Conference on Communications (ICC), 1–6 (2012).

33. G. Zhang, M. De Leenheer, A. Morea, and B. Mukherjee, "A survey on OFDM-based elastic core optical networking," IEEE Communications Surveys and Tutorials, ED-15/1, 65–87 (2013).


6

OPENFLOW AND SDN FOR CLOUDS

Alberto Leon-Garcia, Hadi Bannazadeh, and Qi Zhang

Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada

6.1 INTRODUCTION

Application platforms consist of the software and infrastructure (personal and sensor devices, wireless and wired access networks, the Internet, and computing clouds) that are involved in the delivery of content and applications. Application platforms have been evolving to provide unprecedented flexibility, scalability, and economies of scale. This evolution is expected to continue, driven by applications that address mobility, social networking, big data, and smart infrastructures.

Service-oriented computing and virtualization are key notions in application platforms. Service-oriented computing uses services to support the rapid creation of large-scale interoperable distributed applications. Applications comprise services that can be accessed through networks. Service-oriented computing and virtualization together provide a foundation for resource management. A virtual resource reveals only the attributes that are relevant to the service or capability offered by the resource, and it hides implementation details. Virtualization therefore simplifies resource management and allows operation over infrastructures consisting of heterogeneous resources.



Virtual machines (VMs) play a central role in cloud computing [1], and virtual networks (VNs) play a key role in the debate over the design of the Future Internet and in software-defined networks (SDN) [2, 3]. In combination, cloud computing and SDN can enable highly dynamic, efficient, and cost-effective shared application platforms that can support the rapid deployment of applications for a multiplicity of application providers.

In this chapter we discuss the interdependencies between cloud computing and SDN in application platforms, and we provide a sample of major open source efforts that address these interdependencies. First, we consider the basic use case of web browsing to introduce the basic issues in the interplay between cloud computing and SDN. Section 6.3 discusses the features and advantages of SDN and its most influential example, OpenFlow. Section 6.4 discusses cloud computing and introduces OpenStack, focusing on the Networking Service provided by its Neutron project. Section 6.5 examines challenges and issues in combining SDN and cloud computing, and highlights the important role of Open vSwitch in providing network connectivity to VMs. Section 6.6 introduces the OpenDaylight open-source project. Section 6.7 shows how SDN and cloud computing come together, and introduces the notion of software-defined infrastructures. This chapter focuses on providing an integrated view of SDN and cloud computing. We conclude in Section 6.8 with a brief discussion of research trends and challenges in SDN for cloud computing. At the end of the chapter, we provide references to various surveys and introductory articles on specifics of SDN or cloud computing.

6.2 SDN, CLOUD COMPUTING, AND VIRTUALIZATION CHALLENGES

Figure 6.1 depicts a scenario in which an end user wishes to access Web content through a handheld device. In a traditional, un-virtualized Internet service model, the request for accessing an HTML page is first received by the wireless access point. The access point forwards the request to a Web server through an access network, an Internet gateway, and then a firewall across the Internet. Depending on the current load condition, the request may be forwarded by a load balancer to one of several dedicated application servers, which in turn may access a shared database (e.g., SQL) server.

While this un-virtualized model is simple and straightforward to implement, it also has several limitations. First, when the service demand is low, the application servers may become underutilized, leading to resource wastage.

Figure 6.1. Physical infrastructure for web browsing use case. [Diagram: access point, router/gateway, Internet, firewall, Web server, load balancer, and application servers.]


Figure 6.2. Virtual infrastructure for Web browsing use case.

On the other hand, when service demand is high, it is difficult to scale up the service, as this requires attaching another physical server. If the demand surge is temporary, the attached server may subsequently become underutilized. Furthermore, it is difficult to provide quality-of-service (QoS) guarantees in this model, as "best-effort" packet delivery implies that customers can experience long response times and insufficient throughput when the underlying network is busy.

Motivated by these limitations, there is a trend toward building virtualized infrastructures for end-to-end service delivery. Infrastructure virtualization aims at having multiple virtual infrastructures share the same physical infrastructure. In a nutshell, a virtual infrastructure consists of VMs that are interconnected by an underlying VN, which may consist of virtual routers, virtual switches, and the virtual links that interconnect them.

For the Web browsing use case, a virtualized infrastructure is depicted in Figure 6.2. The Web content provider first specifies the topology and resource requirements of its Web content delivery service as a virtual infrastructure, as shown in Figure 6.2. The firewall, load balancer, and all the servers are implemented by VMs, whereas routers and wireless access points are represented as virtual routers and virtual wireless access points. This virtual infrastructure is then embedded in the physical network infrastructure, where VMs are placed in data centers, and virtual routers and virtual switches are mapped to physical routers and switches or to software implementations of these. The notion of a flow is central to the virtualization of a network: the flows of packets in a VN are identified by specific values in their header fields that allow routers and switches to identify them and treat them as prescribed by their VN.
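As a minimal illustration of this flow-based identification, the sketch below matches packets against a small flow table; the field names and actions are chosen for illustration only and do not follow any particular switch API.

def classify(packet, flow_table):
    """Return the action of the first rule whose fields all match the packet."""
    for match, action in flow_table:
        if all(packet.get(field) == value for field, value in match.items()):
            return action
    return "drop"  # no VN claims this flow

flow_table = [
    ({"vlan_id": 101, "ip_proto": "tcp"}, "forward:port2"),  # tenant A's VN
    ({"vlan_id": 102}, "forward:port3"),                     # tenant B's VN
]
print(classify({"vlan_id": 102, "ip_proto": "udp"}, flow_table))
# -> forward:port3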

This model is beneficial for several reasons. First, by isolating VNs from each other, it is possible to achieve better QoS, as network performance variability is limited. Second,


by separating logical service infrastructures from the underlying physical networkinfrastructure, it is possible to improve resource utilization by consolidating multipleVMs on a single physical machine, or multiple virtual links on a single physical link. Vir-tualization also enables the scaling (up or down) and migration of virtual resources (e.g.,VMs and VNs) to achieve better resource efficiency, by adapting to demand variation.For example, when demand increases, it is desirable to increase the CPU and memoryallocation of a VM that hosts an application server. If the physical machine currentlyhosting the VM does not have sufficient resources, the VM can be migrated to anotherphysical machine.

The creation of VNs is at the core of the above model. The benefits of the above model require an environment where multiple independent tenants are each allotted their own dedicated virtual infrastructure; but in fact, they share the same physical infrastructure, ideally completely oblivious of the presence of other tenants. From a tenant perspective, the virtual computing and networking infrastructure should behave as a physical infrastructure. In particular, a tenant requires control over connectivity, bandwidth and QoS, MAC and IP addresses, node number, and node location and mobility. Each tenant may also bring requirements for security, load balancing, caching, and application performance.

On the other hand, the infrastructure provider should be able to handle large numbers of tenants, while meeting their specific requirements for security, isolation, network control, and availability. In cases where a tenant requires presence in multiple geographic sites, a provider may be called upon to extend its VNs across multiple data centers or even to federate with other providers. In short, the provider needs controllers to manage its own network connectivity and resources while providing tenants with their own specific connectivity and networking needs.

SDN is an emerging concept for the control and management of network infrastructure that enables programmability of the network. SDN is therefore a prime candidate to address the multiplicity of challenges in creating VNs in application platforms.

6.3 SOFTWARE-DEFINED NETWORKING

6.3.1 What Is SDN?

SDN is an emerging concept that has grown from efforts to define network architectures that are flexible and evolvable and can avoid the ossification pitfalls of the current Internet [2]. The early experience with OpenFlow in particular influenced the current view on SDN [3, 4]. We refer the reader to Ref. [5] for a recent survey of SDN and programmable networks.

SDN separates the control of network functionality from the forwarding functionality in packet-forwarding devices, as shown in Figure 6.3. A logically centralized network controller is responsible for the decision on how traffic from a given flow is handled. The decision of the controller is then programmed into the packet-forwarding device.

Figure 6.3. SDN layered architecture (from Ref. [5]).

Proceeding bottom up in Figure 6.3, we have the following layers:

1. Packet-forwarding devices: These devices execute the actual forwarding behavior on flows of packets. They can consist of actual physical switches designed to provide programmability, or of software-implemented switches or routers.

2. Southbound interface: This interface provides the means for network controllers to communicate with and control the packet-forwarding devices. For example, OpenFlow provides a protocol for this interface.

3. Network controller: The controller provides network services to higher layers by programming the packet-forwarding devices. The network controller is positioned to make optimal decisions on resource allocations because it has a global view of the state of the overall network resources. A number of open source and proprietary network controllers are available.

4. Northbound interface: This interface provides the means for applications and high-level services to access the services provided by the network controller. In general, the vendors of network controllers prefer to differentiate their service offerings, and so the northbound interface has not been open or standardized. However, the OpenDaylight project (discussed in the following text) has been working on the development of open northbound interfaces.

5. Applications and high-level services: These use the services of the network controller to provide more complex functionality. For example, an application could provide the virtual machines and virtual network to support a virtual tenant. The application would invoke the services of the network controller in creating and managing its virtual networks. Other example applications could orchestrate or chain multiple services together to provide security, load balancing, caching, and so on.

SDN provides the foundation for flexible and customizable networking. The network controller provides the capability to define specific treatments for given traffic flows by installing rules in the network forwarding devices. The centralization of control in the network controller and the northbound interface allow novel applications to define the operation of the network in software. This also opens the way for faster and adaptive configuration of the network.

In the context of this chapter, data center SDN provides the means for supporting network virtualization and automated migration of VMs. It also provides the means to achieve bandwidth optimization, as well as higher utilization of servers and higher energy efficiency. Across data centers, SDN VN capabilities can support rapid provisioning and migration of cloud services across private and public clouds in support of large-scale, geographically distributed applications.

We note that the layered view in Figure 6.3 is limited in scope to the networking infrastructure, and the application level provides an indication of the broader cloud computing context within which networking must take place. In Sections 6.4 and 6.5, we will see how SDN fits within this context in general and within OpenStack in particular.

6.3.2 OpenFlow

OpenFlow was originally presented as an approach to allow experimentation with new network protocols on campus networks [3], although its roots are in the Ethane project to enable highly flexible and secure enterprise networking [6]. OpenFlow separates network control and packet forwarding, a concept first widely deployed in MPLS [7]. It has two basic elements, as shown in Figure 6.4: (1) defining packet forwarding behavior by allowing a controller to set flow tables that associate an action with each flow in an OpenFlow switch, and (2) defining an open secure protocol that enables a network controller to exchange commands and packets with an OpenFlow switch. This is essentially the SDN southbound interface in Figure 6.3. The notion of an open management interface was introduced in Ref. [9].

The Open Networking Foundation was formed to provide specifications to cover the components and basic functions of the OpenFlow switch and the OpenFlow protocol to manage the switch from a remote controller [8]. OpenFlow Switch Specification 1.0.0 was released in December 2009. The latest release, 1.4.0, was published in October 2013. In the following, we summarize the OpenFlow specification as described in Ref. [8].

Figure 6.4. OpenFlow switch and controller (from Ref. [9]).

TABLE 6.1. Components of a flow entry in a flow table

Match Fields | Priority | Counters | Instructions | Timeouts | Cookie

The OpenFlow switch consists of one or more flow tables and a group table that are used to perform packet lookups and forwarding. The OpenFlow protocol enables the controller to manage the OpenFlow switch. The controller can add, update, and delete flow entries in the flow tables. As shown in Table 6.1, each entry consists of match fields, counters, and instructions that are applied to matching packets. The match fields consist of the ingress port and packet headers, and possibly metadata specified by previous flow tables. The required match fields in OpenFlow version 1.4.0 are ingress port; Ethernet destination and source addresses with arbitrary bitmasks; Ethernet type; IPv4 and IPv6 protocol number; IPv4 source and destination addresses with subnet masks or arbitrary bitmasks; IPv6 source and destination addresses with subnet masks or arbitrary bitmasks; and TCP and UDP source and destination port numbers. Additional optional match fields include the switch physical input port, metadata between tables, VLAN ID and priority, IP DSCP and ECN, SCTP, ICMP, ARP, and MPLS and Ethernet PBB fields. Altogether, the match fields enable a very rich set of packet classifications spanning from the physical port through layers 2–4.

Figure 6.5. Matching of packets against tables (from Ref. [9]).

Within each table, entries are matched to packets in priority order, so that the first matching entry is used. As shown in Figure 6.5, an arriving packet is first matched against the first table, and if an entry is matched, the instructions in the entry are performed. If a packet is not matched, then the table-miss flow entry is consulted to determine the appropriate action. This could include dropping the packet, forwarding it to the controller using the OpenFlow channel, or continuing to the next table.
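To make the match abstraction concrete, the following minimal sketch constructs a multi-field match of the kind described above. It assumes the Ryu controller framework, a Python OpenFlow library chosen here purely for illustration; any OpenFlow 1.3+ library would look similar.

```python
# A minimal sketch, assuming the Ryu OpenFlow library (an illustrative
# choice, not part of the OpenFlow specification itself). OFPMatch
# takes the match fields described above as keyword arguments.
from ryu.ofproto import ofproto_v1_3_parser as parser

# Match HTTP traffic from one subnet to a specific server: ingress
# port, Ethernet type (IPv4), masked IPv4 source, exact IPv4
# destination, IP protocol (TCP), and TCP destination port.
match = parser.OFPMatch(
    in_port=1,
    eth_type=0x0800,                          # IPv4
    ipv4_src=('10.0.1.0', '255.255.255.0'),   # subnet (masked) match
    ipv4_dst='10.0.2.10',
    ip_proto=6,                               # TCP
    tcp_dst=80,
)
print(match)
```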

The instructions in a flow entry can include actions such as packet forwarding, packet modification, and group table processing. The instructions may also modify the pipeline processing by directing packets, and associated metadata, to subsequent tables for additional processing. An arriving packet begins with an empty action set, and this set is updated each time a match to a table entry is identified. The table pipeline processing ends when a table entry does not specify another table. At this point, the action set is executed.
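As a sketch of how a controller might program this pipeline (again assuming the Ryu library, with `datapath` standing for a handle to a connected OpenFlow 1.3 switch), the function below installs a table-miss entry that sends unmatched packets to the controller, and a higher-priority entry whose instruction directs Web traffic to the next table:

```python
def install_example_flows(datapath):
    """Sketch: program a switch with a table-miss entry and a
    goto-table entry. `datapath` is assumed to be a Ryu handle for a
    connected OpenFlow 1.3 switch."""
    ofp = datapath.ofproto
    parser = datapath.ofproto_parser

    # Table-miss entry: priority 0 matches everything; unmatched
    # packets are sent to the controller in a Packet-In message.
    miss_match = parser.OFPMatch()
    miss_actions = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER,
                                           ofp.OFPCML_NO_BUFFER)]
    miss_inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                              miss_actions)]
    datapath.send_msg(parser.OFPFlowMod(datapath=datapath, table_id=0,
                                        priority=0, match=miss_match,
                                        instructions=miss_inst))

    # Higher-priority entry: TCP port 80 traffic continues to table 1,
    # extending the pipeline as described above.
    web_match = parser.OFPMatch(eth_type=0x0800, ip_proto=6, tcp_dst=80)
    web_inst = [parser.OFPInstructionGotoTable(1)]
    datapath.send_msg(parser.OFPFlowMod(datapath=datapath, table_id=0,
                                        priority=100, match=web_match,
                                        instructions=web_inst))
```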

In OpenFlow, a port is the network interface where packets pass between OpenFlow processing and the rest of the network. Packets that arrive on an ingress port are processed by the OpenFlow pipeline and may be forwarded to an output port. OpenFlow ports can be physical ports (e.g., an Ethernet interface), logical ports that do not correspond to a hardware interface (e.g., a tunnel), or reserved ports that specify generic forwarding actions (e.g., send to controller, or flooding).

The group table in Figure 6.4 contains group entries, and each entry contains a list of action buckets that are applied to packets sent to the group. The group abstraction allows flow entries to point to common output actions in a switch. For example, a group of type "all" enables the controller to implement flooding.

There are three ways to remove flow entries from tables. First, the controller can request removal of an entry. Second, the switch has a flow-expiry mechanism that removes an entry after either a hard timeout expires or the entry has not been matched for some specified period of time. Third, a switch may evict table entries when it needs to recover resources, when eviction is enabled. If a flag is set, the switch is required to notify the controller that the entry has been removed.

TABLE 6.2. Meter entry in a meter table (top) and meter band in a meter entry (bottom)

Meter Identifier | Meter Bands | Counters

Band Type | Rate | Counters | Type-Specific Arguments

OpenFlow uses per-flow meters to implement QoS. A meter is a switch element that is used to measure and control the rate of packets. A flow entry can specify a meter in its instruction set. A meter measures and controls the aggregate rate of all flow entries to which it is attached. A meter table consists of meter entries as in Table 6.2. Each meter has a 32-bit identifier. The meter counter is updated each time a packet is processed by a meter. The meter bands specify rate bands and corresponding packet processing. The meter applies the meter band with the highest configured rate that is lower than the current measured packet rate. For example, rate limiting can be applied if the band type is "drop."
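A hedged sketch of meter configuration follows, once more assuming the Ryu library and a connected switch handle `datapath`; the meter identifier and the rates are arbitrary illustration values:

```python
def install_rate_limit(datapath):
    """Sketch: create a drop-type meter and attach a flow entry to it,
    rate-limiting the matched traffic to roughly 1 Mbps."""
    ofp = datapath.ofproto
    parser = datapath.ofproto_parser

    # Meter 1 with a single "drop" band: traffic above 1000 kbps is
    # dropped, which realizes rate limiting as described above.
    band = parser.OFPMeterBandDrop(rate=1000, burst_size=100)
    datapath.send_msg(parser.OFPMeterMod(datapath=datapath,
                                         command=ofp.OFPMC_ADD,
                                         flags=ofp.OFPMF_KBPS,
                                         meter_id=1, bands=[band]))

    # Flow entry whose instruction set points at meter 1 before
    # applying the forwarding action.
    match = parser.OFPMatch(eth_type=0x0800, ipv4_src='10.0.1.5')
    inst = [parser.OFPInstructionMeter(1),
            parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                         [parser.OFPActionOutput(2)])]
    datapath.send_msg(parser.OFPFlowMod(datapath=datapath, priority=10,
                                        match=match, instructions=inst))
```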

OpenFlow provides required and optional counters associated with flow tables, flow entries, ports, queues, groups, group buckets, meters, and meter bands. Counters are unsigned integers that may measure counts, such as bytes or packets, and durations, such as seconds or nanoseconds. In combination, these counters can measure rates.

The OpenFlow channel in Figure 6.4 is the interface that allows the controller to configure and manage the switch, to send packets out of the switch, and to receive notifications from the switch. The channel usually operates over TCP and uses transport layer security (TLS).

The OpenFlow protocol has three message types. The controller uses controller-to-switch messages to manage and monitor the switch state. For example, the Modify-State message is used to add, delete, and modify entries in the flow and group tables, and the Packet-Out message is used by the controller to send packets out of the switch. The switch uses asynchronous messages to update the controller. Thus, the Packet-In message is used to transfer control of a packet to the controller, for example, after a table-miss event. The Flow-Removed message is used to notify the controller that a flow entry has been removed. Symmetric messages are used by the switch or by the controller without solicitation, for example, to start the switch-controller connection ("Hello") or to monitor liveness of the connection ("Echo").
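A minimal controller application makes these message types concrete. The sketch below, again assuming the Ryu framework, handles asynchronous Packet-In messages and answers each with a Packet-Out that floods the packet:

```python
# Sketch: a Ryu application (assumed for illustration) that reacts to
# Packet-In messages with a flooding Packet-Out.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

class FloodOnPacketIn(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        msg = ev.msg                       # the asynchronous Packet-In
        datapath = msg.datapath
        ofp = datapath.ofproto
        parser = datapath.ofproto_parser

        # Include the raw packet only if the switch did not buffer it.
        data = msg.data if msg.buffer_id == ofp.OFP_NO_BUFFER else None
        out = parser.OFPPacketOut(
            datapath=datapath,
            buffer_id=msg.buffer_id,
            in_port=msg.match['in_port'],
            actions=[parser.OFPActionOutput(ofp.OFPP_FLOOD)],
            data=data)
        datapath.send_msg(out)             # controller-to-switch reply
```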

Altogether, the OpenFlow specifications allow the customization of the forwarding and treatment of classified traffic flows across the network. Indeed, OpenFlow can exercise tight control over which packets are admitted into the network. For example, table entries can be configured so that packets can traverse the network only after associated table entries have been established by the controller. Any flow without such entries is forwarded by the switch to the controller after a table miss, and the controller then decides whether to accept the flow.

The interest in OpenFlow has led to the availability of switches and routers that support the specification. OpenFlow is influencing the design of packet-processing chips with advanced parsing and classification capabilities. OpenFlow has also influenced the development of software-based switches. We will discuss these switches after introducing OpenStack cloud computing.


6.4 OVERVIEW OF CLOUD COMPUTING AND OPENSTACK

Cloud computing is a computational approach in which software is hosted in large data centers and where software is provided as a service [1]. The key technology in cloud computing is the VM, which provides an abstraction of a physical host machine as shown in Figure 6.6. The VM is enabled by the introduction of a hypervisor that intercepts the instructions between the OS and hardware and manages the sharing of the hardware among multiple VMs. Cloud computing provides a computing utility that gives the illusion of infinite resources through the on-demand sharing of computing resources. A major advantage of cloud computing is its flexible billing model, which provides access to computing without upfront cost. Cloud computing has revolutionized the delivery of applications, and its tremendous potential impact has stimulated the development of an open source platform.

OpenStack is a project developing an open source cloud computing platform to provide infrastructure as a service (IaaS). OpenStack offers a set of interrelated services, each through an application programming interface (API) [10]:

• Dashboard (Horizon project): A Web-based portal to interact with OpenStack services, such as launching an instance, assigning IP addresses, and configuring access control.

• Compute (Nova project): Manages the lifecycle of compute instances: spawning, scheduling, and decommissioning of VMs on demand.

• Networking (Neutron project): Enables network connectivity for other OpenStack services, such as OpenStack Compute. Provides an API for users to define networks and attachments. Supports plug-ins for networking vendors and technologies.

• Object Storage (Swift project): Stores and retrieves arbitrary unstructured data objects via a RESTful, HTTP-based API.

Figure 6.6. Virtual machines.

• Block Storage (Cinder project): Provides persistent block storage to running instances.

• Identity service (Keystone project): Provides an authentication and authorization service for other OpenStack services.

• Image Service (Glance project): Stores and retrieves VM disk images; it is used by OpenStack Compute during instance provisioning.

• Telemetry (Ceilometer project): Monitors and meters the OpenStack cloud for billing, benchmarking, scalability, and statistical purposes.

• Orchestration (Heat project): Orchestrates multiple composite cloud applications.
• Database Service (Trove project): Provides scalable and reliable cloud database-as-a-service functionality for both relational and nonrelational database engines.

Figure 6.7 shows the conceptual architecture of the OpenStack projects. End users can interact with OpenStack through the dashboard, command-line interfaces, and APIs.

Figure 6.7. Conceptual architecture of OpenStack (from Ref. [10]).

The Identity service (Keystone) is used to authenticate services. The individual services interact through public APIs.

We focus on the Nova compute service and the Neutron networking service. Nova and Neutron use a message queue as a central hub for passing messages. A message-oriented middleware, such as RabbitMQ, is used. Asynchronous calls are used for request-response, and a callback is triggered once a response is received. The nova-api process provides an interface for interaction with the cloud infrastructure. It supports the OpenStack Compute API as well as the Amazon EC2 API. The API server communicates with the relevant components through the message queue. The nova-scheduler process takes a VM instance request and decides which compute server it should run on. The nova-compute process (compute worker) deals with the instance management life cycle; it creates and terminates VM instances through hypervisor APIs. The nova-compute process requests networking tasks from the Neutron networking service.
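As an illustration of the tenant-facing side of this machinery, the sketch below boots a VM through the Compute API using the python-novaclient library; the credentials, endpoint, image, and flavor names are hypothetical placeholders.

```python
# Sketch: launch a VM instance through the Nova API using
# python-novaclient. All credentials and resource names below are
# hypothetical placeholders for a test deployment.
from novaclient import client

nova = client.Client('2',                        # Compute API version
                     'demo_user', 'demo_pass',   # username, password
                     'demo_tenant',              # tenant/project name
                     'http://controller:5000/v2.0')  # Keystone URL

image = nova.images.find(name='cirros-0.3.2')
flavor = nova.flavors.find(name='m1.small')

# nova-api receives the request, nova-scheduler picks a compute host,
# and nova-compute boots the VM through the hypervisor API.
server = nova.servers.create(name='app-server-1',
                             image=image, flavor=flavor)
print(server.id)
```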

Nova compute is designed for use by multiple tenants in a shared system. Each tenant has an individual VN, as well as its own volumes, instances, images, keys, and users. A user can specify the tenant by a tenant ID. Tenant resource limits are set by quotas on the number of volumes that may be launched; the number of processor cores and the amount of RAM that can be allocated; floating IP addresses assigned to any instance; and fixed IP addresses assigned to the same instance when it launches.

The Neutron networking service provides a VN service with connectivity between interface devices managed by OpenStack services, typically Compute. Just as the Nova Compute API provides a virtual server abstraction, the Neutron API provides a VN abstraction that allows a user to create and attach interfaces to networks. The Neutron server accepts API requests and directs them to the appropriate OpenStack plug-in and agents, which plug and unplug ports, create networks and subnets, and provide IP addressing. OpenStack networking has a plug-in architecture that allows it to support a variety of vendor and networking technologies.

In Neutron networking, three types of network resources are identified:

• Network: An isolated layer 2 segment, analogous to a VLAN in a physical network.
• Subnet: A block of IPv4 or IPv6 addresses and associated configuration state.
• Port: A connection point for attaching a single device, for example, a NIC for a virtual server, to a VN. Includes the associated network configuration, for example, the associated MAC and IP addresses.

Users access Neutron networking to configure network topologies and to then instruct the other OpenStack services to attach virtual devices to these networks. Tenants can create their own private networks with their own IP addressing schemes.
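The following sketch exercises the Network, Subnet, and Port abstractions through the python-neutronclient library; all credentials and names are hypothetical placeholders.

```python
# Sketch: create a tenant network, a subnet, and a port using
# python-neutronclient. Credentials and names are hypothetical.
from neutronclient.v2_0 import client

neutron = client.Client(username='demo_user', password='demo_pass',
                        tenant_name='demo_tenant',
                        auth_url='http://controller:5000/v2.0')

# Network: an isolated layer 2 segment.
net = neutron.create_network({'network': {'name': 'private-net'}})
net_id = net['network']['id']

# Subnet: a block of IP addresses on that network, with a
# tenant-chosen addressing scheme.
neutron.create_subnet({'subnet': {'network_id': net_id,
                                  'ip_version': 4,
                                  'cidr': '10.10.0.0/24'}})

# Port: a connection point (e.g., a VM NIC) attached to the network.
port = neutron.create_port({'port': {'network_id': net_id,
                                     'name': 'vm1-nic0'}})
```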

The typical data center deployment, shown in Figure 6.8, includes a cloud controller host, a network gateway host, and a number of hypervisors for hosting VMs. The deployment includes the following physical networks:

• Management Network: Internal communication between OpenStack components.
• Data Network: Inter-VM communications; IP addressing as per the plug-in used.

• External Network: VM access to the Internet; a user on the Internet can reach IP addresses in the external network.
• API Network: Exposes OpenStack APIs to tenants; IP addresses are reachable from the Internet.

Figure 6.8. Typical physical data center networks in OpenStack (from Ref. [10]).

Figure 6.9 shows the possible tenant and provider networks. Tenant networks provide connectivity and need to be isolated from other tenants. Neutron networking supports several tenant network types:

• Flat: All instances are on the same network, which can also be shared with the hosts. There is no VLAN tagging or other network segregation.

• Local: All instances reside on the local compute host and are isolated from external networks.

• VLAN: Users create multiple provider or tenant networks using VLAN IDs (802.1Q tagged) that correspond to VLANs present in the physical network. Instances can communicate with each other, as well as with dedicated servers, firewalls, load balancers, and other networking infrastructure on the same layer 2 VLAN.

• VXLAN (Virtual Extensible LAN) and GRE (Generic Routing Encapsulation): These network overlays support private communication between instances.

Provider networks use an existing physical network in the data center. These networks may be shared among tenants. By allowing tenant networks to select their own IP addresses, it becomes possible to migrate applications between user data centers and public data centers as required by demand or fault conditions.

Figure 6.9. Tenant and provider networks (from Ref. [10]).

6.5 SDN FOR CLOUD COMPUTING

Figure 6.9 shows that VMs require a new layer of access network inside the compute node. As shown in Figure 6.10, each VM now has one or more virtual interfaces (VIFs) that connect it to virtual switches, which are in turn connected to physical interfaces (PIFs). Network connectivity is now required to interconnect VMs in the same host as well as in other hosts. Linux bridging is available to provide this connectivity, but it does not adequately meet the new requirements that arise with VMs [11]. The migration of applications imposes new network mobility requirements. The deployment of tens of VMs in a host and hundreds of thousands of VMs in a data center poses new scalability challenges. The sharing of computing resources among multiple tenants introduces new security risks and heightened requirements for isolation. The combination of these requirements can be met by extending SDN into the networks that connect VMs, as done, for example, by Open vSwitch.

6.5.1 Open vSwitch

The Open vSwitch (OVS) is a software-based virtual switch that provides intra- and inter-VM connectivity while also providing an external interface for the control of configuration state and forwarding behavior [11]. This allows OpenFlow capabilities for fine-grained control of flows to be leveraged and integrated across a multilayer network for connecting VMs. Support can then be provided for QoS, tunneling, and filtering, which in turn can be used to provide isolation, security, and network mobility.

Figure 6.10. Virtual interfaces connect VMs to virtual switches and to physical interfaces (from Ref. [11]).

Figure 6.11. Open vSwitch (from Ref. [11]).

The virtualization environment that supports VMs can provide the new virtual switches with useful information, for example, the MAC addresses of the virtual interfaces, the movement of VMs, and the joining of multicast groups. Better coordination between the virtual computing and networking is possible because the same information that is used by the hypervisor and management layer to power VMs on and off, to migrate hosts, and to control access can also be used to manage the VN.

Open vSwitch uses Ethernet switching for VM networking with VLANs, RSPAN (Remote Switched Port Analyzer) to analyze traffic, and access control lists. It also provides port bonding, GRE and IPsec tunneling, and per-VM policing.

Figure 6.11 shows the architecture of Open vSwitch. The Open vSwitch software resides within the hypervisor. The switch has a "fast path" module in the kernel that implements the speed-critical forwarding engine and the counters for table entries. The switch has a "slow path" in user space that implements the forwarding logic and the remote visibility and configuration interfaces, including OpenFlow. Thus, Open vSwitch allows the forwarding path to be manipulated by writing to the forwarding table and specifying how packets are handled (forwarded, dropped, encapsulated) based on header fields.


The Open vSwitch local management interface allows the virtualization layer to manage the creation of switches and the connectivity of virtual and physical interfaces. The rule-based forwarding provided by the flow tables allows network configuration state and forwarding to be associated with specific flows, for example, a VM or group of VMs. This enables a global management process not only to have visibility of local state in the virtual switch but also to migrate the associated network configuration state corresponding to a group of VMs that is moved between servers.

In multitenant settings, it is desirable for VMs from different tenants to share the same physical server while providing strong isolation. On the other hand, it is also necessary to provide connectivity between VMs that belong to the same tenant but reside in different hosts. Open vSwitch provides the capability to create virtual private networks that connect VMs from the same tenant while providing isolation from other tenants. In small-scale deployments, Open vSwitch allows a tenant to be assigned a VLAN ID.

Connectivity in larger scale deployments is handled by Open vSwitch through the use of GRE tunnels. In GRE, an Ethernet frame is encapsulated inside an IP datagram, which is routed from the originating subnet to the destination subnet [12]. A GRE tunnel is established between any two servers that have a VM belonging to the same tenant. The MAC-to-IP mapping required for the tunnel is downloaded into the Open vSwitch table entries using OpenFlow. This approach has the advantage that no state concerning the tunnels needs to be maintained in the physical network. OpenFlow does not have a tunnel-provisioning message, so the Open vSwitch Database Management Protocol (OVSDB) was developed to construct the mesh of GRE tunnels between servers that have VMs from the same tenant.
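A rough sketch of this mechanism is shown below: it drives a local Open vSwitch instance through the ovs-vsctl tool (invoked from Python) to create one GRE tunnel port per remote hypervisor. The bridge name and peer addresses are placeholders, and a production system would program the tunnels through OVSDB rather than shelling out.

```python
# Sketch: build a mesh of GRE tunnel ports on a local OVS bridge, one
# per remote server hosting a VM of the same tenant. The bridge name
# and peer IPs are hypothetical; ovs-vsctl must be installed and run
# with sufficient privileges.
import subprocess

def ensure_gre_mesh(bridge, peer_ips):
    subprocess.run(['ovs-vsctl', '--may-exist', 'add-br', bridge],
                   check=True)
    for i, ip in enumerate(peer_ips):
        port = 'gre%d' % i
        # Each port encapsulates tenant Ethernet frames in IP toward
        # one remote hypervisor (cf. the MAC-to-IP mapping above).
        subprocess.run(['ovs-vsctl', '--may-exist', 'add-port', bridge,
                        port, '--', 'set', 'interface', port,
                        'type=gre', 'options:remote_ip=%s' % ip],
                       check=True)

ensure_gre_mesh('br-tun', ['192.168.0.11', '192.168.0.12'])
```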

VXLAN is an alternative tunneling method to GRE. VXLAN creates tunnels by encapsulating Ethernet frames on top of UDP and IP. The approach provides 24-bit tags to overcome the VLAN scale limitations. The reader should refer to Ref. [12] for more discussion of SDN networking issues for cloud computing.

6.5.2 Meeting Networking Requirements

We have seen that SDN provides many powerful techniques for realizing network virtualization: (i) resource allocation and bandwidth provisioning, (ii) resource isolation and addressing, and (iii) support for tenant-specific communication and routing protocols. We have seen that SDN-enabled components such as Open vSwitch (OVS) support the creation of virtual switches and interfaces within VM hypervisors, or in OVS-enabled switches. Meanwhile, SDN frameworks such as OpenFlow provide simple and efficient means to provision virtual links.

As for resource isolation, OpenFlow includes the capability to limit bandwidth usage, and there are numerous proposals [13] on achieving rate limiting at different levels, including flow, ingress, and slice limiting. Thus, we anticipate that future OpenFlow-enabled switches will have the capability to provide guaranteed bandwidth for individual VNs. Even though address isolation is often implemented using tunneling, recent proposals use address translation supported by OpenFlow to achieve address isolation. Finally, supporting tenant-specific routing protocols can be achieved using a variety of software components, for example, using FlowVisor.
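One concrete mechanism along these lines is the per-interface ingress policer that Open vSwitch already exposes; a minimal sketch follows, in which the interface name and the rates are placeholders.

```python
# Sketch: apply ingress policing (rate limiting) to a VM's virtual
# interface via ovs-vsctl. The interface name and the rates (in kbps)
# are hypothetical placeholders.
import subprocess

def police_interface(ifname, rate_kbps, burst_kbps):
    subprocess.run(['ovs-vsctl', 'set', 'interface', ifname,
                    'ingress_policing_rate=%d' % rate_kbps],
                   check=True)
    subprocess.run(['ovs-vsctl', 'set', 'interface', ifname,
                    'ingress_policing_burst=%d' % burst_kbps],
                   check=True)

police_interface('tap-vm1', 1000, 100)  # ~1 Mbps with a 100 kb burst
```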


6.5.3 Inter-Data Center Networking

New challenges arise when a tenant network is deployed or migrated across different data centers [14, 15]. The VN needs to handle the addressing schemes and forwarding fabric of the data centers. The connectivity between the data centers may be shared with the public Internet, and some means of allocating resources for the tenant networks is required. In addition, the live migration of VMs can impose special performance requirements on the inter-data center network.

An SDN approach to providing inter-data center connectivity is attractive for several reasons. First, it allows the extension of the network abstraction that is already in use in the individual data center. Second, the isolation techniques from intra-data center SDN can be extended. Third, the management approach of OpenFlow can be applied. An SDN approach is proposed in Ref. [15] for providing VNs on demand on loosely coupled data centers. The approach involves dynamically organizing Virtual Private LAN Service (VPLS) paths to extend VNs across data centers. In Ref. [16], an SDN abstraction and API are used to extend an OpenStack VN into the WAN. This approach improves over IPsec and SSL VPNs by building on WAN services that can support QoS. The application of SDN in optical transport networks has also begun to receive attention. For example, Ref. [17] presents an Open Transport Switch for bursting data between data centers using optical transport networks.

6.6 COMBINING OPENFLOW AND OPENSTACK WITH OPENDAYLIGHT

Given the large number of technologies that can implement network virtualization at various levels, it becomes increasingly important to design frameworks in the management plane to ensure that these technologies work together seamlessly while achieving consistency, efficiency, performance, reliability, and security in management. These frameworks need to provide various functionalities, including monitoring, scheduling, resource allocation, dynamic adaptation, and policy enforcement. For example, OpenDaylight [18] is a framework that provides functionalities for managing VNs in the context of SDN.

6.6.1 OpenDaylight Overview

OpenDaylight is an open-source project that is developing a modular, pluggable, and flexible controller platform. The controller exposes open northbound APIs to applications. These applications can use the controller to gather network intelligence, perform analytics, and then orchestrate new rules using the controller.

As shown in Figure 6.12, the controller platform consists of dynamically pluggable modules that perform the required network tasks. Base network services address basic management functions of network devices. The Topology manager builds the network topology, and the Stats manager collects statistics. The Switch manager handles southbound device information, and the Forwarding Rules manager (FRM) installs flows on southbound devices. The Host tracker tracks connected hosts, and the ARP handler handles ARP messages. Other network services can be added to the controller platform.

Figure 6.12. OpenDaylight architecture (from Ref. [18]). VTN, virtual tenant network; oDMC, Open DOVE management console; D4A, Defense4All protection; LISP, locator/identifier separation protocol; OVSDB, Open vSwitch database protocol; BGP, border gateway protocol; PCEP, path computation element communication protocol; SNMP, simple network management protocol.

Figure 6.13. Virtual tenant network architecture (from Ref. [18]).

Figure 6.12 shows the Virtualization edition of OpenDaylight, which targets data centers. This edition includes the OVSDB protocol southbound to configure Open vSwitches in VNs. In particular, the Neutron bundle of the Virtualization edition supports VXLAN and GRE tunnels for OpenStack and CloudStack deployments.

The Virtualization edition also supports the Virtual Tenant Network (VTN) service. VTN provides multitenant VNs on an SDN controller. VTN allows users to design and deploy a network without requiring knowledge of the physical network; VTN maps the desired network onto the underlying physical network. Figure 6.13 shows the architecture of the VTN application. The VTN coordinator is an application that allows a user to work with the VTN virtualization; the coordinator interacts with one or more VTN Managers to implement the user configuration.

Figures 6.7 and 6.12 in combination show how cloud computing and SDN interact in the deployment of virtual computing and networking resources. In Figure 6.7, the user may initiate the deployment of an application that requires support from a set of VMs with connectivity requirements. Figure 6.12 shows how the Neutron networking service in OpenStack can invoke the services of OpenFlow to provide the desired network connectivity.

In a typical scenario, a service provider (i.e., a tenant) submits a virtual infrastructure request. The request describes the topology of the virtual infrastructure and provides the resource requirement of each virtual node (i.e., VMs, virtual switches, routers, and firewalls) as well as the bandwidth requirement of each virtual link. This request is sent to a scheduler that makes decisions regarding how each virtual node and link is mapped to physical resources.

Figure 6.14. Architecture for managing virtual infrastructures.

The scheduling of virtual infrastructures is also known as the VN embedding problem [19], whose goal is to improve the acceptance rate of virtual infrastructure requests while minimizing operational costs such as energy. Once the scheduling decision is made, the scheduler allocates the appropriate physical resources accordingly. This is done by scheduling VMs and other virtual resources, creating the appropriate virtual switches and routers, and installing forwarding policies in OpenFlow-enabled switches.

In Figure 6.14, we use OpenDaylight as an example to illustrate this process. A tenant submits its virtual infrastructure request to OpenStack, which, in turn, uses Nova to schedule the corresponding VMs in the data center. It also delegates the scheduling of the VN to OpenDaylight. To do so, the scheduling request is first sent to the OpenDaylight manager through its REST API. OpenDaylight solves the VN embedding problem and contacts the underlying components, such as OpenFlow controllers and the Open vSwitch database (OVSDB), to create VN components and install forwarding rules in OpenFlow switches. Once the VN is created, the VN topology information is stored in the Topology Manager, and information about the VN components (e.g., virtual switches) is stored in the Switch Manager. Once the virtual infrastructure is scheduled, the Stats Manager continuously monitors the status of the virtual infrastructure through the service abstraction layer. Based on the operating conditions, the allocation of each virtual infrastructure may need to be changed over time. For example, the tenant may want to scale the virtual infrastructure up or down at run time to cope with demand fluctuation.
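For illustration, the topology stored by the controller can be read back over its northbound REST interface. The sketch below queries an OpenDaylight instance for its operational topology; the address and the default admin/admin credentials are assumptions for a test deployment.

```python
# Sketch: read the operational network topology from an OpenDaylight
# controller over its RESTCONF northbound API. The address and the
# default admin/admin credentials are assumptions for a test setup.
import requests

ODL = 'http://localhost:8181'
url = ODL + '/restconf/operational/network-topology:network-topology'

resp = requests.get(url, auth=('admin', 'admin'),
                    headers={'Accept': 'application/json'})
resp.raise_for_status()

# Print the topology identifiers and the nodes (e.g., virtual
# switches) known to the controller.
for topo in resp.json()['network-topology']['topology']:
    print('topology:', topo.get('topology-id'))
    for node in topo.get('node', []):
        print('  node:', node.get('node-id'))
```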


6.7 SOFTWARE-DEFINED INFRASTRUCTURES

Throughout this chapter, we have assumed that the resources consist of computing and networking resources. However, in slightly different contexts, for example, in virtualizing the wireless access networks in Figure 6.14, additional resources could include programmable hardware and other sharable high-performance resources. In the SAVI project [20, 21], we address the more general setting where the Software-Defined Infrastructure (SDI) includes heterogeneous virtualized resources that are managed in an integrated fashion along with computing and networking. In addition to cloud and network controllers, these SDIs may require controllers for the additional resources, for example, programmable hardware resources [22].

Figure 6.15 shows the architecture of the SAVI SDI resource management system (RMS). The SDI manager has overall control of resources of different types, for example, A, B, and C. External entities request virtual resources from the SDI RMS through open interfaces. The SDI RMS executes coordinated and integrated resource management for the heterogeneous resources through an SDI manager and a topology manager. The SDI manager performs its management functions based on the resource information provided by the topology manager. Resource-specific controllers (e.g., OpenStack or OpenFlow controllers) are responsible for managing resources of a given type. Each resource controller accepts high-level user descriptions and manages the resources of a given type. The topology manager maintains a global view of the resources and their relationships, as well as monitoring and measurement data. It enables the SDI manager to perform state-aware resource management.

SAVI is exploring the deployment of applications in a multitier cloud that includes massive core data centers, smart edge nodes, and access networks. SAVI has designed a node cluster that provides virtualized and physical computing and networking resources, including heterogeneous resources such as Intel Xeon servers, storage, OpenFlow switches, GPUs, NetFPGAs, Altera DE5-Net FPGAs, and ATOM servers. SAVI has implemented the Janus SDI Resource Management System to manage the heterogeneous resources provided by a SAVI node. Janus builds on top of OpenStack and OpenFlow. A Canadian test bed has been deployed with nodes at the following universities: Victoria (British Columbia); Calgary and Alberta; Carleton, Toronto, York, and Waterloo in Ontario; and McGill in Quebec. The SAVI test bed is supporting research on large-scale applications, multitier cloud computing, the architecture of the smart edge, virtualized wireless access, and the management of SDI.

Figure 6.15. SAVI SDI resource management system.

6.8 RESEARCH TRENDS AND CHALLENGES

We began this chapter with a discussion of application platforms to provide a holistic view of the broad range of requirements that must be met by future SDN and computing clouds. While great progress has been made in advancing SDN and clouds, here we reiterate several major challenges that remain to be addressed: orchestration, adaptive resource management, content distribution, and scalability.

Methods for the orchestration of the resources to support distributed applications are in a relatively early stage of development. Methods are needed for the automated determination and allocation of the computing and networking resources for applications. The Heat project in OpenStack is striving to meet this need by developing an orchestration engine for launching cloud applications [23]. A Heat template is used to describe the infrastructure resources required by an application. The Network Functions Virtualization (NFV) concept is being developed to virtualize network node functions that can serve as building blocks to create communication services [24]. Clearly, orchestration is a key element in NFV.
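As a small illustration of this style of orchestration, the sketch below submits a one-server Heat template through the python-heatclient library; the endpoint, token, image, and flavor are hypothetical placeholders.

```python
# Sketch: launch a stack from a minimal Heat template using
# python-heatclient. The endpoint, token, image, and flavor below are
# hypothetical placeholders for a test deployment.
from heatclient.client import Client

template = '''
heat_template_version: 2013-05-23
description: Minimal one-server application stack
resources:
  app_server:
    type: OS::Nova::Server
    properties:
      image: cirros-0.3.2
      flavor: m1.small
'''

heat = Client('1', endpoint='http://controller:8004/v1/TENANT_ID',
              token='AUTH_TOKEN')

# The orchestration engine parses the template and provisions the
# declared resources through the other OpenStack services.
heat.stacks.create(stack_name='demo-stack', template=template)
```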

The automated scaling of resources allocated to support an application is essential to achieving the economies of scale that derive from cloud computing and virtualization. Methods are required for the measurement and monitoring of demand and available resources and for the autoscaling of resources. To be put into practice, the rich literature on adaptive resource management requires a platform for measurement, monitoring, and automated resource management. The Ceilometer project in OpenStack is developing an infrastructure to collect measurements within OpenStack to support monitoring and metering [25]. The SAVI project discussed above is exploring the use of Ceilometer in converged virtualized computing and networking infrastructures.

The collection and distribution of content represents a major driver of current IT infrastructure. The growth in video services and the emergence of Big Data applications necessitate an exploration of the virtualization and management of storage resources. The churn in demand for specific content requires striking a balance between content that is stored remotely in a few sites and content stored broadly in local sites. Various information-centric architectures need to be explored in the context of the multitier cloud infrastructures that are emerging to support application platforms. The huge volumes of content that need to be transferred also motivate the investigation of optical transport technologies in these new architectures.


The scalability of management systems will be challenged by the continuous growth in application platforms and associated resources. The volume of messaging consumed by management and data collection must be kept to reasonable levels while providing the responsiveness, effectiveness, and reliability required of the management system. This requires further research in management system architectures.

6.9 CONCLUDING REMARKS

The potential benefits of service-oriented computing and the virtualization of resources have spurred intense activity in the advancement of cloud computing and SDN. In this chapter, we have provided an integrated view of how cloud computing and SDN, and specifically OpenFlow, OpenStack, Open vSwitch, and OpenDaylight, come together. We have also introduced the SAVI project, which explores the notion of SDI that encompasses both cloud computing and SDN to support large-scale applications.

REFERENCES

1. Fox, Armando, et al. "Above the clouds: A Berkeley view of cloud computing." Department of Electrical Engineering and Computer Science, University of California, Berkeley, Report UCB/EECS 28 (2009).

2. Anderson, Thomas, et al. "Overcoming the Internet impasse through virtualization." Computer 38.4 (2005): 34–41.

3. McKeown, Nick, et al. "OpenFlow: Enabling innovation in campus networks." ACM SIGCOMM Computer Communication Review 38.2 (2008): 69–74.

4. Shenker, Scott, et al. "The future of networking, and the past of protocols." Open Networking Summit (2011).

5. Nunes, Bruno Astuto A., et al. "A survey of software-defined networking: Past, present, and future of programmable networks." IEEE Communications Surveys and Tutorials 16.3 (2013): 1617–1634.

6. Casado, Martin, et al. "Ethane: Taking control of the enterprise." ACM SIGCOMM Computer Communication Review 37.4 (2007): 1–12.

7. Davie, Bruce, and Yakov Rekhter. MPLS: Technology and Applications. San Francisco, CA: Morgan Kaufmann Publishers Inc., 2000.

8. Open Networking Foundation. OpenFlow Switch Specification version 1.4.0, October 14, 2013.

9. Campbell, Andrew T., et al. "Open signaling for ATM, internet and mobile networks (OPENSIG'98)." ACM SIGCOMM Computer Communication Review 29.1 (1999): 97–108.

10. OpenStack Foundation. OpenStack Administrator Guide, Havana, April 6, 2014.

11. Pfaff, Ben, et al. "Extending networking into the virtualization layer." HotNets, New York, October 2009.

12. Azodolmolky, Siamak, Philipp Wieder, and Ramin Yahyapour. "SDN-based cloud computing networking." 2013 15th International Conference on Transparent Optical Networks (ICTON), IEEE, June 23–27, Cartagena, Spain, 2013.

13. http://archive.openflow.org/wk/index.php/Rate_Limiter_Proposal. Accessed November 21, 2014.

14. Wood, Timothy, et al. "The case for enterprise-ready virtual private clouds." USENIX HotCloud, Proceedings of the 2009 Conference on Hot Topics in Cloud Computing, June 14–19, San Diego, CA, 2009.

15. Luo, Mon-Yen, and Jun-Yi Chen. "Software defined networking across distributed datacenters over cloud." 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom), vol. 1, IEEE, December 2–5, Bristol, 2013.

16. Baucke, Stephan, et al. "Cloud Atlas: A software-defined networking abstraction for cloud to WAN virtual networking." 2013 IEEE 6th International Conference on Cloud Computing, June 28–July 3, Santa Clara, CA, pp. 895–902, 2013.

17. Sadasivarao, Abhinava, et al. "Bursting data between data centers: Case for transport SDN." 2013 IEEE 21st Annual Symposium on High Performance Interconnects, August 21–23, San Jose, CA, pp. 87–90, 2013.

18. Linux Foundation Collaborative Projects. OpenDaylight Technical Overview, www.opendaylight.org/project/technical-overview. Accessed November 21, 2014.

19. Zhani, Mohamed Faten, et al. "VDC planner: Dynamic migration-aware virtual data center embedding for clouds." 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), IEEE, May 27–31, Ghent, Belgium, 2013.

20. Kang, Joon-Myung, Hadi Bannazadeh, and Alberto Leon-Garcia. "Software-defined infrastructure and the future CO." ICC Communications Workshops, Budapest, Hungary, June 2013.

21. Kang, Joon-Myung, Hadi Bannazadeh, and Alberto Leon-Garcia. "Software-defined infrastructure and the SAVI testbed." TridentCom 2014, Guangzhou, May 2014.

22. Byma, Stuart, Hadi Bannazadeh, Alberto Leon-Garcia, J. Gregory Steffan, and Paul Chow. "Virtualized reconfigurable hardware resources in the SAVI testbed." TridentCom 2014, Guangzhou, May 2014.

23. Heat: OpenStack Orchestration, https://wiki.openstack.org/wiki/Heat. Accessed July 30, 2014.

24. Ersue, Mehmet. ETSI NFV management and orchestration, https://www.google.ca/webhp?sourceid=chrome-instant&rlz=1C5MACD_enCA568CA577&ion=1&espv=2&es_th=1&ie=UTF-8#q=nfv%20orchestratino. Accessed July 30, 2014.

25. Ceilometer: OpenStack Telemetry, https://wiki.openstack.org/wiki/Ceilometer. Accessed July 30, 2014.


7

MOBILE CLOUD COMPUTING

Javeria Samad, Seng W. Loke, and Karl Reed

Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia

7.1 INTRODUCTION

Cloud computing opened the doors for a paradigm shift in the ways systems are deployed and used. It has made possible utility computing with infinite scalability and universal availability of systems [1]. Mobile cloud computing (MCC) has taken this a step further by enabling users to carry on their tasks irrespective of their movement and location [2, 3]. Despite the increasing popularity and usage of MCC, there are certain issues inherent in it that still haunt the mobile cloud community, making it difficult to utilize the full potential of the clouds. These issues or "risks" span the whole structure and life cycle of mobile clouds and can be as varied as security, operations, performance, and end users.

This chapter aims at exploring MCC further and at highlighting the risks related to mobile clouds, in addition to the risks normally associated with system development. While we briefly present practical solutions for most of these issues via standard methods or approaches suitable for the respective issues, the aim is to point out the need for systematic risk analysis and management frameworks for such applications.


7.1.1 Significance/Motivation

Cloud computing mainly focuses on how to best manage the computing, storage, and communication resources shared virtually by multiple users, whereas MCC works by applying cloud computing solutions using resources available in a mobile environment. It allows the execution of mobile applications, and data storage and processing, on external/remote resources rather than on the mobile device itself, while allowing free movement of the user/mobile device. MCC requires functional collaboration between different mobile devices. It requires the mobile devices to be aware of the "presence," "status," and "context" of other portable devices within their network, so as to provide the best possible ad hoc communication environment [2].

The complexity and dynamism of a mobile cloud system pose many risks. At the system level, these include the risks of connectivity, limited resources, security, and limited power supply. As the system complexity increases, both the technical and nontechnical risks increase, and so does the need to manage these risks. The ad hoc nature and mobility [2] in MCC environments mean that the development of these systems needs more rigorous and specialized risk management to deal with all the risks. This can further burden the developers of MCC frameworks and applications. In addition to the complexity of the mobile cloud infrastructure, they also have to deal with risks at the framework/application level, including but not limited to efficient job distribution, virtualization, and scalability.

In the current scenario, from our review so far, we conclude that there is no formal risk management process available to deal with the risks of MCC. As with any development and deployment activity, effective risk management is integral to the success of any MCC system; it is a critical element in designing MCC systems. However, the literature review shows that current work on mobile cloud systems focuses more on cost and resource savings, and there has been little progress toward the development of mobile cloud "aware" risk management methodologies. There is a need to make mobile cloud developers and users realize the importance of having effective and efficient risk management in place. Risk management not only protects organizations from various risks but also plays a critical role in enabling mobile cloud providers to achieve their goals through improved decision making based on up-to-date risk reporting, and it helps meet end users' quality-of-service requirements. An efficient risk management process can also protect providers from the risk of cost overruns during the whole system life cycle and can improve customer satisfaction and confidence in a delivered system.

The organization of this chapter is as follows: Section 7.2 provides an overview of the MCC domain and discusses different selected mobile cloud frameworks and their categorization. Section 7.3 defines risk management and presents an analysis of risk factors currently prevalent in the MCC domain; an illustration of how these risks can affect an application is also presented in this section. Section 7.4 presents an analysis of the mobile cloud frameworks (surveyed in Section 7.2) from a risk management perspective and also discusses the effectiveness of traditional risk approaches in dealing with MCC risks. Section 7.5 summarizes the review and concludes.


7.2 MOBILE CLOUD COMPUTING

Cloud computing refers to the provisioning of computing capabilities as a "service" (instead of a product) to users via Internet and Web technologies. Cloud computing has been defined by different authors in various ways: some see it only as an enhancement of multiple existing technologies, whereas others are very enthusiastic about its potential. A simple definition of the cloud is presented in François Ragnet and Conlee [4] as "a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." Vaquero et al. [5] have provided a comprehensive comparison of the different cloud definitions prevailing in the literature to clarify what cloud computing really is. Based on their explanation, the cloud computing concept can be summarized as a large pool of easily usable, accessible, and dynamically scalable virtualized resources, offered on a pay-per-use model and allowing optimum resource utilization.

Basic cloud computing is an amalgam of various technologies put together to change the way IT infrastructures are built. The cloud differs from older technologies (i.e., the Internet, grid/distributed computing) in that users can use a service when they need it and for as long as they need it. Cloud computing works on a mechanism of "utility-based services," where you pay only for the duration and amount/type of services used. Also, unlike these older technologies, cloud computing provides architectural, domain, and platform independence.
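To make the utility model concrete, the following is a minimal sketch of pay-per-use metering; the service names and rates are invented for illustration and do not reflect any provider's actual pricing.

```python
# Minimal sketch of "utility-based" pay-per-use billing: the user is charged
# only for the duration and amount/type of service consumed. Rates and
# service names are illustrative, not any provider's real price list.

RATES = {
    "vm_small": 0.05,   # $ per hour of compute
    "storage": 0.02,    # $ per GB-month
    "egress": 0.09,     # $ per GB transferred out
}

def bill(usage):
    """usage: list of (service, quantity) tuples metered during the period."""
    return sum(RATES[service] * quantity for service, quantity in usage)

# A user who ran a small VM for 30 hours, stored 12 GB, and sent out 4 GB:
print(bill([("vm_small", 30), ("storage", 12), ("egress", 4)]))  # ~2.10
```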

The key characteristics of any cloud infrastructure are "abstraction" and "virtualization." Cloud computing must allow users to consume computing services on shared resources virtually, in a dynamically scalable way, without knowledge of the location of the hardware and software resources involved, the database design, or the storage infrastructure. With cloud computing, users enjoy much-needed elasticity (scalability), resource sharing/pooling, on-demand service access on a utility basis, and broad network access and availability. Abstraction and virtualization are provided by individual cloud vendors at different levels, with some allowing total flexibility and others offering somewhat restricted control to the users.

Cloud computing can be seen from two perspectives: (1) the way clouds are deployed and (2) the services delivered by the cloud platform [6, 7]. A cloud can be deployed as: a public cloud, where the cloud infrastructure is available to the general public and the cloud provider and consumer usually belong to different organisations; a private cloud, where the cloud infrastructure is limited to a private group (or a single organization); or a hybrid cloud, which combines services of public and private clouds.

The services offered by a cloud can be categorized in three ways: (1) platform as a service (PaaS), (2) software as a service (SaaS), and (3) infrastructure as a service (IaaS), which are self-explanatory for the types of service they offer [5–7]. In Huang et al. [8], the authors discuss another approach to mobile cloud service models, classifying the mobile cloud into three models: mobile as a service consumer (MaaSC), mobile as a service provider (MaaSP), and mobile as a service broker (MaaSB).


The authors advocate a more user-centric approach to mobile cloud design principles.

Mobile cloud computing is an enhancement of cloud computing in which the capabilities of cloud computing are realized using mobile communication infrastructure [9]. Basic MCC is built on the same techniques as mobile networks; a mobile cloud differs from simple cloud computing in the same manner as mobile networks differ from wireless networks.

The popularity of mobile applications has increased dramatically in the past decade, giving users applications plus mobility. Mobile applications provide much-needed freedom, as users can run them whenever and wherever they need. Such applications span all walks of life, including but not limited to entertainment, gaming, learning, healthcare, and commerce. Despite their ease of use and popularity, mobile users still suffer from limited power supplies, limited storage space, and limited computing resources on their mobile devices [10, 11]. An answer to this problem is to export all the complex processing and storage to an external server or "cloud" instead of the mobile device itself. Cloud computing provides one such solution.

MCC is formally defined as a "model for transparent elastic augmentation of mobile device capabilities via ubiquitous wireless access to cloud storage and computing resources, with context-aware dynamic adjusting of offloading in respect to change in operating conditions, while preserving available sensing and interactivity capabilities of mobile devices" [12].

In more general terms, MCC refers to the use of cloud computing from mobile devices, independent of the user's movement. All storage and complicated processing is done external to the mobile device, saving tremendously on the device's computing resources and power supply. The added benefit a mobile cloud presents is mobility; however, a mobile cloud cannot be fully advantageous if it does not cater for the other functional aspects associated with conventional clouds, that is, adaptability, scalability, and availability. The primary aim of MCC is to merge advanced computing and communications technologies to provide users with a seamless computing environment.

MCC provides users with a number of benefits, including sharing resources and applications without investing huge amounts of money in specialized hardware and software. Also, as most of the complex processing is done externally, users enjoy cost reductions for computing power as well [13]. Instead of using a remote cloud for processing, an ideal mobile cloud scenario could make use of a "local" cloud made up of surrounding mobile devices. This would also eliminate the device's dependence on remote servers and possibly reduce data transfer latency.

As with any other wireless mobile network, MCC faces challenges as well. A primary challenge for any mobile cloud environment is providing constant network availability irrespective of user movement, which can be difficult or impossible at times. However, emerging technologies are addressing this problem intensively, with some systems providing "caching" facilities for mobile applications so that users can continue working seamlessly even if the connection is disrupted momentarily.
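As a concrete illustration of such caching, the sketch below queues operations locally while the connection is down and replays them on reconnection; the class and the `send_to_cloud` callback are our own invention, not any particular framework's API.

```python
# Minimal sketch of the "caching" idea: operations performed while the
# connection is down are queued locally and replayed once connectivity
# returns, so the user can keep working through momentary disruptions.

from collections import deque

class OfflineQueue:
    def __init__(self, send_to_cloud):
        self.send = send_to_cloud
        self.pending = deque()

    def submit(self, op, online):
        if online:
            self.flush(online=True)   # preserve ordering: drain backlog first
            self.send(op)
        else:
            self.pending.append(op)   # cache locally until reconnected

    def flush(self, online):
        while online and self.pending:
            self.send(self.pending.popleft())

q = OfflineQueue(send_to_cloud=print)
q.submit("save note 1", online=False)  # cached
q.submit("save note 2", online=True)   # drains backlog, then sends
```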


At present, most mobile cloud applications depend on remote servers for processing and storage, still exposing users to the risk of losing precious resources over connectivity with remote servers. This remote connectivity also presents issues of bandwidth, time delays, costly data services, and context ignorance. Mobile cloud users often need information services relevant to their recent or current context (like location and time). An ideal mobile cloud scenario should be capable of utilizing the resources of a more local cloud and accessible mobile devices, when this is better, instead of relying on remote servers [13, 14]. A temporary local mobile cloud made up of eligible mobile devices present in the surroundings at the same time can solve the problems of bandwidth, costs, and time delays, while ensuring more context-aware solutions for users [12, 15]. Technology currently in use for mobile and sensor-based networks can be utilized to develop and deploy such a local mobile cloud setting.

7.2.1 Types of Mobile Clouds

The mobile cloud setup can be seen in three different ways; a more or less similar categorization is also proposed in Fernando et al. [16]:

1. Client server: In this approach, a mobile device works as a thin client for a remote server, where the processing for mobile applications is done on remote servers. Public clouds like Amazon's EC2, the Microsoft Azure Platform, or Google App Engine are examples of such client-server cloud models [6].

2. Peer to peer: In this approach, all eligible mobile devices act as resource servers for other eligible mobile devices in their surroundings. These mobile devices together make up a local cloud, eliminating the need to connect to remote servers. This is the ideal scenario, ensuring the highest level of mobility [12, 14, 17]; a minimal sketch of peer eligibility is given after this list.

3. Hybrid approach: As with other wired and wireless networks, a hybrid approach combines the features of the client-server and peer-to-peer approaches. It works by enabling a mobile device to act as a client for a local cloud, which in turn connects to a remote server. As the mobile device need not connect to a remote server directly, it can bypass the remote connectivity issues. Kovachev's mobile community cloud [15] is an example of such a hybrid approach.
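The following minimal sketch (referenced from the peer-to-peer item above) illustrates how a local cloud might filter nearby devices down to eligible resource servers; the eligibility fields and thresholds are assumptions for illustration only.

```python
# Minimal sketch of forming a peer-to-peer local cloud: filter the mobile
# devices currently in range down to "eligible" resource servers. The
# thresholds are invented; real frameworks may also weigh load and the
# expected time a device will remain in range.

def eligible_peers(devices, min_battery=0.4, min_free_mem_mb=256):
    return [d for d in devices
            if d["battery"] >= min_battery
            and d["free_mem_mb"] >= min_free_mem_mb
            and d["willing"]]

nearby = [
    {"id": "phone-a", "battery": 0.9, "free_mem_mb": 512, "willing": True},
    {"id": "phone-b", "battery": 0.2, "free_mem_mb": 800, "willing": True},
    {"id": "phone-c", "battery": 0.7, "free_mem_mb": 300, "willing": False},
]
print([d["id"] for d in eligible_peers(nearby)])  # ['phone-a']
```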

7.2.2 Mobile Cloud Application Models and Frameworks: A Brief Overview

There are various existing MCC frameworks and application models, each trying to solve or improve on some prevalent MCC concern [16, 18, 19]. Here, we present a brief overview of current approaches in MCC and analyse a subset of these methodologies from the risk management perspective. We discuss three aspects of MCC: mobile cloud architectures, communication mechanisms or connection protocols, and inherent risk management strategies within MCC frameworks.


These aspects are useful and significant when analysing the current approaches from the risk perspective, as the most common risk contributors can be the underlying structure and technology itself.

7.2.2.1 Mobile Cloud Architectures. An MCC architecture refers to the approach a framework uses for its job distribution and processing. Various approaches have been used in the literature to distribute jobs effectively and dynamically between mobile devices and clouds; however, we restrict ourselves to the approaches adopted by the frameworks surveyed in this review.

The MCC frameworks and concepts surveyed in this chapter can be categorized into two groups:

1. those that use application partitioning/client-server approaches, such as Hyrax, Spectra, Chroma, Alfredo, CMCVR, Cuckoo, and MWSMF;

2. those that use VM technology, such as Cloudlets, CloneCloud, MAUI, and MobiCloud.

The first category relates directly to the previously discussed client-server and peer-to-peer mobile cloud types. An interesting observation, however, is that in some situations virtualization can be mapped onto the "hybrid" mobile cloud type, as it sometimes incorporates characteristics of client-server or peer-to-peer technologies. Moreover, some frameworks, such as MAUI, incorporate the characteristics of both categories.

The MCC frameworks and concepts belonging to each category are discussed in the following sections.

Application Partitioning/Client Server. This category comprises the frameworks that use a client-server approach for task offloading. Frameworks in this category work by a mechanism of application partitioning, in which a task is divided for processing between the mobile device and a remote server. In most cases, the criterion for this division is embedded in the code; in a few other cases, it can be decided at runtime.

Apache Hadoop is one such software framework: it supports distributed applications and data-intensive processing across large sets of independent computers and is capable of dynamically scaling up to thousands of machines in minimal time [20]. It is claimed to be capable of automatic failure detection and handling through Hadoop's NameNode and the Hadoop Distributed File System (HDFS); task failures are handled through node-replication mechanisms. Hadoop is a free implementation of MapReduce [21]. Its significance is evident from the fact that many current MCC frameworks and models are based on Hadoop and MapReduce.

One such Hadoop-based platform is Hyrax [22], which supports cloud computing on Android smartphones. Hyrax works by utilizing a resource pool of multiple mobile devices present in the surroundings. Its designers discuss how such a mobile cloud could be formed, enabling applications to utilize the computational resources of all the participating mobile devices collectively. The key processes in any Hadoop implementation are the NameNode, JobTracker, DataNode, and TaskTracker: a Hadoop cluster works via a master node and slave (worker) nodes, each of which acts as a DataNode and TaskTracker.


Distributed processing is supported through Hadoop's MapReduce implementation, where a job is divided into independent tasks that are processed separately. The Reduce function processes the outputs from each of these tasks and produces the collective results, which are then stored in HDFS. Fault tolerance in Hyrax is provided via the fault tolerance mechanisms of Hadoop.
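To make the mechanism concrete, here is a toy, single-process word count written in the MapReduce style that Hadoop implements (and Hyrax reuses); the function names are ours, and a real Hadoop job would distribute these phases across DataNodes and TaskTrackers.

```python
# A toy MapReduce word count: map each input split to key/value pairs,
# shuffle by key, then reduce each key's values independently. Hadoop
# distributes these phases across a cluster; here everything runs locally.

from collections import defaultdict

def map_phase(split):
    return [(word, 1) for word in split.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)

splits = ["to be or not", "to be"]
pairs = [p for s in splits for p in map_phase(s)]       # map tasks
results = [reduce_phase(k, v) for k, v in shuffle(pairs).items()]
print(dict(results))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```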

Another framework that uses application partitioning via a client-server technique is proposed in Jan et al. [23], which discusses the Alfredo framework for distributed processing of applications between mobile devices and remote servers. In Alfredo, the application presentation layer (UI) remains on the mobile device, but data processing is done on servers. Alfredo is based on R-OSGi, a middleware platform that allows applications to be distributed in multiple modules; R-OSGi is itself an extension of the OSGi model, allowing applications to run on multiple virtual machines instead of one. When a device requests an application, the application's details and relevant service information are sent to the client's (mobile device's) "renderer," which in turn generates the UI accordingly. Services run in one of two ways: on the client side, or on the server, in which case an ad hoc proxy is created for the client to access these services on the server. The authors do not discuss risk management specifically.

The Spectra framework presented by Flinn et al. [24] is also a client-server architecture in which the mobile device offloads its processing to a server via communication protocols. One major drawback of this approach is that the services have to be preinstalled on the servers. Spectra is not suitable for very fast response applications; rather, it targets applications that can afford 1–2 s of delay. Spectra works by matching the resource pool with service requests to predict whether an application should execute locally or be offloaded to remote servers for maximum efficiency. As with MAUI [25], Spectra developers need to specify which modules are potential offloading candidates or which components can benefit from offloading.
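A rough sketch of this kind of local-versus-remote prediction is given below; the cost model and all numbers are invented, whereas Spectra's actual predictor is driven by measured resource supply and demand.

```python
# Illustrative-only decision: estimate the time to execute locally versus
# remotely (compute time plus input transfer) and pick the cheaper option.

def should_offload(cpu_cycles, input_mb, local_mips, remote_mips, bandwidth_mbps):
    local_time = cpu_cycles / local_mips
    remote_time = cpu_cycles / remote_mips + (input_mb * 8) / bandwidth_mbps
    return remote_time < local_time

# A heavy task over a fast link is offloaded; the same task over a very
# slow link runs locally because transfer costs dominate.
print(should_offload(5000, 2, local_mips=100, remote_mips=1000, bandwidth_mbps=20))   # True
print(should_offload(5000, 2, local_mips=100, remote_mips=1000, bandwidth_mbps=0.3))  # False
```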

Another framework based on application partitioning techniques, much like MAUI and Spectra, is Chroma [26], which offloads individual RPCs to the cloud (or remote servers). Because of its coarse-grained execution, less offloading overhead is involved. Chroma is a tactic-based remote execution system where the "tactics" are the useful partitions of the system; these tactics vary in the amount of resources used and the quality of the applications. Like Spectra, it responds very quickly to changing resource needs, and in Chroma, too, the developers must manually specify the methods for offloading. It is also similar to Spectra in resource monitoring and prediction. However, as with Spectra and Alfredo, there is not much discussion of risk management or fault handling.

The work by Satish et al. [27] also proposes a client-server architecture, based on the Mobile Web Services Mediation Framework (MWSMF) for Mobile Enterprises, which consists of Mobile Hosts acting as service providers for client devices. Whenever a mobile device requests services, these Mobile Hosts provide seamless integration of the requesting nodes with the enterprise. The focus of this framework is to provide proper quality-of-service and discovery mechanisms for the successful adoption of mobile web services in enterprise environments. MWSMF uses Enterprise Service Bus (ESB) technology to act as an intermediary between the web service clients and the Mobile Hosts within the Mobile Enterprise.


The virtual mobile computing (VMC) framework is presented in Huerta-Canepa and Lee [28]. Like Hyrax, VMC is based on Hadoop and supports virtual MCC. Mobile device "location" is the basic aspect of this framework and, much like conventional MCC, it relies on neighbouring mobile devices. Because of this, the framework requires continuous (and fast) discovery and selection of suitable mobile devices for computation offloading. Distribution of tasks is carried out via Hadoop. The VMC framework does not currently support risk management.

Another client-server based framework, "Cuckoo," is presented by Roelof et al. [29] for computation offloading and targets the Android platform. The Cuckoo Resource Manager is responsible for identifying, selecting, and registering remote resources for offloading. The authors also present an evaluation of this framework on real-life applications.

Further frameworks based on application partitioning are proposed in Luo [30] and Zhang et al. [31]. The Cloud-Mobile Convergence for Virtual Reality (CMCVR) framework [30] allows user-friendly convergence of mobile devices with cloud resources. Like a few others, this framework works by a mechanism of task partitioning; its unique attribute, however, is the "scanning tree," a data structure for managing cluster nodes at multiple levels. The framework mainly targets media applications, its main focus being the dynamic provisioning of context-aware multimedia services and rendering comparable to virtual environments for mobile users. The framework proposed by Zhang et al. [31] works by partitioning a single application into elastic components that can be executed dynamically. These elastic applications consist of one or more "weblets" (application partitions/components) that function independently but communicate with each other; the weblets are platform independent. The application's weblets and resource demands are continuously monitored by the "elasticity manager," which uses this information to decide where and how to launch the weblets. Unlike CMCVR, the authors of the weblets framework have also proposed authentication and communication mechanisms for elastic applications.
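The following is a minimal sketch of an elasticity-manager-style placement decision; the thresholds and fields are assumptions, and the real framework in [31] bases its decisions on richer monitored information.

```python
# Illustrative-only placement rule: launch a weblet on the device when it is
# cheap to run there and the device has headroom; otherwise launch it in
# the cloud. Field names and thresholds are invented for this sketch.

def place_weblet(weblet, device_load, battery):
    if weblet["cpu_demand"] < 0.2 and device_load < 0.5 and battery > 0.3:
        return "device"
    return "cloud"

print(place_weblet({"name": "ui", "cpu_demand": 0.1}, device_load=0.3, battery=0.8))      # device
print(place_weblet({"name": "encode", "cpu_demand": 0.9}, device_load=0.3, battery=0.8))  # cloud
```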

VM Based. The frameworks in this category work by offloading tasks to a server holding a preinstalled VM image of the mobile device that initiated the task. Because of virtualization, such offloading is usually seamless but slower.

A cloudlet-based framework concept is proposed by Satyanarayanan et al. [14]. The "cloudlets" approach tries to overcome latency issues by making use of a local cloudlet that is connected to the Internet and comprises nearby mobile devices. This technique reduces latency and ensures a speedy response: instead of connecting to a distant server for application processing over the Internet, the mobile device performs all processing on the local cloudlet at just one-hop latency. If for any reason the local cloudlet is unable to carry out the required processing and computation, the mobile device can go into a safe/failure mode and use the distant cloud's services for the time being. Unlike clouds, cloudlets are designed to keep only cached copies of data. The authors also discuss various scenarios in which such a cloudlet could be deployed. To enable simple (self-)management without compromising application diversity and potential, "transient cloudlet customization using hardware VM technology" is applied, whereby the guest

Page 183: Cloud Services, Networking, and Management

“9780471697558c07” — 2015/3/20 — 16:21 — page 161 — #9

MOBILE CLOUD COMPUTING 161

device's software environment is hidden from the cloudlet's software infrastructure. The cloudlet customizes each job and cleans itself up after each operation. Like the cloudlet framework, the CloneCloud framework proposed in Chun and Maniatis [32] also addresses the limitations of mobile devices, via an "augmented execution" technique. In this approach, execution is offloaded to a cloud containing/running a "clone" or replica of the mobile device's (smartphone's) software. Depending on the complexity of the task, either full or partial execution is offloaded to the cloned cloud, giving users the illusion of increased computational power. The authors also discuss the categories of CloneClouds for possible augmented executions. However, although they touch upon the concerns relevant to each framework, both of these frameworks fail to address the potential risks, and hence risk avoidance and management.
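A minimal sketch of the cloudlet selection logic described above might look as follows; the site attributes and latency figures are invented for illustration.

```python
# Prefer a nearby cloudlet that can serve the job at one-hop latency;
# if none can, degrade to the distant cloud (the "safe mode" above).

def pick_execution_site(cloudlets, job, distant_cloud="remote-cloud"):
    usable = [c for c in cloudlets
              if c["reachable"] and c["free_cores"] >= job["cores"]]
    if usable:
        return min(usable, key=lambda c: c["latency_ms"])["name"]
    return distant_cloud  # fall back over the WAN

sites = [
    {"name": "cafe-cloudlet", "reachable": True, "free_cores": 2, "latency_ms": 3},
    {"name": "mall-cloudlet", "reachable": True, "free_cores": 0, "latency_ms": 5},
]
print(pick_execution_site(sites, {"cores": 2}))  # cafe-cloudlet
print(pick_execution_site(sites, {"cores": 4}))  # remote-cloud
```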

The MAUI framework [25] can be considered an improvement on the CloneCloud and cloudlet frameworks as, unlike them, MAUI combines virtual machine migration with code partitioning. This approach allows the required flexibility within the necessary limits of control. Developers annotate which methods to offload while programming; however, the decision to offload is made at run-time on the basis of parameters like profiling information, connectivity, bandwidth, and latency. Unlike many other frameworks, the offloading decision is made at the level of a single method instead of complete modules. This approach can be risk prone in some situations, as single-method decisions cannot represent the whole picture; effective decision making should consider more than one method at a time.
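To illustrate the annotation idea, the sketch below marks methods as offloadable with a decorator and makes a per-call decision at runtime; the `remotable` decorator, the `profitable_to_offload` policy, and the `run_on_server` RPC stub are all hypothetical stand-ins, not MAUI's actual API.

```python
# Developers mark which methods *may* be offloaded; the runtime decides
# per call from current conditions. The policy here is a placeholder.

import functools

def profitable_to_offload(bandwidth_mbps, latency_ms):
    return bandwidth_mbps > 5 and latency_ms < 50   # placeholder policy

def remotable(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if profitable_to_offload(bandwidth_mbps=10, latency_ms=20):
            return run_on_server(func.__name__, args, kwargs)  # hypothetical RPC
        return func(*args, **kwargs)                           # run locally
    return wrapper

def run_on_server(name, args, kwargs):
    print(f"offloading {name}{args} to the server")

@remotable
def classify_image(pixels):
    return "local result"

classify_image("...")  # offloaded under the placeholder conditions
```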

MobiCloud [33] is a mobile cloud framework that treats mobile devices as service nodes to improve ad hoc networking operations and increases the capability of cloud computing for securing mobile ad hoc networking (MANET) applications. This framework is also based on virtual machine technology: every mobile node is treated as a virtualized component and mirrored in the cloud as one or more extended semishadow images (ESSIs). These ESSIs are not necessarily the same as virtual images and can be either exact or partial clones. MobiCloud provides intermediary services for access management, security isolation, and risk assessment and intrusion detection for MANETs. In short, it works to provide a security service architecture spanning multiple security domains. The authors also present a virtual trusted and provisioning domain (VTaPD) to enable isolated information flows and access controls across multiple virtual domains. However, again we see a lack of discussion of potential risks.

7.2.2.2 Communication Protocols. Three types of network protocols are common in MCC: Wi-Fi, Bluetooth, and 3G/4G. These terms refer to the ways mobile devices connect to the Internet and to each other. Each of these communication protocols has its own benefits and drawbacks, and this subcategorization can be useful in understanding whether and how the use of a specific communication protocol within a framework can contribute to the risk of using that framework.

Wi-Fi. Most of the frameworks discussed earlier, such as Hyrax, Alfredo, Spectra, Chroma, VMC, Cuckoo, Cloudlets, CloneCloud, MAUI, and MobiCloud, suggest Wi-Fi as the communication mechanism.


Although Wi-Fi provides improved performance in terms of reliability, increased bandwidth, and data rates, frameworks using Wi-Fi as their embedded communication protocol are comparatively more prone to security threats and interference from external objects. Also, Wi-Fi connectivity depends on the availability of hotspots, unlike 3G, which can be connected from anywhere. Related to Wi-Fi is Wi-Fi Direct for mobile-to-mobile communication, which has higher bandwidth than Bluetooth and a much longer range, tens of metres or more, compared with the often-cited 10 metres of classic Bluetooth. Wi-Fi can also be used for tethering.

3G/4G. 3G/4G provides users with more consistent networking conditions and better security; nevertheless, it takes its toll in slower data transfers, increased battery drain, and higher costs. For MCC, it offers connectivity from anywhere at any time; however, its battery consumption and response times make it a secondary choice for mobile cloud developers. The frameworks employing this approach generally suggest using it in conjunction with Wi-Fi for optimum access and connectivity. None of the surveyed frameworks uses 3G/4G as its primary communication medium, but some, like MAUI and CloneCloud, use it in their experiments to compare the performance of different approaches.

Bluetooth. Despite the advantages of Bluetooth, like ease of use and no-cost wireless access, it is not a very popular medium for widespread Internet connectivity because of its lower range and greater susceptibility to interference. With the exception of Alfredo, none of the surveyed frameworks utilizes Bluetooth exclusively; the only other mention we find is within the Cuckoo framework, which uses the Ibis middleware that can run over any of the communication protocols discussed above. Bluetooth Low Energy (BLE) is more energy efficient than Classic Bluetooth, and generally more energy efficient than Wi-Fi-based protocols, though with lower bandwidth and range.
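Combining these trade-offs, a framework might pick a transport with a heuristic like the illustrative-only sketch below; the thresholds are invented, and real stacks also weigh signal quality and tariffs.

```python
# Illustrative-only heuristic: BLE for tiny, delay-tolerant payloads;
# Wi-Fi when a hotspot is in range; 3G/4G as the anywhere fallback.

def pick_transport(payload_kb, wifi_available, delay_tolerant):
    if payload_kb < 1 and delay_tolerant:
        return "BLE"        # lowest energy, lowest bandwidth/range
    if wifi_available:
        return "Wi-Fi"      # high bandwidth, hotspot-dependent
    return "3G/4G"          # ubiquitous but battery- and cost-hungry

print(pick_transport(0.2, wifi_available=False, delay_tolerant=True))  # BLE
print(pick_transport(500, wifi_available=True, delay_tolerant=False))  # Wi-Fi
print(pick_transport(500, wifi_available=False, delay_tolerant=False)) # 3G/4G
```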

7.2.2.3 Risk Management Strategies. It is interesting to note that, with the exception of Zhang et al. [31], none of the frameworks discusses the risks associated with its use. However, we see some mention of fault tolerance and risks in Hyrax and MobiCloud, respectively. The Hyrax framework implies a fault tolerance implementation, as it is based on Hadoop and uses the same fault tolerance mechanisms to recover from failures; however, no explicit mention of the relevant risks and risk management is given. Similarly, the MobiCloud framework proposes using context-aware information to aid risk assessment and intrusion detection. Despite this, no thorough discussion of risks and risk management is provided, and little attempt has been made to generalise these beyond the scenarios described.

7.2.3 Discussion

As stated earlier, except for the framework proposed by Zhang et al. [31], none of the frameworks discusses the risks associated with its use.


To analyse each framework for inherent risks, we have tried to identify the risks associated with each of the aforementioned aspects, as frameworks belonging to the same category share, for the most part, similar risks.

7.3 RISKS IN MCC

This section first explores the risk management concept in general, providing an insight into the definitions of "risk" and "risk management" and the basic steps of any risk management methodology. We then present a survey of the risks inherent in the cloud and MCC domains. Identification of MCC risk factors is needed to understand fully what types of risks are faced in the MCC domain and what the pattern and intensity of these risks are at present. Recognizing the loopholes in current MCC technologies and analyzing the identified risks for their causes and solutions could be a key starting point for exploring a risk management system able to deal with these risks.

7.3.1 Risk Management

Risk can be defined as the "possibility" of something happening that can affect the outcome negatively; it is measured in terms of "probability" and "impact" [34, 35] and is usually derived by the formula

RE = P(O) × L(O)

where RE is the risk exposure, P(O) is the probability of the negative outcome, and L(O) is the loss or impact of that negative outcome [36].

This formula can be taken as a standard risk calculation device, as similar formulas are used to calculate risk in other domains as well, for example, finance, insurance, and health. The presentation of these formulas may differ slightly between domains, but the basic logic is the same, comprising two elements: probability and impact [37–39].

Risk management is the process of managing the risks in a given system with the aid of formal processes, methods, and tools, for example, by providing a disciplined environment for continuously analyzing risk factors, calculating the relative importance of each risk item, and designing strategies to deal with these risks. A risk management system usually comprises four basic activities: (i) risk identification, (ii) risk analysis, (iii) treatment, and (iv) monitoring and control.

Risk identification refers to identifying the risk factors, that is, proactively diagnosing what could potentially go wrong. Risk analysis involves calculating the likelihood and impact of the identified risk factors and prioritizing them, whereas risk treatment refers to exploring the possible treatment options for the prioritized risk factors and selecting the best solutions. Monitoring and control, on the other hand, is a continuous activity carried out throughout the risk cycle; it involves overall risk planning, monitoring for any change in the status or priority of identified factors, and keeping a look-out for any new risks surfacing. Such proactive decision making reduces the system's exposure to risks and minimizes the potential loss from them. A formal risk management process provides an auditable system for risk mitigation and contingency [40].
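A minimal sketch of the analysis step, computing the exposure RE = P(O) × L(O) for each identified factor and ranking the factors for treatment, is shown below; the probabilities and loss figures are invented for illustration.

```python
# Compute risk exposure RE = P(O) * L(O) for each identified factor and
# rank the factors so treatment effort goes to the largest exposures first.

risks = [
    {"factor": "data loss",           "p": 0.05, "loss": 100_000},
    {"factor": "service unavailable", "p": 0.20, "loss": 10_000},
    {"factor": "unauthorized access", "p": 0.02, "loss": 500_000},
]

for r in risks:
    r["exposure"] = r["p"] * r["loss"]

for r in sorted(risks, key=lambda r: r["exposure"], reverse=True):
    print(f'{r["factor"]:<22} RE = {r["exposure"]:8.0f}')
# unauthorized access    RE =    10000
# data loss              RE =     5000
# service unavailable    RE =     2000
```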


As with any new technology, cloud computing has associated risks that need to be managed for the successful and efficient utilization of clouds. The complexity of cloud infrastructure poses many serious risks, and there is an ever-increasing need to manage these risks effectively and proactively. Shipley [41] holds the view that, at present, cloud computing is implemented without any proper risk management. The same is true for MCC, where the intricacy of the system makes it even more risk prone.

In the previous section, we surveyed and analyzed a sample of mobile cloud frameworks and models. In this section, we discuss the multiple risk factors that have been identified so far within the cloud and MCC domains; the analysis of the frameworks on the basis of their ability to deal with these prevalent risks is presented in the following sections. We also categorize these risk factors according to the nature of the individual risk items. This section is divided into two parts: the first presents the risks common to both the cloud and MCC domains, and the second presents the risks specific to MCC.

7.3.2 Risks in Cloud Computing (Inherited by Mobile Clouds)

Despite the benefits of cloud computing and its potential to improve efficiency and productivity, there is some reluctance to use it. This is because it is a relatively new technology, and there is no formal mechanism or standard for managing the risks associated with cloud usage. Many of the identified risk factors (Sections 7.3.2.1–7.3.2.3) are common to both cloud computing and MCC systems. This is because the basic attributes of both domains are similar: MCC is itself based on the fundamentals of cloud computing technology, so in addition to its basic characteristics, MCC inherits the risks of the cloud computing domain as well. This makes it important to explore the risks in the cloud computing domain too, to cover all the possible risk factors that can affect MCC systems.

A number of sources in the literature discuss the different types of cloud computing risks. These risk factors are classified here into three categories: security risks, including all risks related to the security of cloud computing networks and data; performance risks, including risks related purely to the performance attributes of a cloud infrastructure; and legal/environmental risks, including risks related to legislation and the operational environment of cloud providers and users. The risk factors are represented in separate tables according to their categorization. Sections 7.3.2.1–7.3.2.3 present the risks common to both cloud computing and MCC systems, whereas Section 7.3.3 discusses the risks specific to MCC.

7.3.2.1 Security Risks. The typical characteristics of cloud environments, such as abstraction, virtualization, shared resources, and their ad hoc nature, make it very difficult to implement proper security and safety mechanisms. Moreover, cloud users usually have no knowledge of or control over these mechanisms, which makes the situation even worse. As with any other network, "security and privacy" is one of the biggest risk factors and needs proper consideration and management. This category comprises the risks related to the security and privacy aspects of the infrastructure from both the user's and the provider's perspective, including the risks to the communication networks themselves and to the data.


• Unauthorized access: As most data storage and processing is done externally, it is difficult to implement physical and logical controls over access rights, which brings the risk of unauthorized access. This issue is inherent in all remote and distributed systems, but the abstraction and virtualization of resources in cloud environments make it even more difficult to deal with [1, 33, 41–49].

• Security defects in the technology itself: Failure to implement the security controls essential to protect customers' assets is another risk factor that needs to be managed properly. At present, cloud providers tend to keep their operational procedures and policies secret, so there is no way of knowing the level of security a vendor provides to its users [41, 50].

• Security defects in Web services: Cloud applications are usually provided as services over the Web. However, unlike other Web-based applications, cloud applications are not user specific and hence present serious vulnerability issues. Potential loopholes in Web service security pose a risk for cloud computing; moreover, attackers can use weaknesses in web application security to gain access to other users' data as well [50].

• Leak of customers' information: The risk of losing customers' private information (i.e., passwords and profile information) to attackers is shared equally by cloud vendors and users [41, 51].

• Leak of proprietary information: The risk of losing confidential proprietary information is very serious, especially if the organisations involved are government or other national agencies. A survey cited by Shipley shows that 28% of commercial organisations are reluctant to use cloud computing for fear of losing proprietary information; understandably, this percentage will be much higher for security-critical organizations [41, 45, 50].

• Data location: Cloud users are never sure of the exact location of their data storage, which makes data location an important security issue. Some authors perceive data location as a major performance and legislative risk as well [1, 42, 43, 47, 48, 50].

• Physical location of the system: As with data, the physical location of the system infrastructure (e.g., servers) is not made public by cloud providers. Being unaware of the location of your system can become a security issue [1].

• Data segregation/isolation: Cloud computing is based on the notion of shared resources; multiple users' data can be stored on the same servers, making data segregation one of the biggest risk factors faced by cloud users. It exposes the system to risks such as one user gaining unwanted access to another user's data [33, 42, 43, 47, 50–52].

• Data recovery: Failure to recover data and services properly after a disaster can be a problem. This differs from the "faulty backup" risk factor because sometimes, despite proper backup mechanisms, data are lost for other legislative or environmental reasons. For example, consider a scenario where many (primary) cloud providers themselves rely on the cloud services of other, bigger (secondary) providers. In such cases, if the intermediate primary cloud provider goes out of business, it might not be possible for its users to retrieve their data.


They were not the direct customers of the secondary cloud provider and hence have no access rights over the data it hosts, even though they are the owners of that data. Such legislative gaps can result in data loss. Similarly, if the secondary service provider goes down, the service can become unavailable, creating further risks for data recovery [42, 43, 48].

• Weaknesses of browsers: Cloud computing uses Web browsers for service provisioning. The weaknesses of browsers are a security risk associated with clouds, as an attacker can use them to gain access to confidential data [43].

• Data security: Failure to have an effective data security model in place can be disastrous for any cloud vendor or user. Sharing of resources, unawareness of data location, and malicious attacks on the infrastructure from inside or outside the provider organisation all contribute to the data security risk. In addition to the usual network-related security issues, the abstraction and virtualization of cloud infrastructure make the data even more vulnerable [13, 50, 51, 53].

• Network security: Network security is an important risk factor because if the network is compromised, the whole cloud infrastructure is compromised. Networks can be attacked in many ways (e.g., spoofing and sniffing), and any loophole in network security can put the whole cloud at risk [50, 54].

• Data integrity: This refers to the accuracy of the data managed by the cloud. Issues of shared storage can affect data integrity adversely, making it more risk prone [45, 49, 50].

• Faulty backup mechanisms: Faulty data backups and backup procedures can cause problems. The cloud is typically a pay-as-you-go service, which makes backup more risk prone, as it has to keep up continuously with an ever-changing user environment. Any minor loophole can be disastrous for the security of the system's data [13, 50].

• Insecure/incomplete data deletion: Cloud computing is a utility-based service, and every time a user leaves the system, the cloud infrastructure should delete that user's data and reallocate the resources to other users. Failure to delete the leaving user's data properly or completely is a risk that can give one user unauthorized access to another's data [51].

• Natural disasters: Natural disasters can interrupt cloud service availability, as in the Amazon EC2 incident of June 2009. They are often uncontrollable, but there should be proper risk management planning to deal with such situations [43].

• Malicious insider: Cloud service providers usually place multiple users' data in one place. This can be very risky, as a malicious insider can attack one single location to gain access to thousands of customers' or users' data. The situation is even worse if the customer is an organization: by attacking a single point, a hacker can access a complete organization's information [43, 52].

Most of these risk factors are similar to the risks of general communication networks and applications. This is mainly because the cloud infrastructure is itself based on Web and wireless network technology, and all the inherent risks of those domains propagate to cloud infrastructures as well.


However, owing to the scalable and flexible nature of clouds, these risks are further magnified in the cloud computing domain. Some of these risk factors are common to other categories as well, because different authors have slightly different perspectives on them: one may consider a factor a security risk while another perceives it as a performance or legislation risk (e.g., for some, "data location" is a security risk, while others perceive it as a legislative or performance risk). To cover these different perspectives, we have sometimes listed a factor in multiple risk categories, each representing the related school of thought.

7.3.2.2 Performance Risks. This category represents the risks related purely to the performance attributes of a cloud infrastructure. It comprises the risks of reliability, usability, and efficiency, which can directly affect the performance (i.e., the quality and effective execution) of the system. Application or system performance is perhaps the most important risk every cloud provider and user should consider, and it is the only risk that is mutually dependent on all (or most) other risk factors: the occurrence of any risk factor is a risk to the effective performance of the system and application.

• Features and general maturity of technology: Although increasingly popular, MCC is still in its early stages and needs to mature in terms of processes and functionality. For the same reason, mobile cloud-based services and applications are also still premature. This immaturity makes MCC technology and services comparatively more risk prone than traditional computing platforms. Any weakness or loophole in the technology is a major risk to providers and customers equally [41, 43, 50].

• Data location: In MCC, the data are not located on the same premises as the organisation itself, and the location of the data is unknown to the users. In such a case, there is always the risk that data access, transfer, and processing will be more complex and more time consuming than if the data were located locally [43, 50].

• Data segregation/isolation: Besides being a security and privacy issue, data isolation is a risk to application performance as well. If data are not properly isolated, there is a risk that a user application cannot access the required data efficiently, resulting in delayed processing and degraded performance [33, 42, 43, 50, 51].

• Portability: An application running on one platform may not give the same performance on other platforms. A basic aspect of a mobile cloud system is that it might be made up of, or used from, multiple heterogeneous mobile devices, and hence various platforms. Also, as MCC is still in its early stages, there are no standards for data formats or interfaces, which makes portability a risk to the performance of MCC services and applications [45, 55].

• Data availability: This is a real-time performance risk, for example, when the required data are not available when users need them [1, 33, 45, 47, 50].

• Service availability: Timely and continuous availability of mobile cloud services is an important risk factor that needs considerable attention and proper risk planning.


Any disruption to the infrastructure can cause the cloud to become unavailable [1, 13, 45, 47, 51, 53].

• Reliability: The risks of service downtime, faults, and failures [13, 45].

• Resource exhaustion: In the mobile cloud environment, resources are at risk of exhaustion, because in clouds there are usually more users per application and more applications per server. The scalability aspect of the cloud further contributes to this risk: any problem in the service mechanisms (i.e., inappropriate modelling of resource usage or inadequate resource provisioning) can lead to resource or memory exhaustion, which in turn can substantially degrade system performance [51].

• Complexity: Elasticity, abstraction, resource sharing, and its ad hoc nature make cloud computing a complex infrastructure: there are more users per application, and each server may host multiple applications. The increased complexity of this infrastructure makes it more susceptible to risks that affect the efficient performance of the system.

• Network constraints: Wireless and mobile networks are comparatively more prone to the risks of disconnection, limited bandwidth, and high latency, and any attempt to improve these figures puts a strain on power resources [11].

Some of the factors cited here are similar to those of the security risks. As mentioned earlier, this is because different authors can perceive a single risk item in multiple ways, according to their own values. Factors like portability and reliability are included in the performance risk category because they can directly affect the performance of a cloud computing infrastructure.

7.3.2.3 Legislative/Organizational Risks. These are the risks related to legislation, business organisation, and the operational environment of cloud providers and users.

• Vendor lock-in: Due to the lack of standards, it is difficult to switch from one provider to another or, worse, to move services back in-house (e.g., in case of a price increase or poor service quality) [41, 48, 51, 53, 56].

• Business viability of provider: If the cloud provider goes out of business or terminates its cloud services, customers are exposed to loss of service and loss of investment, which can lead to the loss of their own customers and users [1, 41, 42, 51, 56].

• Unpredictable costs: Cloud services are usually pay as you go, and even a slight change in the provider's prices can affect the total cost of usage tremendously. The risk of unpredictable costs can be managed via thorough negotiation before entering into contracts [41, 56].

• Regulatory compliance: As users' data are located elsewhere, there is a risk of regulatory noncompliance between cloud providers and customers. Because users do not manage their own data and services, there is a risk that their own data won't be compliant with their organizational policies and standards.


Moreover, current and future certifications may also be put at risk by migration to the cloud [42, 43, 45, 47, 51].

• Data location (risks of changing jurisdictions): As users are unaware of the locations where their data are stored or handled, they usually have no control over their own data; this loss of governance is a business risk that needs consideration [41–43, 47, 50, 51]. Other authors present a more or less similar concept as "data-access risks" due to changing jurisdictions: customers' data may well be located in another legal jurisdiction, making it difficult to apply one's own state laws or regulations to the data. For example, some states restrict certain types of data, and if your data are located on servers in that area, it could become a problem [45]. Data location can also present "auditing" risks due to changing jurisdictions: data stored in other countries or jurisdictions are subject to the laws and conventions of the hosting country, which can be a risk for data auditing. Also, if someone has claims against the cloud service provider, multiple state laws regarding data and service may be involved. This storage of data across multiple jurisdictions, and the inherent risk of ignorance of other states' laws, makes it a very risky legal activity [48, 51].

• Investigative support: Investigating inappropriate or illegal activity may be impossible in cloud computing, because logging and data for multiple customers may be co-located and may also be spread across an ever-changing set of hosts and data centers. This makes it very risk susceptible [42, 45, 48, 53].

• Lack of organizational learning: Your staff are not managing your technology and data, and there is a risk that cloud customers won't be able to train their own staff on their own data and services [56].

• Business risks from co-tenant activities: Cloud resources are shared between multiple customers, so there is a risk that malicious activity by one customer affects other customers sharing the same resources, which can in turn be a huge risk to the reputation of the innocent tenant organizations [51].

• Licensing risks: The ad hoc nature of the cloud makes it difficult to audit licensing compliance [51].

This category of risks is very important owing to the remote processing and remote storage characteristics of cloud computing. In cloud computing, the location of servers and data is usually unknown, which means that some or all of a user's data could be in a place outside their own jurisdiction. This poses many risks to already fragile cloud computing environments; such risks can be as extreme as loss of control over data or, in the worst case, complete loss of data in matters of changed jurisdiction or disputes with providers. Moreover, auditing and legal disputes can also be complicated in cloud computing systems for the aforementioned reasons.


7.3.3 Further Risks in MCC

As discussed earlier, MCC poses additional risks beyond conventional cloud computing. The following risk factors relate primarily to the MCC environment:

• Resource limitation of mobile devices: Mobile devices are at risk of resource exhaustion if mobile cloud application development fails to consider this factor; this risk is even greater with smaller mobile device environments and sensors [10, 11].

• Application mobility: An application may have to switch continuously between the device and the cloud (or between different local clouds) every time the mobile environment changes. If not managed properly at development and later stages, this can cause interruptions and put tremendous strain on mobile device resources [12].

• Device mobility: This presents risks of connectivity to the correct base stations and to other mobile devices in the cloud [10, 11].

• Portability: A mobile cloud is made up of varied mobile devices, posing a portability risk: an application running on one device platform may not be suitable for other devices [11, 12].

• Metering risk: In a mobile cloud environment, a single user will connect and disconnect multiple times, raising the risk level for efficient "metering of services" [57, 58].

• Context awareness: Cloud services need to be aware of a user's current context and adapt to it automatically. Failure of these services to self-adapt is a risk to efficient and effective MCC [11, 12].

• Physical risks: Portable devices are more prone to physical risks like damage and theft than desktop PCs or standalone machines [11].

7.3.4 Discussion

A fact worth noticing here is that most of the literature covers what we can call "system-level" risks, that is, the risk factors prevalent in high-level cloud systems and communication networks. The low-level risks at the mobile cloud application development or framework level have not been effectively discussed.

Furthermore, despite the long lists of risks prevailing in MCC, relatively little attention has been given to managing these risks in comparison with other issues, like saving on resources or on the costs of mobile devices. This is contrary to other, more mature IT systems and technologies. In addition to identifying the risks, it is crucial to devise strategies to manage and mitigate them appropriately. A better approach would be to deploy a proper risk management process before moving into the cloud, so as to proactively identify, monitor, assess, and manage the risks in order to avoid or mitigate them.


[Figure 7.1 depicts the ECG monitoring architecture: an embedded Bluetooth-enabled ECG sensor module attached to the user sends readings to the mobile device's data processor and communication module; user requests then travel over a wireless/mobile 3G network to a cloud stack comprising SaaS (the ECG data analysis software), PaaS (Aneka, with user QoS-based scaling of compute resources on a dynamically scalable runtime), and IaaS (Amazon Web Services, including S3), serving large numbers of users.]

Figure 7.1. ECG data analysis software as service [59].

A detailed analysis of the current situation in MCC from the risk perspective is given in the following sections.

7.3.5 An Illustration of Risks for a Real-Life Application

To see how these risks can affect an application, we take the example of a mobile cloud-based e-health application proposed in Pandey et al. [59].1 The authors designed a scalable real-time health monitoring and analysis system and used an electrocardiogram (ECG) analysis system prototype as their case study. This prototype collects patient data (e.g., pulse and heart beat rates) through an ECG sensor device attached to the patient's body. The sensor transmits the data to the patient's mobile device via Bluetooth without manual intervention. Client software on the mobile device then transfers the data to an ECG analysis Web service hosted on a cloud computing stack, using either Wi-Fi or the mobile device's 3G network (Fig. 7.1). The authors used the Aneka cloud computing platform and Amazon's S3 storage services. The software analyses the patient's data, generates results, and appends the latest findings to the patient's medical record; depending on the analysis, the data can be sent to patients, doctors, or emergency services as needed. When we use this application to demonstrate the inherent risks and their consequences, we see that their impact could be disastrous

1 Note that this is not a critique of the ECG analysis prototype as, understandably, risk is not its research focus.


(even costing a precious human life) if no proper risk management is implemented. It is a time-critical application that requires extreme reliability for the results being generated and needs to be always up to date with the latest findings. An overview of how different risks can impact the application and its users is given in Table 7.1.

TABLE 7.1. Overview of risk factors in mobile clouds, with the potential vulnerabilities of the ECG analysis application (data = patient data, service = ECG analysis software service)

Security and privacy risks

Network security risks. Risk factors: unauthorized access; security defects in technology; security defects in Web services; weakness of browsers; network security; physical location of system infrastructure. Potential vulnerabilities: conformity (because of manipulation), sabotage (malware, virus, worm), data integrity, service availability, credibility.

Data security and privacy risks. Risk factors: leak of customer information; leak of proprietary information; data location; data segregation/isolation; data recovery; data security; data integrity; insecure/incomplete data deletion; faulty backup mechanism. Potential vulnerabilities: privacy, conformity, sabotage, identity theft, customer confidence, business loss/competitive edge, espionage, wrong diagnosis, record mix-up, data integrity, data loss.

Others. Risk factors: natural disaster; malicious insider.

Performance risks

Availability and reliability risks. Risk factors: data location; data segregation/isolation; data availability; service availability; reliability. Potential vulnerabilities: faulty data collection, wrong analysis, prolonged response times, correct and efficient data availability, loss of business, life threat.

Service and application usability risks. Risk factors: features and general maturity of technology; complexity; portability; resource exhaustion. Potential vulnerabilities: misdiagnosis, ease of use, time criticality, mobile device limitations, compatibility across different devices, compatibility of device to SaaS, robustness, and load balancing.

Legal/environmental risks

Organisational/business risks. Risk factors: vendor lock-in; business viability of provider; unpredictable costs; data location; investigative support; lack of organisational learning; business risks from co-tenant activities. Potential vulnerabilities: varied costs due to different usage patterns, lack of organizational expertise (software malfunction can go unnoticed, wrong use of application, cloud dependence), regulatory compliance (fewer options for legal support).

Environmental risks. Risk factors: natural disaster. Potential vulnerabilities: data and service availability, life risks.

Different legal jurisdictions. Risk factors: data access risks due to changing jurisdictions; auditing risks due to changing jurisdictions; licensing risks. Potential vulnerabilities: reduced support for legal auditing, ignorance of other countries' laws, different laws within different jurisdictions.

Mobile cloud computing risks

Resource limitation risks. Potential vulnerabilities: performance, incompatibilities, erroneous or slow response, memory exhaustion, application crash, limited power, Bluetooth connectivity issues, ECG monitoring device and mobile phone compatibility, patient's 3G plan limitations.

Mobility risks (between device and cloud). Risk factors: application mobility; device mobility. Potential vulnerabilities: patients won't be static; networking conditions differ between geographical locations and between users within the same geographical region.

Portability risks (between multiple devices). Potential vulnerabilities: hugely varied mobile devices (of patients).

Metering risk. Potential vulnerabilities: usage patterns and frequency differ between patients.

Context awareness. Potential vulnerabilities: the patient's location can impact data collection (e.g., whether s/he is at home or at the gym).

Physical risks. Potential vulnerabilities: the patient can damage or lose the monitoring device or mobile phone.

7.3.5.1 Security and Privacy Risks. The authors have tried to reduce the risk of security breaches of the application data by implementing encryption mechanisms, which are expected to minimize the risk of unauthorized access to the data. For that purpose, they suggest using a third-party public key infrastructure, which will make it difficult for attackers to access the confidential data. When it comes to privacy and leaks of information, we see that, though important, these are not as critical as reliability and efficiency in this scenario. However, any loopholes in security can sabotage the overall reliability and efficiency of the whole setup. Risk factors like unauthorized access; security defects in technology, Web services, or networks; or a malicious insider might mean attackers can access a patient's personal information, misuse it, or even manipulate it. In this particular case, the integrity of patient records is at risk, and any tampering or mishandling can lead to false alarms for emergencies, or no alarms when needed. In case of alarming results for a patient's ECG analysis, the patient needs immediate medical assistance, and any meddling with the patient's data can lead to his/her death. Conversely, if the patient's ECG is normal, a false alarm sent to emergency services could have huge financial impacts, and the availability of emergency services for genuine cases could also be endangered.

In addition to the risks of unauthorized access, there are certain other risk factors that can meddle with efficient processing and cannot be handled by encryption mechanisms alone. Risk factors like faulty backups, data integrity, and insecure/incomplete data deletion can pose a risk to the availability of the patient's medical history and records for doctors. Any weakness in backup and data recovery procedures can mean longer times for patient diagnosis, or even loss of medical records. The risks to data integrity can be very grave in such time-critical applications, when a human life is at stake. The privacy risk from insecure/incomplete data deletion has less impact in this case, unless exploited by malicious attackers.

7.3.5.2 Performance Risks. For a time-critical application like this, application and system performance is an aspect that needs to be critically assessed for risks. The only quality-of-service (QoS) parameter the authors have selected is short response times. To ensure fast responses, the authors implemented a dynamic scalable runtime (DSR) module that continuously monitors the average response time; if it is above a threshold value, the DSR assigns more resources until the response time decreases, as sketched below. However, besides short response times, any degradation in other performance attributes (e.g., reliability, efficiency, and availability) can also lead to consequences ranging from erroneous or unavailable patient records to the possible loss of human life. Also, as this application intends to target a huge population of cardiac patients worldwide, any defect in robustness or load handling will be a risk to efficient performance. Despite being increasingly popular and evaluated, MCC systems and cloud-based e-health applications like this one still need maturity in terms of technology features. Any weakness in the technology or feature set can be a risk to QoS, and hence to customer satisfaction and business objectives as well. For example, in this particular scenario the target population is cardiac patients, which for the most part comprises people above the age of 55 years,2 and it will be much more difficult for these people to get accustomed to complex applications and features than for younger people. If patients don't find the application easy to use, of good quality, or cost effective, they might misuse the device or be hesitant to use it, a fact that could have serious implications for ROI. Also, each error contributes to increased dissatisfaction among customers (in this case, doctors and patients) and can be a risk to patient health as well.
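The following is a minimal Python sketch of a threshold-based control loop of the kind the DSR module implements; the threshold value, the monitoring interval, and the helper callables (measure_avg_response_time, add_compute_resource) are hypothetical placeholders for illustration, not part of the authors' implementation.

```python
import time

RESPONSE_TIME_THRESHOLD = 2.0  # seconds; hypothetical QoS target
MONITOR_INTERVAL = 30          # seconds between measurements

def dsr_scaling_loop(measure_avg_response_time, add_compute_resource):
    """Threshold-based scaling in the spirit of the DSR module: while the
    average response time stays above the QoS threshold, keep provisioning
    additional compute resources (e.g., extra VMs)."""
    while True:
        if measure_avg_response_time() > RESPONSE_TIME_THRESHOLD:
            # QoS target violated: add one more resource and re-measure
            # on the next iteration.
            add_compute_resource()
        time.sleep(MONITOR_INTERVAL)
```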

Data isolation and segregation, from the security perspective, is a risk to privacy and can lead to loss of confidential patient information and records. From the performance perspective, however, faulty data deletion or improper data isolation can cause a mix-up of patients' records and can result in wrong diagnoses for the respective patients. Data location, on the other hand, can affect the timely availability of patient data, but its probability is very low.

As this application targets a wide patient population, the mobile devices involved could be hugely varied, which poses a risk to application portability across multiple devices. The analysis software is hosted by cloud services, but the client software resides on the mobile devices themselves, so it should be extensively evaluated for processing efficiency on multiple devices and mobile platforms. Any problem with portability could mean a percentage of patients being unable to use this application.3 The compatibility of mobile devices with the cloud services should also be risk free, for otherwise the use of this application will be limited to a subset of patients only.

Data availability, service availability, and reliability risk factors are extremely critical owing to the nature of this application and for the reasons discussed earlier. Any error in these aspects can be life-threatening for some patients at heart risk, and application crashes and downtimes can be very dangerous. For entertainment applications we can tolerate some risk, but in an application like this ECG monitoring and analysis system such risks are intolerable and should be managed very carefully. Also, resource exhaustion on the mobile device can directly affect availability, and it is a real possibility in this scenario.

2 Can be less than 55 years as well, but the major percentage belongs to the age category over 55 years: http://www.worldheartfailure.org/index.php?item=75, http://www.rosscountyhealth.com/brochures/MensHeart.pdf
3 Age of patients is already a risk factor with respect to smartphone usage, as the elderly find smartphones much too complex for their comfort level.


7.3.5.3 Legal/Environmental Risks. Most of the legislative risks don't have direct noticeable impacts on this application; however, unpredictable costs, lack of organizational learning, and licensing risks are applicable. The ECG data analysis is provided as a service to users, who can pay on a per-analysis basis. This software service is itself dependent on cloud-based storage provided by Amazon. As there is flexibility for patients to use this application with different frequencies, depending on their health condition and doctor's recommendations, keeping track of and auditing licensing compliance will be very complex. For similar reasons (different frequencies and usage patterns), the costs won't be fixed and can be unpredictable, making it difficult for both patients and doctors to keep track of their relevant costs. Similarly, the lack of organizational learning can create cloud dependence for a long time, and it will be difficult for employees to gain expertise in technology usage and analysis.

7.3.5.4 Mobile-Specific Risks. Like any other mobile application, resource limitation of mobile devices is a risk that can negatively affect the performance of the ECG application. Considering the target user population (i.e., cardiac patients and people with heart risks worldwide), it is safe to assume that a large number of different types of mobile devices will be used. In addition to the general issues of possible incompatibilities, limited power resources, and limited memory, some devices may have comparatively slower response times due to reasons like memory exhaustion (because of large data files stored on a patient's mobile, or because a patient owns a mobile phone with smaller memory capacity). Also, some mobile devices face Bluetooth connectivity issues (e.g., iPhone connectivity with Nokia and Sony Ericsson mobile devices via Bluetooth might not be reliable, and some Nokia devices might have connectivity issues with other devices like HP notebooks). Asking every patient to buy a compatible smartphone isn't an attractive option; likewise, usage of an application shouldn't be restricted to owners of specific mobile devices. It can be fatal if application developers fail to cater to this problem. Another aspect that needs consideration is device mobility. As patients won't be static in one geographical location (a single patient can be a frequent traveller), the device's connectivity to the cloud, and accordingly the availability of services, is very risk prone. Context awareness also needs to be considered in the analysis, as for this particular application a lack of it can falsely tamper with the results. For example, consider a scenario in which the patient decides to check his/her ECG after a workout at the gym or elsewhere: his/her heart and pulse rates would be very different from normal. If the analysis software fails to recognize this, it can generate false alarms and hence a wrong diagnosis. Cases like this can negatively affect users' confidence in the system and can eventually lead to application failure.

7.3.6 Understanding Risks with Near-to-Life Scenarios

Let’s consider an example to see that how ever-changing mobile environments caneffect overall risk situation of a mobile cloud system. To understand these risks andtheir impacts better, let’s consider some scenarios briefly. Consider an elderly heartpatient John as a user of the above mentioned ECG analysis app. He must wear the

Page 199: Cloud Services, Networking, and Management

“9780471697558c07” — 2015/3/20 — 16:21 — page 177 — #25

RISK MANAGEMENT FOR MCC 177

heart-rate-monitor all the time (preferably) so that his condition could be monitored byhis doctors. Now consider this person John in following scenarios:

Scenario 1: John is working in a well-developed city that has much to offer in terms of security and high-speed mobile and network services.

Scenario 2: John is travelling to his home village, which is not very well developed and hence offers fewer options for network and mobile services.

Scenario 3: John goes to a third-world country on an official assignment for 2 years. The crime rates in that country are noticeably higher than in his own country. The quality of mobile devices isn't very good, and the most commonly used mobile devices are different from those in John's home country.

As mentioned earlier, to calculate a risk we need to determine the probability and the impact of that risk in the given mobile cloud instance. Now let's consider the effect of these seemingly innocent scenarios on a few risk factor values (Table 7.2).
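As a sketch of that calculation (not the authors' exact method), risk exposure is commonly expressed as the product of probability and impact; the numeric ratings below for the "poor connection/low bandwidth" factor are purely illustrative assumptions.

```python
def risk_exposure(probability, impact):
    """Risk exposure as probability (0-1) times impact (0-1)."""
    return probability * impact

# Illustrative, assumed ratings for the risk factor "poor connection":
scenarios = {
    "Scenario 1 (well-developed city)": (0.2, 0.5),
    "Scenario 2 (home village)":        (0.7, 0.5),
    "Scenario 3 (third-world country)": (0.8, 0.8),
}

for name, (p, i) in scenarios.items():
    print(f"{name}: exposure = {risk_exposure(p, i):.2f}")
```

The same risk factor thus yields very different exposure values as John's context changes, which is precisely the point made in [60].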

In Samad et al. [60], these risks and calculations are discussed in detail with examples, highlighting how the current context is important in assessing risks in mobile cloud systems and how the slightest change in context parameters can influence the whole risk picture.

7.4 RISK MANAGEMENT FOR MCC

The preceding sections have analyzed MCC from the risk perspective: we surveyed multiple MCC frameworks, identified the risk factors currently prevalent in this domain, and demonstrated, using a real-life application, how different risk factors can affect application processing. This section provides an analysis of the surveyed frameworks from a risk perspective and discusses the current situation in the MCC domain regarding risk management.

7.4.1 Analysis of MCC Frameworks from the Risk Perspective

As mentioned previously, four frameworks have been selected for in-depth analysis in this research for a better understanding of the current situation: (1) Hyrax, (2) Cloudlet, (3) CloneCloud, and (4) Amazon's EC2 for mobiles. This selection is based solely on (our assumption of) their commercial use and popularity among the research community.

The Hyrax framework is designed for computations that need only the data on mobile devices and usually requires no interaction with traditional servers for large-scale computations. The system is targeted at multimedia and sensor data that don't need to be changed frequently. The authors of this framework claim to provide scalability, fault tolerance, privacy, and hardware interoperability. However, the Hyrax architecture lacks energy efficiency, is less effective in CPU and memory usage, and is not suitable for slow networking conditions.

TABLE 7.2. Example of effects on risk probabilities and impacts

Effect on probabilities:
• Scenario 1: The probability values for risk factors like battery/hardware problems, memory leaks by apps, and different memory sizes would have values similar to those in scenarios 2 and 3.
• Scenario 2: Risk factors such as poor connection, blind spots, infrastructure problems, etc., will have a higher probability in scenarios 2 and 3 than in scenario 1.
• Scenario 3: Risk factors such as compatibility issues and different data formats supported by devices will have higher probability values than in scenarios 1 and 2. The probability of the risk "device theft" will be highest in scenario 3.

Effect on impacts: In scenarios 2 and 3, the probability of John facing low-bandwidth or noise issues is understandably high, but the impact value would depend on the level of bandwidth deterioration or the level of noise, for example:
i. If the bandwidth deteriorates from 79 Mbps to 50 Mbps, then its impact would be, say, medium.
ii. But if the bandwidth drops from 79 Mbps to 20 Mbps, then its impact would be high or very high.


From this framework's analysis, we can see that while it considers security and privacy risks, it has not considered several risks to the performance and availability of the system when and where needed.

The Cloudlet framework, on the other hand, tends to improve performance (fast responses) and overcomes slow networking issues. It lacks mechanisms dealing with security and privacy risks, focusing for the most part on improving performance and dealing with the resource limitations of mobile devices. As the mobile devices are not directly connected to a major remote cloud, it can be assumed that in some situations certain risks won't be a major problem at higher levels. However, at the cloudlet level, these privacy and security risks will pose serious threats and, if not dealt with properly, can be very damaging for business owners and users. This is because the cloudlet infrastructure is fundamentally based on a "self-management" concept, and the basic criterion of this framework is to make things easier and self-manageable with minimal manual intervention. As a result, there is a potentially bigger risk factor compared with when infrastructure management was done by domain experts with more technical knowledge of security and safety procedures. Moreover, even if users are not directly connected to a remote cloud server, any failure or weakness at the cloudlet level can jeopardize the whole system, and can even pose risks to the whole cloud infrastructure to which that cloudlet is connected, by passing failure instances on to the main infrastructure.

Some cloud service providers offer users the option of different availability zones, each with a separate infrastructure. Different zones can be insulated from each other, and users can host their applications in more than one geographical location to ensure higher availability; in case of failures, one backs up the other. However, there is the possibility that a weakness of this system, such as a network glitch, can cause the servers to start automatic backup, causing complete server congestion and a possible downtime of several hours. This is a risk that can affect the availability and performance of applications; for time-critical applications, the risk is manifold. This further raises various questions regarding the loopholes in providers' risk contingency and management.

One thing worth noting is that, with the exception of Hyrax, the frameworks are more concerned with saving costs and resources related to mobile computing (understandable, given their research foci). Not much focus has been put on the risks that could arise from using a particular framework, or on the risks that can occur when processing apps over that framework. For example, some of the frameworks divide jobs between mobile devices and cloud servers; however, there is no mention of what risks could be associated with such division of jobs, how those risks can be dealt with, and what risks users can face if they run their applications on that framework.

One of the questions we planned to answer via this analysis was what kinds of applications will be supported by each framework, so that it can be analyzed what type and what level of risk management should be applied to each. Figure 7.2 illustrates the rationale behind this risk hierarchy: sources of risk can be at the application-specific level, the framework level, or the underlying system level. One assumption that each of these frameworks has made is that applications will be stable for long periods without any need for frequent/immediate updates and modifications. Apart from that, it can be assumed that these frameworks target a range of applications, as there is no mention of any specific set of apps that would be most suitable for each framework.

Figure 7.2. Risk hierarchy: a mobile cloud system hosts frameworks (Framework A, B, ..., n), each running applications (App A1, A2, ..., An, etc.). Risks in the mobile cloud computing domain can be perceived at three levels:

1. Level 1 (top level) represents the general risks within the mobile cloud computing domain itself, e.g., system losses, security breaches, and failures.

2. Level 2 (middle level) represents risks specific to the mobile cloud computing frameworks and application models connected to the central/main cloud computing system. This level comprises the risks associated with the use of any particular framework, e.g., risks associated with the use of Hyrax or of CloneCloud.

3. Level 3 (low level) represents the risks that can arise from running a particular application on some particular framework.


This factor can be positive in the sense that a single risk management process could be applied to all of these frameworks; but the major disadvantage of this practice would be that any risks specific to a particular framework might be left unattended and could prove fatal later on. For example, the Hyrax framework is not suitable for applications requiring immediate responses, and likewise CloneCloud and Cloudlets don't provide much in terms of data security and privacy. So, time-critical applications shouldn't be run on Hyrax without due consideration of risk and contingency measures. Similarly, for security-critical applications, Cloudlets and CloneCloud may not be the best choice in terms of the risks involved.

Considering this situation prevailing in the MCC world, there is a need for a formal risk management process that

1. is general enough to be applicable to all frameworks/categories as an extended module, and

2. explicit enough to allow catering to framework-specific risks.

Table 7.3 provides an at-a-glance analysis of these frameworks. The column "Applications supported" lists the applications used as references or examples in each framework, as the authors do not mention targeted/supported applications.

7.4.2 Effectiveness of Traditional Risk Management Approaches in Managing Mobile Cloud Risks

A number of approaches have been proposed in the literature for managing the risks of software development at many levels. However, most of these processes focus on traditional development environments only, and they may not be customizable to respond to the challenges of application development for distributed, remote-processing cloud environments. Our survey of the MCC frameworks has also highlighted the fact that there isn't a formal risk management process in place to deal with the risks associated with the use of these frameworks or of the applications based on them.

In order to make the MCC environment more effective and efficient, we need to devise a mechanism to deal with the related risks. Such a risk management process can be made to work at two levels: the framework level and the application level. The MCC risk management process should also be able to cope with the ad hoc nature of the MCC environment. We need a risk management methodology that can be appended to any framework and that will work equally effectively in all mobile cloud domains/environments. For this purpose, the most suitable existing risk management process can be selected and modified to work in dynamic and robust environments, to match cloud computing requirements. In the absence of any such process, a specialized risk management process can be designed using the strengths of existing traditional processes as a starting point. For designing such a process, we also need to explore mobile computing, grid computing, and distributed computing settings individually, to see how each of these domains deals with its respective development risks.
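Purely as an illustrative sketch of such a pluggable, two-level process (this is our own hypothetical interface, not a design from the literature; all class and method names are assumptions):

```python
from abc import ABC, abstractmethod

class MCCRiskManager(ABC):
    """Hypothetical risk management module that could be appended to any
    MCC framework and specialized at the framework or application level."""

    @abstractmethod
    def identify_risks(self, context):
        """Return the risk factors relevant to the given context."""

    @abstractmethod
    def assess(self, risk, context):
        """Return (probability, impact) for a risk in this context."""

    def monitor(self, context, mitigate, threshold=0.5):
        """Re-assess risks as the mobile context changes and trigger
        mitigation when the exposure crosses the threshold."""
        for risk in self.identify_risks(context):
            probability, impact = self.assess(risk, context)
            if probability * impact > threshold:
                mitigate(risk)
```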

TABLE 7.3. Framework analysis

Hyrax
  Category: application partitioning / client-server.
  Features: used primarily for computations that involve data on mobile devices, not for generic distributed computation; does not expect to replace or effectively collaborate with traditional servers for generic large-scale computation; sufficient space to store multimedia data and sensor logs. Pros: scalability, fault tolerance, privacy, hardware interoperability. Risks: no energy efficiency, less effective CPU and memory usage, can't cope with slow network conditions.
  Applications supported: targeted at multimedia and sensor data, which can be considered historical records that do not need to be changed.
  Inherent risk management: none.

Cloudlets
  Category: VM-based.
  Features: location-based services; fast interactive response; rapid customization of infrastructure; peak bandwidth load issue solved; deals with resource poverty via a nearby resource-rich cloudlet (the cloud solves mobile resource poverty, and cloudlets solve latency issues); assumes the devices making up a cloudlet won't be moving during request processing.
  Applications supported: local business systems, multimedia apps, human cognition applications (voice/face recognition).
  Inherent risk management: none.

CloneCloud
  Category: VM-based.
  Features: four augmentation categories; assumes applications won't change much over time; assumes requesting devices remain within range of the clone-running server during request processing.
  Applications supported: designed to cater for a wide range of apps.
  Inherent risk management: none.

Amazon EC2
  Category: client/server.
  Features: flexibility for users to create as many "instances" (virtual servers) as needed; four "availability zones" for fault tolerance; most widely used commercially.
  Applications supported: all kinds of applications, via multiple Web service components like S3, EBS, SimpleDB, RDS, and CF.
  Inherent risk management: multiple zones to handle failures.

This information, along with the analysis of the surveyed mobile cloud frameworks and the identified risk factors, can then be used to design a (preferably automated) risk management framework for MCC.

7.4.3 Related Work for Risk Management in Cloud Computing

When it comes to risk management in cloud computing, we see that most of the contributions are limited to security and privacy issues in clouds. Many authors have highlighted the security issues in clouds and suggested numerous approaches to dealing with them, each suggesting a solution depending on their cloud usage scope and domain [61–66]. Brender and Markov [48] have also highlighted the risks in cloud computing, but their focus is on the adoption of clouds by different companies, based on their analysis of a few Swiss companies of different sizes. NIST has also provided some security and privacy guidelines for clouds in Wayne and Grance [47]. These solutions range from different authorization and access control mechanisms to audits conducted by third parties to ensure security. However, most of these suggestions/proposals are discursive in nature, with none providing a complete solution to the problem.

An important contribution has been made by Saripalli and Walters [67], who propose a model for quantitatively assessing security risks in cloud platforms. In the literature we also see some more advanced intrusion detection and security frameworks like [65, 66], but, as mentioned previously, they have not been implemented yet (to the authors' knowledge) and are mostly proposed as guideline frameworks that require cloud computing users to follow certain steps to achieve security.


Ko et al. [66] have proposed the "TrustCloud" framework for accountability in clouds. They define four components of trust in clouds: security, privacy, accountability, and auditability. Another classification of trust they give is the preventative and detective security measures taken to enhance confidence. The detection framework they propose consists of five layers and mostly revolves around maintaining logs for accountability at the different layers. Similarly, Zhang et al. [62] have proposed an information security risk framework focusing on data security issues. Their framework is qualitative in nature and very basic, with no actual strategies to deal with the risks or threats identified; like a process outline, it defines only the steps that need to be taken to mitigate risks. Along slightly different lines, Houmansadr et al. [65] have proposed a cloud-based intrusion detection framework for mobile devices; this framework, however, doesn't deal directly with cloud/mobile cloud security issues.

There is also some work explicitly addressing privacy issues in cloud computing. Authors like Pearson et al. [68] have specifically focused on privacy as a separate entity from "security" in the cloud computing domain. The suggestions for privacy issues range from encryption of data to personalization or preference classification, in order to prevent misuse or theft of private data.

In summary, we see a lot of work in cloud computing related to security and privacy issues, but there is less work dealing comprehensively with the full range of cloud computing risks, such as performance, connectivity, and mobility. These issues and their solutions are mentioned implicitly in some of the work, but not explicitly, and hence there is no concrete work to deal with or mitigate these risks. Also, most of the work on risk management and mitigation is theoretical or discursive in nature. Another important point highlighted by the literature analysis is that, although we see work in cloud computing regarding security and privacy issues, we don't see much work being done in MCC in the same areas. Also, as discussed earlier, being proactive in the MCC domain requires the system to be efficiently context-aware. Some authors, such as La et al. [69] and Papakos et al. [70], have suggested using context awareness in cloud domains, but their focus is not on using this information for risk management.

7.5 CONCLUSIONS

As MCC is still in its early stages, it needs more maturity in terms of processes, technical and nontechnical aspects, and auditing. This immaturity, along with the intricacy of mobile cloud systems, introduces many risks. At the system level, these include the risks of connectivity, limited resources, security, and limited power supply. Moreover, as system complexity increases, both the technical and nontechnical risks increase, and so does the need to manage these risks. The ad hoc nature of, and mobility in, MCC environments mean that the development of these systems needs more rigorous and specialized risk management to deal with all the risks. This further burdens the developers of MCC frameworks and applications: in addition to the complexity of the mobile cloud infrastructure, they also have to deal with risks at the framework/application level, including but not limited to efficient job distribution, virtualization, scalability, and so on.


We make the following observations and notes on interesting future trends:

• Our analysis highlighted the fact that most of the current frameworks and applications tend to focus on saving costs and resources, with little attention given to managing the risks. We have also highlighted the reasons why risk management is important and why it should be incorporated in MCC frameworks and applications. No one can deny the importance of identifying, and hence treating, risks at an early stage, instead of when they have already become a problem. Being proactive in managing risks can save mobile cloud providers and users from many negative outcomes, like service disruptions, financial losses, data losses, loss of customer satisfaction and confidence, and, at worst, loss of lives.

• An ideal MCC risk management system should be able to assess continuously (and automatically) what can go wrong, identify risk areas, implement treatment solutions for identified factors, and, in case of contingency failure, ensure safe system closure. Such an ideal system can also make use of historical data and risk management records from past incidents. Also, the ideal risk management system shouldn't focus on just the service providers; rather, it should also take into consideration the other stakeholders, such as customers and users.

• While we have focused on risk, there is a range of issues surrounding risk that make it a complex and multifaceted concern, including security, trust, and privacy, not just system reliability and performance. We have only provided a broad overview, but future work remains in studying the complex issues of risk in specific MCC applications, both those that involve remote cloud servers and those that use a collection of surrounding devices, the so-called mobile computing crowd. Emerging mechanisms such as homomorphic encryption4 provide a partial solution for protecting data that is computed with remotely, but their applicability in the mobile cloud environment needs further exploration.

• A much larger range of mobile devices is emerging, from Google Glass to watch computers as well as smart jackets and smart shoes, each of which could participate in a resource pool to provide local cloud-like services or utilise the greater Cloud. An approach that considers the range of devices of multiple users working together in a risk-managed manner could be an interesting avenue of work.

• With the rapid growth in crowdsourcing and crowdsensing, crowd-sourced clouds become interesting, together with their human-related risks; while SETI@home has been around for a long time, mobile versions of it and variants of the idea for different applications have emerged (e.g., BOINC5).

• Modern mobile computing development is not only resulting in a range of wearable and mobile devices of different forms; the Internet of Things has also emerged, with everyday objects forming potential resource providers that participate in future resource clouds. Interesting developments include vehicular clouds (e.g., initiated by Gerla [71]) and the increasing uptake of drones for nonmilitary uses; one can even consider fly-in/fly-out cloud servers, where mobile computing infrastructure can be flown into a disaster-stricken zone to provide computing services for a time, a situation where perhaps need outweighs the risks.

4 http://www.infoworld.com/t/encryption/ibms-homomorphic-encryption-could-revolutionize-security-233323
5 http://boinc.berkeley.edu/



REFERENCES

1. Livingstone, R. (2011). Navigating through the Cloud. Createspace, ASIN: B005ILWCGG.

2. Huang D. (2011). Mobile Cloud Computing. IEEE COMSOC MMTC E-Letter.

3. Wang Q. A. (2011). Mobile Cloud Computing. Thesis for Master of Science, University of Saskatchewan, Canada.

4. Ragnet F., Conlee R. G. (2010). Can You Trust the Cloud? A Practical Guide to the Opportunities and Challenges of the Document 3.0 Era. Cloud Computing White Paper, Xerox Corporation.

5. Vaquero L. M., Merino L. R., Caceres J., Lindner M. (2009). A break in the clouds: Towards a cloud definition. ACM SIGCOMM Computer Communication Review, 39: 50–55.

6. Sosinsky B. (2011). Cloud Computing Bible. Hoboken, NJ: John Wiley & Sons, Inc.

7. Baun C., Kunzel M. (2012). A taxonomy study on cloud computing systems and technologies. In Cloud Computing: Methodology, System, and Applications, Wang L., Ed., Boca Raton, FL: CRC Press, pp. 73–90.

8. Huang D., Xing T., Wu H. (2013). Mobile cloud computing service models: A user centric approach. IEEE Network, 27: 6–11.

9. Huang T. D., Lee C., Niyato D., Wang P. (2011). A survey of mobile cloud computing: Architecture, applications, and approaches. Wireless Communications and Mobile Computing. DOI: 10.1002/wcm.1203.

10. Satyanarayanan M. (1996). Fundamental Challenges in Mobile Computing. Philadelphia, PA: ACM PODC '96.

11. Gupta A. K. (2008). Challenges of Mobile Computing. Proceedings of the 2nd National Conference on Challenges & Opportunities in Information Technology (COIT-2008), March 29, Mandi Gobindgarh, India, pp. 86–90.

12. Kovachev D., Cao Y., Klamma R. (2010). Mobile Cloud Computing: A Comparison of Application Models. Computing Research Repository, CoRR, vol. abs/1009.3088, 2010.

13. Kumar K., Lu Y. H. (2010). Cloud computing for mobile users: Can offloading computation save energy? IEEE Computer, 43(4): 51–56.

14. Satyanarayanan M., Bahl P., Cáceres R., Davies N. (2009). The case for VM-based cloudlets in mobile computing. IEEE Pervasive Computing, 8: 14–23.

15. Kovachev D., Renzel D., Cao Y., Klamma R. (2010). Mobile Community Cloud Computing: Emerges and Evolves. Proceedings of the Eleventh International Conference on Mobile Data Management, May 23–26, Kansas City, MO, pp. 393–395.

16. Fernando N., Loke S., Rahayu W. (2012). Mobile cloud computing: A survey. Future Generation Computer Systems, 29: 84–106.

17. Rajiv R., Zhao L. (2011). Peer-to-Peer Service Provisioning in Cloud Computing Environments. Berlin: Springer, pp. 1–31.


18. Srikumar V., Buyya R., Ramamohanarao K. (2006). A taxonomy of data grids for distributed data sharing, management, and processing. ACM Computing Surveys, 38(1): 1–53.

19. Lamia Y., Da Silva D., Butrico M., Appavoo J. (2010). Understanding the Cloud Computing Landscape. Cloud Computing and Software Services, pp. 1–6.

20. Apache Hadoop. http://hadoop.apache.org/. Accessed November 21, 2014.

21. Dean J., Ghemawat S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1): 107–113.

22. Marinelli E. (2009). Hyrax: Cloud Computing on Mobile Devices Using MapReduce. Master's Thesis, Carnegie Mellon University, Pittsburgh, PA.

23. Jan R., Riva O., Alonso G. (2008). Alfredo: An Architecture for Flexible Interaction with Electronic Devices. Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware (Middleware 2008), December 1–4, Leuven, Belgium, pp. 22–41.

24. Flinn J., Satyanarayanan M., Young P. (2002). Balancing Performance, Energy and Quality in Pervasive Computing. IEEE 22nd International Conference on Distributed Computing Systems, July 2–5, Vienna, Austria, pp. 217–226.

25. Cuervo E., Balasubramanian A., Cho D., Wolman A., Saroiu S., Chandra R., Bahl P. (2010). MAUI: Making Smartphones Last Longer with Code Offload. Proceedings of the 8th ACM MobiSys'10, June 15–18, San Francisco, CA, pp. 49–62.

26. Balan R. K., Satyanarayanan M., Park S. Y., Tadashi O. (2003). Tactics-Based Remote Execution for Mobile Computing. Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, May 5–8, 2003, San Francisco, CA, pp. 273–286.

27. Satish S., Vainikko E., Šor V., Jarke M. (2010). Scalable Mobile Web Services Mediation Framework. IEEE 5th International Internet and Web Applications and Services Conference, May 9–15, Spain, pp. 315–320.

28. Huerta-Canepa G., Lee D. (2010). A Virtual Cloud Computing Provider for Mobile Devices. ACM Workshop on Mobile Cloud Computing & Services: Social Networks and Beyond, MCS'10, June 15, San Francisco, CA.

29. Roelof K., Palmer N., Kielmann T., Bal H. (2010). Cuckoo: A Computation Offloading Framework for Smartphones. IEEE 2nd International Conference on Mobile Computing, Applications and Services, MobiCase'10, October 25–28, San Francisco, CA.

30. Luo X. (2009). From Augmented Reality to Augmented Computing: A Look at Cloud-Mobile Convergence. IEEE International Symposium on Ubiquitous Virtual Reality (ISUVR'09), July 8–11, South Korea, pp. 29–32.

31. Zhang X., Schiffman J., Gibbs S., Kunjithapatham A., Jeong S. (2009). Securing Elastic Applications on Mobile Devices for Cloud Computing. ACM Cloud Computing Security Workshop (CCSW'09), November 13, Chicago, IL, pp. 127–134.

32. Chun B. G., Maniatis P. (2009). Augmented Smartphone Applications through Clone Cloud Execution. Proceedings of the 12th Conference on Hot Topics in Operating Systems, May 18–20, 2009, Monte Verità, Switzerland, p. 8.

33. Huang D., Zhang X., Kang M., Luo J. (2010). MobiCloud: Building Secure Cloud Framework for Mobile Computing and Communication. Proceedings of the Fifth IEEE International Symposium on Service Oriented System Engineering, SOSE, June 4–5, Nanjing, China, pp. 27–34.

34. ACTIA (2004). Guide to Risk Management: Insurance and Risk Management Strategies. Australian Capital Territory: ACT Insurance Authority.


35. OB-007 (2009). Risk Management: Principles and Guidelines. AS/NZS ISO 31000:2009, Standards Australia and Standards New Zealand, ACT, Australia.

36. Boehm B. W. (1991). Software risk management: Principles and practices. IEEE Software, 8(1): 32–41.

37. AIRMIC et al. (2002). A Risk Management Standard. Technical Report. Institute of Risk Management (IRM), Association of Insurance and Risk Managers (AIRMIC), and National Forum for Risk Management in the Public Sector (ALARM).

38. Stoneburner G., Goguen A., Feringa A. (2002). Risk Management Guide for Information Technology Systems. National Institute of Standards and Technology (NIST), Department of Commerce.

39. Shimonski R. J. (2004). Risk Assessment and Threat Identification. Security Plus Study Guide. http://www.windowsecurity.com/articles-tutorials/misc_network_security/Risk_Assessment_and_Threat_Identification.html. Accessed February 14, 2015.

40. Samad J., Ikram N., Usman M. (2007). Managing Risks: An Evaluation of Risk Management Processes. IEEE International Multi-topic Conference, December 23–24, Islamabad, Pakistan, pp. 281–287.

41. Shipley G. (2010). Cloud Computing Risks. Information Week, Cover Story, pp. 22–24. UBM LLC. http://www.informationweek.com/. Accessed February 14, 2015.

42. Brodkin J. (2008). Seven Cloud Computing Risks. InfoWorld Canada. Downsview: July 2.

43. Mansfield-Devine S. (2008). Danger in clouds. Network Security, 2008: 9–11.

44. Ovadia S. (2010). Navigating the challenges of the cloud. Behavioral & Social Sciences Librarian, 29(3): 233–236.

45. Scott P., Jaeger P. T., Wilson S. C. (2010). Identifying the security risks associated with governmental use of cloud computing. Elsevier: Government Information Quarterly, 27: 245–253.

46. Glimmer B. (2011). Navigating the cloud. Broadcast Engineering; ProQuest Telecommunications, 53: 24–26.

47. Wayne J., Grance T. (2011). Guidelines on Security and Privacy in Public Cloud Computing. NIST Special Publication 800-144, Computer Security Division, National Institute of Standards and Technology (NIST)/U.S. Department of Commerce.

48. Brender N., Markov L. (2013). Risk perception and risk management in cloud computing: Results from a case study of Swiss companies. International Journal of Information Management, 33: 726–733.

49. Khan A. N., Mat K., Khan S., Madanic S. (2013). Towards secure mobile cloud computing: A survey. Future Generation Computer Systems, 29: 1278–1299.

50. Subashini S., Kavitha V. (2010). A survey on security issues in service delivery models of cloud computing. Elsevier Journal of Network and Computer Applications, 34: 1–11.

51. Catteddu D., Hogben G. (2009). Cloud Computing: Benefits, Risks and Recommendations for Information Security. European Network and Information Security Agency (ENISA). Web Application Security, Communications in Computer and Information Science, vol. 72, p. 17.

52. Glott R., Husmann E., Sadeghi A., Schunter M. (2011). Trust Worthy Clouds Underpinning the Future Internet. Berlin: Springer, pp. 209–221.

53. Armbrust M., Fox A., Griffith R., Joseph A., Katz R., Konwinski A., Lee G., Patterson D., Rabkin A., Stoica I., Zaharia M. (2009). Above the Clouds: A Berkeley View of Cloud Computing.


Research Report, Berkeley, CA: UC Berkeley Reliable Adaptive Distributed Systems Laboratory.

54. Yan Y., Hao X. (2014). Privacy Security Issues under Mobile Cloud Computing Mode. Proceedings of the International Conference on Computer, Communications and Information Technology (CCIT 2014), January 2014, Beijing, China, pp. 49–52.

55. Wasserman A. I. (2010). Software Engineering Issues for Mobile Application Development. ACM FoSER 2010, November 7–8, 2010, Santa Fe, NM.

56. Sakthivel S. (2007). Managing risk in offshore systems development. European Journal of Operational Research, 174: 245–264.

57. Kaliski B. S., Pauley W. (2010a). Towards Risk Assessment as Service in Cloud Environments. Hopkinton, MA: EMC Corporation.

58. Kaliski B. S., Pauley W. (2010b). Towards Risk Assessment as Service in Cloud Environments. 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud'10). Hopkinton, MA: EMC Corporation.

59. Pandey S., Voorsluys W., Niu S., Khandoker A., Buyya R. (2012). An autonomic cloud environment for hosting ECG data analysis services. Elsevier Future Generation Computer Systems, 28: 147–154.

60. Samad J., Loke S. W., Reed K. (2013). Quantitative Risk Analysis for Mobile Cloud Computing: A Preliminary Approach and a Health Application Case Study. Proceedings of the 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), July 16–18, Melbourne, Australia, pp. 1378–1385.

61. Ramgovind S., Eloff M., Smith E. (2010). The Management of Security in Cloud Computing. IEEE Information Security for South Africa (ISSA), August 2–4, Johannesburg, South Africa.

62. Zhang X., Wuwong N., Li H., Xuejie Z. (2010). Information Security Risk Management Framework for the Cloud Computing Environments. 10th IEEE International Conference on Computer and Information Technology (CIT 2010), June 29–July 1, Bradford, United Kingdom, pp. 1328–1334.

63. Bisong A., Rehman S. (2011). An overview of the security concerns in enterprise cloud computing. International Journal of Network Security & Its Applications, 3(1): 99–110.

64. Carroll M., Merwe A. V., Kotzé P. (2011). Secure Cloud Computing: Benefits, Risks and Controls. Information Security South Africa (ISSA), August 15–17, Johannesburg, pp. 1–9.

65. Houmansadr A., Zonouz S. A., Berthier R. (2011). A Cloud-Based Intrusion Detection and Response System for Mobile Phones. IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W), June 28–30, Hong Kong.

66. Ko R. K. L., Mowbray M., Pearson S., Kirchberg M., Liang Q., Lee B. S. (2011). TrustCloud: A Framework for Accountability and Trust in Cloud Computing. IEEE World Congress on Services, Cloud & Security Lab, Hewlett-Packard Laboratories.

67. Saripalli P., Walters B. (2010). QUIRC: A Quantitative Impact and Risk Assessment Framework for Cloud Security. IEEE 3rd International Conference on Cloud Computing, July 5–10, Florida, pp. 280–288.

68. Pearson S., Shen Y., Mowbray M. (2009). Privacy Manager for Cloud Computing. Berlin: Springer, LNCS 5931, pp. 90–106.

69. La H. J., Kim S. D. (2010). A Conceptual Framework for Provisioning Context-Aware Mobile Cloud Services. IEEE 3rd International Conference on Cloud Computing (CLOUD), July 5–10, Florida, pp. 466–473.


70. Papakos P., Capra L., Rosenblum D. S. (2010). Volare: Context-Aware Adaptive Cloud Service Discovery for Mobile Systems. 9th International Workshop on Adaptive and Reflective Middleware, ARM'10, November 29–December 3, India, pp. 32–38.

71. Gerla M. (2012). Vehicular Cloud Computing. The 11th Annual Mediterranean Ad Hoc Networking Workshop (Med-Hoc-Net 2012), June 19–22, pp. 152–155.


PART III

CLOUD MANAGEMENT


8

ENERGY CONSUMPTION OPTIMIZATION IN CLOUD DATA CENTERS

Dzmitry Kliazovich1, Pascal Bouvry4, Fabrizio Granelli2, and Nelson L. S. da Fonseca3

1 Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg City, Luxembourg
2 Department of Information Engineering and Computer Science, University of Trento, Trento, Trentino, Italy
3 Institute of Computing, State University of Campinas, Campinas, São Paulo, Brazil
4 Faculty of Science, Technology and Communications, University of Luxembourg, Luxembourg City, Luxembourg

8.1 INTRODUCTION

Cloud computing has entered our lives and is dramatically changing the way people consume information. It provides platforms enabling the operation of a large variety of individually owned terminal devices. There are about 1.5 billion computers [1] and 6 billion mobile phones [2] in the world today. Next-generation user devices, such as Google glasses [3], offer not only constant readiness for operation, but also constant information consumption. In such an environment, computing, information storage, and communication become a utility, and cloud computing is one effective way of offering easier manageability, improved security, and a significant reduction in operational costs [4].



Cloud computing relies on the data center industry, with over 500 thousand data centers deployed worldwide [5]. The operation of such widely distributed data centers, however, requires a considerable amount of energy, which accounts for a large slice of the total operational costs [6, 7]. International Data Corporation (IDC) [8] reported that, in 2000, the power required by a single rack was on average 1 kW, whereas in 2008 it had soared to 7.4 kW. The Gartner group has estimated that energy consumption accounts for up to 10% of current data center operational expenses (OPEX), and that this estimate may rise to 50% in the next few years [9]. The cost of energy for running servers may already be greater than the cost of the hardware itself [10, 11]. In 2010, data centers consumed about 1.5% of the world's electricity, with this percentage rising to 2% for the United States of America [12]. This consumption accounts for more than 50 million metric tons of CO2 emissions annually.

Energy efficiency has never been a goal in the information technology (IT) industry. Since the 1980s, the only target has been to deliver more and faster; this has been traditionally achieved by packing more into a smaller space, and running processors at a higher frequency. This consumes more power, which generates more heat, and then requires an accompanying cooling system that costs in the range of $2–$5 million per year for corporate data centers [9]. These cooling systems may even require more power than that consumed by the IT equipment itself [13, 14].

Moreover, in order to ensure reliability, computing, storage, power distribution, and cooling infrastructures tend to be overprovisioned. To measure this inefficiency, the Green Grid Consortium [15] has developed two metrics: power usage effectiveness (PUE) and data center infrastructure efficiency (DCIE) [16], which measure the proportion of power delivered to the IT equipment relative to the total power consumed by the data center facility. PUE is the ratio of the total amount of energy used by a data center facility to the energy delivered to the computing equipment, while DCIE is the percentage value derived by dividing IT equipment power by total facility power. Currently, roughly 40% of the total energy consumed is related to that consumed by IT equipment [17]; the cooling infrastructure accounts for approximately 45%, while the power distribution system accounts for the other 15%.
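As a quick worked example of the two metrics (with made-up figures): a facility drawing 2.5 MW in total while delivering 1.0 MW to its IT equipment has a PUE of 2.5 and a DCIE of 40%. A minimal sketch:

```python
def pue(total_facility_power, it_equipment_power):
    """Power usage effectiveness: total facility power over IT power."""
    return total_facility_power / it_equipment_power

def dcie(total_facility_power, it_equipment_power):
    """Data center infrastructure efficiency: IT power as a percentage
    of total facility power (the inverse of PUE)."""
    return 100.0 * it_equipment_power / total_facility_power

print(pue(2.5e6, 1.0e6))   # 2.5
print(dcie(2.5e6, 1.0e6))  # 40.0
```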

There are two main alternatives for reducing the energy consumption of data centers: (1) shutting down devices or (2) scaling down performance. The former alternative, commonly referred to as dynamic power management (DPM), results in the greatest savings, since the average workload often remains below 30% in cloud computing systems [18]. The latter corresponds to dynamic voltage and frequency scaling (DVFS) technology, which can adjust the performance of the hardware and its power consumption to match the corresponding characteristics of the workload.

In summary, energy efficiency is one of the most important parameters in modern cloud computing data centers, determining operational costs and capital investment, along with the performance and carbon footprint of the industry. The rest of the chapter is organized as follows: Section 8.2 presents the components and models of energy consumption in data centers, including the role of communication systems. Section 8.3 presents energy-efficient resource allocation and scheduling solutions. Finally, Section 8.4 concludes the chapter.


8.2 ENERGY CONSUMPTION IN DATA CENTERS: COMPONENTS AND MODELS

This section introduces the energy consumption of computing and communication devices, emphasizing how efficient energy consumption can be achieved, especially in communication networks.

8.2.1 Energy Consumption of Computing Servers and Switches

Computing servers account for the major portion of data center energy consumption. The power consumption of a computing server is proportional to its CPU utilization: an idle server still consumes around two-thirds of its peak-load consumption just to keep memory, disks, and I/O resources running [19, 20], while the remaining one-third increases almost linearly with an increase in CPU load [6, 20]:

P_s(l) = P_{fixed} + \frac{P_{peak} - P_{fixed}}{2}\left(1 + l - e^{-l/a}\right),    (8.1)

where P_{fixed} is the idle power consumption, P_{peak} is the power consumed at peak load, l is the server load, and a \in [0.2, 0.5] is the level of utilization at which the server attains power consumption that varies linearly.

There are two main approaches for reducing energy consumption in computingservers: (1) DVFS [21] and (2) DPM [22]. The former scheme adjusts the CPU power(consequently the level of performance) according to the load offered. The power ina chip decreases proportionally to V2f , where V is a voltage, and f is the operatingfrequency. The scope of this DVFS optimization is limited to the CPUs, so that the com-puting server components, such as buses, memory, and disks, continue to function at theoriginal operating frequency. On the other hand, the DPM scheme can power down com-puting servers but including all of their components, which makes it much more efficient;but if a power up (or down) is required, considerably more energy must be consumed incomparison to the DVFS scheme. Frequency downshifts can be expressed as follows(Eq. 8.1):

P_s(l) = P_{fixed} + \frac{P_{peak} - P_{fixed}}{2}\left(1 + l^3 - e^{-l^3/a}\right). \quad (8.2)
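To make the two power models concrete, here is a minimal sketch implementing Equations (8.1) and (8.2); the P_fixed, P_peak, and a values below are illustrative assumptions only:

import math

def server_power(l, p_fixed=200.0, p_peak=300.0, a=0.3, dvfs=False):
    """Server power (W) at load l in [0, 1], following Eq. (8.1);
    with dvfs=True the load enters as l^3, following Eq. (8.2)."""
    x = l ** 3 if dvfs else l
    return p_fixed + (p_peak - p_fixed) / 2.0 * (1.0 + x - math.exp(-x / a))

# Reproduce the two curves of Figure 8.1: idle draws p_fixed, and
# consumption approaches p_peak as the load tends to 1.
for load in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(load, round(server_power(load), 1),
          round(server_power(load, dvfs=True), 1))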

Figure 8.1 plots the power consumption of a computing server with and without DVFS.

Figure 8.1. Computing server power consumption (power versus server load, between P_{idle} and P_{peak}, with and without DVFS).

Network switches form the basis of the interconnection fabric used to deliver job requests to the computing servers for execution. The energy consumption of a switch depends on various factors: (i) the type of switch, (ii) the number of ports, (iii) the port transmission rates, and (iv) the employed cabling solutions; it can be expressed as follows [23]:

P_{switch} = P_{chassis} + n_c \cdot P_{linecard} + \sum_{r=1}^{R} n_p^r \cdot P_p^r \cdot u_p^r, \quad (8.3)

where P_{chassis} is the power related to the switch chassis, P_{linecard} is the power consumed by a single line card, n_c is the number of line cards plugged into the switch, P_p^r is the power consumed by a port running at rate r, n_p^r is the number of ports operating at rate r, and u_p^r ∈ [0, 1] is the port utilization, which can be defined as follows:

u_p = \frac{1}{T}\int_t^{t+T} \frac{B_p(t)}{C_p}\,dt = \frac{1}{T \cdot C_p}\int_t^{t+T} B_p(t)\,dt, \quad (8.4)

where B_p(t) is the instantaneous throughput at the port's link at time t, C_p is the link capacity, and T is the time interval between measurements.
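A minimal sketch of Equations (8.3) and (8.4) follows, with the integral of Eq. (8.4) approximated by an average over periodic throughput samples; all device parameters below are illustrative assumptions:

def port_utilization(throughput_samples, capacity):
    """Average port utilization u_p of Eq. (8.4): the time integral of
    B_p(t)/C_p approximated by the mean of periodic samples."""
    return sum(throughput_samples) / (len(throughput_samples) * capacity)

def switch_power(p_chassis, p_linecard, n_linecards, ports):
    """Switch power of Eq. (8.3); `ports` maps a rate label to a
    (number of ports, per-port power, utilization) triple."""
    power = p_chassis + n_linecards * p_linecard
    for n_ports, p_port, u in ports.values():
        power += n_ports * p_port * u
    return power

# Example: a ToR switch with 48 GE server ports and 2 x 10 GE uplinks
u = port_utilization([0.3e9, 0.5e9, 0.4e9], capacity=1e9)  # -> 0.4
print(switch_power(p_chassis=150.0, p_linecard=40.0, n_linecards=1,
                   ports={"1GE": (48, 0.5, u), "10GE": (2, 5.0, 0.6)}))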

8.2.2 Energy Efficiency

In an ideal data center, all the power would be delivered to the IT equipment executing user requests. This energy would then be divided between the communication and the computing hardware. Several studies have mistakenly considered the communication network as overhead, required only to deliver the tasks to the computing servers. However, as will be discussed later in this section, communications is at the heart of task execution, and the characteristics of the communication network, such as bandwidth capacity, transmission delay, delay jitter, buffering, loss ratio, and the performance of communication protocols, all greatly influence the quality of task execution.

Mahadevan et al. [23] present power benchmarking of the most common networking switches. With current network switch technology, the difference in power consumption between peak consumption and the idle state is less than 8%; turning off an unused port saves only 1–2 W [24]. The power consumption of a switch comprises three components: (1) power consumed by the switch base hardware (the chassis), (2) power consumed by active line cards, and (3) power consumed by active transceivers. Only the last component scales with the transmission rate, or with the presence of forwarded traffic, while the former two components remain constant, even when the switch is idle. Energy consumption that increases in proportion to the workload is known as energy proportionality, a property that today's switches thus largely lack [24].


Making network equipment energy proportional is one of the main challenges faced by the research community. Depending on the data center load level, the communication network can consume between 30 and 50% of the total power used by the IT equipment [21, 26], with 30% being typical for highly loaded data centers, whereas 50% is common for average load levels of 10–50% [27]. As with computing servers, most solutions for energy-efficient communication equipment depend on downgrading the operating frequency (or transmission rate) or powering down the entire device or its components in order to conserve energy. One solution, first studied by Shang et al. [25] and Benini et al. [28] in 2003, proposed a power-aware interconnection network utilizing dynamic voltage scaling (DVS) links [25]; this DVS technology was later combined with dynamic network shutdown (DNS) to further optimize energy consumption [29]. Refs. [30–34] review the challenges and some of the most important solutions for optimizing energy consumption and the use of resources.

The design of power-aware networks employing on/off links is challenging: there are issues with connectivity, adaptive routing, and potential network deadlocks [35]. Because the network always remains connected, such challenges do not arise when DVS links are used. Some recent proposals have combined traffic engineering with link shutdown functionality [36], but most of these approaches are reactive and may perform poorly in the event of unfavorable traffic patterns; a proactive approach is necessary for on/off procedures. A number of studies have demonstrated that simple optimization of the data center architecture and energy-aware scheduling, based on traffic management and workload consolidation techniques, can lead to significant energy savings of up to 75% [21].

8.2.3 Communication Networks

Communication systems have rarely been extensively considered in cloud computing research. Most cloud computing techniques evolved from the fields of cluster and grid computing, which are both designed to execute large computationally intensive jobs, commonly referred to as high-performance computing (HPC) [37]. However, cloud computing is fundamentally different: clouds satisfy the computing and storage needs of millions of users at the same time, yet each individual user request is relatively small. These users commonly need merely to read an email, retrieve an HTML page, or watch an online video. Such tasks require only limited computation, yet their performance is determined by the successful completion of the associated communication requests. Moreover, communications involves more than just the data center network; the data path from the data center to the user also constitutes an integral part of satisfying a communication request. Typical delays for processing users' requests, such as search, social networking, and video streaming, are less than a few milliseconds, and are sometimes even measured on the level of microseconds. Depending on the user location, however, network delays can be as large as 100 milliseconds for intercontinental links and up to 200 milliseconds if satellite links are involved [38]. As a result, a failure to consider the communication characteristics on an end-to-end basis can mislead the design and operational optimization of modern cloud computing systems.

Optimization of cloud computing systems and cloud applications will significantly reduce energy consumption not only inside data centers, but also globally, in the wide-area network. The world hosts around 1.5 billion Internet users [1] and 6 billion mobile phone users [2], and all of them are potential customers for cloud computing applications. On average, there are 14 hops between a cloud provider and end users on the Internet [39, 40]. This means that 13 routers are involved in forwarding the user traffic, each consuming from tens of watts to kilowatts [23]. According to Nordman [41], Internet-connected equipment accounts for almost 10% of the total energy consumed in the United States. Obviously, optimization of the flow of communication between data center providers and end users can make a significant difference. For example, widespread adoption of the new Energy-Efficient Ethernet standard IEEE 802.3az [42] can result in savings of €1 billion [43].

At the cloud user end, energy is becoming an even greater concern: more and more cloud users use mobile equipment (smart phones, laptops, tablet PCs) to access cloud services. The only efficient way for these battery-powered devices to save power is to power off most of the main components, including the central processor, transceivers, and memory, while also configuring sleeping cycles appropriately [44]. The aim is to decrease request processing time so that user terminals will consume less battery power. Smaller volumes of traffic arranged in bursts will permit longer sleeping times for the transceivers, and faster replies to cloud service requests will reduce the drain on batteries.

8.3 ENERGY EFFICIENT SYSTEM-LEVEL OPTIMIZATION OF DATA CENTERS

8.3.1 Scheduling

This section addresses issues related to scheduling, load balancing, data replication, virtual machine placement, and networking that can be capitalized on to reduce the energy consumption of data centers.

Job scheduling is at the heart of successful power management in data centers. Most of the existing approaches focus exclusively on the distribution of jobs between computing servers [45], the targeting of energy efficiency [46], or thermal awareness [47]. Only a few approaches consider the characteristics of the data center network [48–50], such as DPM-like power management [18].

Since energy savings result from such DPM-like power management procedures [18], job schedulers tend to adopt a policy of workload consolidation, maximizing the load on the operational computing servers and increasing the number of idle servers that can be put into the "sleep" mode. Such a scheduling policy works well in systems that can be treated as a homogeneous pool of computing servers, but data center network topologies require special policies. For example, in the most widely used data center architecture [51], the fat-tree architecture presented in Figure 8.2, blind concentration of the scheduled workload may end up grouping all of the highly loaded computing servers on a few racks, creating a bottleneck for network traffic at a rack or aggregation switch.

Moreover, at the rack level, all servers are usually connected using Gigabit Ethernet (GE) interfaces. A typical rack hosts up to 48 servers, but has only 2 links of 10 GE connecting it to the aggregation network. This corresponds to a mismatch of 48 GE/20 GE = 2.4 between the incoming and the outgoing bandwidth capacities.


Figure 8.2. Three-tier data center architecture (core, aggregation, and access networks).

In a data center hosting cloud applications with communication requirements, this means that the scheduler should trade off workload concentration against load balancing of network traffic.

Any of the data center switches may become congested in the uplink direction, the downlink direction, or both. In the downlink direction, congestion occurs when the capacity of individual ingress links surpasses that of the egress links. In the uplink direction, the bandwidth mismatch is primarily due to the oversubscription ratio, which arises when the combined capacity of the server ports exceeds the switch's aggregate uplink capacity.

Congestion (or hotspots) may severely affect the ability of a data center network to transport data. The Data Center Bridging Task Group (IEEE 802.1) [52] specifies layer-2 solutions for congestion control in the IEEE 802.1Qau standard. This standard introduces a feedback loop between data center switches to signal the presence of congestion. Such feedback allows overloaded switches to backpressure heavy senders by notifying them when congestion occurs. This technique can avoid some of the congestion-related losses and keep data center network utilization high. However, it does not address the problem adequately, as it is more efficient to assign data-intensive jobs to different computing servers so that those jobs can avoid sharing common communication paths. To benefit from such spatial separation in the three-tiered architecture (Fig. 8.2), these jobs must be distributed among the computing servers in proportion to their communication requirements. However, such an approach contradicts the objectives of energy-efficient scheduling, which tries to concentrate all of the active workloads on a minimum set of servers involving a minimum number of communication resources.

Another energy-efficient approach is the DENS methodology, which takes the potential communication needs of the components of the data center into consideration, along with the load level, to minimize the total energy consumption when selecting the best-fit computing resource for job execution. Communicational potential is defined as the amount of end-to-end bandwidth provided to individual servers or groups of servers by the data center architecture. Contrary to traditional scheduling solutions that model data centers as a homogeneous pool of computing servers [45], the DENS methodology develops a hierarchical model consistent with the state of the art in data center topologies. For a three-tier data center (see Fig. 8.2), the DENS metric M is defined as a weighted combination of server-level (f_s), rack-level (f_r), and module-level (f_m) functions:

M = \alpha \cdot f_s + \beta \cdot f_r + \gamma \cdot f_m, \quad (8.5)


where α, β, and γ are weighting coefficients that define the impact of the corresponding components (servers, racks, and modules) on the metric's behavior. Higher α values favor the selection of highly loaded servers within lightly loaded racks. Higher β values give priority to computationally loaded racks with low network traffic activity. Higher γ values favor the selection of loaded modules.

The selection of computing servers combines the server load L_s(l) and the communication potential Q_r(q) corresponding to the fair share of the uplink resources at the top-of-rack (ToR) switch. This relationship is given as follows:

f_s(l, q) = L_s(l) \cdot \frac{Q_r(q)^{\varphi}}{\delta_r}, \quad (8.6)

where L_s(l) is a factor depending on the load l of the individual server, Q_r(q) defines the load at the rack uplink by analyzing the congestion level in the switch's outgoing queue q, δ_r is a bandwidth overprovisioning factor at the rack switch, and φ is a coefficient defining the proportion between L_s(l) and Q_r(q) in the metric. Given that both L_s(l) and Q_r(q) must be within the range [0, 1], higher φ values will decrease the importance of the traffic-related component Q_r(q).

The fact that an idle server consumes merely two-thirds of its peak-load consumption [19] suggests that an energy-efficient scheduler must consolidate data center jobs on the minimum possible set of computing servers. On the other hand, keeping servers constantly running at peak loads may decrease hardware reliability and consequently affect job execution deadlines [53]. These issues are addressed with the DENS load factor, the sum of two sigmoid functions:

L_s(l) = \frac{1}{1 + e^{-10(l - 1/2)}} - \frac{1}{1 + e^{-(10/\varepsilon)(l - (1 - \varepsilon/2))}} \quad (8.7)

The first component in Equation (8.7) defines the shape of the main sigmoid, while the second serves to encourage convergence toward the maximum server load value (see Fig. 8.3). The parameter ε defines the size and the inclination of this falling slope, and the server load l is within the range [0, 1].

Figure 8.3. DENS metric selection of computing server (load factor L_s(l) versus server load l: high server utilization is favored, underloaded servers are penalized).

Figure 8.4 presents the combined server load and queue-size-related components. The bell-shaped function obtained favors the selection of servers with a load level above average located in racks with little or no congestion.

Figure 8.4. Server selection according to load and communication potential.
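The following sketch puts the server-level part of the DENS metric together, combining the load factor of Eq. (8.7) with the uplink component of Eq. (8.6); the mapping from queue occupancy q to Q_r(q) and all parameter values here are illustrative assumptions:

import math

def dens_load_factor(l, eps=0.05):
    """DENS load factor L_s(l) of Eq. (8.7): the main sigmoid favors
    highly utilized servers, while the second sigmoid adds a falling
    slope of width eps near full load."""
    main = 1.0 / (1.0 + math.exp(-10.0 * (l - 0.5)))
    fall = 1.0 / (1.0 + math.exp(-(10.0 / eps) * (l - (1.0 - eps / 2.0))))
    return main - fall

def dens_server_metric(l, q, q_max, delta_r=2.4, phi=2.0):
    """Server-level component f_s(l, q) of Eq. (8.6). Q_r(q) is taken
    here as 1 minus the normalized occupancy of the ToR uplink queue,
    so empty queues score highest (an assumed, simplified mapping)."""
    q_r = 1.0 - q / q_max
    return dens_load_factor(l) * (q_r ** phi) / delta_r

# A well-loaded server in an uncongested rack beats an idle one
print(dens_server_metric(0.8, q=0.1, q_max=1.0))
print(dens_server_metric(0.1, q=0.1, q_max=1.0))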

8.3.2 Load Balancing

Enabling the sleep mode in idle computing servers and network hardware is the most efficient method of avoiding unnecessary power consumption. Consequently, load balancing becomes the key enabler for saving energy.

However, changes in the power mode introduce considerable delays. Moreover, the inability to instantaneously wake up a sleeping server means that a pool of idle servers must be available to accommodate incoming loads in the short term and prevent quality-of-service (QoS) degradation. It should be remembered that data centers are required to provide a specific level of quality of service, defined in service-level agreements (SLAs), even at peak loads; therefore, they tend to overprovision computing and communication resources. In fact, on average, data centers function at only 30% of their capacity. The load in data centers is highly correlated with region and time of day, since more users are active during daytime hours; the number of users during the day is almost double that at night. Moreover, the user arrival rate is not constant, but can spike due to the crowd effect. Most of the time, almost 70% of data center servers, switches, and links remain idle, although during peak periods this usage can reach 90%. However, idle servers still need to run OS software, maintain virtual machines, and power both peripheral devices and memory. As a result, even when idle, servers still consume around two-thirds of their peak power consumption. In switches, this ratio is even higher, with the energy consumed being shared by the switch chassis, the line cards, and the transceiver ports. Moreover, various Ethernet standards require the uninterrupted transmission of synchronization symbols in the physical layer; the need to guarantee synchronization prevents the downscaling of energy consumption even when no user traffic is transmitted.

An energy-efficient scheduler for cloud computing applications with traffic load balancing can be designed to optimize the energy consumption of cloud computing data centers. One such scheduler is e-STAB, proposed in Ref. [54], which gives equal treatment to the communication demands and the computing requirements of jobs. Specifically, e-STAB aims at (i) balancing the communication flows produced by jobs and (ii) consolidating jobs using a minimum of computing servers. Since network traffic can be highly dynamic and often difficult to predict [55], the e-STAB scheduler analyzes both the load on the network links and the occupancy of the outgoing queues at the network switches. This queuing analysis helps prevent a buildup of network congestion; similar analysis, estimating the buffer occupancy of network switches, is already employed in various transport-layer protocols [56], which can react before congestion-related losses occur.

The e-STAB scheduling policy involves the execution of the following two steps for each incoming cloud computing data center job:

Step 1: Select a group of servers S connected to the data center network with the highest available bandwidth, if at least one of the servers in S can accommodate the computational demands of the scheduled job. The available bandwidth is defined as the unused capacity of the link or set of links connecting the group of servers S to the rest of the data center network.

Step 2: Within the selected group of servers S, select the computing server with the least available computing capacity that is still sufficient to satisfy the computational demands of the scheduled task (a minimal sketch of this two-step policy is given below).
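A minimal sketch of the two steps follows (not the authors' implementation; the data structures and names are assumptions):

def estab_select(groups, cpu_demand):
    """Two-step e-STAB server selection. `groups` is a list of
    (available_bandwidth, servers) pairs, where each server is an
    (available_cpu, server_id) tuple."""
    # Step 1: consider groups in order of decreasing available bandwidth.
    for bandwidth, servers in sorted(groups, key=lambda g: -g[0]):
        fitting = [s for s in servers if s[0] >= cpu_demand]
        if fitting:
            # Step 2: least available capacity that still fits the job,
            # which consolidates the workload on fewer servers.
            return min(fitting)[1]
    return None  # no server can accommodate the job

# Example: the job lands in the rack with the most uplink headroom
racks = [(10.0, [(0.2, "r1-s1"), (0.6, "r1-s2")]),
         (40.0, [(0.5, "r2-s1"), (0.9, "r2-s2")])]
print(estab_select(racks, cpu_demand=0.4))  # -> "r2-s1"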

One of the main goals of the e-STAB scheduler is to achieve load-balanced network traffic and to prevent network congestion. A helpful measure is the available bandwidth per computing node within the data center. However, such a measure does not capture the dynamics of the system, such as a sudden increase in the transmission rate of cloud applications.

To provide a more precise measure of network congestion, e-STAB scales the available bandwidth by a component related to the size of the bottleneck queue (see Fig. 8.5). This favors empty queues or queues with minimal occupancy and penalizes highly loaded queues that are on the threshold of buffer overflow (i.e., of losing packets).

By weighting the available bandwidth with the queue-related component Q(t), the available per-server bandwidth can be computed for modules and individual racks as follows:

F_{rj}(t) = \frac{1}{T}\int_t^{t+T} \frac{(C_{rj} - \lambda_{rj}(t)) \cdot e^{-(\rho \cdot q_{rj}(t)/Q_{rj,max})^{\varphi}}}{S_{rj}}\,dt, \quad (8.8)


where Q_rj(t) denotes the exponential weight associated with the occupancy level of the queues, q_rj(t) is the size of the queue at time t, and Q_{rj,max} is the maximum size of the queues allowed at rack j.

Figure 8.5. Queue-size related component of the STAB scheduler (Q(t) versus queue occupancy q(t)/Q_max).

Figure 8.6. Selection of racks and modules by the STAB scheduler (F_m and F_r versus queue occupancy q and link load λ).

Figure 8.6 presents the evolution of F_rj(t) for different values of the network traffic and buffer occupancy. The function is insensitive to the level of utilization of the network links for highly loaded queues, while for lightly loaded queues, links with a lighter load are preferred to heavily used ones.
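The per-server bandwidth metric of Eq. (8.8) can be sketched as follows, with the integral approximated by an average over (link load, queue size) samples; ρ and φ are illustrative weighting parameters:

import math

def rack_bandwidth_metric(samples, capacity, n_servers,
                          q_max=1.0, rho=16.0, phi=2.0):
    """Per-server available bandwidth F_rj of Eq. (8.8): the residual
    uplink bandwidth, exponentially discounted by queue occupancy and
    divided by the number of servers sharing the uplink."""
    total = 0.0
    for link_load, queue_size in samples:
        weight = math.exp(-((rho * queue_size / q_max) ** phi))
        total += (capacity - link_load) * weight / n_servers
    return total / len(samples)

# A lightly loaded rack uplink with near-empty queues scores highest
print(rack_bandwidth_metric([(2e9, 0.01), (3e9, 0.02)],
                            capacity=20e9, n_servers=48))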

Having selected a proper module and rack based on their traffic load and the congestion state indicated by the queue occupancy, we must select a computing server for the job execution. To do so, we must analyze the energy consumption profile of the servers.


Figure 8.7. Selection of computing servers by the STAB scheduler (F_Sk(t) versus server load).

Once the energy consumption of a server is known, it is possible to derive a metric to be used by the e-STAB scheduler for server selection, as follows:

F_{Sk}(t) = \frac{1}{T}\int_t^{t+T} \left[\frac{1}{1 + e^{-(10/\varepsilon)(l_k(t) - \varepsilon/2)}} - \frac{1}{2}\left(1 - \frac{P_{idle}}{P_{peak}}\right)\left(1 + l_k(t)^3 - e^{-(l_k(t)/\tau)^3}\right)\right] dt, \quad (8.9)

where l_k(t) is the instantaneous load of server k at time t and T is an averaging interval. While the second summand under the integral in Equation (8.9) is a reversed, normalized version of Equation (8.2), the first summand is a sigmoid designed to penalize the selection of idle servers for job execution. The parameter ε corresponds to the CPU load that an idle server requires to keep the operating system and virtual machines running. Figure 8.7 presents a chart of F_Sk(t).
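A sketch of Eq. (8.9) follows, with the integral approximated by an average over sampled loads; ε, τ, and the power figures are illustrative assumptions:

import math

def estab_server_metric(load_samples, p_idle=200.0, p_peak=300.0,
                        eps=0.05, tau=0.5):
    """Server-selection metric F_Sk of Eq. (8.9): a sigmoid that
    penalizes idle servers minus a normalized version of the DVFS
    power model of Eq. (8.2)."""
    total = 0.0
    for l in load_samples:
        sigmoid = 1.0 / (1.0 + math.exp(-(10.0 / eps) * (l - eps / 2.0)))
        power = 0.5 * (1.0 - p_idle / p_peak) * \
                (1.0 + l ** 3 - math.exp(-((l / tau) ** 3)))
        total += sigmoid - power
    return total / len(load_samples)

# Moderately loaded servers are preferred over idle or saturated ones
for l in (0.0, 0.3, 0.6, 0.9):
    print(l, round(estab_server_metric([l]), 3))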

Balancing the load in federated data centers has also been proposed as a way to reduce energy consumption. In Ref. [57], the authors propose an algorithm to migrate virtual machines to data centers that use renewable energy. They consider a cloud computing scenario with many data centers, some of them powered by brown (non-renewable) energy sources and others with access to green (renewable) energy. They propose an algorithm to decide on a set of long-lived VMs to be migrated to data centers with access to green energy, taking into consideration both the data center network topology and the energy consumption. They also consider the impact of migration and use renewable energy availability to enhance their strategy. They show that brown energy can be replaced by green energy with only a small increase in the overall consumption.

A comprehensive work on load balancing in distributed data centers is presented in Ref. [58]. Using optimization modeling, the authors present distributed algorithms for achieving optimal geographical load balancing. They also present a study of the effects of green and brown energy, highlighting the potential benefits of the former.

In Ref. [59], the authors propose an approach to relocate workload among distributed data centers. The algorithm is based on the electricity prices in the cities where the centers are located and on the cost of migration. The solution was modeled as an optimization problem. Using traces of social network applications and the real cost of electricity in different regions of the United States, the authors managed to reduce the average electricity cost. This work does not take into account the energy consumption of the data center network, modeling only the overall load of the data center.

The authors of Ref. [60] propose a framework called stochastic power reduction scheme (SAVE) for geographically distributed data centers. The approach was designed for delay-tolerant workloads, such as MapReduce jobs, and two different techniques are used for achieving energy savings: switching off unused hosts and DVFS. The proposed solution was modeled as an optimization problem in which each data center is represented as a set of physical hosts. Jobs can be executed either in the data center to which they were submitted or in another data center if the cost of the energy consumed there is lower.

8.3.3 Data Replication

The performance of cloud computing applications, such as gaming, voice and video conferencing, online office, storage, backup, and social networking, depends largely on the availability and efficiency of high-performance communication resources. For better reliability and low-latency service provisioning, data resources can be brought closer (replicated) to the physical infrastructure where the cloud applications are running. A large number of replication strategies for data centers have been proposed in the literature [61–65]. These strategies optimize system bandwidth and data availability between geographically distributed data centers. However, none of them focuses on energy efficiency and replication techniques inside data centers.

In Ref. [61], an energy-efficient data replication scheme has been proposed for data center storage. Underutilized storage servers can be turned off to minimize energy consumption, although at least one replica server must be kept for each data object to guarantee availability. In Ref. [62], dynamic data replication in a cluster of data grids is proposed. This approach creates a policy maker, which is responsible for replica management. It periodically collects information from the cluster heads, with significance determined by a set of weights selected according to the age of the reading. The policy maker further determines the popularity of a file based on its access frequency. To achieve load balancing, the number of replicas for a file is computed in relation to the access frequencies of all other files in the system. However, this solution follows a centralized design approach, leaving it vulnerable to a single point of failure.

Other proposals have concentrated on replication strategies between multiple data centers. In Ref. [63], power consumption in the backbone network is minimized by linear programming to determine the optimal points of replication on the basis of data center traffic demands and the popularity of data objects. The relation of the traffic load to the power consumption at aggregation ports is linear; consequently, optimization approaches that consider the traffic demand can bring significant power savings.

Another proposal for replication [64] is designed to conserve energy by replicating data closer to consumers in order to minimize delays. The optimal location for the replicas of each data object is determined by periodically processing a log of recent data accesses. The replica site is then determined by employing a weighted k-means clustering of user locations and deploying the replica closer to the centroid of each cluster. Migration takes place from one site to another if the gain in quality of service from the migration is higher than a predefined threshold.

Another approach is cost-based data replication [65]. This approach analyzes failures in data storage and the probability of data loss, which are directly related to each other, and builds a reliability model. Time points for replica creation are then determined from the data storage reliability function.

The approach presented in Ref. [66] differs from all the other replication approaches discussed earlier in (i) the scope of the data replication, which is implemented both within a single data center and between geographically distributed data centers, and (ii) the optimization target, which takes into account system energy consumption, network bandwidth, and communication delay to define the replication strategy to be employed.

Large-scale cloud computing systems comprise data centers geographically distributed around the globe (see Fig. 8.8). The central database (Central DB) is located in the wide-area network and hosts all the data required by the cloud applications. To speed up database access and reduce access latency, each data center hosts a local database, called a data center database (Data center DB), which is used to replicate the most frequently used data items from the central database. Moreover, each rack hosts at least one server capable of running a local rack-level database (Rack DB), which is used for subsequent replication from the data center database.

Figure 8.8. Replication in cloud computing data centers. All database requests produced by the cloud applications running on computing servers are first directed to the rack-level database server. Rack DB either replies with the requested data or forwards the request to the Data center DB. In a similar fashion, the Data center DB either satisfies the request or forwards it up to the Central DB.

When data are requested, information about the requesting server, rack, and data center is stored. Moreover, statistics on the number of accesses and updates are maintained for each data item. The access rate (or popularity) is measured as the number of access events per period of time. While accessing data items, cloud applications can also modify them. Such modifications must be sent back to the database so that all replica sites can be updated.

A module located at the central database, the replica manager, periodically analyzes the data access statistics to identify which items are the most suitable for replication and at which replication sites. The availability of these access and update statistics makes it possible to project data center bandwidth usage and energy consumption.
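The hierarchical access pattern of Figure 8.8 can be sketched as follows; the plain-dictionary databases and the access counter kept for the replica manager are simplifying assumptions:

def lookup(item, rack_db, dc_db, central_db, access_stats):
    """Resolve a request at the closest replica level, following the
    Rack DB -> Data center DB -> Central DB hierarchy of Figure 8.8,
    while recording access statistics for the replica manager."""
    access_stats[item] = access_stats.get(item, 0) + 1
    for level, db in (("rack", rack_db), ("datacenter", dc_db),
                      ("central", central_db)):
        if item in db:
            return level, db[item]
    raise KeyError(item)

# Example: the item is served locally once replicated to the rack
stats = {}
print(lookup("profile:42", {"profile:42": "data"}, {}, {}, stats))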

Figure 8.9 presents the downlink bandwidth requirements of replication. Since the required bandwidth is proportional to both the size of a data item and its update rate, bandwidth consumption grows rapidly and easily overtakes the corresponding capacities of the core, aggregation, and access segments of the data center network.

Figure 8.9. Downlink bandwidth requirements (bandwidth versus data size for access rates Ra = 0.2–1.0, against the access network at 1024 Gb/s, the aggregation network at 640 Gb/s, and the core network at 320 Gb/s).

Figure 8.10 reports the trade-off between data center energy consumption, including the consumption of both the servers and the network switches, and the downlink residual bandwidth. For all replication scenarios, the core layer reaches saturation first, since it is the smallest of the data center network segments, with a capacity of only 320 Gb/s. The residual bandwidth of all network segments generally decreases as the load increases, except for the gateway link, whose available bandwidth remains constant in both the Data center DB and Rack DB replication scenarios, since data queries are processed at the replica databases and only data updates are routed from the Central DB to the Data center DB. The benefit of Rack DB replication is twofold: on the one hand, network traffic can be restricted to the access network, which has lower nominal power consumption and higher network capacity; on the other hand, data access becomes localized, thus improving the performance of cloud applications.

Figure 8.10. Energy and residual bandwidth of the gateway, core, aggregation, and access segments for (a) Central DB, (b) Data center DB, and (c) Rack DB replication scenarios.

8.3.4 Placement of Virtual Machines

Virtualization represents a key technology for the efficient operation of cloud data centers. The energy consumption of virtualized data centers can be reduced by appropriate decisions about the physical server on which each virtual machine should be placed. Virtual machine consolidation strategies try to use the lowest possible number of physical machines to host a given number of virtual machines. Some proposed strategies are described next.

In Ref. [67], the authors developed a strategy for traditional three-tier data center architectures that takes into consideration the energy consumption of both servers and network switches. The proposed strategy analyzes the load of each network switch to avoid overloading it, and seeks a compromise between load balancing of data center network traffic and consolidation of virtual machines. Such a compromise is important for the operation of data centers running jobs that impose a low computational load but produce heavy traffic streams.

The problem of virtual machine placement has been addressed by different formulations of the bin-packing problem. The proposal in Ref. [46] employs a variation of the best fit decreasing algorithm. Although, in this case, only the energy consumption of servers is considered, the results showed potential energy savings without a significant number of violations of service-level agreements. In Ref. [68], a heuristic is proposed to achieve server utilization close to an optimal level determined by computing the Euclidean distance of the allocation state. A first fit decreasing strategy was employed in Ref. [69] for data centers processing Web search and MapReduce applications. The consolidation approach is based on the analysis of CPU usage and favors the placement of correlated virtual machines on distinct physical servers to avoid overloading the servers.
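As an illustration of the bin-packing view of consolidation, here is a minimal first fit decreasing sketch (homogeneous hosts and normalized CPU demands are simplifying assumptions):

def first_fit_decreasing(vm_demands, host_capacity=1.0):
    """Place VMs on the fewest hosts: sort demands in decreasing order
    and assign each VM to the first host with enough residual capacity,
    opening a new host when none fits."""
    residual = []     # free capacity of each active host
    placement = {}    # vm -> host index
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for host, free in enumerate(residual):
            if free >= demand:
                residual[host] -= demand
                placement[vm] = host
                break
        else:
            residual.append(host_capacity - demand)
            placement[vm] = len(residual) - 1
    return placement, len(residual)

# Four VMs fit on two hosts instead of four
print(first_fit_decreasing({"a": 0.6, "b": 0.5, "c": 0.3, "d": 0.4}))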

The formulation of the virtual machine placement problem presented in Ref. [70] includes active cooling control besides the traditional approaches such as DPM and DVFS. This work also does not take into account the contribution of network switches to the energy consumption of a data center, but it shows that active cooling control results in small, but relevant, gains.

The work in Ref. [71] promotes energy reduction by consolidating network flows instead of virtual machines; only the consumption of the network switches is considered. Correlated flows are analyzed and assigned to network paths in a greedy way. This approach employs link rate adaptation and the shutting down of switches with low utilization. Results derived from simulations based on real traces of Wikipedia traffic demonstrated that this approach can indeed reduce energy consumption.

8.3.5 Communications Infrastructure

The energy efficiency of a data center also depends on the underlying communication infrastructure. Indeed, at the average load level of a data center, the communication network consumes between 30 and 50% of the total power used by the IT equipment; this in turn represents roughly 40% of the total energy budget.

Moreover, an analysis of the distribution of data traffic in clouds suggests that the majority of the traffic is transferred within the data center itself (around 75%), with the rest being split between communication with users (18%) and data center to data center exchanges (7%) [72].

Based on these facts, the need to develop energy-efficient communication technologies and architectures to interconnect the servers in data centers is clear. Since high speed and high capacity are required, the most suitable communication technology for cloud data centers is optical. The remainder of this section presents some possible architectures addressing energy-efficient solutions for internal communications in data centers.

Optical interconnection networks are a novel alternative technology providing high bandwidth, low latency, and reduced power consumption. Until recently, such optical technology had been used only for point-to-point links (fiber optics) connecting the electrical switches, thus reducing noise and leaving smaller footprints. However, since the switches operate in the electrical domain, power-hungry electrical-to-optical (E/O) and optical-to-electrical (O/E) transceivers are required.

New modules connecting the silicon chip directly with optical fibers have been developed, thus enabling switching to be performed in the optical domain.

Optical interconnections can be based on circuit switching or packet switching, each generating a different trade-off between energy and performance. Solely in terms of energy efficiency, optical circuit switching represents the most efficient solution, but it suffers from high reconfiguration times due to the nature of circuit switching. On the other hand, packet switching, although less energy efficient and potentially a source of greater latency, achieves better overall performance, since its reconfiguration time is lower and its scalability higher.

One recent alternative is the use of optical OFDM. Optical OFDM distributes the data over a large number of low-data-rate subcarriers, and can thus provide fine-granularity capacity to connections through the elastic allocation of subcarriers according to connection demands.

The use of optical OFDM as a bandwidth-variable and highly spectrum-efficient modulation format can provide scalable and flexible sub- and super-wavelength granularity, compared with the conventional fixed-bandwidth, fixed-grid WDM network. However, this new concept poses new challenges for routing and wavelength assignment: traditional algorithms for routing and wavelength assignment will no longer be directly applicable to such new kinds of communication infrastructure.

8.4 CONCLUSIONS AND OPEN CHALLENGES

Costs and operating expenses have become a growing concern in the cloud computing industry, with energy consumption accounting for a large percentage of the operational expenses in the data centers used as backend computing infrastructure. This chapter has emphasized the role of communications in this consumption, along with the importance of network awareness, and has presented solutions for energy-efficient resource allocation in clouds.

The challenge of energy efficiency will largely determine the future of cloud computing systems, which are at present experiencing unprecedented growth. Most of the existing energy-efficiency and performance optimization solutions in the IT domain focus on computing, with communications-related processes relegated to a secondary role or unaccounted for. In reality, however, communications are at the heart of cloud systems, and network characteristics, such as bandwidth capacity, transmission delay, delay jitter, buffering, loss rate, and the performance of communication protocols, often determine the quality of task execution. Moreover, most current research is restricted to processes inside data centers, yet the models must also account for communication dynamics in the wide-area network and at the user end.

Open research challenges are essentially related to improving the energy scalability of cloud computing. The previous sections have underlined the need for the joint optimization of computing and communication while maintaining an appropriate balance between performance and energy consumption for the overall architecture.


The following specific research challenges have been identified:

• Integration of novel and more efficient energy consumption models for the different components of the cloud computing architecture. As the concept of energy-proportional computing is emerging in the design of computing hardware and software infrastructures, it is also becoming relevant in the design of communication equipment. These emerging models will drive the need for improved and innovative approaches for the joint optimization and balancing of performance and energy consumption in cloud computing.

• The concept of the Mobile Cloud, deriving from the clear trend toward user mobility (and the "always on" paradigm) and the availability of ever more powerful devices in the hands of cloud services' users, is shaping the possibility of even more pervasive usage of the cloud computing infrastructure. Users' requests for 24/7 availability of cloud services, even in sparsely "covered" areas, will lead to a redefinition, or at least an evolution, of the cloud architecture, which will involve the need for efficient dissemination of both information and services across the Internet, whether in data centers, on users' devices, or somewhere in between. This is sure to have an impact on the way data are replicated and services are provided.

REFERENCES

1. Internet World Statistics, available at http://www.internetworldstats.com. Accessed November 18, 2013.

2. J. Ekholm and S. Fabre, “Forecast: Mobile Data Traffic and Revenue, Worldwide, 2010–2015,” Market report, Gartner Inc., 2011.

3. Google Glass project, available at https://plus.google.com/111626127367496192147. Accessed November 18, 2013.

4. A. Weiss, “Computing in the clouds,” netWorker, vol. 11, no. 4, pp. 16–25, 2007.

5. “State of the Data Center 2011,” Emerson Network Power, Columbus, OH, 2011.

6. X. Fan, W.-D. Weber, and L. A. Barroso, “Power provisioning for a warehouse-sized computer,” Proceedings of the ACM International Symposium on Computer Architecture, San Diego, CA, June 2007.

7. R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, and X. Zhu, “No ‘Power’ Struggles: Coordinated Multi-level Power Management for the Data Center,” ASPLOS, ACM, New York, 2008.

8. Interactive Data Corporation, available at: http://www.interactivedata.com/. Accessed November 18, 2013.

9. Gartner Group, available at: http://www.gartner.com/. Accessed November 18, 2013.

10. A. Vasan and A. Sivasubramaniam, “Worth their watts? An empirical study of data center servers,” IEEE 16th International Symposium on High Performance Computer Architecture (HPCA), Bangalore, India, January 9–14, 2010.

11. IDC, “Worldwide Server Power and Cooling Expense 2006–2010,” Market Analysis, 2006.

12. J. G. Koomey, “Growth in Data Center Electricity Use 2005 to 2010,” Analytics Press, Oakland, CA, 2011.


13. “Reducing Data Center Cost with an Air Economizer,” Intel IT@Intel Brief, August 2008.

14. N. Rasmussen, “Calculating Total Cooling Requirements for Data Centers,” White Paper #25, American Power Conversion, 2007.

15. The Green Grid Consortium, available at http://www.thegreengrid.org/. Accessed November 18, 2013.

16. A. Rawson, J. Pfleuger, and T. Cader, “Green Grid Data Center Power Efficiency Metrics: PUE and DCIE,” edited by C. Belady, White Paper #6, The Green Grid, 2008.

17. R. Brown et al., “Report to congress on server and data center energy efficiency: public law 109-431,” Lawrence Berkeley National Laboratory, Berkeley, 2008.

18. J. Liu, F. Zhao, X. Liu, and W. He, “Challenges Towards Elastic Power Management in Internet Data Centers,” Proceedings of the 2nd International Workshop on Cyber-Physical Systems (WCPS 2009), in conjunction with ICDCS 2009, Montreal, Quebec, June 2009.

19. G. Chen, W. He, J. Liu, S. Nath, L. Rigas, L. Xiao, and F. Zhao, “Energy-aware server provisioning and load dispatching for connection-intensive internet services,” 5th USENIX Symposium on Networked Systems Design and Implementation, Berkeley, CA, April 16–18, 2008.

20. Server Power and Performance characteristics, available at: http://www.spec.org/power_ssj2008/. Accessed November 18, 2014.

21. J. Pouwelse, K. Langendoen, and H. Sips, “Energy priority scheduling for variable voltage processors,” International Symposium on Low Power and Design, ACM, New York, pp. 28–33, Huntington Beach, CA, August 6–7, 2001.

22. L. Benini, A. Bogliolo, and G. De Micheli, “A survey of design techniques for system-level dynamic power management,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 3, pp. 299–316, June 2000.

23. P. Mahadevan, P. Sharma, S. Banerjee, and P. Ranganathan, “A power benchmarking framework for network devices,” IFIP Networking, May 2009.

24. B. Heller, S. Seetharaman, P. Mahadevan, Y. Yiakoumis, P. Sharma, S. Banerjee, and N. McKeown, “ElasticTree: saving energy in data center networks,” 7th USENIX Conference on Networked Systems Design and Implementation (NSDI), USENIX Association, Berkeley, CA, 2010.

25. L. Shang, L.-S. Peh, and K. N. Jha, “Dynamic voltage scaling with links for power optimisation of interconnection networks,” International Symposium on High Performance Computer Architecture, Anaheim, CA, February 12, 2003.

26. D. Kliazovich, S. T. Arzo, F. Granelli, P. Bouvry, and S. U. Khan, “Accounting for load variation in energy-efficient data centers,” IEEE International Conference on Communications (ICC), Budapest, Hungary, June 9–13, 2013.

27. D. Abts, M. Marty, P. Wells, P. Klausler, and H. Liu, “Energy Proportional Datacenter Networks,” Proceedings of the International Symposium on Computer Architecture, pp. 338–347, Saint-Malo, France, June 19–23, 2010.

28. L. Benini and G. D. Micheli, “Powering networks on chips: Energy-efficient and reliable interconnect design for SoCs,” International Symposium on Systems Synthesis, ACM, New York, pp. 33–38, Montreal, Canada, October 1–3, 2001.

29. J. S. Kim, M. B. Taylor, J. Miller, and D. Wentzlaff, “Energy characterization of a tiled architecture processor with on-chip networks,” International Symposium on Low Power Electronics and Design, ACM, New York, pp. 424–427, Seoul, Korea, August 2003.


30. M. A. Sharkh, M. Jammal, A. Shami, and A. Ouda, “Resource allocation in a network-based cloud computing environment: design challenges,” IEEE Communications Magazine, vol. 51, no. 11, pp. 46–52, November 2013.

31. X. Leon and L. Navarro, “Limits of energy saving for the allocation of data center resources to networked applications,” IEEE INFOCOM, pp. 216–220, Shanghai, China, April 10–15, 2011.

32. B. Guenter, N. Jain, and C. Williams, “Managing cost, performance, and reliability tradeoffs for energy-aware server provisioning,” IEEE INFOCOM, pp. 1332–1340, Shanghai, China, April 10–15, 2011.

33. J. Doyle, R. Shorten, and D. O’Mahony, “Stratus: load balancing the cloud for carbon emissions control,” IEEE Transactions on Cloud Computing, vol. 1, no. 1, pp. 1, January–June 2013.

34. L. Hongyou, W. Jiangyong, P. Jian, W. Junfeng, and L. Tang, “Energy-aware scheduling scheme using workload-aware consolidation technique in cloud data centres,” China Communications, vol. 10, no. 12, pp. 114–124, December 2013.

35. J. Duato, “A theory of fault-tolerant routing in wormhole networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 8, pp. 790–802, August 1997.

36. G. Wei, J. Kim, D. Liu, S. Sidiropoulos, and M. Horowitz, “A variable frequency parallel I/O interface with adaptive power-supply regulation,” Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1600–1610, 2000.

37. S. K. Garg, Chee Shin Yeo, A. Anandasivam, and R. Buyya, “Energy-efficient scheduling of HPC applications in cloud computing environments,” CoRR, abs/0909.1146, 2009.

38. B. Huffaker, D. Plummer, D. Moore, and K. Claffy, “Topology discovery by active probing,” Symposium on Applications and the Internet (SAINT), IEEE Computer Society, Washington, DC, pp. 90–96, 2002.

39. M. E. Crovella and R. L. Carter, “Dynamic server selection in the Internet,” Third IEEE Workshop on the Architecture and Implementation of High Performance Communication Subsystems (HPCS), pp. 158–162, August 23–25, 1995.

40. X. Chen, L. Xing, and Q. Ma, “A distributed measurement method and analysis on Internet hop counts,” 2011 International Conference on Computer Science and Network Technology (ICCSNT), IEEE, pp. 1732–1735, Harbin, China, December 24–26, 2011.

41. B. Nordman, “What the real world tells us about saving energy in electronics,” Proceedings of the 1st Berkeley Symposium on Energy Efficient Electronic Systems (E3S), Berkeley, CA, May 2009.

42. IEEE Std 802.3az-2010, “Media access control parameters, physical layers, and management parameters for energy-efficient Ethernet,” pp. 1–302, October 27, 2010.

43. K. Christensen, P. Reviriego, B. Nordman, M. Bennett, M. Mostowfi, and J. A. Maestro, “IEEE 802.3az: the road to energy efficient Ethernet,” IEEE Communications Magazine, vol. 48, no. 11, pp. 50–56, November 2010.

44. G. Y. Li, Z. Xu, C. Xiong, C. Yang, S. Zhang, Y. Chen, and S. Xu, “Energy-efficient wireless communications: tutorial, survey, and open issues,” IEEE Wireless Communications, vol. 18, no. 6, pp. 28–35, December 2011.

45. Y. Song, H. Wang, Y. Li, B. Feng, and Y. Sun, “Multi-tiered on-demand resource scheduling for VM-based data center,” IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID), pp. 148–155, Shanghai, China, May 18–21, 2009.


46. A. Beloglazov and R. Buyya, “Energy Efficient Resource Management in Virtualized Cloud Data Centers,” IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), pp. 826–831, Melbourne, Australia, May 18–21, 2010.

47. Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, “Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: a cyber-physical approach,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 11, pp. 1458–1472, November 2008.

48. M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, “Hedera: dynamic flow scheduling for data center networks,” Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation (NSDI’10), San Jose, CA, April 2010.

49. A. Stage and T. Setzer, “Network-aware migration control and scheduling of differentiated virtual machine workloads,” Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, International Conference on Software Engineering, IEEE Computer Society, Washington, DC, May 2009.

50. X. Meng, V. Pappas, and L. Zhang, “Improving the scalability of data center networks with traffic-aware virtual machine placement,” IEEE INFOCOM, San Diego, CA, March 2010.

51. Cisco, “Cisco Data Center Infrastructure 2.5 Design Guide,” Cisco Press, March 2010.

52. IEEE 802.1 Data Center Bridging Task Group, available at: http://www.ieee802.org/1/pages/dcbridges.html. Accessed November 18, 2014.

53. C. Kopparapu, “Load Balancing Servers, Firewalls, and Caches,” John Wiley & Sons Inc., New York, 2002.

54. D. Kliazovich, S. T. Arzo, F. Granelli, P. Bouvry, and S. U. Khan, “e-STAB: Energy-efficient scheduling for cloud computing applications with traffic load balancing,” IEEE International Conference on Green Computing and Communications (GreenCom), pp. 7–13, Beijing, China, August 20–23, 2013.

55. A. Sang and S.-q. Li, “A predictability analysis of network traffic,” Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), vol. 1, pp. 342–351, 2000.

56. C. Barakat, E. Altman, and W. Dabbous, “On TCP performance in a heterogeneous network: a survey,” IEEE Communications Magazine, vol. 38, no. 1, pp. 40–46, January 2000.

57. U. Mandal, M. Habib, S. Zhang, B. Mukherjee, and M. Tornatore, “Greening the cloud using renewable-energy-aware service migration,” IEEE Network, vol. 27, no. 6, pp. 36–43, November 2013.

58. Z. Liu, M. Lin, A. Wierman, S. Low, and L. Andrew, “Greening geographical load balancing,” IEEE/ACM Transactions on Networking, vol. PP, no. 99, pp. 1, 2014.

59. M. Ilyas, S. Raza, C.-C. Chen, Z. Uzmi, and C.-N. Chuah, “Red-bl: energy solution for loading data centers,” INFOCOM, 2012 Proceedings IEEE, pp. 2866–2870, Orlando, FL, March 25–30, 2012.

60. Y. Yao, L. Huang, A. Sharma, L. Golubchik, and M. Neely, “Data centers power reduction: a two time scale approach for delay tolerant workloads,” INFOCOM, 2012 Proceedings IEEE, pp. 1431–1439, Orlando, FL, March 25–30, 2012.

61. B. Lin, S. Li, X. Liao, Q. Wu, and S. Yang, “eStor: energy efficient and resilient data center storage,” 2011 International Conference on Cloud and Service Computing (CSC), pp. 366–371, Hong Kong, China, December 12–14, 2011.


62. R.-S. Chang, H.-P. Chang, and Y.-T. Wang, “A dynamic weighted data replication strategy in data grids,” IEEE/ACS International Conference on Computer Systems and Applications, pp. 414–421, West Bay Lagoon, Doha, April 2008.

63. X. Dong, T. El-Gorashi, and J. M. H. Elmirghani, “Green IP over WDM networks with data centers,” Journal of Lightwave Technology, vol. 29, no. 12, pp. 1861–1880, June 2011.

64. F. Ping, X. Li, C. McConnell, R. Vabbalareddy, and J.-H. Hwang, “Towards optimal data replication across data centers,” International Conference on Distributed Computing Systems Workshops (ICDCSW), pp. 66–71, Minneapolis, MN, June 20–24, 2011.

65. W. Li, Y. Yang, and D. Yuan, “A novel cost-effective dynamic data replication strategy for reliability in cloud data centres,” International Conference on Dependable, Autonomic and Secure Computing (DASC), pp. 496–502, Sydney, Australia, December 12–14, 2011.

66. D. Boru, D. Kliazovich, F. Granelli, P. Bouvry, and A. Y. Zomaya, “Energy-Efficient Data Replication in Cloud Computing Datacenters,” Springer Cluster Computing, pp. 1–18, 2015.

67. D. Kliazovich, P. Bouvry, and S. U. Khan, “DENS: data center energy-efficient network-aware scheduling,” Cluster Computing, vol. 16, no. 1, pp. 65–75, 2013.

68. S. Srikantaiah, A. Kansal, and F. Zhao, “Energy aware consolidation for cloud computing,” Proceedings of the 2008 Conference on Power Aware Computing and Systems, ser. HotPower’08, USENIX Association, Berkeley, CA, pp. 10, 2008.

69. J. Kim, M. Ruggiero, D. Atienza, and M. Lederberger, “Correlation-aware virtual machine allocation for energy-efficient data centers,” Design, Automation Test in Europe Conference Exhibition (DATE), pp. 1345–1350, Grenoble, France, March 18–22, 2013.

70. D. G. d. Lago, E. R. M. Madeira, and L. F. Bittencourt, “Power-aware virtual machine scheduling on clouds using active cooling control and DVFS,” Proceedings of the 9th International Workshop on Middleware for Grids, Clouds and e-Science, ser. MGC’11, ACM, New York, pp. 2:1–2:6, Lisbon, Portugal, December 12, 2011.

71. X. Wang, Y. Yao, X. Wang, K. Lu, and Q. Cao, “Carpo: correlation-aware power optimization in data center networks,” INFOCOM, 2012 Proceedings IEEE, pp. 1125–1133, Orlando, FL, March 25–30, 2012.

72. Cisco, “Global cloud index: forecast and methodology, 2011–2016,” White Paper, Cisco, 2011.

Page 238: Cloud Services, Networking, and Management

“9780471697558c08” — 2015/3/20 — 11:59 — page 216 — #24

Page 239: Cloud Services, Networking, and Management

“9780471697558c09” — 2015/3/20 — 12:00 — page 217 — #1

9

PERFORMANCE MANAGEMENT AND MONITORING

Mark Shtern1, Bradley Simmons2, Michael Smit3, Hongbin Lu1, and Marin Litoiu2

1 Department of Computer Science and Engineering, York University, Toronto, Ontario, Canada
2 School of Information Technology, York University, Toronto, Ontario, Canada
3 School of Information Management, Dalhousie University, Halifax, Nova Scotia, Canada

9.1 INTRODUCTION

Organizations are transitioning from private data centers to infrastructure-as-a-service (IaaS)-style resource management, where resources are acquired on-demand from a large pool, managed internally (i.e., a private cloud) or by a third-party supplier (i.e., a public cloud). Interest is growing in creating a single computational fabric across a set of cloud providers, a multicloud [1–4]. Multiclouds are a natural evolution of cloud computing, also called the intercloud [5, 6] or clouds-of-clouds, in which multiple cloud systems (typically IaaS) are composed together to add value to users. For example, a private and a public cloud can be combined to address data privacy concerns while still enjoying some public cloud benefits (i.e., hybrid clouds, or public/private cloud overlays [7]). Multiple public clouds can be federated to improve availability [1], reduce lock-in, and optimize costs [8] beyond what can be achieved with a single cloud provider.


As this transition to multicloud progresses, there are several critical differences that affect application management: (i) Every application is sandboxed from every other application. (ii) For an individual application, the potential for resource contention is effectively zero. (iii) Resources can be acquired on-demand according to a pay-as-you-go pricing model. These differences permit applications to be more easily managed on a per-application basis, rather than managing the entire IT infrastructure of an organization as a whole. This results in a shift of responsibility from established practices. A set of cloud providers (or, for private clouds, the IT operations teams) manages the physical infrastructure and provides virtualized containers (for IaaS, virtual machines or VMs) to clients who wish to deploy applications. The client assumes responsibility for both the functional and nonfunctional quality of a deployed application; increasingly, the client is the development team, a scenario referred to as devops1 and/or noops.2,3,4 Devops relies on automation for cost-efficient management of software systems. We provide more details about this transition in Section 9.2.

These changes in operational context (i.e., private datacenter versus multicloud) motivate an evolution in the approach to management of applications. If developers are expected to manage nonfunctional aspects of their applications, there is value in supporting best practices with regard to the design and implementation of management logic and infrastructural support, while simultaneously incorporating established management best practices into the overall approach. Additionally, the developer should be shielded from the complexity of acquiring and releasing resources in the context of the multicloud. Finally, they should be able to harness their own domain-specific languages (DSLs) and intimate knowledge of the application in support of management objectives instead of being prescribed a particular approach.

We introduce the X-Cloud5 Application Management Platform (XCAMP), a platform that enables whoever has assumed responsibility for automating management (application developers, researchers, operations teams, and so on) to focus on how best to manage their application's runtime behavior (i.e., its management logic) rather than on the minutiae of running on a multicloud. In the classic MAPE-k model of autonomic systems [9], XCAMP implements and integrates both the Monitoring and Execution stages while placing the onus for Analyzing and Planning on the user. The management logic (i.e., the operational policies guiding the runtime behavior of the managed application) is specified in the language of the user's choice, using their preferred environment, and according to the methodology of their choice. In Section 9.3 we position our work in relation to the state of the art. We then describe the architecture of this platform and the challenges in managing the complexity of the multicloud in Section 9.4.

1 http://devopsdays.org
2 http://blogs.forrester.com/mike_gualtieri/11-02-07-i_dont_want_devops_i_want_noops
3 In devops, developers collaborate with the operations team to build and manage services, while in noops only the developers do this.
4 In situations where there is no operations team, devops is equivalent to noops. For the remainder of this chapter, we will simply refer to devops.
5 The X is pronounced "cross."


The main contribution of this chapter is the creation, definition, implementation, and evaluation of a novel approach to application management on multiclouds that confers autonomic properties on applications at runtime, embraces devops-style management, and facilitates experimentation with diverse autonomic management approaches (e.g., model-based, rules/threshold driven, classic control, etc.) while abstracting away many of the low-level cloud programming details and nuisances. An important use case for XCAMP will be as the management framework for the SAVI6 testbed7 to streamline the life-cycle management of applications on a novel cloud architecture and to simplify the process of deploying runtime management, facilitating research on this two-tier cloud system by noncloud experts and students. XCAMP has already been presented in a hands-on tutorial at the SAVI Annual General Meeting (2013) in Toronto, Canada, to a group of approximately 75 project members (i.e., students, researchers, and industrial participants).

To demonstrate the effectiveness of our framework we have implemented a prototype. We use this implementation to demonstrate the feasibility of our approach with an experiment demonstrating the autonomic cloud bursting of a legacy application.8 Additionally, we have run XCAMP on a two-tier cloud architecture, and we present a case study in which we diagnosed the root cause of a performance bottleneck observed on the SAVI testbed. Finally, an experiment measuring the throughput of our implementation, ensuring it is practical for managing large systems, is presented. The experiments (described in Section 9.6) effectively demonstrate the capabilities of our approach. Based on our implementation experience, we describe (Section 9.7) several ongoing challenges for management in the multicloud.

We close the chapter (Section 9.8) by offering concluding remarks.

9.2 BACKGROUND CONCEPTS

Historically, a company owned a set of dedicated resources (e.g., a private data center) upon which their business applications were run. Typically, there were many such applications, and how these applications behaved in relation to each other was of paramount importance. Specifically, issues of ownership and access were critical (i.e., could application A run on machine Z between 5 and 8 PM EST). Further, an IT operations team was responsible for ensuring both the security and operations of the physical infrastructure and also the effective functioning and security of all applications, including those developed in-house. Most applications ran on bare metal (i.e., servers), and a ceiling existed on total available resources that was relatively constant (unless machine upgrades were performed or new resources were added to the data center's footprint). Extensive work on management frameworks and methodologies supports these processes.

6 Smart Applications on Virtual Infrastructure (SAVI) is a national research project in Canada: http://savinetwork.ca.
7 The testbed implements a novel two-tier cloud architecture in which virtualized resources exist close to end users (i.e., the smart edge), allowing applications to access either low-latency resources near end users or standard public cloud data centers (i.e., the core).
8 That is, acquiring additional resources from a public cloud when a private cloud does not have sufficient resources to handle its workload [10].

As described in the introduction, cloud computing is fast removing many of the standard management barriers that once defined the IT landscape. For example, the requirement to carefully plan for capacity is being eclipsed by the ability to programmatically launch VMs using a pay-as-you-go model (i.e., the IaaS cloud) as required. The responsibility of managing the physical infrastructure has been separated from the responsibility for managing applications. This affords significant flexibility, allowing for the fine-tuned management of resource acquisition and release, and dynamic configuration of managed applications. This new-found infrastructural flexibility has allowed developers (or, has allowed managers to push developers) to focus on business-level considerations and effective operational strategies (e.g., devops) as an alternative to focusing on highly optimized and tuned code design. The adoption of cloud computing by medium and large enterprises is expected to accelerate as a growing number of suppliers build additional datacenters, as virtualization technologies continue to improve, and as faster networking links provide high-speed connectivity.

While this development of technologies supporting the cloud continues to accelerate, the challenge of how best to manage applications deployed to the cloud remains unresolved. For example, in 2013 the Amazon.com website went down for longer than 20 minutes.9 One popular approach to the management of applications (including those on clouds) is referred to as autonomic computing [9]. Autonomic computing was introduced as a way of dealing with the increasing complexity of systems. It is based on the concept of the autonomic nervous system, which in humans is responsible for the constant beating of the heart, among other things. The outwardly observable behavior of an autonomic application (i.e., one managed using this approach) is that of self-optimization, self-configuration, self-healing, and self-protection (i.e., self-*). This approach involves a key management component: the autonomic manager.

The autonomic manager is responsible for adjusting the behavior of an application in response to both runtime and management policy constraints. More precisely, an autonomic manager's function can be decomposed into a loop composed of four main phases: monitoring, analyzing, planning, and execution (i.e., the MAPE-k loop). In the monitoring phase, the autonomic manager monitors the performance of the application (and possibly the environment, etc.). In the analysis phase, the autonomic manager analyzes this data to build up an understanding of what possible strategies to apply to improve the application's state. In the planning phase, the autonomic manager selects a strategy from among the possible strategies. In the execution phase, the chosen strategy is implemented.
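To make the loop concrete, the following minimal sketch shows how the four phases might be wired together in Java. It is our own illustration, not any particular autonomic manager's implementation; the class, the thresholds, and the stubbed monitoring source are all hypothetical.

import java.util.ArrayDeque;
import java.util.Deque;

// Minimal, hypothetical sketch of a MAPE-k control loop; names are illustrative only.
public class AutonomicManager {

    enum Action { SCALE_OUT, SCALE_IN, NONE }

    // Shared knowledge: a sliding window of recent load samples.
    private final Deque<Double> knowledge = new ArrayDeque<>();

    // Monitor: a real manager would pull metrics from agents; stubbed here.
    double monitor() { return Math.random() * 2.0; }

    // Analyze: summarize recent observations into a single indicator.
    double analyze() {
        return knowledge.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    // Plan: choose an action based on thresholds (a trivial policy).
    Action plan(double meanLoad) {
        if (meanLoad > 1.0) return Action.SCALE_OUT;
        if (meanLoad < 0.3) return Action.SCALE_IN;
        return Action.NONE;
    }

    // Execute: apply the chosen action (stubbed as a log statement).
    void execute(Action action) {
        if (action != Action.NONE) System.out.println("Executing: " + action);
    }

    public void runLoop(int iterations) {
        for (int i = 0; i < iterations; i++) {
            knowledge.addLast(monitor());
            if (knowledge.size() > 10) knowledge.removeFirst(); // keep a sliding window
            execute(plan(analyze()));
        }
    }

    public static void main(String[] args) {
        new AutonomicManager().runLoop(20);
    }
}

The knowledge component is represented here only by the sliding window that all four phases share; in a real manager it would also hold topology, history, and policy state.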

A key characteristic of autonomic computing is automation. This is also true for devops. Devops and related approaches are used by major industry trend-setters (e.g., Amazon10 and Netflix11). Complementary to devops is the process referred to as continuous deployment, in which the release cycle is shortened from months to days (or even less). For example, Amazon.com deploys a release every 11.6 s [11]. Although traditional applications are faster to develop and deploy and easier to manage in clouds, autonomic applications are still difficult to design, implement, and deploy, and still require substantial knowledge and resources. A goal of this work is to make development and deployment of autonomic applications easier. XCAMP provides mechanisms for automating the life-cycle (i.e., deploy, manage, and undeploy) of not only the application, but also the management logic responsible for autonomically managing it.

9 http://venturebeat.com/2013/08/19/amazon-website-down

The first step in cloud adoption is often transitioning an on-site datacenter into a private cloud. However, private clouds, while providing many of the benefits of a general cloud (i.e., on-demand resources), lack many of the economies of scale inherent in public clouds, such as massive scale and freedom from equipment storage, maintenance, and personnel costs. As a result, both hybrid public–private clouds and cross-provider deployments are becoming more common. It is well known that one of the biggest challenges of constructing both hybrid clouds and/or the multicloud is the bridging together of multiple infrastructures. Difficulties include, but are not limited to, abstracting away the details of the various provider-specific syntaxes [12], unifying/normalizing the various pricing models [13], providing seamless monitoring across potentially quite disparate provider domains [14], and ensuring data ownership, privacy, locality, security, and so on. This motivates the need for abstraction of the low-level operations on the multicloud, a need XCAMP is designed to meet.

9.3 RELATED WORK

The notion of on-demand systems existed well before the advent of cloud computing [15, 16]. Noticing the scale and increasing complexity of systems, IBM introduced the notion of autonomic computing [9], which popularized the notion of a MAPE-k loop and self-* functionality. The concept of autonomics has also been considered by [17, 18]. These concepts can be understood as precursors and/or progenitors, in one way or another, of the current notion of the cloud.

Managing resources in this emerging cloud environment is a significant challenge; Jennings and Stadler enumerate key aspects of this challenge, including: "the scale of modern data centers; the heterogeneity of resource types and their interdependencies; the variability and unpredictability of the load; as well as the range of objectives of the different actors in a cloud ecosystem" [19]. As the cloud has begun to take shape, several toolkits and frameworks have been introduced as possible approaches to addressing the challenge of managing resources while extending the cloud's capabilities. Some well-known examples include Reservoir [20], OPTIMIS [21], Aneka [22], and VDC Planner [23]. Often, these approaches include notions of federation, multicloud, hybrid cloud, and so on. However, they are all devised from a more traditional perspective in which a deployment must be carefully designed and optimized in advance so that it may negotiate a correct SLA to ensure its requirements are met. Whereas these frameworks are forward-looking and require complex architectural components, we chose instead to focus on the cloud as it is presently available. This design choice allows us to help bring new users to the cloud and facilitates experimentation with various approaches to the design and implementation of management logic (e.g., model-based, rules and/or threshold driven, and classic control). Our focus was on facilitating management of applications by the developers, not on how to manage the cloud from the perspective of an infrastructure provider.

10 http://aws.amazon.com
11 http://www.netflix.com

An important aspect of facilitating a multicloud involves the notion of a broker [8, 24]. A broker acts to facilitate resource acquisition and release on behalf of a client application in response to its dynamic requirements at runtime. While in some cases, as demonstrated in this chapter, the management logic suffices to determine where to obtain resources from and where to release them to, in other scenarios, in which multiple potentially competing providers exist, a broker provides a logical component to obtain/release the best selection of resources as required. Therefore, we envision future integration between managers and brokers. The broker will be responsible for resource acquisition/release, while the manager will be responsible for application management tasks.

9.4 X-CLOUD APPLICATION MANAGEMENT PLATFORM

The design of the X-Cloud Application Management Platform (XCAMP) is based on the MAPE-k [9] loop, with framework components and developer-specified components working in collaboration to perform MAPE-k-based management of an application. The monitoring and executing portions of the loop are performed by framework components, while the analysis and planning portions are done by developer-specified management logic. This separation of concerns guides runtime operations and is presented graphically in Figure 9.1a. XCAMP was designed to work at multicloud scale (i.e., massive application deployments of thousands of nodes) and is able to support multiple application deployments simultaneously. XCAMP leverages a stream-processing paradigm to achieve scalability, fault tolerance, and reliability, and to provide a useful abstraction of streams (long sequences of records) to transfer metrics, key performance indicators (KPIs), and in general knowledge among the components, with each new tuple processed in transit by the various components. The following sections will provide an overview of the XCAMP architecture in terms of the MAPE-k loop and then delve more deeply into its components and the abstraction features of the platform. First, two usage scenarios will illustrate the use of the XCAMP platform, one focusing on the impact on a single application and the other from the perspective of a service provider.

9.4.1 Usage Scenarios

In this section we introduce two illustrative usage scenarios for XCAMP.


Figure 9.1. Conceptual overviews of XCAMP: the components with red, dashed borders are provided by the developer/deployer; components with solid lines are provided by the platform. Dashed lines with arrows represent the flows of data; solid lines represent relationships. Solid squares indicate VMIs, and their color indicates the provider from which they have been acquired. Inner blue rectangles indicate the management agent on the VMI. Arrows denote monitoring data and execution command flows. To simplify the presentation we focus on a single application deployment; however, as described in the usage scenarios, XCAMP is highly scalable and designed to handle multiple application deployments at the same time. (a) High-level architecture view. (b) High-level deployed application view. (c) A detailed look at the components implementing MAPE-k.


9.4.1.1 Scenario 1, Hybrid Clouds. Company A would like to deploy an application to their private cloud. However, they are constrained by a lack of resources to support it during peak periods of demand. They wish to create a hybrid cloud, using public cloud resources when private resources are exhausted, with resources being added and removed autonomically based on demand.

• Preparation: After deploying XCAMP to their private cloud12, they would register both their private cloud and Amazon Elastic Compute Cloud (EC2)13 with the platform. Next, they would create a deployment document that describes the layout of their application on cloud resources (i.e., this includes describing nodes, images, services, and communication links between services). They would capture their desired autonomic behavior in rules (e.g., one rule might be: when resource utilization in the web-tier exceeds 60%, add a node to the private cloud, unless resources are exhausted, then add a node to the public cloud). The rules use the terminology defined in their deployment document (e.g., web-tier) and can reference any metric captured by XCAMP. This set of rules is called Management Logic throughout this chapter and represents the management policies to be enforced, as implemented by an application capable of accepting monitored metric values at a specified URL, making management decisions based on this stream of metrics, and returning actions to effect change in the deployed application as needed. This Web-based application is implemented in whatever language the developer prefers and is deployed automatically by XCAMP into an appropriate container (e.g., Apache Tomcat).

• Deployment: The administrator then submits their deployment documents, together with the application and Management Logic, to the system, along with any additional automation scripts (i.e., to set up a database). XCAMP automatically instantiates cloud resources and dynamically builds the application according to the given descriptions. Upon instantiation, the platform automatically begins capturing metrics from all configured resources and feeding this stream of metrics to the Management Logic's defined URL.

• Runtime Management: As the Management Logic receives metrics from XCAMP, it returns (as needed) actions that are realized by XCAMP. In this scenario, the company would author their Management Logic application to detect increases in demand (as reflected by increases in utilization) and in response add application servers, first on the private cloud, then on Amazon EC2 when private resources are exhausted. XCAMP handles the process of adding resources, including creating instances (of the specific image) from the correct cloud provider (private cloud, EC2), dynamically installing the correct packages, instantiating the correct services, and connecting these new nodes within the application environment topology (i.e., adding them to the front-end load balancer and pointing them to the database). Similarly, as demand recedes, these resources can be automatically released and decommissioned.

12 An automated installation using from one to five VMs.
13 http://aws.amazon.com/ec2/

9.4.1.2 Scenario 2, Edge-Core Clouds. The SAVI two-tier cloud is made of edge nodes, close to the end user, and core nodes, located in a big data center. The architecture is meant to support low-latency and high-bandwidth applications. SAVI administrators want to provide a management service to the users of a testbed implementing this SAVI cloud architecture. The XCAMP platform must enable users to deploy and manage their applications while accommodating a broad range of practical experience (from novice to expert) with regard to deploying and/or managing applications on the SAVI cloud.

• Preparation: The administrators must deploy XCAMP to their two-tier cloud architecture, then provide the XCAMP front-end URL to their users. Administrators can decide where to place the initial deployment, on the edge or core nodes, and then author a deployment descriptor.

• Deployment: SAVI researchers submit their jobs (i.e., the application and Management Logic) through the Web interface or RESTful API. XCAMP deploys the application on the edge and core nodes, provisioning at the same time the connectivity among components.

• Runtime Management: XCAMP ensures all monitored details about a user's deployed application environment are collected and routed to the user's Management Logic, and accepts all commands issued in response, translating them to low-level actions and executing these actions. The management logic can act on application components on edge or core nodes. Typical actions include scaling out/in on edge and core nodes, live migration of VMs for load balancing, and so on. A key design pattern of XCAMP is its utilization of stream processing, which facilitates its ability to scale to massive size and support large numbers of nodes while collecting massive numbers of measurements about runtime performance and external monitored details. The entire SAVI testbed shares this single platform, avoiding duplication, which results in high utilization efficiency.

9.4.2 MAPE-k Loop View

The key XCAMP components and their position in the MAPE-k loop are presented in Figure 9.1c. In this section, we will describe these various components and their contributions to the management of a deployed application on the multicloud.

The Information Aggregation Service is the main interface for gathering information about the deployed applications, the environment, and external sources (e.g., Twitter, CloudyMetrics [13], CloudHarmony,14 and others). Collected information is streamed to the Notification Engine, which is used to forward an augmented stream of metrics to the various Management Logic components. In one path, data traverse the Abstraction Engine. Given information from the knowledge store that describes all existing deployed applications, each metric is translated to a more abstract form based on the terminology defined in the deployment document (e.g., an IP address is translated to a unique identifier that is marked as belonging to web-tier).

14 http://cloudharmony.com/

In a second path, data traverse the Plugin Engine (optionally first passing through the Abstraction Engine), where additional processing is applied to the data stream. For example, aggregation may be applied to individual server metrics, constructing tier-specific information (e.g., mean CPU utilization per cluster), or the archive of metrics hosted by the Information Aggregation Service may be queried to produce metric trends. The platform provides several plugins (e.g., calculating the cost of a deployment based on information from CloudyMetrics); the user may add their own. Information that leaves the Plugin Engine is specific to a given application, either using information from the Abstraction Engine or from the user-supplied plugins.
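As an illustration of the kind of processing a plugin might perform, the following sketch aggregates abstracted per-server CPU samples into a per-tier mean. It is our own example; XCAMP's actual plugin API is not shown in this chapter, so the record shape and method names are hypothetical.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plugin-style aggregation: per-server CPU samples in, per-tier means out.
public class TierCpuAggregator {

    // An abstracted metric record, after the Abstraction Engine has mapped
    // the server's IP address to a tier name from the deployment document.
    record Metric(String tier, String server, double cpuUtilization) {}

    // Compute the mean CPU utilization for each tier in the current window.
    static Map<String, Double> meanCpuPerTier(List<Metric> window) {
        Map<String, double[]> acc = new HashMap<>(); // tier -> {sum, count}
        for (Metric m : window) {
            double[] a = acc.computeIfAbsent(m.tier(), k -> new double[2]);
            a[0] += m.cpuUtilization();
            a[1] += 1;
        }
        Map<String, Double> means = new HashMap<>();
        acc.forEach((tier, a) -> means.put(tier, a[0] / a[1]));
        return means;
    }

    public static void main(String[] args) {
        List<Metric> window = List.of(
                new Metric("web-tier", "web-1", 62.0),
                new Metric("web-tier", "web-2", 48.0),
                new Metric("db-tier", "db-1", 35.0));
        System.out.println(meanCpuPerTier(window)); // {web-tier=55.0, db-tier=35.0}
    }
}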

The Management Logic represents developer-specified management directives (e.g., management policies [25]) designed to guide the runtime behavior of the application. The XCAMP framework does not place any restriction on how the Management Logic is expressed/implemented; it is run within a sandboxed container, on its own VM.

1  public class ManagerServletEx extends HttpServlet {
2    ...
3    public void doGet(HttpServletRequest request, HttpServletResponse res) {
4      ...
5      if (LOAD_ONE.equals(request.getParameter(METRIC_NAME)))
6        updateMetricValue(request.getParameter(SOURCE), LOAD_ONE,
                           toDouble(request.getParameter(VALUE)));
7      ...
8      if ((calculatedMeanLoadForAppTier < appTierScaleDownThreshold) &&
          (getSizeOfPublicFootprint() > MIN_PUBLIC_SIZE)) {
9        elasticScaleFootprint(PUBLIC_CONTAINER, size_public - SCALING_INCREMENT);
10       return;
11     ...
12   }
13   ...
14   public class ActionGenerator {
15     ...
16     public void elasticScaleFootprint(String tierName, int finalFootprintSize) {
17       ...
18       JSONMessage msg = generateJSONMessage(SCALE_FOOTPRINT, tierName,
                                               finalFootprintSize);
19       sendJSONResponse(msg);
20     }
21     ...
22   }


Figure 9.2. Sample code (with exception handling omitted to simplify readability) for the cloud bursting Management Logic implementation. This Management Logic is implemented as a Java servlet. The doGet method is called from the monitoring components of the XCAMP platform with updates about all relevant monitored metrics, and the response is either empty or carries an action to be performed by the XCAMP Execution Engine. On line 5, an update about load_one is processed for a particular node, in which the metric METRIC_NAME for the node SOURCE of the application topology is updated with the value VALUE. The management rule presented on line 8 is one of the four rules used to implement the elastic bursting strategy and can be stated informally as follows: IF the mean load for the application server tier is less than a threshold AND the size of the application server tier on the public cloud is greater than MIN_PUBLIC_SIZE, THEN scale down the public footprint of the application server tier by SCALING_INCREMENT. The ActionGenerator class on line 14 is used to generate and send JSON messages indicating what action the Execution Engine should take (e.g., line 18).


The Management Logic represents a combination of both the Analyze and Plan components of the MAPE-k loop. It offers a URL to which metrics are submitted, and responds with JSON-formatted actions that are passed to the Execution phase. Each developer uses best practices for filtering requests for their chosen platform (e.g., Java EE Filters) to decide which metrics reach the Web application logic. A Management Logic component is responsible for managing its own data store, if required. A partial excerpt from a Java-based implementation of Management Logic is presented in Figure 9.2.

The Execution Engine and the Deployment Service implement the Execution component of the MAPE-k loop. The Execution Engine accepts requests for changes from the Management Logic and converts them into high-level workflow statements. These statements are forwarded to the Deployment Service, which executes a set of lower-level workflows to implement the requested changes to the application's deployment and/or configuration. For example, the Management Logic might request an additional resource be added to its web-tier; the Execution Engine translates this request into a parameterized call to the Deployment Service, which creates and provisions the node and reconfigures the load balancer. The components collaboratively maintain a knowledge base of system state through the Knowledge Store component, which stores data about the historical, current, and predicted future state of the system.

9.4.3 Deployment View

The process of application deployment requires that the developer submit a declarative deployment document [26] that describes the application pattern to deploy [27], a deployable version of their application (e.g., a WAR file), and a Management Logic Web application (e.g., a WAR file). The submission component (not shown) passes this information to the Deployment Service, which deploys the application in accordance with the deployment document (using user-supplied credentials) and registers the deployment in the Knowledge Store. The Management Logic application is deployed and is automatically registered with XCAMP to receive pertinent information for its associated application. The deployer may specify external data sources from which information should also be retrieved. Application-level metrics are submitted to the Information Aggregation Service and will be available to the Management Logic.

The result is the automatic collection and pushing of well-formatted, high-level, abstract, consistent metrics to the Management Logic. Using whatever approach and methodology the developer prefers (ad hoc Java code, a Web interface wrapper to an existing management system, and so on), the analysis and planning steps are completed. If actions are required, a JSON message is passed to the Execution Engine, where it is de-abstracted and passed to the Deployment Service to modify the running deployment.

Once an application is deployed using XCAMP, its structure will be similar to those of the applications presented in Figure 9.1b. Functionally, an application deployment represents a complex graph of an application in which nodes are VMIs running on the various cloud providers and edges represent communication channels between these nodes. XCAMP deploys a management agent to each VMI in an application. This agent is responsible for transmitting collected monitored data to the Information Aggregation Service and for modifying configuration settings of the installed application stack, the operating system, and/or altering the set of installed applications on the VMI in response to commands from the Deployment Service.

To facilitate operations, XCAMP communicates directly with the various cloud provider APIs (e.g., AWS and Openstack), which allows it to perform operations like adding and removing instances, and to collect metrics from the provider when available. XCAMP also monitors data from sources other than cloud resources and passes them on to the Management Logic; for example, data from Twitter, CloudyMetrics, or CloudHarmony can be passed to the Management Logic to assist in decision making. For example, should there be a failure of a region in AWS, XCAMP will receive this status update. This data can be utilized by the Management Logic in order to make decisions about where to deploy nodes of the application. This might include transposing [1] application servers to alternative cloud providers on the fly, or simply avoiding launching new VMIs in affected regions.

9.4.4 Information Abstraction in XCAMP

A key contribution of XCAMP is the abstraction of low-level details that differ among various cloud providers, and a common metrics format. To illustrate how we hide complexity in XCAMP from the management logic, consider the following example of adding an instance to a deployed application.

After the Management Logic determines that an instance must be added to the application server tier of a deployed application, it sends a JSON-formatted message to the Execution Engine saying "There should be five servers similar to Web server A in cluster my-web-tier." The Execution Engine translates this declarative request for resources into a high-level workflow, which is passed to the Deployment Service with associated information (e.g., which application deployment to modify). The Deployment Service translates this into a low-level workflow as follows. First, the Deployment Service determines upon which cloud provider cluster my-web-tier is running. This allows it to connect to the correct cloud provider. It then requests two instances of the same configuration as Web server A (determined by the deployment document, e.g., m1.large) and deploys the management agents on both instances. The management agents are then instructed to deploy an identical software stack with the same configuration as Web server A. After the instances are fully configured and ready, the agents will begin streaming monitored data to the Information Aggregation Service, which will ultimately inform the Management Logic of the successful addition of resources to the deployed application.
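The chapter does not specify XCAMP's exact JSON schema, so the following sketch shows only a plausible shape for such a declarative request; all field names are illustrative.

// Hypothetical shape of a declarative scaling request; the actual XCAMP schema is
// not given in the chapter, so the field names here are assumptions.
public class ScaleRequestExample {

    static String scaleFootprint(String cluster, String template, int desiredSize) {
        return String.format(
                "{\"action\":\"SCALE_FOOTPRINT\",\"cluster\":\"%s\"," +
                "\"template\":\"%s\",\"desiredSize\":%d}",
                cluster, template, desiredSize);
    }

    public static void main(String[] args) {
        // "There should be five servers similar to Web server A in cluster my-web-tier."
        System.out.println(scaleFootprint("my-web-tier", "web-server-A", 5));
    }
}

The request is declarative (a desired end state) rather than imperative (a sequence of steps); the Execution Engine and Deployment Service are responsible for working out the difference between the current and desired footprints.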

9.4.5 Management Logic

Much work has been done in the domain of distributed systems management (see, e.g., the proceedings of IEEE/IFIP NOMS, IFIP/IEEE IM, and CNSM). Policy-based management [25, 28, 29] represents an approach to management in which management actions are decoupled from management logic and in which the management logic is interpreted at runtime. This affords a flexible management paradigm in which, as management logic changes, policies may be altered, thus facilitating the design of elegant autonomic systems. Often, a policy specification language [30–32] is used to encapsulate the rules that govern the behavior of the system. While we embrace the need for specification languages, especially in the context of large distributed systems and network management, we feel that their utility is tightly coupled to the actor who is tasked with using them and the specific environment in which they are used. In devops, we feel freedom should be given to the developers to do it their way and to take advantage of all the intimate details that they possess with regard to the inner workings of the application that they are managing. This is unlike traditional management situations, where system administrators, operations teams, and/or business people require a mechanism for automating the control of their systems that is semantically clear to them and does not place much emphasis on programming capabilities. Developers have an entire arsenal of tools, libraries, and methodologies for ensuring that a system is functionally correct. These can be embraced to ensure the nonfunctional requirements of a system are being met as well. Due to their experience with programming languages (and likely lack of experience with DSLs like policy specification languages), use of programming languages may be a preferred approach for specifying management logic. Further, our proposed approach does not in any way preclude the use of an existing policy specification language/PBM solution, which could readily be employed as the Management Logic component.

9.5 IMPLEMENTATION

To demonstrate the feasibility of our approach, we authored a proof-of-concept implementation of XCAMP. We leveraged existing libraries and frameworks where possible to allow us to focus on the abstraction task.

Monitoring components are built on the Misure [14] extensible, distributed, and scalable monitoring system. Due to its central importance in XCAMP, we provide a brief overview of it in Section 9.5.1. The Information Aggregation Service, Abstraction Engine, Plugin Engine, and Notification Engine were written as elements that use the stream-processing paradigm central to Misure to communicate and scale horizontally.

Execution is provided by the Execution Engine, which, like the other engines, is built on Misure, and by the deployment service, for which we used a customized version of the pattern-based deployment service15 (PDS) [26] developed by our team. Like Misure, the PDS plays an important role in XCAMP, and so we elaborate on it in Section 9.5.2. The Execution Engine connects to the deployment service via a RESTful API.

Analysis and Planning are provided by Management Logic applications, one per deployed application, running in a Java EE container (Tomcat) on a dedicated VMI. The responsibility for authoring the Management Logic application rests with the application developer/deployer; they submit WAR files for deployment. Any container is adequate for this purpose; others could be added with a straightforward extension to the current implementation.

15 https://github.com/ceraslabs/pattern-deployer; the customizations have been pulled into the master version.

9.5.1 Misure

In previous work, we defined a set of requirements for monitoring in heterogeneous federated clouds [14], defined a suitable architecture built on stream processing, and implemented a prototype solution. Based on an enhanced publish–subscribe pattern, the design and implementation allow for the gathering of metrics at any level (system, application, etc.) from disparate sources like Ganglia [33], SNMP sources, Amazon CloudWatch, and various Web APIs. These streams of metrics are transformed (aggregated, annotated, split, etc.) in transit and published to interested users as live streams (push type). Metrics are also persisted to long-term storage, which can be queried via an API (poll type). The prototype was evaluated on public clouds and found effective at handling metrics at scale with low infrastructure cost.

The core abstraction underlying Misure is stream processing, in the family of complex event processing [34]; as an abstract concept, it refers to the generation, manipulation, aggregation, splitting, and transformation of data organized in a long sequence of records. One example is Storm,16 a Twitter open-source project, on which Misure is built. Storm is billed as a "distributed, scalable, reliable, and fault-tolerant stream processing system," and can be used for stream processing, continuous computation, and distributed RPC.17

One of the key features of Storm is the effort to manage the complexity of distributed computation on realtime data entirely behind the scenes. This includes guaranteed message processing; aggressive resource management (garbage collecting defunct processes); fault detection and task reassignment after failure; efficient and scalable message transmission; streams that consist of any data (serialization occurs behind the scenes); and local development environments for debugging. Storm also allows components to be implemented using many programming languages. Storm is parallel and distributed; there is no central router and no intermediate queue. It is designed to scale horizontally and has been deployed at scale processing large Twitter data sets.
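For readers unfamiliar with Storm, the following self-contained sketch shows a classic-Storm (backtype.storm) topology in the spirit of the Misure pipeline: a spout emits raw (source, metric, value) tuples and a bolt annotates them in transit. This is our own minimal example, not Misure's actual topology; the IP-to-tier mapping is hard-coded purely for illustration.

import java.util.Map;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Minimal metric pipeline: spout emits raw tuples, bolt abstracts them in transit.
public class MetricPipelineSketch {

    public static class MetricSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        @Override
        public void open(Map conf, TopologyContext ctx, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            // In Misure, metrics arrive from Ganglia, SNMP, CloudWatch, etc.; stubbed here.
            collector.emit(new Values("10.0.0.5", "load_one", Math.random() * 2.0));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("source", "metric", "value"));
        }
    }

    public static class AnnotateBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple t) {
            // Abstraction step: map a raw source (an IP) to a logical tier name.
            String tier = "10.0.0.5".equals(t.getString(0)) ? "web-tier" : "unknown";
            collector.emit(new Values(tier, t.getString(1), t.getDouble(2)));
            collector.ack(t);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("tier", "metric", "value"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("metrics", new MetricSpout(), 1);
        builder.setBolt("annotate", new AnnotateBolt(), 2).shuffleGrouping("metrics");
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("metric-pipeline", new Config(), builder.createTopology());
        Thread.sleep(10_000); // let the topology run briefly in local mode
        cluster.shutdown();
    }
}

Because bolts can be parallelized (two instances of AnnotateBolt above) and connected by groupings rather than a central queue, such a pipeline scales horizontally in the way the paragraph describes.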

9.5.2 Pattern-Based Deployment Service

The pattern-based deployment service, PDS [26], emerged out of the need to simplify the process of deploying complex multitier applications to cloud environments and, further, to adapt them at runtime (i.e., dynamically add/remove nodes in an existing application topology). Specifically, the notion of describing declaratively what you want versus how to achieve it appealed to us. Further, the use of patterns is quite powerful in that they allow for simplification, reuse, and sharing. The PDS hides the low-level details about how to deploy services to any cloud provider, making it quite useful in the context of the multicloud. Specifically, the PDS allows application topologies to be described, deployed, and adapted across multiple cloud providers. The PDS has been used successfully on EC2 and Openstack and has support for all Fog18-compliant cloud providers. Further, the PDS has been open-sourced and is available for download and contributions.19

With PDS, a user describes the "pattern" of their application in an XML-based domain-specific language (DSL). This DSL is quite easy to understand and comprises several key elements that can be grouped as functional elements (e.g., Topology, Node, and Service) and syntactic sugar (e.g., instance_templates, Container, and num_of_copies). A sample deployment document written in this XML-based DSL is presented in Figure 9.3.

16 https://github.com/nathanmarz/storm
17 https://blog.twitter.com/2011/storm-coming-more-details-and-plans-release

<topology id="scale">
  <instance_templates>
    <template id="openstack_small_vm">
      <cloud>OpenStack</cloud>
      <instance_type>2</instance_type>
      <key_pair_id>mark</key_pair_id>
      <image_id>50</image_id>
      <ssh_user>root</ssh_user>
    </template>
  </instance_templates>
  <container num_of_copies="4" id="web_host_container">
    <node id="web_host">
      <use_template name="openstack_small_vm"/>
      <service name="web_server">
        <database_connection node="data_host"/>
        <war_file>
          <file_name>petstore.war</file_name>
          <datasource>jdbc/pet</datasource>
        </war_file>
      </service>
    </node>
  </container>
  <node id="data_host">
    <use_template name="openstack_small_vm"/>
    <service name="database_server">
      <script>petstore.sql</script>
    </service>
  </node>
  <node id="web_balancer">
    <use_template name="openstack_small_vm"/>
    <service name="web_balancer">
      <member node="web_host"/>
    </service>
  </node>
</topology>

Figure 9.3. A sample deployment document written in an XML-based DSL. This particular document describes a deployment on six nodes in total (i.e., one Web balancer, four Web hosts, and one database server).

18 http://fog.io/about/provider_documentation.html
19 https://github.com/ceraslabs/pattern-deployer


9.6 EXPERIMENTS AND A CASE STUDY

Using our implementation, we performed two experiments (Sections 9.6.1–9.6.3). In the first, we demonstrated the process of implementing Management Logic and using it to manage an application facing changing workloads; in the second, we showed early performance results demonstrating that a Management Logic component can handle millions of metrics. We also used this implementation in a case study demonstrating how researchers running on an experimental testbed can more easily perform complex experiments repeatedly to obtain meaningful results (Section 9.6.4).

9.6.1 Experimental Setup

Experiment 1: For the managed application, we used a sample Java EE Web application that accepts requests; connects to a database (MySQL) to perform selects, inserts, or updates; and returns a response. We defined a declarative deployment document (see Section 9.4.1) with a database server, a load balancer, and a cluster of Web application servers initialized to a single instance running in the private cloud and no instances in the public cloud.

We authored the application's Management Logic (see Figure 9.2 for an excerpt of this code) as a second Java EE Web application, implementing the RESTful interface defined by the platform to receive new metrics and information about deployed resources. The Management Logic collected the 1-min load averages20 for each Web application server, calculated an average, and requested additional resources when a configurable threshold was surpassed. Resources were released when the average fell beneath a second configurable threshold. Given the limited capacity of private clouds, after two instances are running on the private cloud, the manager requests resources from the public cloud. To limit churn, a refractory period was introduced (as a configurable parameter that could be changed on the fly through the RESTful interface): 10 min between adding nodes, and 5 min between removing nodes. Aside from features to allow runtime configuration of various parameters, the Management Logic consisted of 65 source lines of code.21
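A minimal sketch of the scaling policy just described (thresholds, private-first placement, and refractory periods) is shown below; the constants and names are ours and do not reproduce the actual 65-line implementation.

// Hypothetical sketch of the bursting policy described above; thresholds,
// timings, and names are illustrative, not the exact implementation.
public class BurstingPolicy {
    static final double SCALE_UP_LOAD = 1.0;            // mean 1-min load to trigger scale-out
    static final double SCALE_DOWN_LOAD = 0.3;          // mean 1-min load to trigger scale-in
    static final int MAX_PRIVATE = 2;                   // private-cloud capacity
    static final long ADD_REFRACTORY_MS = 10 * 60_000;  // 10 min between adds
    static final long REMOVE_REFRACTORY_MS = 5 * 60_000; // 5 min between removals

    int privateNodes = 1, publicNodes = 0;
    long lastAdd = 0, lastRemove = 0;

    /** Decide on an action given the mean load across application servers. */
    String decide(double meanLoad, long now) {
        if (meanLoad > SCALE_UP_LOAD && now - lastAdd >= ADD_REFRACTORY_MS) {
            lastAdd = now;
            if (privateNodes < MAX_PRIVATE) { privateNodes++; return "add private node"; }
            publicNodes++;
            return "add public (EC2) node";
        }
        if (meanLoad < SCALE_DOWN_LOAD && now - lastRemove >= REMOVE_REFRACTORY_MS) {
            // Release public resources first; keep at least one private node.
            if (publicNodes > 0) { lastRemove = now; publicNodes--; return "remove public node"; }
            if (privateNodes > 1) { lastRemove = now; privateNodes--; return "remove private node"; }
        }
        return "no action";
    }

    public static void main(String[] args) {
        BurstingPolicy p = new BurstingPolicy();
        long t = ADD_REFRACTORY_MS; // pretend enough time has passed for the first add
        System.out.println(p.decide(1.4, t));     // add private node
        System.out.println(p.decide(1.4, 2 * t)); // private capacity full: add public (EC2) node
        System.out.println(p.decide(0.1, 3 * t)); // demand receded: remove public node
    }
}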

The PDS was deployed to an Amazon EC2 m1.small instance. We created a deployment package including the Web application WAR, its Management Logic WAR, and the various keys and credentials required to provision instances on the clouds in our topology, and submitted this package to the PDS. The Management Logic ran on a t1.micro instance; Web application servers were deployed to a local Openstack installation running on a dedicated IBM Bladecenter in a university data center, with a 100 Mbps uplink. An openstack.small instance was defined with 2 GB of RAM and one virtual CPU. The public cloud deployment, if necessary, was to m1.small instances on Amazon EC2.

20 Load average refers to the number of processes ready for CPU time on average over some period of time, 1 min in this case.
21 Clearly, there is substantial room to improve this algorithm; the focus is on the enabling platform and not the adaptive scaling algorithm employed.


The XCAMP implementation ran on a four-core cluster in Amazon EC2. Ganglia22 monitors acquired the metrics from each machine and passed them to the Information Aggregation Service.

Finally, an Apache JMeter23 test plan was used to generate load. The workload was two-thirds read requests (e.g., browse catalog), one-sixth write requests (e.g., checkout), and one-sixth update requests (e.g., modify user profile). The design was to peak workload at 120 simultaneous threads sending requests as quickly as possible, launching in 4 groups of 30 threads, each ramping up over a 5-min period. The first group launched at start time t minutes; the second at t+10, the third at t+25, and the fourth at t+33. Peak workload was maintained until t+90, when the fourth group was terminated, followed by the second group at t+100, and the final two groups at t+120. This plan was executed by an m3.xlarge (quad-core) instance in Amazon EC2.

Experiment 2: Using the same implementation and components, we examined the performance of the Management Logic when run on three different instance sizes in Amazon EC2 (t1.micro, m1.small, and m1.medium) to assess the scalability of this approach. We created simulated metrics and submitted them via the RESTful API as quickly as the service could handle them for a 1-h period. The mix of metrics was proportional to reality, with many requests being irrelevant to the actual decision making. Our primary question was whether it would be necessary to autonomically scale the Management Logic container as a cluster for large topologies.
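The metric generator for this test amounts to a tight submission loop against the Management Logic's RESTful interface. A minimal sketch of such a driver follows; the endpoint URL and payload format are hypothetical, as the chapter does not document the exact API.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical metric-submission driver for the throughput test; the endpoint
// and payload format are illustrative, not XCAMP's documented API.
public class MetricFlooder {
    public static void main(String[] args) throws Exception {
        URL endpoint = new URL("http://manager.example.com:8080/manager/metrics");
        long deadline = System.currentTimeMillis() + 3_600_000; // run for one hour
        long sent = 0;
        while (System.currentTimeMillis() < deadline) {
            HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            String payload = "name=load_one&source=sim-" + (sent % 100) + "&value=0.5";
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload.getBytes(StandardCharsets.UTF_8));
            }
            conn.getResponseCode(); // block until the service has handled the request
            conn.disconnect();
            sent++;
        }
        System.out.println("Submitted " + sent + " metrics");
    }
}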

9.6.2 Results

Experiment 1: Figure 9.4 illustrates what happened during the experiment, showing the addition and removal of instances, the size of the workload, the average load over all deployed resources, the average response time, and the total throughput. The deployment began with a single private instance. This was sufficient for the first workload group;24 but shortly after adding the second workload group, the autonomic manager detected a load average greater than 1.0 (Fig. 9.4a). A private cloud instance was requested (light red band). There are brief spikes in response time (up to 4 seconds) when the node is added to the balancer manager and when the node is first enabled and receives its first requests; these are not shown due to the smoothing (for readability). The two private instances are sufficient to handle 60 workload threads, but not 90, where a third instance is required. This instance is requested from Amazon EC2 (light orange band).

As the experiment continues, workload continues to increase and the load average remains high. Amazon m1.small instances are substantially smaller than openstack.small instances; a total of five are required (added as soon as possible given the refractory period) to meet the generated workload. After the activation of the final public instance, the load average settles at around 1 and remains there, providing stable response time and maximum throughput. Once the workload decreases, the additional instances are gradually removed: first the public instances, then the private instance (at the end of the experiment).

22 http://ganglia.sourceforge.net
23 http://jmeter.apache.org
24 Note that instances running on this Openstack installation using KVM use a virtual CPU and report load averages differently, counting processes waiting in a queue and NOT running processes.

Figure 9.4. Measurements from the scaling experiment, first adding private instances, then bursting to the public cloud. The stacked graphs show the instance counts, with the blue line representing throughput (smoothed to be more readable). (a) One-minute load average, averaged over all active instances. (b) Average response time over a 1-second window, smoothed using splines to improve readability. Sharp spikes (peaking at 4 seconds, not shown) are due to load balancer restarts when adding a new node, and an initial period of slow response times for new nodes.

Experiment 2: Figure 9.5 presents the results of the scaling test, showing the total throughput and average response time achieved by the three instances running the straightforward bursting adaptation policy. A gradual ramp-up was included in the load generation. The t1.micro instance is specified for only periodic or bursting workloads, not for sustained load; this is evident in the results, as performance varies dramatically. There are several drops in throughput, due either to other running tasks competing for resources or to the variation inherent in the public cloud [35], which is most noticeable with smaller single instance sizes. The scale of the t1.micro response time results makes it difficult to see corresponding degradation in response time.


Figure 9.5. Measurements from a scaling test measuring the throughput of our implemented Management Logic on three Amazon EC2 instance sizes (t1.micro, m1.small, m1.medium). (a) Throughput. (b) Average response time; peaks (not shown) at 800 milliseconds.

9.6.3 Discussion

Experiment 1: We demonstrated the ability to use a standard Java EE Web application with a simple adaptation policy, written in a programming language familiar to the original application developer, using the common RESTful API pattern. There is room to improve the Management Logic; for example, to better handle the time required for a new instance to become active and handle requests.

While designing the Management Logic, we considered several metrics as the basis for adaptation. We have been aware of the limitations of existing monitoring tools on public clouds for some time, but our trials with CPU utilization metrics and load averages demonstrated the unreliability of these numbers. The individual metrics for each of the seven instances launched in Experiment 1 varied per cloud. The load averages from OpenStack were zero even when the machine was clearly loaded; it was only when overloaded that they would produce higher load averages, which resulted in slower reactions from the Management Logic. The data from EC2 had higher peaks, particularly during bootstrapping. More notably, despite high load averages, the EC2 instances rarely exceeded 20% CPU utilization (Fig. 9.6b).

In contrast with 1-minute load averages (Fig. 9.6a), the 15-min load average offers a better understanding of the overall trend of the system. Figure 9.6c shows this load average for each instance. All of the managed instances trended toward a load average of 1.0, the target set by our Management Logic. Much of the difficulty in achieving this load average on EC2 instances hinged on load incurred during bootstrapping. This indicates that launching from machine images with the required software pre-installed is important to effective adaptive management.

Experiment 2: The performance numbers measured indicate an m1.small instancerunning our Management Logic could process over 270,000 metrics per minute;

Page 258: Cloud Services, Networking, and Management

“9780471697558c09” — 2015/3/20 — 12:00 — page 236 — #20

236 PERFORMANCE MANAGEMENT AND MONITORING

Figure 9.6. Performance measurements from instances involved in the autonomic scaling experiment (Private_1, Private_2, and Public_1 through Public_5). (a) One-minute load average (peaks reaching 15–20, and 30 for Public_2, not shown). (b) CPU utilization, for user and system processes (%). (c) Fifteen-minute load average.


9.6.4 Case Study

One of the goals of XCAMP is to make systems management painless for developers. This case study illustrates this ability in action. It was noted that, in practice, when several SAVI users deployed applications simultaneously using XCAMP, they observed a major degradation in the performance of the SAVI testbed.25 We used XCAMP as the platform for a series of experiments to explore this phenomenon and contribute to the ongoing improvement of the testbed.

Initial exploratory runs: Using XCAMP's deployment service, we deployed a three-node Java EE application to the SAVI testbed (note that all three nodes are deployed simultaneously by default). Once deployed, we dynamically added an additional node to the application topology. Once the scale-out operation had completed, we removed this additional node by scaling down. Finally, we undeployed the application. We kept measurements of how long each stage of this process took. We conducted variations of this experiment with various numbers of simultaneous application deployments (1, 2, 5, and 10), each deploying three-node Java EE applications (for between 3 and 30 VM instantiations). Each experiment configuration was run three times. The mean timing results (with standard deviations) are presented in Figure 9.7a. We observed that the Deploy stage is the slowest of the four and that it was the most impacted by the number of concurrent users; scale-out, which is like deploy but with one instance instead of three, was also impacted.

Examining the deployment stage: We decided to explore the Deploy stage more carefully by examining its two phases: downloading required files from the PDS on the internal network, versus downloading and installing software packages from an external package repository. We also performed the same experiments on Amazon EC2 in order to use the results as a comparator. Figure 9.7c shows a linear increase in download time as the number of concurrent users increases for both EC2 and the SAVI testbed. However, software installation time on EC2 appears to be constant no matter how many concurrent users there are, while on the SAVI testbed a dramatic increase is observed as the number of concurrent users increases. We hypothesized it might be a network- and/or disk IO-related problem. The network-bottleneck hypothesis is that additional concurrent users create traffic on the network, causing congestion or bandwidth-cap related issues. The IO-bottleneck hypothesis is that the increased number of VMs running on a single physical host and performing random read-writes overextends disk resources. We know from observation that CPU and memory utilization are not excessive, so we did not create additional resource-contention hypotheses.

Evaluating hypotheses: To confirm or reject each hypothesis, we designed a final experiment. We created a full image containing all required packages for the application. We compared the time required to deploy this image with the time required to deploy a standard Ubuntu image and bootstrap it (i.e., download and install all required software packages from a central repository). A complete image has similar bandwidth requirements, but can be written to disk with sequential writes (rather than random read-writes), thus reducing IO load. If deploying the full image were faster than bootstrapping, we would regard the IO-bottleneck hypothesis as confirmed. The results are presented in Figure 9.7b. Notice that the full image outperforms the standard image, suggesting the presence of an IO bottleneck.

25The testbed implements a two-tier cloud extending OpenStack, with a single core and seven edges distributed across Canada.

Figure 9.7. Various experiment results for the case study exploring performance of the SAVI two-tier cloud testbed. (a) Temporal breakdown of deploying and scaling an application on the SAVI testbed. (b) Comparing the performance of bootstrapping a node versus using a node with all software pre-installed on the SAVI testbed. (c) A breakdown of performance for two phases of the Deploy stage on the SAVI testbed and Amazon EC2: downloading files from the PDS, and installing software packages.

Using XCAMP in this case study allowed us to easily run and monitor a variety of experiment configurations, systematically and repeatedly, to collect and present evidence of system performance issues in the SAVI testbed.

9.7 CHALLENGES IN MANAGEMENT ON HETEROGENEOUS CLOUDS

Based on our experience designing, implementing, and testing a multicloud adaptive system, we offer the following reflections on the particular challenges that apply to adaptively managing heterogeneous clouds.


Heterogeneous monitoring systems: Many cloud providers offer monitoring services to provide information about the performance of provisioned resources; these systems vary significantly, and typically require relatively detailed information to query (e.g., instance IDs for Amazon EC2). The state of monitoring in private clouds is even more varied, with each organization deploying whatever solution suits it. Existing cloud abstraction layers largely disregard monitoring, focusing instead on acquiring resources.

Rapid reaction: Automated management requires current and accurate monitoring data. The ability of an aggregating monitoring service to meet this requirement depends on the timeliness of the metrics received from the third-party monitoring systems being aggregated. The automated manager can be run on the public cloud, which introduces further delays outside the control of the monitoring system. How best to ensure that timely decisions are made remains an open question.

Inaccuracy in traditional monitoring techniques: It is generally understood that virtualized resources offer more variable performance than bare-metal resources, and the variance in the performance of Amazon EC2 instances has been benchmarked [35]. However, it is less understood that standard monitoring techniques will report inaccurate information that can mislead adaptive managers.

In a public cloud environment, the desire to sell fractions of a CPU's processing power has removed the meaning of many of these standard metrics. For example, Amazon EC2 configures instances using a measure called elastic compute units (ECUs), which Amazon documents as a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor. One ECU is approximately 40% of a single core of an Intel Xeon CPU E5430 @ 2.66 GHz, a common processor in the first-generation Amazon infrastructure. The default instance size, small, is 1 ECU; the hypervisor enforces this 1 ECU limit. However, Xen's paravirtual mode offers limited ability to abstract the processor, for performance reasons, so instances and the Linux kernels running on them perceive a full core available to them. Xen enforces the limits by refusing access to the CPU once the allotted quota has been used, which the Linux kernel reports as steal time. The exact time spent in steal may vary over time, and in any case can only be measured when the system is operating at capacity. An idle machine will report 100% idle time, giving no indication of the actual limits on the CPU. It is not trivial to calculate the actual load on a machine reporting 20% user and 80% idle.
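To make the steal-time distortion concrete, the following sketch (our illustration, not part of XCAMP) samples /proc/stat on a Linux guest twice and reports the shares of the interval spent in user, system, steal, and idle time; on a CPU-capped instance, steal only becomes visible while the instance is busy:

    import time

    def read_cpu_times(path="/proc/stat"):
        # First line of /proc/stat: "cpu user nice system idle iowait irq softirq steal ..."
        with open(path) as f:
            fields = f.readline().split()[1:]
        return [int(x) for x in fields]

    def cpu_shares(interval=5.0):
        before = read_cpu_times()
        time.sleep(interval)
        after = read_cpu_times()
        delta = [a - b for a, b in zip(after, before)]
        total = sum(delta) or 1
        names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
        return {n: 100.0 * d / total for n, d in zip(names, delta)}

    if __name__ == "__main__":
        # On a capped instance under full load, expect a large "steal" share alongside
        # a modest "user" share; when idle, steal stays near zero and the cap is invisible.
        print({k: round(v, 1) for k, v in cpu_shares().items()})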

9.8 CONCLUSION

This chapter introduced a framework for managing the life cycle of applications in multicloud environments and for conferring autonomic properties on them at runtime. The framework allows the specification of the Management Logic, its deployment and instantiation, and its execution alongside the managed application. The Management Logic runs in a container that is seamlessly connected to XCAMP's monitoring and execution engines. XCAMP provides application developers with effortless access to monitoring sensors, third-party data sources, and actuators (i.e., the monitoring and executing stages of the MAPE-K loop) from across the multicloud, while placing control of both the analysis and planning stages in their hands, allowing them to express their management policies in their own vernacular and to harness all their personal expertise.

We validated XCAMP in multiple ways. First, we demonstrated the ability to elastically cloud-burst a legacy Java EE application from a private cloud to a public cloud, and reported our findings and our experience in automating applications in multicloud environments. Additionally, we demonstrated the capability of the XCAMP framework to facilitate the diagnosis of a bottleneck on the SAVI (i.e., two-tier cloud) testbed. We demonstrated, through experimentation, the capability of our design to scale to large size and of our autonomic manager (i.e., Management Logic) to process massive numbers of metric updates per minute. Finally, we presented a hands-on tutorial to a group of approximately 75 SAVI members at the 2013 Annual General Meeting in Toronto, Canada, and received positive feedback from the participants.

The XCAMP platform provides a useful middleware upon which to base much future research, both for the SAVI project and for other areas of multicloud research. By allowing developers to harness their existing skill sets, XCAMP makes it easy to consider different approaches to management; this is a critical benefit introduced by the framework.

ACKNOWLEDGMENTS

This research was supported by the IBM Centres for Advanced Studies (CAS), the Natural Sciences and Engineering Research Council of Canada (NSERC) under the Smart Applications on Virtual Infrastructure (SAVI) Research Network, and the Ontario Research Fund under the Connected Vehicles and Smart Transportation (CVST) project.

REFERENCES

1. M. Shtern, B. Simmons, M. Smit, and M. Litoiu, "Navigating the cloud with a MAP," in 13th IFIP/IEEE International Symposium on Integrated Network Management (IM), 2013, pp. 464–470.
2. G. Baryannis, P. Garefalakis, K. Kritikos, K. Magoutis, A. Papaioannou, D. Plexousakis, and C. Zeginis, "Lifecycle management of service-based applications on multi-clouds: A research roadmap," in Proceedings of the International Workshop on Multi-Cloud Applications and Federated Clouds, 2013, pp. 13–20.
3. N. Loutas, V. Peristeras, T. Bouras, E. Kamateri, D. Zeginis, and K. Tarabanis, "Towards a reference architecture for semantically interoperable clouds," in Cloud Computing Technology and Science (CloudCom), 2010, pp. 143–150.
4. D. Petcu, C. Craciun, and M. Rak, "Towards a cross platform cloud API," in CLOSER, 2011, pp. 166–169.
5. D. Bernstein, E. Ludvigson, K. Sankar, S. Diamond, and M. Morrow, "Blueprint for the intercloud—protocols and formats for cloud computing interoperability," in Proceedings of the 2009 Fourth International Conference on Internet and Web Applications and Services, 2009, pp. 328–336.
6. R. Buyya, R. Ranjan, and R. N. Calheiros, "Intercloud: Utility-oriented federation of cloud computing environments for scaling of application services," in ICA3PP (1), 2010, pp. 13–31.
7. M. Shtern, B. Simmons, M. Smit, and M. Litoiu, "An architecture for overlaying private clouds on public providers," in 8th International Conference on Network and Service Management, CNSM 2012, Las Vegas, USA, 2012.
8. P. Pawluk, B. Simmons, M. Smit, M. Litoiu, and S. Mankovski, "Introducing STRATOS: A cloud broker service," in IEEE 5th International Conference on Cloud Computing, 2012, pp. 891–898.
9. J. Kephart and D. Chess, "The vision of autonomic computing," Computer, vol. 36, no. 1, pp. 41–50, 2003.
10. M. Smit, M. Shtern, B. Simmons, and M. Litoiu, "Partitioning applications for hybrid and federated clouds," in CASCON '12: Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, 2012, pp. 27–41.
11. J. Jenkins, 2011, presentation from the O'Reilly Velocity Conference, http://assets.en.oreilly.com/1/event/60/Velocity%20Culture%20Presentation.pdf. Accessed November 20, 2014.
12. M. Smit, M. Shtern, B. Simmons, and M. Litoiu, "Supporting application development with structured queries in the cloud," in New Ideas and Emerging Results (NIER) track, Proceedings of the 2013 International Conference on Software Engineering (ICSE), 2013.
13. M. Smit, P. Pawluk, B. Simmons, and M. Litoiu, "A web service for cloud metadata," in IEEE Congress on Services, 2012, pp. 24–29.
14. M. Smit, B. Simmons, and M. Litoiu, "Distributed, application-level monitoring of heterogeneous clouds using stream processing," Future Generation Computer Systems, vol. 29, no. 8, pp. 2103–2114, 2013.
15. I. Foster, C. Kesselman, and S. Tuecke, "The anatomy of the grid: Enabling scalable virtual organizations," International Journal of High Performance Computing Applications, vol. 15, no. 3, pp. 200–222, 2001.
16. M. A. Rappa, "The utility business model and the future of computing services," IBM Systems Journal, vol. 43, no. 1, pp. 32–42, 2004.
17. J. Strassner, N. Agoulmine, and E. Lehtihet, "FOCALE: A novel autonomic networking architecture," International Transactions on Systems Science and Applications (ITSSA), vol. 3, no. 1, pp. 64–79, 2007.
18. L. Baresi, A. Ferdinando, A. Manzalini, and F. Zambonelli, "The CASCADAS framework for autonomic communications," in Autonomic Communication, A. V. Vasilakos, M. Parashar, S. Karnouskos, and W. Pedrycz, Eds. New York: Springer US, 2009, pp. 147–168.
19. B. Jennings and R. Stadler, "Resource management in clouds: Survey and research challenges," Journal of Network and Systems Management, pp. 1–53, 2014.
20. B. Rochwerger, D. Breitgand, E. Levy, A. Galis, K. Nagin, I. M. Llorente, R. Montero, Y. Wolfsthal, E. Elmroth, J. Cáceres, et al., "The Reservoir model and architecture for open federated cloud computing," IBM Journal of Research and Development, vol. 53, no. 4, pp. 4–11, 2009.
21. A. J. Ferrer, F. Hernández, J. Tordsson, E. Elmroth, A. Ali-Eldin, C. Zsigri, R. Sirvent, J. Guitart, R. M. Badia, K. Djemame, et al., "OPTIMIS: A holistic approach to cloud service provisioning," Future Generation Computer Systems, vol. 28, no. 1, pp. 66–77, 2012.
22. R. Buyya, C. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems, vol. 25, no. 6, pp. 599–616, 2009.
23. M. F. Zhani, Q. Zhang, G. Simona, and R. Boutaba, "VDC Planner: Dynamic migration-aware virtual data center embedding for clouds," in 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 2013, pp. 18–25.
24. N. Grozev and R. Buyya, "Inter-cloud architectures and application brokering: Taxonomy and survey," Software: Practice and Experience, vol. 24, pp. 369–390, 2012.
25. M. Sloman, "Policy driven management for distributed systems," Journal of Network and Systems Management, vol. 2, no. 4, pp. 333–360, 1994.
26. H. Lu, M. Shtern, B. Simmons, M. Smit, and M. Litoiu, "Pattern-based deployment service for next generation clouds," in IEEE Congress on Services. IEEE Computer Society, 2013.
27. T. Eilam, M. Elder, A. Konstantinou, and E. Snible, "Pattern-based composite application deployment," in 2011 IFIP/IEEE International Symposium on Integrated Network Management (IM). IEEE, 2011, pp. 217–224.
28. D. C. Verma, Policy-Based Networking: Architecture and Algorithms. Indianapolis, IN: New Riders Publishing, 2000.
29. J. Strassner, Policy-Based Network Management: Solutions for the Next Generation. Boston, MA: Morgan Kaufmann, 2003.
30. N. Damianou, N. Dulay, E. Lupu, and M. Sloman, "The Ponder policy specification language," in Policies for Distributed Systems and Networks, M. Sloman, E. Lupu, and J. Lobo, Eds. Berlin: Springer, 2001, pp. 18–38.
31. L. Kagal, T. Finin, and A. Joshi, "A policy language for a pervasive computing environment," in Proceedings of the IEEE 4th International Workshop on Policies for Distributed Systems and Networks (POLICY 2003). IEEE, 2003, pp. 63–74.
32. R. Boutaba and I. Aib, "Policy-based management: A historical perspective," Journal of Network and Systems Management, vol. 15, no. 4, pp. 447–480, 2007.
33. M. L. Massie, B. N. Chun, and D. E. Culler, "The Ganglia distributed monitoring system: Design, implementation, and experience," Parallel Computing, vol. 30, no. 7, pp. 817–840, 2004.
34. D. Luckham, The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Boston, MA: Addison-Wesley, 2002.
35. J. Schad, J. Dittrich, and J.-A. Quiané-Ruiz, "Runtime measurements in the cloud: Observing, analyzing, and reducing variance," Proceedings of the VLDB Endowment, vol. 3, no. 1, 2010.


10

RESOURCE MANAGEMENT AND SCHEDULING

Luiz F. Bittencourt, Edmundo R. M. Madeira, and Nelson L. S. da Fonseca

Institute of Computing, State University of Campinas, Campinas, São Paulo, Brazil

10.1 INTRODUCTION

As computer networks have evolved, processing demands have migrated from local computing devices to distributed computing environments. In this context, the capacity for distributed processing has also progressed from job-based computing to the more user-friendly service-oriented computing. This paradigm shift has been accompanied by an evolution of distributed system architectures: job-oriented cluster computing gave rise to job- and service-oriented grid computing, and then to service-oriented utility computing, now known as cloud computing [1].

Cloud computing is currently being offered and used by many companies [2]. For example, Amazon Web Services (AWS – http://aws.amazon.com/) offers various services for databases, e-Commerce, storage, and processing power. Google Applications1 also offers a variety of applications as services, including the Google Application Engine (GAE),2 which allows application development to be performed directly in the cloud through Google's application programming interfaces (APIs). Other examples are Microsoft Azure, Salesforce.com, Globus Nimbus, and Eucalyptus [3].

1http://www.google.com/apps/
2http://code.google.com/appengine/

It has been estimated3 that spending on public cloud services represented a market of US$132 billion in 2013, and that it would exceed US$244 billion by 2017. In such a competitive market, resource management is crucial for seizing a significant market share. A service-oriented distributed environment demands quality of service (QoS), which must be accompanied by cost reduction for both service providers and users. This raises new challenges that must be addressed [4, 5].

In the next section, basic concepts related to the management of cloud computing are introduced, followed by a discussion of the types of application that can be allocated to cloud resources, a formalization of the cloud system, and a description of the problems of both application scheduling and virtual machine (VM) allocation. After that, resource management and resource allocation in clouds are discussed, with a focus on infrastructure providers, followed by an overview of techniques for the scheduling of application tasks and the allocation of VMs. Challenges and future perspectives are presented at the end of the chapter.

10.2 BASIC CONCEPTS

Clouds are capable of offering (usually virtualized) computing resources as dynamically scalable services to users over the Internet, without any need for users to worry about the technical aspects of resource management. According to the National Institute of Standards and Technology (NIST), cloud computing can be defined as follows [6]:

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

Cloud providers add resource management layers over computing clusters and grids to make their infrastructure available as computing services to users, who ideally require minimal management effort and knowledge to use such infrastructure [7]. These management layers conceal the physical infrastructure from the user through the adoption of a series of automatic resource management actions. In this section, we present an overview of the terminology of cloud computing as utilized in this chapter, as well as of some problems in cloud management and resource allocation.

10.2.1 Cloud Service Models

Resource management in clouds must consider the type of computing resources to be offered as a service. Clouds are usually classified by the service offered; the most common are software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS).

3https://www.gartner.com/doc/2598217


Figure 10.1. Management stack for different cloud service levels.

In the SaaS model, the consumer simply utilizes an application provided by the cloud provider, having control neither over the application development nor over the host on which the application runs. Popular examples of this model include Google Apps and Salesforce.com. The PaaS model makes available a framework in which consumers can develop and deploy their own applications in the cloud. Examples of clouds that offer this model are the Google App Engine and Amazon Web Services. In the IaaS model, a cloud provider offers computing resources and administrative privileges to users, usually as VMs running on the provider infrastructure, so that the users can control their computing environment, including software development and deployment. The Amazon Elastic Compute Cloud (Amazon EC2), Globus Nimbus, and Eucalyptus are examples of this model. Other models also exist, such as network as a service (NaaS)4 and database as a service (DbaaS) [8].

The levels of the management stack involved in the three types of service can be seen in Figure 10.1. In traditional systems, the client is responsible for managing every layer in the stack, from hardware configuration to application deployment, including operating system management. In the IaaS service model, the provider is responsible for managing only the lower layers in the stack, including hardware and software for networking, storage, and processing, as well as the virtualization technologies used to share these resources among cloud clients. The clients, however, are responsible for managing the operating system and its software (libraries, middleware), as well as data/databases and applications.

A PaaS provider must perform all the management performed by IaaS providers, as well as managing the operating system, libraries, software, and middleware, thus offering the client a development platform over a self-managed execution environment. In SaaS, however, users have no responsibility for managing any layer in the hardware/software stack. They can utilize the software provided and change its configuration, but they can neither modify the software itself nor manage the infrastructure. This makes SaaS easy to use, but harder to customize than PaaS- or IaaS-based cloud services.

A cloud provider can itself be a client of another provider offering a different type of service. Figure 10.2 illustrates a scenario in which a SaaS (or PaaS) provider relies on other IaaS providers to serve its customers.

4http://www.scaledb.com/


Figure 10.2. Scenario with SLAs at two levels.

In this case, more than one level of service-level agreement (SLA) is necessary: one between the clients and the SaaS provider, and another between the SaaS provider and the IaaS providers. Clearly, arrangements between providers at more than two service levels are possible, as in various types of business.

Management problems involving different levels of SLAs must be dealt with, including the dependencies of the upper layers on the services of the lower layers. In order to offer a service-level guarantee to the user, the cloud provider must receive guarantees from its lower-level providers. Moreover, the cloud provider (the SaaS provider in Figure 10.2) must consider its profit margin when pricing its services, which depends on the agreements it has with other providers.

10.2.2 Cloud Types

Cloud computing can be classified according to access policy. The classification presented here is generally associated with IaaS, although it can be extended to SaaS and PaaS. Depending on the access to cloud resources, an IaaS provider is classified as public, private, or hybrid:

• Public cloud offers virtualized computational resources as services to any user who can access the service through the Internet; resources are provided on a pay-per-use basis. A public IaaS cloud offers certain advantages because it provides computational capacity on demand, avoiding upfront investment in processing/storage pools for handling eventual peaks in demand. On the other hand, it does not offer controlled access to physical machines and communication channels, which can compromise the security of critical applications or sensitive data.


• Private cloud is more of a virtualized cluster or computational grid that offers a more transparent interface to the user. Usually restricted to a single organization, it can provide fine-tuned performance as well as flexibility. Although it does not completely avoid upfront investment, it can be implemented over an existing computational infrastructure, preventing further capital investment. Moreover, it offers better access control to computing resources, thus improving security, which can be critical for the data security of an organization.

• Hybrid cloud combines public and private clouds. This type of cloud allows users and organizations to keep using their private resources, yet provides access to extended computational capacity when necessary, using public cloud resources on a pay-per-use basis. The flexibility in meeting demands for computational capacity, known as elasticity, is fundamental in reducing costs during periods of increased demand, in comparison with fully in-house computing infrastructures.

10.2.3 SLAs and Charging Models

The offering of cloud services is based on SLAs on a pay-per-use basis. In PaaS and SaaS, the charge to users is often based on a variety of criteria, such as a predefined quantity of hours of use, storage space, and number of I/O requests. The IaaS model, however, is often more flexible, allowing users to choose the types of resources as well as the model of charging.

Depending on the SLA established between users and cloud providers, different management systems will be necessary. In SaaS and PaaS, management should be able to automatically increase or decrease the computing power available to a user according to his/her demands. Monitoring entities are thus essential to achieve automatic elasticity without compromising QoS. Moreover, the ability to increase/decrease computing power involves deploying and/or resizing VMs to cope with demands.

In IaaS, on the other hand, elasticity involves client choice: it is the client who decides the capacity and number of VMs to be leased. Commonly leased VM features include CPU cores and speed, amount of RAM, amount and access speed of storage, and network bandwidth. In this case, management entities are necessary to handle VM allocation. They must act according to client demands for different VM types, allocating these requests to physical machines on the basis of the policies defined by the provider.

Actual VM allocation may depend on the model of charging selected by the client. On-demand VM leasing establishes a price per hour or minute of use, and the user is charged for each VM from its deployment to its release. A reserved model provides a predefined price for access to VMs on demand. The spot model, on the other hand, works as a market, with VM prices varying with demand. In this case, the user offers a price he/she is willing to pay for a type of VM, and the VM can be used as long as the actual price remains lower than the initial offer. If demands on the provider increase, however, the price can increase, and the provider has the right to deny access to VMs that, at least momentarily, cost more on the market.
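As a rough illustration of the difference between the on-demand and spot models (the prices and hourly billing granularity below are invented for this sketch):

    import math

    def on_demand_charge(hours_used, price_per_hour=0.10):
        # On-demand: charged per started hour from deployment to release.
        return math.ceil(hours_used) * price_per_hour

    def spot_runtime(bid, hourly_market_prices):
        # Spot: the VM runs while the market price stays at or below the bid;
        # the provider may reclaim it once the price rises above the bid.
        hours = 0
        for price in hourly_market_prices:
            if price > bid:
                break
            hours += 1
        return hours

    print(on_demand_charge(3.2))                          # 4 started hours -> 0.4
    print(spot_runtime(0.05, [0.03, 0.04, 0.06, 0.02]))   # reclaimed in hour 3 -> 2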


10.2.4 Resource Allocation and Scheduling

The perspectives of cloud providers and their clients are potentially conflicting with regard to computing resources. On the one hand, clients want to receive high QoS for the minimum payment, whereas cloud providers want to charge as much as possible.

Cloud clients who have their own computing resources are interested in making the best utilization of their "free" resources, utilizing public cloud resources only when necessary. They want to maximize the utilization of local resources while minimizing the monetary costs of using the public cloud, although they are unwilling to sacrifice QoS. In hybrid clouds, users must schedule applications, with the scheduler choosing how to place the application jobs on both private and public cloud resources on the basis of information of various kinds: the processing capacities of the private and public cloud resources, job computing requirements, public cloud costs, and data transfer costs.

Cloud providers, on the other hand, must allocate VMs in a way that respects SLAs, yet reduces costs so that they can make a profit. This cost reduction comes from sharing resources among users and maximizing resource utilization. Thus, fewer physical machines are needed; moreover, power consumption can be reduced by turning off idle physical machines. When demand increases, the cloud management system must decide whether new physical machines need to be turned on to cope with the new VM requests.

Resource allocation and scheduling are vital to both cloud users and providers, but each has its own specifics. Different management entities and allocation algorithms are necessary to make the best use of the cloud from both perspectives.

10.3 APPLICATIONS

Large hardware and networking capacities have leveraged a whole new set of applications, many of which can benefit from cloud computing. To explore private and public cloud resources to their full extent when running applications, computing resources must be used efficiently, and resource allocation and scheduling play a fundamental role in this efficiency. In particular, application scheduling should be able to decide on which resource each application (or part of it) should be run, given the demands of the applications and the resource capacities [9–12].

The decisions made by the scheduler have a direct impact on the QoS of the application. Taking application characteristics into consideration when allocating resources can lead to a variety of approaches to the problem. These characteristics include the computation cost of jobs, data transfers between jobs, and data source localization, all of which can be accounted for in the scheduler objective function and have an important influence on the decision-making process.

One conceptual difference arising from service-oriented computing is the invocation of services instead of job dispatching. Service invocation assumes already deployed code with an interface to be called, which remains running after results are delivered so that the service can be called again with different parameters. Job dispatching, on the other hand, involves code that is to be transferred, run, and finished, with the results delivered to the user or to another application. This conceptual difference leads to certain distinct management needs, such as a service repository to control where each service is already running, and service deployment to transfer and deploy services across the resources. In this chapter, we adopt the term task to refer to a job or a service invocation, regardless of whether the service is already running or must be transferred and deployed.

A user can submit different applications to run in the cloud. The simplest application is a single task that must be run, with results returned to the user. Such a single task may or may not have parallel code; if it does, the parallel parts can run on separate processors or cores in the same machine. The user can also submit parallel tasks that can be run on separate machines, commonly called bags-of-tasks (BoTs) because they run with no communication among tasks. For example, parameter sweep applications are independent tasks that can often be parallelized with no constraints, a job running several times but with different input parameters.

Data transfers can have a strong impact on the running time of applications. These can take place between different tasks, between a task and a data source, and between a task and a user. When the tasks of an application are dependent, data transfers between these tasks are precedence-constrained, forming a workflow. Workflows with dependent tasks have a topological ordering and can be represented by a directed acyclic graph (DAG) $G = (V, E)$ with $n$ nodes (or tasks), where $t_a \in V$ is a workflow task with an associated computation cost (weight) $w_{t_a} \in \mathbb{R}^+$, and $e_{a,b} \in E$ represents a dependency between $t_a$ and $t_b$ with an associated communication cost $c_{a,b} \in \mathbb{R}^+$. Various scientific workflows can be represented by DAGs; these include Montage (Fig. 10.3a; from Ref. [13]), AIRSN (Fig. 10.3b; from Ref. [14]), CSTEM (Fig. 10.3d; from Ref. [15]), LIGO (Fig. 10.3c; from Ref. [16]), and Chimera [17].
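As an illustration of this model, the following sketch (with invented task names and costs) stores the computation weights $w_{t_a}$ and communication costs $c_{a,b}$ and computes a topological order, so that no task is considered before its predecessors:

    from collections import defaultdict, deque

    # Hypothetical workflow: computation cost w[t] per task, communication cost c[(a, b)] per edge.
    w = {"t1": 4.0, "t2": 2.5, "t3": 3.0, "t4": 1.5}
    c = {("t1", "t2"): 0.6, ("t1", "t3"): 0.9, ("t2", "t4"): 0.5, ("t3", "t4"): 0.4}

    def topological_order(tasks, edges):
        succ = defaultdict(list)
        indegree = {t: 0 for t in tasks}
        for a, b in edges:
            succ[a].append(b)
            indegree[b] += 1
        ready = deque(t for t, d in indegree.items() if d == 0)  # no predecessors
        order = []
        while ready:
            t = ready.popleft()
            order.append(t)
            for s in succ[t]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    ready.append(s)
        if len(order) != len(tasks):
            raise ValueError("graph has a cycle; not a DAG")
        return order

    print(topological_order(w, c))  # e.g., ['t1', 't2', 't3', 't4']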

Variations in application types can also be found in the literature, including mixes of independent and dependent tasks, as in campaign scheduling [18], where one level of independent tasks must finish running before the next level can start, similar to a concatenation of fork-join DAGs. In this type of application, the join task may not even be a computational task, but rather a human-dependent one, such as setting up a new experiment based on the results of the previous campaign. Other DAG-related variations include applications in which the DAG can change during execution, due to the presence of conditional tasks or loops in the DAG specification, which can generate a different number of tasks as a function of the input parameters.

Different applications demand different scheduling algorithms and management approaches in the cloud. Various such approaches to scheduling applications in clouds are detailed in this chapter.

10.4 PROBLEM DEFINITION

In this section, the system model, the scheduling problem, and the VM allocation problem in cloud computing are formalized. Both problems must be dealt with in the resource management of clouds.

Figure 10.4 illustrates how the entities that solve these problems act in resource allocation. Scheduling output can direct applications to VMs in four different states: (1) VMs that are already allocated and running in the private cloud, (2) unallocated VMs in the private cloud, (3) allocated VMs in the public cloud, and (4) unallocated VMs in the public cloud.


Figure 10.3. Examples of DAGs representing workflow applications. (a) Montage; (b) AIRSN; (c) LIGO; (d) CSTEM.

Tasks scheduled to run on already deployed VMs (in either private or public clouds) do not have to interact with the VM allocator, as their VMs are already running on a given server. If, on the other hand, the scheduler decides that new, unallocated VMs are necessary, it must determine which physical machines can host these VMs in order to guarantee QoS. Thus, each submission is scheduled independently, and the scheduler sends the resource allocator a list of VMs to be created. The resource allocator then maps these requests onto the cloud infrastructure.

Figure 10.4. Scheduler and VM allocation in user task submission.

10.4.1 Infrastructure

Let $I = \{i_1, \ldots, i_m\}$ be the set of $m$ IaaS providers available to the users. Each cloud provider $i$ has a set $\mathcal{M}^i = \{m^i_1, m^i_2, \ldots, m^i_{n_i}\}$ of physical machines in the system. Each $m^i_k \in \mathcal{M}^i$ is a 4-tuple $m^i_k = \{c^i_k, p^i_k, q^i_k, d^i_k\}$, where $c^i_k \in \mathbb{N}^+$ is the number of processing cores, $p^i_k \in \mathbb{R}^+$ is the processing capacity of each core, $q^i_k \in \mathbb{N}^+$ is the amount of memory, and the 2-tuple $d^i_k = \{d^i_{k,a}, d^i_{k,s}\}$ represents the amount of disk storage, $d^i_{k,a}$, and the data access speed, $d^i_{k,s}$. Physical machines in IaaS $i$ are connected by a set of links $\mathcal{L}^i$, where $l^i_{h,j} \in \mathcal{L}^i$, with $1 \leq h \leq n_i$ and $1 \leq j \leq n_i$, is the bandwidth of the link between the resources $m^i_h$ and $m^i_j$.
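A direct data-structure rendering of the machine tuple (a sketch; the field names are ours, not the chapter's notation) could be:

    from dataclasses import dataclass

    @dataclass
    class PhysicalMachine:
        # The 4-tuple m^i_k = {c, p, q, d}: cores, per-core capacity, memory, disk.
        cores: int            # c^i_k
        core_capacity: float  # p^i_k, e.g., in GHz or ECUs
        memory_gb: int        # q^i_k
        disk_gb: int          # d^i_{k,a}, amount of storage
        disk_speed: float     # d^i_{k,s}, data access speed

    # links[(h, j)] would hold l^i_{h,j}, the bandwidth between machines m^i_h and m^i_j.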

10.4.2 Service-Level Agreements

In IaaS, VMs are offered in accordance with an SLA, which specifies the features of the VM to be leased. The cloud provider usually defines a set of VM types and SLAs, and the user chooses from these options the number of VMs and their types. The definition of an SLA depends on the underlying hardware in the data center; in other words, the VM types offered by provider $i$ to cloud users rely on $\mathcal{M}^i$ and $\mathcal{L}^i$. Let $\mathcal{S}^i = \{s^i_1, \ldots, s^i_o\}$ be the set of SLAs offered by IaaS provider $i$. Each $s^i_j \in \mathcal{S}^i$ is a 7-tuple $s^i_j = \{c^i_j, p^i_j, q^i_j, d^i_j, l^i_j, pr^i_j, o^i_j\}$ offering different VM QoS in terms of the number of processing cores $c^i_j$, the processing capacity of each core $p^i_j$, the amount of memory $q^i_j$, the disk storage space and data access speed $d^i_j = \{d^i_{j,a}, d^i_{j,s}\}$, and the bandwidth of the link $l^i_j$, as well as the price $pr^i_j$ and the charging model $o^i_j$. The lefthand side of Figure 10.5 illustrates the relation between the provider infrastructure and SLAs in the context of scheduling.
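The SLA 7-tuple can be rendered analogously (again a sketch with invented field names):

    from dataclasses import dataclass

    @dataclass
    class SLA:
        # The 7-tuple s^i_j: VM QoS features plus price and charging model.
        cores: int            # c^i_j
        core_capacity: float  # p^i_j
        memory_gb: int        # q^i_j
        disk_gb: int          # d^i_{j,a}
        disk_speed: float     # d^i_{j,s}
        bandwidth: float      # l^i_j
        price: float          # pr^i_j
        charging_model: str   # o^i_j, e.g., "on-demand", "reserved", "spot"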


Figure 10.5. SLAs offered to the client scheduler rely on the provider infrastructure. The righthand side shows an example schedule: two type 1 VMs from IaaS 1 for tasks 1, 2, and 7; two type 3 VMs from IaaS 2 for tasks 3 and 4; and one type 1 VM from IaaS m for tasks 5 and 6.

10.4.3 Scheduling

The scheduling problem is commonly defined as a 3-tuple $\alpha \mid \beta \mid \gamma$, where $\alpha$ describes the execution environment and has a single entry; $\beta$ provides details about the processor characteristics and constraints; and $\gamma$ describes the objective to be minimized, which frequently has a single entry [19]. In this chapter, we are interested in the following problem [19]: $\alpha = R_m$, that is, nonrelated machines running in parallel. There are $m$ machines in parallel, and machine $i$ can process task $t$ at a speed of $v_{i,t}$. Note that given two tasks $t_1$ and $t_2$ and two resources $r_1$ and $r_2$, $v_{r_1,t_1} > v_{r_2,t_1} \not\Rightarrow v_{r_1,t_2} > v_{r_2,t_2}$; that is, the running speeds of different tasks on different resources are unrelated.

Let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of tasks submitted by the users for execution. The scheduler receives as input the application, $T$, and the set of SLAs available from all cloud providers, $\mathcal{S} = \bigcup_{i \in I} \mathcal{S}^i$. The scheduler is a noninjective and nonsurjective function $F_s : T \rightarrow \mathcal{S}$. The scheduler defines a multiset $R_{vm}$ over $\mathcal{S}$, that is, $R_{vm} = (S, \mu)$, where $\mu : S \rightarrow \mathbb{N}_{>0}$. The multiset $R_{vm}$ establishes the number of SLAs needed of each type (i.e., the number and type of VMs to be used in the application schedule). Moreover, the scheduler output includes information about the task sequencing on each VM. This information as a whole defines both the number and type of VMs needed as well as the queue of tasks for each VM. The righthand side of Figure 10.5 provides an example of scheduler output.

10.4.4 VM Allocation

VM allocation is of paramount importance for the best utilization of physical machines. This allocation must comply with the provider's objectives as well as fulfill the QoS requirements specified in the SLAs.


A VM allocation algorithm of a provider $i$ receives as input the set of VMs, $R_{vm}$, to be instantiated and the set of physical machines $\mathcal{M}^i$ available in the data center. Let $U^i = \{u^i_1, u^i_2, \ldots, u^i_{n_i}\}$ represent the utilization of the data center, where the 3-tuple $u^i_k = \{uc^i_k, uq^i_k, ud^i_{k,s}\}$ contains the amounts of resources currently allocated to existing VMs on resource $m^i_k$, namely the number of cores ($uc^i_k$), the amount of memory ($uq^i_k$), and the disk storage space ($ud^i_{k,s}$). The VM allocation algorithm produces a mapping of each VM from $R_{vm}$ to a physical machine from $\mathcal{M}^i$.

10.4.5 Optimization Techniques

To optimize the desired objective function, the scheduler and the VM allocator utilize information about the current state of the system to guide the decision on where tasks and VMs should run. This information includes processing power, the amount of volatile or nonvolatile memory, and bandwidth. The weight of each of these pieces of information depends on the application or the objective function, as well as on the types of resources available in the system.

Scheduling in general is an NP-complete problem [19]; therefore, no algorithm that optimally and deterministically solves the problem in polynomial time is known. Some techniques attempting to approximate the optimal solution with polynomial time complexity have been proposed. A few of the more common techniques and the types of solutions they provide are listed below:

• Heuristics can produce solutions with low complexity and fast execution time; however, they occasionally produce solutions that differ significantly from the optimal one.

• Metaheuristics can obtain good-quality solutions, but they take longer to run. Their execution time depends on the stopping condition (e.g., the number of iterations) imposed by the programmer/configuration. Moreover, they do not guarantee bounds on the quality of the solution, and local optima are commonly taken as the final solution.

• In linear programming, the execution time and solution quality depend on the relaxation of constraints and a reduction in the number of variables in the problem. Heuristics can be adopted to reduce the search space, thus reducing the problem size so that solutions can be found more rapidly.

• Approximation algorithms provide solutions with guaranteed quality bounds at some distance from the optimum. Approximation algorithms with low complexity and reasonable approximation ratios for generic problems are hard to obtain, and involve tools for deriving tight bounds [20] on the exact solution. The more generic the problem specification, the harder it is to obtain a satisfactory approximation.

Numerous heuristics have been proposed in the literature for scheduling and resource allocation. An overview of some of these approaches for resource allocation and scheduling in clouds is presented next.


10.5 RESOURCE MANAGEMENT AND SCHEDULING IN CLOUDS

In this section, we describe general solutions for scheduling and resource allocation in clouds. The aim is not to present a complete survey of these problems, but to provide an overview of existing approaches that can be extended to the cloud context.

10.5.1 Scheduling

As described previously, a scheduler maps tasks from the application submitted by the user to computational resources in the system. Scheduling in clouds has one important difference from scheduling on physical machines: the algorithm must consider the ability of the system to "create" computing resources as needed, given the elasticity provided by the cloud. In this section, we examine some well-known algorithms and discuss how they could be adapted to function in the cloud.

10.5.1.1 Independent Tasks. There are a handful of algorithms for scheduling independent tasks in distributed computing systems. Two traditional, straightforward approaches for scheduling a set of tasks $T = \{t_1, t_2, \ldots\}$ are available: the random and round-robin algorithms. Both work on a first-come first-served (FCFS) basis, with the first task to arrive, $t_1$, being the first to be scheduled. They are more suitable for homogeneous systems. Let $R_{vm}$ be the set of VMs already leased (i.e., SLAs already established). For each task $t_i \in T$, $i = 1, 2, \ldots$, a scheduler random$(T, R_{vm})$ randomly selects a resource $r_j \in R_{vm}$ and sends $t_i$ to $r_j$'s queue. A scheduler round-robin$(T, R_{vm})$, on the other hand, first organizes $R_{vm}$ into a circular queue; it then takes one task from the incoming queue and sends it for execution on the next $r_j$ from the circular queue. No information about the duration of tasks or the capacity of resources is needed for this type of scheduling. Moreover, knowledge about the length of task queues is not necessary, since the number of tasks in each queue is not taken into account in scheduling decisions.

From the client's point of view, both the random and round-robin algorithms are directly applicable to scheduling in IaaS clouds over a set of already instantiated VMs. In order to take advantage of the elasticity of the cloud, both would need support from an elasticity management entity that decides when to lease new VMs (or release existing ones), dynamically changing $R_{vm}$ without interference from the schedulers. Such an entity could, for example, be invoked when the length of a resource's execution queue surpasses a certain threshold. After that, one possible action would be to reschedule the queued tasks using the same algorithm, or to fill the new VM queues up to the size of the previously existing VM queues before resuming the original scheduling algorithm.

Algorithm 1 illustrates the utilization of a cloud management entity along with random or round-robin schedulers. Figure 10.6 shows an example scenario: five tasks to be scheduled using a random or round-robin algorithm, where three VMs are already leased through SLAs established with IaaS providers. Assume that tasks 1, 2, and 3 are scheduled to the first, second, and third VMs available, respectively. After that, an attempt is made to allocate task 4 on VM1, but the maximum threshold for the queue (i.e., the turnaround time for task 4) is exceeded. Therefore, the elasticity manager is called to add a new VM to the pool, establishing a new SLA with an IaaS provider. As a consequence, task 4 can be scheduled to this new VM and will finish before the threshold is reached, and task 5 can also benefit from the new VM set. From this point on, a new VM will be added whenever new tasks arrive and the threshold is again exceeded.


Algorithm 1 Random and round-robin adaptation for clouds
1: scheduler F_s = random OR round-robin
2: while T ≠ ∅ do
3:    t = first task in T
4:    select r from R_vm following the scheduler policy
5:    if queue(r) > threshold_max OR queue(r) < threshold_min then
6:        Call the elasticity management entity to determine a new R_vm from S
7:        Reschedule queued tasks
8:    end if
9: end while
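A runnable Python rendering of Algorithm 1 might look as follows; the queue-length threshold, the lease_new_vm callback, and the omission of the scale-down branch and the rescheduling step (line 7) are simplifications for illustration:

    import itertools
    import random

    def schedule(tasks, vms, policy="round-robin", threshold_max=3, lease_new_vm=None):
        # queues[v] holds the tasks waiting on VM v; vms is the current R_vm.
        queues = {v: [] for v in vms}
        rr = itertools.cycle(list(vms))  # circular queue for round-robin
        for t in tasks:
            v = random.choice(vms) if policy == "random" else next(rr)
            if len(queues[v]) >= threshold_max and lease_new_vm is not None:
                new_vm = lease_new_vm()          # elasticity entity establishes a new SLA
                vms.append(new_vm)
                queues[new_vm] = []
                rr = itertools.cycle(list(vms))  # rebuild the circular queue
                v = new_vm
            queues[v].append(t)
        return queues

For instance, schedule(list(range(10)), ["vm1", "vm2", "vm3"], lease_new_vm=lambda: "vm4") would lease a fourth VM once some queue reaches the threshold.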

Figure 10.6. Example scenario for Algorithm 1. Task 4 exceeds the resource queue threshold, triggering a call to the elasticity manager to acquire a new VM.

The round-robin algorithm can also be adapted for use in heterogeneous systems. If the relative performance of the machines can be established, the round-robin algorithm can assign a performance value to each machine, assign it a number of tasks proportional to this value, and then advance in the circular queue. Moreover, both the random and round-robin algorithms can process BoTs as well as sequences of tasks arriving over time (i.e., "online scheduling", in which the whole set of tasks is not known beforehand). For online scheduling, another common approach is to schedule each incoming task to the machine that currently has the shortest queue, in accordance with a specific load-balancing policy.

When task running times can be estimated, heuristics can use this information jointly with an estimate of machine performance to improve scheduling decisions. Two of the best-known scheduling heuristics for BoTs are the min–min and max–min algorithms [21]. Min–min first selects the task that can be completed in the shortest time and schedules it on the machine that can finish it at this earliest time; this is repeated until all tasks have been scheduled. Conversely, max–min first selects the task that takes longest to run and schedules it on the machine that can finish it most quickly. Another well-known algorithm is sufferage [22], which assigns to each task a value that is the difference between its minimum completion time and its second minimum completion time; tasks with the maximum sufferage are scheduled first. As with the random and round-robin algorithms, these heuristics can be adapted following an approach similar to that of Algorithm 1. Several other heuristics exist and can be adopted by the elasticity management entity in clouds. A comparison of eleven heuristics for the scheduling of independent tasks can be found in Ref. [23].
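A sketch of min–min under these assumptions, where etc[t][m] is the estimated time to compute task t on machine m (max–min is obtained by selecting, at each step, the task whose minimum completion time is largest):

    def min_min(etc, num_machines):
        # etc[t][m]: estimated time to compute task t on machine m.
        ready_time = [0.0] * num_machines   # when each machine becomes free
        unscheduled = set(etc)
        schedule = []
        while unscheduled:
            # Pick the (task, machine) pair with the smallest completion time.
            t, m, finish = min(
                ((t, m, ready_time[m] + etc[t][m])
                 for t in unscheduled for m in range(num_machines)),
                key=lambda x: x[2],
            )
            schedule.append((t, m))
            ready_time[m] = finish
            unscheduled.remove(t)
        return schedule

    etc = {"t1": [3, 5], "t2": [2, 4], "t3": [6, 1]}
    print(min_min(etc, 2))  # [('t3', 1), ('t2', 0), ('t1', 0)]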

10.5.1.2 Elasticity Management Entity. The elasticity provided by the use of cloud resources increases and decreases the capacity available to cope with current demands, as shown in Figure 10.7. The available computing power must be as close as possible to the demand, considering the QoS requirements involved in providing the resource. In this way, the elasticity management entity avoids both overprovisioning in low-demand periods and underprovisioning upon peak demand, consequently reducing costs.

The elasticity entity must be oriented by an objective function, and it can either be incorporated into a monitoring system that invokes it, or be invoked by the scheduler itself whenever more resources are necessary. In the first case, the monitoring system must detect when the current system load is above the desired capacity for handling the incoming workload and invoke a decision-maker in the elasticity entity to decide which type of VM is best leased at that time. In the second case, the scheduler output determines the number and types of VMs needed to cope with the currently submitted workload. In both cases, the elasticity entity is responsible for instantiating the necessary VMs in the corresponding IaaS provider and preparing them to run tasks.

Figure 10.8 illustrates the interactions and components of the elasticity management entity. The decision to request more resources is supported by information maintained in a repository containing data about all the resources currently available (leased), including their load and performance, as well as those that can be made available (SLAs offered by IaaS providers). Objective functions commonly found in the literature include minimization of the application makespan and minimization of monetary costs [5, 10, 24].

Figure 10.7. Desired elasticity versus demand, in comparison with underprovisioning and overprovisioning.


Figure 10.8. Elasticity manager and its interactions in the management system.

The elasticity manager must also be able to act over multiple IaaS providers in order to manage VMs. Different providers have different management interfaces, which can lead to a cloud lock-in problem if a single provider's interface is adopted by the elasticity manager. To avoid this problem, standardization efforts, such as the multicloud toolkit of Apache JClouds,5 can be utilized.
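As one example of an objective-driven leasing decision, the sketch below (using the SLA fields sketched in Section 10.4.2; the capacity-deficit estimate would come from the monitoring repository) picks the cheapest offered SLA whose capacity covers the detected deficit:

    def choose_vm_type(slas, capacity_deficit):
        # slas: iterable of objects with .cores, .core_capacity, and .price
        # capacity_deficit: additional aggregate capacity needed (same unit as core_capacity)
        candidates = [s for s in slas if s.cores * s.core_capacity >= capacity_deficit]
        if not candidates:
            # No single VM suffices; lease the largest and let the next cycle lease more.
            return max(slas, key=lambda s: s.cores * s.core_capacity)
        return min(candidates, key=lambda s: s.price)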

10.5.1.3 Dependent Tasks. Dependent tasks, as in directed acyclic graphs (DAGs) and workflows, present a topological order that must be respected by the schedule. Moreover, data transfer costs between tasks must be considered when scattering them across the available resources. As a consequence, algorithms for independent task scheduling are not directly applicable to the scheduling of dependent tasks unless some prior selection of tasks is carried out.

Schedulers for independent tasks can be utilized for DAGs if ready task selection is performed. A ready task has all of its predecessors already scheduled. Therefore, by construction, a set of ready tasks in a DAG is composed of independent tasks. At any moment, an independent task scheduler can take a task from the ready set and schedule it without violating precedences. By doing so, the scheduling of dependent tasks is transformed into a sequence of schedulings of independent tasks, as illustrated in Algorithm 2.

5 http://jclouds.apache.org/


Algorithm 2 Dependent task scheduling using algorithms for independent task scheduling
1: scheduler = independent task scheduler (e.g., Algorithm 1 with random or round-robin)
2: G = DAG to be scheduled
3: while there exists a not scheduled task ta ∈ G do
4:    T = set of tasks in G with all predecessors already scheduled
5:    Call scheduler(T, Rvm)
6: end while
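For concreteness, a runnable Python rendering of Algorithm 2, assuming the DAG is acyclic and given as a map from each task to its set of predecessors, and that schedule_independent is any scheduler in the mold of Algorithm 1:

def schedule_dag(preds, schedule_independent, vms):
    # preds: task -> set of predecessor tasks (the DAG G); assumed acyclic
    scheduled = set()
    while len(scheduled) < len(preds):
        # Ready set: unscheduled tasks whose predecessors are all scheduled
        ready = [t for t in preds
                 if t not in scheduled and preds[t] <= scheduled]
        schedule_independent(ready, vms)  # Algorithm 1-style scheduler call
        scheduled.update(ready)
    return scheduled

In practice the ready set is recomputed as tasks finish, so tasks at different DAG levels can be mixed in the same scheduling call, as in the Figure 10.9 example.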

Figure 10.9. Example scenario for Algorithm 2. Ready tasks are selected to be scheduled independently using the same technique from Algorithm 1. Tasks 4 and 6 trigger the elasticity manager in this example. (In the figure, a queue threshold triggers the elasticity manager, which leases new VMs, e.g., a new type 2 VM from IaaS 1 and a new type 1 VM from IaaS 2, according to the SLAs and charging models of the providers.)

Figure 10.9 shows an example scenario for Algorithm 2. The DAG is broken into sets of independent tasks, with the first comprising all the tasks at the first level of the DAG. These tasks are sent to the scheduler as independent tasks, and then scheduled. The next set of independent tasks is not, however, necessarily composed of all tasks at the second level of the DAG, since a subset of the tasks at the first level can finish earlier; if so, a new set of independent ready tasks can be immediately computed and scheduled. The DAG is scheduled as a sequence of sets of independent tasks, thus the elasticity manager can act in the same way as in the example in Figure 10.6.

The approach presented in Algorithm 2 is used by HTCondor DAGMan.6 The main drawback of this approach is that tasks are scheduled regardless of their dependencies, since these are not considered during resource selection, even if the scheduler considers resource performance and task durations (e.g., min–min, max–min, and sufferage).

6 http://research.cs.wisc.edu/htcondor/dagman/dagman.html


In systems composed of hybrid clouds and multiple IaaS providers, in which resources can be geographically distant, application characteristics such as high edge density and a high communication-to-computation ratio (CCR) can make data transfer times a dominant part of the application makespan. If data transfer times are disregarded during scheduling, the resulting schedule can slow the application down rather than speed it up.
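The CCR of a DAG is commonly defined as the ratio between its average communication cost and its average computation cost:

\mathrm{CCR} = \frac{\tfrac{1}{|E|}\sum_{(i,j)\in E} c_{i,j}}{\tfrac{1}{|V|}\sum_{i\in V} w_i},

where c_{i,j} is the cost of transferring the data associated with edge (i, j) and w_i is the average computation cost of task i. A CCR well above 1 characterizes communication-dominated applications, for which disregarding transfer times is most harmful.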

Special DAG scheduling algorithms have been proposed to consider communication delays during the execution of an application. One technique commonly used for the scheduling of dependent tasks is list scheduling, in which tasks are first prioritized in a list according to an objective function, and then taken in order of priority for scheduling. This prioritization often takes the dependencies and running times of tasks into account, resulting in higher priorities for tasks on longer paths in the DAG. One well-known list scheduling algorithm for DAGs is the heterogeneous earliest finish time (HEFT) algorithm [25]. HEFT considers heterogeneous tasks in heterogeneous, unrelated systems, and has been reported to provide good results [26–28]. Moreover, HEFT has been modified by other authors [29, 30].
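A brief sketch of HEFT's prioritization phase, the upward rank, assuming mean computation costs w and mean communication costs c over the DAG edges; the scheduling phase (not shown) then assigns tasks, in nonincreasing rank order, to the VM that minimizes their earliest finish time.

from functools import lru_cache

def upward_ranks(succs, w, c):
    # succs: task -> iterable of successor tasks; w[t]: mean computation
    # cost of t; c[(t, s)]: mean communication cost of edge t -> s.
    @lru_cache(maxsize=None)
    def rank(t):
        # rank_u(t) = w[t] + max over successors s of (c[t,s] + rank_u(s))
        return w[t] + max((c[(t, s)] + rank(s) for s in succs[t]),
                          default=0.0)
    return {t: rank(t) for t in succs}

Because the rank accumulates costs along the longest downstream path, tasks on the DAG's critical path naturally receive the highest priorities.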

HEFT and similar algorithms usually focus on the single objective of makespan minimization. With the emergence of utility computing and clouds, a variety of budget-oriented scheduling algorithms have been proposed, including heuristics [31] and meta-heuristics [12]. Makespan minimization may not always be the main issue in scheduling, such as when an application must run within a certain timeframe although speed is not necessarily an issue. By setting a maximum finish time for the application, that is, a deadline, the user can control when the results of the applications being run are needed. In clouds, there is usually a trade-off between cost and makespan: the user pays more for faster resources to run applications.

Running applications with deadline constraints in clouds (e.g., hybrid clouds and/or multiple IaaS providers) requires a scheduler that is both makespan- and cost-aware, transforming the scheduling problem into cost minimization within a maximum makespan. The hybrid cloud optimization cost (HCOC) algorithm [24] approaches this problem by scheduling the DAG on the private cloud resources ("costless" resources) using a DAG scheduling algorithm (e.g., HEFT [25] or PCH [32]), and then iteratively selecting tasks to be run in public clouds until the deadline is met. The DAG scheduling algorithm utilized is typically a list scheduling algorithm, and the tasks to be sent to public clouds are iteratively selected based on their priorities. A generalization of this approach is presented in Algorithm 3.

Figure 10.10 presents an example scenario for Algorithm 3. First, the DAG is scheduled on the costless resources, and a deadline violation is detected. Task 1 is the highest-priority task, and it is rescheduled during the first iteration of the algorithm. Then, assuming the deadline is not satisfied after this first iteration, in the second iteration task 3, the task with the second highest priority in this hypothetical list scheduling, is also added to the set T. The final schedule is achieved with tasks 1 and 3 on VM3 from a public cloud provider. The elasticity manager is invoked after the deadline is satisfied in the scheduling, and only then is VM3 leased through the IaaS provider interface. Moreover, after task 3 finishes at VM3, it transfers the necessary data to its successor (task 6), and then VM3 can be released.

Lines 8 and 10 in Algorithm 3 can vary depending on the scheduler and policies utilized.


Algorithm 3 Dependent task scheduling in clouds with deadline constraints
1: scheduler = dependent task scheduler (e.g., HEFT, PCH)
2: G = DAG to be scheduled
3: Rc = costless VMs available (e.g., in private cloud or grid)
4: Schedule G in Rc using scheduler
5: T = ∅ //stores tasks to reschedule; initially empty
6: while makespan(G) ≥ deadline(G) do
7:    Rvm = Rc
8:    tp = task from G with highest priority in the list scheduling
9:    T = T ∪ {tp}
10:   Rpub = select VMs from public clouds according to a resource selection policy considering the set T of tasks to be rescheduled
11:   Rvm = Rc ∪ Rpub
12:   reschedule tasks in T to Rvm using scheduler
13: end while
14: Call the elasticity management entity to allocate/set up the necessary VMs from Rvm to compose the hybrid cloud

Figure 10.10. Example scenario for Algorithm 3. Tasks 1 and 3 are rescheduled to VM3 in a public cloud provider. (The figure shows the initial schedule on the costless resources VM1 and VM2, which violates the deadline, and the final schedule in the hybrid cloud after rescheduling.)

Different algorithms use different prioritization schemes in line 8, and thus the sequence of tasks rescheduled can differ. In line 10, the resource selection policy can be adapted according to the application and/or system characteristics. For example, HCOC utilizes a multicore-aware policy to select the new VMs to be leased according to the number of parallel tasks being rescheduled. This multicore-awareness also takes into account processor performance and prices to reduce application running costs.

Yu et al. [33] have also proposed a deadline-driven cost-minimization algorithm. The Deadline-Markov decision process (Deadline-MDP) algorithm breaks the DAG into partitions, assigning a maximum finish time (subdeadline) to each partition according to


the deadline given by the user. Based on this, each partition is scheduled to the resource which results in both the lowest cost and the lowest estimated finish time.

10.5.2 VM Allocation

While cloud clients focus on scheduling applications, providers focus on client requests for VM allocation. IaaS cloud providers (both public and private) are thus concerned with the characteristics of the VMs requested by the client (expressed in an SLA) and with allocating each such VM to the available physical machines (e.g., in a datacenter) according to a predefined objective. Allocation is important in orchestrating VMs in the computational infrastructure [34].

Objective functions for VM allocation commonly include the maximization of physical machine utilization [35, 36], the reduction of power consumption [36, 37], and the minimization of network traffic [38], as well as the improvement of security [39]. Allocation decisions should also consider the QoS requirements in accordance with the SLAs established with the cloud clients [40]. To achieve this, the cloud provider can allocate a VM to a selected physical machine, but it can also try to migrate VMs already allocated if this would significantly improve the achievement of objectives. Therefore, the cloud management system must be able to detect when VM migrations are necessary. Such a necessity can arise when users deallocate VMs and leave physical resources partially allocated. Reallocating VMs can improve resource utilization and allow physical machines to be turned off to reduce power consumption.

A general view of VM allocation is presented in Algorithm 4. The algorithm receives a set of VMs to be allocated (Rvm), a set of physical machines available (M), and the current utilization of the physical machines (U). The algorithm allocates all VMs to physical machines by first selecting a VM to be allocated, and then selecting a physical machine that can run this VM. This selection is based on the VM characteristics (according to the SLA) and the characteristics of the physical machines, as well as on the resources currently unallocated in the physical machines (i.e., the utilization U). This algorithm also serves as a basis for resource reallocation when a need is detected by monitoring the amount of unallocated resources in physical machines; in this case, the set of VMs to be allocated, Rvm, would comprise all VMs currently allocated.

Other algorithms that follow the reasoning of Algorithm 4 can work toward different objective functions.

Algorithm 4 Virtual machine allocation overview
1: Rvm = set of VMs to be allocated to the physical machines M
2: U = current utilization of physical machines in M
3: while there are unallocated VMs in Rvm do
4:    VM = virtual machine from Rvm //selected according to a heuristic
5:    r = select_machine(VM, M, U) //selected according to an objective function
6:    Update ur: information about resources already allocated to VMs in machine r
7: end while
8: Turn off idle physical machines
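As one concrete instance of Algorithm 4, the sketch below uses a first-fit heuristic over a deliberately simplified two-resource (CPU, memory) model; the data structures are assumptions for illustration.

def first_fit_allocate(vms, machines, used):
    # vms: list of (cpu, mem) demands; machines: id -> (cpu, mem) capacity;
    # used: id -> [cpu_used, mem_used] (the utilization U of Algorithm 4).
    placement = {}
    for i, (cpu, mem) in enumerate(vms):
        for m, (cap_cpu, cap_mem) in machines.items():
            if used[m][0] + cpu <= cap_cpu and used[m][1] + mem <= cap_mem:
                used[m][0] += cpu
                used[m][1] += mem
                placement[i] = m
                break
        else:
            # No machine fits: a provider would power on a new host here
            raise RuntimeError("no machine can host VM %d" % i)
    # Machines absent from placement.values() stay idle (line 8: turn off)
    return placement

Sorting the VMs by decreasing demand before the loop yields first-fit decreasing, a classic bin-packing heuristic that tends to pack machines more tightly and leave more hosts idle.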


Beloglazov and Buyya focus on energy-efficient allocation [37]; they propose heuristics that consolidate virtual machines by constantly calling a VM reallocation algorithm, using live migration to switch off underutilized hosts. On the other hand, physical machines with high utilization can also trigger VM migration in order to avoid SLA violations.

VM migration can be triggered as a result of the monitoring of three situations: (1) users switching VMs off, (2) users switching VMs on, and (3) physical machine utilization. The first two cases are covered, respectively, by periodically calling the VM reallocation algorithm when the amount of unallocated resources in physical machines surpasses a threshold, and when there is a sufficiently large amount of unallocated resources distributed over the physical machines to handle the new VM requests. In the first situation, reallocating VMs will probably lead to switching off physical machines, while in the second, reallocation can avoid turning on new physical machines to cope with demand. The third situation requires a more sophisticated mechanism to detect utilization hotspots in VMs and decide whether overconsolidation can be avoided [41]. Overconsolidation means that the amount of resources allocated to VMs on a host exceeds the physical resources available on that host. Thus, if the applications running in all VMs use all the resources available in the VMs, performance bottlenecks can be created and SLAs will potentially be violated.
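A minimal sketch of the threshold-based triggers just described; the utilization bounds are illustrative, not values from the chapter.

def migration_needed(allocated, capacity, low=0.3, high=0.9):
    # allocated/capacity: machine id -> resources allocated to VMs / total.
    # Underloaded hosts are consolidation candidates (cases 1 and 2);
    # overloaded hosts risk SLA violations and need relief (case 3).
    consolidate, relieve = [], []
    for m in capacity:
        util = allocated[m] / capacity[m]
        if util < low:
            consolidate.append(m)
        elif util > high:
            relieve.append(m)
    return consolidate, relieve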

10.6 CHALLENGES AND PERSPECTIVES

Cloud computing resource management can be partially handled by adapting techniques developed for other distributed systems such as grids and clusters. Some new management problems arise from handling the many types of data coming from the omnipresent computing devices connected to the Internet. Here, we discuss some of the challenges and perspectives in cloud computing management for the next few years.

10.6.1 Scheduler and VM Allocation Cooperation

Currently, resource allocation involves two separate phases: application scheduling and VM allocation. These two phases are often treated as independent of each other, since a client is uninformed about the underlying physical system to be used for an application, and the cloud provider has no knowledge about the application requirements when allocating VMs. One way to improve QoS, as well as resource utilization, is to connect application scheduling (client) with VM allocation (provider). By feeding the VM allocator with information about application requirements, VM allocation algorithms could consider the computational/networking demands of the applications in each VM, thus being able to better allocate VMs to physical machines. This need for cooperation may result in privacy issues and challenges, although these could be resolved in different ways, depending on the relationship between client and provider.

10.6.2 Big Data

In the era of Big Data, large datasets are constantly generated and often accessed and processed to summarize information. One challenge in this scenario is when/where to move


these large datasets to achieve faster application execution/response times and reduce costs. This involves making decisions about where applications (or parts of applications) will be executed, depending on input/output data sizes and frequency of use. Some tasks may involve large datasets that are frequently utilized; in this case, it may be better to leave the data ready for use in the cloud, to avoid incurring data transfer costs potentially even higher than those of running the task. In other cases, it may be worthwhile to remove the task output from the public cloud and regenerate it later. This depends on the data size, the computational demand for its generation, and how often the data is utilized, as well as the cost of storing the data in the cloud. Scheduling algorithms that consider data transfer times, data transfer costs, and storage costs in this context will be challenging yet necessary.
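One possible (simplified) formalization of this trade-off, assuming linear pricing and a dataset of size s reused n times over a horizon T: keeping the dataset stored in the cloud pays off when

s \cdot p_{\mathrm{store}} \cdot T < n \cdot \min\left(s \cdot p_{\mathrm{transfer}},\; C_{\mathrm{regen}}\right),

where p_{store} is the storage price per byte per unit time, p_{transfer} the price of transferring one byte, and C_{regen} the cost of regenerating the data. The scheduling algorithms envisioned here would evaluate such tests per dataset and per provider.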

10.6.3 Greenness

A current concern in cloud computing is energy consumption. This can be reduced in cloud data centers by energy-efficient hardware design, but data center management efforts can also play a role. Two main aspects of energy-aware data center management are VM consolidation and green networking. VM consolidation allows physical machines to be turned off while not in use, while green networking techniques allow network equipment to be switched off, at least partially, or its power consumption to be reduced by lowering port operating speeds. Both VM consolidation and green networking involve decisions made during allocation to improve utilization and reduce network usage. Both profiling and cooperation between VM allocation and schedulers can help to achieve a greener usage of the cloud infrastructure.

10.6.4 Scheduling Multiple Workflows

The problem of scheduling a single workflow on clouds has been studied extensively [12, 42–45]. Nevertheless, the scheduler must also handle the concurrent execution of multiple workflows, an issue that has barely been considered [46–48]. When multiple workflows share the same execution environment, they compete for the same set of computational resources. In such a situation, there may be conflicts which must be dealt with to guarantee the efficiency of the workflow management system as a whole. For example, it is important that the execution of each workflow fulfil its objective function, but also that the agreed-upon QoS be guaranteed. Therefore, besides coping with dataset management for each workflow, a scheduling algorithm should consider fairness, sharing resources as equally as possible among workflows. Moreover, datasets utilized/generated by one workflow may be reutilized by other workflows run within a limited timeframe. The decision on when to maintain or remove such datasets from the cloud will have an impact on workflow execution times and costs.

10.6.5 Hybrid Clouds and Uncertainty

Hybrid cloud management is in itself a challenge, as discussed in this chapter. One complicating factor is the uncertainty present in the public communication channels


traversing the Internet and interconnecting the hybrid cloud components. A decision to compose a hybrid cloud is based on current computing demands, while the distribution of applications in that cloud must take into account data transfers to/from public clouds. The estimation of data transfer times is strongly related to the available bandwidth, which cannot be precisely predicted over the application execution horizon. Bandwidth measurements are often inaccurate, as are predictions of bandwidth availability. The uncontrollable and unpredictable bandwidth variation in public Internet channels makes application execution times in hybrid clouds prone to variation. These uncertainties in the estimation of bandwidth availability in communication channels must be considered by an application scheduler when deciding whether to lease VMs from public clouds to run applications, especially for dependent tasks and applications involving large datasets.

10.7 CONCLUSION

Resource management and scheduling in cloud computing can be seen from two perspectives: the viewpoint of the client and that of the provider. While the client is focused on running his/her applications with the best possible QoS and lower costs, the provider is willing to provide these services to the client with the agreed-upon QoS. Application scheduling and management by the cloud user involve resource management, with the user responsible for keeping track of the resources leased and of current demands, in order to determine whether new machines must be leased to maintain QoS or whether the currently leased machines can be released to reduce costs. VM management likewise pursues cost reduction, by maximizing the utilization of the physical infrastructure. The intrinsic conflict between these two objectives brings independent challenges to both entities. Cooperation between the two parties can, however, result in a win–win scenario, where application information could be explored by the VM allocator to improve QoS while reducing costs.

In this chapter, we have provided an overview of aspects of and requirements for resource management and application scheduling. We have discussed how these two frameworks can be handled in the cloud computing context and presented a summary of promising research topics. Both a cloud computing model and a resource management model were presented, along with VM allocation and scheduling issues. Moreover, we have discussed how existing resource allocation approaches can be extended to incorporate the elasticity intrinsic to cloud computing. A brief discussion of challenging aspects facing further developments in cloud computing research was also presented.

REFERENCES

1. M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. A view of cloud computing. Communications of the ACM, 53:50–58, 2010.

2. A. Khajeh-Hosseini, I. Sommerville, and I. Sriram. Research challenges for enterprise cloud computing. CoRR, abs/1001.3257, 2010.


3. D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov. The Eucalyptus open-source cloud-computing system. In 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID), pages 124–131, 2009.

4. Q. Zhang, L. Cheng, and R. Boutaba. Cloud computing: State-of-the-art and research challenges. Journal of Internet Services and Applications, 1(1):7–18, 2010.

5. L. F. Bittencourt, E. R. Madeira, and N. L. da Fonseca. Scheduling in hybrid clouds. IEEE Communications Magazine, 50(9):42–47, 2012.

6. P. Mell and T. Grance. The NIST Definition of Cloud Computing. Technical Report, National Institute of Standards and Technology (NIST), 2009.

7. I. Foster, Y. Zhao, I. Raicu, and S. Lu. Cloud computing and grid computing 360-degree compared. In Grid Computing Environments Workshop (GCE '08), pages 1–10, 2008.

8. P. Costa, M. Migliavacca, P. Pietzuch, and A. L. Wolf. NaaS: Network-as-a-service in the cloud. In Proceedings of the 2nd USENIX Conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services (Hot-ICE), 2012.

9. M. de Assunção, A. di Costanzo, and R. Buyya. A cost-benefit analysis of using cloud computing to extend the capacity of clusters. Cluster Computing, 13:335–347, 2010. doi:10.1007/s10586-010-0131-x.

10. R. Van den Bossche, K. Vanmechelen, and J. Broeckhove. Cost-optimal scheduling in hybrid IaaS clouds for deadline constrained workloads. In 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD), pages 228–235, 2010.

11. C. Vecchiola, R. N. Calheiros, D. Karunamoorthy, and R. Buyya. Deadline-driven provisioning of resources for scientific applications in hybrid clouds with Aneka. Future Generation Computer Systems, 2011.

12. S. Pandey, L. Wu, S. Guru, and R. Buyya. A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments. In 2010 24th IEEE International Conference on Advanced Information Networking and Applications (AINA), pages 400–407, 2010.

13. E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz. Pegasus: A framework for mapping complex scientific workflows onto distributed systems. Scientific Programming Journal, 13(3):219–237, 2005.

14. Y. Zhao, J. Dobson, I. Foster, L. Moreau, and M. Wilde. A notation and system for expressing and executing cleanly typed workflows on messy scientific data. SIGMOD Record, 34(3):37–43, 2005.

15. A. Dogan and F. Özgüner. Biobjective scheduling algorithms for execution time-reliability trade-off in heterogeneous computing systems. Computer Journal, 48(3):300–314, 2005.

16. A. Ramakrishnan, G. Singh, H. Zhao, E. Deelman, R. Sakellariou, K. Vahi, K. Blackburn, D. Meyers, and M. Samidi. Scheduling data-intensive workflows onto storage-constrained distributed resources. In CCGRID '07: Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid, pages 401–409, 2007.

17. J. Annis, Y. Zhao, J. Voeckler, M. Wilde, S. Kent, and I. Foster. Applying Chimera virtual data concepts to cluster finding in the Sloan Sky Survey. In Supercomputing '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pages 1–14, 2002.

18. V. Pinheiro, K. Rzadca, and D. Trystram. Campaign scheduling. In 2012 19th International Conference on High Performance Computing, pages 1–10, 2012.


19. M. L. Pinedo. Scheduling: Theory, Algorithms, and Systems. Springer, New York, 2008.

20. V. V. Vazirani. Approximation Algorithms. Springer, Berlin, 2004.

21. O. H. Ibarra and C. E. Kim. Heuristic algorithms for scheduling independent tasks on nonidentical processors. Journal of the ACM, 24(2):280–289, 1977.

22. M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. Freund. Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems. In Eighth Heterogeneous Computing Workshop (HCW '99), pages 30–44, 1999.

23. T. D. Braun, H. J. Siegel, N. Beck, L. L. Bölöni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, D. Hensgen, et al. A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing, 61(6):810–837, 2001.

24. L. F. Bittencourt and E. R. M. Madeira. HCOC: A cost optimization algorithm for workflow scheduling in hybrid clouds. Journal of Internet Services and Applications, 2(3):207–227, 2011.

25. H. Topcuoglu, S. Hariri, and M.-Y. Wu. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems, 13(3):260–274, 2002.

26. H. Zhao and R. Sakellariou. An experimental investigation into the rank function of the heterogeneous earliest finish time scheduling algorithm. In Euro-Par 2003 Parallel Processing, pages 189–194, 2003.

27. M. Wieczorek, R. Prodan, and T. Fahringer. Scheduling of scientific workflows in the ASKALON grid environment. ACM SIGMOD Record, 34(3):56–62, 2005.

28. H. Arabnejad and J. Barbosa. List scheduling algorithm for heterogeneous systems by an optimistic cost table. IEEE Transactions on Parallel and Distributed Systems, 25(3):682–694, 2014.

29. F. Suter, F. Desprez, and H. Casanova. From heterogeneous task scheduling to heterogeneous mixed parallel scheduling. In Euro-Par 2004 Parallel Processing, pages 230–237, 2004.

30. L. F. Bittencourt, R. Sakellariou, and E. R. M. Madeira. DAG scheduling using a lookahead variant of the heterogeneous earliest finish time algorithm. In 2010 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pages 27–34, 2010.

31. R. Sakellariou, H. Zhao, E. Tsiakkouri, and M. D. Dikaiakos. Scheduling workflows with budget constraints. In Integrated Research in GRID Computing, pages 189–202, 2007.

32. L. Bittencourt and E. Madeira. HCOC: A cost optimization algorithm for workflow scheduling in hybrid clouds. Journal of Internet Services and Applications, 1–21, 2011. doi:10.1007/s13174-011-0032-0.

33. J. Yu, R. Buyya, and C. K. Tham. Cost-based scheduling of scientific workflow applications on utility grids. In e-Science and Grid Computing, pages 140–147, 2005.

34. B. Sotomayor, R. S. Montero, I. M. Llorente, and I. Foster. Virtual infrastructure management in private and hybrid clouds. IEEE Internet Computing, 13(5):14–22, 2009.

35. J. Xu and J. A. B. Fortes. Multi-objective virtual machine placement in virtualized data center environments. In 2010 IEEE/ACM International Conference on Green Computing and Communications (GreenCom) and International Conference on Cyber, Physical and Social Computing (CPSCom), pages 179–188, 2010.


36. D. G. d. Lago, E. R. M. Madeira, and L. F. Bittencourt. Power-aware virtual machine scheduling on clouds using active cooling control and DVFS. In Proceedings of the 9th International Workshop on Middleware for Grids, Clouds and e-Science (MGC '11), pages 2:1–2:6, 2011.

37. A. Beloglazov and R. Buyya. Energy efficient allocation of virtual machines in cloud data centers. In 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), pages 577–578, May 2010.

38. X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM 2010 Proceedings, IEEE, pages 1–9, 2010.

39. D. Stefani Marcon, R. Ruas Oliveira, M. Cardoso Neves, L. Salete Buriol, L. Gaspary, and M. Pilla Barcellos. Trust-based grouping for cloud datacenters: Improving security in shared infrastructures. In IFIP Networking Conference, pages 1–9, 2013.

40. T. Wood, P. J. Shenoy, A. Venkataramani, and M. S. Yousif. Black-box and gray-box strategies for virtual machine migration. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2007.

41. C. Hyser, B. McKee, R. Gardner, and B. J. Watson. Autonomic virtual machine placement in the data center. Hewlett Packard Laboratories, Technical Report HPL-2007-189, 2007.

42. T. A. L. Genez, L. F. Bittencourt, and E. R. M. Madeira. Workflow scheduling for SaaS/PaaS cloud providers considering two SLA levels. In IEEE/IFIP Network Operations and Management Symposium (NOMS 2012). IEEE, 2012.

43. C. Lin and S. Lu. Scheduling scientific workflows elastically for cloud computing. In 2011 IEEE International Conference on Cloud Computing (CLOUD), pages 746–747, 2011.

44. K. Bessai, S. Youcef, A. Oulamara, C. Godart, and S. Nurcan. Bi-criteria workflow tasks allocation and scheduling in cloud computing environments. In IEEE 5th International Conference on Cloud Computing (CLOUD), pages 638–645, 2012.

45. W.-N. Chen and J. Zhang. A set-based discrete PSO for cloud workflow scheduling with user-defined QoS constraints. In IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 773–778, 2012.

46. H. Zhao and R. Sakellariou. Scheduling multiple DAGs onto heterogeneous systems. In 20th International Conference on Parallel and Distributed Processing (IPDPS), 2006.

47. L. F. Bittencourt and E. R. M. Madeira. Towards the scheduling of multiple workflows on computational grids. Journal of Grid Computing, 8(3):419–441, 2010.

48. R. F. da Silva, T. Glatard, and F. Desprez. Workflow fairness control on online and non-clairvoyant distributed computing platforms. In 19th International Conference on Parallel Processing (Euro-Par), pages 102–113. Springer-Verlag, 2013.


11

CLOUD SECURITY

Tianyi Xing1, Zhengyang Xiong1, Haiyang Qian2, Deep Medhi3, and Dijiang Huang4

1School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA
2China Mobile Technology, Milpitas, CA, USA
3Computer Science and Electrical Engineering Department, University of Missouri-Kansas City, Kansas City, MO, USA
4School of Information Technology and Engineering, Arizona State University, Tempe, AZ, USA

Security is one of the highest concerns with cloud-based services. Intrusion detection and prevention systems (IDPS) have been widely deployed to enhance cloud security. The use of software-defined networking (SDN) approaches to enhance system security in a virtualized cloud networking environment has been presented recently [1, 2]. These approaches incorporate IDS/IPS agents in cloud servers by reconfiguring the cloud networking environment on the fly to counter malicious attacks. However, their performance and feasibility have not been well investigated. In this chapter, we provide a comprehensive study of existing cloud security solutions and analyze their challenges and trends. We then present an OpenFlow-based IDPS solution, called FlowIPS, that focuses on intrusion prevention in the cloud virtual networking environment. FlowIPS is a software-based approach that implements SDN-based control functions based on Open vSwitch (OVS). FlowIPS provides network reconfiguration (NR) features



by programming POX controllers to enable the FlowIPS mitigation approaches. Finally, a performance evaluation of FlowIPS demonstrates the feasibility of the proposed solution, which is more efficient than traditional IPS approaches.

11.1 INTRODUCTION

11.1.1 Cloud Network Security Issues

Cloud computing technologies have been widely adopted today due to their resource provisioning capabilities, such as scalability, high availability, efficiency, and so on. However, security is one of the critical issues [3] that have not been fully addressed. Attackers may compromise vulnerable virtual machines (VMs) to form botnets, and then launch distributed denial-of-service (DDoS) attacks or send spam, which has become a major security concern in the use of cloud services. We highlight four critical cloud network security issues:

1. Abuse and Nefarious Use of Cloud Computing: IaaS providers offer their customers the illusion of unlimited compute, network, and storage capacity. By abusing the relative anonymity behind these registration and usage models, spammers, malicious code authors, and other criminals have been able to conduct their activities with relative impunity. Platform-as-a-Service (PaaS) providers have traditionally suffered most from this type of attack; however, recent evidence shows that hackers have begun to target Infrastructure-as-a-Service (IaaS) vendors as well [4]. Future areas of concern include password and key cracking, launching dynamic attack points, hosting malicious data, botnet command and control, and so on.

2. Malicious Insiders: The threat of a malicious insider is well known to most organizations. In traditional computer networking systems, security protection is usually deployed at the edge of the system, for example, in a firewall. However, if an attacker breaks through the firewall or DMZ and gains access to the internal network, the consequences of the attack can be very severe. Since all resources in the same domain trust each other by default, insider attacks can cause more damage than outsider attacks.

3. Data Integrity: Storage is one of the most important and common usage scenarios in clouds. Therefore, compromising stored data, for example, through the deletion or alteration of records without a backup of the original content, becomes another critical security issue in clouds. The authentication and authorization of access to the data must guarantee that unauthorized or unauthenticated parties are prevented from gaining access to private data. The threat of data compromise increases in the cloud, due to the number of, and interactions between, risks and challenges that are either unique to the cloud or more dangerous because of the architectural or operational characteristics of the cloud environment.

4. Virtualization Hijacking: One of the significant characteristics of cloud computing is virtualization, which enables better resource utilization and fine-grained resource isolation. IaaS vendors provide their services by sharing


the physical infrastructure in a scalable fashion. However, the underlying components building up the infrastructure (e.g., CPU and GPU) were not specifically designed to deliver strong isolation capabilities in a multi-tenant environment. To address this issue, the hypervisor was designed and introduced to fill the gap between the physical infrastructure and the guest operating systems. However, existing hypervisors are not flawless and can still be compromised, enabling users to gain an inappropriate level of control over guest OSs. A defense-in-depth strategy is recommended, and should include compute, storage, and network security enforcement and monitoring. Strong compartmentalization should also be employed to guarantee that individual customers do not impact the operations of other tenants running on the same cloud service provider. Customers should not have access to any other tenant's actual or residual data, network traffic, and so on.

11.1.2 Cloud Security Approach Design Challenges

Here, we describe two major design requirements that should be considered to establish a secure cloud networking environment:

1. Robust Network Architecture Design: Before building the cloud system, a robust security system design is highly desired. The following criteria should be followed when designing the cloud network architecture:

• Network isolation should be provided for multiple purposes. For example, data networks should be separated from the management network, because it is not secure to give users the privilege of accessing the cloud management network. Moreover, different networks need to be separated from each other, physically by using different network devices, e.g., switches, or virtually by deploying network virtualization technologies, such as VLANs, GRE tunnels, and so on.

• The system should allocate sufficient resources based on the usage of system components. For example, the storage network should usually be allocated more bandwidth than the management network, because only control messages are sent over the management network while gigabyte-sized VM images may be transmitted over the storage network. Besides network resources, host resources should also be considered. For example, RabbitMQ, that is, the message queuing system, should be allocated better resources due to its higher processing workload than other servers; otherwise, it will become a bottleneck and introduce vulnerability into the whole system.

• The system should be enabled with high availability (HA), for example, redundancy or backups, to avoid single points (links) of failure. HA can be applied from either the network or the host perspective. It is recommended to enable HA especially for the services that can be directly accessed by users.

The challenge of building a robust network architecture is that there is no perfect system design; the system architect or administrator can only design a near-perfect system, and must always have other security solutions in place to prevent the system from being impacted by any possible malicious behavior.


2. Intrusion Detection and Prevention System (IDS/IPS): An intuitive solution to address cloud security issues is to deploy an IDPS (IDS/IPS), for example, Snort [5], Suricata [6], and so on. The detect-and-alert nature of IDS solutions demands a human in the loop to inspect the generated security alerts, and thus cannot respond to attacks in a prompt fashion. Recently, SDN technologies have provided a programmable networking environment, which has made the IPS a key research area in automated cloud defense mechanisms. In general, an IPS can be constructed based on an IDS. For instance, Snort can be configured in inline mode and work with a common firewall system, for example, Iptables, to implement an IPS in the cloud networking environment [7]. However, there are several issues in the Snort+Iptables-based IPS system, and the solutions presented here target these issues:

• Latency: The IPS detection engine usually uses a buffer to queue incoming packets for inspection purposes, and packets will be dropped when the incoming traffic exceeds the buffer's capacity. This mechanism allows the IPS to inspect, and possibly block, each network packet. An IPS usually consumes more cloud system resources than an IDS, and it also increases the packet delivery delay due to the packet inspection procedure.

• Resource Consumption: Enabling new services in the system consumes more resources and downgrades system performance. For a service that interacts with all the network traffic generated in the cloud virtual networking system, resource utilization becomes critical, since the availability of the security services depends on it. Given the same hardware resources, the solution with better processing capability, for example, a higher detection rate, makes better use of the resources it consumes.

• Network Reconfigurations: The programmable virtual networking system in the cloud environment provides the IPS with a flexible way to reconfigure the virtual network and to provide secure traffic inspection and control. How to incorporate deep packet inspection (DPI) with fine-grained traffic control in the cloud virtual networking environment, while reducing the intrusiveness to normal traffic, is a key research challenge.

11.1.3 Arrangement of the Book Chapter

The rest of this chapter is organized as follows. Section 11.2 discusses the technical background of SDN (i.e., OpenFlow) and intrusion detection systems. Section 11.3 presents existing cloud security solutions. Section 11.4 discusses the transformation from existing cloud security solutions to next-generation SDN-based solutions. The FlowIPS design and process flow are presented in Section 11.5. FlowIPS is compared with the traditional Snort/Iptables IPS from a design principle perspective in Section 11.6. NR based on the proposed architecture is described in Section 11.7. A thorough evaluation is conducted in Section 11.8. Finally, future work is discussed in Section 11.9, and the chapter is concluded in Section 11.10.


11.2 TECHNICAL BACKGROUND

In this section, we discuss the technical background of SDN and IDPS that will be utilized in the proposed SDN-based cloud security solutions.

11.2.1 Software Defined Networking

SDN is a new concept that evolves traditional networking technologies by separating the control plane from the data plane. OpenFlow is the most representative protocol implementing the SDN concept: it manages SDN-enabled devices and defines standard control interfaces for installing packet-forwarding rules in the flow tables of OpenFlow switches (OFSs), which can handle data packets at line rate. As shown in Figure 11.1, OpenFlow introduces a centralized, separate controller with standard interfaces for installing the packet-forwarding rules in the flow table, which allows incoming packets to be handled rapidly. In the OpenFlow architecture, the controller executes all control tasks of the switches and is also used for deploying new networking frameworks, such as new packet forwarding protocols or optimized cross-layer packet-switching algorithms. When a packet arrives at an OFS, the switch processes the packet in the following three steps:

1. It checks the header fields of the packet and attempts to match an entry in the local flow table. If there is no matching entry in the flow table, the packet is sent to the controller for further processing, for example, installing a flow table rule to forward this traffic flow in the future. There can be multiple flow control rules in the controller; a best-matching procedure is followed to pick the best rule.

2. It then updates the byte and packet counting information associated with the rules, for statistical logging purposes.

3. Once a matching rule is decided, the OFS takes the action specified in the corresponding flow table entry, for example, forwarding the packet to a specific port or dropping it.

Figure 11.1. OpenFlow architecture. (Each flow entry comprises a priority; match fields (ingress port; Ethernet source, destination, and type; VLAN ID; IP source, destination, and protocol; source and destination ports); counters; instructions; timeouts; and a cookie. The controller communicates with the switch ports over a secure SSL channel.)
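For illustration, the following minimal POX component (POX itself is written in Python) sketches the controller side of this workflow: it reacts to packets that missed the flow table and installs a rule for the corresponding flow. The flooding action is a placeholder assumption; an IPS component such as FlowIPS would instead consult its inspection logic and install either a forwarding or a dropping rule.

from pox.core import core
import pox.openflow.libopenflow_01 as of

def _handle_packet_in(event):
    # Called for every packet that missed the switch's flow table (step 1)
    msg = of.ofp_flow_mod()  # new flow entry to install
    msg.match = of.ofp_match.from_packet(event.parsed, event.port)
    msg.idle_timeout = 10    # expire entries for idle flows
    # Placeholder action: flood; a flow entry with no actions drops packets
    msg.actions.append(of.ofp_action_output(port=of.OFPP_FLOOD))
    event.connection.send(msg)  # push the rule to the OVS/OFS

def launch():
    core.openflow.addListenerByName("PacketIn", _handle_packet_in)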

An OFS separates the control plane and the data plane by virtualizing the network control as a network OS layer. The network controller is considered the software engine for deploying control functions, which can be implemented through automatic control algorithms. With these features, dynamic NR can be implemented in the cloud virtual networking environment. Several controllers are available that follow the OpenFlow standard, such as NOX/POX [8]. Both OVS and OFS are OpenFlow-enabled switches: OVS is a software OpenFlow switch usually deployed in the privileged domain of a cloud server, for example, Domain 0 of XenServer [9] or the host domain of KVM [10], while we use OFS to denote a physical OpenFlow switch.

11.2.2 Intrusion Detection and Prevention System

Snort is a multi-mode packet analysis IDS/IPS tool, which basically consists of a sniffer, a packet logger, and data analysis tools. In its detection engine, rules form signatures used to judge whether a detected behavior is malicious or not. It has both host-based and network-based detection engines, and it has a wide range of detection capabilities including stealth scans, OS fingerprinting, buffer overflows, back doors, and so on. The Network Intrusion Detection System (NIDS) mode has been widely used and focuses only on detection; thus, the actions taken when rules are matched are usually logging or alerting, without disabling the ongoing attacks. The combination of Snort and Iptables is the most common way to implement the Snort IPS mode, also known as inline mode [11]. The IPS mode differs from the IDS mode in that the IPS can prevent attacks from happening in addition to detecting intrusions. As mentioned in Section 11.1, one of the main challenges of an IPS is performance, since Snort serves as a traffic proxy and every packet must wait until Snort decides whether it is safe to pass or should be blocked. In Section 11.8, we discuss the performance issues of Snort in IDS mode and IPS mode.

11.3 EXISTING SOLUTIONS

Related and representative security solutions are discussed in the following order. We first discuss traditional security solutions for general attacks in the cloud environment, and then investigate general SDN-based security solutions. We then examine solutions addressing the security issues of SDN itself. Finally, we discuss current SDN-enabled IDS/IPS solutions, which are highly related to the SDN-based IPS proposed in this chapter.


11.3.1 Traditional Non-SDN Solutions

IDS and IPS are traditional efforts to monitor and secure cloud computing systems. As an efficient security appliance, an IDS can utilize signature-based, statistics-based, and stateful protocol analysis methods to achieve its detection capability. Furthermore, an IPS can respond to detected threats by triggering a variety of prevention actions to tackle malicious activities. In Ref. [12], the authors introduce an effective management model to optimally distribute different NIDS and NIPS across the whole network. This work differs from single-vantage-point NIDS/NIPS placement; it is a scalable solution for network-wide deployment.

A stateful intrusion detection system is introduced in Ref. [13]. This paper applies a slicing mechanism to divide the overall traffic into subsets of manageable size, where each subset contains enough evidence to detect a specific attack. As a distributed network detection system, once configured, it is not easy to reconfigure or to migrate the detection sensors on demand.

In Ref. [14], the authors propose a host-based intrusion detection system that deploys an IDS on each host in the cloud computing environment. This design enables a behavior-based technique to detect unknown attacks and a knowledge-based one to identify known attacks. However, since the data are captured from system logs, services, and node messages, this system cannot detect intrusions running inside VMs. To provide a VM-compatible IDS, another architecture is proposed in Ref. [15]. It extends the capability of a typical IDS by incorporating a VM-based detection approach into the system. The provided IDS management can detect, configure, and recover VMs, and protect VMs from virtualization-layer threats. However, this is not a lightweight solution, and multiple IDS instances are needed to build the system.

FireCol [16] is a dedicated flooding DDoS attack detection solution implemented in a traditional network system. In this design, IPSs are distributed in the network to form virtual protection rings, with selective traffic exchange, to detect and defend against DDoS attacks. This collaborative system addresses the problems of hard-to-detect attacks and of a single IDS/IPS crashing under overwhelming traffic. However, this method is not a lightweight solution such as that of Ref. [17], its flexibility and dynamism are limited, and its deployment and management are complicated.

In Ref. [18], the authors propose a dynamic resource allocation strategy to counter DDoS attacks against individual cloud customers. When a DDoS attack occurs, they employ the idle resources of the cloud to clone sufficient intrusion prevention servers for the victim, in order to quickly filter out attack packets and simultaneously guarantee the quality of service for benign users. However, this paper focuses on how to allocate idle resources to the IPS but does not discuss how the IPS prevents the DDoS attack.

Similar to FireCol, Ref. [19] presents a multilayer game-theoretic framework for DDoS attack and defense evaluation. An innovative point in this work is that strategic thinking from the attacker's perspective benefits the defense decision maker in the interaction between attacks and defenses. However, this framework is not suitable for deploying dynamic countermeasures against network threats and has no real-time security solution for real-time attacks.

Packet marking techniques are widely used for IP traceback, even though tracing back the source of attacks is extremely difficult. In Ref. [20], the authors present a marking


mechanism for DDoS traceback, which injects a unique mark into each packet for traffic identification. As a probabilistic packet marking (PPM) method, it carries the risk that attackers inject marked packets and spoof the traffic. Reference [21] describes another important traceback method, using deterministic packet marking (DPM); the victim can track the packets from the router, which splits the IP address into two segments. Differing from the previous methods, the authors of Ref. [22] present an independent method to trace back attackers based on entropy variations. However, most of these works do not handle IP spoofing very well, and packet modification is needed to implement these methods.

11.3.2 SDN-Based Security Solutions

OpenFlow-enabled solutions provide programming capabilities with high flexibility and scalability, and have been widely deployed to enable new services or enhance the agility of networking systems [23–25]. Combining OpenFlow with other open-source packages creates new networking service opportunities. QuagFlow [26] integrates the Quagga open-source routing suite with OpenFlow to provide centralized control over physical OFSs and Quagga routers in VMs. However, using OpenFlow for security purposes, especially in the cloud environment, is still at an early stage. SDN has been researched as a basis for monitoring systems [27–29] due to its centralized abstract architecture and its statistics capabilities. OpenSafe [28] is a network monitoring system that allows administrators to easily collect network usage statistics and detect malicious activities by leveraging a programmable network fabric. It uses OpenFlow techniques to enable manipulations of traffic, such as selective rule matching and arbitrary flow direction, to achieve its goal. Furthermore, ALARMS is designed as a policy language to articulate paths of switches for easy network management. OpenNetMon [29] is another network monitoring application based on the OpenFlow platform. It monitors per-flow metrics to deliver fine-grained input for traffic engineering. Benefiting from the OpenFlow interfaces that enable statistics queries from the controller, the authors propose an accurate way to measure per-flow throughput, delay, and packet loss metrics. In Ref. [27], the authors propose a new framework to address the detection problem by steering network flows to security nodes for investigation. This flow-based detection mechanism guarantees that all necessary traffic packets are inspected by security nodes. For the sake of dynamism, the provided services can be easily deployed by users through a simple script language. However, all three aforementioned studies [27–29] only provide monitoring services and do not further discuss countermeasures against malicious intrusion activities.

FortNox [30] is a security enforcement kernel that addresses rule conflicts to secure OpenFlow networks. Different rules inserted by various OpenFlow applications can generate rule conflicts, which have the potential to allow malicious packets to bypass the strict system security policy. FortNox applies a rule reduction algorithm to detect conflicts and resolves a conflict by assigning authorization roles with different privileges to the candidate flow rule. This kernel overcomes the potential vulnerability of OpenFlow rule installation and enables enforceable flow constraints to enhance SDN security. The same authors' subsequent work, FRESCO [31], implements an OpenFlow security


application development framework based on the security enforcement kernel. It encapsulates network security mitigation in the framework and provides APIs to enable legacy applications to trigger FRESCO modules. However, this work does not have the capability to defend and protect network assets independently, because predefined policies are needed to drive the system.

CONA [32] is a content-oriented networking architecture built on the NetFPGA-OpenFlow platform. In this design, hosts request contents and agents deliver the requested contents on the hosts' behalf. Under content-aware supervision, the system can perform prevention by (1) collecting suspect flow information from other agents for analysis, and (2) applying rate limits to each of the relevant agents to slow down overwhelming malicious traffic.

11.3.3 Security Issues of SDN

With the emergence of SDN, researchers have started to be concerned about the security of SDN itself. In Ref. [33], a replication mechanism is proposed to handle the weakness of a centrally controlled network architecture, namely, that a single point of failure can degrade the network resilience of the whole system. The CPRecovery component is able to update the flow entries in a secondary controller dynamically, and the secondary controller can take control of the switch automatically when the primary controller goes down due to overwhelming traffic or a DDoS attack. This work can be considered a solution against DDoS attacks; however, a simple replication mechanism can hardly promise that all the secondary controllers will tolerate a high-pressure attack, even if more backups are deployed in the system.

AvantGuard [34] is an SDN extension that enhances the security and resilience of OpenFlow itself. To address two bottlenecks in OpenFlow, the scalability and responsiveness challenges, this paper introduces two new modules: a connection migration module and an actuating trigger module. The former component efficiently filters incomplete TCP connections by establishing a handshake session before packets arrive at the controller; TCP connections are maintained by the connection migration module to avoid the threat of TCP saturation attacks. The actuating trigger module enables the data plane to report network status and to activate a specific flow rule based on predefined traffic conditions. This research improves the robustness of SDN systems and provides additional data plane information to the control plane to achieve higher security performance.

11.3.4 SDN-Based Intrusion Detection and Prevention System

As we discussed before, IDS and IPS are critical security appliances for protecting cloud computing networks. When SDN is applied to a cloud system, the decoupling of the switch into separate control and data planes creates a network OS layer that offers a programmable interface and open network control. This feature enables flexible and dynamic NR, which can efficiently and effectively control the network and enable security manipulations for higher-level protection. However, only a small number of works have implemented SDN-based IDSes, and even fewer have targeted SDN-based IPSes.


L-IDS [35] is a learning intrusion detection system that provides a network service for protecting mobile devices. It is able to detect and respond to malicious attacks alongside deployments of existing security systems. It acts as a network service that can be transparently configured for end-host mobility and that applies already known countermeasures to mitigate detected threats. The authors do not provide a comprehensive solution for detected attacks, and more evaluations are needed to determine the most efficient response action for each threat.

In a recent work [2], the authors present an SDN-based IDS/IPS solution that uses attack graphs to dynamically generate appropriate countermeasures for IDS/IPS in the cloud environment. The originality and contribution of this work mainly come from using attack graph theory to generate a vulnerability graph and achieve an optimal decision when selecting the countermeasure. SnortFlow [1] is another recent work that focuses on the design of an OpenFlow-based IPS, with preliminary results.

Improving the accuracy and efficiency of NIDSes is another important research direction that has attracted many researchers. For example, selective packet discarding is proposed in Ref. [36], where the authors built a prototype based on Snort to improve NIDS accuracy. In Ref. [37], the authors achieve good IDS/IPS throughput by proposing a string matching architecture.

In Ref. [38], the authors propose a mechanism called OpenFlow Random Host Mutation (OFRHM), in which the OpenFlow controller frequently assigns each host a random virtual IP that is translated to/from the real IP of the host. This mechanism can effectively defend against stealthy scanning, worm propagation, and other scanning-based attacks.
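To make the mutation idea concrete, the following minimal Python sketch shows only the controller-side bookkeeping that a scheme like OFRHM implies: each protected host periodically receives a fresh virtual IP drawn from an unused range, and edge flow rules (not shown) would translate vIP to rIP and back. The names, address ranges, and pool size are hypothetical illustrations and are not taken from Ref. [38].

    import random

    REAL_HOSTS = ["10.0.0.10", "10.0.0.11"]               # protected hosts
    VIP_POOL = ["172.16.0.%d" % i for i in range(2, 250)] # unused virtual range

    def mutate():
        """Give every real host a fresh random virtual IP; the controller
        would then install edge flow rules translating vIP <-> rIP both ways."""
        vips = random.sample(VIP_POOL, len(REAL_HOSTS))
        return dict(zip(REAL_HOSTS, vips))

    print(mutate())   # re-run each mutation interval; cached scan results go stale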

We believe the dynamic and adaptive capability of the SDN framework can benefit the development of IDPSes. This area is worth exploring so that SDN-enabled cloud systems can build suitable, on-demand IDS/IPS services. We have therefore set our research target on establishing an SDN-based IDPS in the cloud environment. The research outcome includes the design and implementation of a full-lifecycle SDN-based intrusion detection and prevention system in the cloud virtual networking environment.

11.4 TRANSFORMING TO THE NEW IDPS CLOUD SECURITY SOLUTIONS

11.4.1 Limitations of Existing Solutions

After investigating both the traditional cloud security solutions and the SDN-based ones, we find that existing solutions still have limitations in the following aspects:

• The detection solutions cannot efficiently detect and monitor the traffic. The most common way to capture traffic for detection is to configure a SPAN port mirror, which means that all the traffic must be duplicated and forwarded to a port to which an IDS is directly attached (see the configuration sketch after this list). Doubling the ongoing traffic inevitably degrades performance metrics such as delay and available bandwidth.

• The prevention solutions are not sufficiently flexible. The most common way to prevent attack traffic is to drop it. However, every detection engine incurs false positives (FPs) and false negatives (FNs), which means that dropping all suspect traffic may also kill good traffic. Other prevention solutions, for example, OFRHM [38], do not operate reactively: OFRHM proactively performs moving target defense to prevent malicious traffic, which does not work in the malicious insider case.

• Comprehensive cloud security solutions that include both detection and prevention can hardly be found.
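As a reference point for the first limitation above, the sketch below shows how a traditional SPAN-style mirror would be configured on an OVS bridge, duplicating everything seen at a monitored port toward the port where the IDS listens. The bridge and port names (xenbr0, vif1.0, vif2.0) are hypothetical; the Mirror syntax follows the standard ovs-vsctl CLI, here invoked from Python.

    import subprocess

    def create_span_mirror(bridge, monitored_port, ids_port):
        """Duplicate all traffic entering/leaving `monitored_port` on `bridge`
        toward `ids_port`, where the IDS captures it (classic SPAN mirroring)."""
        subprocess.check_call([
            "ovs-vsctl",
            "--", "set", "Bridge", bridge, "mirrors=@m",
            "--", "--id=@src", "get", "Port", monitored_port,
            "--", "--id=@out", "get", "Port", ids_port,
            "--", "--id=@m", "create", "Mirror", "name=span0",
            "select-src-port=@src", "select-dst-port=@src", "output-port=@out",
        ])

    create_span_mirror("xenbr0", "vif1.0", "vif2.0")  # hypothetical names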

11.4.2 New IDS/IPS in Cloud Virtual Networking Environments

A straightforward approach to implementing an IDS/IPS in clouds is to deploy existing solutions, such as Snort-based IDS/IPS, without changes. In Ref. [11], Rehman discusses how to implement a Snort- and Iptables-based IPS in clouds. In Ref. [7], the author classifies traditional IPSes into desktop-, host-, and network-based types, where a network-based IPS usually involves security inspections such as DPI.

SDN-based security approaches in the cloud virtual networking environment have been considered the trend for future virtual networking security solutions [39]. In our recent work [2, 40], we present an SDN-based IDS/IPS solution that uses attack graph techniques to guide the cloud security management system in dynamically generating appropriate countermeasures for the IDS/IPS services. SnortFlow [1] is another recent work focusing on the design and evaluation of an OpenFlow-based [41] IPS in the cloud environment. These existing solutions demonstrated that Snort can be used to detect intrusions in clouds; however, a few important issues remain unaddressed and can be regarded as guidance for designing future IDPS solutions:

• Will an SDN-based IDS/IPS have better performance than a traditional Snort-based IPS?

• How can an efficient software-based SDN solution be established in the cloud virtual networking environment?

• How should an SDN-based IDS/IPS networking architecture be designed to provide a dynamic defensive mechanism for clouds?

To address the aforementioned issues, we proposed in Ref. [1] a high-level architecture that realizes the IPS by integrating Snort and OpenFlow [41] components. By utilizing the power of SDN and OpenFlow, the cloud networking environment can be dynamically reconfigured based on the detected attacks in real time. Our prototype is built on Open vSwitch (OVS) in a Xen-based [42] cloud environment. The evaluation results show that the proposed system is feasible in the cloud environment and provide valuable guidance for redesigning FlowIPS and conducting more thorough evaluations.

11.5 FLOWIPS: DESIGN AND IMPLEMENTATION

FlowIPS provides several salient features to advance security research and development for cloud computing. It presents a new design of IDS/IPS based on SDN approaches, that is, using a programmable OVS, and it supports a dynamic defensive mechanism based on programmable NR.

Figure 11.2. FlowIPS system architecture. (The figure shows a Xen-based cloud server. In Dom 0, the Snort agent and the OVS coexist: the userspace ovs-vswitchd and ovsdb-server modules sit above the openvswitch_mod.ko kernel module with its packet classifier, flow table, and hash lookup table. An administrative Dom U hosts the controller and the log, connected to the OVS via the OpenFlow and JSON/RPC protocols, while user Dom Us attach their VMs' virtual interfaces (Vif 1.0, Vif 1.1, Vif 2.0) to the bridges Xenbr0 and Xenbr1.)

In the rest of this section, we present the architecture and components of the proposed FlowIPS, followed by its processing flow. The architecture and components are shown in Figure 11.2.

11.5.1 System Components

Cloud Cluster hosts the cloud resources and the proposed FlowIPS. A cloud cluster contains one or multiple cloud servers with a major cloud-based OS installed. Any major cloud OS with the SDN feature enabled, such as OpenStack, CloudStack, XenServer, or KVM, is compatible with our proposed system. In this work, we demonstrate and build the system on XenServer, an efficient paravirtualization solution. There are two types of domains in a Xen-based cloud: Dom 0 and Dom U. Dom 0 is the management domain and belongs to the cloud administrative domain. We introduce one Dom U dedicated to administrative purposes to host the controller and log components, while all other Dom Us host VMs for users. Dom U resources are managed by Dom 0, and Dom Us must go through Dom 0 to access the hardware.

Open vSwitch (OVS) is a software implementation of an OFS. OVS is usually deployed in the management (privileged) domain of cloud servers; in our prototype, OVS is natively implemented in Dom 0 of the XenServer cloud system. Inter-VM communication within the same physical server is controlled by the OVS without exposing the traffic outside the physical box. Each Dom 0 in XenServer runs a userspace daemon (the slow path) as well as a kernel space module (the fast path). In userspace there are two modules, ovsdb-server and ovs-vswitchd. The ovsdb-server module is the log-based database that holds the switch-level configuration, while the ovs-vswitchd module is the core OVS component that supports multiple independent datapaths (bridges). In Figure 11.2, the ovs-vswitchd module communicates with the ovsdb-server module through the management protocol, with the controller through the OpenFlow protocol, and with the kernel module through the netlink protocol. In kernel space, the kernel module handles packet switching, lookup and forwarding, and tunnel encapsulation and decapsulation. Every virtual interface (VIF) of each VM has a corresponding virtual interface/port on the OVS, and the different VIFs connecting to the same bridge can be regarded as being on the same switch. For example, VIF 1.0 (the virtual port of eth0 on the VM from Dom 1) has a layer 2 connection with VIF 2.0 (the virtual port of eth0 on the VM from Dom 2). The OVS forwards packets based on the entries in its flow table.
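As a concrete illustration of flow-table-based forwarding, the following sketch installs two entries on a bridge with ovs-ofctl: a specific rule forwarding IPv4 traffic from one virtual port to another, and a low-priority fallback to normal L2 switching. The bridge name and port numbers are hypothetical; ovs-ofctl add-flow is the standard OVS CLI for manipulating the flow table, here invoked from Python.

    import subprocess

    # Specific entry: IPv4 arriving on virtual port 1 goes out virtual port 2.
    subprocess.check_call(["ovs-ofctl", "add-flow", "xenbr0",
                           "priority=10,in_port=1,dl_type=0x0800,actions=output:2"])
    # Fallback entry: anything unmatched is switched as ordinary L2 traffic.
    subprocess.check_call(["ovs-ofctl", "add-flow", "xenbr0",
                           "priority=0,actions=normal"])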

Snort is a multi-mode packet analysis IDS/IPS tool that performs better than many comparable products [43]. It has several components, such as a sniffer, a packet logger, and data analysis tools. In its detection engine, rules encode attack signatures that are used to judge whether a detected behavior is malicious. Snort has both host- and network-based detection engines, and it offers a wide range of detection capabilities covering stealth scans, OS fingerprinting, buffer overflows, back doors, and so on. To establish the IPS in the cloud environment, the first step is to interface the Snort detection engine with the cloud network management component, that is, the OVS. In a cloud server, Snort can be implemented either in Dom 0 (the privileged domain) or in a Dom U (an unprivileged domain) of the Xen virtualization architecture. In our architecture, we deploy Snort in Dom 0, which lets it easily sniff the traffic passing through the software bridges in the OVS. All logging information generated by Snort is written to a CSV file that the controller can access in real time. The Snort component can be simply replaced with other IDS solutions, for example, Suricata, because mitigation and detection are decoupled, unlike in traditional IPS solutions (e.g., Snort+Iptables). A performance evaluation of Snort and Suricata in the cloud is discussed in Ref. [43]; the overall performance of Snort is better than that of Suricata, which is why we chose Snort as the detection engine in this implementation.
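The coupling between Snort and the controller is just this shared CSV file. The following minimal Python sketch, under assumptions about the log path and the configured CSV field order (both hypothetical, since they are set by the Snort output configuration), shows how a daemon could follow the alert file in real time, in the spirit of the FlowIPS daemon described next.

    import csv
    import time

    ALERT_LOG = "/var/log/snort/alert.csv"   # hypothetical log path

    # Assumed field order; in practice it is set by Snort's CSV output config.
    FIELDS = ["timestamp", "msg", "proto", "src", "srcport", "dst", "dstport"]

    def follow(path):
        """Yield lines appended to the file, like `tail -f`."""
        with open(path) as f:
            f.seek(0, 2)                      # jump to the end of the file
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)           # wait for Snort to write more
                    continue
                yield line

    for row in csv.reader(follow(ALERT_LOG)):
        alert = dict(zip(FIELDS, row))
        print("alert: %s  %s -> %s" % (alert["msg"], alert["src"], alert["dst"]))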

Controller is the component that provides a centralized view of and control over the cloud virtual networks. The controller contains three major components: the FlowIPS daemon, the alert interpreter, and the rules generator. The FlowIPS daemon mainly collects the alerts generated by the Snort agents deployed in Dom 0. The alert interpreter takes care of parsing each alert and pinpointing the suspect traffic. The parsed and filtered information is then passed to the rules generator, which is in charge of the rules to be configured on OpenFlow-enabled software or hardware switches. A database stores the generated rules and the switches' original states for future operations, such as resuming normal functions.

11.5.2 FlowIPS Processing Flow

The processing flow of FlowIPS is illustrated in Figure 11.3. Network traffic is generated by the cloud resources, that is, the VMs. All network traffic originates from VIFs that are attached to virtual bridges in the OVS. A virtual bridge can be regarded as a virtual switch, which means that all VIFs connecting to the same bridge are on the same network. The Snort agent in Dom 0 has the advantage of detecting directly on the bridge. Whenever traffic matching the Snort rules raises an alert in the log file, the FlowIPS daemon fetches the alert information from the CSV log file in real time. Then, the alert interpreter parses the alert information to derive the current network security situation. Finally, the rules generator generates the OpenFlow flow table entries and pushes them to the OVS to update its flow table. Subsequent suspect traffic matching these flow table entries is thus swiftly handled by the deployed countermeasure in the OVS.

Figure 11.3. FlowIPS processing flow. (Packets from the cloud resources traverse OVS bridges in Dom 0, where the Snort agent writes alerts to a CSV log file; in the FlowIPS controller, the FlowIPS daemon, alert interpreter, and rules generator consume the log and push new entries into the OVS flow table.)
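To make the last step concrete, the sketch below shows the kind of flow entry a rules generator could push through a POX controller once an alert names a suspect source: a high-priority rule with an empty action list, which in OpenFlow 1.0 means drop. The function name, priority, and timeout are hypothetical choices; the ofp_flow_mod and ofp_match calls are POX's standard OpenFlow 1.0 bindings.

    import pox.openflow.libopenflow_01 as of
    from pox.lib.addresses import IPAddr

    def push_drop_rule(connection, suspect_src_ip, idle_timeout=300):
        """Install a drop rule for IPv4 traffic from the alerted source.
        In OpenFlow 1.0, a flow_mod with an empty action list drops packets."""
        msg = of.ofp_flow_mod()
        msg.priority = 100                    # override normal forwarding entries
        msg.idle_timeout = idle_timeout       # expire once the flow goes quiet
        msg.match = of.ofp_match(dl_type=0x0800, nw_src=IPAddr(suspect_src_ip))
        connection.send(msg)                  # push to the OVS over OpenFlow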

11.6 FLOWIPS VS SNORT/IPTABLES IPS

Motivated by the limitations of the representative traditional IPS, that is, the Snort/Iptables IPS, FlowIPS is designed to take advantage of SDN to provide security countermeasures with increased flexibility and efficiency. This section discusses the comparison between the proposed FlowIPS and the Snort/Iptables IPS, focusing on the FlowIPS working mechanism and new capabilities.

A traditional IPS is not specially designed for the cloud virtual networking environment but for a general network environment. The major difference between the two is that the cloud virtual networking environment usually has distinct network domains, that is, a management network domain and a user network domain. The two domains sit at different layers and therefore achieve different efficiency: the user network runs on top of the management network, so the lower layer can be expected to be more efficient. Thus, we designed FlowIPS specifically for the cloud virtual networking environment, taking advantage of the OVS in the management domain in order to achieve better performance.


Figure 11.4. FlowIPS and Snort/Iptables IPS mechanism. (The figure contrasts the two data paths between an attacker Dom U and a victim Dom U: the Iptables IPS controller runs in its own Dom U, where packets climb the kernel network stack into the userspace NFQueue before receiving a verdict, while the FlowIPS controller instead drives the flow table in Dom 0. Numbered solid lines trace data packets and dotted lines trace control packets.)

The two compared IPS solutions differ in terms of working mechanism and operation level. Figure 11.4 shows how the Iptables IPS (light lines) and FlowIPS (dark lines) detect and prevent attacks. The number beside each line represents its position in the packet flow sequence; solid lines and dotted lines represent data traffic and control traffic, respectively. For the Snort/Iptables IPS, Snort must be configured in inline mode and recompiled with Iptables support. Besides the detection engine, one of the most important components of this IPS is NFQUEUE, an Iptables and Ip6tables target that delegates the decision on packets to userspace software, which then issues a verdict on each detected packet. The Snort/Iptables IPS cannot be placed in Dom 0 of XenServer because it is a higher-level, proxy-based solution compared with the OVS-based IPS: it must sit in the middle of two or more communicating end virtual hosts. Thus, the Snort/Iptables IPS must be placed at the same level as the VMs, in a Dom U. Moreover, note that when its flow table is empty, the OVS in Dom 0 behaves like an ordinary OS kernel network stack; that is, no OpenFlow feature is exercised. As shown in Figure 11.4, when attack packets are generated from the attacker's virtual interface, all packets pass through Dom 0 before being forwarded to the destination (line 1). When Snort detects suspect traffic, it informs NFQUEUE to take the actions defined in the rules; the Iptables IPS consults the controller, which sends out control messages to issue the command (lines 2 and 3). Finally, the suspect packet is handled in kernel space in the Dom U and is either forwarded to the victim or dropped (line 4).
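For reference, the following minimal sketch shows the NFQUEUE-style verdict loop the Snort/Iptables IPS relies on, using the Python netfilterqueue bindings with Scapy for parsing. It assumes an iptables rule such as iptables -I FORWARD -j NFQUEUE --queue-num 1 is already in place, and the blacklist set is a hypothetical stand-in for state fed by Snort alerts. Note that every packet crosses into userspace for its verdict, which is exactly the per-packet overhead FlowIPS avoids.

    # Prerequisite (shell): iptables -I FORWARD -j NFQUEUE --queue-num 1
    from netfilterqueue import NetfilterQueue
    from scapy.all import IP                  # used only to parse the payload

    SUSPECT_SOURCES = {"10.0.0.66"}           # hypothetical state fed by Snort

    def verdict(pkt):
        ip = IP(pkt.get_payload())
        if ip.src in SUSPECT_SOURCES:
            pkt.drop()                        # verdict: discard in userspace
        else:
            pkt.accept()                      # verdict: continue through netfilter

    nfq = NetfilterQueue()
    nfq.bind(1, verdict)                      # queue number matches the iptables rule
    try:
        nfq.run()                             # every packet crosses into userspace
    except KeyboardInterrupt:
        nfq.unbind()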

Unlike the Snort/Iptables IPS, FlowIPS deploys both the detection engine and the packet processing module in Dom 0, which is efficient especially when handling large amounts of traffic. When packets arrive at Dom 0 (line 1), the Snort detection engine sniffs the bridges, and only a small amount of control traffic is generated between the OVS in Dom 0 and the controller in Dom U (lines 2 and 3). After the controller updates the flow table, all traffic with the same pattern is processed in the OVS fast path in Dom 0 (line 4). Figure 11.4 also makes it obvious that packets in the Snort/Iptables IPS scenario must be forwarded in and out of Dom 0 twice, while FlowIPS needs only one pass to fulfill the same task. Owing to this working mechanism, FlowIPS should significantly outperform any Dom U IPS solution, especially in the cloud virtual networking environment, as demonstrated in Section 11.8.

11.7 NETWORK RECONFIGURATION

NR is a means to reconfigure network characteristics, including the topology, packet headers, QoS parameters, and so on. With the SDN concept enabled in the cloud virtual networking environment, NR can be applied to construct the IPS system. The major NR actions are summarized in Table 11.1 and include the following:

• Traffic Redirection (TR) can redirect traffic to a security appliance (e.g., a DPI unit or a honeypot) by rewriting the packet header. TR is usually implemented through MAC/IP address rewriting: the controller pushes a flow table entry whose header rewriting action is applied to matching packets.

• QoS Adjustment (QA) is a very efficient way to handle flood-type attacks. The OVS can adjust the QoS parameters of any attached VIF. After the TX/RX rate is lowered, suspect attack traffic has less impact on the network and on nearby hosts. QA can also be configured to work with other NR actions, such as traffic isolation.

• Traffic Isolation (TI) differs from TR in that TI provides an isolated virtual networking channel, separated from the others by, for example, separate virtual bridges, isolated ports, or GRE tunnels. Malicious traffic then only impacts hosts on its isolated virtual channel and does not affect other, normal traffic.

• Filtering is similar to the filters in Iptables, except that NR filtering handles packets in the OVS kernel space without forwarding them to a remote controller. A MAC/IP address change is a straightforward way to prevent the victim from being reached by malicious traffic, and the default IPS action, drop, can also be regarded as a filtering rule that drops the matching packets.

• Blocking switch ports can be set up in the flow table as filtering rules. Some attacks are performed by exploiting a certain port, especially a public service port. By blocking such ports, the attack is prevented because the attacking path is disconnected.

• Quarantine is a comprehensive approach to isolation in the cloud virtual networking environment. It works similarly to TI, but it isolates the suspect network resources, not just the suspect traffic. Another difference from normal traffic isolation is that more flexible, self-defined policies can be applied in quarantine mode. Quarantine can also be regarded as a superset of many NR actions; for example, the system can quarantine suspect network targets (VMs) with only ingress permission and no egress permission, so that such a VM can receive traffic but cannot send traffic into the network.

TABLE 11.1. Network reconfiguration actions

    No.  Countermeasure
    1    Traffic redirection
    2    QoS adjustment
    3    Traffic isolation
    4    Filtering
    5    Block switch port
    6    Quarantine

11.7.1 Representative NR Actions

Before introducing the representative NR actions, recall that the default action taken by the IPS is blocking or dropping the malicious traffic, which corresponds to the filtering function described in Table 11.1. Since a NIDS incurs FPs and FNs when judging network packets, decisions such as dropping all suspect packets may carry a high FP or FN cost. In this section, we present two representative NR actions besides the default IPS action: TR and QoS adjustment (QA). We exclude traffic isolation and blocking of switch ports because they can be performed in similar ways, for example, by rewriting the packet header, while quarantine is a comprehensive solution obtained by combining several individual actions. Thus, we only discuss the default NR action, dropping, and the two other NR actions, which are displayed in Figure 11.5.

11.7.1.1 Traffic Redirection. There are three ways to implement TR based on OVS in a cloud environment: MAC address rewriting, IP address rewriting, and OVS port rewriting. When the detection engine detects a suspect packet, the controller first pushes an OpenFlow entry (i.e., matching packet header fields and corresponding actions) to the OVS to update the flow table. IP and MAC address changes are performed by the flow table when packets match specific entries, and the corresponding actions are then taken on the matching packets. An action can rewrite any header field covered by the flow table, for example, the source IP, destination IP, source MAC, or destination MAC. TR mostly relies on the destination address (DA) fields: when the destination IP or MAC address is changed, the OVS forwards the packet to the rewritten destination address in the packet header. This NR function is especially useful for suspect packets that cannot yet be determined to be malicious; such packets should not be forwarded to a possible victim but to a detection site for further checking, for example, a DPI unit or a honeypot. In Figure 11.5, the highlighted blocks represent the flow table fields that TR may change. Moreover, IP and MAC field rewriting can be combined with other NR functions to implement further network functions, for example, NAT.

Figure 11.5. Network reconfiguration mechanism. (The POX controller pushes entries into the Open vSwitch flow table, whose fields include the in port and the Ethernet/VLAN/IP/TCP headers. Packets from an attacker on a virtual bridge are dropped, rate limited by QoS adjustment, or redirected to a DPI/honeypot instead of the VM hosts; the Snort detection engine and Snort log feed the controller.)
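The following sketch illustrates one plausible encoding of TR on a POX controller: the destination IP and MAC of a suspect source's packets are rewritten, and the packets are steered out of the honeypot's OVS port. The function and parameter names are hypothetical; the flow-mod and action classes are POX's standard OpenFlow 1.0 bindings.

    import pox.openflow.libopenflow_01 as of
    from pox.lib.addresses import IPAddr, EthAddr

    def redirect_to_honeypot(connection, suspect_src_ip, honeypot_ip,
                             honeypot_mac, honeypot_ovs_port):
        """Rewrite the destination of the suspect's IPv4 packets and steer
        them out of the honeypot's OVS port instead of the original victim."""
        msg = of.ofp_flow_mod()
        msg.priority = 200                    # beat ordinary forwarding entries
        msg.match = of.ofp_match(dl_type=0x0800, nw_src=IPAddr(suspect_src_ip))
        msg.actions.append(of.ofp_action_nw_addr.set_dst(IPAddr(honeypot_ip)))
        msg.actions.append(of.ofp_action_dl_addr.set_dst(EthAddr(honeypot_mac)))
        msg.actions.append(of.ofp_action_output(port=honeypot_ovs_port))
        connection.send(msg)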

Besides MAC and IP address rewriting, there is another way to realize TR: port rewriting. This method is also natively enabled by the OVS architecture. As shown in Figure 11.2, each bridge created in an OVS can be regarded as a virtual switch, and all VM VIFs are connected to virtual bridges through virtual ports (i.e., virtual interfaces). Thus, by forwarding a packet to a given virtual port, the VM VIF that connects to that virtual port receives the forwarded packet. Through this mechanism, FlowIPS can set any virtual port as the output port of any packet and thereby implement TR without changing the packet header. One benefit of port rewriting is that the packet header is left unchanged while TR is realized, which is efficient and useful to components (e.g., security appliances) that collect the original network data for further learning.

11.7.1.2 QoS Adjustment. QA is a desired feature when dealing with flood-type attacks, for example, DoS and DDoS. When one or multiple victims are under stress from receiving a huge amount of traffic from one or multiple sources that cannot be confidently determined to be attackers, it is desirable to slow down the extremely fast flows and decide whether the traffic is malicious after further inspection. In general, there are two ways to implement QA: resetting the QoS parameters of either a VIF or a port on the OVS. The two options apply to different scenarios. When setting the QoS limitation on a VIF, it is necessary to first locate the packet source, for example, the attacker; the number of suspect attackers should therefore preferably be small. On the other hand, under a DDoS attack, the attack sources may form a large set, so it is infeasible and impractical to locate all zombie attackers and adjust their VIFs. To solve this issue, we introduce an effective way to implement QoS for such situations: limiting the incoming port on the OVS. Any packet arriving at the OVS carries an ingress port, also called the in port, shown in the flow table in Figure 11.5. We can limit the QoS of that incoming port so that any arriving packet exceeding the QoS limit is dropped without further processing, which significantly enhances the performance of QA. Moreover, an attacker may deploy IP spoofing to modify the source IP address of attacking packets and avoid being traced back, in which case adjusting the QoS parameters of the incoming port remains an intuitive countermeasure. It is also possible to integrate QA with TI, for example, by forwarding packets into a VLAN with a specified QoS limit. Here, we do not discuss the QoS model used by FlowIPS; we only investigate the capability FlowIPS provides.
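A minimal sketch of how such rate limiting might be applied follows, using OVS ingress policing (the standard ingress_policing_rate/ingress_policing_burst interface columns, in kbps/kb) driven from Python. The interface name and rate values are hypothetical.

    import subprocess

    def rate_limit(iface, kbps, burst_kb):
        """Apply OVS ingress policing to a VIF or OVS port: packets arriving
        faster than `kbps` (with `burst_kb` slack) are dropped at the fast path."""
        subprocess.check_call(["ovs-vsctl", "set", "interface", iface,
                               "ingress_policing_rate=%d" % kbps])
        subprocess.check_call(["ovs-vsctl", "set", "interface", iface,
                               "ingress_policing_burst=%d" % burst_kb])

    rate_limit("vif1.0", kbps=1000, burst_kb=100)   # throttle suspect VIF to ~1 Mbps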

11.7.2 NR Selection Policy

In Figure 11.3, the FlowIPS controller contains a component called the rules generator. The rules generator is designed to choose the NR action and generate the corresponding OpenFlow rules based on the detection engine's alerts. The rules can be generated by different algorithms, which are not our focus here. Based on the two representative NR actions introduced above and the default drop action, we summarize the IPS action selection policy in Table 11.2.

Degree of confidence (DoC) represents how confident the detection engine is that the traffic is malicious. Since one of the biggest challenges for a NIDS is to reduce FNs and FPs, it is impossible for a detection engine to detect attacks with 100% accuracy. When traffic is suspect and the detection engine cannot conclude that it is definitely malicious, it is wise to choose the appropriate NR action to mitigate the potential attack consequences. In this chapter, the cost refers to the resource consumption in the system when a countermeasure action is taken: different countermeasure actions consume different amounts of resources due to the frequency of OpenFlow operations, for example, updating the flow table and taking OpenFlow actions. In practice, certain types of NR can be established automatically in response to a particular attack scenario.

In general, the packet dropping action is the default NR for whitelist-based countermeasure approaches, and it has the lowest cost, since a packet matching the corresponding entry in the flow table can be discarded easily. This countermeasure usually implies a high DoC and prevents attackers from introducing malicious traffic into the cloud system.

TABLE 11.2. FlowIPS action selection guidance

    Major action  DoC     Cost    Preferred scenario
    Drop          High    Low     Any determined malicious traffic
    TR            Medium  Medium  Attacking traffic requiring further inspection
    QA            Low     Medium  Attacks with overwhelming traffic, e.g., DoS

TR is appropriate for traffic with a medium DoC. When FlowIPS detects possible attack traffic with a medium DoC, the traffic can be redirected to a security appliance, for example, a DPI proxy or a honeypot, for further inspection and learning. After FlowIPS inspects the traffic, the traffic can be forwarded to the original destination or subjected to other actions, for example, drop. Using TR, the suspect traffic is not forwarded to the original destination until further processing is done. Since TR must apply packet header or OVS port rewriting to every single suspect packet matching the flow table entry, it costs more resources than the simple drop action.

QA comes in the two variants mentioned earlier: VIF-based QA and virtual port-based QA. We mainly focus on virtual port-based QA, since it applies to a broader range of scenarios. QA is preferred for traffic with a lower DoC than TR: packets with a low DoC cannot be determined to be malicious and may still be sent to the original destination, so such traffic does not need to be dropped or redirected. QA incurs overhead similar to that of the TR approach, since all the packets must be handled by the OVS. QA is a good approach to mitigating resource consumption attacks, such as DDoS attacks.
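A rules generator could encode Table 11.2 as a simple mapping from the reported DoC to an NR action, as in the sketch below. The numeric thresholds are hypothetical; Table 11.2 fixes only the ordering of the three actions.

    # Hypothetical thresholds; Table 11.2 fixes only the ordering drop > TR > QA.
    def select_nr_action(doc):
        """Map the detection engine's degree of confidence (0.0-1.0)
        to one of the three NR actions of Table 11.2."""
        if doc >= 0.9:
            return "drop"                     # determined malicious: cheapest
        elif doc >= 0.5:
            return "traffic_redirection"      # suspect: inspect via DPI/honeypot
        else:
            return "qos_adjustment"           # low confidence: just slow it down

    for doc in (0.95, 0.7, 0.2):
        print(doc, "->", select_nr_action(doc))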

11.8 PERFORMANCE COMPARISON

Having implemented the FlowIPS described in Figure 11.2, we present a comparison of the performance of the proposed FlowIPS and a traditional IPS candidate, the Iptables-based IPS. For a fair comparison, FlowIPS deploys only the default NR action (drop), since the Snort/Iptables IPS has no NR capability beyond dropping. All implementation and evaluation are conducted on a Dell R510 server with two quad-core Intel Xeon processors and 32 GB of memory.

Figure 11.6 evaluates the IPS forwarding capability under an overwhelming workload. The IPS itself is set up as the proxy between two virtual end hosts. We use hacking tools [44] to launch a DoS attack toward the IPS at a fixed rate of 150,000 packets per second as the interference source. For demonstration purposes, we choose two major DoS attacks as candidates: ping of death (PoD) and SYN flood. To measure the healthy traffic forwarding capability of each IPS, one VM sends packets to another via the IPS at various rates (packets per second). In the traditional IPS solution, DoS packets are first captured by the IPS detection engine, which then matches the rules and drops the packets. In the FlowIPS approach, the OVS fulfills the same task as Iptables but handles packets with a different, more efficient mechanism: after Snort finds packets matching an attack signature, the controller becomes aware of the current threats in real time by parsing the CSV log file and pushes the corresponding flow entries into the flow table. Once the flow table is updated, the malicious traffic is handled in the OVS fast path, which dramatically increases system performance. As Figure 11.6 shows, FlowIPS sustains an almost 100% forwarding rate under both types of attack, meaning that all normal traffic is properly forwarded even when FlowIPS is under significant stress. The Snort/Iptables IPS achieves only about 70% and 40% successful forwarding rates under the SYN flood and PoD attacks, respectively. The IPS performs better under SYN flood than under PoD because PoD attack packets are on average bigger than SYN flood packets and therefore consume more IPS resources.

In Figure 11.7, we evaluate the alert generation capacity of both the traditional IPS and FlowIPS under flood interference. This metric also indicates how well each IPS can process attack packets from a security perspective.


Figure 11.6. Healthy traffic impact. (Healthy traffic successful forwarding rate, in percent, versus attacking rate in packets per second, from 5,000 to 65,000; curves: FlowIPS under PoD and SYN flood, IPS under SYN flood, and IPS under PoD.)

Figure 11.7. Intrusion detection rate. (Intrusion detection rate, in percent, versus attack rate in packets per second, from 1,000 to 350,000; curves: FlowIPS and IPS.)


To evaluate this capability, we generate two different types of attacks: a DoS flooding attack acting as the interference source and an ICMP flood attack acting as the potential threat to be tested. This evaluation mainly indicates whether the IPS and FlowIPS can still generate alerts under high workload stress or interference. The figure shows the successful alert generation rate for the ICMP attack under DoS interference. It suggests that the alert generation of the traditional IPS is impacted by the DoS interference: most resources of the IPS system are used to handle the DoS attack, so the alert generation rate decreases as the ICMP attack speed increases. When the ICMP attack reaches 15,000 packets per second, the IPS generates alerts for only 13.72% of the total ICMP attack packets. On the other hand, FlowIPS can efficiently avoid the interference from the DoS flooding attack thanks to the OVS, so it successfully raises alerts for all threats sent at 15,000 packets per second. When the ICMP attack reaches 30,000 packets per second, the performance of FlowIPS starts decreasing, and when the ICMP attack rate increases to 300,000 packets per second, the Snort agent in FlowIPS can no longer capture packets and launch alerts, because the Snort detection engine itself has almost reached its threshold.

Thus, the evaluation of the proposed IPS validates the analysis presented in Section 11.6: FlowIPS offers better network and security performance, especially in the cloud virtual networking environment.

11.9 OPEN ISSUES AND FUTURE WORK

Even though the evaluation results in the previous section are as expected, there are still several issues that FlowIPS cannot address. First of all, the intrusion detection system deployed in this work is a signature-based detection engine, which means the detection capability is limited: attack behaviors that do not fall into any signature pattern, for example, some DDoS attacks, cannot be efficiently detected using a signature-based intrusion detection system. Second, the current IDS is planted in the management domain of the cloud virtual networking environment, and different detection engines are logically disconnected from each other. This is also a critical issue, because some distributed and collaborative attacks, e.g., DDoS attacks, will escape the detection engines without appropriate synchronization among the distributed detection engines. Last, how to interpret alerts and generate prevention strategies efficiently is also a top concern for improving the proposed system.

Based on these open issues, the future work on FlowIPS involves three aspects: (1) Signature- and anomaly-based detection: besides the signature-based detection engine, we expect to incorporate anomaly-based detection as well, so that the majority of malicious behaviors can be efficiently captured. (2) Synchronization: currently, there is only one Snort detection agent in the system; we plan to introduce more detection engines placed in different servers and collect alerts from all of them, which will further help generate NR rules by correlating alerts. (3) Algorithms: optimized algorithms are required in the alert interpreter module, the rules generator, and the Snort agent partitioning, to increase the efficiency of the proposed FlowIPS without taking down the detected vulnerable service.


11.10 CONCLUSION

In this chapter, we summarized the issues and design challenges of cloud network security and comprehensively surveyed the existing work on cloud security. We then proposed an OpenFlow-based IPS, called FlowIPS, for the cloud virtual networking environment. It inherits its intrusion detection capability from Snort and flexible NR from OpenFlow. FlowIPS was first compared with the traditional IPS from a design perspective and then through a real-world evaluation. NR actions were also designed and developed based on the OVS and the POX controller in the cloud virtual networking environment. The evaluation results show the performance difference between the proposed FlowIPS and the Iptables/Snort IPS, and thereby validate the superiority of the proposed solution.

ACKNOWLEDGMENTS

The presented work is sponsored by an ONR YIP award, NSF grants CNS-1029546 and CNS-1217736, and a China Mobile research grant.

REFERENCES

1. T. Xing, D. Huang, L. Xu, C.-J. Chung, and P. Khatkar, “SnortFlow: An OpenFlow-based intrusion prevention system in cloud environment,” in GENI Research and Educational Experiment Workshop (GREE), Salt Lake City, UT, 2013.

2. C.-J. Chung, P. Khatkar, T. Xing, J. Lee, and D. Huang, “NICE: Network intrusion detection and countermeasure selection in virtual network systems,” IEEE Transactions on Dependable and Secure Computing (TDSC), Special Issue on Cloud Computing Assessment, vol. 10, no. 4, pp. 198–211, 2013.

3. Cloud Security Alliance, “Top threats to cloud computing v1.0,” 2010.

4. K. Kell, “EC2 security revisited,” online blog, 2013.

5. “SourceFire Inc.” [Online]. Available: http://www.snort.org.

6. “Suricata Inc.” [Online]. Available: http://suricata-ids.org.

7. W. Morton, “Intrusion prevention strategies for cloud computing,” 2011.

8. N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, and S. Shenker, “NOX: Towards an operating system for networks,” ACM SIGCOMM Computer Communication Review, vol. 38, no. 3, pp. 105–110, July 2008.

9. “Citrix Systems, Inc.” [Online]. Available: http://www.citrix.com/products/xenserver.

10. “KVM.” [Online]. Available: http://www.linux-kvm.org.

11. R. U. Rehman, Intrusion Detection Systems with Snort: Advanced IDS Techniques UsingSnort, Apache, MySQL, PHP, and ACID. Prentice Hall Professional, Upper Saddle River, NJ,2003.

12. V. Sekar, R. Krishnaswamy, A. Gupta, and M. K. Reiter, “Network-wide deployment of intrusion detection and prevention systems,” in Proceedings of the 6th International Conference (Co-NEXT ’10). New York: ACM, 2010, pp. 18:1–18:12. [Online]. Available: http://doi.acm.org/10.1145/1921168.1921192.


13. C. Kruegel, F. Valeur, G. Vigna, and R. Kemmerer, “Stateful intrusion detection for high-speed networks,” in Proceedings of the 2002 IEEE Symposium on Security and Privacy, Oakland, CA, 2002, pp. 285–293.

14. K. Vieira, A. Schulter, C. Westphall, and C. Westphall, “Intrusion detection for grid and cloudcomputing,” IT Professional, vol. 12, no. 4, pp. 38–43, July 2010.

15. S. Roschke, F. Cheng, and C. Meinel, “An extensible and virtualization-compatible IDS management architecture,” in Fifth International Conference on Information Assurance and Security (IAS ’09), vol. 2, Xi’an, China, 2009, pp. 130–134.

16. J. Francois, I. Aib, and R. Boutaba, “FireCol: A collaborative protection network for the detection of flooding DDoS attacks,” IEEE/ACM Transactions on Networking, vol. 20, no. 6, pp. 1828–1841, Dec. 2012.

17. R. Braga, E. Mota, and A. Passito, “Lightweight DDoS flooding attack detection using NOX/OpenFlow,” in 2010 IEEE 35th Conference on Local Computer Networks (LCN), Denver, CO, Oct. 2010, pp. 408–415.

18. S. Yu, Y. Tian, S. Guo, and D. Wu, “Can we beat DDoS attacks in clouds?,” IEEE Transactionson Parallel and Distributed Systems, vol. 25, no. 9, pp. 2245–2254, September 2014.

19. G. Yan, R. Lee, A. Kent, and D. Wolpert, “Towards a Bayesian network game framework for evaluating DDoS attacks and defense,” in Proceedings of the 2012 ACM Conference on Computer and Communications Security (CCS ’12). New York: ACM, 2012, pp. 553–566. [Online]. Available: http://doi.acm.org/10.1145/2382196.2382255.

20. M. T. Goodrich, “Probabilistic packet marking for large-scale IP traceback,” IEEE/ACM Transactions on Networking, vol. 16, no. 1, pp. 15–24, Feb. 2008. [Online]. Available: http://dx.doi.org/10.1109/TNET.2007.910594.

21. A. Belenky and N. Ansari, “IP traceback with deterministic packet marking,” IEEE Commu-nications Letters, vol. 7, no. 4, pp. 162–164, 2003.

22. S. Yu, W. Zhou, R. Doss, and W. Jia, “Traceback of DDoS attacks using entropy variations,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 3, pp. 412–425, 2011.

23. B. Koldehofe, F. Durr, M. A. Tariq, and K. Rothermel, “The power of software-defined networking: Line-rate content-based routing using OpenFlow,” in Proceedings of the 7th Workshop on Middleware for Next Generation Internet Computing (MW4NG ’12). New York: ACM, 2012, pp. 3:1–3:6.

24. R. Kapoor, G. Porter, M. Tewari, G. M. Voelker, and A. Vahdat, “Chronos: Predictable low latency for data center applications,” in Proceedings of the Third ACM Symposium on Cloud Computing (SoCC ’12). New York: ACM, 2012, pp. 9:1–9:14.

25. M. Suchara, D. Xu, R. Doverspike, D. Johnson, and J. Rexford, “Network architecture for joint failure recovery and traffic engineering,” in Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’11). New York: ACM, 2011, pp. 97–108.

26. M. R. Nascimento, C. E. Rothenberg, M. R. Salvador, and M. F. Magalhaes, “QuagFlow: Partnering Quagga with OpenFlow,” in Proceedings of the ACM SIGCOMM 2010 Conference (SIGCOMM ’10). New York: ACM, 2010, pp. 441–442.

27. S. Shin and G. Gu, “CloudWatcher: Network security monitoring using OpenFlow in dynamic cloud networks (or: How to provide security monitoring as a service in clouds?),” in Proceedings of the 2012 20th IEEE International Conference on Network Protocols (ICNP ’12). Washington, DC: IEEE Computer Society, 2012, pp. 1–6. [Online]. Available: http://dx.doi.org/10.1109/ICNP.2012.6459946.


28. J. R. Ballard, I. Rae, and A. Akella, “Extensible and scalable network monitoring using OpenSAFE,” in Proceedings of the 2010 Internet Network Management Conference on Research on Enterprise Networking (INM/WREN ’10). Berkeley, CA: USENIX Association, 2010, p. 8.

29. N. L. van Adrichem, C. Doerr, and F. A. Kuipers, “OpenNetMon: Network monitoring in OpenFlow software-defined networks,” in Network Operations and Management Symposium (NOMS). IEEE, 2014.

30. P. Porras, S. Shin, V. Yegneswaran, M. Fong, M. Tyson, and G. Gu, “A security enforcement kernel for OpenFlow networks,” in Proceedings of the First Workshop on Hot Topics in Software Defined Networks (HotSDN ’12). New York: ACM, 2012, pp. 121–126. [Online]. Available: http://doi.acm.org/10.1145/2342441.2342466.

31. S. Shin, P. A. Porras, V. Yegneswaran, M. W. Fong, G. Gu, and M. Tyson, “FRESCO: Modular composable security services for software-defined networks,” in Proceedings of the 20th Annual Network and Distributed System Security Symposium (NDSS ’13), February 2013.

32. J. Suh, H.-g. Choi, W. Yoon, T. You, T. Kwon, and Y. Choi, “Implementation of content-oriented networking architecture (CONA): A focus on DDoS countermeasure,” September 2010.

33. P. Fonseca, R. Bennesby, E. Mota, and A. Passito, “A replication component for resilient OpenFlow-based networking,” in Network Operations and Management Symposium (NOMS), 2012 IEEE, April 2012, pp. 933–939.

34. S. Shin, V. Yegneswaran, P. Porras, and G. Gu, “AVANT-GUARD: Scalable and vigilant switch flow management in software-defined networks,” in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security (CCS ’13), 2013, pp. 413–424.

35. R. Skowyra, S. Bahargam, and A. Bestavros, “Software-defined IDS for securing embedded mobile devices,” in High Performance Extreme Computing Conference (HPEC), 2013 IEEE, Waltham, MA, Sept. 2013, pp. 1–7.

36. A. Papadogiannakis, M. Polychronakis, and E. P. Markatos, “Improving the accuracy of network intrusion detection systems under load using selective packet discarding,” in Proceedings of the Third European Workshop on System Security (EUROSEC ’10). New York: ACM, 2010, pp. 15–21.

37. L. Tan and T. Sherwood, “A high throughput string matching architecture for intrusion detection and prevention,” in Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA), Madison, WI, 2005.

38. J. H. Jafarian, E. Al-Shaer, and Q. Duan, “OpenFlow random host mutation: Transparent moving target defense using software defined networking,” in Proceedings of the First Workshop on Hot Topics in Software Defined Networks (HotSDN ’12). New York: ACM, 2012, pp. 127–132.

39. VMware Inc., “VMware vCloud networking and security overview,” white paper, 2012.

40. C.-J. Chung, J. Cui, P. Khatkar, and D. Huang, “Non-intrusive process-based monitoring system to mitigate and prevent VM vulnerability explorations,” in 2013 9th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), Austin, TX. Washington, DC: IEEE, 2013, pp. 21–30.

41. N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: Enabling innovation in campus networks,” ACM SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69–74, April 2008.


42. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the art of virtualization,” in Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP), New York, 2003.

43. A. Alhomoud, R. Munir, J. P. Disso, I. Awan, and A. Al-Dhelaan, “Performance evaluation study of intrusion detection systems,” in The 2nd International Conference on Ambient Systems, Networks and Technologies, Niagara Falls, Canada, 2011.

44. “Back Track Linux.” [Online]. Available: http://www.backtrack-linux.org.


12

SURVIVABILITY AND FAULT TOLERANCE IN THE CLOUD

Mohamed Faten Zhani1 and Raouf Boutaba2

1Department of Software and IT Engineering, École de technologie supérieure, University of Quebec, Montreal, Canada

2D.R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada

12.1 INTRODUCTION

In recent years, cloud computing has emerged as a successful model to offer computingresources in an on-demand manner for large-scale Internet services and applications.However, despite the success of cloud computing, many companies are still reluctant toembrace the cloud, mainly because of the lack of hard guarantees on the survivability andreliability of the offered services. As a result, cloud providers are urged to put in placestrategies to deal with failures, mitigate their impact, and improve the fault tolerance oftheir infrastructures in order to ensure high availability of services.

Recent reports and studies have highlighted the devastating impact of failures and service outages on any enterprise in terms of profitability, reputation, and even viability. During the last couple of years, service outages have affected millions of online customers around the world [1]. Although the root causes of such outages may differ (e.g., software bugs, hardware failures, unexpected demand, human mistakes, denial of service attacks, and misconfiguration), the consequences can be disastrous for many businesses.



Obviously, the impact of service downtime varies considerably with the application andthe business [2]. For some critical applications, the cost of downtime can run between$84,000 and $108,000 per hour [3].

Even for less-critical services, recurrent outages can damage the company's reputation. Although it may not be possible to directly assess the monetary loss due to reputation impairment, there is no doubt that it affects customers' loyalty, which in the long term impacts the revenue and even the viability of an enterprise. In North America alone, IT downtime cost businesses more than 26 billion dollars in revenue in 2010 [2]. As more and more critical services run on the cloud, ensuring high availability and reliability of cloud resources has become a vital challenge in cloud computing environments.

This chapter provides a comprehensive study of fundamental concepts and techniques related to survivability and reliability in cloud computing environments. It first lays out key concepts of the cloud computing model and concepts related to survivability, and then it presents an overview of the outcomes of recent analyses of failures in the cloud. Finally, it reviews and discusses existing techniques aimed at improving the fault tolerance and availability of cloud services. The ultimate goal is to develop a comprehensive understanding of state-of-the-art solutions for improving cloud survivability and reliability, and to provide insights into the critical challenges to be addressed in the future.

12.2 BACKGROUND

In this section, we first provide a brief overview of the cloud computing paradigm. Wethen review the fundamental concepts related to fault tolerance and survivability. Theseconcepts have been well studied in the computer industry, and they can be easily appliedto the systems that make up cloud computing environments.

12.2.1 Cloud Computing Fundamentals

In recent years, cloud computing has arisen as a cost-effective platform for hosting large-scale Internet services and applications. In typical cloud environments, the main stakeholders are the cloud provider (CP), the service providers (SPs), and the end users. The cloud provider owns the physical infrastructure and leverages virtualization technology to partition the available resources and lease them to multiple service providers [4]. Each SP then uses the leased resources to deploy its services and applications and offer them to end users through the Internet.

Currently, CPs like Google Compute Engine [5] and Amazon EC2 [6] only offer computing and storage resources (i.e., virtual machines) without any guarantees on network performance. The lack of such guarantees results in variable and unpredictable performance, and also in several potential security risks, as applications can impact each other [7].

To address these issues, recent research proposals have advocated offering both computing and networking resources in the form of virtual data centers (VDCs), also known as virtual infrastructures [7–9]. Basically, a VDC is made up of virtual machines (VMs) and virtual switches connected through virtual links. Virtual machines and switches are characterized by their capacity in terms of processing, memory, and disk size, whereas virtual links provide guaranteed bandwidth and, possibly, bounded propagation delay. From the cloud provider's perspective, VDCs are a means to ensure better performance isolation between different user services. In addition, as the resource requirements of each VDC are provided by the SPs, CPs are able to make more informed management decisions and develop fine-grained resource allocation schemes. At the same time, SPs benefit from using VDCs by taking full advantage of the cloud computing model (particularly in terms of costs) with assured guarantees on the computing and networking resources allocated to their applications and services, as well as greater security, thanks to the better isolation between VDCs.

One of the key challenges faced by cloud providers is the VDC embedding (also known as mapping) problem, which aims at allocating computing and networking resources to the VMs and virtual links with the goal of achieving several objectives:

1. Maximize the revenue generated from the embedding of VDCs.

2. Minimize the VDC request queuing delay, which is the time an SP has to wait before its requested VDC is allocated.

3. Minimize the energy consumed by the physical infrastructure, which is usually achieved by consolidating VMs in a minimal number of servers.

4. Provide guarantees on (or at least maximize) the availability of the resources allocated to VDCs.

Existing VDC embedding schemes typically attempt to achieve more than one of these objectives simultaneously, even though the objectives may sometimes conflict with each other. In this chapter, we focus mainly on the schemes that target at least the fourth objective, which is related to VDC fault tolerance and availability.

12.2.2 Survivability-Related Concepts

In the following, we provide definitions of the basic terms related to survivability and fault tolerance.

• Fault tolerance: Fault tolerance is the property of a system that is able to operate correctly despite the presence of hardware or software failures. Generally speaking, fault tolerance is achieved by creating backups that take the place of failed components and thereby ensure the continuity of the service.

• Reliability: Reliability is the conditional probability that a system remains operational for a stated interval of time, given that the system was operating flawlessly. Generally speaking, two widely used metrics capture the reliability of a system: the mean time between failures (MTBF) and the failure rate. The MTBF is the mean uptime between failures of a system, and the failure rate is the expected number of failures per given time period.

• Availability: Availability of a system is defined as the percentage of time for which the system in question is operational. It can also be seen as the probability that the system is up at any given time. Specifically, the availability A_n ∈ [0, 1] of a physical device n is given by

A_n = MTBF_n / (MTBF_n + MTTR_n)    (12.1)

where MTBF_n and MTTR_n represent the mean time between failures and the mean time to repair for device n, respectively. Both MTBF_n and MTTR_n can be computed based on historical records of failure events (a brief worked example of Equation (12.1) follows this list of definitions). Table 12.1 provides the tolerable daily and monthly downtimes associated with several availability values. It is worth noting that availability is usually expressed in 9s; for instance, five 9s means an availability of 99.999%. The 9s are a logarithmic measure; that is, a system with five 9s availability is 10 times more available than one with four 9s.

TABLE 12.1. Availability vs. daily and monthly downtimes

    System availability (%)   Tolerable daily downtime   Tolerable monthly downtime
    95                        1 h 12 min                 1 day 12 h 31 min
    99                        14 min 24 s                7 h 18 min 17 s
    99.9                      1 min 26 s                 43 min 49 s
    99.99                     8.6 s                      4 min 23 s
    99.999                    0.9 s                      26.3 s
    99.9999                   0.1 s                      2.6 s

• Fault domain: A fault domain is a set of devices that share a single point of failure [10]. For instance, servers connected to the same top-of-rack switch belong to the same fault domain. It is also worth noting that a device may belong simultaneously to multiple fault domains.
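As a quick worked example of Equation (12.1), the trivial Python snippet below (with illustrative numbers, not values from the chapter) computes the availability of a device with an MTBF of 1,000 hours and an MTTR of 1 hour, which lands just above three 9s.

    def availability(mtbf_hours, mttr_hours):
        """Equation (12.1): fraction of time the device is operational."""
        return mtbf_hours / (mtbf_hours + mttr_hours)

    a = availability(1000.0, 1.0)         # MTBF = 1,000 h, MTTR = 1 h
    print("A = %.4f%%" % (100 * a))       # A = 99.9001%, i.e., about three 9s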

12.3 FAILURE CHARACTERIZATION IN CLOUD ENVIRONMENTS

In this section, we briefly review recent works on failure characterization in cloud environments, and then summarize the main outcomes of these studies.

Wu et al. [11] proposed NetPilot, an automated failure mitigation system. NetPilot deals automatically with failures in large-scale data centers without human intervention. The system is built on the analysis and characterization of failures reported in production data centers over a period of 6 months. The authors identified three main causes of failures: software failures, which account for 21% of the total number of failures; hardware failures, which represent 18% of the total failures; and misconfiguration, which is the main cause of failure with 38% of the total number of failures. The study also reported that simple failure mitigation operations are very effective in cutting down failure repair times. However, some failures may require a long repair time, and thus may lead to significant service downtimes. These results are in line with the ones reported in Ref. [12], which revealed that 95% of network failures can be repaired within 10 min, and only 0.09% of the failures may need more than 10 days to be repaired. This shows that


repair times and failure impact on services may vary significantly depending on the type of failure.

Vishwanath et al. [13] analyzed the failures of more than 100,000 servers running in multiple data centers owned by Microsoft. The authors analyzed the logs collected over 14 months. Their main finding was that server failures are mainly caused by hard disk, memory, and RAID controller failures. The study revealed that hard disk failures account for 78% of the total failures. It also reported a high correlation between the number of disk drives in a server and the number of server failures. Finally, they found that failures are recurrent: devices that have experienced failures are more likely to fail again in the near future. This shows that device failure rates have a skewed distribution.

Gill et al. [14] performed a characterization of failures that occurred in several Microsoft data centers. In their study, they first identified the networking devices that were more prone to failure. Then, they assessed the impact of failures on application performance and evaluated the effectiveness of network redundancy. They reported that 75% of the networking equipment was top-of-rack switches, 15% were core and aggregation switches, and 10% were load balancers (LBs). They noticed that the failure rates of the equipment varied significantly with the type and the model (LBs, servers, top-of-rack switches, aggregation switches, routers). In particular, LBs had the highest failure probability (20%) during a 1-year period. Switches have a much lower failure probability, at less than 5%. Furthermore, the failure rates of different devices were unevenly distributed; for instance, the number of failures across LBs was highly variable, and some LB devices experienced more than 400 failures during 1 year. Finally, failure traces show that correlated failures were extremely rare.

Based on the observations of the aforementioned studies, we can summarize the main characteristics of failures in data centers as follows. The duration of failures is extremely variable [12, 15]: some failures last for seconds, whereas others last for days. Data center equipment exhibits high heterogeneity in terms of failure rates and availability. This suggests that such heterogeneity should be taken into account when designing and deploying fault-tolerance mechanisms.

12.4 AVAILABILITY-AWARE RESOURCE ALLOCATION SCHEMES

In the following, we provide a survey of the most representative proposals that have addressed the VDC embedding problem while taking into consideration VDC requirements in terms of resources and availability.

12.4.1 Survivable Mapping

Xu et al. [16] proposed a survivable VDC (termed "virtual infrastructure" in the paper) embedding scheme that allocates resources not only to VDCs but also to backup VMs and virtual links, with the goal of minimizing the total consumed resources. The authors defined basic requirements to ensure that backup VMs can take over in case of failures. These requirements are translated into placement constraints that have to be considered while mapping the resources. For instance, the first constraint is that each VM and its backup


should be placed in two different machines. The second constraint is that there must be sufficient available bandwidth for each backup VM to communicate with the other VMs, so that it can replace a failed VM without any impact on service performance. However, the proposed scheme does not consider the availability of the physical machines and assumes that the number of backups is known beforehand (a backup VM and a backup virtual link for each VM and virtual link, respectively). Finally, this approach does not consider cases where switch failures disconnect a set of servers at the same time. Hence, it only allows mitigating failures that occur at the servers.

12.4.2 ORP

Yeow et al. [17] studied the problem of allocating resources to VDCs (termed virtual infrastructures in the paper) in a physical infrastructure such that the desired availability of each VDC is guaranteed. Figure 12.1 provides an example of a virtual data center and shows how this VDC is mapped to a physical data center. It also shows the backup nodes and links that are provisioned in order to achieve a desired availability.

The first challenge addressed by the authors was how to estimate the availability of a VDC. Hence, they developed a formula to compute the availability of a VDC based on the provisioned number of backups and the availability of the physical machines hosting the VDC. The study makes two main assumptions: (1) the data center is homogeneous, that is, all physical devices have the same availability and failure probability; and (2) node failures are independent. Hence, the availability of a VDC that includes K backup virtual nodes is given by

$$A_{\mathrm{VDC}} = \sum_{i=0}^{K} \binom{N+K}{i} (1-A)^{i} A^{N+K-i} \qquad (12.2)$$

where N is the number of virtual nodes comprising the VDC, and A is the availability of the physical nodes hosting the VDC components. This formula allows the estimation of the number of backup nodes required to achieve a desired availability. The authors then proposed an opportunistic redundancy pooling (ORP) mechanism allowing multiple VDCs to share backups. The idea is to estimate the number of shared backups based on Equation 12.2 simply by setting the variable N to the sum of the numbers of nodes of

multiple VDCs. This approach can reduce the amount of allocated backups by up to 31.25%. Finally, the authors adapted the multicommodity flow technique used in Ref. [19] to formulate the joint node and link resource allocation problem while considering the sharing of backup resources among multiple VDCs.

Figure 12.1. VDC embedding [18]. (a) Example of a virtual data center (star topology). (b) Example of embedding a virtual data center into a physical infrastructure.
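To illustrate Equation 12.2, the following sketch (our own illustration, not code from Ref. [17]) computes the availability of a VDC of N nodes with K backups hosted on homogeneous nodes of availability A, and searches for the smallest K that reaches a desired availability.

public class OrpBackups {
    // Eq. (12.2): probability that at most K of the N + K nodes fail
    static double vdcAvailability(int n, int k, double a) {
        double sum = 0.0;
        for (int i = 0; i <= k; i++) {          // i = number of failed nodes
            double binom = 1.0;                 // binomial coefficient C(n + k, i)
            for (int j = 1; j <= i; j++) binom = binom * (n + k - j + 1) / j;
            sum += binom * Math.pow(1 - a, i) * Math.pow(a, n + k - i);
        }
        return sum;
    }

    // Smallest number of backups K achieving the desired VDC availability
    static int backupsNeeded(int n, double a, double desired) {
        int k = 0;
        while (vdcAvailability(n, k, a) < desired) k++;
        return k;
    }

    public static void main(String[] args) {
        // 20 virtual nodes on hosts with 99.9% availability, target four 9s
        System.out.println(backupsNeeded(20, 0.999, 0.9999));   // prints 2
    }
}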

Although this work presents many advantages, its application to real-world environments is limited. Indeed, machine failure rates and availability are highly heterogeneous in production data centers [20]. Furthermore, the approach does not consider the availability of networking elements (e.g., switches and routers) in the computation of VDC availability.

12.4.3 VENICE

Zhang et al. [20] put forward an aVailability-aware EmbeddiNg framework In Cloud Environments (VENICE). They first presented a technique to compute the availability of an embedded VDC that considers the heterogeneity of the physical devices in terms of failure rate and availability. They then proposed an embedding scheme that uses the availability computation technique to achieve the desired availability for the hosted VDCs.

In their work, the authors considered that each VDC hosts a multi-tier service application (e.g., a three-tier Web application, as shown in Fig. 12.2a). Each tier contains a set of VM replicas that communicate with the VMs of the following tier. The authors presented a technique to compute the availability of a particular VDC mapping based on the availability of the underlying physical devices. The key idea behind this technique is to compute the VDC availability by considering all possible failure scenarios. Specifically, a failure scenario is a specific configuration in which some physical components have failed. Figure 12.2b shows an example of three failure scenarios (s1, s2, and s3) that could affect the VDC drawn in Figure 12.2a. Hence, VENICE first identifies all possible failure scenarios that could impact the operation of the VDC, computes the availability

of the VDC under each of them, and finally estimates the overall VDC availability using conditional probability.

Figure 12.2. VDC embedding [20]. (a) Embedding of a VDC running a three-tier web application. (b) Example of three failure scenarios.

More formally, define N as the set of physical components and s_i(n) ∈ {0, 1} as a boolean variable indicating whether or not device n is down. Let s_i = (s_i(n))_{n∈N} denote a failure scenario that involves k simultaneous physical failures, and S = (s_i)_{i∈|S|} denote the set of all possible failure scenarios. The availability $A_{\mathrm{VDC}}^{s_i}$ of a particular VDC under failure scenario s_i ∈ S is computed as follows:

$$A_{\mathrm{VDC}}^{s_i} = \prod_{n \in N : s_i(n)=1} (1 - A_n) \prod_{n \in N : s_i(n)=0} A_n \qquad (12.3)$$

The overall availability of the VDC is then computed as follows:

$$A_{\mathrm{VDC}} = \sum_{i=1}^{|S|} P(s_i)\, A_{\mathrm{VDC}}^{s_i} \qquad (12.4)$$

where P(s_i) is the probability that failure scenario s_i occurs.

Using this technique, the authors addressed the availability-aware VDC embedding problem, where each service provider specifies not only the resource requirements of the VDC but also its desired availability. VENICE then tries to achieve the desired VDC availability by carefully placing the VDC components. The goal of the VDC allocation algorithm is to maximize the total revenue of the cloud provider while minimizing the penalty incurred due to service unavailability. Unfortunately, neither the proposed availability computation technique nor the embedding scheme considers the case where backup nodes and links are provisioned.
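The scenario-based computation of Equations 12.3 and 12.4 can be illustrated with a brute-force sketch (our own, not VENICE's actual implementation): enumerate every subset of failed components, weight each scenario by its probability, and sum the probabilities of the scenarios under which the VDC is still operational. The survival predicate in the example is an assumption chosen for illustration.

public class VdcAvailabilitySketch {
    // Whether the VDC still works in a scenario; bit i of downMask set = component i is down
    interface Operational { boolean test(long downMask); }

    // Brute-force evaluation of Eqs. (12.3) and (12.4) over all 2^|N| scenarios
    static double availability(double[] avail, Operational vdc) {
        int n = avail.length;                    // feasible only for small |N|
        double total = 0.0;
        for (long mask = 0; mask < (1L << n); mask++) {
            double p = 1.0;                      // probability of scenario s_i, Eq. (12.3)
            for (int i = 0; i < n; i++)
                p *= ((mask >> i) & 1) == 1 ? (1 - avail[i]) : avail[i];
            if (vdc.test(mask)) total += p;      // accumulate as in Eq. (12.4)
        }
        return total;
    }

    public static void main(String[] args) {
        // Two replica servers (bits 0, 1) behind one switch (bit 2): the VDC
        // is operational if the switch and at least one server are up.
        double[] a = {0.999, 0.999, 0.9999};
        Operational up = m -> ((m >> 2) & 1) == 0 && (m & 0b11) != 0b11;
        System.out.println(availability(a, up));  // ≈ 0.9999
    }
}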

12.4.4 Hi-VI

Rabbani et al. [18] proposed a high-availability virtual infrastructure management framework (Hi-VI). The Hi-VI framework dynamically provisions backup resources (i.e., virtual nodes and virtual links) for each VDC in order to achieve the desired availability for the VDCs. The originality of this approach is that it takes into consideration the heterogeneity of data center computing and networking equipment. Hence, it considers the case where pieces of equipment have different failure rates and availabilities.

The authors derived a formula to compute the availability of a particular VDC. The formula uses the availability of the physical equipment hosting the VDC components and also considers the number of provisioned backup nodes and links. However, two simplifying assumptions were made: (1) VDCs have a star topology, that is, each VDC comprises a set of VMs connected to a single virtual switch; and (2) pieces of equipment do not fail simultaneously, that is, only a single physical failure may occur at a time. The intuition behind the proposed formula is that, since there are K backups, it is possible to replace up to K failed VMs. In other words, the availability of a VDC with K backups is given by the probability of having no more than K failures. Mathematically, the availability of a particular VDC, denoted by A_VDC, is written as:

$$A_{\mathrm{VDC}} = \left( \prod_{n : y_n=1} y_n A_n \right) + \sum_{k=1}^{K} \left( \sum_{n : g_n=k} (1 - A_n) \prod_{t \in N \setminus \{n\} : y_t=1} y_t A_t \right) \qquad (12.5)$$


where A_n is the availability of a physical component n ∈ N, K is the number of backup VMs (which is equal to the number of backup virtual links), and y_n is a boolean variable that takes the value 1 if the physical node n ∈ N either hosts one of the VMs of the VDC or is used as an intermediate node to embed a virtual link. The variable g_n is the number of VMs mapped to physical machine n. The first term of the equation is the probability that all virtual nodes are operational (that is, all physical nodes hosting the VDC are available), whereas the second term represents the probability that a single physical node failure occurs, incurring at most K virtual node failures.
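The following sketch (our own illustration of Equation 12.5, not code from Ref. [18]) evaluates the formula for the physical nodes used by a VDC, where avail[n] holds A_n for every node with y_n = 1 and vmCount[n] holds g_n.

public class HiViAvailability {
    // Eq. (12.5): all used nodes up, plus one node down losing at most k VMs
    static double availability(double[] avail, int[] vmCount, int k) {
        double allUp = 1.0;                       // first term of Eq. (12.5)
        for (double a : avail) allUp *= a;

        double oneDown = 0.0;                     // second term: node n fails
        for (int n = 0; n < avail.length; n++) {
            if (vmCount[n] < 1 || vmCount[n] > k) continue;  // would lose > k VMs
            double othersUp = 1.0;
            for (int t = 0; t < avail.length; t++)
                if (t != n) othersUp *= avail[t];
            oneDown += (1 - avail[n]) * othersUp;
        }
        return allUp + oneDown;
    }

    public static void main(String[] args) {
        // Three used nodes hosting 2, 1, and 1 VMs, one backup VM (k = 1)
        double[] a = {0.999, 0.995, 0.999};
        int[] g = {2, 1, 1};
        System.out.println(availability(a, g, 1));
    }
}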

Using the availability formula (Eq. 12.5), the authors proposed a VDC embedding algorithm that jointly allocates computing and networking resources for the VDCs and the virtual backup nodes and links. The idea is to first embed the VDC and then gradually add new backups on high-availability nodes until the desired availability is achieved. If it is not possible to meet the desired availability, the VDC is simply rejected. The ultimate goal of the algorithm is to ensure that all embedded VDCs satisfy their desired availability while minimizing the number of backups and the number of servers used to host the virtual resources.

12.4.5 WCS

Bodik et al. [10] studied resource allocations in data centers that achieve the best trade-off between fault tolerance and bandwidth usage. Indeed, when the VMs of the same VDC (termed "service" in the paper) are spread across the data center, they are less likely to be affected by the same failure (e.g., top-of-rack failures), but they consume significant bandwidth in the data center network, as they are far from each other (Fig. 12.3). Conversely, when these VMs are placed close to each other, they consume less bandwidth, but a single failure (e.g., at the top-of-rack level) may simultaneously affect many VMs.

Based on this observation, the authors proposed an allocation scheme that mitigates the failure impact on the virtual data center while minimizing bandwidth usage in the data center network. They put forward a new metric named the worst-case survival (WCS) to measure the fault tolerance of a particular VDC. The WCS of a VDC is defined as the number of its VMs that remain available during a single worst-case failure divided by the total number of its VMs.
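A minimal sketch of the WCS metric (our own illustration, not the authors' code), assuming the placement is summarized as the number of the VDC's VMs in each fault domain:

public class WorstCaseSurvival {
    // WCS = fraction of VMs surviving the worst single fault-domain failure
    static double wcs(int[] vmsPerFaultDomain) {
        int total = 0, worstLoss = 0;
        for (int vms : vmsPerFaultDomain) {
            total += vms;
            worstLoss = Math.max(worstLoss, vms);
        }
        return total == 0 ? 1.0 : (double) (total - worstLoss) / total;
    }

    public static void main(String[] args) {
        System.out.println(wcs(new int[]{2, 2, 2, 2}));  // spread: 0.75
        System.out.println(wcs(new int[]{8}));           // consolidated: 0.0
    }
}

Spreading eight VMs over four racks yields a WCS of 0.75, whereas consolidating them in one rack yields 0, which captures the trade-off illustrated in Figure 12.3.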

Figure 12.3. Tradeoff between fault tolerance and bandwidth usage [10]. (a) Allocation optimized for bandwidth but with low fault tolerance. (b) Allocation optimized for fault tolerance: more bandwidth is consumed.


The proposed resource allocation scheme includes two basic operations: (1) bandwidth minimization and (2) fault-tolerance optimization. The bandwidth minimization is performed only once, at the initial allocation of the VDC. It consists of applying a K-way min-cut to split the VMs of the same VDC into partitions such that their intercommunication bandwidth is minimized. These partitions are initially placed into different racks in the data center. The fault-tolerance optimization is then accomplished by gradually spreading out the VMs, one by one, across multiple fault domains in order to maximize the VDC worst-case survival while ensuring that bandwidth consumption remains low.

This solution has several limitations. First, it does not consider the availability of the underlying physical components. Also, it considers only the worst-case failure, which occurs mainly in aggregation switches; hence, it ignores other types of failures that may happen, for example, in top-of-rack switches or servers. Finally, the paper assumes that a physical machine hosts only one VM of the same VDC. Consequently, large VDCs will be mapped onto a large number of distinct servers, which leads not only to a high number of used physical machines but also to high bandwidth usage in the data center.

12.4.6 SVNE

Rahman et al. [21] were the first to introduce and study the problem of survivable virtual network embedding (SVNE). The problem relates to protecting a virtual network (equivalent to a VDC but deployed over an ISP network rather than a data center), and more specifically its virtual links, against physical link failures. As a virtual link between two nodes maps onto a path (a set of connected physical links), the authors considered two types of virtual link protection and restoration mechanisms. The first mechanism is called link protection and restoration and basically aims at protecting the virtual link by protecting each physical link comprised in its associated path. Hence, a backup detour is provided for each physical link in the path. The second mechanism is the path protection and restoration mechanism, which requires the provisioning of a backup path for each primary path associated with a virtual link. Of course, it is mandatory that the primary and backup paths have disjoint links.

Figure 12.4 shows an example of a virtual link embedded into a physical network. The continuous line shows the primary path to which the virtual link is mapped. The dashed lines represent the protection of each physical link when the link protection and restoration mechanism is used. Finally, when the path protection and restoration mechanism is adopted, the dotted line represents the allocated protection path, which is link-disjoint from the primary path (i.e., the initial embedding).
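The core of path protection, finding a backup path that shares no physical link with the primary one, can be sketched as follows (our own illustration, assuming unit-capacity links and ignoring bandwidth constraints; this is not the authors' algorithm): remove the primary path's links from the topology and run a breadth-first search.

import java.util.*;

public class PathProtectionSketch {
    // Returns a backup path from src to dst that is link-disjoint from
    // primaryPath, or null if no such path exists in the physical graph.
    static List<Integer> backupPath(Map<Integer, List<Integer>> graph,
                                    List<Integer> primaryPath, int src, int dst) {
        Set<String> banned = new HashSet<>();          // links used by the primary path
        for (int i = 0; i + 1 < primaryPath.size(); i++) {
            int u = primaryPath.get(i), v = primaryPath.get(i + 1);
            banned.add(u + "-" + v);
            banned.add(v + "-" + u);
        }
        Map<Integer, Integer> parent = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(src);
        parent.put(src, src);
        while (!queue.isEmpty()) {                     // BFS avoiding banned links
            int u = queue.poll();
            if (u == dst) break;
            for (int v : graph.getOrDefault(u, List.of()))
                if (!banned.contains(u + "-" + v) && !parent.containsKey(v)) {
                    parent.put(v, u);
                    queue.add(v);
                }
        }
        if (!parent.containsKey(dst)) return null;     // no link-disjoint backup
        LinkedList<Integer> path = new LinkedList<>();
        for (int v = dst; v != src; v = parent.get(v)) path.addFirst(v);
        path.addFirst(src);
        return path;
    }
}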

The authors mathematically formulated the SVNE problem as an integer linear program and proposed two heuristic solutions: a proactive one and a reactive one. The first addresses failures by proactively provisioning backup resources for potential future failures. The main drawback of this approach is that it may waste up to 50% of the physical network resources. The second solution addresses this drawback by handling failures reactively: it determines the backup path when a failure occurs. The advantage of such an approach is the use of fewer resources for backups, and hence

more virtual networks can be embedded in the physical infrastructure. The main limitation of the proposed solutions is that they assume that only a single failure may occur at a time.

Figure 12.4. Link and path protection of a virtual link.

12.4.7 Discussion

Generally speaking, these proposals can be classified into (1) availability-aware resource placement schemes, which attempt to improve VDC availability by carefully selecting the physical nodes hosting the virtual data center; and (2) redundancy provisioning techniques, which allocate backup resources in order to achieve the desired availability.

Table 12.2 compares different features of the surveyed schemes. For each scheme, the table provides the following information:

1. What type of backup resources are provisioned (i.e., virtual nodes, virtual links, or both);

2. Whether or not the backup resources are shared among different VDCs;

3. Whether or not the scheme provides a technique to estimate the number of virtual links/nodes provisioned as backups;

4. Whether or not the scheme provides a technique to compute the availability of a VDC;

5. Whether the technique to compute the VDC availability takes into consideration the heterogeneity of the equipment in terms of availability and failure rates;

6. Whether the scheme makes the assumption that one physical server can host only one VM from the same VDC.


TABLE 12.2. Comparison of survivable embedding schemes

Proposal          Backup resources   Shared or not   Estimates # of backups   Computes avail.   Heterogeneity   VM colocation
Sur. Map. [16]    Nodes              No              No                       No                N/A             Yes
ORP [17]          Nodes & Links      Yes             Yes                      Yes               No              No
SVNE [21]         Links              No              N/A                      No                N/A             N/A
WCS [10]          Nodes              No              No                       No                N/A             No
Hi-VI [18]        Nodes & Links      No              Yes                      Yes               Yes             Yes
VENICE [20]       No                 No              N/A                      Yes               Yes             Yes

Based on the surveyed works presented earlier, we can make the following observations:

• Most of the existing proposals do not take into account the availability of the physical devices (e.g., Refs. [10, 16]). Furthermore, many of them assume that the cluster is homogeneous, that is, all devices have the same availabilities and failure rates (e.g., Ref. [17]), whereas in practice, cloud computing environments are extremely heterogeneous. Indeed, physical devices differ in type, capacity, failure rate, MTBF, availability, and reliability. As a result, proposals overlooking this heterogeneity have limited applicability in real-world cloud environments.

• Some proposals (e.g., Refs. [17, 22]) make the assumption that one physical node is able to host only one virtual component from the same VDC. This assumption does not hold in practice, since the main benefit of virtualization is the possibility for multiple virtual components to share the same physical device. In addition, such an assumption has an impact on the way resources are allocated. For instance, in order to satisfy this assumption, the VMs of a VDC containing 1000 VMs must be placed onto 1000 physical nodes even when it is possible to consolidate them into 500 machines. The resulting allocation is hence suboptimal, as it leads to the usage of a higher number of servers in addition to the consumption of more bandwidth between the VMs (as they are scattered across 1000 machines rather than 500). Consequently, an effective solution should allow multiple VMs from the same VDC to share a single host whenever possible.

• Some other proposals (e.g., Ref. [17]) do not consider the availability of the networking devices (e.g., physical routers and switches) and middleboxes. However, recent studies [14] have revealed that these devices, in particular top-of-rack switches and LBs, are more prone to failure than other types of equipment. Consequently, it is necessary to take the availability of such components into consideration in the VDC availability computation.

• Finally, all the proposals, without exception, assume that only a single failure can occur at a given time. Although, according to some studies [14], it might be reasonable to make such an assumption, it is still not realistic, as simultaneous and correlated failures may occur in practice. In this case, many challenges remain


unsolved, such as computing the availability of a virtual data center, estimating the number of backup nodes and links required to achieve a desired availability, and deciding the placement of these backups.

12.5 CONCLUSION

Despite the widespread adoption of cloud computing, failures and service outages loom as major concerns, pressing cloud providers to put in place strategies to deal with failures, mitigate their impact, and improve the fault tolerance of their infrastructures in order to provide stronger guarantees on the availability of their resources.

This chapter provided a comprehensive study of cloud survivability and reliability concepts and solutions, including an overview of recent studies of failures in production environments and a survey of the relevant solutions proposed to improve the availability of virtual data centers in the cloud. We discussed the main features of each of the solutions and highlighted their advantages and limitations.

We believe that there are still many challenges to overcome in order to offer highly reliable and available cloud services. Specifically, more work should be dedicated to studying failure characteristics, and particularly the correlation between failures. There is also a pressing need to develop more sophisticated solutions for improving the survivability and fault tolerance of cloud services, taking into account the heterogeneity of cloud infrastructures as well as scenarios where multiple failures occur simultaneously.

REFERENCES

1. Costs and scope of unplanned outages. http://www.evolven.com/blog/2011-devastating-outages-major-brands.html. Accessed on November 17, 2014.

2. Downtime outages and failures, understanding their true costs. http://www.evolven.com/blog/downtime-outages-and-failures-understanding-their-true-costs.html. Accessed on November 17, 2014.

3. Costs and scope of unplanned outages. http://www.evolven.com/blog/costs-and-scope-of-unplanned-outages.html. Accessed on November 17, 2014.

4. Md. Faizul Bari, Raouf Boutaba, Rafael Esteves, Lisandro Zambenedetti Granville, Maxim Podlesny, Md Golam Rabbani, Qi Zhang, and Mohamed Faten Zhani. Data center network virtualization: A survey. IEEE Communications Surveys & Tutorials, 15(2):909–928, 2013.

5. Google Compute Engine. https://cloud.google.com/. Accessed on November 17, 2014.

6. Amazon Elastic Compute Cloud (Amazon EC2). http://aws.amazon.com/ec2/. Accessed on November 17, 2014.

7. Hitesh Ballani, Paolo Costa, Thomas Karagiannis, and Ant Rowstron. Towards predictable datacenter networks. In ACM SIGCOMM, Toronto, Ontario, Canada, August 2011.

8. Mohamed Faten Zhani, Qi Zhang, Gwendal Simon, and Raouf Boutaba. VDC Planner: Dynamic migration-aware virtual data center embedding for clouds. In IEEE/IFIP International Symposium on Integrated Network Management (IM), Ghent, Belgium, May 27–31, 2013.

9. Chuanxiong Guo, Guohan Lu, Helen J. Wang, Shuang Yang, Chao Kong, and Peng Sun. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In ACM CoNEXT, Philadelphia, PA, 2010.

10. Peter Bodík, Ishai Menache, Mosharaf Chowdhury, Pradeepkumar Mani, David A. Maltz, and Ion Stoica. Surviving failures in bandwidth-constrained datacenters. In ACM SIGCOMM, Helsinki, Finland, 2012.

11. Xin Wu, Daniel Turner, Chao-Chih Chen, David A. Maltz, Xiaowei Yang, Lihua Yuan, and Ming Zhang. NetPilot: Automating datacenter network failure mitigation. SIGCOMM Computer Communication Review, 42(4):443–454, August 2012.

12. Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. VL2: A scalable and flexible data center network. In ACM SIGCOMM, Barcelona, Spain, 2009.

13. Kashi Venkatesh Vishwanath and Nachiappan Nagappan. Characterizing cloud computing hardware reliability. In ACM Symposium on Cloud Computing (SOCC), Indianapolis, IN, 2010.

14. Phillipa Gill, Navendu Jain, and Nachiappan Nagappan. Understanding network failures in data centers: Measurement, analysis, and implications. In ACM SIGCOMM, Toronto, Ontario, Canada, 2011.

15. Ming Zhang, Chi Zhang, Vivek Pai, Larry Peterson, and Randy Wang. PlanetSeer: Internet path failure monitoring and characterization in wide-area services. In Symposium on Operating Systems Design & Implementation (OSDI), San Francisco, CA, 2004. USENIX Association. http://dl.acm.org/citation.cfm?id=1251266. Accessed on December 9, 2014.

16. Jielong Xu, Jian Tang, K. Kwiat, Weiyi Zhang, and Guoliang Xue. Survivable virtual infrastructure mapping in virtualized data centers. In IEEE International Conference on Cloud Computing (CLOUD), Honolulu, HI, 2012. http://www.thecloudcomputing.org/2012/. Accessed on December 9, 2014.

17. Wai-Leong Yeow, Cedric Westphal, and Ulas C. Kozat. Designing and embedding reliable virtual infrastructures. SIGCOMM Computer Communication Review, 41(2):53–56, April 2011.

18. Md Golam Rabbani, Mohamed Faten Zhani, and Raouf Boutaba. On achieving high survivability in virtualized data centers. IEICE Transactions on Communications, E97-B(1):10–18, January 2014.

19. N. M. Mosharaf Kabir Chowdhury, Muntasir Raihan Rahman, and Raouf Boutaba. Virtual network embedding with coordinated node and link mapping. In IEEE INFOCOM, Rio de Janeiro, Brazil, 2009, pages 783–791.

20. Qi Zhang, Mohamed Faten Zhani, Meyssa Jabri, and Raouf Boutaba. Venice: Reliable virtual data center embedding in clouds. In IEEE International Conference on Computer Communications (INFOCOM), Toronto, Ontario, Canada, April 27–May 2, 2014.

21. Muntasir Raihan Rahman and Raouf Boutaba. SVNE: Survivable virtual network embedding algorithms for network virtualization. IEEE Transactions on Network and Service Management, 10(2):105–118, 2013.

22. Hongfang Yu, Vishal Anand, Chunming Qiao, and Gang Sun. Cost efficient design of survivable virtual infrastructure to recover from facility node failures. In IEEE International Conference on Communications (ICC), Kyoto, Japan, June 2011.


PART IV

CLOUD APPLICATIONS AND SERVICES


13

SCIENTIFIC APPLICATIONS ON CLOUDS

Simon Ostermann, Matthias Janetschek, Radu Prodan, and Thomas Fahringer

Institute for Computer Science, University of Innsbruck, Innsbruck, Austria

13.1 INTRODUCTION

Today, scientific applications require an ever-increasing number of resources to deliver results for growing problem sizes in a reasonable amount of time. In the past 20 years, while the largest projects were able to afford expensive supercomputers, the smaller ones were forced to opt for cheaper resources such as commodity clusters or, more challenging to build, computational Grids. To program such large-scale distributed heterogeneous infrastructures, scientific workflows emerged as an attractive paradigm by allowing the programmers to focus on the composition of existing legacy code fragments to create larger and more powerful applications. Therefore, numerous efforts have been spent on researching and developing integrated programming and computing environments [1] to support the workflow lifecycle and meet scientists' needs.

Nowadays, Cloud computing proposes an alternative in which resources are no longer owned by the application scientists, but leased from large specialized data centers on demand and in a cost-effective fashion according to temporal needs. This separation frees research institutions from the permanent costs of over-provisioning, operation, maintenance, and depreciation of resources. Existing workflow systems cannot


seamlessly take advantage of this new infrastructure without appropriate middleware support, which often requires nontrivial extensions to the scheduling, enactment, resource management, and other runtime execution services. At the same time, existing Cloud providers such as Amazon recognized the importance of workflows to science and engineering and started to provide highly tuned solutions integrated into their native platforms [2], such as the Amazon Simple Workflow (SWF) service. Other platforms like OpenStack or CloudStack do not offer advanced services for workflows and require the use of external workflow engines for such executions. However, existing workflow systems [1] cannot immediately take advantage of this advanced support because of different, incompatible languages, interfaces, and communication protocols. Another downside of SWF is that it requires applications to be written in Java and to implement specific interfaces, which is problematic for scientific workflows based on the composition of legacy code fragments. Using SWF requires scientists to learn a new development and execution platform in addition to the one they already regularly use, although it is very simple to get used to compared with most other scientific workflow environments.

To address this heterogeneity in workflow systems and underlying computing infrastructures, the SHIWA European project (http://www.shiwa-workflow.eu/) researched and developed the Interoperable Workflow Intermediate Representation (IWIR) [3], which enables fine-grained interoperability between workflow systems via the transparent translation of workflow applications programmed in different languages. IWIR is a generic and system-neutral workflow representation able to sufficiently describe the large majority of existing workflow constructs. The common representation reduces the complexity of porting n workflow systems onto m computing platforms from O(n · m) to O(n + m). Additionally, it enables the integration of new workflow systems and new computing platforms with constant O(1) complexity by implementing IWIR importers/exporters. This ensures not only interoperability across workflow systems but also enables workflows to be executed on new external foreign (or nonnative) computing infrastructures. IWIR provides additional tools and libraries to ease the development of language translators, and is currently supported by five major workflow systems: ASKALON (AGWL language) [4], Moteur (GWENDIA language) [5], WS-PGRADE (gUSE language) [6], Pegasus [7], and Triana (DAX representation) [8] (see Fig. 13.1).

In this chapter, we take advantage of IWIR and present a scalable software engineering solution that provides existing scientific workflows access to the Amazon

Elastic Compute Cloud (EC2) infrastructure. By designing and implementing one single IWIR-to-SWF converter, we automatically allow all IWIR-compliant workflow systems to benefit from the SWF features and to access the EC2 infrastructure with native performance. We present a method for automatically converting a scientific workflow specified in IWIR into Amazon SWF, and a supporting architecture for reusing and executing existing legacy code on EC2. We illustrate the integration and the advantages of our architecture with the help of a real-world scientific workflow originally programmed in the ASKALON integrated development and computing environment.

Figure 13.1. SHIWA fine-grained interoperability.

The chapter is organized as follows. We discuss related work in Section 13.3. Section 13.4 introduces the IWIR workflow model, followed by an introduction to Amazon SWF in Section 13.5. Section 13.6 introduces our pilot workflow application used for validation. Section 13.7 describes the conversion process of an IWIR workflow into an Amazon SWF workflow. Section 13.8 presents experimental results from porting our pilot application to SWF. Section 13.10 concludes the chapter.

13.2 BACKGROUND INFORMATION

Nowadays, scientific computing is an important part of research in most academic disciplines. Problems are getting more and more complex, and finding solutions requires an ever-increasing amount of computation. Simulations for weather, earthquakes, nuclear research, and material science are just a few examples of areas where one can never have enough computation capacity available to run a realistic simulation without many model simplifications. Therefore, computer scientists are trying to find solutions to make scientific computing faster, easier, and more reliable on the available set of resources. Applications developed by non-computer scientists are often hard to scale onto clusters or supercomputers and need support from the computer science community to scale to today's clusters.

With the introduction of cloud computing, a new resource type was added as a possible platform for executing scientific applications. As this new technology is only slowly being adopted by computer scientists, it can be assumed that other fields of research that rely on parallel computing to solve their problems face an even steeper learning curve when including these new technologies in their everyday tool set. This gap between new technologies and the need for computing must be closed by tools developed by computer scientists to allow an easier adoption of clouds for scientific computing.

13.3 RELATED WORK

Since the advent of Cloud computing, the scientific community has shown interest in bringing scientific workflows to this new infrastructure. This trend increased with the availability of commercial Clouds featuring nearly the same performance as traditional Grid parallel computers [9]. There exist two major approaches in this community effort: pure Cloud, and hybrid approaches combining Grid and Cloud infrastructure.

FutureGrid [10] provides a Cloud test-bed that allows scientists to explore the features of Cloud computing and to experiment without being charged real costs, as commercial

Page 336: Cloud Services, Networking, and Management

“9780471697558c13” — 2015/3/20 — 12:15 — page 314 — #4

314 SCIENTIFIC APPLICATIONS ON CLOUDS

providers do. Reference [11] shows a proof-of-concept astrophysics workflow called Montage using the Pegasus Grid workflow system adapted for Clouds. The work in Ref. [12] shows a meteorological workflow executed in combined Grid and Cloud infrastructures using the ASKALON environment. In Ref. [13], the Pegasus Workflow Management System is used to execute an astrophysics workflow across multiple clouds, showing how challenging this task still is. A hybrid approach for extending clusters with additional Cloud resources during peak usage for better throughput, transparent to the end users, is presented in Ref. [14]. Reference [15] presents a similar approach using the Torque job manager. The work in Ref. [16] presents a workflow engine purposely developed for Clouds and extended to Cloud federations. The Megha workflow system [17] provides a portal for submitting workflows to combined Grid and Cloud resources.

A drawback of all these efforts is that they provide custom, non-interoperable solutions that tie scientists to specific workflow systems and Cloud infrastructures. In this chapter, we show how the IWIR-based approach opens the Amazon EC2 infrastructure and its SWF workflow system to the scientific community through one single IWIR-to-SWF translator. The idea of a single intermediate language has been explored in other domains, for example, by the UNiversal Computer Oriented Language (UNCOL) [18], proposed in 1958 by Conway as a solution for making compiler development economically viable. Following the UNCOL idea, the Architecture Neutral Distribution Format (ANDF) is a technology defined by the Open Software Foundation allowing common "shrink wrapped" binary programs to be distributed for use on Unix systems running on different hardware platforms. Unfortunately, ANDF was never widely adopted either. IWIR is the first effort to investigate this idea on scientific workflows in distributed Grid and Cloud computing infrastructures.

13.4 IWIR WORKFLOW MODEL

In IWIR, a workflow application is represented by a composite activity A = (I, O, G) consisting of n input ports I = ⋃_{i=1}^{n} {I_i}, m output ports O = ⋃_{i=1}^{m} {O_i}, and a directed acyclic graph (DAG) G = (A, D) consisting of k activities A = ⋃_{i=1}^{k} {A_i}, interconnected through data flow dependencies:

D = {(A_i, A_j, (O_{im}, I_{jn})) | (A_i, A_j) ∈ A × A ∧ (O_{im}, I_{jn}) ∈ O_i × I_j},

where (O_{im}, I_{jn}) represents a data transfer from the output port O_{im} of activity A_i to the input port I_{jn} of activity A_j. A data flow dependency between two activities implies a control flow precedence too. A pure control flow dependency between A_i and A_j has D_{ij} = (A_i, A_j, ∅). We use pred(A_i) = {A_k | (A_k, A_i, (O_{km}, I_{in})) ∈ D ∨ (A_k, A_i, ∅) ∈ D} to denote the set of predecessors of activity A_i (i.e., the activities to be completed before starting A_i). Figure 13.2 shows a detailed example of such a DAG and its components.

Compared to business workflows, the main characteristic of the scientific workflows we focus on is the high computational requirements of their activities, rather than general business processes.

There are two categories of activities in IWIR: atomic and composite. An atomic activity, represented by A = (I, O, ∅), is characterized by an activity type, uniquely defined by a name and a signature. For example, activity names are PrepareLM,

LinearModel, PostProcessSingle, and PostprocessFinal for our pilot workflow introduced in Figure 13.5 and Section 13.6. The signatures of LinearModel and PostProcessSingle are shown in lines 8 and 19 of Listing 13.1. A composite activity, represented by A = (I, O, G), where G ≠ ∅, can be of four kinds: conditional (if), sequential loop (while, for, forEach), parallel loop (parallelFor, parallelForEach), and sub-workflow (or DAG, added for modularity reasons).

Figure 13.2. Example of a DAG with four activities.
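The activity model just described can be summarized in a few data structures. The following sketch (our own illustration, not the actual IWIR library) captures the essentials: ports, data flow edges, and activities that are atomic when their sub-DAG is empty.

import java.util.ArrayList;
import java.util.List;

// Port of an activity, e.g., name "PLMTar" with type "file"
class Port {
    String name, type;
}

// Data flow dependency D = (Ai, Aj, (Oim, Ijn)); out == in == null
// encodes a pure control flow dependency (Ai, Aj, ∅)
class Dependency {
    Activity from, to;
    Port out, in;
}

// Activity A = (I, O, G): G is empty for atomic activities and contains
// sub-activities plus their dependencies for composite ones
class Activity {
    String name;                                       // e.g., "linearModel"
    List<Port> inputPorts = new ArrayList<>();
    List<Port> outputPorts = new ArrayList<>();
    List<Activity> subActivities = new ArrayList<>();  // the DAG's activities
    List<Dependency> dependencies = new ArrayList<>(); // the DAG's edges

    boolean isAtomic() { return subActivities.isEmpty(); }
}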

13.5 AMAZON SWF BACKGROUND

Amazon SWF provides a high-level method for implementing workflow applications and for coordinating their synchronous and asynchronous task executions on multiple systems, which can be cloud-based, on-premises, or both. The architecture of Amazon SWF is displayed in Figure 13.3. SWF implements a work-stealing approach consisting of three parts: the decider, the task queues, and the activity workers.

The decider implements the logic of the workflow. Unlike schedulers in scientific workflow systems, the decider only decides which activity to execute next based on the history of already executed tasks, and not where to execute it. However, one still has limited control by using several task queues into which the decider puts the activities to be executed next.

The task queues hosted by Amazon are identified by their name and can be accessed via an HTTP API. There are two types of tasks. First, decision tasks are generated by Amazon SWF and executed by the decider every time a state change occurs (e.g., the start of a workflow instance or the termination of an activity task). The result of a decision task is usually a set of activity tasks that can be executed next. Second, activity tasks are executed by the activity workers and represent the individual pieces of work that comprise the workflow.

The activity workers, as shown in Figure 13.4, execute the individual workflow activities. The decider and the activity workers actively listen to one or more task queues and, when a task is received, execute the corresponding code and report the execution status back to Amazon SWF. All input values of an activity are contained in the task

request received from the task queue. Unlike traditional workflow systems, Amazon SWF provides no means to transfer files or to prepare the execution environment. If an activity requires some input or produces some output files, it has to transfer them by itself. For Amazon SWF, workflow activities are simply remote asynchronous procedure calls.

Figure 13.3. Amazon Simple Workflow Service architecture.

Figure 13.4. Simplified sequence diagram of the execution of a workflow with one task.

Developing a workflow with Amazon SWF requires the following steps: (1) develop a decider implementing the logical workflow coordination, (2) develop activity workers implementing the individual activities, (3) register the workflow at Amazon SWF, (4) start the decider and activity workers and let them listen to the SWF endpoint, and (5) start the workflow.

The AWS Flow Framework allows the development of Amazon SWF workflows via the AWS SDK for Java by specifying the coordination steps as a sequential Java program, where the workflow activities are represented as function calls. Functions representing activities and functions used to handle or manipulate data produced or consumed by activities need to carry the @Asynchronous annotation (called asynchronous functions in the following), and their input arguments and return values need to be of type Promise. A Promise object acts as a handle to the actual data, which will be available as soon as the corresponding asynchronous function has been executed. When used as an input argument, a Promise object can also be used to represent data dependencies between several asynchronous functions. An asynchronous function that has, as input, a Promise object produced by another asynchronous function will only be executed when the actual data referenced by the Promise object is available.
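As a small, hedged illustration of this pattern (our own sketch, not code from the chapter; the method name and payload types are assumptions), an asynchronous function chaining two pieces of data could look as follows:

import com.amazonaws.services.simpleworkflow.flow.annotations.Asynchronous;
import com.amazonaws.services.simpleworkflow.flow.core.Promise;

public class DeciderSketch {
    // Runs only once both Promise arguments carry actual data; until then,
    // the AWS Flow Framework keeps the call queued.
    @Asynchronous
    public Promise<String> combine(Promise<String> partA, Promise<String> partB) {
        // Safe to call get(): the framework guarantees both values are ready
        return Promise.asPromise(partA.get() + partB.get());
    }
}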

Amazon SWF executes a workflow application through repeated invocations of the decider program, which is executed every time a state change occurs (signaled via a decider task), and a history of all decider executions is recorded. To intercept all calls to asynchronous functions in the decider program, AWS uses AspectJ, a Java implementation of aspect-oriented programming [19]. An asynchronous function is instantiated only once during the entire workflow execution, and its return value is saved into the execution history. In every subsequent decider execution, the same function is not re-executed; instead, its result is extracted from the execution history and returned as a Promise object. An asynchronous function that has not been executed yet is put into a queue, and a Promise object with no actual data is returned. This data will be instantiated as soon as the corresponding asynchronous function has produced it. Before the decider finishes its execution, it examines all asynchronous functions in the queue, executes those whose dependencies are satisfied, and records their results in the execution history. The workflow execution finishes when there are no more nonexecuted asynchronous functions.

13.6 RAINCLOUD WORKFLOW

We introduce in this section the RainCloud workflow used in this chapter for illustrating and validating our approach. RainCloud is a meteorological workflow for investigating and simulating precipitation in mountainous regions using a simple numerical linear model of orographic precipitation [20]. The workflow has been developed in the ASKALON environment by the Institute of Meteorology of the University of Innsbruck to analyze certain meteorological phenomena by extending the linear model theory.

The workflow is also used by the Tyrolean avalanche service (Tiroler Lawinenwarndienst) for its daily avalanche bulletin. We chose this application because it is used on a daily basis by scientists.

Figure 13.5. Simplified view of the RainCloud workflow.

Figure 13.5 shows a simplified architecture of the RainCloud workflow. The first activity, PrepareLM, prepares and partitions the data for the linear model. Each partition is then processed in a parallel loop iteration by a pipeline of two activities: LinearModel and PostprocessSingle, the latter being optional. The number of parallel loop iterations can be configured by setting the appropriate input parameter. The last activity collects the output data and produces the final result. Listing 13.1 shows the specification of the parallelForEach loop in IWIR. Inside this loop, we first have the atomic activity linearModel (line 8), followed by an if-construct (line 14) containing the atomic activity postProcessSingle (line 19).

Listing 13.1 RainCloud's parallelForEach loop in IWIR

1  ...
2  <parallelForEach name="ParallelForEach_1">
3    <inputPorts><inputPort name="isPPS" type="boolean"/>
4      <loopElements>
5        <loopElement name="PLMTars" type="collection/file"/>
6      </loopElements></inputPorts>
7    <body>
8      <task name="linearModel" tasktype="linearModel">
9        <inputPorts><inputPort name="PLMTar" type="file"/>
10       </inputPorts>
11       <outputPorts><outputPort name="LMTar" type="file"/>
12         <outputPort name="outfile" type="file"/></outputPorts>
13     </task>
14     <if name="DecisionNode_1">
15       <inputPorts><inputPort name="LMTar" type="file"/>
16         <inputPort name="isPPS" type="boolean"/></inputPorts>
17       <condition>isPPS = true</condition>
18       <then>
19         <task name="postProcessSingle" tasktype="postProcessSingle">
20           <inputPorts><inputPort name="LMTar" type="file"/>
21           </inputPorts>
22           <outputPorts><outputPort name="PPSTar" type="file"/>
23           </outputPorts>
24         </task>
25       </then>
26       <outputPorts><outputPort name="PPSlistTars" type="file"/></outputPorts>
27       <links>
28         <link from="DecisionNode_1/LMTar" to="postProcessSingle/LMTar"/>
29         ...
30       </links>
31     </if>
32   </body>
33   <outputPorts>
34     <outputPort name="PPSlistTars" type="collection/file"/>
35     <outputPort name="outfiles" type="collection/file"/>
36   </outputPorts>
37   <links>
38     <link from="ParallelForEach_1/PLMTars" to="linearModel/PLMTar"/>
39     ...
40   </links>
41 </parallelForEach>
42 ...

13.7 IWIR-TO-SWF CONVERSION

Figure 13.6 shows the architecture of our IWIR-to-SWF conversion solution, consisting of four parts: the decider, Amazon SWF, the legacy code execution service on each worker node, and the file storage. With Amazon SWF, the decider and the workflow activities are individual Java programs, purposely designed for Amazon SWF. The goal of this chapter is to present a method for translating scientific workflows from the interoperable IWIR representation to Amazon SWF with as little effort for the programmer as possible. While the abstract workflow coordination can be automatically translated into an SWF decider Java program, there is no practical way to automatically convert the legacy code implementing the concrete workflow activities into an SWF-compatible Java program. To still achieve this goal with minimal programmer involvement, we implemented an execution service that interfaces with Amazon SWF and acts as a Java wrapper for existing legacy code.
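The wrapper idea can be sketched as follows (our own illustration, not the actual execution service; the staging helpers are hypothetical): the worker stages the input files from the intermediate storage, runs the legacy executable as an external process, and stages the outputs back.

import java.io.IOException;

public class LegacyWrapperSketch {
    // Executes one activity task: stage inputs, run legacy code, stage outputs
    String run(String command, String[] inputUrls)
            throws IOException, InterruptedException {
        for (String url : inputUrls)
            download(url);                              // fetch inputs (e.g., from S3)
        Process p = new ProcessBuilder(command.split(" "))
                .inheritIO()
                .start();                               // run the legacy executable
        if (p.waitFor() != 0)
            throw new IOException("legacy code failed: " + command);
        return uploadOutputs();                         // push results to file storage
    }

    void download(String url) { /* hypothetical staging helper */ }

    String uploadOutputs() { /* hypothetical: returns storage URL */ return null; }
}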

The only requirement imposed by Amazon SWF on the worker nodes is an outgoing HTTP connection to Amazon SWF. This makes Amazon SWF easy to set up, with no need for firewall reconfiguration. Technically, direct file transfers between worker nodes are possible, but this requires a corresponding service running on the worker nodes and the firewall to be reconfigured accordingly. As we did not want to lose the advantage of an easy setup, we decided to use an intermediate file storage for the file transfers, so that there is no need for incoming connections on the worker nodes. Currently, we support

only Amazon S3 as the intermediate file storage, but other file storage technologies can easily be added as extensions.

Figure 13.6. Architecture of a generated Amazon SWF workflow (steps: 1, put task into queue; 2, fetch task from queue; 3, prepare environment; 4/6, transfer files; 5, execute legacy code; 7, report execution status).

13.7.1 Decider Generation

As presented in Section 13.4, an IWIR workflow is constructed from a top-level DAG activity that explicitly describes the data flow between its activities. Control flow constructs such as loops and conditionals are represented by composite activities. To convert an IWIR workflow into an Amazon SWF decider, we have to transform the data flow-driven IWIR DAGs and the semantics of the composite activity types into a control flow-driven Java program. Moreover, we also have to take care that the concepts of the AWS Flow Framework, namely asynchronous functions and the semantics of the returned Promise objects, are correctly applied.

The basic principle of the conversion is that every atomic or composite activity is represented by its own activity function in the Java program. Listing 13.2 shows the generation process of the decider. The first step is the generation of a function representing the start of the workflow (line 3). The signature of this function represents the input and output ports of the workflow (line 10). In the function body, the top-level activity function is called with the appropriate input arguments (line 11). Afterwards, the results of the top-level activity function are presented to the user in an appropriate way (line 13). Every activity encountered during the conversion process for which no activity function has been created yet is put into a queue (e.g., in line 12). After the workflow entry function has been generated, the algorithm iterates through this queue (line 4) and generates an activity function for each queue element (line 6).

Listing 13.2 SWF decider generation algorithm

Input: Scientific workflow: A = (I, O, G)
Output: SWF decider (Java program)
1:  function GenDecider(A = (I, O, G))
2:    Queue ← ∅
3:    GenWfStart(A, Queue)
4:    while Queue ≠ ∅ do
5:      A ← Pop(Queue)
6:      GenActivityFunction(A, Queue)
7:    end while
8:  end function
9:  function GenWfStart(A, Queue)
    Input: A = (I, O, G)
10:   GenWfStartProlog(I, O)
11:   GenActivityFunctionCall(A, Queue)
12:   Put(Queue, A)
13:   GenWfStartEpilog(O)
14: end function

13.7.2 Activity Function Generation

Listing 13.3 shows the generation of an activity function representing a workflow activity. The function signature of an activity function corresponds to its input and output ports, while the function body implements its semantic behavior, including any associated


DAG. For atomic activities, we only have to generate the function signature with the Activity annotation and an empty function body (lines 9–10). The AWS Flow Framework will then automatically generate function stubs, which allow us to communicate with the SWF task queue. For composite activities, we need to additionally generate, besides the function signature, a function body implementing the activity behavior (lines 4–6). In the following text, we describe in detail how the activity functions are generated. To facilitate understanding, we divided the code generation of the composite activity function bodies into three logical sections: (1) activity semantics, (2) DAG control flow, and (3) DAG data flow. However, these steps are not distinct, but interleaved with each other (e.g., the function call in line 5 generates not only the control flow but also the data flow).

Listing 13.3 Activity function generation algorithm
Input: Workflow activity: A = (I, O, G)
Output: Activity function (in Java)
1: function GenActivityFunction(A = (I, O, G), Queue)
2:   if G ≠ ∅ then ▷ Composite activity
3:     N ← GetActivityName(A)
4:     GenFunctionProlog(N, A)
5:     GenDAGControlFlow(G, Queue)
6:     GenFunctionEpilog(A)
7:   else ▷ Atomic activity
8:     N ← GetActivityTypeName(N, I, O)
9:     GenFunctionSignature(N, I, O)
10:     GenEmptyFunctionBody()
11:   end if
12: end function

13.7.2.1 Function Signature. The first step in generating an activity function is the function signature. The arguments of the activity function represent the input ports, and the return value the output ports, of the associated workflow activity. However, this representation has some inadequacies. In a workflow representation, the input and output ports of an activity are usually identified by their names, and the number of output ports is not limited. By contrast, the arguments of a Java function are identified by their order, and the return argument is restricted to one. Moreover, returning values by call by reference does not work in an SWF program because the activity functions are executed asynchronously from the rest of the program (see Section 13.5). In practice, the first inadequacy can be neglected when generating the decider by consistently maintaining the same parameter order. However, this may pose a problem for the legacy wrapper service, as changes in the order of the input arguments cannot be automatically distributed to this service. To address this problem, we implemented a wrapper class for the input arguments of atomic activity functions with a field for the name of the input port and a field for the actual value. The legacy wrapper service can then assign the input values to the correct input port by looking at the name field. To address the second inadequacy, we implemented another wrapper class that stores several output values into an array that is returned by the activity functions.
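As a rough sketch (the chapter does not show the wrapper classes themselves, so the field and accessor names below are assumptions consistent with Listings 13.4–13.8), the two classes could look as follows:

import java.io.Serializable;

// Pairs an input/output value with the name of its IWIR port, so that the
// legacy wrapper service can match values to ports by name rather than by
// argument order.
class PortWrapper implements Serializable {
    private String portName;
    private String value;

    PortWrapper() { } // no-arg constructor for the framework's data converter
    PortWrapper(String portName, String value) {
        this.portName = portName;
        this.value = value;
    }
    String getPortName() { return portName; }
    String getValue() { return value; }
}

// Collects several output values into one object, working around Java's
// restriction to a single return value per function.
class PortWrapperArray implements Serializable {
    private PortWrapper[] ports;

    PortWrapperArray() { }
    PortWrapperArray(PortWrapper[] ports) { this.ports = ports; }
    PortWrapper[] getPorts() { return ports; }
}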


For example, Listing 13.4 shows a function signature representing the atomic activity linearModel of RainCloud. Because the activity has more than one output port, the corresponding function returns an object of type PortWrapperArray encapsulating the output values. All input values have the type PortWrapper because the function represents an atomic activity and, therefore, needs to interface with the legacy wrapper service. The AWS Flow Framework automatically generates a stub function for interfacing with the task queues, declared as asynchronous and returning a Promise object.

Listing 13.4 Function signature of the atomic linearModel activity
1 @Activity(name="RainCloudActivities.linearModel")
2 public PortWrapperArray linearModel(PortWrapper PLMTar)

Listing 13.5 contains another example of a function signature, representing the composite activity ParallelForEach_1.

Listing 13.5 Function signature of the ParallelForEach_1 activity
1 @Asynchronous
2 private Promise parallelForEach_1(Promise<Boolean> isPPS, Promise<String[]> PLMTars);

13.7.2.2 Activity Semantics. The next step is the generation of code that implements the semantics of the three types of composite activities: (1) container, (2) conditional, and (3) loop. Container activities only contain other activities without additional semantics. Conditional activities consist of an if-else construct and separate activity function control flows for the two branches. The conditional expression may contain input port values that can be easily referenced by specifying the appropriate function argument. Loop activities are the hardest to implement because of the several IWIR loop flavors: while, for, forEach, parallelFor, and parallelForEach. We exploited the asynchronous function invocation feature of SWF to implement parallel loops as simple sequential loops in the decider program. Because activity functions are executed asynchronously, the decider does not wait for an activity function to finish before starting the next loop iteration. To force sequential execution of activity functions inside a nonparallel loop, we have to introduce artificial dependencies between activity functions called in different iterations using Promise objects.
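For the nonparallel loop flavors, a hedged sketch of this chaining technique is shown below: each iteration's @Asynchronous call receives the Promise of the previous iteration, so the AWS Flow Framework will not start it before that Promise is ready. The method names are illustrative and not part of the generated RainCloud code.

import com.amazonaws.services.simpleworkflow.flow.annotations.Asynchronous;
import com.amazonaws.services.simpleworkflow.flow.core.Promise;

class SequentialLoopSketch {
    // Serializes n iterations by threading a Promise through them.
    Promise<Integer> sequentialFor(int n) {
        Promise<Integer> previous = Promise.asPromise(0);
        for (int i = 0; i < n; i++) {
            // loopBody waits on 'previous': an artificial dependency that
            // forces iteration i to run only after iteration i-1 finished.
            previous = loopBody(i, previous);
        }
        return previous;
    }

    @Asynchronous
    Promise<Integer> loopBody(int i, Promise<Integer> dependsOn) {
        // Executed only once dependsOn is ready, so get() is safe here.
        return Promise.asPromise(dependsOn.get() + i);
    }
}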

Listing 13.6 shows an example of a function body representing the composite activity ParallelForEach_1 of RainCloud. The number of loop iterations is first calculated in line 3. Lines 5–10 represent the actual for loop, whereas lines 12–13 deal with the construction of the return value.

Listing 13.6 ParallelForEach_1 activity semantics
1 private Promise parallelForEach_1(...) {
2   // Get number of elements.
3   int maxIter = PLMTars.get().length;
4   // Iterate over the given array.
5   for (int i = 0; i < maxIter; i++) {
6     // Get current element
7     Promise<String> p = Promise.asPromise(PLMTars.get()[i]);
8     // Activity function control flow goes here
9     ...
10  }
11  // Build return value.
12  Promise[] retval = new Promise[2];
13  return Promise.asPromise(retval);
14 }

13.7.2.3 DAG Control Flow. The workflow activities of a given DAG are sorted according to a topological order that preserves the original data flow. In the topological order, a workflow activity can only be executed after all its predecessors have been completed and produced the required input data. As a workflow may consist of several DAGs, we calculate the topological order for each DAG independently.

For example, RainCloud's ParallelForEach_1 loop calls the activity linearModel, whose results are fed into the activity PostProcessSingle, depending on the value of the input parameter isPPS. Listing 13.7 shows the equivalent Java activity with calls to the contained activity functions in lines 6 and 8. The if statement, which determines whether PostProcessSingle should be executed, is represented by the decisionNode_1 function in line 8, with the missing parameter added in the data flow step (Listing 13.8, line 14).

Listing 13.7 Control flow inside ParallelForEach_1 activity
1 private Promise parallelForEach_1(...) {
2   int maxIter = PLMTars.get().length;
3   for (int i = 0; i < maxIter; i++) {
4     Promise<String> currEl = Promise.asPromise(PLMTars.get()[i]);
5     // Call to atomic activity "LinearModel"
6     activityClient.linearModel(currEl);
7     // Call to composite if-activity
8     decisionNode_1(..., isPPS);
9   }
10  Promise[] retval = new Promise[2];
11  return Promise.asPromise(retval);
12 }
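The chapter does not list the generated code of the if-activity itself; a hedged sketch of how such a conditional activity function might look, in the style of the listings above (the activityClient stub name and postProcessSingle call are assumptions), is:

@Asynchronous
private Promise decisionNode_1(Promise input, Promise<Boolean> isPPS) {
    // Both arguments are ready when this asynchronous function runs,
    // so calling get() on the condition is safe.
    if (isPPS.get()) {
        // Condition true: execute the contained PostProcessSingle activity.
        return activityClient.postProcessSingle(input);
    }
    // Condition false: pass the input through unchanged.
    return input;
}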

13.7.2.4 Data Flow. The last step in generating the body of an activity function is to introduce variables that model the data flow between the enclosed activity functions. To ease the variable handling, we use the single static assignment technique employed in compiler construction, which requires that every variable be written once and not reused afterwards. Every value returned by an activity function is assigned to its own unique variable and passed as input to each activity function with a connected input port. The main idea is that the implementation of an activity function does not need to know how the preceding activity functions produced and stored their output values. This is also reflected in the activity function signatures (see Section 13.7.2.1), which only consist of the input arguments from the original workflow specification. Activity functions returning more than one value return a wrapper object (see Section 13.7.2). The individual values contained in this wrapper object need to be extracted before they are fed to a subsequent activity function. Unfortunately, Promise objects can only be accessed inside asynchronous functions, otherwise an exception will be thrown. To address this


problem, we implemented several asynchronous helper functions for data manipulation and conversion.
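As an example, a possible shape of the helper Utils.convertPWA2Pa used in Listing 13.8 is sketched below; the actual implementation is not shown in the chapter, so the use of Settable placeholders here is an assumption. The helper returns an array of Promises immediately and fills them from an @Asynchronous method, inside which the wrapped result may be safely accessed.

import com.amazonaws.services.simpleworkflow.flow.annotations.Asynchronous;
import com.amazonaws.services.simpleworkflow.flow.core.Promise;
import com.amazonaws.services.simpleworkflow.flow.core.Settable;

class UtilsSketch {
    // Unwraps a Promise<PortWrapperArray> into 'size' individual Promises.
    Promise[] convertPWA2Pa(Promise<PortWrapperArray> pwa, int size) {
        Settable[] result = new Settable[size];
        for (int i = 0; i < size; i++) {
            result[i] = new Settable<String>();
        }
        unwrap(pwa, result); // asynchronous: runs once pwa becomes ready
        return result;       // Settable extends Promise, so this is a Promise[]
    }

    @Asynchronous
    void unwrap(Promise<PortWrapperArray> pwa, Settable[] out) {
        PortWrapper[] ports = pwa.get().getPorts(); // safe: pwa is ready here
        for (int i = 0; i < out.length; i++) {
            out[i].set(ports[i].getValue());
        }
    }
}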

Listing 13.8 presents the data flow of the activity function representing the composite activity ParallelForEach_1. First, an array for holding the results of each loop iteration is created for each activity in lines 4 and 6. The activities' output ports are directly connected to a corresponding output port of the surrounding composite activity. Then, the return value of each activity function is stored in its own variable in lines 10 and 14. Since the activity linearModel returns a wrapper object, we have to convert it (line 12) before using the actual return values (lines 14 and 16). At the end of each loop iteration, the values produced in the iteration are stored into the corresponding variables (lines 16 and 18). At the end of the function body, we construct the return object and convert the variables into a more suitable form (lines 22 and 24).

Listing 13.8 Data flow within the ParallelForEach_1 composite activity
1 private Promise parallelForEach_1(...) {
2   int maxIter = PLMTars.get().length;
3   // Holds output values of linearModel activity
4   Promise[] out1 = new Promise[maxIter];
5   // Holds output values of decisionNode_1 activity
6   Promise[] out2 = new Promise[maxIter];
7   for (int i = 0; i < maxIter; i++) {
8     Promise<String> p = Promise.asPromise(PLMTars.get()[i]);
9     // Save linearModel return value in lmo1
10    Promise<PortWrapperArray> lmo1 = activityClient.linearModel(p);
11    // Convert lmo1 in a format for further processing
12    Promise[] lmo2 = Utils.convertPWA2Pa(lmo1, 2);
13    // Input first value stored in lmo2; save return value into dno1
14    Promise dno1 = decisionNode_1(lmo2[0], isPPS);
15    // Store linearModel return value in a collection
16    out1[i] = lmo2[1];
17    // Store if return value in a collection
18    out2[i] = dno1;
19  }
20  Promise[] retval = new Promise[2];
21  // Convert collection to a suitable return format
22  retval[0] = Utils.convertAoP(out1);
23  // Convert collection to a suitable return format
24  retval[1] = Utils.convertAoP(out2);
25  // Return the output values
26  return Promise.asPromise(retval);
27 }

13.8 EXPERIMENTS

The goal of our experiments is to compare the performance of the RainCloud workflow in three configurations: the automatically generated SWF workflow (using the technique described in Section 13.7), a manually optimized SWF workflow, and the original ASKALON version executed using the ASKALON middleware. To be able to interface with the EC2 infrastructure, we pragmatically extended the ASKALON middleware services, such as security with Amazon credentials, the information service with virtual machine image manipulation, and the enactment engine with SSH-based job submission [21].


13.8.1 Setup

We ran the experiments on 16 Amazon instances of type m1.medium. For the SWF workflow, we used S3 as intermediate file storage. We executed the SWF decider and the ASKALON scheduler on a dedicated host with an Intel i7-2600K quad-core processor running at 3.4 GHz and 8 GB of memory, outside of Amazon EC2. For ASKALON, we used a just-in-time scheduler that maps the next ready activities onto the machines delivering the earliest completion time, because it most closely resembles the SWF operation. This simple approach does not benefit from several optimizations normally used in workflow executions, but the goal of this analysis was not to compare the features of the ASKALON workflow system with Amazon SWF. We executed the RainCloud workflow in two scenarios: noncongested, with 16 parallel loop iterations and two problem sizes (small and large), and congested, with 64 parallel loop iterations and a small problem size. The small problem size corresponds to an 18 × 18 simulation grid and the large one to a 36 × 36 grid. For each scenario and workflow version, we calculated two metrics: the average total execution time and the cumulative execution time of all workflow activities plus the scheduling time. To get an understanding of the amount of overhead present in a workflow execution, we further split its cumulative execution time into processing time (performing actual computation), scheduling time, waiting time (in an engine-internal queue) due to insufficient free resources, queuing time due to middleware and external load latencies, and file transfer time.

13.8.2 Results

Figure 13.7 shows the total execution times for the three workflow versions with 16 parallel iterations and small and large problem sizes in the noncongested scenario. The manually written SWF workflow is only marginally faster than the automatically generated version. We expected this result because the two versions only differ in the implementation of the decider, whose overhead is negligible compared to the total workflow execution time. Surprisingly, the ASKALON version suffers from significantly higher execution times due to the much higher overhead for transferring files between the worker nodes, as shown in Figure 13.8a. We found that this overhead is caused by the Java CoG Kit [22] employed by ASKALON as a black-box library for interfacing

[Figure: two bar charts of execution time (min) for SWF (manual), SWF (automatic), and ASKALON, one for the small and one for the large problem size.]

Figure 13.7. RainCloud execution time in noncongested scenario.


[Figure: stacked bar charts of cumulative time (min) for SWF (manual), SWF (automatic), and ASKALON with small and large problem sizes; panel (a) splits the time into processing, scheduling, queueing, and file transfer, while panel (b) shows only the scheduling and queueing overheads.]

Figure 13.8. RainCloud cumulative times in noncongested scenario. (a) Cumulative times. (b) Cumulative overheads except file transfer.

with Grids (through a Globus plugin) and Clouds (through an SSH plugin), which uses an ASKALON middleware machine outside Amazon EC2 as an intermediary for transferring files between two remote machines. In the following text, we disregard the file transfer times to make the experiments more comparable.

The other reasons for ASKALON's performance losses are the scheduling and queuing overheads shown in Figure 13.8b. The scheduling overhead in ASKALON is approximately three times higher than in SWF because it is tuned for highly heterogeneous and distributed Grid infrastructures, as opposed to Clouds, which tend to be more homogeneous and located within one data center. Because of this, the ASKALON Grid scheduler needs to interact with a resource manager for discovering the available shared resources, which is not a requirement in static Clouds owned by a single organization. Moreover, the ASKALON scheduler also needs to evaluate the external load generated by scientists sharing a specific Grid resource, not required for dedicated Cloud resources. Finally, the ASKALON scheduler is also responsible for preparing the remote execution environments (and directories) through multiple SSH connections, not required for SWF, which delegates the setup of the environment to the locally running legacy wrapper


service. A more generic execution approach offers more features, and hence incurs more overhead, than a specialized, platform-dependent one.

Also, the workflow activities wait three times longer in the queue of ASKALON compared with SWF. The average overhead per workflow activity without file transfer is approximately 4–5 s for SWF and around 15 s for ASKALON, which is comparable for scientific workflows with long-running activities. The queuing time is larger for ASKALON than for SWF because of the higher middleware stack required by ASKALON for supporting a broader range of heterogeneous infrastructures (i.e., clusters, Grids, and Clouds), as opposed to SWF, tuned for running in the native EC2 infrastructure only. In addition, ASKALON actively pushes workflow activities to be executed onto the worker nodes, which introduces higher overhead than the pull approach used by Amazon SWF, where the worker nodes actively fetch tasks from a task queue.
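To make the pull model concrete, the following is a minimal sketch of an SWF worker loop using the AWS SDK for Java; the domain and task list names are hypothetical, and real workers would add error handling and use the framework's generated stubs instead.

import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflow;
import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClient;
import com.amazonaws.services.simpleworkflow.model.ActivityTask;
import com.amazonaws.services.simpleworkflow.model.PollForActivityTaskRequest;
import com.amazonaws.services.simpleworkflow.model.RespondActivityTaskCompletedRequest;
import com.amazonaws.services.simpleworkflow.model.TaskList;

public class PullWorker {
    public static void main(String[] args) {
        AmazonSimpleWorkflow swf = new AmazonSimpleWorkflowClient();
        while (true) {
            // Long poll over an outgoing HTTP connection: blocks up to 60 s
            // and returns an empty task if nothing arrived in that window.
            ActivityTask task = swf.pollForActivityTask(new PollForActivityTaskRequest()
                    .withDomain("RainCloud")
                    .withTaskList(new TaskList().withName("workerQueue")));
            if (task.getTaskToken() == null) {
                continue; // poll timed out, try again
            }
            String result = runLegacyCode(task.getInput()); // execute the activity
            swf.respondActivityTaskCompleted(new RespondActivityTaskCompletedRequest()
                    .withTaskToken(task.getTaskToken())
                    .withResult(result));
        }
    }

    private static String runLegacyCode(String input) {
        return input; // placeholder for the legacy wrapper service
    }
}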

Figure 13.9 shows the total execution times in the congested scenario. The SWF version performs slightly better for 64 parallel loop iterations than for 16; however, this improvement, due to load imbalance in the 16-iteration parallel loop and coarse-grained activity sizes, is still within the standard deviation. Using 64 iterations produces a finer-grained parallelization and smaller activity sizes that enable a better schedule with a smaller load imbalance overhead. The ASKALON version with 64 parallel iterations performs worse than with 16, but this is again within the standard deviation.

Figure 13.10 shows the cumulative execution time for 64 parallel loop iterations. As expected, the waiting times in the internal engine queue are extremely high because there are four times as many workflow activities ready to execute as worker nodes. Again, the queuing time of ASKALON is larger than that of SWF because of the higher middleware stack and the batch-mode access to resources. The average overhead per workflow activity without the file transfer and waiting overheads is 17 s for SWF and 40 s for ASKALON, which is an increase by a factor of 3.4 for SWF and 2.7 for ASKALON compared to the previous scenario. The slight increase in execution time of the ASKALON version in the congested scenario is mainly caused by file transfer overheads.

[Figure: bar chart of execution time (min) for SWF (automatic) and ASKALON, small problem size, in the noncongested (16 parallel loops) and congested (64 parallel loops) scenarios.]

Figure 13.9. RainCloud execution time with 16 and 64 parallel loops.


[Figure: stacked bar charts for the small problem size showing cumulative time (min) and cumulative overhead (min) for SWF and ASKALON in the noncongested and congested scenarios, split into processing, scheduling, waiting, queueing, and file transfer.]

Figure 13.10. Cumulative RainCloud execution times with 16 and 64 parallel loops.

13.8.3 Discussion

To conclude, ASKALON has been designed to support a variety of heterogeneous and distributed computing environments, including Globus, gLite, EC2, and GroudSim-based [23] ones. This heterogeneity in the supported infrastructures is achieved through a modular architecture consisting of several layers and comprising complex services such as the enactment engine, scheduling, file transfers, and resource management. Although we paid close attention to tuning the ASKALON overheads when building the Cloud plugins, we observed a performance drop due to the higher middleware stack compared to SWF, which is tuned for working with the local, simpler, and more homogeneous EC2 infrastructure. For this reason, SWF features a much simpler architecture, where the decider only decides which workflow activities can be executed next and not on which resources. The tasks of preparing the execution environment and transferring local files from S3 need to be manually implemented by the programmer, incurring a lower execution overhead at the cost of a higher programming effort. Workflows consisting of numerous relatively short activities will suffer most from the larger ASKALON middleware overheads.

13.9 OPEN CHALLENGES

The approach shown in this chapter represents one possible solution for using cloud resources for scientific computing utilizing the workflow paradigm. Not all applications can use this approach to efficiently exploit the available resource pool. There are still open challenges for different types of scientific applications that are not covered by the presented solution:

Big Data: Some scientific domains rely on enormous amounts of data to be processed. The challenge for these applications is to transfer this data into the cloud


for processing, as transferring the processing power from the cloud to the data is technically not possible [24, 25]. For this class of application, a faster Internet connection will be needed; but when looking at physics experiments (like ATLAS from CERN), even the fastest possible transfer speed might still be too low.

Security: Most scientific applications are self-written, open source, or free to use. When utilizing commercial software, there might be problems with licensing it for leased cloud hardware. Additionally, the input data and results might also be restricted in their distribution (e.g., medical studies). Allowing such applications to utilize cloud resources without violating copyrights or laws is still an open challenge [26, 27].

Super computing: Some applications are built for massively parallel architectures commonly only available in supercomputers. Cloud providers may have shown in demo applications that it is possible to build a setup fast enough to reach the TOP500 [28] in terms of speed, but such a setup is far from a regular use case that any scientist can deploy on the cloud every day. For those applications, the only solution is still to have access to a supercomputer.

13.10 CONCLUSION

In this chapter, we proposed a method for automatic porting of scientific workflows to Amazon SWF, able to exploit the native performance of the EC2 infrastructure. The solution is based on the SHIWA fine-grained interoperability technology for translating workflows written across different languages and workflow systems through the common IWIR representation. This scalable software engineering solution enables five major workflow systems currently supporting the IWIR representation to access the EC2 infrastructure through the SWF service: ASKALON, MOTEUR, Pegasus, Triana, and WS-PGRADE.

We presented in this chapter the difficulties we encountered in translating a data flow-oriented ASKALON workflow into a control flow-oriented SWF decider program. The method is based on an algorithm that automatically generates the SWF decider Java program and the underlying activity functions in four phases: function signature, activity semantics, DAG control flow, and data flow generation.

We presented experimental results for porting an original real-world ASKALON workflow to the EC2 infrastructure in two configurations: conversion to a Java SWF decider, or execution through the ASKALON middleware connected to EC2 via an SSH plugin. The results demonstrate that the SHIWA fine-grained interoperability solution that translates an ASKALON workflow into an SWF version through the common IWIR representation is a promising alternative for porting workflows to a new infrastructure and is able to exploit its native performance. Amazon SWF represents an attractive environment for running traditional workflow applications, especially those consisting of numerous relatively short activities affected by large middleware overheads when executed in traditional ways. This is demonstrated by the performance of the automatically generated SWF workflow, which is similar to the manually optimized version. By


contrast, porting existing Grid workflow middleware environments such as ASKALON to the Cloud, although effective, has performance drawbacks compared to the translated SWF version. The reasons for the performance losses lie in the high middleware stack required for supporting a wider range of distributed and heterogeneous cluster, Grid, and Cloud computing infrastructures and in a more generic scheduling and execution approach.

A downside of SWF is its proprietary implementation hosted by a commercial vendor, who charges for its use and may abandon the service at any time should it prove unsuccessful. Another difference from clusters and Grids is the pull-based assignment of tasks to an unknown number of activity workers, which requires different scheduling methods.

ACKNOWLEDGMENTS

This work has been performed in the projects TRP 237-N23, funded by the Austrian Science Fund (FWF), and eRamp, co-funded by the grant 843738 from the Austrian Research Promotion Agency (FFG) and the ENIAC Joint Undertaking.

REFERENCES

1. I. J. Taylor, E. Deelman, D. Gannon, and M. Shields, Eds., Workflows for e-Science: Scientific Workflows for Grids. Springer, London, U.K., 2007.

2. J. Varia and S. Mathew, "Overview of Amazon Web Services," 2012. Available: http://d36cz9bnwru1tt.cloudfront.net/AWS_Overview.pdf. Accessed on December 9, 2014.

3. K. Plankensteiner, J. Montagnat, and R. Prodan, "IWIR: A language enabling portability across grid workflow systems," in Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science, Seattle, WA. ACM, New York, 2011, pp. 97–106.

4. T. Fahringer, R. Prodan, R. Duan, J. Hofer, F. Nadeem, F. Nerieri, S. Podlipnig, J. Qin, M. Siddiqui, H. Truong et al., "Askalon: A development and grid computing environment for scientific workflows," Workflows for e-Science, Springer, Berlin, 2007, pp. 450–471.

5. T. Glatard, J. Montagnat, D. Lingrand, and X. Pennec, "Flexible and efficient workflow deployment of data-intensive applications on grids with MOTEUR," International Journal of High Performance Computing Applications, vol. 22, no. 3, pp. 347–360, 2008.

6. P. Kacsuk, "P-GRADE portal family for grid infrastructures," Concurrency and Computation: Practice and Experience, vol. 23, no. 3, pp. 235–245, 2011.

7. E. Deelman, G. Singh, M. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. Berriman, J. Good et al., "Pegasus: A framework for mapping complex scientific workflows onto distributed systems," Scientific Programming, vol. 13, no. 3, pp. 219–237, 2005.

8. I. Taylor, M. Shields, I. Wang, and O. Rana, "Triana applications within grid computing and peer to peer environments," Journal of Grid Computing, vol. 1, no. 2, pp. 199–217, 2003.

9. G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. Berman, and P. Maechling, "Scientific workflow applications on Amazon EC2," in 2009 5th IEEE International Conference on E-Science Workshops, Oxford, U.K. IEEE, New York, 2009, pp. 59–66.

10. P. Riteau, M. Tsugawa, A. Matsunaga, J. Fortes, T. Freeman, D. LaBissoniere, and K. Keahey, "Sky computing on FutureGrid and Grid'5000," in 5th Annual TeraGrid Conference: Poster Session, vol. 68, Pittsburgh, PA, 2010, p. 119.

11. C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and J. Good, "On the use of cloud computing for scientific workflows," in IEEE Fourth International Conference on eScience, 2008 (eScience'08), Indianapolis, IN. IEEE, New York, 2008, pp. 640–645.

12. G. Morar, F. Schueller, S. Ostermann, R. Prodan, and G. Mayr, "Meteorological simulations in the cloud with the ASKALON environment," in Euro-Par 2012: Parallel Processing Workshops. Springer, Berlin, Germany, 2013, pp. 68–78.

13. J.-S. Vöckler, G. Juve, E. Deelman, M. Rynge, and B. Berriman, "Experiences using cloud computing for a scientific workflow application," in Proceedings of the 2nd International Workshop on Scientific Cloud Computing, ser. ScienceCloud '11, Boulder, CO. ACM, New York, 2011, pp. 15–24. Available: http://doi.acm.org/10.1145/1996109.1996114

14. M. De Assunção, A. Di Costanzo, and R. Buyya, "Evaluating the cost-benefit of using cloud computing to extend the capacity of clusters," in Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, Munich, Germany. ACM, New York, 2009, pp. 141–150.

15. P. Marshall, K. Keahey, and T. Freeman, "Elastic site: Using clouds to elastically extend site resources," in Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, Melbourne, Australia. IEEE Computer Society, Washington, DC, 2010, pp. 43–52.

16. D. Franz, J. Tao, H. Marten, and A. Streit, "A workflow engine for computing clouds," in CLOUD COMPUTING 2011, the Second International Conference on Cloud Computing, GRIDs, and Virtualization, Rome, Italy, 2011, pp. 1–6.

17. S. Pandey, D. Karunamoorthy, K. Gupta, and R. Buyya, "Megha workflow management system for application workflows," IEEE Science & Engineering Graduate Research Expo, Melbourne, Australia, 2009.

18. M. E. Conway, "Proposal for an UNCOL," Communications of the ACM, vol. 1, no. 10, pp. 5–8, Oct. 1958. [Online]. Available: http://dl.acm.org/citation.cfm?id=368928. Accessed on December 9, 2014.

19. G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. Lopes, J.-M. Loingtier, and J. Irwin, "Aspect-oriented programming," ECOOP'97—Object-Oriented Programming, Jyväskylä, Finland, pp. 220–242, 1997.

20. I. Barstad and F. Schueller, "An extension of Smith's linear theory of orographic precipitation: Introduction of vertical layers," Journal of the Atmospheric Sciences, vol. 68, no. 11, pp. 2695–2709, 2011.

21. S. Ostermann, R. Prodan, and T. Fahringer, "Extending grids with cloud resource management for scientific computing," in 2009 10th IEEE/ACM International Conference on Grid Computing, Banff, Alberta, Canada. IEEE, New York, 2009, pp. 42–49.

22. G. von Laszewski, I. Foster, J. Gawor, and P. Lane, "A Java commodity Grid kit," Concurrency and Computation: Practice and Experience, vol. 13, no. 8–9, pp. 643–662, 2001.

23. S. Ostermann, K. Plankensteiner, and R. Prodan, "Using a new event-based simulation framework for investigating resource provisioning in Clouds," Scientific Programming, vol. 19, no. 2, pp. 161–178, 2011.

24. D. Agrawal, S. Das, and A. El Abbadi, "Big data and cloud computing: Current state and future opportunities," in Proceedings of the 14th International Conference on Extending Database Technology, Uppsala, Sweden. ACM, New York, 2011, pp. 530–533.

25. S. Chaudhuri, "What next?: A half-dozen data management research goals for big data and the cloud," in Proceedings of the 31st Symposium on Principles of Database Systems, Scottsdale, AZ. ACM, New York, 2012, pp. 1–4.

26. B. R. Kandukuri, V. R. Paturi, and A. Rakshit, "Cloud security issues," in IEEE International Conference on Services Computing, 2009 (SCC'09), Bangalore, India. IEEE, New York, 2009, pp. 517–520.

27. D.-G. Feng, M. Zhang, Y. Zhang, and Z. Xu, "Study on cloud computing security," Journal of Software, vol. 22, no. 1, pp. 71–83, 2011.

28. "Top500 homepage," Website, accessed June 2014, http://www.top500.org/system/10661.


14

INTERACTIVE MULTIMEDIA APPLICATIONS ON CLOUDS

Karine Pires and Gwendal Simon

Telecom Bretagne, Institut Mines-Telecom, Paris, France

14.1 INTRODUCTION

In less than 10 years, services over the Internet have switched from static Web pages to interactive multimedia applications. To deal with the demand for more interactivity and more multimedia content, service providers have been forced to upgrade the infrastructure over which they offer their services. In the meantime, the benefits of virtualizing the infrastructure and of delegating the delivery to external companies have prevailed over the traditional architecture, where the service provider owns a set of servers and delivers the content by itself. What is now referred to as the cloud is the combination of multiple actors tied by commercial agreements and orchestrated by the service provider.

In this chapter, we will focus on the service providers that offer massive, interactive, multimedia services. These service providers face challenging scalability and response time issues. We will describe the solutions that have recently been developed to address these issues. In particular, we will pay close attention to three representative services:

1. Cloud Gaming: On-demand gaming, also known as cloud gaming, is a new video gaming application/platform. Instead of requiring end users to have sufficiently



powerful computers to play games, on-demand gaming performs the intensive game computation, including the game graphics generation, remotely, with the resulting output streamed as a video back to the end users. For game developers, shifting from traditional gaming platforms to cloud gaming means a better control of the delivery and easier software development and upgrades. On their side, end users benefit from platform independence, for example, to play computationally intensive games on portable devices that do not have the required hardware, and also to be able to play at any time on any available device, including smart TVs and smartphones.

2. Massive User-Generated Content (UGC) Live Streaming: Anybody can become a TV provider. This promise, which has been floating in the air for almost 10 years, has led to a considerable effort from the research community [1]. The popularity of UGC live streaming aggregators has, however, not grown as fast as some expected. Yet, the past couple of years have seen a surge of interest in such services, pushed by new usages, including crowdsourced journalism [2] and e-sport [3]. The major actors of the area have been forced to take some measures to cope with traffic explosion. Typically, Twitch.tv, the gaming branch of justin.tv, announced a significant increase in the delay,1 while the new live service from YouTube was only offered to a subset of users.2

3. Time-shifting On-Demand TV: In time-shifted TV, a program broadcast from a given time t is made available at any time from t to t + δ, where δ can be potentially infinite. The popularity of TV services based on time-shifted streaming has dramatically risen [4]: nPVR (a personal video recorder located in the network), catch-up TV (the broadcaster records a channel for a shifting number of days and proposes the content on demand), TV surfing (using pause, forward, or rewind commands), and start-over (the ability to jump to the beginning of a live TV program). Today, to enjoy catch-up TV requires a digital video recorder (DVR) connected to the Internet. However, TV broadcasters need to protect advertisement revenue, whereas a DVR viewer can decide to fast forward through commercials. By controlling the TV stream, not only can the broadcasters guarantee that commercials are played, but they can also adapt them to the actual time at which the viewer watches the program. This calls for a cloud-based time-shifted TV service.

These three services make use of the same basic infrastructure components to offer their services in the best conditions. We will first describe these delivery components in a generic way. Then, we will study each of these services in turn, with the ambition to reveal some of their unique characteristics and the solutions that have been deployed so far.

1Twitch: The Official Blog, http://is.gd/PdqlZI
2YouTube Live Introduction, http://is.gd/Aw0yAx


14.2 DELIVERY MODELS FOR INTERACTIVE MULTIMEDIA SERVICES

14.2.1 Background

The interactive response time is defined as the elapsed time between when an action of the user is captured by the system and when the result of this trigger can be perceived by the user. For example, in cloud gaming, which is one of the most demanding services in this respect, the work in Ref. [5] demonstrates that a latency around 100 milliseconds is highly recommended for dynamic action games, while a response time of 150 milliseconds is required for slower-paced games.

The overall interactive response time T of an application includes several types of delays, defined as follows:

T = t_client + t_network + t_server, where t_network = t_access + t_isp + t_transit + t_provider

14.2.1.1 Hardware Latency. We define t_client as the playout delay, which is the time spent by the client to (1) send action information (e.g., in cloud gaming, initiating character movement in a game) and (2) receive and play the video. Only the client's hardware is responsible for t_client, but the software that runs at the client side is commonly provided by the service provider.

Additionally, we define t_server as the processing delay, which refers to the time spent by the server to process the incoming information from the client, to generate the corresponding video information, and to transmit the information back to the client. The service provider is mainly responsible for the processing delay.

Both playout and processing delays can be reduced with hardware changes and software development by the service provider.

14.2.1.2 Network Latency. The remaining contribution to total latency comes from the network. We further divide the network latency into four components: t_access, t_isp, t_transit, and t_provider.

First, t_access is the data transmission time between the client's device and the first Internet-connected router. Three quarters of end users who are equipped with a DSL connection experience a t_access greater than 10 milliseconds when the network is idle [6], and the average access delay exceeds 40 milliseconds on a loaded link [7]. The behavior of different network access technologies can greatly vary, as the latency of the access network can differ by a factor of 3 between different Internet Service Providers (ISPs) [7]. Additionally, the home network configuration and the number of concurrent active computers per network access can double access link latency [8]. Finally, when the network connection is through cellular networks, some other parameters can affect the delay, including the technologies at the base station and the underlying network protocol (the "generation" of the network).

The second component of network delay is t_isp, which corresponds to the transmission time between the access router and the peering point connecting the ISP network


to the next-hop transit network. During this phase, data travel exclusively within the ISP network. Although ISP networks are generally fast and reliable, major ISPs have reported congestion due to the traffic generated by new multimedia services [9].

The third component is t_transit, which is defined as the delay from the first peering point to the front-end server of the service provider. The ISP and the provider are responsible for t_transit; however, the networks along the path are often owned by third-party network providers. Nonetheless, the ISP and the cloud provider are responsible for good network connectivity for their clients.

The fourth component, t_provider, is defined as the transmission delay between the front-end server of the service provider and the hosting server for the client. The provider is responsible for t_provider. This delay is, however, rarely significant. Network latencies between two servers in modern datacenters are typically below 1 millisecond [10].
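To make the decomposition concrete, consider a purely illustrative budget that combines the figures cited above (these numbers are assumptions, not measurements from this chapter): t_client = 20 ms, t_access = 40 ms (loaded DSL link), t_isp = 10 ms, t_transit = 15 ms, t_provider = 1 ms, and t_server = 30 ms. The resulting T = 20 + (40 + 10 + 15 + 1) + 30 = 116 milliseconds already exceeds the 100-millisecond budget recommended for action games, with the access link and the two endpoints dominating the total.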

14.2.2 Introducing Delivery Models

Service providers have several options to build their "delivery cloud." Here is a short introduction to them.

14.2.2.1 Data Center. The most common way to deliver content is to use a datacenter (DC), which is basically a large set of servers [11]. The DC can be either owned or rented by the service provider. In the former case, the infrastructure is almost exclusively paid for at construction time; however, it has some fixed capacity limitations. In the latter case, the infrastructure can scale up and down on demand, but the service provider has to deal with another actor (the DC provider). Although DCs are attractive, easy-to-manage infrastructures, they do not enable low response time for a large population of users because they are located in one location (or a few locations if the service provider deals with several DCs). That is, the aforementioned network latency is too high for a vast fraction of the population because t_isp and t_transit are large. Moreover, the monetary cost to transfer data is higher because the traffic must cross several networks until the content eventually reaches the users.

14.2.2.2 Peer to Peer. The challenge of delivering multimedia content on a large scale is essentially a problem related to the reservation of physical resources. To address this problem, the scientific community has advocated for years for a peer-to-peer (P2P)-based infrastructure, where users themselves contribute to the delivery by forwarding the content they received. A lot of algorithms have been designed to improve the delivery performance [1]. However, various constraints have limited the deployment of P2P systems for commercial purposes. First, firewalls and network address translators (NATs) still prevent many direct connections between users [12]. Second, P2P requires users to install a program on their computers. Such a "technical," security-sensitive requirement can deter users from using the service. Moreover, despite some new browser-based technologies (e.g., WebRTC), a P2P software depends on the configuration of the computer of end users, which is a cause of many development difficulties. Third, the service provider has little control over the Quality of Experience (QoE) of users since it does not directly control the performance. Last, the complexity of P2P systems can increase the delay. Many initiatives have aimed at ensuring that peers connect


preferentially with other peers that are located in the same network [13], which makes t_transit null. However, a peer usually gets data from multiple other peers. Even if the direct connection between two peers is short, aggregating data from multiple peers requires synchronization and buffering, which causes extra delay.

14.2.2.3 Content Delivery Network. In recent years, content delivery networks (CDNs) have emerged as the preferred way for large-scale content delivery. A CDN comprises three types of communication devices: a relatively small number of sources, which directly receive the content from the service producer; a medium-sized network of reflectors; and a large number of edge servers, which are deployed directly in the access networks, close to the users. The proximity between the end users and the edge servers keeps network latency small.

For a decade, the CDN providers have met the demand of two families of players in the value chain of content delivery: service providers (because large-scale Internet services have to be distributed for redundancy, scalability, and low-latency reasons) and network operators (because minimizing inter-domain traffic while still fulfilling their own users' requests is a business objective). CDNs have thus emerged as a new category of market players with a dual-sided business. They provide caching capacities "as a service" to network operators, and they provide a distributed hosting capacity to service providers. The CDN providers provide both scalability and flexibility, they deal with distribution complexities, and they manage multiple operator referencing, all of these services at a unique selling point.

Works such as Refs. [14, 15] confirm that edge servers are not only used for serving static content. As studied in Ref. [1], the current CDN infrastructure has the ability to serve millions of end users and is well positioned to deliver game content and software [16]. However, CDN edge servers are generally built from commodity hardware that has relatively weak computational capabilities and often lacks GPUs.

14.2.3 Composing Hybrid Delivery Models

At the time this chapter was written, there was no clear consensus about the best solutions to deploy. Typically, for video streaming, we observe that the main actors have made different choices. To name a few:

• Google uses multiple DCs distributed over the globe to deliver its services, including YouTube [17].
• Netflix uses a composition of multiple CDNs in its delivery chain [18].
• A composition of P2P assisted by CDN to improve viewers' QoE was deployed on LiveSky [19].
• Justin.tv, one of the biggest live streaming services, uses a private DC assisted by a CDN [20].

A recent trend is to build hybrid delivery models that compose several of the aforementioned models. We depict in Figure 14.1, and list hereafter, some frequent compositions, each one with its own pros and cons.


[Figure: four diagrams showing how the service provider, CDN(s), DC(s), and users are connected in each composition, annotated with relative costs ($ to $$$) and with which tier absorbs the stable traffic and which absorbs the traffic peaks.]

Figure 14.1. Hybrid delivery models compositions. (a) CDN-P2P, (b) Multi-CDN, (c) Multi-DC, and (d) DC-CDN.

14.2.3.1 CDN-P2P. Such a composition is managed by either the service provider, as shown in Ref. [21], or the CDN provider, as shown in Ref. [22]. The CDN offers some guarantee on the QoE by offering a minimum amount of resources and by reducing the first response time. The CDN also allows users behind NAT to be properly served. On its side, the P2P system assists the CDN in case of traffic peaks: the more users to be served by the system, the more resources in the system. The potential problem with such a composition is that service providers want to control all parts of the delivery chain. Indeed, most profits come from a clear understanding of the demand from end users and a capacity to adapt the delivered content to every user (e.g., embedded advertisement). Another potential problem comes from the lack of guarantee of QoS. Finally, CDN-P2P compositions suffer from the same drawbacks as pure P2P ones, including the requirement of installing software on users' computers.

14.2.3.2 Multi-CDN. The service provider is commonly the main manager of this composition. A typical example of such a composition has been thoroughly studied in Ref. [18]. The main idea is that the service provider relies on several CDNs to deliver the content. For each user, the service provider decides the CDN in charge of serving


this user. The advantage of this composition is the possibility to achieve the best QoE for the viewers at the lowest cost from the multiple prices applied by each CDN. Another advantage is that the delivery is more robust, since a downtime of one CDN provider can be mitigated by using another CDN. However, this delivery is only based on third-party actors, which means that even the consolidated background traffic is handled in a pay-as-you-go way. Therefore, the overall price can be high.

14.2.3.3 Multi-DC. Service providers that want to provide features beyond the basic delivery of the same content are interested in hosting the service on their own servers in a DC. However, response time requirements force service providers to deploy multiple DCs in order to serve the whole population with low response time [23]. In that case, it becomes crucial to manage the traffic such that the load is well balanced among the different DCs [24] and to manage the sharing of content over the multiple DCs [25]. The advantages are a lower cost rate per GBps than multi-CDN, the total control of the delivery chain, and a relatively low response time, since every end user should have a DC nearby (so t_transit is reduced). The cons include the substantially high cost of the initial deployment of multiple DCs.

14.2.3.4 DC-CDN. In order to mitigate the disadvantages of the aforementioned models, it is frequent that video service providers deploy hybrid DC-CDN compositions [20]. The DC-CDN hybrid composition is expected to combine the main advantages of both delivery solutions at a minimum cost. The high prices paid for the CDN are minimized by using the CDN resources only when the DC is out of capacity, normally at traffic peaks. The DC is dimensioned so that the consolidated background traffic (or valleys of usage) is handled by the DC. In the cloud computing context, such a composition is often called a hybrid cloud, where conventional DCs and cloud solutions are deployed together to achieve the same combined advantage [26]. Various studies have indicated that it is not trivial to outsource tasks from the internal DCs to the external delivery infrastructure [27], typically due to security [28], QoS [29], and economic [27] reasons.
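A minimal sketch of the overflow rule behind such a DC-CDN composition is shown below (the capacity model and names are illustrative assumptions, not an actual system): the DC, dimensioned for the consolidated background traffic, absorbs sessions up to its capacity, and the excess spills over to the pay-as-you-go CDN.

public class OverflowRouter {
    public enum Target { DC, CDN }

    private final int dcCapacity;   // concurrent sessions the DC is dimensioned for
    private int activeDcSessions = 0;

    public OverflowRouter(int dcCapacity) {
        this.dcCapacity = dcCapacity;
    }

    // Route a new session: the DC handles the stable background traffic,
    // and traffic peaks are offloaded to the CDN.
    public synchronized Target route() {
        if (activeDcSessions < dcCapacity) {
            activeDcSessions++;
            return Target.DC;
        }
        return Target.CDN;
    }

    // Release a DC slot when a session served by the DC ends.
    public synchronized void release(Target target) {
        if (target == Target.DC) {
            activeDcSessions--;
        }
    }
}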

14.3 CLOUD GAMING

As said in Section 14.1, cloud gaming is a new paradigm that has the potential to change the video game industry. Attractive for both end users and developers, cloud gaming faces two main technical challenges: latency and the need for servers with expensive, specialized hardware that cannot simultaneously serve multiple gaming sessions. By offloading computation to a remote host, cloud gaming suffers from

• encoding latency, that is, the time to compress the video output;
• network latency, which is the delay in sending the user input and video output back and forth between the end user and the cloud.

Past studies [30–32] have found that players begin to notice a delay of 100 milliseconds [5]. Although the video encoding latency will likely fall with faster


encoders, at least 20 milliseconds of this latency should be attributed to playout and processing delay [33]. This means that 80 milliseconds is the threshold above which network latency begins to appreciably affect user experience, and a significant portion of network latency is unavoidable as it is bounded by the speed of light in fiber. Because of this strict latency requirement, servers are restricted to serving end users that are located in the same vicinity. This explains the inaptitude of a DC-only solution for cloud gaming. Even multi-DC hybrid solutions are inefficient for cloud gaming when the number of DCs is too small. To validate this statement, we perform a large-scale measurement study consisting of latency measurements from PlanetLab and Amazon EC2 to more than 2,500 end users. These results were originally presented in Ref. [34].

In the following, we study the effectiveness of various infrastructures to offer on-demand gaming services. We focus on the network latency since the other latencies, especially the generation of game videos, have been studied in previous work [30, 35]. We evaluate in particular a multi-DC solution and a hybrid CDN-DC solution, which was originally proposed in Ref. [36].

14.3.1 Measurement Settings

To determine the ability of today's cloud to provide the cloud gaming service, we conduct two measurement experiments to evaluate the performance and latency of cloud gaming services on existing cloud infrastructures in the United States. First, we perform a measurement campaign on the Amazon EC2 infrastructure during May 2012. Although EC2 is one of today's largest commercial clouds, our measurements show that it has some performance limitations. Second, we use PlanetLab [37] nodes to serve as additional DCs in order to estimate the behavior of a larger, more geographically diverse cloud infrastructure.

In our model, a DC (either Amazon EC2 or PlanetLab) is able to host all games and to serve all end users that are within its latency range, as it has a significant amount of storage and computational resources. This model is based on public information available regarding the peak number of concurrent end users using on-demand gaming today (less than 1800 [38]) and the size of modern cloud DCs (hundreds of thousands of servers [39]).

As emphasized in previous network measurement papers [6, 7], it is challenging to determine a representative population of real clients in large-scale measurement experiments. For our measurements, we use a set of 2,504 IP addresses, which were collected from 12 different BitTorrent3 swarms. These BitTorrent clients were participating in popular movie downloads. Although 2,504 IP addresses represent a fraction of the total population in the United States, these IP addresses likely represent home users who are using their machines for entertainment purposes. Therefore, we believe that these users are a reasonable cross-section of those who use their computers for entertainment purposes, which includes gaming. We refer to these selected users as the population.

We choose BitTorrent as the platform for our measurement experiments since, we believe, it provides a realistic representation of end users and their geographic

3http://www.bittorrent.com/


distribution. We use the GeoIP service to restrict our clients to the United States, which is the focus of this measurement study. Moreover, it allows us to determine the approximate geographical locations of our end users, which are used as a parameter for many of our measurement experiments. After determining the clients, we use TCP measurement probe messages to determine the latency between servers and clients. Note that we measure the round-trip time from the initial TCP handshake, which is more reliable than a traditional ICMP ping message and less sensitive to network conditions.
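A minimal sketch of this handshake-based probing (not the authors' actual tool; host, port, and sample count are illustrative) is shown below. The connect() call returns once the TCP handshake completes, so its duration approximates one round-trip time, and taking the median of several samples smooths out transient spikes.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;

public class TcpRttProbe {
    // Returns the median of 'samples' TCP handshake RTTs, in milliseconds.
    public static long medianRttMillis(String host, int port, int samples) throws IOException {
        long[] rtts = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 5000); // 5 s timeout
            }
            rtts[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(rtts);
        return rtts[samples / 2];
    }

    public static void main(String[] args) throws IOException {
        System.out.println(medianRttMillis("example.com", 80, 10) + " ms");
    }
}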

14.3.2 Measurement of a State-of-the-Art Multi-DC Infrastructure

The Amazon EC2 cloud offers three DCs in the United States to its customers. We obtain a virtual machine instance in each of the three DCs. Every 30 min, over a single day, we measure the latency between each DC and all of the 2,504 clients. We use the median value of ten measurements to represent the latency between an end host and a PlanetLab node or EC2. Figure 14.2 depicts the ratio of covered end users that have at least one network connection to one of the three DCs for a given latency target. Two observations can be made from the graph shown in Figure 14.2:

• More than one-quarter of the population cannot play games from an EC2-powered cloud gaming platform. The thin, vertical gray line in Figure 14.2 represents the 80 milliseconds threshold network latency, yielding a 70% coverage.

• Almost 10% of the potential clients are essentially unreachable. In our study, unreachable clients are clients that have a network latency over 160 milliseconds, which renders them incapable of using an on-demand gaming service. Although we filter out the IP addresses that experienced highly variable latency results, we still observe that a significant proportion of the clients have a network latency over 160 milliseconds. This result confirms the measurements made by previous work, which identified that home gateways can introduce a significant delay on data transmission [7].

Figure 14.2. Population covered by the EC2 cloud infrastructure as a function of the median latency.
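The coverage curve of Figure 14.2 can be derived from such measurements with a computation like the following sketch (names and values are ours, for illustration):

```python
def coverage_ratio(latency, threshold_ms):
    """latency: dict mapping a client to its median latency toward each DC.

    A client is covered if at least one DC meets the latency target."""
    covered = sum(1 for dcs in latency.values() if min(dcs) <= threshold_ms)
    return covered / len(latency)

# Made-up median latencies (ms) toward the three EC2 DCs:
measured = {
    "c1": [45, 90, 130],
    "c2": [85, 95, 110],
    "c3": [170, 180, 200],  # effectively unreachable (>160 ms everywhere)
    "c4": [60, 75, 140],
}
print(coverage_ratio(measured, 80))  # -> 0.5
```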

14.3.2.1 Effects of a Larger Cloud Infrastructure. An alternative to deploying a small number of large DCs is to instead use a large number of smaller DCs. The main providers have claimed to possess up to a dozen DCs within the United States [40, 41] in order to improve their population coverage. A large DC is generally more cost-efficient than a small DC; therefore, cloud providers should carefully determine if it is economically beneficial to build a new DC. In the following, we investigate the gain in population coverage when new DCs are added to the existing EC2 infrastructure.

We create a simulator that uses our collected BitTorrent latencies in order to determine how many users are able to meet the latency requirement for gaming. We use 44 geographically diverse PlanetLab [37] nodes in the United States as possible locations for installing DCs. We consider a cloud provider that can choose from the 44 locations to deploy a k-DC cloud infrastructure. We determine latencies between clients and PlanetLab nodes using the result of our measurement campaign. Afterwards, we determine the end user coverage when using PlanetLab nodes as additional DCs.

We design two strategies for deciding the location of DCs (a sketch of both follows the list):

• Latency-based strategy: the cloud provider wants to build a dedicated cloud infrastructure for interactive multimedia services. The network latency is the only driving criterion for the choice of the DC locations. For a given number k, the cloud provider places k DCs such that the number of covered end users is maximal.

• Region-based strategy: the cloud provider tries to distribute DCs over an area. We divide the United States into four regions as set forth by the US Census Bureau: Northeast, Midwest, South, and West. Every DC is associated with its region. In every region, the cloud provider chooses random DC locations. For a given total number of DCs k, either ⌈k/4⌉ or ⌊k/4⌋ DCs are randomly picked in every region.
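The following sketch illustrates both strategies under our own naming; the greedy maximum-coverage heuristic is one plausible reading of the latency-based optimization, which the text does not spell out:

```python
import random

def latency_based(candidates, latency, clients, k, target_ms):
    """Greedy heuristic: repeatedly add the candidate DC location that
    covers the most still-uncovered clients within the latency target."""
    candidates, chosen, uncovered = list(candidates), [], set(clients)
    for _ in range(k):
        best = max(candidates,
                   key=lambda dc: sum(latency[c][dc] <= target_ms for c in uncovered))
        chosen.append(best)
        candidates.remove(best)
        uncovered = {c for c in uncovered if latency[c][best] > target_ms}
    return chosen

def region_based(candidates_by_region, k):
    """Pick ceil(k/4) DCs in the first k % 4 regions and floor(k/4) in the
    others, at random candidate locations within each region."""
    chosen = []
    for i, region in enumerate(candidates_by_region):
        quota = k // 4 + (1 if i < k % 4 else 0)
        chosen += random.sample(candidates_by_region[region], quota)
    return chosen
```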

For cloud providers, the main concern is to determine the minimum number of DCs required to cover a significant portion of the target population. Figure 14.3 depicts the ratio of covered users as a function of the number of deployed DCs for two target network latencies: 80 and 40 milliseconds. The 80 milliseconds network latency target comes from previous works [5, 31, 42], which indicate that 100 milliseconds is the latency threshold required for realism and acceptable gameplay in action games. Because at least 20 milliseconds can be attributed to playout and processing delay [33], network latency can account for up to 80 milliseconds of the total latency. We select 40 milliseconds as a stricter requirement for games that require a significant amount of processing or multiplayer coordination.

We observe that a large number of DCs are required if one wants to cover a significant proportion of the population. Typically, a cloud provider that gives priority to latency reaches a coverage ratio of 0.85 with 10 DCs for a target latency of 80 milliseconds. Using the region-based strategy requires nine DCs to reach a 0.8 ratio. In all cases, a 0.9 coverage ratio with an 80 milliseconds response time is not achievable without a significant increase in the number of DCs (around 20 DCs). For more demanding games that have a lower latency requirement (e.g., 40 milliseconds), we find that clouds provide exceedingly low coverage. Even if 20 DCs are deployed, less than half of the population would have a response time of 40 milliseconds. Overall, the gains in coverage are not significant with regard to the extra cost due to the increase in the number of DCs.

Figure 14.3. Coverage vs. the number of deployed DCs.

Figure 14.4. User coverage for a region-based DC location strategy (average with min and max from every possible set of locations).

We then focus on the performance of two typical cloud infrastructures: a 5- and a 20-DC infrastructure. We assume a region-based location strategy since it is a realistic trade-off between cost and performance. We present the ratio of covered populations for both infrastructures in Figure 14.4.

We observe that there can be significant performance gaps between a 5- and a 20-DC deployment. Moreover, five DCs do not guarantee reliably good performance, despite the expectation that a region-based location strategy provides good coverage. Typically, a well-chosen 5-DC deployment can achieve 80% coverage for 80 milliseconds. However, a poorly chosen 5-DC deployment can result in a disastrous 0.6 coverage ratio. By contrast, a 20-DC deployment exhibits insignificant variance in the coverage ratio.


14.3.3 Hybrid DC-CDN Infrastructure

Since the multi-DC solution has some significant shortcomings, we explore in the following the potential of a hybrid DC-CDN infrastructure to meet the latency requirements of on-demand gaming end users. More details can be found in Ref. [36].

14.3.3.1 Experimental Settings. Out of the 2,504 IP addresses collected in our measurement study, unless otherwise specified, we select 1,500 IP addresses to serve as on-demand gaming end users. Of the remaining IP addresses, we select 300 to represent edge servers.

Client-to-client latency is determined as follows. Our simulator requires a latency matrix between all of our collected BitTorrent clients. A BitTorrent client may be used to represent either an edge server or an end user. Since we do not have control of our collected BitTorrent clients, we estimate client-to-client latency by mapping a client C1 to its closest PlanetLab node, P. Suppose we wish to determine the latency between clients C1 and C2. This latency is the sum of P’s latency to C2 and a fuzzing factor that is between 0 and 15 milliseconds. We assume that client C1 is located relatively near its closest PlanetLab node P; thus, the additional 0–15 milliseconds accounts for the latency between C1 and P. Furthermore, an edge server may only serve an end user if it hosts the end user’s demanded game.
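A sketch of this estimation, with hypothetical data structures for the node mapping and the measured node-to-client latencies:

```python
import random

def client_to_client_ms(c1, c2, closest_node, node_to_client_ms):
    """Estimate the latency between two BitTorrent clients.

    c1 is mapped to its closest PlanetLab node P; the estimate is P's
    measured latency to c2 plus a 0-15 ms fuzzing factor accounting for
    the unmeasured hop between c1 and P."""
    p = closest_node[c1]
    return node_to_client_ms[p][c2] + random.uniform(0.0, 15.0)
```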

We evaluate the effectiveness of a deployment or configuration by the number of end users that it is able to serve. In all of our experiments, we only model active users, and they are statically matched to either a datacenter or an available edge server that has the requested game and meets the user’s latency requirement. For our experiments, an edge server can only serve one end user at a time. An end user is served (or satisfied) if one of the following conditions is true:

• Its latency to a DC is less than its required latency.
• It is matched to an edge server that is within its latency requirement and hosts its requested game.

An end user may be unmatched if a DC cannot meet its latency requirement and all suitable edge servers are matched to other end users.
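A minimal sketch of this static matching, under our own naming; a first-fit greedy assignment is used here, whereas an optimal assignment would solve a bipartite matching problem:

```python
def match_users(users, best_dc_ms, edge_ms, edge_game):
    """users: list of (user, required_ms, game) tuples.
    best_dc_ms[user]: best latency from the user to any DC.
    edge_ms[user][edge]: latency from the user to an edge server.
    edge_game[edge]: the single game hosted on that edge server.
    Returns the set of served users; each edge serves at most one user."""
    free_edges, served = set(edge_game), set()
    for user, required_ms, game in users:
        if best_dc_ms[user] <= required_ms:
            served.add(user)
            continue
        for edge in list(free_edges):
            if (edge_game[edge] == game
                    and edge_ms[user].get(edge, float("inf")) <= required_ms):
                served.add(user)
                free_edges.discard(edge)
                break
    return served
```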

14.3.3.2 Determining the Size of the Augmented Infrastructure. We now focus on the 80 milliseconds target response time (for reasons that are described in Section 14.3.2.1), and we consider the factors that affect the performance when additional servers are added to the existing cloud infrastructure. Upon closer inspection, we are able to determine whether clients are covered/served by EC2 or not. The EC2-uncovered clients can then also be differentiated between those who may be covered by an edge server and those who are unreachable for a given response time.

In this experiment, each edge server hosts one game, and there is only one game in the system. Furthermore, we restrict edge servers to serve a single user as opposed to many users. Figure 14.5 shows that approximately 10% of end users are unable to meet the 80 milliseconds latency target using EC2 or be served by edge servers. These end users exhibit excessive delay to all edge servers and datacenters, which is likely due to nonnetwork delays that are outside of our system’s control. Therefore, the system’s performance with respect to the ratio of covered end users is limited by this ceiling.

Figure 14.5. Ratio of served end users among the EC2-uncovered end users. Each edge server can host one game, and there is only one game in the system. One edge server can serve up to one on-demand gaming end user. The gray area indicates the percentage of end users that cannot be served by either edge servers or EC2 datacenters.

The results of our measurement study point to a hybrid DC-CDN infrastructure that combines existing DCs with CDN servers. Because CDN servers are in closer proximity to end users, they are able to provide lower latency for end users than distantly located cloud DCs. In addition, a hybrid DC-CDN infrastructure is more attractive than a multi-CDN one because DCs, which are less costly than CDNs, can serve a significant fraction of users. Therefore, DC-CDN is attractive for such demanding interactive multimedia services.

Yet, there are still many challenges that need to be addressed. One challenge is to determine the selection of edge servers that maximizes user coverage. Unfortunately, this is an instance of the facility location problem, which is NP-hard. Furthermore, since edge servers cannot host an infinite number of games, due to physical limitations and cost considerations, another challenge is to strategically place games on edge servers in order to achieve a maximal matching between end users and edge servers. Solutions to these challenges will be especially required should the number of concurrent gamers grow.
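For the second challenge, a simple demand-driven heuristic can serve as a starting point; this sketch, under our own naming and assumptions (each edge server can host `capacity` games), is not the chapter’s method:

```python
from collections import Counter

def place_games(edges, users_in_range, requested_game, capacity):
    """Heuristic game placement: each edge server hosts the `capacity` games
    most requested by the end users within its latency range.

    users_in_range[edge]: users reachable from that edge server.
    requested_game[user]: the game demanded by that user."""
    placement = {}
    for edge in edges:
        counts = Counter(requested_game[u] for u in users_in_range[edge])
        placement[edge] = [game for game, _ in counts.most_common(capacity)]
    return placement
```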

14.4 UGC LIVE STREAMING

Over-the-top (OTT) TV channels mimic regular TV channels, but instead of using traditional mass communication media (e.g., broadcast, satellite, and cable) they use the Internet to deliver live video streams to their audience. A key consequence of the development of OTT TV services is that anybody can be a TV provider. Crowdsourced news channels [2] and e-sport channels [3] are examples of the emerging usages that are enabled by TV delivery over an open medium. Users have become an important source of content, continuously providing massive amounts of information to these services. The vast majority of works related to the delivery of live streams deal with either P2P (see Ref. [1] for a survey) or CDN (see Refs. [43–45] for recent works) architectures.

The behavior of contributors to video sharing platforms like YouTube has been extensively studied since Ref. [46]. For example, a study presented in Ref. [47] estimates that in May 2011 there was a total of roughly 500 million YouTube videos, that the minimum total storage needed for these videos was around 5 petabytes (PB), and that the network capacity to run YouTube ranged from 17 to 46 PB/day. To the best of our knowledge, there are no similar measurements for UGC live streaming. A characterization of professional players broadcasting on twitch.tv (a branch of justin.tv exclusively for gamecasting) is presented in Ref. [3]. Another work focusing on the gamecasting community is about XFire [48], a social network for gamers featuring live video sharing. Live video sharing is also explored in Ref. [49], where the authors analyzed 28 days of data from two channels associated with a popular Brazilian TV program aired in 2002. A study over a free-to-use P2P live streaming system, namely Zattoo, based on provider-side traces, pointed out that it served over 3 million registered users across eight European countries with peaks of 60,000 simultaneous users on a single channel [50]. During the 2008 Olympic Games in China, data were collected from the largest Chinese CDN [51], showing that the live nature of such events results in differences in access patterns compared to video on demand (VoD) and other UGC systems. However, none of these works has analyzed the behavior of contributors or estimated the size of the delivery networks.

To understand the behavior of UGC live video-streaming services, we performed an extensive study over real traces of a major live streaming service, namely justin.tv.

14.4.1 Analysis of justin.tv UGC Live Streaming System

Justin.tv offers a free platform for publishing user-generated live video content. In the following, we distinguish uploaders and viewers. The uploaders are registered users that have been captured broadcasting one live video at least once during the months of our study. An uploader is the generator of only one given channel (a live video stream), so we will interchangeably use the terms channel and uploader hereafter. A channel can be either online at a given time, which means that it can be viewed by viewers, or offline when the user in charge is not uploading video on this channel. A channel can alternatively switch from offline to online and vice versa during our analysis. Viewers can subscribe to a channel so that they are notified every time the channel switches on.

We use the justin.tv REST API with a set of synchronized computers to collect a global view of the justin.tv system every 5 minutes. We fetch information about the global popularity (total number of viewers in the system), the total number of streams, channel popularity (number of viewers per channel), and channel metadata. From the collected data, we target the months of August and November 2012. Supported by the results of our measurement campaign, we give two messages: a large and international population, and uploaders guarantee a 24/7 TV-like service.
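A sketch of such a crawler; the endpoint and response fields are hypothetical stand-ins (the original justin.tv REST API no longer exists), but the 5-minute cadence and the collected fields follow the text:

```python
import json
import time
import urllib.request

API_URL = "https://api.example.com/streams/summary"  # hypothetical endpoint

def snapshot():
    """Fetch one global view: total viewers, total streams, per-channel
    popularity and metadata (field layout depends on the actual API)."""
    with urllib.request.urlopen(API_URL, timeout=30) as resp:
        return json.load(resp)

def crawl(out_path, period_s=300):
    """Append one timestamped snapshot to a log file every 5 minutes."""
    with open(out_path, "a") as out:
        while True:
            out.write(json.dumps({"ts": time.time(), "view": snapshot()}) + "\n")
            out.flush()
            time.sleep(period_s)
```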

Page 369: Cloud Services, Networking, and Management

“9780471697558c14” — 2015/3/20 — 12:16 — page 347 — #15

UGC LIVE STREAMING 347

Figure 14.6. Fraction of uploaders and viewers in each of the five regions, for August and November.

14.4.1.1 A Large and International Population. First we emphasize that justin.tv is (1) an international service, and (2) a service that is fueled by a large population of uploaders. We first use our data traces to get the origin of uploaders, which we associate with five regions (Africa, Americas, Oceania, Europe, and Asia). The origin of the viewers is not provided by the justin.tv API, so we collected estimated viewer information from the Google Ad Planner service, which includes geolocation data. Likewise, the viewers were grouped into five regions.

Our main observation in Figure 14.6 is that the viewer distribution conforms to the distribution of Internet users [52–54]. It is important to note that previous work related to P2P UGC live video systems (e.g., Ref. [55]) does not highlight such a well-balanced distribution of viewers. We can also notice that in both months there is an over-representation of uploaders located in the Americas. We suspect that uploaders do not pay full attention to their profile settings, the default country being America.

We then want to show how vast the population of uploaders is. We analyze both entire periods to measure the number of distinct channels that had been online. On average, there are around 2,000 simultaneous online channels. In August, we find that around 200,000 distinct uploaders started channels during this one-month period, and almost 240,000 for the same period analyzed in November. This number demonstrates the massiveness of a UGC live streaming system in comparison with traditional IPTV systems.

14.4.1.2 Uploaders Guarantee a 24/7 TV-like Service. Our second message is that justin.tv is an always-on service, thanks to its contributors. We have to recall that justin.tv differs from other UGC services like VoD in the sense that the service depends on the activity of uploaders at all times. There is a critical need for online channels. Fortunately, justin.tv has loyal uploaders, who manage to be more consistently active (here, online) than on other typical UGC platforms. This guarantees service continuity.

We measure the number of online channels over the whole month, and then we compute the average numbers per hour of a day (respectively, per day of a week); thus, we measure diurnal (respectively, weekly) patterns. We normalize the results so that the peak of the number of online channels is equal to 1. We show our results in Figure 14.7.

Figure 14.7. Normalized average of diurnal (a) and weekly (b) peak ratio of simultaneous online channels.

To demonstrate the continuity of the live service at any time of the day, we explored the diurnal pattern of concurrent online channels. The diurnal pattern has a traditional shape (daylight), but the main point to note is that this pattern is flat in comparison with other platforms. We draw with thin lines the same lowest popularity on a scale to 1 for two other UGC platforms: YouTube (discussed in Refs. [56] and [57]) and blogposts (described in 2009 by Ref. [58]). It is noteworthy that justin.tv’s lowest global popularity in a day is more than 0.65 of its peak (noted 0.65:1), which means that there are many online channels all along the day. On YouTube, the number of uploaded videos is significantly lower at some times of day than at others (nearly 0.37:1). If justin.tv followed the same pattern as YouTube, there would be some times of day without enough channels to guarantee a large enough choice of channels. Finally, blogposts have a gigantic diurnal pattern according to Ref. [58] (around 0.05:1).

The same observation holds for weekdays. The difference between the lowest and peak global popularity is not significant on justin.tv for the month of August (0.92:1) and remains moderate for the month of November (0.83:1). In other words, there are online channels all along the week. These results are comparable with YouTube (0.84:1) and outperform blogposts (0.06:1).
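The normalized diurnal averages behind Figure 14.7 can be computed as in the following sketch, assuming snapshots stored as (unix timestamp, online channel count) pairs:

```python
from collections import defaultdict
from datetime import datetime, timezone

def diurnal_pattern(snapshots):
    """Return 24 values: the average number of online channels per hour
    of day, normalized so that the peak hour equals 1."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, online in snapshots:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        sums[hour] += online
        counts[hour] += 1
    averages = [sums[h] / counts[h] if counts[h] else 0.0 for h in range(24)]
    peak = max(averages)
    return [a / peak for a in averages]
```

The weekly pattern is obtained analogously by keying on the day of the week instead of the hour.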

14.4.2 Motivations for a Hybrid DC-CDN Delivery

We now discuss a selection of insights to justify the usage of a hybrid DC-CDN deliverymodel in the case of UGC live streaming systems. First, it is well known that a small num-ber of contributors of UGC systems represents the vast majority of the global popularityof these platforms. Such distribution simplifies the management of CDN infrastructures.The provider is also interested in delegating to the CDN the channels with the highestresolution, which can throttle the limited bandwidth capacity of DCs. Finally, channelsthat are stable over time are easier to manage in CDN, with less configuration of edgeservers. We are interested in measuring such facts for justin.tv.

14.4.2.1 Most of the Traffic Comes from a Tiny Proportion of Uploaders. This is our main observation, and it is important to understand that these special uploaders are not online simultaneously. They have alternatively been online and offline; but at every time, the subset of online channels out of this tiny subset of uploaders represents most of the traffic.

TABLE 14.1. Number of channels for top categories

Top              10     20     30     40     50
Aug. # channels  559   1086   1499   1830   2166
Nov. # channels  458    922   1342   1670   1985

Every 5 minutes, we collect the k most popular channels. To simplify, we focus here on values of k in {10, 20, 50}. Please recall that there are around 2,000 simultaneous online channels, so these top channels represent a small fraction of all uploaders. Overall, for each month, we gathered more than 8,500 different lists (one new list every five minutes) of top-k channels.

We show that a small number of distinct channels occurs in these top-channel lists over the whole months. In Table 14.1, we give the number of different channels (# channels) having at least one occurrence in these lists over the whole months. Only 559 uploaders (0.3% of the monthly total) occurred in the top-10 channels in August. This means that 559 uploaders occupied the more than 85,000 “spots” that were available in the month. The result is even stronger in November, for which the number of distinct uploaders is only 458 (0.2% of the monthly total) although the overall number of distinct channels is larger than in August, as discussed in Section 14.4.1.1.
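A sketch of this counting, assuming each 5-minute snapshot yields a list of channels ordered by popularity:

```python
def top_k_occupancy(snapshots, k):
    """Return (number of distinct channels that ever entered a top-k list,
    total number of top-k 'spots' over the period)."""
    occupants, spots = set(), 0
    for ranking in snapshots:
        top = ranking[:k]
        occupants.update(top)
        spots += len(top)
    return len(occupants), spots

# With one snapshot every 5 minutes for a month (> 8,500 lists) and k = 10,
# `spots` exceeds 85,000 while `occupants` stays in the hundreds.
```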

We then measure the popularity of these channels and calculate the footprint of top channels on the overall traffic of justin.tv by collecting the bitrates given in the API. We can thus extrapolate the total bandwidth of the justin.tv system. This information is depicted in Figure 14.8. First, as can be expected, the popularity of top channels decreases fast. The gap between top-10 and top-20 channels is around 10% of the overall traffic, and the gap is also small between top-20 and top-50 channels. Second, the peak of global popularity of justin.tv can be exclusively credited to the top-10 channels. We see a direct correlation between the peak of overall popularity and the peak in top-10 channels. A third remarkable observation is that the November peak accounted for 1 Tbps of uplinked data. Such enormous bandwidth makes the case for interfacing justin.tv with a CDN. A fourth remark regards the usage of a hybrid DC-CDN model in this scenario. For example, with a DC provisioned with 100 Gbps of bandwidth capacity, all the peak traffic would have been sent to the CDN while the DC capacity would have been almost fully used at nearly every other time of the month.
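The implied overflow policy can be sketched as a small function; the 100 Gbps DC capacity is the example figure from the text:

```python
def split_traffic(total_gbps, dc_capacity_gbps=100.0):
    """Route traffic to the DC up to its capacity; the overflow goes to the CDN."""
    dc_share = min(total_gbps, dc_capacity_gbps)
    return dc_share, total_gbps - dc_share

# At the November peak (~1 Tbps), almost all traffic overflows to the CDN:
print(split_traffic(1000.0))  # -> (100.0, 900.0)
```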

Figure 14.8. Maximum bandwidth usage ratio per hour for the top 10, 20, and 50 channels, and total bandwidth (split between DC and CDN) for November.

14.4.2.2 The Most Popular Channels Are in the Highest Resolutions. Another noteworthy observation is that the ratio of traffic generated by the aforementioned top channels is larger than their ratio of viewers. During peaks, almost 98% of the traffic comes from the very small subset of top channels that are online, while their viewers account for at most 70%. The reason for this difference between the ratio of viewers and the ratio of traffic for top channels is revealed in Figure 14.9. We associate each range of bitrates with a given video quality, based on the values of the YouTube LiveStream Guide and what we get from the API. As can be noted, videos with better quality are more popular (720p being the resolution for which videos are the most popular) although these resolutions represent a small portion of the total streams.

Figure 14.9. Total number of streams and viewers for each video quality in November.

Overall, these observations are significant in the perspective of integrating justin.tv into a hybrid CDN-DC architecture. CDNs are efficient at handling a small number of very popular content items. Based on our findings, we claim that it is easy to integrate justin.tv into a CDN. Since a relatively small number of uploaders (around one thousand) can (at least) halve the burden on DCs, the justin.tv platform should focus on these uploaders and ensure that they get handled by the CDN as soon as they switch on their channels.

14.4.2.3 The Number of Simultaneously Online Popular Channels Is Stable. In the scenario where a CDN manages the top-k channels, a question is the number of uploaders that are simultaneously online at a given time. As previously said, a CDN knows how to manage a small number of channels. We measure the number of online channels out of the overall population of top-k channels every 5 minutes. We present the results in Figure 14.10, where the average number for every hour over all days in the month is given in the first graph, while the evolution during the month is given in the second one.

Figure 14.10. Average number of simultaneously online channels per hour and per day.

The number of simultaneously online uploaders out of the population of CDN-friendly uploaders is both stable and small. Typically, for the set of roughly one thousand uploaders that occurred at least once in the top-20 channels, the number of online channels is between 100 and 130 for the month of August, which is a range that a CDN can handle without problem. To conclude, we claim that the justin.tv platform can easily interface with a hybrid DC-CDN model because a small and stable population of uploaders is responsible for the traffic peaks.

14.5 TIME-SHIFTING VIDEO STREAMING

As stated in Section 14.1, time-shifted TV is a core element of a number of potential killer apps of the connected TV. We emphasize below some of the most critical differences between VoD services and time-shifted TV services:

• Time-shifted services allow end users to time-shift a program that is still on air (typically via the popular pausing feature of personal DVRs). Studies have shown that most time-shifted requests are for the ongoing TV program [59], so delivery models that do not consider simultaneous ingestion and delivery of content do not meet the demand from the end users. Typically, catch-up TV services, where every program is proposed separately after it has been fully broadcast and recorded, do not provide the interactivity expected by most users.

• The length of a TV stream is several orders of magnitude longer than a typical movie in VoD. While a movie can be considered as one unique object, the stream of a time-shifted video is a series of portions, which are not uniformly popular. The popularity of video portions in a time-shifted streaming system is complex because it depends on multiple parameters, including the popularity of the TV program associated with a given portion and also the time at which the portion was broadcast. Moreover, the popularity of a given portion varies with time; it usually tends to decrease with time, but sometimes events that were unnoticed at broadcast time can become popular later due to, for example, social networks.

• The volatility of viewers is greater than in VoD. In Ref. [60], a peak has been identified at the beginning of each program, where many clients start streaming the content, while the spikes of departure occur at the end of the program. More than half of the population quits during the first 10 min of a program on average, and goes to another position in the history [61]. In the same session, a user of time-shifted TV systems (hereafter called a shifter) is interested in several distinct portions, which can be far from each other in the stream history.

The characteristics of time-shifted streaming services make the delivery especially challenging. In particular, DC-based solutions have some serious weaknesses because current servers do not meet all the requirements. First, conventional disk-based VoD servers cannot massively ingest content and keep pace with the changing viewing habits of subscribers, because they have not been designed for concurrent read and write operations. Second, client-server delivery systems are not cost-efficient in the case of applications where clients require distinct portions of a stream. Indeed, they cannot use group communication techniques such as multicast protocols. As a matter of fact, current time-shifted services managed by TV broadcasters are restricted to a time delay ranging from 1 to 3 hours, although only 40% of shifters watch their program less than three hours after the live broadcast [59].

Due to lack of space, we will not enter here into the details of the different proposals for delivering time-shifted streams. The most complete overview of the literature is in Ref. [62]. Previous work has highlighted the problems met by time-shifted systems based on a DC infrastructure [63–65]. New server implementations are described in Ref. [65]. Cache replication and placement schemes are extensively studied by the authors of Ref. [63]; such a solution corresponds to a hybrid CDN-DC infrastructure. A different option is to opt for DC-based solutions such as Ref. [64], in which, when several clients share the same optical Internet access, a patching technique is used to handle several concurrent requests so that the server requirement is reduced.

The delivery model that appears to be the most attractive is a hybrid P2P-DC solution. In such an architecture, the main motivation is that the most popular video portions at a given time are usually the video portions that just aired a few minutes ago. The idea is thus to cache these freshly downloaded portions in the viewers’ computers or home gateways (this is the P2P part), while the older and less popular video portions are stored in the DC using cost-effective storage systems (this is the DC part). Hybrid P2P-DC solutions have been the topic of several papers [66–69].

In Table 14.2, we summarize the results of simulations that we conducted on a set of synthetic traces from Ref. [68]. These results indicate the percentage of video portions that are served by either the P2P delivery model or the DC. As can be seen, the results differ a lot among the presented solutions. The DC is used when either the portion is not available in the P2P system (typically because it has not been stored in a user’s computer) or the P2P system does not have enough capacity to serve the users.


TABLE 14.2. Ratio of video portions from P2P vs. DC

                   from P2P   from DC                    from DC
                              (missing portion in P2P)   (not enough capacity in P2P)
PACUS [68]         75.2%      0.1%                       24.7%
Turntable [67]     78.5%      0%                         21.5%
P2TSS-Rand [66]    11.2%      23%                        65.8%
P2TSS-Live [66]    2.8%       22.8%                      74.4%
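The serving policy common to these hybrid designs reduces to a small decision rule; a sketch under our own naming, not any specific paper’s implementation:

```python
def serve_portion(caching_peers, upload_slots):
    """Decide where a requested video portion is served from.

    caching_peers: peers that have cached this portion.
    upload_slots[peer]: remaining upload capacity of that peer."""
    if not caching_peers:
        return "DC (missing portion in P2P)"
    for peer in caching_peers:
        if upload_slots[peer] > 0:
            upload_slots[peer] -= 1
            return "P2P"
    return "DC (not enough capacity in P2P)"
```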

14.6 OPEN CHALLENGES

The management of interactive multimedia services is still considered a challenging task. The solutions that have been described throughout this chapter fix some of the most prevailing challenges and allow today’s services to be used all over the world. But there are still some open challenges, which will require a significant effort from the scientific community in the next years. We would like to highlight three topics which, in our opinion, will matter in the near future.

• Economics of networks: Behind the services that everybody enjoys every day, there is a complex value chain where multiple actors interact to provide components of the service (to name some of the most important actors: the CDN provider, the ISP, the transit network operator, and the content producer). For a given service, each actor should be profitable (its revenues should exceed the cost of the infrastructure it provides for the said service) and aims to maximize its profit. In the case of multimedia services, it is frequent that a decision taken by one actor has implications for another actor. Such an interplay between actors makes the design of services even harder. To study the behavior of rational actors and the consequences of their actions on the global system, scientists use theoretical models combining game theory and discrete optimization [70]. The recent disputes between major Internet actors (for example, Netflix and Comcast⁴) have highlighted the complexity of wide-scale multimedia services and the stress these services impose on the infrastructure. Despite the relative youth of network economics as a scientific domain, we believe that future works related to CDNs and content providers should take into account the economic drivers of these actors.

⁴ http://blog.streamingmedia.com/2014/02/heres-comcast-netflix-deal-structured-numbers.html

• Virtualization for Intensive Multimedia Tasks: Multimedia services have a high demand for specialized resources, for example, graphics processing units (GPUs). The migration from private DCs (with dedicated hardware) to the cloud (with virtual machines, shared resources, and standard hardware) is a long, still ongoing journey. The elasticity of DCs has the potential to convince service providers to migrate their most intensive tasks, but some of these tasks are difficult to migrate because commoditized hardware cannot accommodate the requirements of specialized software (e.g., a game engine requires a GPU), and because this software has been designed to maximize the utilization of the hardware although DC management requires smart resource sharing. Among the advanced solutions, the development of virtual DCs is expected to offer well-configured computing infrastructures in shared datacenters [71]. Virtual DCs are, however, still in their infancy. More generally, although some vendors claim that some tasks can now be run in the cloud (see the Amazon Elastic Transcoder offer), we believe that scientists dealing with system and network management will find in the next years a lot of open problems related to the hosting of multimedia software services on shared hardware resources.

• Improvement of Adaptive Streaming: Dynamic adaptive streaming technologies have recently been adopted by a majority of streaming vendors and service providers. The standardization efforts at MPEG have allowed various key advancements in the technologies, but they also reveal the multiple open problems that still need proper solutions. We emphasize three topics. First, the work related to Server and Network-assisted DASH Operations (SAND) at MPEG is key for those who call for a better integration of network operators in the adaptation process. Today’s solutions are based on the client side only. As has been shown in various papers (e.g., Ref. [72]), a client-only adaptive system has serious weaknesses. A better collaboration between all actors in the chain would be beneficial, while the technology must keep its current simplicity, which is part of the reasons for its widespread adoption. The second important topic is the implementation of low-latency live streaming. Some papers (e.g., Ref. [73]) have started studying live adaptive streaming more carefully with the goal of offering the same level of adaptivity as for regular stored video although the stream is generated on the fly. Finally, a third topic that requires extra attention is the pre-delivery phase in multimedia services. The decision of how to encode the stream that has to be delivered (the number of representations, the bit-rates, the resolutions) is typically critical because the whole delivery infrastructure has to address the consequences of these decisions. Preliminary works have studied some of the problems in a formal way [74], but much more has to be done.

14.7 CONCLUSION

Interactive multimedia services have become a key component of the Internet. In this chapter, we highlighted three of them: cloud gaming, UGC live streaming, and time-shifted TV. These services are, however, extremely challenging to implement, deploy, and manage. The delivery infrastructure, which is referred to as the cloud, is far more complex than for typical static websites.

One of the main messages we conveyed in this chapter is that hybrid delivery architectures feature attractive characteristics to address the challenges of interactive multimedia services. Their management is, however, difficult. Moreover, there is no “one-fits-all” solution. We showed in this chapter that, for each service, a different hybrid architecture is the most appropriate.

The management of hybrid architectures is a tremendously promising research area. In particular, recent works have shown that cost savings of an order of magnitude can be achieved by the implementation of a smart hybrid architecture instead of a more conventional DC-only or CDN-only infrastructure. A lot of opportunities exist, typically in exploring content delivery with optimization approaches, in applying data analysis techniques to large-scale services, and in leveraging new multimedia technologies to improve the QoE of mobile users.

REFERENCES

1. Andrea Passarella. A survey on content-centric technologies for the current Internet: CDN and P2P solutions. Computer Communications, 35(1):1–32, 2012.

2. Usama Mir, Houssein Wehbe, Loutfi Nuaymi, Aurelie Moriceau, and Bruno Stevant. The ZeWall project: Real-time delivering of events via portable devices. In Proceedings of the 77th IEEE Vehicular Technology Conference, VTC Spring 2013, June 2–5, Dresden, Germany. IEEE, 2013.

3. Mehdi Kaytoue, Arlei Silva, Loïc Cerf, Wagner Meira Jr., and Chedy Raïssi. Watch me playing, I am a professional: A first study on video game live streaming. In Proceedings of the 21st World Wide Web Conference, WWW 2012, April 16–20, Lyon, France. ACM, 2012.

4. Nielsen Company. Three Screen Report Q1, June 2010. http://www.nielsen.com/us/en/insights/reports/2010/three-screen-report-q1-2010.html

5. Michael Jarschel, Daniel Schlosser, Sven Scheuring, and Tobias Hoßfeld. Gaming in the clouds: QoE and the users’ perspective. Mathematical and Computer Modelling, 57:2883–2894, 2013.

6. Marcel Dischinger, Andreas Haeberlen, P. Krishna Gummadi, and Stefan Saroiu. Characterizing residential broadband networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement 2007, October 22–26, San Diego, CA, 2007.

7. Srikanth Sundaresan, Walter de Donato, Nick Feamster, Renata Teixeira, Sam Crawford, and Antonio Pescapè. Broadband internet performance: A view from the gateway. In Proceedings of the ACM SIGCOMM 2011 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, August 15–19, Toronto, ON, 2011.

8. Lucas DiCioccio, Renata Teixeira, and Catherine Rosenberg. Impact of home networks on end-to-end performance: Controlled experiments. In SIGCOMM Workshop on Home Networks, New Delhi, India, 2010.

9. Stacey Higginbotham. Smart TVs cause a net neutrality debate in S. Korea. GigaOM, February 2012. http://gigaom.com/2012/02/10/smart-tvs-cause-a-net-neutrality-debate-in-s-korea/

10. Stephen M. Rumble, Diego Ongaro, Ryan Stutsman, Mendel Rosenblum, and John K. Ousterhout. It’s time for low latency. In 13th Workshop on Hot Topics in Operating Systems, HotOS XIII, May 9–11, Napa, CA, 2011.

11. Luiz André Barroso and Urs Hölzle. The datacenter as a computer: An introduction to the design of warehouse-scale machines. Synthesis Lectures on Computer Architecture, 4(1):1–108, 2009.


12. Adele Lu Jia, Lucia D’Acunto, Michel Meulpolder, and Johan A. Pouwelse. Modeling and analysis of sharing ratio enforcement in private BitTorrent communities. In Proceedings of the IEEE International Conference on Communications, ICC 2011, June 5–9, Kyoto, Japan, 2011.

13. Jan Seedorf, Sebastian Kiesel, and Martin Stiemerling. Traffic localization for P2P applications: The ALTO approach. In Proceedings of P2P 2009, Ninth International Conference on Peer-to-Peer Computing, September 9–11, Seattle, WA, 2009.

14. Avraham Leff and James T. Rayfield. Alternative edge-server architectures for enterprise JavaBeans applications. In Proceedings of the 5th ACM/IFIP/USENIX International Conference on Middleware, Middleware ’04, pages 195–211. Springer-Verlag, New York, 2004.

15. Mikael Desertot, Clement Escoffier, and Didier Donsez. Towards an autonomic approach for edge computing: Research articles. Concurrency and Computation: Practice and Experience, 19(14):1901–1916, September 2007.

16. Kris Alexander. Fat client game streaming or cloud gaming. Akamai Blog, August 2012. https://blogs.akamai.com/2012/08/part-2-fat-client-game-streaming-or-cloud-gaming.html

17. Vijay Kumar Adhikari, Sourabh Jain, Yingying Chen, and Zhi-Li Zhang. Vivisecting YouTube: An active measurement study. In INFOCOM 2012, Orlando, FL. IEEE, 2012.

18. Vijay Kumar Adhikari, Yang Guo, Fang Hao, Matteo Varvello, Volker Hilt, Moritz Steiner, and Zhi-Li Zhang. Unreeling Netflix: Understanding and improving multi-CDN movie delivery. In INFOCOM 2012, Orlando, FL. IEEE, 2012.

19. Hao Yin, Xuening Liu, Tongyu Zhan, Vyas Sekar, Feng Qiu, Chuang Lin, Hui Zhang, and Bo Li. Design and deployment of a hybrid CDN-P2P system for live video streaming: Experiences with LiveSky. In Proceedings of the 17th ACM International Conference on Multimedia, October 19–24, Beijing, China. ACM, 2009.

20. Todd Hoff. Gone fishin’: Justin.tv’s live video broadcasting architecture. High Scalability blog, November 2012. http://is.gd/5ocNz2

21. Pietro Michiardi, Damiano Carra, Francesco Albanese, and Azer Bestavros. Peer-assisted content distribution on a budget. Computer Networks, 56(7):2038–2048, 2012.

22. Paarijaat Aditya, Mingchen Zhao, Yin Lin, Andreas Haeberlen, Peter Druschel, Bruce Maggs, and Bill Wishon. Reliable client accounting for P2P-infrastructure hybrids. In NSDI 2012, San Jose, CA. USENIX, 2012.

23. Ricardo A. Baeza-Yates, Aristides Gionis, Flavio Junqueira, Vassilis Plachouras, and Luca Telloli. On the feasibility of multi-site web search engines. In Proceedings of the ACM CIKM, Hong Kong, China, 2009.

24. Jimmy Leblet, Zhe Li, Gwendal Simon, and Di Yuan. Optimal network locality in distributed virtualized data-centers. Computer Communications, 34(16):1968–1979, 2011.

25. Nikolaos Laoutaris, Michael Sirivianos, Xiaoyuan Yang, and Pablo Rodriguez. Inter-datacenter bulk transfers with NetStitcher. In Proceedings of the ACM SIGCOMM 2011 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, August 15–19, Toronto, ON. ACM, 2011.

26. Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andy Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010.

27. Ruben Van den Bossche, Kurt Vanmechelen, and Jan Broeckhove. Cost-optimal scheduling in hybrid IaaS clouds for deadline constrained workloads. In IEEE International Conference on Cloud Computing, CLOUD 2010, July 5–10, Miami, FL, 2010.


28. Michael Smit, Mark Shtern, Bradley Simmons, and Marin Litoiu. Partitioning applications for hybrid and federated clouds. In Center for Advanced Studies on Collaborative Research, CASCON ’12, November 5–7, Toronto, ON. IBM/ACM, 2012.

29. Hui Zhang, Guofei Jiang, Kenji Yoshihira, Haifeng Chen, and Akhilesh Saxena. Intelligent workload factoring for a hybrid cloud computing model. In 2009 IEEE Congress on Services, Part I, SERVICES I 2009, July 6–10, Los Angeles, CA. IEEE, 2009.

30. Michael Jarschel, Daniel Schlosser, Sven Scheuring, and Tobias Hoßfeld. An evaluation of QoE in cloud gaming based on subjective tests. In Proceedings of the Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, IMIS 2011, June 30–July 2, Seoul, Korea, 2011.

31. Mark Claypool and Kajal T. Claypool. Latency and player actions in online games. Communications of the ACM, 49:40–45, 2006.

32. Mark Claypool and Kajal Claypool. Latency can kill: Precision and deadline in online games. In MMSys 2010, Phoenix, AZ, 2010.

33. Sean K. Barker and Prashant Shenoy. Empirical evaluation of latency-sensitive application performance in the cloud. In MMSys 2010, Phoenix, AZ, 2010.

34. Sharon Choy, Bernard Wong, Gwendal Simon, and Catherine Rosenberg. The brewing storm in cloud gaming: A measurement study on cloud to end-user latency. In Proceedings of ACM NetGames, Venice, Italy, 2012.

35. Kuan-Ta Chen, Yu-Chun Chang, Po-Han Tseng, Chun-Ying Huang, and Chin-Laung Lei. Measuring the latency of cloud gaming systems. In ACM Multimedia, Scottsdale, AZ, 2011.

36. Sharon Choy, Bernard Wong, Gwendal Simon, and Catherine Rosenberg. A hybrid edge-cloud architecture for reducing on-demand gaming latency. Multimedia Systems, 20(2):503–519, March 2014.

37. Andy Bavier, Mic Bowman, Brent Chun, David Culler, Scott Karlin, Steve Muir, Larry Peterson, Timothy Roscoe, Tammo Spalink, and Mike Wawrzoniak. Operating system support for planetary-scale network services. In 1st Symposium on Networked Systems Design and Implementation (NSDI 2004), March 29–31, San Francisco, CA, 2004.

38. Xav de Matos. Source: OnLive averaged 1800 concurrent users, CEO promised to protect patents against Gaikai. http://www.joystiq.com/2012/08/17/source-onlive-ceo-showed-no-remorse-when-announcing-layoffs/

39. Albert Greenberg, James Hamilton, David A. Maltz, and Parveen Patel. The cost of a cloud: Research problems in data center networks. SIGCOMM Computer Communication Review, 39(1):68–73, December 2008.

40. Gaikai will be fee-free, utilize 300 data centers in the US. http://www.joystiq.com/2010/03/11/gaikai-will-be-fee-free-utilize-300-data-centers-in-the-us/

41. GDC09 interview: OnLive founder Steve Perlman wants you to be skeptical. http://www.joystiq.com/2009/04/01/gdc09-interview-onlive-founder-steve-perlman-wants-you-to-be-sk

42. Lothar Pantel and Lars C. Wolf. On the impact of delay on real-time multiplayer games. In Proceedings of the 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 23–29, May 12–14, Miami Beach, FL. ACM, 2002.

43. Micah Adler, Ramesh K. Sitaraman, and Harish Venkataramani. Algorithms for optimizing the bandwidth cost of content delivery. Computer Networks, 55(18):4007–4020, 2011.


44. Jiayi Liu, Gwendal Simon, Catherine Rosenberg, and Géraldine Texier. Optimal delivery of rate-adaptive streams in underprovisioned networks. IEEE Journal on Selected Areas in Communications, 32:706–713, 2014.

45. Jiayi Liu and Gwendal Simon. Fast near-optimal algorithm for delivering multiple live video streams in CDN. In 22nd International Conference on Computer Communications and Networks, ICCCN 2013, July 30–August 2, Nassau, Bahamas, 2013.

46. Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue B. Moon. I tube, you tube, everybody tubes: Analyzing the world’s largest user generated content video system. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, San Diego, CA. ACM, 2007.

47. Jia Zhou, Yanhua Li, Vijay Kumar Adhikari, and Zhi-Li Zhang. Counting YouTube videos via random prefix sampling. In Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement, IMC ’11, November 2, Berlin, Germany. ACM, 2011.

48. Siqi Shen and Alexandru Iosup. XFire online meta-gaming network: Observation and high-level analysis. In The 4th International Workshop on Massively Multiuser Virtual Environments at IEEE International Symposium on Audio-Visual Environments and Games (HAVE 2011), October 15, Hebei, China, 2011.

49. Eveline Veloso, Virgílio A. F. Almeida, Wagner Meira Jr., Azer Bestavros, and Shudong Jin. A hierarchical characterization of a live streaming media workload. IEEE/ACM Transactions on Networking, 14(1):133–146, 2006.

50. Hyunseok Chang, Sugih Jamin, and Wenjie Wang. Live streaming performance of the Zattoo network. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, November 4–6, Chicago, IL. ACM, 2009.

51. Hao Yin, Xuening Liu, Feng Qiu, Ning Xia, Chuang Lin, Hui Zhang, Vyas Sekar, and Geyong Min. Inside the bird’s nest: Measurements of large-scale live VoD from the 2008 Olympics. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement 2009, November 4–6, Chicago, IL. ACM, 2009.

52. Yuan Ding, Yuan Du, Yingkai Hu, Zhengye Liu, Luqin Wang, Keith W. Ross, and Anindya Ghose. Broadcast yourself: Understanding YouTube uploaders. In Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement, IMC ’11, November 2, Berlin, Germany. ACM, 2011.

53. Sunghwan Ihm and Vivek S. Pai. Towards understanding modern web traffic. In Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement, IMC ’11, November 2, Berlin, Germany, 2011.

54. Zi Hu, John Heidemann, and Yuri Pradkin. Towards geolocation of millions of IP addresses. In Proceedings of the 12th ACM SIGCOMM Conference on Internet Measurement, IMC ’12, November 14–16, Boston, MA, 2012.

55. Xiaojun Hei, Chao Liang, Jian Liang, Yong Liu, and Keith W. Ross. A measurement study of a large-scale P2P IPTV system. IEEE Transactions on Multimedia, 9(8):1672–1687, 2007.

56. Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue B. Moon. Analyzing the video popularity characteristics of large-scale user generated content systems. IEEE/ACM Transactions on Networking, 17(5):1357–1370, 2009.

57. Gloria Chatzopoulou, Cheng Sheng, and Michalis Faloutsos. A first step towards understanding popularity in YouTube. In INFOCOM Workshops, San Diego, CA. IEEE, 2010.

58. Lei Guo, Enhua Tan, Songqing Chen, Xiaodong Zhang, and Yihong Eric Zhao. Analyzing patterns of user content generation in online social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 28–July 1, Paris, France, 2009.

59. Nielsen Company. How DVRs Are Changing the Television Landscape, April 2009. http://www.nielsen.com/us/en/insights/news/2009/how-dvrs-are-changing-the-television-landscape.html

60. Tim Wauters, Wim Van de Meerssche, Filip De Turck, Bart Dhoedt, Piet Demeester, Tom Van Caenegem, and E. Six. Management of time-shifted IPTV services through transparent proxy deployment. In Proceedings of the Global Telecommunications Conference, GLOBECOM ’06, November 27–December 1, San Francisco, CA, pages 1–5, 2006.

61. Xiaojun Hei, Chao Liang, Jian Liang, Yong Liu, and Keith W. Ross. A measurement study of a large-scale P2P IPTV system. IEEE Transactions on Multimedia, 9(8):1672–1687, December 2007.

62. Niels Bouten, Steven Latré, Wim Van de Meerssche, Bart De Vleeschauwer, Koen De Schepper, Werner Van Leekwijck, and Filip De Turck. A multicast-enabled delivery framework for QoE assurance of over-the-top services in multimedia access networks. Journal of Network and Systems Management, 21:1–30, 2013.

63. Juchao Zhuo, Jun Li, Gang Wu, and Su Xu. Efficient cache placement scheme for clustered time-shifted TV servers. IEEE Transactions on Consumer Electronics, 54(4):1947–1955, November 2008.

64. Wei Xiang, Gang Wu, Qing Ling, and Lei Wang. Piecewise patching for time-shifted TV over HFC networks. IEEE Transactions on Consumer Electronics, 53(3):891–897, August 2007.

65. Cheng Huang, Chenjie Zhu, Yi Li, and Dejian Ye. Dedicated disk I/O strategies for IPTV live streaming servers supporting timeshift functions. In Seventh International Conference on Computer and Information Technology (CIT 2007), October 16–19, University of Aizu, Fukushima, Japan, 2007.

66. Sachin Deshpande and Jeonghun Noh. P2TSS: Time-shifted and live streaming of video in peer-to-peer systems. In IEEE International Conference on Multimedia and Expo, Hannover, Germany, June 2008.

67. Yaning Liu and Gwendal Simon. Distributed delivery system for time-shifted streaming systems. In Proceedings of the 35th Annual IEEE Conference on Local Computer Networks, LCN 2010, October 10–14, Denver, CO, 2010.

68. Yaning Liu and Gwendal Simon. Peer-assisted time-shifted streaming systems: Design and promises. In Proceedings of the IEEE International Conference on Communications, ICC 2011, June 5–9, Kyoto, Japan, 2011.

69. Fabio Victora Hecht, Thomas Bocek, Richard G. Clegg, Raul Landa, David Hausheer, and Burkhard Stiller. LiveShift: Mesh-pull live and time-shifted P2P video streaming. In Proceedings of IEEE LCN, Bonn, Germany, 2011.

70. Patrick Maillé and Bruno Tuffin. Telecommunication Network Economics: From Theory to Applications. Cambridge University Press, Cambridge, 2014.

71. Mohamed Faten Zhani, Qi Zhang, Gwendal Simon, and Raouf Boutaba. VDC Planner: Dynamic migration-aware virtual data center embedding for clouds. In 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), May 27–31, Ghent, Belgium, 2013.

72. Rémi Houdaille and Stéphane Gouache. Shaping HTTP adaptive streams for a better user experience. In Proceedings of the Third ACM SIGMM Conference on Multimedia Systems, MMSys 2012, February 22–24, Chapel Hill, NC, 2012.


73. Cyril Concolato, Nassima Bouzakaria, and Jean Le Feuvre. Overhead and performance of low latency live streaming using MPEG-DASH. In The 5th International Conference on Information, Intelligence, Systems and Applications, IISA 2014, July 7–9, Chania, Crete, 2014.

74. Laura Toni, Ramon Aparicio Pardo, Gwendal Simon, Pascal Frossard, and Alberto Blanc. Optimal set of video representations in adaptive streaming. In Multimedia Systems Conference 2014, MMSys ’14, March 19–21, Singapore, 2014.


15

BIG DATA ON CLOUDS (BDOC)

Joseph Betser and Myron Hecht

The Aerospace Corporation, El Segundo, CA, USA

15.1 INTRODUCTION

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization [1]. This chapter focuses on big data on clouds (BDOC). In fact, an excellent overview of the state of the art and research challenges for the management of cloud computing enterprises is presented in Ref. [2]. Indeed, the main thesis of that paper is that heterogeneity and scale are the driving forces of many of the research challenges for the management of cloud computing systems. BDOC further exacerbates both the scale and heterogeneity of the resulting enterprises. It is the thesis of this chapter that hybrid management, involving disciplined and innovative site reliability engineering (SRE), is the enabling operations paradigm by which to successfully tackle these growing, emerging challenges. By hybrid management we mean a combination of an increasing level of automated, autonomic management and fully engaged, dynamic human SRE organizations. The SREs provide operations oversight, as well as develop increased automation and insight, in order to afford yet greater scale, heterogeneity, overall enterprise capability, and business performance.


The business appetite for big data continues to grow as cloud computing continues to emerge as the dynamic vessel by which to supply the ever-growing demand for ubiquitous online and mobile services. Social networks, technical computing, ever-growing global communities, and heterogeneous enterprises are the key drivers for ever-growing cloud computing systems. The successful management of these challenging global networks and computing resources is important to successful business performance and high quality-of-service delivery across the globe. This chapter articulates some of the success enablers for deploying BDOC, in the context of some historical perspectives and emerging global services. We consider cloud and mobile applications and complex heterogeneous enterprises, and discuss big data availability for several commercial providers. In addition, we offer some legal insights for successful deployment of BDOC. In particular, we highlight the emergence of hybrid BDOC management roles: development and operations (DevOps) and SRE. Last, we highlight science, technology, engineering, and mathematics (STEM) talent cultivation and engagement as an enabler of technical succession and future success for global BDOC enterprises.

15.2 HISTORICAL PERSPECTIVE AND STATE OF THE ART

This section covers the historical perspective and then discusses some existing solutions to technical challenges presented by BDOC.

Cloud computing evolved over time, as connectivity of computer networks steadily increased with the advent of the Internet. The Internet itself started in 1969 as an Advanced Research Projects Agency (ARPA, currently known as DARPA) research project at the University of California, Los Angeles (UCLA), which sought to connect computer systems and thus achieve greater availability [3]. The greatest invention that propelled the Internet from a research, e-mail, and FTP infrastructure to an everyday utility was the Mosaic Web browser [4], created at the University of Illinois in 1993. It enabled a plethora of new applications based on the higher connectivity and easier access via the Web browser. In fact, Zhang et al. [2] argue very well that most of the technologies that enable cloud computing are not new. It is the heterogeneity and scale of today's growing enterprises that call for innovative research into the successful management of BDOC.

15.2.1 From Application Service Provider to Cloud Computing

One of the initial services that emerged was the use of the Application Service Provider (ASP) business model. This model used the Web browser as the primary user interface, and the service provider would run the application on a server. The initial level of sophistication of these client-server architectures was low to moderate, and some of these applications are reviewed next.

15.2.1.1 E-mail, Search, and E-Commerce. E-mail is a service that existed in research environments from the 1970s, but did not become a common household technology until the 1990s. As simple as e-mail appears today, it took considerable effort by the Internet Engineering Task Force [5] to achieve the communication protocol standardization that enabled various platforms and operating systems to interoperate seamlessly using the Simple Mail Transfer Protocol (SMTP). The Internet governance model of rough consensus and running code [6] speaks volumes in terms of achieving interoperability over the Internet, as well as over BDOC. Things have to work, and work properly, for the end user, or the user will go elsewhere with a click of the mouse.

Search: Once Internet browsers became available, one of the earlier services offered was Internet search. Some of the early companies (AltaVista, etc.) are no longer in business in this very competitive space. It is now dominated by Google, Bing, and Yahoo, which together own over 90% of the search market [7]. Battelle gives an excellent review of the evolution of the search market; updated information can be found at Battelle's media blog [8].

Once search engines gained considerable capability, e-commerce was born. With the instant ability to identify merchandise items of interest with a click of a mouse, brick-and-mortar stores became obsolete, and an increasing volume of business moved to the cyber domain. In December 2013, both UPS and FedEx were overwhelmed by the volume of packages being shipped, and experienced significant delays during holiday deliveries.

Overall, these emerging trends, boosted by increasingly affordable computational power as well as by network bandwidth availability, brought about the concept of "the world is flat" [9].

15.2.1.2 Grid Computing, Open Grid, and Global Grid. Grid computing addresses loosely coupled computers, typically owned by different research organizations, which collaborate on various computing tasks. The management of such grids is looser, and grid computing middleware provides the interface for these tasks. Most of the computing performed by such grids is scientific and technical computing.

The Open/Global Grid Forum [10] is the global forum for international collaboration among researchers and scientists to develop the interoperable middleware that enables grid computing.

15.2.1.3 OpenStack. OpenStack [11] is the virtual organization that promotes the open standardization of cloud technologies. Since BDOC requires all components to interoperate efficiently for high-performance computing, it is critically important to maintain open interfaces, so that technologies developed by global collaborators can be well integrated and interoperate smoothly.

15.2.1.4 Apple iCloud, Yahoo, Google, Amazon, and DropBox. These commercial cloud service providers (CSPs) provide cloud services to the global consumer community. This is a fiercely competitive marketplace, and there are other entrants into this space with perhaps less name recognition, but novel capabilities and unique price performance. It is anticipated that considerable consolidation will continue to take place going forward. It is important to note that cloud providers who also offer content and other associated services are in a stronger position than pure storage and/or computing providers. For example, Apple provides access to iTunes and many apps, and Google provides Gmail, Maps, Docs, News, and dozens of other popular apps.


15.2.2 State of the Art and Available Technical Solutions

This subsection presents some recent technical capabilities that are available to the BDOC enterprise management community. These contributions are described here, and some of them are referenced throughout the chapter. It should be noted that among the challenges of BDOC are the heterogeneous nature of the hardware, software, and user demands, and the geographically distributed nature of both the cloud components and the user community. In fact, a very good overview of cloud computing is presented in Ref. [2]. Important areas identified for promising research include automated service provisioning, virtual machine (VM) migration, server consolidation, energy management, traffic management and analysis, software frameworks, storage technologies and data management, and novel cloud architectures. The discussion is based on the classical cloud architecture of a physical layer, an infrastructure-as-a-service (IaaS) layer, a platform-as-a-service (PaaS) layer, and finally the software-as-a-service (SaaS) layer that runs the app with which the end user interacts.

In this chapter, we chose to focus on service availability, data security, business considerations, SRE, and STEM talent considerations. For completeness, we mention here some recent research.

15.2.2.1 Performance Enhancement—Rhea: Automatic Filtering for Unstructured Cloud Storage. Microsoft Research developed performance enhancements to expedite the filtering of BDOC storage [12]. This technique helps co-locate data and processing whenever possible, thus enhancing performance. They have shown that this technology can expedite searches and reduce cost by 2x–13x.

15.2.2.2 Dynamic Service Placement in Geographically Distributed Clouds. This work enhances performance by developing dynamic algorithms based on game theory and control techniques [13]. The authors clearly demonstrate that such global optimizations work far better than local optimizations of the subsystems.

15.2.2.3 MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. This work, authored by Carnegie Mellon and Intel researchers, develops caching and cuckoo hashing schemes that enhance performance for read-mostly workloads [14]. This is an example of a specific strategy to enhance performance under a specific load pattern.
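To make the hashing idea concrete, the following is a small, self-contained Python sketch of cuckoo hashing in general. It is not MemC3's optimized, concurrent implementation; the table size, the two hash functions derived from Python's built-in hash(), and the eviction limit are all arbitrary choices made for illustration.

    # Minimal cuckoo hash table: each key has one candidate slot in each of two
    # tables; an insertion that finds both slots full evicts an occupant and
    # re-places it in that occupant's alternate table.
    class CuckooHash:
        def __init__(self, size=11):
            self.size = size
            self.tables = [[None] * size, [None] * size]

        def _index(self, which, key):
            # Two cheap hash functions derived from Python's built-in hash();
            # a production system would use stronger, independent functions.
            h = hash(key)
            return (h if which == 0 else h // self.size) % self.size

        def get(self, key):
            # A lookup probes at most two slots, one per table.
            for which in (0, 1):
                slot = self.tables[which][self._index(which, key)]
                if slot is not None and slot[0] == key:
                    return slot[1]
            return None

        def put(self, key, value, max_kicks=32):
            entry = (key, value)
            for which in (0, 1):  # overwrite if the key is already stored
                i = self._index(which, key)
                if self.tables[which][i] is not None and self.tables[which][i][0] == key:
                    self.tables[which][i] = entry
                    return
            which = 0
            for _ in range(max_kicks):
                i = self._index(which, entry[0])
                if self.tables[which][i] is None:
                    self.tables[which][i] = entry
                    return
                # Evict the occupant and try it in its other table next round.
                self.tables[which][i], entry = entry, self.tables[which][i]
                which ^= 1
            raise RuntimeError("eviction cycle; table should be resized and rehashed")

    cache = CuckooHash()
    cache.put("user:42", "profile-blob")
    assert cache.get("user:42") == "profile-blob"

The appeal for read-mostly workloads is that every lookup touches at most two slots, however much shuffling earlier insertions required.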

Additional references that are specific to the areas discussed in the forthcoming sections are embedded within those sections.

15.3 CLOUDS—SUPPLY AND DEMAND OF BIG DATA

The explosive growth in the prevalence of CSPs is driven by the plethora of online services, as well as by the growing communities that consume these services. This section reviews some of these trend-setting phenomena.


15.3.1 Social Networks

Social networks started their explosive commercial growth in the early 2000s with companies such as Facebook, Twitter, Google+, Tumblr, and others. These social networks grew at breakneck speed, and Facebook in 2014 is offering service to over a billion users worldwide. This kind of scale and heterogeneity is unprecedented, connecting some 18% of the world population across many types of servers and edge devices. Other social networks are growing rapidly, and the infrastructure needed to support them is indeed BDOC based.

15.3.2 Communities

Online communities exist in many areas. Some communities are social network based, others are based on professional activities and interests, and still others are based on hobbies, travel, and so on. Online communication has been steadily displacing paper-based communication since 1993. The most up-to-date professional publications are online publications. The same is true for many other types of information and expertise for many communities of interest.

15.3.3 New Business Models

The online communities that continue to expand present a business audience to many innovative companies. This new audience is used for advertisement and marketing campaigns. The business models for these e-business campaigns are quite novel and are rapidly taking market share from traditional advertisement media such as TV, radio, and newspapers. The TV advertising market is $70B, versus $50B for the online advertising market [15]. Hence, we are quickly approaching the tipping point where online advertising will overtake TV advertising. This trend is similar to other e-commerce trends, in that Internet business engagement is quickly overtaking traditional brick-and-mortar commerce. This will be discussed in further detail in the next section.

Additional details can also be found in Architecting the Enterprise via Big Data Analytics [16].

15.4 EMERGING BUSINESS APPLICATIONS

Cloud computing and the Internet have completely revolutionized the business world. The instant access to people, information, computing, and network resources indeed makes our world "flat." BDOC computing is the enabling resource that makes all this possible. This section will examine a number of dimensions of these emerging business models of BDOC and the successful management of these global resources.

15.4.1 Growing Global Enterprises

When one examines the global resources of some of the CSPs, it becomes clear that these enterprises exhibit an unprecedented scale, size, heterogeneity, and scope of operations.


Google has cloud data center facilities in places such as Finland, Oklahoma, Oregon, and South Carolina, to name only a few of its cloud hosting locations. Since these are energy-consuming behemoths, it makes sense to place them near energy sources, such as hydroelectric and geothermal locations. On the other hand, most of the talent managing this vast cloud is located where talent is concentrated, that is, near universities and major metropolitan centers, where Google has technical engineering offices. This will be discussed further in the "Site Reliability Engineers" section.

15.4.2 Technical Computing

Technical computing is not as big as business computing, but it did spearhead the development of the BDOC technologies that now enable vast business enterprises. Technical computing is mostly focused on scientific and technical tasks of high computational complexity. Examples include high-energy physics, oil field exploration and simulation, aerodynamic simulation, jet engine simulation, traffic simulation for transportation systems, discrete event simulation for networks and communication switches, electromagnetic field simulation, finite element structural analyses, finite difference fluid flow simulations, and so on. Overall, even though the sophistication of these technical disciplines is high, the scope of these activities is relatively small. They are done mostly by specific organizations using supercomputers or clusters, and the level of cloud computing utilization is not high. To their credit, many of these technical and research activities are crucial for the invention and development of novel technologies, including BDOC computing. Since the scope of business activities on BDOC computing is considerably larger, we focus our discussion on business applications.

15.4.3 Online Advertising

As indicated earlier, online advertising is the foundational business model and driving force of many of the innovative companies that experience very high growth. Companies such as Google, Facebook, Twitter, Tumblr, and others generate most of their revenue stream from targeted advertisement, which in turn generates sales for the advertisers and customers of the BDOC companies. The loss of any production application for any of these companies results in immediate and substantial revenue loss, as well as service interruption for the customers and users, which reduces satisfaction and in extreme cases can cause users and customers to shift their interest and investment of both time and money to competing online services.

15.4.3.1 E-Commerce. Some companies are dedicated to online commerce, or e-commerce. Examples that come to mind are Amazon, eBay, Google, and others. In addition to these exclusively online companies, many brick-and-mortar companies have established a successful online presence. Such companies include Walmart, Fry's, Best Buy, and others. Related infrastructure companies include shipping companies like FedEx and UPS, and other companies involved in supply chain management. The better the integration of the supply chain management companies and the e-commerce companies, the better the service, delivery speed, and ultimate customer experience. Some fulfillment centers of the e-commerce companies are collocated with shipping hubs of the shipping companies in order to expedite service. In addition, the BDOC enterprise systems of these collaborators enjoy considerable interoperability, such that customers can track packages from the e-commerce companies, and the companies are better able to predict, at the time of sale, the delivery times of their shippers.

15.4.4 Mobile Services

Mobile is big and growing bigger fast. In addition to the convenience of online access availability 24 hours a day, 7 days a week (24/7), mobile access can readily provide the geolocation of the mobile device, and hence the location of the end user. This information is extremely useful to the BDOC providers, as they can fine-tune advertisement placement for optimal sales and service capture. This dynamic capability makes the BDOC ASP more nimble and at the same time more complex to design, develop, and operate. On the positive side, the SRE team supporting a BDOC ASP can use mobile devices to support the smooth operation of the applications (apps).

15.4.5 Site Reliability Engineers

SREs are the professionals who work together with the development team to make sure that the BDOC app is up and running at all times. Having such strong collaboration between the development teams and the SRE teams is a novel BDOC app management approach. Google is one of the pioneers of this approach [17], but other application service providers soon followed. It is driven by the notion that the huge scale of a BDOC ASP requires very high reliability.

It is critically important to develop software that automates as much as possible of the site reliability engineering capability. In a sense, the successful SRE strives to "automate the human out of manual tasks." Achieving this enables the humans to focus on the creative aspects of site reliability engineering, by building intelligent tools that fix the system automatically as much as possible, or issue an alert/trouble ticket to a human SRE. In those exceptional cases, the SRE works the issues. If necessary, the SRE engages the development teams. In all cases, a post mortem is written, in order to trace all outages to root causes. Ultimately, it is the role of the SRE and the developers to fix the root cause.
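As a sketch of this "automate first, escalate second" idea, the following minimal Python loop probes a health endpoint, attempts an automated fix, and opens a ticket for a human SRE only if the fix does not help. The health URL, the restart action, and the ticketing call are hypothetical placeholders, not any particular provider's tooling.

    import time
    import urllib.request

    HEALTH_URL = "http://app.example.com/healthz"  # hypothetical health endpoint

    def healthy(url, timeout=5):
        """Return True if the service answers its health check with HTTP 200."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.getcode() == 200
        except OSError:
            return False

    def automated_fix():
        print("automated remediation: restarting service")  # placeholder action

    def open_ticket(message):
        print("alert/trouble ticket for a human SRE:", message)  # placeholder pager

    while True:
        if not healthy(HEALTH_URL):
            automated_fix()      # first, let the tooling try to recover
            time.sleep(10)       # give the service time to come back
            if not healthy(HEALTH_URL):
                open_ticket(f"automated fix did not restore {HEALTH_URL}")
        time.sleep(60)           # poll once a minute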

It is important to note that the SRE operates at a high abstraction and semantic level of the actual app or service. Unlike the traditional network operations center (NOC), which deals with links, packets, and nodes, the SRE is focused at the level of the BDOC app. Attention might need to be paid to lower layers such as IaaS, PaaS, and SaaS, but the end users and customers do not care about anything else, as long as their transparent services or apps are up. Hence, that is the focus of the SRE team. The SRE teams specialize in specific apps, and continually strive to improve their reliability and quality of service. As the scale and heterogeneity of the BDOC enterprise grow, more and more of the enterprise management services are automated. This allows the enterprise to continually grow in scope, heterogeneity, and capability. It is the role of the SRE team, the DevOps team, and the development team to work together on improving overall performance for the apps they are responsible for. Overall, the concept of "site" for the SRE does not mean a physical location or any of the layers above it. It means the actual Web-enabled service that supports the app, whether mobile or wired. It is all about providing quality service to the customer, and a growing revenue stream to the service provider.

With that in mind, the next section examines BDOC ASP service availability, and the following sections examine security and legal aspects associated with BDOC ASP apps and services. We then return to the role of the SRE, and offer strategies to grow the SRE team's availability and capabilities.

15.5 CLOUD AND SERVICE AVAILABILITY

Public cloud big data platform offerings [18] provide compelling pricing, outsourcing of support resources, and no capital budgeting. Thus, they are likely to be the dominant platform for big data computing. In this construct, the availability of a big data cloud-resident application is dependent on (i) uptime under normal circumstances, which we call operational availability, and (ii) disaster tolerance. The following subsections discuss each of these topics.

15.5.1 Operational Availability

The operational availability of the big data cloud application is dependent on

a. the likelihood that computing resources will be available upon demand,

b. the availability of communication networks for the transfer of data and results to and from the cloud application,

c. the probability of the platform and infrastructure resources of the CSP being operational throughout the data analysis operation,

d. the probability of successful operation of the big data application itself.

Operational availability requires that all of these conditions be met. This can be represented mathematically as follows:

A_op = A_a A_b A_c A_d    (15.1)

where A_op is the operational availability and the terms on the right-hand side correspond to the four points listed earlier. For the purposes of quantitative prediction, availabilities A_a, A_b, and A_c are determined by the service-level agreements (SLAs) of the platform and Internet service providers. However, a method for computing A_a based on stochastic Petri nets (SPNs) was documented by Khazaei et al. [19]; a model for computing A_c was described by Longo et al. [20]. Both models were developed from the perspective of the service provider rather than the data owner.
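As a quick worked illustration of Equation 15.1, the following Python sketch multiplies four availability figures and converts the result into expected downtime per year. The four numbers are assumptions invented for the example, not published SLA values.

    # Illustrative availabilities; the four factors mirror items (a) through (d).
    A_a = 0.9995   # computing resources available on demand (assumed)
    A_b = 0.999    # communication network availability (assumed)
    A_c = 0.9999   # CSP platform/infrastructure availability (assumed)
    A_d = 0.998    # the big data application itself (assumed)

    A_op = A_a * A_b * A_c * A_d            # Equation 15.1
    downtime_hours = (1 - A_op) * 365 * 24  # expected unavailability per year

    print(f"A_op = {A_op:.6f}")                                    # ~0.996404
    print(f"expected downtime = {downtime_hours:.1f} hours/year")  # ~31.5

Note that the weakest factor dominates: even perfect platform availability cannot compensate for an unreliable application.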


One availability issue of big data implementations on cloud computing arises from network bottlenecks, for which several solutions such as Camdoop [20] and FlowComb [21] have been proposed. A second can arise from high workloads imposed by a large number of users, for which one reported effective solution is the large-scale implementation of memcached at Facebook [22]. Another arises from the actual size of the stored data which, by virtue of its volume, adds to the likelihood of failure. An example of an approach to address this problem is Scalus for HBase [23]. Other failure causes in cloud computing platforms are no different than for other computing systems: hardware failures, software programming errors, data errors, network errors, system power failures, application protocol errors, procedural errors, and redundancy management. Architectures and system management practices for maintaining uptimes are well known and documented elsewhere [24].

Performing large-scale computation is difficult. Working with this volume of data requires distributing parts of the problem to multiple machines to handle in parallel, using approaches such as MapReduce. A MapReduce program consists of four functions: map, reduce, combiner, and partition. The input data are split into chunks, with approximately one chunk stored per server. Usually a chunk is no larger than 64 MB, which increases parallelism and improves performance if tasks need to be rerun [25]. A minimal single-process simulation of this four-function structure is sketched below.
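The sketch is a minimal, self-contained Python simulation of the four-function MapReduce structure applied to word counting. It illustrates the programming model only; a real framework such as Hadoop distributes the map and reduce work across machines.

    from collections import defaultdict

    def map_fn(line):
        """map: turn one input record into (key, value) pairs."""
        return [(word.lower(), 1) for word in line.split()]

    def combiner_fn(pairs):
        """combiner: pre-aggregate map output locally to cut shuffle traffic."""
        local = defaultdict(int)
        for key, value in pairs:
            local[key] += value
        return local.items()

    def partition_fn(key, num_reducers):
        """partition: route each key to exactly one reducer."""
        return hash(key) % num_reducers

    def reduce_fn(key, values):
        """reduce: aggregate all values observed for one key."""
        return key, sum(values)

    def run_job(lines, num_reducers=2):
        buckets = [defaultdict(list) for _ in range(num_reducers)]
        for line in lines:  # "map phase": here, one input chunk per line
            for key, value in combiner_fn(map_fn(line)):
                buckets[partition_fn(key, num_reducers)][key].append(value)
        results = {}
        for bucket in buckets:  # "reduce phase"
            for key, values in bucket.items():
                k, v = reduce_fn(key, values)
                results[k] = v
        return results

    print(run_job(["big data on clouds", "big clouds"]))
    # -> {'big': 2, 'data': 1, 'on': 1, 'clouds': 2}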

As the number of machines used in cooperation with one another increases, the probability of failures rises. Big data platforms handle such failures through various mechanisms; the following description for Hadoop is illustrative [1].

The failure detection and recovery scheme of Hadoop is based on three entities: tasks, the tasktracker (which monitors tasks), and the jobtracker (which monitors jobs). When the jobtracker is notified of a task attempt that has failed (by the tasktracker's heartbeat call or a runtime exception), it will reschedule execution of the task. The jobtracker will try to avoid rescheduling the task on a tasktracker where it has previously failed. Furthermore, if a task fails four times (or more), it will not be retried further. This value is configurable: the maximum number of attempts to run a task is controlled by the mapred.map.max.attempts property for map tasks and mapred.reduce.max.attempts for reduce tasks. By default, if any task fails four times (or whatever the maximum number of attempts is configured to be), the whole job fails.

Child task failures are detected either through the absence of heartbeats or through runtime exceptions. If a child task throws a runtime exception (whether due to user code in a map task or a reduce task exception), the child JVM reports the error back to its parent tasktracker before it exits. The error ultimately makes it into the user logs. The tasktracker marks the task attempt as failed, freeing up a slot to run another task. In Streaming tasks, if the Streaming process exits with a nonzero exit code, it is marked as failed. Another failure mode is the sudden exit of the child JVM; perhaps there is a JVM bug that causes the JVM to exit under a particular set of circumstances exposed by the MapReduce user code. In this case, the tasktracker notices that the process has exited and marks the attempt as failed.

Hanging task failures are detected through the absence of a progress update for a period of time, after which the task is marked as failed. If a tasktracker has not received an update after an expiration period, the child JVM process is killed. The timeout period is normally 10 minutes and can be configured on a per-job basis (or a cluster basis) by setting the mapred.task.timeout property to a value in milliseconds. Setting the timeout to a value of zero disables the timeout. This measure should be avoided, because the hanging slot will not be freed; over time, there may be cluster slowdown as a result.

The maximum percentage of tasks that are allowed to fail without triggering job failure can be set for the job. Map tasks and reduce tasks are controlled independently, using the mapred.max.map.failures.percent and mapred.max.reduce.failures.percent properties.
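Putting these knobs together, the following hedged Python sketch assembles a classic Hadoop Streaming command line that sets the failure-handling properties discussed above through generic -D options. The jar path, input and output paths, and the mapper and reducer scripts are placeholders.

    import shlex
    import subprocess

    # Failure-handling properties named in the text (classic "mapred.*" API).
    JOBCONF = {
        "mapred.map.max.attempts": 4,           # retries per map task
        "mapred.reduce.max.attempts": 4,        # retries per reduce task
        "mapred.task.timeout": 600000,          # hang timeout: 10 min, in ms
        "mapred.max.map.failures.percent": 5,   # tolerate 5% failed map tasks
    }

    cmd = ["hadoop", "jar", "/path/to/hadoop-streaming.jar"]  # placeholder path
    for key, value in JOBCONF.items():          # generic -D options go first
        cmd += ["-D", f"{key}={value}"]
    cmd += ["-input", "/data/in", "-output", "/data/out",
            "-mapper", "mapper.py", "-reducer", "reducer.py"]

    print(" ".join(shlex.quote(part) for part in cmd))  # inspect before running
    # subprocess.run(cmd, check=True)  # uncomment to actually submit the job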

A task attempt may also be killed because it is a speculative duplicate, or because the tasktracker it was running on failed and the jobtracker marked all the task attempts running on it as killed. Killed task attempts do not count against the number of attempts to run the task (as set by mapred.map.max.attempts and mapred.reduce.max.attempts), since it was not the task's fault that an attempt was killed. Users may also kill jobs or fail task attempts using the Web UI or the command line.

If a tasktracker fails by crashing or by running very slowly, it will stop sending heartbeats to the jobtracker (or send them very infrequently). The jobtracker detects a tasktracker failure through a timeout (the default is 10 minutes, configured via the mapred.tasktracker.expiry.interval property, in milliseconds) and removes it from its pool of tasktrackers to schedule tasks on. The jobtracker arranges for map tasks that were run and completed successfully on that tasktracker to be rerun if they belong to incomplete jobs, since their intermediate output residing on the failed tasktracker's local file system may not be accessible to the reduce tasks. Any tasks in progress are also rescheduled. A tasktracker can also be blacklisted by the jobtracker, even if the tasktracker has not failed. A tasktracker is blacklisted if the number of tasks that have failed on it is significantly higher than the average task failure rate on the cluster. Blacklisted tasktrackers can be restarted to remove them from the jobtracker's blacklist. Failure of the jobtracker is the most serious failure mode. Currently, Hadoop has no mechanism for dealing with failure of the jobtracker (it is a single point of failure), so in this case the job fails. However, this failure mode has a low chance of occurring, since the chance of a particular machine failing is low.

15.5.2 Disaster Tolerance

Disaster tolerance, also referred to as business continuity, addresses measures to resume operations after damage from "force majeure" events such as fire, flood, atmospheric electrical discharge, solar-induced geomagnetic storms, wind, earthquake, tsunami, explosion, nuclear accident, volcanic activity, biological hazard, civil unrest, mudslide, and tectonic activity. These events are generally out of scope of the availability considerations described earlier; replication or restart will not be effective if the physical building housing the cloud data center is flooded.

A necessary condition for disaster tolerance is a partial or complete replication of the data and system resources in an alternate geographical location that is sufficiently distant that it is unlikely to be affected by the event that damaged or destroyed the primary location. Disaster tolerance requires planning. Such planning includes procedures for establishing organizational contacts for the purposes of decision making, activating the remote site (if it is not already activated), transferring the most recent data (if possible and if not already at the remote site), and changing IP addresses at the appropriate routers.

The value of business continuity depends on the impact of the loss of the analysis function to the enterprise. For example, disruption of continuously running big data operations that are business critical (e.g., fraud detection, information system log monitoring, click stream monitoring, or weather prediction) can have a high organizational impact and justify considerable expenditures.

The key performance metrics in business continuity are the time needed to recover and resume, and the amount of tolerable degradation in service once operations are resumed at the alternate location; the more stringent the requirement for either metric, the greater the cost. The business case for business continuity measures is that the cost of these measures (both initial nonrecurring and recurring) is less than the expected value of the damage or impact of the loss of operations. This condition can be summarized by the following equation:

∑_j [C_j(t_r) + NPV[R_j(t_r)]] ≤ ∑_i p_i NPV[D_i(t_r)]    (15.2)

where C_j is the capital (nonrecurring) cost of the jth continuity measure, t_r is the resumption time associated with that capability, NPV is the net present value function, R_j is the recurring cost of the jth continuity measure over the time period under consideration, p_i is the probability of the ith disaster or business continuity loss event (cumulative over the entire time period under consideration), and D_i is the economic value of the damage or impact to the enterprise associated with the ith disaster or business continuity loss event, which is itself a function of the resumption time t_r.

The assumptions of this equation are as follows:

1. Transition from the primary to the alternate data center occurs with 100% success.

2. Downtime associated with this transition has insignificant cost (or, in the alternative, occurs in zero time).

A more complete model that relaxes these assumptions has been created [26].

The left-hand side of this equation (the recurring and nonrecurring costs of business continuity measures) includes not only the costs of the primary resources necessary for resumption but also dependencies such as processes, applications, business partners, and third-party service providers. In many cases, the left-hand side might simply be the cost of creating additional replicas in other geographically diverse data centers and establishing a periodic data update procedure.

The right-hand side (the expected value of the loss) includes both the probability of the event and the economic impact of the loss. The probability of disruptive events depends on the geographic location of the CSP and the physical measures it undertakes to protect the facility. These probabilities can be reduced by avoiding locations subject to high-probability environmental risks, implementing strong security measures, and housing centers in structures most able to withstand flood, earthquake, winds, and other environmental forces. The economic value of the impacts resulting from planned or unplanned disruptions depends on the duration of the disruption and may also vary over time (e.g., credit card fraud detection during peak shopping seasons). If the alternative site offers less than full-service capability, the value of the loss of functionality in degraded mode also needs to be considered.
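As a small worked illustration of Equation 15.2, the following Python sketch compares the cost of one hypothetical continuity measure (a geographically remote replica) against the expected loss from two disaster events. Every number here is an assumption invented for demonstration.

    def npv(annual_amount, years, discount_rate=0.05):
        """Net present value of a constant annual amount over a horizon."""
        return sum(annual_amount / (1 + discount_rate) ** t
                   for t in range(1, years + 1))

    YEARS = 5

    # Left-hand side: one continuity measure, a geographically remote replica.
    capital_cost = 200_000          # C_j: nonrecurring setup cost (assumed)
    recurring = npv(50_000, YEARS)  # NPV[R_j]: yearly replication cost (assumed)
    lhs = capital_cost + recurring

    # Right-hand side: probability-weighted NPV of the damage from each event.
    events = [
        (0.02, 5_000_000),   # p_i, annualized D_i: flood destroys primary site
        (0.05, 1_500_000),   # p_i, annualized D_i: extended regional outage
    ]
    rhs = sum(p * npv(damage, YEARS) for p, damage in events)

    print(f"cost of continuity measures (LHS): ${lhs:,.0f}")
    print(f"expected loss without them (RHS):  ${rhs:,.0f}")
    print("measure justified" if lhs <= rhs else "measure not justified")

With these assumed figures, the measure costs roughly $416,000 against an expected loss of about $758,000, so the replica would be justified; different probabilities or damage values can easily flip the conclusion.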

Next we discuss a number of security issues that affect BDOC.

15.6 BDOC SECURITY ISSUES

The importance of data security is related to the consequences of loss of integrity, availability, or confidentiality of the data. For example, Hadoop provides no security model, nor safeguards against maliciously inserted data; it cannot detect a man-in-the-middle attack between nodes [27]. If the data used in, or the results of, the big data application are sensitive, cloud computing should be approached carefully, with due consideration of that sensitivity. The cloud used in the big data deployment might be entirely under the control of the customer, utilize the platform of the cloud provider, or use a cloud implementation of a service entirely. Thus, any of the three NIST models of cloud services (IaaS, PaaS, or SaaS) might apply. The implementation might reside on an organization's own cloud (internal cloud), a cloud provided by a third-party provider (external cloud), or a combined cloud (hybrid cloud).

If the cloud is entirely under organizational control (a private cloud), security concerns, assurance processes, and practices are defined by general IT security guidelines and standards [28, 29] as well as domain-specific standards [30–32]. However, public cloud big data platform offerings [18] provide compelling pricing, outsourcing of support resources, and no capital budgeting; thus, they are likely to be the dominant platform for big data computing. We will therefore assume for the remainder of this section that there are two separate organizational entities: the customer, also known as the tenant, which owns the big data and the associated analytical applications and VM templates, and the CSP, which provides the hardware and software platform upon which the customer's applications run. A public Internet network cloud is used to move data to the CSP and return results to the big data owner.

The "outsourcing" of the computing platform to a multitenant cloud provider, away from resources owned and controlled by the data owner, represents a significant paradigm shift from the conventional norms of an organizational data center to an infrastructure without an organizationally controlled security perimeter, thereby more open to exploitation by potential adversaries.

15.6.1 Threats and Vulnerabilities

The security challenges of big data on a multitenant CSP are formidable. General classes of vulnerabilities in the cloud computing platforms used in big data processing include the following [33, 34]:

• Session riding and hijacking: Web application technologies must overcome the problem that, by design, the HTTP protocol is a stateless protocol, whereas Web applications require some notion of session state. Many techniques implement session handling, and these are vulnerable to session riding and session hijacking [29].

• Erosion of encryption algorithms: Encryption is currently relied upon as the primary defense against data breaches [35]. However, technical advances as well as faster processors are rendering an increasing number of cryptographic mechanisms less secure as novel methods of breaking them are discovered. In addition, flaws exist in cryptographic algorithm implementations, which can turn strong encryption into weak encryption (or sometimes no encryption at all). For example, cryptographic vulnerabilities might exist if the abstraction layer between the hardware and the OS kernel has flawed mechanisms for tapping the entropy source used for random number generation, or several VM environments on the same host might exhaust the available entropy, leading to weak random number generation [29].

• Limited system monitoring: CSPs offer limited system monitoring for the purposes of performance and availability monitoring, but do not provide the complete network traffic monitoring, logging, and intrusion detection that is often used in internal information system installations [27].

• Configuration management and control: One of the most common vulnerabilities in IT systems is incomplete change and configuration management, with resultant outdated and incomplete system documentation. Organizational policies for configuration management and control (including system documentation) may be in place for the internal IT system, but they may be difficult or impossible to enforce on the external CSP.

• Inability to sanitize storage media: Policies for disk reformatting, degaussing, or even destruction that might be in place to prevent malware propagation or to mitigate data spillage cannot be readily applied or enforced in public clouds, where the hardware is under the control of the service provider [27, 29].

Cloud-based big data installations have all the weaknesses of standard IT installations. In addition, there are unique weaknesses, including the following:

• Breaching tenant boundaries: An attacker might successfully escape from the boundaries on which public clouds rely to separate tenants, including VMs [36], big data services, database management systems, and communication infrastructures [37]. In one year, VMware released 14 security advisories [38]. IBM found that more than 50% of the 80 VM vulnerabilities it found in 2010 could compromise the administrative VM or lead to hypervisor escapes [39].

• Vulnerability to disaffected insiders: Disaffected insiders are not unique to cloud computing; they are a threat to any organization. However, the damage they can cause is considerable, even if they are unable to defeat the account control and access privileges of the infrastructure itself. For example, in February 2011, a terminated IT administrator at a pharmaceutical company used a service account to access an unauthorized installation of VMware vSphere and delete 88 virtual servers [40]. While this incident involved a private cloud, it could also affect a public cloud and could result in the loss of massive amounts of data or the deletion of machine images that contain specific configuration, encryption key, or other information that could be difficult to replace.

• User authentication defects: Many widely used authentication mechanisms are weak. For example, usernames and passwords can be compromised by insecure user behavior (weak passwords, reused passwords) or by the inherent limitations of one-factor authentication, even when encryption is used for the remote login process. Use of multifactor authentication and role-based access by means of LDAP or Active Directory can be complicated by the need to maintain multiple account files in different servers, because the organizational private directory servers cannot be integrated into the public cloud infrastructure.

• Configuration stability of VM environments: Cloud elasticity and metering, as well as live migration, help organizations harness the power of virtualization, and they make the processing environment extremely dynamic.

• Propagation of flawed or vulnerable VM templates: Vulnerable VM template images cause OS or application vulnerabilities to spread over many systems. An attacker might be able to analyze configuration, patch level, and code in detail, using administrative rights obtained by renting a virtual server as a service customer, and thereby gain knowledge helpful in attacking other customers' images. Other attacks can use side channels from a co-resident VM [18, 41]. Specialized big data implementations adopted for their purported superior properties might be taken from an untrustworthy source and might have been manipulated so as to provide back-door access for an attacker. Data leakage by VM replication is a vulnerability that is also rooted in the use of cloning for providing on-demand service. Cloning leads to data leakage problems regarding machine secrets: certain elements of an OS, such as host keys and cryptographic salt values, are meant to be private to a single host. Cloning can violate this privacy assumption [29].

• Presence of large amounts of unencrypted data: Big data that needs to be decrypted for processing by a framework such as Hadoop is exposed while the analysis is being done, before the results are transferred back into some traditional data warehouse or business intelligence framework. A capable attacker would probably not move or destroy thousands of terabytes of data, so as to avoid detection. However, such data would be attractive to an attacker looking for patterns that match a credit card number or a Social Security number, regardless of the data's size [42].

15.6.2 Mitigation Approaches

Mitigating the risks and addressing the threats defined above involves various approaches and tasks, including the following:

• security planning

• ensuring network security and interoperability


• addressing the unique security strengths and weaknesses of VM infrastructures in cloud computing

• ensuring tenant separation

• identity and access management

These measures are discussed in the following subsections.

15.6.2.1 Security Planning and Risk Assessment for Big Data Processing on Cloud Computing Platforms. Planning is necessary to address these and other threats and to maximize the security of the computing environment. Risk assessment is necessary to weigh the cost of implementing this security against the sensitivity of the data. Factors influencing planning include organizational policies (including commitments to conform to specific security standards), contractual commitments on confidentiality and nondisclosure, and legal requirements with respect to privacy and security. These should be documented in an Information Security Management Plan (or an equivalent document). Among the aspects of the plan that affect the use of big data platforms on clouds are the following [43]:

◦ Risk management

◦ Security policy

◦ Organization of information security and incident response

◦ Information asset management

◦ Communications and operations management

◦ Access control

◦ Information systems acquisition, development, and maintenance.

The products of such planning would include specific requirements and conformance criteria, contractual requirements on the service providers, design and configuration measures to be undertaken by the cloud users, and processes and procedures.

15.6.2.2 Ensuring Networking Communications Security and Interoperability. The CSP and the data owner share responsibility for the security of big data while it is being transferred from the data owner to the cloud. The CSP also has additional responsibilities to monitor and safeguard its network perimeter and to prevent the introduction of rogue devices into its facility. The following are relevant practices and procedures:

1. Standardized network protocols: The CSP should provide secure (e.g., non-cleartext and authenticated) standardized network protocols for the import and export of data and for managing the service. Documentation describing these protocols should be sufficient for the data owner organization to create and configure network links appropriately.


2. Network and system monitoring: The CSP network environments and virtual instances should be designed and configured to control (and restrict, if necessary) network traffic, should be reviewed at planned intervals, and should be supported by documented business justification for the use of all allowed services, protocols, and ports, including rationale or compensating controls implemented for those protocols considered to be insecure.

3. Response to attacks: The cloud service provider should have the capability to detect attacks (e.g., through deep packet analysis and anomalous ingress or egress traffic patterns) and to defend the perimeter (e.g., by traffic throttling and packet black-holing), for detection of and timely response to network-based attacks (e.g., MAC spoofing, ARP poisoning, and distributed denial-of-service or DDoS attacks).

4. Documentation: Network architecture diagrams must clearly identify high-risk environments and data flows that may have legal, statutory, and regulatory compliance impacts.

5. Configuration of boundary devices: Policies and procedures shall be established, and supporting business processes and technical measures implemented, to protect environments, including the following:

◦ Perimeter firewalls implemented and configured to restrict unauthorized traffic,

◦ Security settings enabled with strong encryption for authentication and transmission, replacing vendor default settings (e.g., encryption keys, passwords, and SNMP community strings),

◦ User access to network devices restricted to authorized personnel,

◦ The capability to detect the presence of unauthorized (rogue) network devices for a timely disconnect from the network.

15.6.2.3 Addressing VM Security. The VM infrastructure within a cloud installation consists of executable software and, as such, provides an additional attack surface. There are additional infrastructure and management layers to protect, as well as the hypervisor itself. Virtual systems are not unique in this regard; they are just as vulnerable as any other system running code. If it runs code, someone can compromise it. NIST's Guide to Security for Full Virtualization Technologies provides vendor-agnostic guidance on securing virtual environments [44]. The CSP is responsible for configuration, monitoring, and control of the virtual infrastructure. However, depending on the sensitivity of the data, ensuring that the safeguards are appropriate to the risk is the responsibility of the data owner. The following are specific requirements generated by the Cloud Security Alliance [34]:

1. The CSP shall inform the data owner (tenant) of the policies, procedures, supporting business processes, and technical measures implemented for timely detection of vulnerabilities within organizationally owned or managed (physical and virtual) applications and infrastructure network and system components, applying a risk-based model for prioritizing remediation through change-controlled, vendor-supplied patches, configuration changes, or secure software development for the organization's own software.

2. The provider should use well-known virtualization platforms and standard virtualization formats (e.g., OVF) to help ensure interoperability. Customized changes made to any hypervisor should be available for data owner (tenant) review.

3. The provider should ensure the integrity of all virtual machine images. Any changes made to virtual machine images must be logged and an alert raised, regardless of the images' running state (e.g., dormant, off, or running). The results of a change to, or move of, an image, and the subsequent validation of the image's integrity, should be reported immediately to data owners.

4. Each operating system should be hardened to provide only the necessary ports, protocols, and services to meet business needs.

5. Virtual machines should include antivirus, file integrity monitoring, and logging as part of their baseline operating build standard or template.

15.6.2.4 Ensuring Tenant Separation. Segregation of tenants in data centers owned or managed by the CSP is a concern affecting (physical and virtual) applications and infrastructure system and network components. These should be designed, developed, deployed, and configured such that provider and data owner (tenant) user access is appropriately segmented from that of other tenant users, based on the following considerations:

• Established policies and procedures

• Isolation of business-critical assets and/or sensitive user data and sessions that mandate stronger internal controls and high levels of assurance

• Compliance with legal, statutory, and regulatory compliance obligations

15.6.2.5 Identity and Access Management. Identity and access management is a joint concern between the CSP and the big data user. The data owner is responsible for the identity and access management of its cloud users (or of attackers who have misappropriated its credentials), whereas the CSP is responsible for regulating the access of its own staff, as well as of other tenants, to system infrastructure, monitoring, and configuration resources, and to log data that could be misused by an attacker.

Policies, procedures, supporting business processes, and technical measures must be established and documented to ensure appropriate identity, entitlement, and access management for (i) data owner (tenant) users to their data and applications and (ii) service provider staff to their owned or managed (physical and virtual) application interfaces and infrastructure network and system components. These policies, procedures, processes, and measures should address the following [34] (a minimal access-check sketch follows the list):

• Roles and responsibilities for provisioning and de-provisioning user account entitlements following the rule of least privilege, based on job function;

Page 400: Cloud Services, Networking, and Management

“9780471697558c15” — 2015/3/20 — 12:17 — page 378 — #18

378 BIG DATA ON CLOUDS (BDOC)

• Criteria for higher levels of assurance and multifactor authentication secrets (e.g., management interfaces, key generation, remote access, segregation of duties, emergency access, large-scale provisioning or geographically distributed deployments, and personnel redundancy for critical systems);

• Access segmentation to sessions and data in multitenant architectures by any third party (e.g., the provider and/or other customers (tenants));

• Identity trust verification and service-to-service application (API) and information processing interoperability (e.g., single sign-on (SSO) and federation);

• Account credential lifecycle management from instantiation through revocation;

• Account credential and/or identity store minimization or reuse when feasible;

• Authentication, authorization, and accounting (AAA) rules for access to data and sessions (e.g., encryption and strong/multifactor, time-limited, nonshared authentication secrets);

• Permissions and supporting capabilities for customer (tenant) controls over AAA rules for access to data and sessions;

• Adherence to applicable legal, statutory, or regulatory compliance requirements;

• Access to, and use of, audit tools that interact with the organization's information systems shall be appropriately segmented and restricted to prevent compromise and misuse of log data;

• User access to diagnostic and configuration ports shall be restricted to authorized individuals and applications;

• Management of identity information about every person who accesses IT infrastructure, and determination of their level of access;

• Control of access to network resources based on user identity;

• Control of user access based on defined segregation of duties, to address business risks associated with a user-role conflict of interest.
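As promised above, here is a minimal Python illustration of the least-privilege idea behind several of these bullets: roles grant only the permissions a job function needs, and every request is checked against the caller's role. The role names and permission strings are hypothetical, not drawn from any real system.

    # Least-privilege check: each role grants only the permissions its job
    # function needs; every request is evaluated against the caller's role.
    ROLE_PERMISSIONS = {
        "tenant_analyst": {"data:read", "job:submit"},
        "tenant_admin":   {"data:read", "data:write", "job:submit", "user:provision"},
        "csp_operator":   {"infra:monitor", "infra:configure"},  # no tenant data
    }

    def authorize(role, permission):
        """Return True only if the role explicitly grants the permission."""
        return permission in ROLE_PERMISSIONS.get(role, set())

    assert authorize("tenant_analyst", "data:read")
    assert not authorize("csp_operator", "data:read")  # CSP staff: no tenant data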

15.6.2.6 Data Security. Defenses such as data encryption and access control are essential because (i) systems that collect sensitive data such as consumer information are attractive targets; (ii) they may be required by contractual terms and laws; and (iii) release of such data may significantly damage the organization. Policies, procedures, business processes, and technical measures should be defined and implemented for the following:

• Use of encryption protocols for the protection of sensitive data in storage and in transmission (a minimal client-side encryption sketch follows this list). These measures have to balance the need for security against the overhead of decryption and re-encryption as the data are processed in frameworks such as Hadoop.

• Key management and usage. Keys should not be stored at the CSP, but should be maintained by the cloud consumer or a trusted key management provider. Key management and key usage should be separated, but the extent of this separation requires a balance between throughput and security.


• Access control, and input and output integrity routines (i.e., reconciliation and edit checks), to detect manual or systematic processing errors, corruption of data, or misuse.

• Labeling, handling, and the security of data and objects that contain data. Mechanisms for label inheritance shall be implemented for objects that act as aggregate containers for data.

• Data exchanged between one or more system interfaces, particularly when affected by legal, statutory, and regulatory compliance obligations.
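As one concrete interpretation of the first two bullets, the following hedged Python sketch (using the widely available third-party cryptography package) encrypts data on the tenant side before upload, so the key never leaves the data owner. The payload and the upload step are placeholders.

    from cryptography.fernet import Fernet  # third-party: pip install cryptography

    # The data owner generates and retains the key; it is never sent to the CSP.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    plaintext = b"sensitive consumer record"   # placeholder payload
    ciphertext = cipher.encrypt(plaintext)     # encrypt before any upload

    # upload_to_csp(ciphertext)  # hypothetical call; only ciphertext leaves

    # On retrieval, the owner decrypts locally with the retained key.
    assert cipher.decrypt(ciphertext) == plaintext

The trade-off discussed in the first bullet appears here directly: data protected this way must be decrypted before a framework such as Hadoop can process it, which adds overhead and a window of exposure.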

15.6.3 Incident Response

Incident response in the event of a breach of a big data application and store on a remote cloud provider's platform is more complicated than a breach within a single organization, because of organizational boundaries (and the resultant non-aligned interests), contractual issues, and segregated system management structures. These must be planned for in advance and documented in an incident response plan. Such plans vary by industry and by circumstance. A general example is available from the American Bar Association [45].

Issues that must be addressed are as follows:

• Points of contact in both the CSP and the data owner organizations for applicable regulatory authorities, national and local law enforcement, and other legal jurisdictional authorities, for compliance issues and for being prepared for a forensic investigation requiring rapid engagement with law enforcement.

• Policies, procedures, business processes, and technical measures to triage security-related events and ensure timely and thorough incident management. Coordination between the data owner and the CSP to agree on the priority and severity of breaches is necessary, particularly if the breach affects multiple tenants at the CSP site.

• Coordination on follow-up actions and investigations after an information security incident that requires legal action, including preservation of evidence, documentation of chains of custody, and other actions needed to support potential legal action, subject to the relevant jurisdiction.

The next section will cover legal aspects of BDOC.

15.7 BDOC LEGAL ISSUES

The information in this section is for informational purposes only and not for the purpose of providing legal advice. You should contact your attorney to obtain advice with respect to any particular issue or problem.

The two central legal issues are (1) management of the risks and liabilities of the data owner for the data it relegates to the CSP, which relate to privacy and governance, and (2) the contractual terms and provisions (including the SLA) between the service provider and the data owner. The following subsections discuss these issues.

15.7.1 Privacy and Governance

The privacy of data is a large concern, and one that increases in the context of big data. Perhaps the most obvious example is location data [46], but others include health, browsing, or purchasing activity [47]. Big data privacy and governance concerns fall primarily on the data owner but affect the relationship with the CSP. As such, there are both internal organizational issues and supplier contractual issues that need to be considered. This section focuses on the legal responsibilities and potential liabilities of the data owner.

The data owner should evaluate its data and assess the risks of holding massive amounts of data in order to determine its approach to moving data onto cloud platforms. The general approach to this task is to:

• Enumerate its legal, statutory, and regulatory compliance obligations associated with (and mapped to) the sites where its data are stored. Examples of such data include consumer data that would be affected by state privacy breach laws, health data that would be affected by federal HIPAA privacy regulations, and credit card data whose storage and security requirements are governed by the Payment Card Industry Data Security Standard (PCI DSS). Some of these obligations will affect whether the organization can allow its data to be processed externally; others may affect its contractual requirements with the CSP.

• Classify its data, and the objects containing that data, based on data type, jurisdiction of origin, jurisdiction domiciled, context, legal constraints, contractual constraints, value, sensitivity, criticality to the organization, third-party obligations for retention, and prevention of unauthorized disclosure or misuse.
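One way to operationalize these two steps is to keep the classification as data that policy code can consult before any data set is allowed onto a cloud platform. The sketch below is a minimal illustration under assumed classes, obligations, and jurisdictions; none of the mappings is a statement of what any law actually permits.

```python
# Hypothetical classification table: data class -> compliance obligations
# and the storage jurisdictions in which that class may reside.
CLASSIFICATION = {
    "consumer_pii": {"obligations": ["state breach laws"], "allowed": ["US"]},
    "health":       {"obligations": ["HIPAA"],             "allowed": ["US"]},
    "payment_card": {"obligations": ["PCI DSS"],           "allowed": ["US", "EU"]},
}

def may_migrate(data_class, csp_jurisdiction):
    """Gate check: may this class of data be hosted at this CSP location?"""
    entry = CLASSIFICATION.get(data_class)
    if entry is None:
        return False, ["unclassified data may not leave the organization"]
    if csp_jurisdiction not in entry["allowed"]:
        return False, [f"obligation '{o}' blocks storage in {csp_jurisdiction}"
                       for o in entry["obligations"]]
    return True, entry["obligations"]

print(may_migrate("health", "EU"))        # blocked under the assumed policy
print(may_migrate("payment_card", "EU"))  # allowed, with PCI DSS obligations
```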

The data owner should establish policies and procedures, together with records of performance and compliance, to assert a defense against claims of negligence and nonconformance by governmental agencies, contracting parties, or individuals affected by data breaches.

With this insight, the data owner can establish information security requirements for prospective service providers. Compliance with security baseline requirements must be reassessed at least annually, unless an alternate frequency has been established and authorized based on business need. Higher levels of assurance are required for the protection, retention, and lifecycle management of audit logs and other evidence of statutory or regulatory compliance obligations.

15.7.2 Considerations for Contracting with a CSP

The most common form of contracting with a CSP is an agreement with a standard, nonnegotiable set of terms and conditions. These terms and conditions in general promise a best-effort level of service and security but offer no guarantees (other than a de minimis refund or discount) and disclaim responsibility for legal consequences and damages from both outages and data breaches. The benefit of these "take it or leave it" arrangements is low cost of use; if the legal risks and liabilities of outages or data breaches are low, the terms are quite appropriate. However, for applications where these risks are significant, a negotiated contract with terms and conditions tailored to the needs of the data owner is appropriate. This section discusses those considerations from the perspective of the data owner.

15.7.2.1 General Contractual Terms. Contractual agreements between providers and customers (tenants) should specify at least the following mutually agreed-upon provisions and terms:

• Parties: the contract must specify the following:

° Service providers (there may be more than one, depending on whether the analytical service provider is distinct from the infrastructure platform service provider),

° The service consumer—in this case, the data owner,

° Third parties—the task of these third parties may vary from measuring service parameters to taking actions on violations as delegated by either the service provider or service consumer.

• Scope of business relationship and services offered (e.g., customer (tenant) data acquisition, exchange, and usage; feature sets and functionality; personnel and infrastructure network and systems components for service delivery and support; roles and responsibilities of provider and customer (tenant) and any subcontracted or outsourced business relationships; physical geographical location of hosted services; and any known regulatory compliance considerations).

• Expiration of the business relationship and disposition of the data owner's (tenant's) data at the end of the agreement.

• Customer (tenant) service-to-service application (API) and data interoperability and portability requirements for application development and information exchange, usage, and integrity persistence.

• Policies, procedures, and terms for service-to-service application (API) and information processing interoperability, and portability for application development and information exchange, usage, and integrity persistence.

• Configuration management and control of IT infrastructure network and systems components, including change management policies and procedures prior to deployment, provisioning, or use, and authorization prior to relocation or transfer of hardware, software, or data.

• File formats for structured and unstructured data exchanged between the data owner and the cloud service provider.

15.7.2.2 Service-Level Agreements. SLAs nearly always specify levels of availability (with availability frequently being defined in terms of response times); in some cases they may also specify security services and liabilities for security breaches. These issues are discussed in Section 15.7.2.4.

SLAs define technical and legal responsibilities of the data owner and the service provider. These responsibilities depend on the cloud service model (IaaS, PaaS, or SaaS) and the cloud deployment model (private, hybrid, or public). If the big data platform is a private cloud, then SLA enforcement is an internal organizational matter. If the deployment model is hybrid or public, then the SLA assumes legal significance and is subject to interpretation and enforcement by the judiciary. The less responsibility and control the data owner has, the more it relies on the provider and the more critical the terms of the SLA between those two parties become. The greater the complexity of the interaction between systems owned by the third-party provider and those owned by the data owner, the greater the need to explicitly assign responsibilities and specify liabilities for breaches.

In the case of SaaS, the SLA must address service levels, security, governance, compliance, and liability: expectations of the service and provider are contractually stipulated, managed to, and enforced by the SLA. In the case of PaaS or IaaS, it is the responsibility of the consumer's system administrators to effectively manage the same, with some offset expected from the provider for securing the underlying platform and infrastructure components to ensure basic service availability and security. It should be clear that in either case one can assign or transfer responsibility by means of the SLA. To a limited extent, accountability can also be transferred to the service provider (e.g., by means of indemnification clauses), if the service provider is willing to accept it. However, governmental laws, particularly with respect to breaches of privacy, may make it impossible for the data owner to completely transfer liability to the service provider.

At a minimum, the following items, taken from the Web SLA (WSLA) framework [48], should be defined in an SLA:

1. SLA parameters: metrics that define how service parameters can be measured. These include the following:

◦ Resource metrics are retrieved directly from the provider's resources. In the case of the big data cloud, these could include data bandwidth, transaction count, uptime, RAM, infrastructure resources (processing capacity, platform middleware, storage capacity), middleware resources (including Lucene, Solr, Hadoop, and HBase), and other required resources.

◦ Composite metrics represent a combination of several resource metrics, calculated according to a specific algorithm. For example, transactions per hour combines the raw resource metrics of transaction count and uptime. Composite metrics may be necessary to characterize higher-level big data metrics such as velocity, volume, and variety.

◦ Business metrics relate SLA parameters to financial terms specific to a service customer. These include the cost of the services.

2. Service-level objectives (SLOs): in WSLA, these are a set of formal expressions in an if-then structure. The antecedent (if) contains conditions, and the consequent (then) contains actions. An action represents what a party has agreed to perform when the conditions are met (see the sketch following this list). In the context of a plain-language legal document, these conditions would include the cost of service (when metrics are met) and the consequences of not meeting those metrics.

3. Data recording and analysis requirements to establish conformance with availability and performance parameters.

4. Terms specifying the penalties or monetary damages for outages. In standard SLAs from service providers such as Amazon, the maximum liability for outages is refunds or additional time at low cost, and liability for damages due to the consequences of security breaches and cyber attacks is specifically excluded.
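The if-then structure of an SLO, and the derivation of a composite metric from resource metrics (item 1), can be sketched directly in code. This is only an illustration of the idea, not the WSLA notation itself; the metric values, thresholds, and remedies are hypothetical.

```python
# Resource metrics as retrieved from the provider (hypothetical sample values).
metrics = {
    "transaction_count": 540_000,  # transactions in the measurement window
    "uptime_hours": 23.9,          # hours of service uptime in a 24-hour window
}

# Composite metric: transactions per hour, derived from two resource metrics.
metrics["transactions_per_hour"] = (
    metrics["transaction_count"] / metrics["uptime_hours"])

# Each SLO pairs an antecedent (a condition over metrics) with a consequent
# (the action a party agreed to perform when the condition holds).
slos = [
    (lambda m: m["transactions_per_hour"] < 20_000,
     "provider credits a portion of the monthly fee"),
    (lambda m: m["uptime_hours"] / 24.0 < 0.999,
     "provider issues an availability violation report"),
]

for antecedent, consequent in slos:
    if antecedent(metrics):
        print("SLO triggered:", consequent)
```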

15.7.2.3 Disaster Tolerance and Business Continuity. General considerations for disaster tolerance were identified in Section 15.5.2, including the basis for deciding on the locations and extent of the standby resources. An SLA for the standby resources should be established. In addition, the requirements for standby datacenter security, readiness, and monitoring should be established. The requirements placed on the CSP should be consistent with the business continuity/disaster tolerance plan. The terms should cover utility services and environmental conditions (e.g., water, power, temperature and humidity controls, telecommunications, and Internet connectivity). Terms should describe the extent to which, and how, these are secured, monitored, maintained, and tested (e.g., inspection intervals, power backups at the remote site, and failover testing).
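Monitoring and testing terms of this kind are often backed by automated readiness probes of the standby site. The following is a minimal sketch under assumed probe results and thresholds; the probe fields, latency budget, and test interval are illustrative placeholders rather than recommended values.

```python
import time

def probe_standby():
    """Stand-in for monitoring agents at the remote site (hypothetical values)."""
    return {
        "power_backup_ok": True,
        "environment_ok": True,                          # temperature, humidity
        "connectivity_ms": 48,                           # round trip to standby site
        "last_failover_test": time.time() - 40 * 86400,  # seconds since last test
    }

def readiness_report(status, max_latency_ms=100, test_interval_days=30):
    """Compare probe results against the contracted readiness terms."""
    issues = []
    if not status["power_backup_ok"]:
        issues.append("standby power backup failed inspection")
    if not status["environment_ok"]:
        issues.append("environmental controls out of range")
    if status["connectivity_ms"] > max_latency_ms:
        issues.append("standby connectivity exceeds latency budget")
    if time.time() - status["last_failover_test"] > test_interval_days * 86400:
        issues.append("failover test overdue")
    return issues or ["standby site ready"]

print(readiness_report(probe_standby()))  # ['failover test overdue']
```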

15.7.2.4 Information Security Requirements. The level of security and privacy controls, and the supply of evidence of their implementation and effectiveness, are usually established by the terms and conditions of the contract or SLA with the CSP [49]. The terms and conditions must be consistent with the information security management plan (see Section 15.6.2.1) and require coordination between the contract negotiators and the data owner's information security experts.

The legal consequences of data breaches resulting in the release of data are the same as for conventionally structured and stored data. Liability, notification requirements, and penalties for releases of data items that constitute personally identifiable information (PII) and health-related information are often governed by national, provincial, or state statutes [50, 51]. Additional consequences may result from industry agreements such as the Payment Card Industry Data Security Standard (PCI-DSS) [52], as well as from specific non-disclosure agreements between the data set owner and third parties.

The specific information security measures should be consistent with the risk of loss of integrity, confidentiality, or availability of the data at the remote site. Too few provisions would shift liability from the service provider to the data owner; too many might make the cloud implementation cost-prohibitive.

The following terms and provisions from the Cloud Security Alliance should be considered for inclusion in the information security clauses of negotiated contracts between service providers and data owners:

• Provider and data owner (tenant) primary points of contact for the duration of the business relationship;

• References to detailed supporting and relevant business processes and technical measures implemented to effectively enable governance, risk management, assurance, and legal, statutory, and regulatory compliance obligations across all impacted business relationships;

• Responsibility for disposition of the data upon termination of the business relationship, and treatment of customer (tenant) data impacted;

• Notification and/or pre-authorization of any system configuration or procedural changes in CSP resources;

• Timely notification of a security incident (or confirmed breach) to all customers (tenants) and other business relationships impacted (i.e., the up- and downstream impacted supply chain);

• Assessment and independent verification of compliance with agreement provisions and/or terms (e.g., industry-acceptable certification, attestation audit report, or equivalent forms of assurance), without posing an unacceptable business risk of exposure to the organization being assessed;

• Review of the risk management and governance processes of partners to ensure that practices are consistent and aligned to account for risks inherited from other members of each partner's cloud supply chain;

• Oversight of third-party service provider information security programs, service definitions, and delivery-level agreements included in third-party contracts;

• Efforts in support of follow-up actions concerning a person or organization after an information security incident. These may include forensic procedures for gathering evidence suitable for admission to legal proceedings, including chain of custody and preservation. Upon notification, customers (tenants) and/or other external business relationships impacted by a security breach shall be given the opportunity to participate, as is legally permissible, in the forensic investigation.

15.8 ENABLING FUTURE SUCCESS—STEM CULTIVATION AND OUTREACH

This section presents the STEM management strategy that enables the proper execution of the BDOC attributes discussed in this chapter, combining business needs, availability and reliability, heterogeneity issues, legal issues, security issues, and scaling issues. It is the SRE organization that is charged with the management of vast and complex BDOC enterprises.

Clearly, the successful execution of a BDOC ASP depends on very capable and robust technical professionals, both for development and for the ongoing SRE function, as well as for the technical understanding of the legal aspects associated with SLAs and all other aspects of the BDOC enterprise.

This section addresses the supply of talent in the areas of STEM. These skills are absolutely critical to the successful execution of BDOC management activities, both in the development and in the operations of such BDOC ASP enterprises.

15.8.1 Criticality of Creating and Growing a STEM Pipeline and Engagement Ideas

There has been a continuing shortage in the supply of STEM talent in the United States over recent years. In 2013, the computer science (CS) field had some 50,000 undergraduate degrees awarded annually, against a growing demand of over 100,000 CS jobs in the United States alone. While this is wonderful for those graduates who obtain CS degrees, it exacerbates the challenges that the BDOC ASP industry is facing. In 2014, LinkedIn showed hundreds of job openings in the SRE category alone, including openings at Facebook, VMware, Google, Salesforce, A9, Microsoft, Tumblr, Akamai, Best Buy, and many others.

15.8.2 Computer Science, Networking, and Computer Engineering

SREs or DevOps engineers are the fastest growing job type at this time [53]. This is astounding, given that this kind of job title did not even exist in 2009. Hence, the pace of change in the landscape teaches us that the most important part of STEM education is to teach our students to become lifelong learners, so that they can continually build on the technical foundations of their education.

Even though the United States has been battling declining STEM interest at the K-12 level, it appears that the job prospects for positions such as SRE and DevOps are pushing an increase in both enrollment and graduation rates in STEM, and in particular in CS. Some of this work is done through industry-sponsored capstone projects, which have been very successful at Harvey Mudd College since the 1960s for engineering and since the 1990s for computer science [54]. Similar capstone programs are being created under accreditation requirements promulgated by the Accreditation Board for Engineering and Technology (ABET) 2020 study [55] and its member Computer Science Accreditation Board (CSAB), founded in 1985.

The following section addresses some of the open research areas. A major challenge entails the management of SRE talent growth and the associated STEM education, which drives the SRE talent supply.

15.9 OPEN CHALLENGES AND FUTURE DIRECTIONS

This section discusses some of the open challenges and new directions of BDOC, which include workforce issues, privacy and security, resource utilization, and integration with the Internet of Things.

15.9.1 SRE and DevOps Professions

Going forward, we face some exciting challenges, which will afford the community an opportunity to acquire a deeper technical understanding in a number of areas. Furthermore, the community stands to benefit from developing new approaches and enabling new technologies and solutions in order to address these emerging challenges. One growth area for which there has been only limited work to date is the emerging profession of DevOps and SRE professionals, in the context of the entire BDOC enterprise that is being managed.

This topic is not always viewed as a "hard" technical topic that technologists study, but rather as a "soft" personnel topic. However, organizational management experts correctly point out that "the soft stuff is the hard stuff." Solving these challenges, within the context of the United States, will not be easy. In fact, it is a daunting challenge, in that society's view of technologists is not high, and "geeks" are not considered "cool" by their peers during the most critical and formative years of K-12 education. The late astronaut Sally Ride served on a research team that determined that "We lose the boys away from STEM in 8th grade, and we lose the girls in 5th grade" [56]. This hostile environment toward technology during the early educational experience of K-12 children educated in the United States creates a cascading effect that results in an insufficient supply of STEM-capable students at the college level. Even if college students realize that becoming an engineer or computer scientist affords great job opportunities, they simply lack the STEM background needed to compete within the college-level STEM curriculum.

While this is the most crucial challenge to address, there are additional challenges in building the SRE and DevOps workforce that can be addressed, provided that the STEM talent supply issue is improved. Running a strong SRE organization requires considerable management talent, as well as a strong mentoring and training culture. This is a strong team culture, in which seasoned SREs teach incoming SREs the tradecraft. These positions include both operational responsibilities and development activities. The development activities look to address the scale of BDOC enterprises. The only way to do that effectively is to develop the methodologies and automation technologies that will "automate the human manager out of the manual operational activities." Since the SRE position is rather new, it has not been studied adequately by the research community. This delicate balance needs to be studied further. Trade-offs between the benefits of automation and the cost of developing automated alert and troubleshooting systems must be better understood. The organizational challenges of creating and leading strong SRE organizations must be quantified and further studied. The focus of SRE teams on specific applications represents a semantically higher level of responsibility, in contrast with the traditional management of networks and other lower stack-layer, semimanual operations.

The complexity of the technical challenges associated with availability, reliability, legal issues, and the emerging business models demands nothing less than a very adaptable SRE workforce that is schooled in these complex areas. The operational responsibility of the SRE team encompasses multiple emerging disciplines and applications. It is anticipated that these challenges will continue to present multiple research challenges, which will be mutually beneficial to all stakeholders.

While the highest leverage would be achieved by major contributions to the STEM supply and by achieving SRE organizational enhancements, the following subsections and the "Conclusions" section offer more traditional research areas that would contribute to BDOC management solutions going forward.

15.9.2 Information Assurance (Confidentiality, Integrity, and Availability)

This is a fertile area for future research. When many users are involved, the consequences of data breaches (e.g., the 2013 Target data breach) are substantial. Disaster response and transaction verification offer considerable research opportunities. For intelligent highways, outages and the loss of integrity of the data flowing from the sensors to the cloud-resident applications, and from the cloud-resident applications to the users, can be particularly serious. In many cases, a portion of the data path from the sensors to the cloud will include wireless links. New methods for rapid authentication of a large number of users (possibly hundreds of thousands simultaneously) and for ensuring data integrity (including overcoming "man-in-the-middle" attacks) over wireless communications for high-volume traffic are also necessary.
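As one concrete fragment of this problem, per-message authentication with a keyed MAC lets a cloud-resident application detect tampering on the wireless path; the hard, open part is distributing and rotating keys for hundreds of thousands of endpoints. A minimal sketch using Python's standard library follows; the key, message format, and sensor naming are hypothetical.

```python
import hashlib
import hmac

KEY = b"per-sensor shared secret"  # hypothetical; key distribution at scale is the open problem
TAG_LEN = 32                       # bytes in an HMAC-SHA256 tag

def protect(payload: bytes) -> bytes:
    """Sensor side: append an HMAC-SHA256 tag to the reading."""
    return payload + hmac.new(KEY, payload, hashlib.sha256).digest()

def verify(message: bytes) -> bytes:
    """Cloud side: strip and check the tag; raise if the data were altered in transit."""
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed: possible man-in-the-middle tampering")
    return payload

msg = protect(b"sensor-17:speed=62")
print(verify(msg))                      # b'sensor-17:speed=62'
tampered = msg[:11] + b"99" + msg[13:]  # alter the reading in transit
try:
    verify(tampered)
except ValueError as err:
    print(err)
```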

15.9.3 More Efficient Use of Cloud Resources

While not all cloud-resident databases may reach the level of petabytes or exabytes in the immediate future, the trend is certainly in that direction. Research is necessary in order to utilize storage more efficiently and to develop algorithms that reduce the costs of data collection, integration and transformation, data analysis (searches and queries), storage, and disposal. Specific design issues include the following:

• Tailoring DBMSs for cloud computing, including the trade-offs between atomicity, consistency, isolation, and durability (ACID) in traditional SQL databases and less rigorous basically available, soft-state, eventually consistent (BASE) models.

• Data access: whether to locate data repositories closer to the user than to the provider.

• Consistency of replicated data: if data are moved closer to the users, then replication will be necessary. Along with replication come issues of currency, consistency, and completeness (see the sketch following this list).

• Deployment of BDOC installations, including the use of staging for data collection and processing.
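To illustrate the replica-consistency trade-off noted above, the sketch below contrasts a cheap single-replica read, which may return stale data under a BASE model, with a quorum read that keeps the newest version among a majority of replicas. The replica contents and quorum sizes are hypothetical.

```python
# Each replica stores (version, value); replication lag leaves one replica stale.
replicas = [
    {"version": 3, "value": "v3"},  # up to date
    {"version": 3, "value": "v3"},  # up to date
    {"version": 2, "value": "v2"},  # not yet caught up
]

def read_one(replicas):
    """Cheapest read: any single replica; under BASE this may be stale."""
    return replicas[-1]["value"]

def read_quorum(replicas, r=2):
    """Read r replicas and keep the highest version seen.

    With r + w > n (here n = 3, w = 2, r = 2), a quorum read always
    overlaps the replicas touched by the last successful quorum write.
    """
    answered = replicas[:r]
    return max(answered, key=lambda rec: rec["version"])["value"]

print(read_one(replicas))     # 'v2' -- the stale replica answered
print(read_quorum(replicas))  # 'v3' -- quorum overlap masks the lag
```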

15.9.4 Big Data and the Internet of Things

Information gathering, processing, and computing of massive amounts of data generated from and delivered to highly distributed devices (e.g., sensors and actuators) create new challenges, especially for the interoperability of services and data. These requirements will impact the underlying cloud infrastructure, requiring efficient management of very large sets of globally distributed nonstructured or semistructured data that could be produced at very high rates (i.e., big data). A multicloud service platform supported by broadband networks needs to handle all of these challenges and appear to the application environment as one uniform platform.

15.10 CONCLUSIONS

This chapter reviewed BDOC from a number of perspectives. A historical perspective was provided on the evolution of computing and Web services, and on how the current prevailing information architecture is the result of a steady increase in connectivity, mobility, and the adoption of applications such as those offered by Google, Facebook, and Amazon. We discussed a number of BDOC enterprise management challenges, such as SRE/DevOps training, availability and reliability, legal and security aspects, and the STEM educational challenges that must be addressed. Further work is recommended in each of these areas in order to accommodate further growth and enable future capabilities, especially in the mobile space.

It is anticipated that other promising research areas for BDOC will continue to include [2] automated service provisioning, VM migration, server consolidation, energy management, traffic management and analysis, software frameworks, storage technologies and data management, and novel cloud architectures.

It is clear that BDOC is the wave of the future, and that these applications will continue to further benefit the user communities and many other stakeholders.

REFERENCES

1. Tom White, Hadoop: The Definitive Guide, 2nd Edition. Sebastopol, CA: O'Reilly Media, 2012.
2. Qi Zhang, Lu Cheng, and Raouf Boutaba, "Cloud computing: state-of-the-art and research challenges," Journal of Internet Services and Applications 1 (2010): 7–18.
3. Leonard Kleinrock, "Models for computer networks," Proceedings of the International Conference on Communications, 1969, http://www.lk.cs.ucla.edu/index.html. Accessed November 19, 2014.
4. Marc Andreessen, Mosaic—the first global web browser, NCSA Technical Report, 1992, http://www.livinginternet.com/w/wi_mosaic.htm and http://en.wikipedia.org/wiki/Mosaic_(web_browser). Accessed November 19, 2014.
5. Internet Engineering Task Force, where the Requests for Comments (RFCs) are maintained, www.ietf.org. Accessed November 19, 2014.
6. David Clark, "We reject: kings, presidents, and voting. We believe in: rough consensus and running code," 24th IETF Meeting, July 1992, Cambridge, MA, 1992.
7. John Battelle, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. New York: Portfolio, 2005.
8. John Battelle, Battelle's media blog, www.battellemedia.com. Accessed November 19, 2014.
9. Thomas L. Friedman, The World is Flat [Updated and Expanded]: A Brief History of the Twenty-First Century. New York: Farrar, Straus and Giroux, 2006.
10. Open Grid Forum (OGF), www.ogf.org. Accessed November 19, 2014.
11. OpenStack, the virtual organization that promotes the open standardization of cloud technologies, http://www.openstack.org/. Accessed November 19, 2014.
12. Christos Gkantsidis, Dimitrios Vytiniotis, Orion Hodson, Dushyanth Narayanan, Florin Dinu, and Antony Rowstron, "Rhea: automatic filtering for unstructured cloud storage," 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), April 2–5, 2013, Lombard, IL, 2013.
13. Qi Zhang, Quanyan Zhu, Mohamed Faten Zhani, Raouf Boutaba, and Joseph L. Hellerstein, "Dynamic service placement in geographically distributed clouds," IEEE Journal on Selected Areas in Communications 31 (2013): 762–772.
14. Bin Fan, David G. Andersen, and Michael Kaminsky, "MemC3: compact and concurrent memcache with dumber caching and smarter hashing," 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), April 2–5, 2013, Lombard, IL, 2013.
15. Adage.com, "70 Billion TV to Digital Ads," 2013, http://adage.com/article/media/70-billion-tv-ad-market-eases-digital-direction/244699/. Accessed November 19, 2014.
16. Joseph Betser and David Belanger, "Architecting the enterprise via big data analytics," in Big Data and Business Analytics, Jay Liebowitz, ed. Boca Raton, FL: CRC Press, 2013.
17. Todd Underwood, Google, USENIX panel, 2013, https://www.usenix.org/sites/default/files/conference/protected-files/underwood.pdf. Accessed November 19, 2014.
18. Amazon Elastic MapReduce web page, 2014, http://aws.amazon.com/elasticmapreduce/. Accessed November 19, 2014.
19. Hamzeh Khazaei, Jelena Mišić, Vojislav B. Mišić, and Nasim Beigi Mohammadi, "Availability analysis of cloud computing centers," 2012 IEEE Global Communications Conference (GLOBECOM), Anaheim, CA, pp. 1957–1962, http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6503402. Accessed November 19, 2014.
20. Francesco Longo, Rahul Ghosh, Vijay K. Naik, and Kishor S. Trivedi, "A scalable availability model for infrastructure-as-a-service cloud," 2011 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Hong Kong, pp. 335–346, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5958247&navigation=1. Accessed November 19, 2014.
21. Anupam Das, Cristian Lumezanu, Yueping Zhang, Vishal Singh, Guofei Jiang, and Curtis Yu, "Transparent and flexible network management for big data processing in the cloud," Proceedings of the 5th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '13), 2013, http://www.nec-labs.com/~lume/files/flowcomb-hotcloud13.pdf. Accessed November 19, 2014.
22. Rajesh Nishtala, Hans Fugal, and Steven Grimm, "Scaling Memcache at Facebook," Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), April 2013, pp. 356–370, https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/nishtala. Accessed November 19, 2014.
23. Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike Dahlin, "Robustness in the Salus scalable block store," Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), April 2013, pp. 356–370, https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/wang_yang. Accessed November 19, 2014.
24. Eric Bauer and Randee Adams, "Analyzing cloud reliability and availability," in Reliability and Availability of Cloud Computing. New York: Wiley-IEEE, 2012, p. 84.
25. Paolo Costa, Austin Donnelly, Antony Rowstron, and Greg O'Shea, "Camdoop: exploiting in-network aggregation for big data applications," 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI '12), 2012, http://research.microsoft.com/apps/pubs/default.aspx?id=163081. Accessed November 19, 2014.
26. Bruno Silva, Paulo Maciel, Eduardo Tavares, and Armin Zimmermann, "Dependability models for designing disaster tolerant cloud computing systems," 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Budapest, Hungary, http://www.computer.org/csdl/proceedings/dsn/2013/6471/00/06575323-abs.html. Accessed November 19, 2014.
27. Apache Hadoop tutorial, http://developer.yahoo.com/hadoop/tutorial/module1.html. Accessed November 19, 2014.
28. ISO 27001, "Information security management system," International Organization for Standardization, October 2005, http://www.27000.org/iso-27001.htm. Accessed November 19, 2014.
29. Joint Task Force Transformation Initiative, NIST SP 800-53 Rev. 4, "Security and privacy controls for federal information systems and organizations," http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf. Accessed November 19, 2014.
30. HIPAA/HITECH Act, 45 CFR 164, February 2013, http://www.hipaasurvivalguide.com/hipaa-regulations/hipaa-regulations.php. Accessed November 19, 2014.
31. North American Electric Reliability Council (NERC) Critical Infrastructure Protection Committee, "Security Guidelines," http://www.nerc.com/comm/CIPC/Pages/Security-Guidelines.aspx. Accessed November 19, 2014.
32. Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage, "Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds," CCS '09: Proceedings of the 16th ACM Conference on Computer and Communications Security, November 9–13, 2009, Chicago, IL, 2009.
33. Hassan Takabi, James B. D. Joshi, and Gail-Joon Ahn, "Security and privacy challenges in cloud computing environments," IEEE Security & Privacy 8 (2010): 24–31.
34. Kui Ren, Cong Wang, and Qian Wang, "Security challenges for the public cloud," IEEE Internet Computing 16 (2012): 69–73.
35. Rick Holland, Stephanie Balaouras, John Kindervag, and Kelley Mak, The CISO's Guide to Virtualization Security. Forrester Research Inc., 2012, http://www.forrester.com/The+CISOs+Guide+To+Virtualization+Security/fulltext/-/E-RES61230. Accessed November 19, 2014.
36. Ellen Messmer, "VMware strives to expand security partner ecosystem," Network World, August 31, 2011, https://www.networkworld.com/news/2011/083111-vmware-security-partners-250321.html. Accessed November 19, 2014.
37. Bernd Grobauer, Tobias Walloschek, and Elmar Stöcker, "Understanding cloud computing vulnerabilities," IEEE Security & Privacy 9 (2011): 50–57.
38. Advisories & Certifications, VMware, 2014, http://www.vmware.com/security/advisories. Accessed November 19, 2014.
39. IBM X-Force, "IBM X-Force 2010 Trend and Risk Report," IBM, 2010, https://www.ibm.com/services/forms/signup.do?source=swg-spsm-tiv-sec-wp&S_PKG=IBM-X-Force-2010-Trend-Risk-Report. Accessed November 19, 2014.
40. "Former Shionogi employee sentenced to federal prison for hack attack on company computer servers," The United States Attorney's Office, District of New Jersey press release, December 9, 2011, http://www.justice.gov/usao/nj/Press/files/Cornish,%20Jason%20Sentencing%20News%20Release.html. Accessed November 19, 2014.
41. Yanpei Chen, Vern Paxson, and Randy H. Katz, "What's new about cloud computing security?" University of California at Berkeley Technical Report UCB/EECS-2010-5, http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-5.html. Accessed November 19, 2014.
42. Linda L. Briggs, "Q&A: new approaches for tackling big data security issues," http://tdwi.org/articles/2011/09/20/big-data-security-2.aspx. Accessed November 19, 2014.
43. Cloud Security Alliance, "Cloud Controls Matrix (CCM) Version 3.0.1," July 10, 2014, https://cloudsecurityalliance.org/research/ccm/#_new. Accessed November 19, 2014.
44. Karen Scarfone, Murugiah Souppaya, and Paul Hoffman, Guide to Security for Full Virtualization Technologies. National Institute of Standards and Technology (NIST), January 2011, http://csrc.nist.gov/publications/nistpubs/800-125/SP800-125-final.pdf. Accessed November 19, 2014.
45. Christopher Wolf, "Requirements and security assessment procedures," 2012, http://www.americanbar.org/content/dam/aba/administrative/litigation/materials/sac_2012/22-15_intro_to_data_security_breach_preparedness.authcheckdam.pdf. Accessed November 19, 2014.
46. Marta C. González, César A. Hidalgo, and Albert-László Barabási, "Understanding individual human mobility patterns," Nature 453 (2008): 779–782.
47. Alexandros Labrinidis and H. V. Jagadish, "Challenges and opportunities with big data," Purdue University Cyber Center Technical Reports, 2012, http://docs.lib.purdue.edu/cctech/1/. Accessed November 19, 2014.
48. Alexander Keller and Heiko Ludwig, "The WSLA framework: specifying and monitoring service level agreements for web services," Journal of Network and Systems Management 11 (2003): 57–81.
49. National Institute of Standards and Technology Public Cloud Computing Security Working Group, "Challenging security requirements for US Government cloud computing adoption," 2012, http://collaborate.nist.gov/twiki-cloud-computing/pub/CloudComputing/CloudSecurity/Challenging_Security_Requirements_for_US_Government_Cloud_Computing_Adoption_v6-WERB-Approved-Novt2012.pdf. Accessed November 19, 2014.
50. Miriam Russom, Robert Sloan, and Richard Warner, "Legal concepts meet technology: a 50-state survey of privacy laws," Proceedings of the 2011 Workshop on Governance of Technology, Information, and Policies, ACM, December 6, 2011, Orlando, FL, 2011.
51. Electronic Frontier Foundation, "Privacy: statutory provisions," 2011, https://ilt.eff.org/index.php/Privacy:_Statutory_Protections. Accessed November 19, 2014.
52. Payment Card Industry, Data Security Standard 3.0 Requirements and Security Assessment Procedures, November 2013, https://www.pcisecuritystandards.org/documents/PCI_DSS_v3.pdf. Accessed November 19, 2014.
53. Mashable.com, "Fastest growing jobs," 2013, http://mashable.com/2013/11/13/fastest-growing-jobs/. Accessed November 19, 2014.
54. Joseph Betser, "Knowledge management (KM) and e-learning (EL) growth for industry and university outreach activities via capstone projects: case studies and future trends," in Jay Liebowitz and Michael Frank, eds., Knowledge Management and E-Learning. London: Taylor and Francis Press, 2010, pp. 305–321.
55. ABET 2020: Face the Future, "Proceedings of the 2003 ABET Annual Meeting," Minneapolis, October 30–31, 2003, www.abet.org. Accessed November 19, 2014.
56. Carol B. Muller, Sally M. Ride, Janie Fouke, Telle Whitney, Denice D. Denton, Nancy Cantor, Donna J. Nelson, et al., "Gender differences and performance in science," Science 307 (2005): 1043.

Page 414: Cloud Services, Networking, and Management

“9780471697558c15” — 2015/3/20 — 12:17 — page 392 — #32

Page 415: Cloud Services, Networking, and Management

“9780471697558index” — 2015/3/20 — 12:18 — page 393 — #1

INDEX

Note: Page numbers in italics refer to Figures; those in bold to Tables

activity function generation, workflowactivity

activity semantics, 322, 322–3algorithm, 321, 321–4DAG control flow, 323, 323data flow, 323–4, 324function signature, 321–2, 322

Actual MAC (AMAC), 92Alfredo, 159, 161, 162Amazon Dynamo, 12Amazon EC2, 11, 19, 25, 99, 140, 224,

232, 296Amazon Simple Workflow (SWF) service

activity workers, 315–17, 316architecture, 315, 316asynchronous functions, 317decider, 315, 317task queues, 315

Amazon Virtual Private Cloud (VPC), 11,93–4

Amazon Web Services (AWS), 35, 36, 39,317, 320

Apache Hadoop, 158Apache Software Foundation project,

38–40Apple iCloud, BDOC, 363application platforms, 17, 129, 130, 132,

150, 333application programming interface (API),

7, 31, 138, 244architectures

cloud computing, 7application layer, 8

hardware layer, 7infrastructure layer, 8platform layer, 8

FlowIPS security design, 279Interoperable Workflow Intermediate

Representation (IWIR), 319, 319mobile cloud computing (MCC),

158–161OpenDaylight, 145, 146OpenStack, 139, 139RainCloud workflow, 318, 318scheduling, 199, 199security, 271software-defined networking (SDN),

132–3, 133virtual machine (VM) migration, 50,

50–513-ary CamCube topology, 80, 81automated migration, 67, 134autonomic computing, 220–221autonomic management, 41, 219, 361autonomic manager, 220, 233, 240

bag-of-tasks (BoTs), 249, 255BCMS see Bounded Congestion Multicast

Scheduling (BCMS)BCube, hybrid switch/server topologies,

79–80, 80big data on clouds (BDOC),

19–20application service provider to cloud

computing, 362–3

Cloud Services, Networking, and Management, First Edition.Edited by Nelson L. S. da Fonseca and Raouf Boutaba.© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc.

393

Page 416: Cloud Services, Networking, and Management

“9780471697558index” — 2015/3/20 — 12:18 — page 394 — #2

394 INDEX

big data on clouds (BDOC) (cont’d)business appetite, 362business applications

global enterprises, 365–6mobile services, 367online advertising, 366–7site reliability engineers (SREs),

367–8technical computing, 366

cloud and service availabilitydisaster tolerance, 370–372operational availability, 368–70

cloud computing, 19–20, 362cloud resources, 387clouds—supply and demand,

364–5contracting with cloud service providers

(CSPs)contractual agreements, 381disaster tolerance and business

continuity, 383information security requirements,

383–4service-level agreements (SLAs),

381–3DevOps professions, 385–6execution see science, technology,

engineering and mathematics(STEM)

incident response, 379information assurance, 387and Internet, 387legal issues, 380–384privacy and governance, 380search, 362–3security issues see securitysite reliability engineering (SRE), 361,

385, 386technical capabilities

dynamic service placement, 364MemC3, 364performance enhancement, 364

bisection bandwidth constrains, 77blocking switch port, 285Bluetooth, 161, 162, 171, 176Bounded Congestion Multicast Scheduling

(BCMS), 91broker, performance management and

monitoring, 222

browsers, weaknesses of, 166buzzword, 4

canonical three-tiered tree-like topology,77, 77

capacity constraints, 112CDMI see Cloud Data Management

Interface (CDMI)CDNs see content delivery networks

(CDNs)Ceilometer project in OpenStack, 150centralized monitoring, cloud data centers,

41Chroma, 159, 161CIMI see Cloud Infrastructure

Management Interface (CIMI)client server technique, MCC

Alfredo, 159Apache Hadoop, 158Chroma, 159CMCVR, 160Cuckoo, 160Hyrax, 158–9MWSMF, 159Spectra, 159VMC, 160

CloneClouds, 161, 162, 177, 181cloud computing see also resource

managementAmazon Web Services (AWS), 243architecture, 7, 7–8big data on clouds (BDOC), 19–20characteristics, 4–5cloud networking, 11–12cloud services, 8–10computing models and, 6–7data center networks and relevant

standards, 15–16data center virtualization, 11data storage and management, 12definition, 4–5, 243energy management, 13–14, 18–19Google Application Engine (GAE),

243–4interactive multimedia applications,

20–21interdata center networks, 16key driving forces, 5–6MapReduce programming model, 12–13

Page 417: Cloud Services, Networking, and Management

“9780471697558index” — 2015/3/20 — 12:18 — page 395 — #3

INDEX 395

mobile cloud computing, 17multi-clouds, autonomic performance

management, 18National Institute of Standards and

Technology (NIST), 4–5OpenFlow and, 16–17and OpenStack see OpenStackperformance management and

monitoring, 220privacy, 14resource management, 13, 17–18scheduling, 17–18scientific applications see scientific

applications on cloudssecurity, 14, 19software-defined networking (SDN)

technology, 12, 16–17survivability and fault tolerance, 19

see also survivability in cloudvirtualization, 14–15VM migration, 11, 15

Cloud Data Management Interface(CDMI), 33–4

cloud gamingAmazon EC2 infrastructure,

340–341BitTorrent, 340–341challenges, 339definition, 333–4hybrid DC-CDN infrastructure, 344–5,

345measurement settings, 340–341multi-DC infrastructure

cloud providers, 342–3EC2 cloud infrastructure, 341, 341latency-based strategy, 342region-based DC location strategy,

343, 343region-based strategy, 342

offloading computation, 339Cloud Infrastructure Management Interface

(CIMI), 33Cloudlets, 160–161, 179, 181Cloud-Mobile Convergence for Virtual

Reality (CMCVR), 158, 160CloudMonkey, Python tool, 38cloud networking, 11–12, 27–8, 28cloud platform, 25, 26cloud providers, 8–9, 11

cloud gaming, 342–3scientific applications on clouds, 312service providers, 3virtual machines (VMs), 25

cloud resource management, tools andsystems

CloudStack, 38–9Deltacloud, 39–40Eucalyptus, 35–6Libcloud, 39Libvirt, 40OpenNebula, 36–7OpenStack, 37–8

cloud service providers (CSPs)big data on clouds (BDOC), contract

withcontractual agreements, 381disaster tolerance and business

continuity, 383information security requirements,

383–4service-level agreements (SLAs),

381–3to cloud computing, 362–3

cloud slice, 25CloudStack, 38–9, 147, 280, 312CMCVR see Cloud-Mobile Convergence

for Virtual Reality (CMCVR)Code-Oriented eXplicit multicast

(COXcast), 91communication networks, 164, 166–7,

197–8, 368communication protocols

Bluetooth, 1623G/4G, 162Wi-Fi, 161–2

communications infrastructure, 209–10communication-to-computation ratio

(CCR), 259community clouds, 10, 157computing server power consumption,

195–6, 196configuration management and control,

373, 381Content Addressable Network (CAN), 80content delivery networks (CDNs), 337

see also hybrid delivery modelscooperative monitoring, cloud data centers,

41

Page 418: Cloud Services, Networking, and Management

“9780471697558index” — 2015/3/20 — 12:18 — page 396 — #4

396 INDEX

COXcast see Code-Oriented eXplicitmulticast (COXcast)

Crossroads, datacenters, 94–5Cuckoo, 160, 162, 364Cuckoo, client-server based framework,

160, 162, 364

datacenter (DC), 11, 336 see also hybriddelivery models

data center infrastructure efficiency(DCIE), 194

datacenter networks (DCNs)3-ary CamCube topology, 80, 81bisection bandwidth constrains, 77canonical three-tiered tree-like topology,

77, 77commodity off-the-shelf (COTS)

switches, 76Content Addressable Network (CAN),

80emerging technologies, 93–4equal-cost multipath (ECMP), 76expansion, efficient and incremental,

96–7full address space virtualization, 95hybrid switch/server topologies, 79–80,

80incremental expansion, 77load balancing across multiple paths, 98multi-rooted tree, 76multitiered topology, 77name and locator separation, 94–5network expansion, 83–5requirements, 96routing, 89–93server only topology, 80–82sharing and performance guarantees, 97switch-oriented topologies, 78,

78–9, 79tenants, address flexibility, 97–8topologies, 76–82traffic see traffic, datacenter networks

(DCNs)valiant load balancing (VLB), 76

data integrity, 166, 174data replication, cloud computing

applications, 205cost-based, 206databases, 206–7

downlink bandwidth requirements, 207,207

energy and residual bandwidth, 207–8,208

optimal location, 205–6policy maker, 205

data security, 166, 181, 184, 372, 378DC see datacenter (DC)DCell, hybrid switch/server topologies,

79–80, 80Deadline-Markov decision process (MDP)

algorithm, 260–261delivery cloud, interactive multimedia

applicationscontent delivery networks (CDNs), 337datacenter (DC), 336peer-topeer (P2P)-based infrastructure,

336–7Deltacloud, 39–40deterministic packet marking (DPM), 276development and operations (DevOps)

professions, 20, 385–6disaster tolerance

business continuity, 371definition, 370disruptive events probability, 371–2planning, 370–371primary resources, 371

Distributed Management Task Force(DMTF), 32

Distributed Overlay Virtual Ethernet(DOVE), 65, 66

DOVE see Distributed Overlay VirtualEthernet (DOVE)

DPM see deterministic packet marking(DPM); dynamic PowerManagement (DPM)

DropBox, BDOC, 363Dryad computational model, 12–13DVFS see dynamic voltage and frequency

scaling (DVFS)dynamic Power Management (DPM), 194,

195, 198, 209dynamic voltage and frequency scaling

(DVFS), 194, 195, 209dynamic voltage scaling (DVS) links, 197

ECG data analysis software, 171, 171, 176ECMP see equal-cost multipath (ECMP)

Page 419: Cloud Services, Networking, and Management

“9780471697558index” — 2015/3/20 — 12:18 — page 397 — #5

INDEX 397

e-commerce, 3, 23, 243, 362–3, 366–7edge-core clouds (scenario), 225electric bills, energy-efficient design

data center subset selection, 116demand profile, 118, 119demand response (DR) component, 115dynamic pricing, 115IDC demands, 115IDC workload sharing, 116–18, 118Opex savings, 119, 119–20physical link cost assignment, 116smart grid, 115virtual link cost assignment, 115–16

e-mail, 23, 362–3embedding, network virtualization, 24, 301,

301–2, 306EMC2, storage provider enterprise, 67encryption algorithms erosion, 373energy consumption in data centers

challenges, 210–211communication networks, 197–8computing servers and switches, 195–6energy-efficiency

communications infrastructure,209–10

computing servers, 197data replication, 205–8load balancing, 200–205networking switches, 196, 197scheduling, 198–201virtual machines placement, 208–9

Interactive Data Corporation (IDC),194

next-generation user devices, 193equal-cost multipath (ECMP), 76, 85, 89,

98e-STAB scheduling policy

queue-size, 203–4, 204racks and modules, 203, 203server selection, 204, 204steps, 202

Eucalyptus, 35–6, 244, 245European Network and Information

Security Agency (ENISA), 10Execution Engine, 227extended semishadow images (ESSI), 161

fabric manager, 92failure tolerance in cloud

hard disk failures, 299load balancers (LBs), 299Microsoft data centers, 299NetPilot, 298–9raid controller failures, 299

faulty backup mechanisms, 166FireCol, flooding DDoS attacks detection

solution, 275flow conservation constraints, 112FlowIPS security design

architecture, 279cloud cluster, 280controller, 281Open vSwitch (OVS), 280–281processing flow, 281–2, 282Snort, 281Snort/Iptables IPS vs., 282–4, 283system components, 280–281

Forwarding Rules manager (FRM), 145

generic routing encapsulation (GRE), 25,141

global enterprises, BDOC, 365–6Google, 3, 5, 8, 185, 363, 366, 367Google Glass, 185, 193grid computing, 6–7, 363

Hadoop distributed file system (HDFS),158, 159

hardware latency, 335Heat project in OpenStack, 150Hedera, 89–90HEFT see Heterogeneous earliest finish

time (HEFT)heterogeneous clouds

heterogeneous monitoring systems, 239performance management and

monitoring, 238–9public cloud environment, 239rapid reaction, 239traditional monitoring techniques

inaccuracy, 239Heterogeneous earliest finish time (HEFT),

259heuristic solution, IDC network

virtualization, 113–15, 114high-availability virtual infrastructure

management framework (Hi-VI),302–3

Page 420: Cloud Services, Networking, and Management

“9780471697558index” — 2015/3/20 — 12:18 — page 398 — #6

398 INDEX

Hi-VI see high-availability virtualinfrastructure managementframework (Hi-VI)

Host-based intrusion detection system,275

hybrid cloud, 9–10, 217, 224–5, 247hybrid cloud optimization cost (HCOC)

algorithm, 259hybrid delivery models

CDN-P2P compositions, 338compositions, 337, 338DC CDN, 339multi-CDN, 338–9multi-DCs, 339

hybrid switch/server topologies, 79–80, 80hypervisor, 25, 50–51Hyrax, client-server approach

computational resources, 158–9, 177fault tolerance mechanism, 162VMC, 160

IDS/IPS see intrusion detection andprevention systems (IDPS)

infrastructure as a service (IaaS), 8, 9performance management and

monitoring, 217provider, 8

input/output (I/O) virtualizationdrawback, 57hardware-assisted network, 55, 56modes, 55, 55Network Plug-In Architecture

(NPA/NPIA), 56Peripheral Component Interconnect

Express (PCIe), 55single root I/O virtualization (SR-IOV),

55techniques, 57, 58VM Device Queues (VMDq), 55VM live migration, 56, 57

insecure/incomplete data deletion, 166Interactive Data Corporation (IDC), 194interactive multimedia applications on

cloudsadaptive streaming, 354delivery cloud, 336–7Graphics Processing Unit (GPU), 353–4hardware latency, 335hybrid delivery models, 337–9, 338

Massive User-Generated Content (UGC)Live Streaming, 334

multimedia tasks virtualization,353–4

network latency, 335–6networks economics, 353on-demand gaming see cloud gamingresponse time, 335time-shifting on-demand TV, 334time-shifting video streaming, 351–3,

353user-generated content (UGC) live

streaming, 345–351inter-data-center (IDC) networks

electric bills, energy-efficient design,115–20

energy efficiency impacts, 107energy vs. resilience trade-off,

123–4heterogeneous, 105, 106optical, 106–7penalties, design, 120–123public telecom network, 105–6schemes, 125, 126virtualization see virtualization

internal clouds, 9, 372Internet Group Management Protocol

(IGMP), 93–4Interoperable Workflow Intermediate

Representation (IWIR)atomic activity, 314–15composite activity, 315directed acyclic graph (DAG), 314–15,

315IWIR-to-SWF conversion

activity function generation algorithm,321, 321–4

architecture, 319, 319control flow constructs, 320HTTP connection, 319–20principle, 320SWF decider generation algorithm,

320workflow application, 314

intrusion detection and prevention systems(IDPS), 19, 269, 274

in cloud virtual networkingenvironments, 279

design challenges, 272

Page 421: Cloud Services, Networking, and Management

“9780471697558index” — 2015/3/20 — 12:18 — page 399 — #7

INDEX 399

latency, 272Network Intrusion Detection System

(NIDS) mode, 274network reconfigurations, 272resource consumption, 272SDN-based cloud security solutions,

277–8Snort, 272, 274solutions, limitation of, 278–9Suricata, 272

IPS see intrusion detection and preventionsystems (IDPS)

IWIR see Interoperable WorkflowIntermediate Representation (IWIR)

Java EE web application, 232–3, 235, 240Jellyfish, 84, 84–5justin.tv analysis see user-generated

content (UGC) live streaming

Knowledge Store component, 226, 227

label switched paths (LSPs), 25legacy IP networks, 105legislative/organizational risks, MCC,

168–9, 176Legup, 83, 83Libcloud, 39Libvirt, 40limited system monitoring, 373Link Aggregation Control Protocol

(LACP), 91, 92link protection and restoration, 304Linux bridging, 142live migration

big data on clouds (BDOC), 374definition, 50Internet Protocol (IP) address, 52memory migration, 52–3offline and, 53, 54page fetching, 53post-copy, 53pre-copy, 53retransmission of dirty pages, 52strategies, 53transmission control protocol (TCP), 52virtual CPU (vCPU), 52virtual machines, 66, 145, 225

load balancingelectricity cost, 204–5e-STAB scheduler, 202–4, 203, 204Ethernet standards, 202idle servers, 200–201stochastic power reduction scheme

(SAVE), 205virtual machines, 204

Locator/Identifier Separation Protocol(LISP), 66–7

malicious insider, 166, 270, 279Management Logic implementation, 226,

226–7manycast constraints, 112–13MAPE-k loop, XCAMP components

Execution Engine, 227information aggregation service, 225–6Knowledge Store component, 227Management Logic implementation,

226, 226–7notification engine, 225–6Plugin Engine, 226

MapReducecloud computing and, 6programming model, 12–13resource management, 13

MAUI framework, 158, 159, 161, 162MCC see mobile cloud computing (MCC)Microsoft Hyper-V, 93–4MobiCloud, 161, 162mobile ad hoc networking (MANET), 161mobile cloud computing (MCC)

abstraction and virtualization, 155application partitioning/client server,

158–60benefits, 156characteristics, 155client server, 157complexity and dynamism, 154definition, 155, 156description, 153frameworks and application models

architectures, 158–161communication protocols, 161–2risk management strategies, 162

hybrid approach, 157peer to peer approach, 157popularity, 156

Page 422: Cloud Services, Networking, and Management

“9780471697558index” — 2015/3/20 — 12:18 — page 400 — #8

400 INDEX

mobile cloud computing (MCC) (cont’d)risk management see risks in mobile

cloud computing (MCC)services, 155–6VM technology, 160–161

mobile services, 17, 177, 362, 367Mobile Web Services Mediation

Framework (MWSMF), 158, 159MPLS see multiprotocol label switching

(MPLS) networksmulticlouds, 18, 217–19, 222, 225, 387Multiple Spanning Tree Protocol (MSTP),

91multiprotocol label switching (MPLS)

networks, 25, 90, 105multitiered topology, 77MWSMF see Mobile Web Services

Mediation Framework (MWSMF)

natural disasters, risks, 166NETCONF, management protocols, 28, 44NetLord, full address space virtualization,

95, 95NetPilot, failure characterization, 298–9network latency, 335–6, 342Network Plug-In Architecture

(NPA/NPIA), 56network reconfiguration (NR), 19

actions, 284blocking switch port, 285filtering, 285mechanism, 286QoS adjustment (QA), 284, 286–7quarantine, 285selection policy, 287, 287–8traffic isolation (TI), 284–5traffic redirection (TR), 284–6

network security, 166, 271, 277network virtualization, 4, 11, 24, 27, 42,

59, 108–115Net-Zero Energy Data Center, 14NR see network reconfiguration (NR)

offline migration, 52–3, 54 see also livemigration

OFRHM see OpenFlow Random HostMutation (OFRHM)

on-demand gaming see cloud gaming

online advertising, BDOC, 366–7Open Cloud Computing Interface (OCCI),

31–2, 36, 37Open Cloud Networking Interface (OCNI),

32OpenDaylight

architecture, 145, 146cloud computing and SDN interaction,

139, 146, 147description, 145Forwarding Rules manager (FRM), 145Open-Vswitch database (OVSDB), 148service provider, 147virtual infrastructures management,

architecture for, 148, 148virtualization edition, 147, 147Virtual Tenant Network (VTN) service,

147VN embedding problem, 148

OpenFlowcomponents, 135, 135controller, 90, 95elements, 134Ethernet destination and source

addresses, 135–6flow entries removal from entries, 136identifier, 137matching of packets, 136, 136message types, 137Open Networking Foundation, 134–5ports, 136QoS implementation, 137switch and controller, 134, 135technology, 16–17transport layer security (TLS), 137

OpenFlow Management Infrastructure (OMNI), 65–6
OpenFlow Random Host Mutation (OFRHM), 278, 279

open/global grid forum, 363
OpenNebula, 36–7, 39
open source cloud management platforms
    CloudStack, 38–9
    Eucalyptus, 35–6
    OpenNebula, 36–7
    OpenStack, 37–8
OpenStack, 37–8
    application programming interface (API), 138
    architecture, 139, 139
    BDOC, 363
    data center deployment, 140–141, 141
    Identity service (Keystone), 140
    interrelated services, 138–9
    Neutron networking service, 140
    nova compute service, 140
    tenant and provider networks, 141–2, 142
    tenant resource, 140
    virtual machines (VM), 138, 138

Open Virtualization Format (OVF), 32–3, 377
Open Virtual Switch (OVS) cloud environment, 279
Open vSwitch (OVS), 19, 142–4, 143
Open-Vswitch database (OVSDB), 148
operational availability, big data on clouds (BDOC)
    Camdoop, 369
    child tasks failures, 369
    FlowComb, 369
    hanging tasks failures, 369–70
    Internet service providers, 368
    MapReduce program, 369
    map tasks and reduce tasks, 370
    Scalus for HBase, 369
    service-level agreements (SLAs), 368
    tasktracker failure, 370

opportunistic redundancy pooling (ORP), 300, 300–301
optical interconnection networks, 209–10
optical switching architecture (OSA), 78–9, 79
ORP see opportunistic redundancy pooling (ORP)
OVF Packages, 32–3

packet marking technique, IP traceback, 275–6
pattern-based deployment service (PDS), 229–231, 237
peer-to-peer (P2P)-based infrastructure, 336–7
Pegasus Workflow Management System, 314
penalties, inter-data-center (IDC) networks
    elastic optical network (EON) backbone, 120
    ij, virtual link, 120
    minimum outage probability in cloud, 121
    resource saving minimum outage probability in cloud (RS-MOPIC), 121–3, 122
    upstream data center demand, 121
    virtual path (VP), outage probability, 120

performance management and monitoring
    autonomic computing, 220–221
    autonomic manager, 220
    broker, 222
    cloud computing, 220
    heterogeneous clouds, 238–9
    hybrid clouds, 217
    implementation
        Amazon EC2, 233
        analysis and planning, 229–30
        execution, 229
        Java EE web application, 232–3, 235
        Misure, 230
        monitoring components, 229
        pattern-based deployment service (PDS), 230–231
        scaling experiment, measurements from, 233, 234
        scaling test, measurements, 234, 235
    infrastructural flexibility, 220
    infrastructure-as-a-service (IaaS)-style, 217
    Management Logic, design and implementation, 222, 233, 235–7
    multiclouds, 217, 222
    ownership and access, 219–20
    pay-as-you-go model, 220
    private clouds, 221
    X-Cloud Application Management Platform (XCAMP), 222–9
performance risks
    complexity, 168
    data availability, 167, 175
    data location, 167
    data segregation/isolation, 167, 175
    MCC technology and services, 167
    network constraints, 168
    portability, 167
    quality-of-service (QoS) parameter, 174–5
    reliability, 168, 175
    resource exhaustion, 168
    service availability, 167, 175

Peripheral Component Interconnect Express (PCIe), 55
platform as a service (PaaS), 8, 9, 25, 155, 244, 245, 270, 364
Plugin Engine, 226
Portland, data center network architectures, 12, 92, 99
port-switching-based source routing (PSSR), 90, 90, 98, 99
power usage effectiveness (PUE), 13–14, 194
private cloud, 9, 10, 25, 221, 247
probabilistic packet marking (PPM) method, 276
ProtoGENI (federation), 43
Pseudo IP (PIP), 95
Pseudo MAC (PMAC), 92
PSSR see port-switching-based source routing (PSSR)
public cloud, 9, 10, 25, 246, 368
PUE see power usage effectiveness (PUE)
pyOCNI, OCNI reference implementation, 32
QoS adjustment (QA), 284, 286–7
Quantum project, OpenStack, 37
quarantine, network reconfiguration (NR), 285

RabbitMQ, 140
RAID controller failures, 299
RainCloud workflow, 324–8
    architecture, 318, 318
    ASKALON
        black-box library, 325–6
        environment, 317–18
        heterogeneous and distributed computing environments, 328
        scheduler, 326
    execution time with 16 and 64 parallel loops, 327, 327–8
    file transfer time, 325
    noncongested scenario, 325, 325–6
    parallelForEach loop in IWIR, 318, 318–19
    processing time, 325
    queuing time, 325
    scheduling time, 325
    SWF version, 327
    waiting time, 325
random graph-based topologies, 85
RBridges (routing bridges), 92
remote connectivity, 157
RESERVOIR (European Union's Seventh Framework Programme (FP7) project), 36
resilient provisioning with minimum power consumption in cloud (RPMPC), 123–4, 125

resource management
    allocation, 17–18, 248
    applications
        bag-of-tasks (BoTs), 249
        characteristics, 248
        data transfers, 249
        service-oriented computing, 248–9
        variations, 249
        workflow task, 249, 250
    Big Data, 262–3
    charging models and service-level agreements (SLAs), 247
    cloud computing definition, 244
    cloud service models, 244–6, 245
    cloud types, 246–7
    greenness, 263
    hybrid clouds and uncertainty, 263–4
    infrastructure, 251
    multiple workflows scheduling, 263
    optimization techniques, 253
    scheduler and VM allocation cooperation, 262
    scheduling in clouds see scheduling
    service level agreements (SLAs), 246, 246, 251, 252
    user tasks submission, 249, 251
    VM allocation, 252–3
resource saving minimum outage probability in cloud (RS-MOPIC), 121–3, 122

Rewire, 84
risks in mobile cloud computing (MCC)
    application mobility, 170
    CloneCloud and Cloudlets, 179, 181
    context-awareness, 170
    cost and resource savings, 154
    definition, 163
    device mobility, 170
    ECG data analysis software, 171, 171
    effectiveness, 181
    factors, 172, 172–4
    framework analysis, 181, 182–3
    hierarchy, 179–181, 180
    Hyrax and MobiCloud, 162, 177, 179
    identification, analysis and treatment, 163
    legislative/organizational, 168–9, 176
    “metering of services”, 170
    mobile-specific, 176
    monitoring and control, 163
    performance, 167–8, 174–5
    physical risks, 170
    resource limitation, mobile devices, 170
    scenarios, 170, 176–7, 178
    security and privacy, 164–7
    Spectra and Alfredo, 159

Routelet, 66
routing
    bridges, 91–2
    layer 2
        Actual MAC (AMAC), 92
        fabric manager, 92
        Link Aggregation Control Protocol (LACP), 91, 92
        Multiple Spanning Tree Protocol (MSTP), 91
        Portland, 92
        Pseudo MAC (PMAC), 92
        RBridges, 92
        routing bridges, 91–2
        Smart Path Assignment in Networks (SPAIN), 92
        Spanning Tree Protocol (STP), 91
        Transparent Interconnect of Lots of Links (TRILL), 91
        Virtual LANs (VLANs), 91
    layer 3
        Bounded Congestion Multicast Scheduling (BCMS), 91
        Code-Oriented eXplicit multicast (COXcast), 91
        equal-cost multipath (ECMP), 89
        Hedera, 89–90
        multiprotocol label switching (MPLS), 90
        OpenFlow controller, 90
        port-switching-based source routing (PSSR), 90, 90
        valiant load balancing (VLB), 89
RPMPC see resilient provisioning with minimum power consumption in cloud (RPMPC)
RS-MOPIC see resource saving minimum outage probability in cloud (RS-MOPIC)

SAVI two-tier cloud testbed, 237, 238
scheduling
    in clouds see also resource management
        allocation and, 248
        definition, 252, 252
        dependent tasks, 257–261
        elasticity management entity, 256, 256–7, 257
        independent tasks, 254–6, 255
        output, 249–50, 251
        submission, 250–251
        VM allocation, 261–2
    data centers
        computing servers, 198
        congestion/hotspots, 199
        DENS methodology, 199–200
        DPM-like power management, 198
        Gigabit Ethernet (GE) interfaces, 198
        server load and communication potential, 200, 201
        three-tier data center architecture, 199, 199
science, technology, engineering and mathematics (STEM)
    criticality, 385
    DevOps professions, 385–6
    engagement ideas, 385
    site reliability engineering (SRE), 385, 386
scientific applications on clouds
    Amazon, cloud providers, 312
    Amazon Simple Workflow (SWF) service, 312, 315–17
    Architecture Neutral Distribution Format (ANDF), 314
    big data, 328–9
    cloud computing, 313
    Elastic Compute Cloud (EC2) infrastructure, 313
    FutureGrid, 313–14
    Interoperable Workflow Intermediate Representation (IWIR), 312, 314–15
    IWIR-to-SWF conversion
        activity function generation, 320–324
        decider generation, 320
    Megha workflow system, 314
    Pegasus Grid workflow system, 314
    RainCloud workflow, 317–19, 324–8
    security, 329
    SHIWA European project, 312, 312
    supercomputing, 329
    UNiversal Computer Oriented Language (UNCOL), 314
SDN see software-defined networking (SDN)
secure virtual environment

    access control, 62
    authentication and authorization, 62
    availability and isolation, 62
    confidentiality, 62
    integrity, 62
    nonrepudiation, 62
    replay resistance, 63
    vulnerabilities
        access control policy, 64
        covert channel, 64
        intruders, 64
        loopholes in migration module, 65
        side channel attack, 63–4
        system administrator, 63
        unprotected channel transmission, 64–5
security
    algorithms, 290
    in big data see big data on clouds (BDOC)
    big data on clouds (BDOC)
        data encryption and access control, 378–9
        identity and access management, 377–8
        networking communications, 375–6
        planning and risk assessment, 375
        tenant separation, 377
        threats and vulnerabilities, 372–4
        VM security, 376–7
    cloud computing, 270
    data integrity, 270
    DDoS attack, 290
    IDS/IPS cloud security solutions see intrusion detection and prevention systems (IDPS)
    malicious insider, 270
    network characteristics see network reconfiguration (NR)
    OpenFlow-based IDPS solution see FlowIPS security design
    performance comparison, 288–90
    robust network architecture design, 271
    SDN-based cloud security solutions see software-defined networking (SDN)
    signature- and anomaly-based detection, 290
    software-defined networking (SDN), 273–4
    synchronization, 290
    traditional non-SDN solutions, 275–6
    virtualization hijacking, 270–271

server only topology
    3-ary CamCube topology, 80, 81
    benefits and limitations, 80, 81
    CamCube, 80, 82
    Content Addressable Network (CAN), 80
    DCell and BCube, 82
    Fat-Tree, 81
    Forwarding Information Base (FIB), 80
    optical switching architecture (OSA), 82
    VL2 scales, 81
service-oriented computing and virtualization, 129
service providers, 25 see also cloud service providers (CSPs)
    cloud providers, 3
    Internet, 368
    lease resources, 3
    OpenDaylight, 147
    virtualization management in cloud, 25
session riding and hijacking, 372–3
Simple Mail Transfer Protocol (SMTP), 363
Simple Network Management Protocol (SNMP), 28–9

single root I/O virtualization (SR-IOV), 55
site reliability engineers (SREs), 20, 367–8
small to medium enterprises (SMEs) survey on cloud computing, 10
Smart Path Assignment in Networks (SPAIN), 92
SMTP see Simple Mail Transfer Protocol (SMTP)
SNMP see Simple Network Management Protocol (SNMP)
Snort
    FlowIPS security design, 281–4, 283
    intrusion detection and prevention systems (IDPS), 272, 274
software as a service (SaaS), 8, 9, 18, 25, 155, 244, 364, 372
software-defined infrastructures (SDI)
    cloud and network controllers, 149
    programmable hardware resources, 149
    resource management system (RMS), 149, 149
    SAVI project, 149, 149–50

software-defined networking (SDN) see also OpenStack
    automated scaling, 150
    Ceilometer project in OpenStack, 150
    for cloud computing
        inter-data center networking, 145
        Linux bridging, 142
        networking requirements, 144
        Open vSwitch (OVS), 142–4, 143
        physical interfaces (PIFs), 142
        virtual interfaces (VIFs), 142
    cloud security solutions, 269
        actuating trigger module, 277
        ALARMS, 276
        AvantGuard, 277
        based security solutions, 276–7
        CONA, content-oriented networking architecture, 277
        CPRecovery component, 277
        FortNox, Security Enforcement Kernel, 276–7
        intrusion detection and prevention system (IDS/IPS), 277–8
        OpenFlow-enabled solutions, 273, 273, 276–7
        OpenFlow switch (OFS)’s flow tables, 273–4
        OpenSafe, 276
        QuagFlow, 276
        replication mechanism, 277
    definition, 132
    flexible and customizable networking, 133
    Heat project in OpenStack, 150
    layered architecture, 132–3, 133
    management systems, scalability, 151
    paradigm, 95
    technology, 12, 16–17

Spanning Tree Protocol (STP), 91
Spectra framework, 159, 161
standardized generic interfaces, 31
stochastic power reduction scheme (SAVE), 205
storage media, inability to sanitize, 373
Storage Networking Industry Association (SNIA), 34
storage virtualization, 26–7
survivability in cloud see also failure tolerance in cloud
    availability, 297–8, 298
    cloud computing fundamentals, 296–7
    embedding schemes
        comparison, 306
        high-availability virtual infrastructure management framework (Hi-VI), 302–3
        opportunistic redundancy pooling (ORP), 300, 300–301
        survivable mapping, 299–300
        survivable virtual network embedding (SVNE), 304–5, 305
        Vailability-aware EmbeddiNg framework In Cloud Environments (VENICE), 301, 301–2
        worst-case survival (WCS), 303, 303–4
    fault domain and tolerance, 297, 298
    reliability, 297
survivable virtual network embedding (SVNE), 304–5, 305
switch-oriented topologies
    Clos-based topologies, 78, 78
    optical switching architecture (OSA), 78–9, 79
System Resource, CIMI, 33

Technical Working Group (TWG), 34
time-shifting video streaming
    characteristics, 352
    hybrid P2P-DC solution, 352
    video portions ratio, 352, 353
    VoD services and, 351–2
TPM see trusted platform module (TPM)
traffic, datacenter networks (DCNs)
    asymmetry, 86
    bandwidth guarantees, 88
    categories, 85
    flow arrival patterns, 87
    flow size, duration and number, 87
    hot spots, 87
    intra- and inter-application communication, 86
    link utilization, 87
    location and exchange, 86
    management, 87–9
    nature, 86
    packet losses, 87
    properties, 86–7
    proportional sharing, 88
traffic isolation (TI), 284–5
traffic redirection (TR), 284–6
Transparent Interconnect of Lots of Links (TRILL), 91
transport layer security (TLS), 40, 137
“TrustCloud”, security and privacy risks, 184
trusted computing base (TCB), 50–51
trusted platform module (TPM), 14

unauthorized access, risk, 165, 174
user-generated content (UGC) live streaming
    hybrid DC-CDN delivery
        CDN infrastructures management, 348
        channels top categories, 349, 349
        online popular channels stability, 350–351, 351
        popular channels, 349–50, 350
    justin.tv analysis
        international service, 347
        24/7 TV-like service, 347–8
        uploaders, 346–7, 347
        viewers, 346–7, 347
    over-the-top (OTT) TV channels, 345–6
    video sharing platforms, 346
utility-based services, 155

Vailability-aware EmbeddiNg framework In Cloud Environments (VENICE), 301, 301–2
valiant load balancing (VLB), 89
virtual data centers (VDCs), 11 see also survivability in cloud
Virtual eXtensible Local Area Network (VXLAN), 93–4, 141
Virtual Grid Application Development Software Project (VGrADS), 35
virtualization
    cloud computing, 14–15
    inter-data-center (IDC) networks
        backbone, 108, 109
        cloud customers, 107
        constraints, 111–12
        erbium-doped fiber amplifiers (EDFAs), 110
        heuristic solution, 113–15
        infrastructure as a service (IaaS), 108
        infrastructure design, 109–10
        mathematical formulation, 110–113
        mixed integer linear programming (MILP), 108–9
        notation, 110, 111
        objective, 108–9
        physical infrastructure, 108
        power consumptions, 110

virtualization management in cloud
    cloud services, 25
    cloud tenants, 41–2
    embedding, 24
    energy efficiency, 42
    fault management, 42
    federation, 43–4
    infrastructures, 23
    interfaces for, 30–34
    monitoring, 41
    multiple customers, 23–4
    operations, 29–30
    private clouds, 25
    resource configurations, 24
    scalability, 40–41
    security, 43
    service providers, 25
    standard management protocols and information models, 44
    tools and libraries see cloud resource management, tools and systems
    virtualized elements
        computing, 26
        management, 28–9
        networking, 27–8
        storage, 26–7
    VM monitor, hypervisor, 25
Virtual LANs (VLANs), 91–3, 141
Virtually Clustered Open Router (VICTOR), 66
virtual machines (VMs)
    in cloud computing, 130
    cloud providers, 25
    computing resources, 26
    Deltacloud, 40
    Machine Resources, 33
    migration, 11, 15
        automated migration, 67
        confidentiality, 67–8
        hypervisor, 50–51
        input/output (I/O) virtualization, 55, 55–7, 56, 58
        isolation, access control, and availability, 65–66
        live migration, 50, 52–3, 54
        network connections, 66–7
        offline migration, 52–3, 54
        process migration, 50
        relocation, 49
        security see secure virtual environment
        storage, 67
        trusted computing base (TCB), 50–51
        virtual network migration without packet loss, 59–61
        VM downtime, 66
        Xen-based virtualization architecture, 50, 50–51
        XenFlow virtual topology migration, 51–3, 52
    operations, 29–30
    OVF Package, 32–3
    SDN see software-defined networking (SDN)
    security issues, 43
    server virtualization technologies, 26
virtual mobile computing (VMC), 160, 161
virtual network migration without packet loss
    control plane directives, 59
    local area network (LAN), 59
    network topology, 59
    OpenFlow switched networks, 59
    XenFlow, 59
virtual networks (VNs), 27–8, 28, 33
Virtual Private Cloud (VPC), 11, 93–4, 99
Virtual Tenant Network (VTN) service, 147
VLB see valiant load balancing (VLB)
VL2, datacenters, 94, 94–5
VMC see virtual mobile computing (VMC)
VM Device Queues (VMDq), 55
VMM-MIB module, management protocols, 44
VR-MIB module, management protocols, 44

WCS see worst-case survival (WCS)
Web browsing use case, 131, 131
Web services, security defects, 165
wireless backhaul networks, 105
wireless sensor networks (WSNs), 105
wireline local area networks (LANs), 105
worst-case survival (WCS), 303, 303–4
X-Cloud Application Management Platform (XCAMP)
    components, 222, 223
    deployment stage examination, 237
    deployment view, 227–8
    edge-core clouds (scenario 2), 225
    hybrid clouds (scenario 1), 224–5
    hypothesis evaluation, 237–8
    information abstraction, 228
    initial exploratory runs, 237
    Java EE application, 240
    Management Logic, 228–9
    MAPE-k loop, 222, 225–7
    SAVI two-tier cloud testbed, 237, 238
Xen-based cloud environment, 279
XenFlow and VM migration techniques, 15

Yahoo, 363

Wiley End User License Agreement
Go to www.wiley.com/go/eula to access Wiley’s ebook EULA.