Research Challenges in Cloud Computing
Raouf Boutaba
D. Cheriton School of Computer Science, University of Waterloo
CS856 W’17
Outline • Data Center Networks • Network Management • Resource and Performance Management • Energy Management • Pricing and Economics • Security Management • Migrating Enterprise Applications to the Cloud
Data Center Networks • Data center networks form the backbone of a data center • They connect tens of thousands of servers that may host millions of applications
• Characteristics • Very large scale • Single administrative domain • Bandwidth is often the performance bottleneck
3 Research Issues and Current Trends
Conventional Architecture
[Figure: conventional data center network architecture. Source: VL2: A Scalable and Flexible Data Center Network, SIGCOMM 2009]
Limitations of Conventional Architectures
• High oversubscription ratio, which creates a bandwidth bottleneck • Typically 1:5, 1:80 or even 1:240 at the root
• Poor reliability and utilization
• Static network address assignment • Fragmentation of resources • Difficult to support VM migration because it requires address reconfiguration
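Oversubscription compares the bandwidth that servers could demand in the worst case with the uplink capacity available to carry it. A minimal sketch for a single top-of-rack switch, assuming every server NIC and every uplink runs at a fixed rate; the port counts and speeds in the usage line are hypothetical. A result of 5 corresponds to the slide's 1:5, meaning each server can rely on only one fifth of its line rate when all servers transmit at once.

```python
from fractions import Fraction

def oversubscription(servers: int, server_gbps: int, uplinks: int, uplink_gbps: int) -> Fraction:
    """Worst-case server demand divided by uplink capacity (1 means non-blocking)."""
    return Fraction(servers * server_gbps, uplinks * uplink_gbps)

# Hypothetical rack: 40 servers with 1 Gbps NICs behind two 10 Gbps uplinks.
ratio = oversubscription(40, 1, 2, 10)
print(f"{ratio}:1 oversubscribed")
```

Using `Fraction` keeps the ratio exact, so a rack with 2.5:1 oversubscription prints as `5/2` rather than a rounded float.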
Design Objectives • Scalability
• Scale to millions of servers without compromising performance
• Economics • Built using commodity switches and servers
• Performance • Low network diameter • Large bisection bandwidth
• Reliability • Multiple forwarding paths for host-to-host communication
• Application Support • Support address reconfiguration and VM migration
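The fat-tree topology (used by PortLand, described below) is one common way to reach these objectives with commodity switches. A minimal sketch of the standard sizing rules for a three-tier k-ary fat-tree built entirely from identical k-port switches, assuming k is even:

```python
def fat_tree_size(k: int) -> dict:
    """Component counts for a three-tier k-ary fat-tree of identical k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 per pod
        "aggregation_switches": k * half,  # k/2 per pod
        "core_switches": half * half,      # (k/2)^2
        "hosts": k * half * half,          # k^3/4: k/2 hosts per edge switch
    }

size = fat_tree_size(48)
switches = sum(v for name, v in size.items() if name.endswith("switches"))
print(size["hosts"], switches)  # 27648 2880
```

With 48-port commodity switches the tree already connects 27,648 hosts, which illustrates how scale comes from the topology rather than from large, expensive switches.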
Architectural Proposals • Switch-Centric
• Forwarding using only switches • E.g. PortLand, VL2
• Server-Centric • Forwarding using both switches and servers • E.g. DCell, BCube, CamCube
PortLand • Uses a fat-tree topology for path diversity and large bisection bandwidth • Operates at Layer 2
• Uses pseudo MAC (PMAC) addresses of the form pod.position.port.vmid for forwarding
• Uses a centralized fabric manager to maintain the mapping between actual and pseudo MAC addresses
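A minimal sketch of packing and unpacking a PMAC. The field widths are an assumption: 16 bits for pod, 8 for position, 8 for port and 16 for vmid, which together fill the 48 bits of a MAC address. The field values in the usage line are hypothetical.

```python
# Assumed field widths (bits), most significant first; 48 bits in total.
FIELDS = (("pod", 16), ("position", 8), ("port", 8), ("vmid", 16))

def encode_pmac(pod: int, position: int, port: int, vmid: int) -> str:
    values = {"pod": pod, "position": position, "port": port, "vmid": vmid}
    packed = 0
    for name, width in FIELDS:
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name}={values[name]} does not fit in {width} bits")
        packed = (packed << width) | values[name]
    return ":".join(f"{b:02x}" for b in packed.to_bytes(6, "big"))

def decode_pmac(pmac: str) -> dict:
    packed = int(pmac.replace(":", ""), 16)
    fields = {}
    for name, width in reversed(FIELDS):  # peel fields off from the least significant end
        fields[name] = packed & ((1 << width) - 1)
        packed >>= width
    return fields

pmac = encode_pmac(pod=2, position=1, port=3, vmid=7)
print(pmac)  # 00:02:01:03:00:07
```

Because the location is encoded hierarchically, a switch can forward on a PMAC prefix (for example, the pod field alone) instead of keeping one entry per host.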
BCube • Targets container-based data centers • Uses a generalized hypercube topology
• Overlay routing at layer 2.5 • Efficient support for one-to-one, one-to-many and many-to-many communication patterns using source routing
• $\mathrm{BCube}_0$ = $n$ servers + one mini-switch ($n \le 8$)
• $\mathrm{BCube}_k$ = $n$ copies of $\mathrm{BCube}_{k-1}$ + $n^k$ $n$-port switches
• Connection rule: the level-$k$ port of the $i$-th server in the $j$-th $\mathrm{BCube}_{k-1}$ connects to the $j$-th port of the $i$-th level-$k$ switch
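The recursive connection rule can be unrolled into an explicit wiring list. A minimal sketch, assuming servers are numbered 0 to n^(k+1) - 1 so that each BCube at level l occupies a contiguous block of servers; switches are identified here by a hypothetical (level, index) pair.

```python
from collections import defaultdict

def bcube_links(n: int, k: int):
    """Yield (server, level, switch, port) for every server-to-switch link of BCube_k.

    At each level l, the level-l port of the i-th server in the j-th BCube_{l-1}
    attaches to port j of the i-th level-l switch of its enclosing BCube_l.
    """
    for server in range(n ** (k + 1)):
        for level in range(k + 1):
            block = n ** (level + 1)                       # servers per BCube_level
            j, i = divmod(server % block, n ** level)      # j-th sub-cube, i-th server in it
            switch = (server // block) * n ** level + i    # global index among level-l switches
            yield server, level, switch, j

def switch_ports(n: int, k: int) -> dict:
    """Group links by switch: (level, switch) -> sorted list of (port, server)."""
    ports = defaultdict(list)
    for server, level, switch, port in bcube_links(n, k):
        ports[(level, switch)].append((port, server))
    return {sw: sorted(v) for sw, v in ports.items()}

ports = switch_ports(n=4, k=1)
print(len(ports))      # 8 switches, i.e. (k + 1) * n**k
print(ports[(1, 0)])   # [(0, 0), (1, 4), (2, 8), (3, 12)]
```

Each server ends up with k + 1 ports, one per level, which is what lets BCube offer multiple parallel paths between any two servers.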
Research Challenges • Understanding the trade-off between different architectures
• Switch-centric vs. server-centric
• Comparison criteria • Network capacity • Robustness • Capital and Operational Cost
• Managing and upgrading existing data center networks over time
Outline • Data Center Networks • Network Management • Resource and Performance Management • Energy Management • Pricing and Economics • Security Management • Migrating Enterprise Applications to the Cloud
Network Management Issues • Naming and addressing
• Address configuration and management
• Flow control and management • Congestion Control • Flow Scheduling
Address Configuration • ID/locator separation is a design principle of data center networks
• E.g. PortLand maintains the mapping between physical MAC and hierarchical PMAC addresses
• E.g. BCube assigns virtual addresses to individual hosts
• Automatic address reconfiguration is a requirement • Manual configuration is costly and error-prone
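ID/locator separation can be illustrated with a small directory that maps a host's stable identifier to its current, topology-dependent locator. The class and method names below are hypothetical and not taken from PortLand or BCube; the sketch only shows why a VM migration needs a single directory update rather than manual reconfiguration of the host.

```python
class LocatorDirectory:
    """Hypothetical ID -> locator map, loosely modeled on a centralized fabric manager."""

    def __init__(self):
        self._locator = {}

    def register(self, host_id: str, locator: str) -> None:
        self._locator[host_id] = locator

    def resolve(self, host_id: str) -> str:
        # Senders address peers by ID; the directory supplies the current locator.
        return self._locator[host_id]

    def migrate(self, host_id: str, new_locator: str) -> str:
        # A VM move changes only the locator; the ID seen by applications stays the same.
        old = self._locator[host_id]
        self._locator[host_id] = new_locator
        return old

directory = LocatorDirectory()
directory.register("vm-42", "00:02:01:03:00:07")
directory.migrate("vm-42", "00:05:00:01:00:02")
print(directory.resolve("vm-42"))  # 00:05:00:01:00:02
```

A real system would also have to invalidate mappings cached elsewhere in the network after a move, which this sketch omits.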
Congestion Control • Data center traffic typically consists of
• (>80%) low-latency short flows (e.g. user-facing requests) • (<20%) throughput-sensitive long flows (e.g. backend operations such as MapReduce)
• The current transport protocol (TCP) is not well suited to this traffic pattern • Network buffers in switches and servers are often overwhelmed by long flows, resulting in high latency for short flows
• Achieving fairness by dynamically adjusting the congestion window at the servers
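One well-known scheme that adjusts the congestion window at servers for this traffic mix is DCTCP, which the slide does not name. Switches ECN-mark packets once their queue exceeds a threshold, and senders shrink the window in proportion to the fraction of marked packets instead of halving it. A minimal per-window sketch of the sender side, assuming the marked and acknowledged packet counts for the last window are already known; g = 1/16 is a commonly used gain.

```python
def dctcp_window_update(cwnd: float, alpha: float, marked: int, acked: int, g: float = 1 / 16):
    """One round trip of DCTCP-style window control; returns (new_cwnd, new_alpha).

    alpha is a running estimate of the fraction of packets that saw a congested queue.
    """
    fraction = marked / acked if acked else 0.0
    alpha = (1 - g) * alpha + g * fraction
    if marked:
        # Proportional back-off: light congestion costs only a small window reduction.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    else:
        cwnd += 1.0  # standard additive increase, one segment per round trip
    return cwnd, alpha

cwnd, alpha = 10.0, 0.0
for marked in (0, 10, 10):
    cwnd, alpha = dctcp_window_update(cwnd, alpha, marked=marked, acked=10)
    print(round(cwnd, 3), round(alpha, 4))
```

Keeping queues short this way leaves buffer space for the latency-sensitive short flows while long flows still use most of the link.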
Flow Scheduling • Given the path diversity provided by data center networks, route network flows to minimize congestion
• Current approaches • Equal Cost Multipath (ECMP)
• Determines the path using a hash function (called flow hashing) • Valiant Load Balancing (VLB)
• Bounces packets off randomly chosen intermediate nodes (switches or servers)
• Limitations • Inefficient for non-uniform traffic patterns
• Two large flows may collide on the same path, resulting in congestion
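A minimal sketch of flow hashing, assuming flows are identified by their 5-tuple and that SHA-256 stands in for the switch's hash function (hardware typically uses cheaper hashes). Because the choice ignores flow size, two large flows can land on the same path, which is the limitation noted above. The addresses and ports are hypothetical.

```python
import hashlib

def ecmp_path(five_tuple: tuple, num_paths: int) -> int:
    """Pick one of num_paths equal-cost paths; every packet of a flow takes the same path."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port).
flows = [("10.0.0.1", "10.0.1.2", 6, 5000 + i, 80) for i in range(5)]
paths = [ecmp_path(f, num_paths=4) for f in flows]
print(paths)  # five flows on four paths: at least two must share a path
```

Hashing per flow rather than per packet keeps packets of one TCP connection in order, at the cost of ignoring how much traffic each flow carries.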
Flow Scheduling (cont) • Flow scheduling
• Separate flows into large and small flows
• For small flows, use ECMP or VLB
• For large flows, use centralized scheduling • A variant of the NP-hard multi-commodity flow problem
• Implementation • Monitor network flows • Dynamically insert forwarding entries for large flows
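Because the exact problem is NP-hard, centralized schedulers rely on heuristics. The sketch below is one simple greedy heuristic under stated assumptions, not a specific published scheduler: all flows share the same set of equal-cost paths, flow rates have already been measured, small flows are hashed as in ECMP, and large flows are placed largest first on the currently least-loaded path. Flow names and rates are hypothetical.

```python
import zlib

def schedule_flows(flows: dict, num_paths: int, large_threshold: float):
    """flows: flow_id -> measured rate. Returns (flow_id -> path, per-path load)."""
    load = [0.0] * num_paths
    placement = {}
    # Small flows: stateless hash-based placement, no central state needed.
    for fid, rate in flows.items():
        if rate < large_threshold:
            path = zlib.crc32(fid.encode()) % num_paths
            placement[fid] = path
            load[path] += rate
    # Large flows: greedy central placement, largest first, on the least-loaded path.
    large = sorted((f for f, r in flows.items() if r >= large_threshold),
                   key=lambda f: flows[f], reverse=True)
    for fid in large:
        path = min(range(num_paths), key=lambda p: load[p])
        placement[fid] = path  # in practice: install a forwarding entry for this flow
        load[path] += flows[fid]
    return placement, load

placement, load = schedule_flows({"big-1": 8.0, "big-2": 8.0, "small": 0.1},
                                 num_paths=2, large_threshold=1.0)
print(placement, load)
```

Only the few large flows need per-flow forwarding entries, which keeps the central scheduler's state small.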
Research Directions • Configuration Management
• Reducing the complexity of management tasks such as address configuration
• Traffic Management • Support the diverse usage patterns of cloud applications
• Leveraging new network management paradigms such as software-defined networking (SDN)
Outline
• Data Center Networks • Network Management • Resource and Performance Management • Energy Management • Pricing and Economics • Security Management • Migrating Enterprise Applications to the Cloud
Resource and Performance Management • A cloud computing environment hosts myriad applications with diverse performance objectives
• How can resources be allocated effectively so that applications meet their performance objectives?
• Sub-problems • Performance modeling and management for each individual application • Run-time resource management
Application Performance Management • An application owner needs to understand the application's performance model and adjust its resource requirements according to workload conditions • E.g. increase the number of web server replicas to mitigate a flash crowd
[Figure: application performance management control loop with Demand Prediction, Controller, Application and Performance Model components, plus Input and Output]
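The performance model in such a controller can be as simple as a queueing formula. A minimal sketch, assuming each replica behaves as an independent M/M/1 queue with service rate μ and that a load balancer splits the arrival rate λ evenly over m replicas. The mean response time is then R = 1/(μ - λ/m), so R stays at or below R_target whenever m ≥ λ/(μ - 1/R_target). The rates in the usage line are hypothetical.

```python
import math

def replicas_needed(arrival_rate: float, service_rate: float, target_response: float) -> int:
    """Fewest M/M/1 replicas (even load split) whose mean response time meets the target."""
    per_replica_limit = service_rate - 1.0 / target_response  # max arrival rate per replica
    if per_replica_limit <= 0:
        raise ValueError("target unreachable: even an idle replica is slower than the target")
    return max(1, math.ceil(arrival_rate / per_replica_limit))

# Hypothetical: each replica serves 100 req/s, target mean response time 125 ms.
print(replicas_needed(920, 100, 0.125))   # 10 under normal load
print(replicas_needed(1840, 100, 0.125))  # 20 when a flash crowd doubles demand
```

The M/M/1 assumption is convenient but optimistic for heavy-tailed request times; the same controller structure works with any model that maps load to response time.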
Application Performance Management (cont)
• Using probabilistic/statistical methods • Queueing models • Machine learning
• Proactive vs. reactive control • Proactive control uses predicted demand to allocate resources before they are needed • Reactive control responds to immediate demand fluctuations when prediction is not available
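A minimal sketch contrasting the two control styles. The predictor is an assumption chosen for brevity (linear extrapolation from the last two demand samples); a real controller might use any of the statistical or machine learning models listed above. The per-server capacity is a hypothetical constant.

```python
import math

def predict_next(history: list[float]) -> float:
    """Naive one-step forecast: continue the most recent trend, never below zero."""
    if len(history) < 2:
        return history[-1]
    return max(0.0, history[-1] + (history[-1] - history[-2]))

def servers_to_allocate(history: list[float], per_server_capacity: float, proactive: bool) -> int:
    # Proactive: size for the demand expected in the next interval.
    # Reactive: size for the demand observed in the current interval.
    demand = predict_next(history) if proactive else history[-1]
    return max(1, math.ceil(demand / per_server_capacity))

demand = [100.0, 200.0, 300.0]  # hypothetical rising demand, requests/s
print(servers_to_allocate(demand, 100.0, proactive=False))  # 3
print(servers_to_allocate(demand, 100.0, proactive=True))   # 4
```

Under rising demand the reactive controller is always one interval behind, while the proactive one is only as good as its forecast.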
Data Center Resource Management • Objectives
• Mitigating performance bottlenecks (i.e. hotspots) • Improving application schedulability • Improving server utilization • Improving resource sharing among applications • Reducing energy cost
• Current approach: using various virtualization techniques • Dynamically adjusting the resource allocation of applications • Virtual machine migration
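A minimal sketch of hotspot mitigation through VM migration. The policy is an assumption for illustration: a server is a hotspot when its total CPU share exceeds a threshold, the smallest VM is moved first as a rough proxy for the cheapest migration, and the destination is the least-loaded server that stays at or below the threshold. Server and VM names are hypothetical.

```python
def mitigate_hotspots(servers: dict, threshold: float) -> list:
    """servers: server -> {vm: cpu share}. Mutates servers; returns (vm, src, dst) moves."""
    def load(name):
        return sum(servers[name].values())

    migrations = []
    for src in sorted(servers):
        while load(src) > threshold and servers[src]:
            vm = min(servers[src], key=servers[src].get)  # smallest VM first (assumed policy)
            dst = min((s for s in servers if s != src), key=load, default=None)
            if dst is None or load(dst) + servers[src][vm] > threshold:
                break  # no server can take it without becoming a hotspot itself
            servers[dst][vm] = servers[src].pop(vm)
            migrations.append((vm, src, dst))
    return migrations

cluster = {"s1": {"a": 0.6, "b": 0.3, "c": 0.2}, "s2": {"d": 0.1}}
moves = mitigate_hotspots(cluster, threshold=1.0)
print(moves)  # [('c', 's1', 's2')]
```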
Data Center Resource Management (cont) • The optimal placement problem is a generalization of the multi-dimensional bin packing problem • NP-hard to solve
• Additional Factors • Job arrival process • Job duration • Reconfiguration procedure and cost
• E.g. cost of migration
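Because exact placement is NP-hard, heuristics are common. A minimal sketch of one classic heuristic, first-fit decreasing, adapted to two resource dimensions (CPU and memory). Ordering VMs by their largest normalized demand is one of several possible choices and is an assumption here; the sketch also ignores the additional factors listed above, such as job arrivals, durations and migration cost. VM names and sizes are hypothetical.

```python
def first_fit_decreasing(vms: dict, capacity: tuple) -> list:
    """vms: name -> (cpu, mem) demand; capacity: (cpu, mem) of one server.
    Returns one list of VM names per server opened."""
    cap_cpu, cap_mem = capacity
    # Largest normalized demand first (assumed ordering; other size measures exist).
    order = sorted(vms, key=lambda v: max(vms[v][0] / cap_cpu, vms[v][1] / cap_mem), reverse=True)
    servers = []  # each entry: [free_cpu, free_mem, [vm names]]
    for name in order:
        cpu, mem = vms[name]
        if cpu > cap_cpu or mem > cap_mem:
            raise ValueError(f"{name} does not fit on an empty server")
        for srv in servers:
            if srv[0] >= cpu and srv[1] >= mem:  # first server with room in every dimension
                srv[0] -= cpu
                srv[1] -= mem
                srv[2].append(name)
                break
        else:
            servers.append([cap_cpu - cpu, cap_mem - mem, [name]])
    return [srv[2] for srv in servers]

vms = {"a": (4, 4), "b": (4, 12), "c": (2, 2), "d": (6, 2)}
print(first_fit_decreasing(vms, capacity=(8, 16)))  # [['b', 'a'], ['d', 'c']]
```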
Research Directions
• Understanding application resource requirements • e.g. workload characterization, application performance analysis
• Resource management framework for data-center wide workloads
• Multi-tenancy issues • Application owners and cloud providers may have conflicting objectives
Outline
• Data Center Networks • Network Management • Resource and Performance Management • Energy Management • Pricing and Economics • Security Management • Migrating Enterprise Applications to the Cloud
Energy Management • Reducing energy consumption is a critical objective of cloud computing
• Power and cooling costs constitute a large portion of data center expenditure • 25%-30% of total data center operational cost
• Government regulations call for environmentally friendly (i.e. green) data centers
Cost of Consumption • Power and cooling cost millions of dollars per month
[Figure: Estimated monthly operational expenditure of a 50k-machine data center. Source: http://perspectives.mvdirona.com/]
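The order of magnitude can be checked with simple arithmetic. The sketch below assumes a power usage effectiveness (PUE) factor to account for cooling and power-distribution overhead; every number in the usage line is hypothetical and is not taken from the cited estimate.

```python
def monthly_energy_cost(num_servers: int, watts_per_server: float, pue: float,
                        price_per_kwh: float, hours: float = 730.0) -> float:
    """Facility energy cost per month: IT power scaled by PUE (assumed overhead factor)."""
    it_kw = num_servers * watts_per_server / 1000.0
    facility_kw = it_kw * pue  # adds cooling and distribution losses on top of IT load
    return facility_kw * hours * price_per_kwh

# Hypothetical: 50,000 servers at 200 W, PUE 1.5, $0.10/kWh, about 730 hours per month.
print(f"${monthly_energy_cost(50_000, 200, 1.5, 0.10):,.0f}")
```

Under these assumptions a third of the bill (the 0.5 in the PUE of 1.5) goes to overhead rather than computation, which is why cooling is a target alongside server power.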
Reducing Energy Cost • Server Consolidation
• Reducing the number of servers in use by turning off unused servers
• Energy-aware scheduling • Scheduling jobs to reduce power and cooling costs
• Energy-efficient networks • Dynamically adjusting the set of active network elements to reduce power cost
Server Consolidation • Consolidating application workloads onto a smaller number of servers to save server power cost
• However, consolidation increases resource contention among applications, which may hurt their performance
• Challenges • Understanding the energy and performance impact of consolidation • Devising effective policies that achieve a good trade-off between performance and power cost
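A minimal sketch of the energy side of this trade-off, assuming a commonly used linear server power model, P(u) = P_idle + (P_peak - P_idle)·u, and that servers left without work are switched off. The idle and peak wattages are hypothetical; the utilization cap `max_util` is the knob that trades power for performance, since a higher cap means more contention.

```python
import math

def cluster_power(total_load: float, active_servers: int, p_idle: float, p_peak: float) -> float:
    """Watts drawn when total_load (in server-equivalents) is spread evenly over active_servers.
    Assumes a linear power model and that inactive servers are powered off."""
    u = total_load / active_servers
    if u > 1:
        raise ValueError("load exceeds the capacity of the active servers")
    return active_servers * (p_idle + (p_peak - p_idle) * u)

def servers_needed(total_load: float, max_util: float) -> int:
    """Fewest servers that keep every server at or below max_util."""
    return math.ceil(total_load / max_util)

load = 2.0  # two servers' worth of work
spread = cluster_power(load, 10, p_idle=100, p_peak=200)           # 10 servers at 20%
packed = cluster_power(load, servers_needed(load, 0.5), 100, 200)  # 4 servers at 50%
print(spread, packed)
```

Most of the saving comes from eliminating idle power, which is why consolidation pays off even though the remaining servers run hotter.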
Energy-aware Workload Scheduling • Power-aware scheduling
• Scheduling jobs to minimize server power consumption • E.g. leveraging Dynamic Voltage and Frequency Scaling (DVFS) to reduce server power consumption • Thermal-aware scheduling
• Scheduling jobs to minimize overall data center temperature • E.g. placing jobs so as to keep server temperatures low and thereby reduce cooling cost
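A minimal sketch of thermal-aware placement under a deliberately crude, hypothetical thermal model: each server's temperature starts at its measured inlet temperature and rises by a fixed amount per unit of assigned load. Jobs are placed largest first on the currently coolest server, which tends to limit the peak temperature the cooling system must handle. Server and job names are hypothetical.

```python
import heapq

def thermal_aware_assign(jobs: list, inlet_temps: dict, rise_per_unit: float):
    """Place jobs (job_id, load) largest first on the coolest server.

    Hypothetical model: temperature = inlet temperature + rise_per_unit * assigned load.
    Returns (job_id -> server, estimated peak temperature).
    """
    heap = [(temp, server) for server, temp in inlet_temps.items()]
    heapq.heapify(heap)  # coolest server on top
    placement = {}
    for job_id, load in sorted(jobs, key=lambda job: job[1], reverse=True):
        temp, server = heapq.heappop(heap)
        placement[job_id] = server
        heapq.heappush(heap, (temp + rise_per_unit * load, server))
    return placement, max(temp for temp, _ in heap)

placement, peak = thermal_aware_assign([("j1", 2), ("j2", 1), ("j3", 1)],
                                       {"rackA-1": 20.0, "rackB-7": 25.0}, rise_per_unit=2.0)
print(placement, peak)
```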
Energy Efficient Networks • Objective: making data center networks energy-proportional
• Make energy cost proportional to network utilization
• Approach: Given the current network condition, dynamically adjust active network elements to reduce power cost • Powering down unneeded switches and links • Adjusting link rate
• Many modern switches (e.g. InfiniBand) can operate at more than one link rate
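A minimal sketch of link-rate adaptation, assuming each link supports a small set of rates and should run at the lowest rate that keeps utilization below a headroom fraction, while a link with no traffic is powered down. The rate set and the 80% headroom are hypothetical parameters.

```python
def pick_link_rate(demand_gbps: float, rates_gbps: list, headroom: float = 0.8) -> float:
    """Return 0 to power the link down, else the lowest rate with enough headroom."""
    if demand_gbps <= 0:
        return 0  # unneeded link: power it down
    for rate in sorted(rates_gbps):
        if demand_gbps <= rate * headroom:
            return rate
    return max(rates_gbps)  # saturated: run at full rate

rates = [10, 40, 100]  # hypothetical supported rates in Gbps
print([pick_link_rate(d, rates) for d in (0, 3, 9, 95)])  # [0, 10, 40, 100]
```

The headroom leaves room for bursts between adjustments; a smaller value saves less energy but reacts less often.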
Research Directions • Effectively leveraging the latest hardware and software technologies to achieve large energy cost reductions
• Achieving a good trade-off between performance and energy cost • E.g. lowering the CPU frequency using DVFS slows down job execution
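The DVFS trade-off can be made concrete with a simple model, which is an assumption rather than a measured profile: a job of W gigacycles at frequency f GHz runs for W/f seconds and draws P(f) = P_static + k·f^3 watts, since dynamic power grows roughly with the cube of frequency when voltage scales with it. The sketch picks the lowest-energy frequency that still meets a deadline; all numbers are hypothetical.

```python
def job_energy(work_gcycles: float, freq_ghz: float, p_static_w: float, k: float) -> float:
    """Joules to finish the job: (static + k * f^3) watts times work / f seconds."""
    runtime_s = work_gcycles / freq_ghz
    return (p_static_w + k * freq_ghz ** 3) * runtime_s

def best_frequency(work_gcycles, deadline_s, freqs_ghz, p_static_w, k):
    """Lowest-energy frequency among those that meet the deadline."""
    feasible = [f for f in freqs_ghz if work_gcycles / f <= deadline_s]
    if not feasible:
        raise ValueError("no frequency meets the deadline")
    return min(feasible, key=lambda f: job_energy(work_gcycles, f, p_static_w, k))

freqs = [1.0, 2.0, 3.0]
print(best_frequency(10, 20, freqs, p_static_w=10, k=2))   # 1.0: slow is cheapest
print(best_frequency(10, 6, freqs, p_static_w=10, k=2))    # 2.0: deadline rules out 1.0
print(best_frequency(10, 20, freqs, p_static_w=100, k=2))  # 3.0: high static power favors finishing fast
```

The last case shows that slowing down does not always save energy: when static power dominates, running longer costs more than the dynamic power saved.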
Outline
• Data Center Networks • Network Management • Resource and Performance Management • Energy Management • Pricing and Economics • Security Management • Migrating Enterprise Applications to the Cloud
Pricing and Economics • Cloud computing is a realization of utility computing
• Provides storage and computing resources under a usage-based pricing model
• Demand is highly volatile in cloud environments • Low resource demand causes low server utilization • High resource demand leaves requests unsatisfied, which causes customer dissatisfaction
Pricing and Economics (cont) • Approach: using a market economy to shape demand
• Dynamically adjust resource supply and price
• Increase the price when demand spikes • Ensure resources are allocated to the users who need them most • Give customers an incentive to reduce demand
• Reduce price when demand is low • Incentivize customers to increase demand
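A minimal sketch of demand-driven price adjustment, assuming a simple proportional rule (a tatonnement-style update) in which the price moves by a fixed fraction of the relative excess demand each period. The step size, price floor and demand series are hypothetical, not those of any real provider.

```python
def adjust_price(price: float, demand: float, supply: float,
                 step: float = 0.1, floor: float = 0.01) -> float:
    """Move the price in proportion to relative excess demand, never below floor (assumed rule)."""
    if supply <= 0:
        raise ValueError("supply must be positive")
    excess = (demand - supply) / supply  # > 0: shortage, < 0: idle capacity
    return max(floor, price * (1 + step * excess))

price = 1.0
for demand in (150, 150, 80, 40):  # hypothetical demand per period, supply fixed at 100
    price = adjust_price(price, demand, supply=100)
    print(round(price, 4))
```

A small step keeps prices stable but slow to react; a large step reacts quickly but can oscillate when demand responds strongly to price.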
Amazon EC2 Spot Market • Amazon EC2 launched its spot instance service in Dec. 2009
• The price of resources fluctuates with supply and demand • Customers specify their bids in their resource requests • A market-based mechanism decides the final price and assigns resources to customers
[Figure: Price of m1.small Linux spot instance in US-West-1 from Sept. 24 to Sept. 30, 2010 (Source: www.cloudexchange.org)]
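The slide does not specify the mechanism, so the following is only a generic sketch of a bid-based market of this kind, not Amazon's algorithm: a uniform-price auction in which each customer asks for one instance, the highest bids at or above a reserve price win, and every winner pays the same clearing price, here the highest losing bid. With one unit per bidder, this (k+1)-th price rule removes the incentive to misreport, which relates to the truthfulness objective that follows. Customers and bids are hypothetical.

```python
def clear_spot_market(bids: list, capacity: int, reserve_price: float):
    """bids: (customer, bid) pairs, one instance each. Returns (winners, clearing price)."""
    eligible = sorted((b for b in bids if b[1] >= reserve_price), key=lambda b: b[1], reverse=True)
    winners, losers = eligible[:capacity], eligible[capacity:]
    # Uniform price: the highest losing bid, or the reserve if every eligible bid wins.
    price = losers[0][1] if losers else reserve_price
    return [customer for customer, _ in winners], price

bids = [("a", 0.05), ("b", 0.03), ("c", 0.04), ("d", 0.01)]  # hypothetical $/hour bids
winners, price = clear_spot_market(bids, capacity=2, reserve_price=0.02)
print(winners, price)  # ['a', 'c'] 0.03
```

As capacity shrinks or bids rise, the clearing price rises with them, which is the supply-and-demand fluctuation the slide describes.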
Market-Oriented Resource Allocation • Objectives
• Truthful, fair and revenue maximizing
• Additional considerations • Support price discovery
• Providing historical prices • Easy to compute
• Solving NP-hard problems in real time should be avoided
Research Directions • Designing and analyzing pricing schemes for cloud computing • Satisfying all of the above objectives is difficult • Most existing work uses auction mechanisms but focuses on single-round auctions • Need to understand the dynamics of repeated multi-round auctions
• More general pricing schemes • Packaging • Volume discounts
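A volume discount can be expressed as a tiered (block) tariff. A minimal sketch, assuming usage is billed in instance-hours and that the marginal price drops at each tier boundary; the tiers and prices (in cents) are hypothetical.

```python
def tiered_cost(hours: int, tiers: list) -> int:
    """tiers: [(upper_bound_hours or None, cents_per_hour), ...] in increasing order.
    Each tier's price applies only to the hours that fall inside that tier."""
    cost, lower = 0, 0
    for upper, cents in tiers:
        top = hours if upper is None else min(hours, upper)
        if top <= lower:
            break
        cost += (top - lower) * cents
        lower = top
    return cost

TIERS = [(100, 10), (1000, 8), (None, 5)]  # hypothetical volume-discount schedule
print(tiered_cost(50, TIERS), tiered_cost(1500, TIERS))  # 500 10700
```

Working in integer cents avoids floating-point rounding in billing arithmetic.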
Outline
• Data Center Networks • Network Management • Resource and Performance Management • Energy Management • Pricing and Economics • Security Management • Migrating Enterprise Applications to the Cloud
Security Management • Cloud customers are concerned about the privacy and confidentiality of their data and applications in the cloud
• Security risks • Information leakage and theft by
• Adversarial users in the cloud • Cloud providers
• Attacks within data centers • Performance interference and disruption • Denial of Service (DoS) attacks
Security Management (cont) • Security in traditional environments
• Application owners can modify the security settings of the underlying fabric
• Security in cloud computing environments • The underlying fabric is operated by the cloud infrastructure provider • Individual application owners cannot directly modify security settings • Different stakeholders may have conflicting interests
Security in the Cloud • Establishing trust between Cloud providers and customers
• Cloud providers continuously monitor and audit customers’ VMs • Customer privacy enforced through attestation
• Relying on a Trusted Platform Module (TPM) • Using unforgeable hardware signatures to prove that no unprivileged memory access has taken place
• Auditability must be mutual between providers and customers • Since either side can be malicious
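A minimal sketch of the measurement building block behind TPM-based attestation; it shows integrity of a boot sequence, not the memory-access guarantees mentioned above. It follows the TPM "extend" idea, where each component's digest is folded into a register as new = SHA-256(old || digest), so the final value commits to the whole sequence in order. The quote is a labeled stand-in: a real TPM signs register values with an asymmetric attestation key held in hardware, whereas HMAC with a shared key is used here only to keep the example self-contained. Component names and keys are hypothetical.

```python
import hashlib
import hmac
import secrets

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """Fold a component into the register: new = SHA-256(old || SHA-256(component))."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

def measure(components: list) -> bytes:
    pcr = bytes(32)  # register starts at all zeros
    for component in components:
        pcr = pcr_extend(pcr, component)
    return pcr

def make_quote(pcr: bytes, nonce: bytes, key: bytes) -> bytes:
    # Stand-in for a hardware-signed quote (HMAC is an assumption); the fresh nonce prevents replay.
    return hmac.new(key, pcr + nonce, hashlib.sha256).digest()

def verify(quote: bytes, expected_components: list, nonce: bytes, key: bytes) -> bool:
    expected = make_quote(measure(expected_components), nonce, key)
    return hmac.compare_digest(quote, expected)

key = secrets.token_bytes(32)    # hypothetical shared attestation key
nonce = secrets.token_bytes(16)  # chosen by the verifier, e.g. the customer
booted = [b"bootloader", b"hypervisor", b"guest-image"]
quote = make_quote(measure(booted), nonce, key)
print(verify(quote, booted, nonce, key))                                                  # True
print(verify(quote, [b"bootloader", b"evil-hypervisor", b"guest-image"], nonce, key))     # False
```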
Research Directions • Supporting fine-grained security requirements
• Different users have different security needs
• Eliminating sources of information leakage • E.g. side channels through the memory cache
• Minimizing the impact of auditing on performance
Outline
• Data Center Networks • Network Management • Resource and Performance Management • Energy Management • Pricing and Economics • Security Management • Migrating Enterprise Applications to the Cloud
Migration of Enterprise Applications • Outsourcing (or partially outsourcing) enterprise infrastructure to the cloud is a growing trend in industry • Reducing capital investment and maintenance cost
• Challenges • Finding a cost-effective outsourcing strategy • Integration with existing business infrastructure • Security and privacy
Research Directions • Selecting cloud services among multiple providers
• Evaluating service offerings in terms of performance, reliability, security and cost
• Outsourcing strategies • Determining which components to outsource • Migration plan and policy configuration
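A minimal sketch of multi-criteria provider selection, assuming each offering has already been scored from 0 to 1 on each criterion (higher is better, so cost must be inverted before scoring) and that a weighted sum reflects the enterprise's priorities. Provider names, scores and weights are hypothetical.

```python
def rank_providers(offers: dict, weights: dict) -> list:
    """offers: provider -> {criterion: score in [0, 1], higher is better}.
    weights: criterion -> importance. Returns providers, best first."""
    def score(provider):
        return sum(w * offers[provider][criterion] for criterion, w in weights.items())
    return sorted(offers, key=score, reverse=True)

# Hypothetical offers; "cost" is already inverted so that cheaper scores higher.
offers = {
    "ProviderA": {"performance": 0.9, "reliability": 0.8, "security": 0.6, "cost": 0.4},
    "ProviderB": {"performance": 0.7, "reliability": 0.9, "security": 0.9, "cost": 0.7},
}
print(rank_providers(offers, {"performance": 0.25, "reliability": 0.25, "security": 0.25, "cost": 0.25}))
print(rank_providers(offers, {"performance": 1.0}))
```

The ranking flips with the weights, which is why the evaluation criteria have to be fixed with the enterprise's priorities before offers are compared.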
Summary • The advent of cloud computing brings not only significant benefits but also research challenges
• Cloud computing is an active research area in networks and distributed systems • Many key issues remain to be resolved • Many research opportunities remain to be explored