Availability of Network Application Services
Candidacy Exam
Jong Yul Kim
February 3, 2010
Motivation
Critical systems are being deployed on the internet, for example:
- Next Generation 9-1-1
- Business transactions
- Smart grid
How do we enhance the availability of these critical services?
Scope
What can application service providers do to enhance end-to-end service availability?
The problems are that:
- The underlying internet is unreliable.
- Servers fail.
Scope
Level of abstraction: system design
- On top of networks as clouds (but we can probe them)
- Using servers as nodes

The following are important for availability, but not in scope:
- Techniques that ISPs use
- Techniques to enhance the availability of individual components of the system
- Defense against security vulnerability attacks
Contents
The underlying internet is unreliable.
Symptoms of unavailability in the application:
- Web: “Unable to connect to server.”
- VoIP: call establishment failure, call drop[2]

Symptoms of the network during unavailability:
- High percentage of packet loss
- Long bursts of packet loss[1,2] (23% of lost packets belong to outages[2])
- Packet delay[2]
[1] Dahlin et al. End-to-End WAN Service Availability. IEEE/ACM TON 2003.
[2] Jiang and Schulzrinne. Assessment of VoIP Service Availability in the Current Internet. PAM 2003.
The underlying internet is unreliable.
Network availability figures:
- Paths to popular web servers: 99.6%[1]
- Call success probability: 99.53%[2]
[1] Gummadi et al. Improving the Reliability of Internet Paths with One-hop Source Routing. OSDI 2004.
[2] Jiang and Schulzrinne. Assessment of VoIP Service Availability in the Current Internet. PAM 2003.
The underlying internet is unreliable.
Possible causes of these symptoms:
- Network congestion
- BGP updates: 90% of lost packets near BGP updates are part of a burst of 1500 packets or longer, i.e., at least 30 seconds of continuous loss[1]
- Intra-domain link failure[1,2]
- Software bugs in routers[2]
[1] Kushman et al. Can you hear me now?!: it must be BGP. SIGCOMM CCR 2007.
[2] Jiang and Schulzrinne. Assessment of VoIP Service Availability in the Current Internet. PAM 2003.
The underlying internet is unreliable.
Challenge: quick re-routing around network failures, with only limited control of the network.
Contents
Resilient Overlay Network[1]
An overlay of cooperative nodes inside the network; each node is like a router:
- It has its own forwarding table
- A link-state algorithm is used to construct a map of the RON network
- Nodes actively probe the paths between themselves
- Fully meshed = fast detection of path outages
[1] Andersen et al. Resilient Overlay Networks. SOSP 2001.
Resilient Overlay Network
Outage defined as: outage(r, p) = 1 when the observed packet loss rate, averaged over an interval r, is larger than p on the path.
“RON win” defined as: the average loss rate on the Internet ≥ p while the RON loss rate < p.
Results (figure omitted): r = 30 minutes.
[1] Andersen et al. Resilient Overlay Networks. SOSP 2001.
Datasets:
- RON1: 12 nodes, >64 hours
- RON2: 16 nodes, >85 hours
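The outage and “RON win” definitions above can be made concrete with a small sketch (my own illustration, not code from the RON paper; loss rates are given as a list of samples over the interval r):

```python
def outage(loss_rates, p):
    """outage(r, p) = 1 when the packet loss rate, averaged over the
    measurement interval r (here: a list of samples), exceeds the
    threshold p on the path."""
    avg = sum(loss_rates) / len(loss_rates)
    return 1 if avg > p else 0

def ron_win(internet_loss, ron_loss, p):
    """A "RON win": the average loss rate on the direct Internet path
    is >= p while the loss rate on the RON path stays below p."""
    internet_avg = sum(internet_loss) / len(internet_loss)
    ron_avg = sum(ron_loss) / len(ron_loss)
    return internet_avg >= p and ron_avg < p

# 30% average loss on the direct path vs. 1% via RON, threshold p = 0.2:
print(outage([0.4, 0.2], 0.2))                 # -> 1 (the path is in outage)
print(ron_win([0.4, 0.2], [0.01, 0.01], 0.2))  # -> True
```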
Resilient Overlay Network
Merits
- Re-routes quickly around network failures
- Good recovery from core network failures

Limitations
- Does not recover from edge failures or last-hop failures
- Active probing is expensive: trades off scalability for reliability[1]
- Clients and servers cannot use RON directly
- Need a strategy to deploy RON nodes; may violate ISP contracts
[1] Andersen et al. Resilient Overlay Networks. SOSP 2001.
Contents
One-Hop Source Routing
Main idea: try another path through an intermediary if your own does not work.
“Spray and pray”: issue a random packet to 4 other nodes every 5 seconds and see if they succeed (called random-4).
(Figure omitted: a client rerouting through intermediaries when its direct path fails.)
[1] Gummadi et al. Improving the Reliability of Internet Paths with One-hop Source Routing. OSDI 2004.
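The “spray and pray” recovery step can be sketched as follows (a hypothetical illustration; `try_recover` and `send_via` are my names, not the paper's API):

```python
import random

def try_recover(packet, intermediaries, send_via, k=4):
    """random-k recovery: when the direct path fails, relay the packet
    through k randomly chosen intermediaries and report whether any of
    them could reach the destination."""
    chosen = random.sample(intermediaries, min(k, len(intermediaries)))
    return any(send_via(node, packet) for node in chosen)

# Toy demo: only node "c" currently has a working path to the destination.
# With four candidates, random-4 samples all of them, so recovery succeeds.
print(try_recover("pkt", ["a", "b", "c", "d"], lambda n, p: n == "c"))  # -> True
```

Recovery is probabilistic: if none of the k sampled intermediaries has a working path, the attempt fails, which is why the technique carries no path guarantee.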
One-Hop Source Routing
Results
[1] Gummadi et al. Improving the Reliability of Internet Paths with One-hop Source Routing. OSDI 2004.
One-Hop Source Routing
Merits
- Simple, stateless
- Good for applications where users are tolerant of faults
- Good for rerouting around core network failures

Limitations
- Not always guaranteed to find an alternate path
- Cannot reroute around edge network failures
- Intermediaries: how does the client find them? Where should they be placed? Will they cooperate?

[1] Gummadi et al. Improving the Reliability of Internet Paths with One-hop Source Routing. OSDI 2004.
Contents
Multi-homing
Use multiple ISPs to allow network path redundancy at the edge network.
[1] Akella et al. A Comparison of Overlay Routing and Multihoming Route Control. SIGCOMM 2004.
Figure: Availability gains by adding best 2 and 3 ISPs based on RTT performance. (Overlay routing achieved 100% availability during the 5-day testing period.)[1]
Multi-homing
Availability depends on choosing:[1]
- Good upstream ISPs
- ISPs that do not have overlapping upstream paths (hints at the need for path diversity)

Some claim that the gains are comparable to overlay routing.[2]
[1] Akella et al. A Measurement-Based Analysis of Multihoming. SIGCOMM 2003.
[2] Akella et al. A Comparison of Overlay Routing and Multihoming Route Control. SIGCOMM 2004.
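Multihoming route control at the edge reduces to probing each upstream ISP and steering traffic onto a working link; a minimal sketch (my own illustration, choosing by RTT among live links):

```python
def pick_link(isps, is_up, rtt):
    """Route control for a multihomed edge site: among the upstream
    ISPs whose path currently works, pick the one with the lowest RTT.
    Returns None when every upstream path is down."""
    alive = [isp for isp in isps if is_up(isp)]
    return min(alive, key=rtt) if alive else None

# Example probe results: isp_a's path is down, so the site fails over
# to the faster of the two remaining upstreams.
status = {"isp_a": False, "isp_b": True, "isp_c": True}
rtts = {"isp_a": 10, "isp_b": 40, "isp_c": 25}
print(pick_link(["isp_a", "isp_b", "isp_c"], status.get, rtts.get))  # -> isp_c
```

Note the dependence on path diversity: if isp_b and isp_c share an upstream that fails, both probes fail together and no choice helps.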
Comparison: Resilient Overlay Network vs. one-hop source routing vs. multihoming

- Recovery mechanism: RON re-routes through another node with a path to the destination; one-hop source routing sprays to other nodes and hopes for the best; multihoming uses route control.
- Recoverable failure location: core network (RON); core network (one-hop source routing); edge network (multihoming).
- Management overhead: RON is costly, requiring active measurements; one-hop source routing requires nothing from the requestor and only relaying from the intermediary; multihoming requires measurements on links.
- Recovery time: seconds (RON); seconds (one-hop source routing); N/A (multihoming).
- Path guarantee: deterministic (RON); probabilistic (one-hop source routing); N/A (multihoming).
- Scalability: RON handles 2 to 50 nodes; one-hop source routing scales well; multihoming does not scale well.
- Client modification: yes (RON); yes (one-hop source routing); no (multihoming).
- Supported applications: all IP-based services, for all three.
- Limitations: RON is not directly usable by clients and servers and may violate ISP contracts; one-hop source routing has a bootstrap problem; multihoming needs good ISPs with non-overlapping paths and will probably suffer from slow BGP convergence.
Contents
Servers fail.
Hardware failures:
- More use of commodity hardware and components
- “Well designed and manufactured HW: >1% fail/year.”[1]
Software failures:
- Concurrent programs are prevalent, and so are their bugs[2]
- There are cases of introducing another bug while trying to fix one[2]
Environmental failures:
- Electricity outage
- Operator error
[1] Patterson. Recovery oriented computing: A new research agenda for a new century. Keynote address, HPCA 2002.
[2] Lu et al. Learning from Mistakes – A comprehensive study on real world concurrency bug characteristics. ASPLOS 2008.
Contents
One-site Clustering
Three models for organizing servers in a cluster:[1]
- Cluster-based Web system: one virtual IP address exposed to the client
- Virtual Web cluster: one virtual IP address shared by all servers
- Distributed Web system: IP addresses of the servers exposed to the client

[1] Cardellini et al. The State of the Art in Locally Distributed Web-Server Systems. ACM Computing Surveys 2002.
Cluster-based Web System
- One IP address exposed to the client
- Request routing: by the web switch
- Server selection: content-aware, client-aware, or server-aware

[1] Cardellini et al. The State of the Art in Locally Distributed Web-Server Systems. ACM Computing Surveys 2002.
Virtual Web System
- All servers share the same IP address
- Request routing: all servers have the same MAC address, or layer-2 multicast is used
- Server selection: by hash function

[1] Cardellini et al. The State of the Art in Locally Distributed Web-Server Systems. ACM Computing Surveys 2002.
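Hash-based server selection in the virtual Web cluster can be sketched as follows (an illustration; the actual hash function in such systems is implementation-specific):

```python
def hash_ip(ip):
    # A simple deterministic hash over the dotted-quad address
    # (an assumption for illustration; real clusters choose their own).
    return sum(int(octet) << (8 * i) for i, octet in enumerate(ip.split(".")))

def select_server(client_ip, num_servers):
    """Every node in the shared-IP cluster computes the same hash over
    the client address and accepts the packet only when the result
    points at itself, so exactly one server answers each client."""
    return hash_ip(client_ip) % num_servers

print(select_server("10.0.0.7", 4))  # -> 2 (only server 2 accepts this client)
```

Because every node runs the same computation, the servers agree without any coordination; but because the hash is static, a failed server's share of clients is simply lost rather than redistributed.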
Distributed Web System
- Multiple IP addresses exposed to the client
- Request routing: primarily by DNS, secondarily by the servers
- Server selection: by DNS

[1] Cardellini et al. The State of the Art in Locally Distributed Web-Server Systems. ACM Computing Surveys 2002.
Effects of design on service availability

- Server selection: web switch (cluster-based); hash function (virtual Web); authoritative DNS and servers (distributed Web)
- Request routing: IP or MAC address (cluster-based); MAC uni-/multicast (virtual Web); IP address (distributed Web)
- Load balancing: web switch (cluster-based); none (virtual Web); using DNS (distributed Web)
- Group membership: not necessary (cluster-based); not possible without another network (virtual Web); possible (distributed Web)
- Single point of failure: web switch, which can be made redundant (cluster-based); none, since all servers answer (virtual Web); DNS server, which can be made redundant (distributed Web)
- Failover method: the web switch decides (cluster-based); N/A (virtual Web); take over the IP using ARP (distributed Web)
- Fault detection: heartbeat (cluster-based); N/A (virtual Web); heartbeat (distributed Web)
- Other limitations: none (cluster-based); the hash function is static (virtual Web); DNS records are cached by clients (distributed Web)
Improving Availability of PRESS[1]

PRESS Web server:
- Cooperative nodes
- Serve from cache instead of disk
- All servers have a map of where cached files are

Availability mechanism:
- Servers are organized in a ring structure; each sends a heartbeat to the next one in the ring
- Three missing heartbeats: the predecessor has crashed
- A restarted node broadcasts its IP address to join the cluster again

[1] Nagaraja et al. Quantifying and Improving the Availability of High-Performance Cluster-Based Internet Services. SC 2003.
(This is the distributed Web model.)
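The ring-based failure detection can be sketched as follows (illustrative; names are mine, and the bookkeeping of per-predecessor missed heartbeats is an assumption):

```python
def detect_crashed_predecessors(missed, threshold=3):
    """Each node counts consecutive heartbeats missed from its
    predecessor in the ring; once `threshold` heartbeats are missing,
    the predecessor is declared crashed."""
    return sorted(node for node, count in missed.items() if count >= threshold)

# Consecutive heartbeats missed from each predecessor, as counted by
# its successor in the ring: only C has hit the three-heartbeat limit.
print(detect_crashed_predecessors({"A": 0, "B": 1, "C": 3}))  # -> ['C']
```

The threshold of three trades detection speed against false positives: a single delayed heartbeat does not trigger a (costly) membership change.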
Improving Availability of PRESS[1]

To enhance availability:
1. Add a front-end node to mask failures: a layer-4 switch (LVS) with IP tunneling
2. Robust group membership
3. Application-level heartbeats
4. Fault model enforcement: if there is a fault, crash the whole node and restart it

Availability improved from 99.5% to 99.99%. (The result is effectively the cluster-based Web model!)

[1] Nagaraja et al. Quantifying and Improving the Availability of High-Performance Cluster-Based Internet Services. SC 2003.
Effects of design on service availability (revisited): the cluster-based Web model is the best choice!
Contents
Reliable Server Pooling (RSerPool)
“RSerPool provides an application-independent set of services and protocols for building fault-tolerant and highly-available client/server applications.”[1]

- Specifies the behavior of registrars, servers, and clients
- Two protocols designed over SCTP:
  - Aggregate Server Access Protocol (ASAP): used between server and registrar, and between client and registrar
  - Endpoint haNdlespace Redundancy Protocol (ENRP): used among registrars

[1] Lei et al. An Overview of Reliable Server Pooling Protocols. RFC 5351. IETF 2008.
Reliable Server Pooling (RSerPool)
- Servers add themselves to a pool by registering with a registrar; that registrar becomes their home registrar
- The home registrar probes server status
- Clients cache a list of servers from a registrar
- Clients also report failures to the registrar
- Each registrar is the home registrar for a set of servers; each maintains a complete view of the pools by synchronizing with the others
- Servers provide service through a normal application protocol such as HTTP
Registrars
- Announce themselves to servers, clients, and other registrars using IP multicast or static configuration
- Share all knowledge about the pool with other registrars: all server updates (registration, re-registration, deregistration) are announced by the home registrar
- Maintain connections to peer registrars using SCTP
- A checksum mechanism is used to audit the consistency of the handlespace
- Each keeps a count of unreachability reports for every server in its pool; once a threshold is reached, the server is removed from the pool
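The unreachability-report bookkeeping might look like the following sketch (illustrative only; the class, method names, and threshold value are my assumptions, not RFC 5351 API):

```python
class Registrar:
    """Tracks client unreachability reports per pooled server and
    evicts a server once the reports reach a threshold (assumed value)."""
    def __init__(self, pool, threshold=3):
        self.pool = set(pool)
        self.threshold = threshold
        self.reports = {}

    def report_unreachable(self, server):
        # Clients call this when they cannot reach a pool element.
        if server not in self.pool:
            return
        self.reports[server] = self.reports.get(server, 0) + 1
        if self.reports[server] >= self.threshold:
            self.pool.discard(server)  # remove the server from the pool

reg = Registrar(["s1", "s2"])
for _ in range(3):
    reg.report_unreachable("s1")
print(sorted(reg.pool))  # -> ['s2']
```

The threshold guards against a single client's local connectivity problem evicting a healthy server, though (as noted later) correlated client-side failures can still drain a whole site from the pool.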
Fault detection mechanisms[1]

Failed component | Detected by | Detection mechanism
Client    | Server    | Application-specific
Client    | Registrar | Broken connection
Server    | Client    | Application-specific
Server    | Registrar | ASAP Keep-Alive timeout
Server    | Registrar | ASAP Endpoint Unreachable
Registrar | Client    | ASAP request timeout
Registrar | Server    | ASAP request timeout
Registrar | Registrar | ENRP Presence timeout
Registrar | Registrar | ENRP request timeout
[1] Dreibholz. Reliable Server Pooling – Evaluation, Optimization, and Extension of a Novel IETF Architecture. Ph.D. Thesis 2007.
Reliability mechanisms[1]
Registrar takeover:
- Each registrar already has a complete view of the handlespace
- Registrars volunteer to take over for the failed one
- Conflicts are resolved by comparing registrar IDs (the lowest one wins)
- The winner sends probes to the servers under the failed registrar
[1] Dreibholz. Reliable Server Pooling – Evaluation, Optimization, and Extension of a Novel IETF Architecture. Ph.D. Thesis 2007.
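The takeover steps above reduce to a deterministic arbitration plus a probe pass; a minimal sketch (my own illustration, not the ENRP wire protocol):

```python
def take_over(volunteer_ids, failed_servers, probe):
    """Registrar takeover: the volunteering registrar with the lowest
    ID wins the arbitration, then probes the failed registrar's servers
    and keeps only the ones that still respond."""
    winner = min(volunteer_ids)
    alive = [s for s in failed_servers if probe(s)]
    return winner, alive

# Registrars 42, 7, and 19 all volunteer; only server s1 answers probes.
winner, alive = take_over([42, 7, 19], ["s1", "s2"], lambda s: s == "s1")
print(winner, alive)  # -> 7 ['s1']
```

Because every volunteer computes the same minimum over the same set of IDs, all registrars agree on the winner without a further round of messages.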
Reliability mechanisms[1]
Server failover:
- Uses a client-side state-keeping mechanism
- A state cookie is sent to the client via the control channel; the cookie serves as a checkpoint
- When a server fails, the client connects to a new server and sends the state cookie; the new server picks up from that state
- Uses the ASAP session layer between client and server
- The data and control channels are multiplexed over a single connection to enforce the correct ordering of transactions and cookies
[1] Dreibholz. Reliable Server Pooling – Evaluation, Optimization, and Extension of a Novel IETF Architecture. Ph.D. Thesis 2007.
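The cookie-based failover can be sketched end to end (illustrative; in RSerPool the ASAP session layer handles the cookie transparently, and the class and state layout here are my assumptions):

```python
class PoolElement:
    """A server that checkpoints session state into a cookie held by
    the client, so any peer server can resume the session after a
    failure."""
    def handle(self, cookie, amount):
        state = dict(cookie)                       # resume from the checkpoint
        state["total"] = state.get("total", 0) + amount
        return state, state["total"]               # (new cookie, response)

cookie = {}
s1, s2 = PoolElement(), PoolElement()
cookie, _ = s1.handle(cookie, 5)   # normal operation on server 1
# Server 1 fails; the client replays its latest cookie to server 2:
cookie, total = s2.handle(cookie, 2)
print(total)  # -> 7 (the session continues where the checkpoint left off)
```

Keeping the checkpoint on the client is what makes the servers stateless with respect to each other: no server-to-server state replication is needed for failover.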
Evaluation of RSerPool
Merits
- Includes the client in the system design
- Every element keeps an eye on the others
- There are mechanisms for registrar takeover and server failover
- Multiple layers of failure detection (e.g., SCTP level and application level)

Limitations
- The client does not do much to help when unavailability problems occur
- Too much overhead in maintaining a consistent view of the pools; not scalable to many servers
- Use of IP multicast confines the deployment of registrars and servers to a multicast domain; clients would most likely have to be statically configured
- Assumes that a client connectivity problem is a server problem: servers at one site will all get dropped if there is a big network problem
Contents
Distributed Clusters
- Distributed clusters intrinsically have both network-path and server redundancy
- How can we utilize these redundancies?
- Study of the availability techniques of Content Distribution Networks and the Domain Name System
Contents
Akamai vs. Limelight[1]
Akamai philosophy
- Go closer to the client
- Scatter small clusters all over the place
- Scale: 27,000 servers, 65 countries, 656 ASes
- Two-level DNS

Limelight philosophy
- Go closer to ISPs
- Large clusters at a few key locations near many ISPs
- Scale: 4,100 servers, 18 countries, has its own AS
- Flat DNS, uses anycast

[1] Huang et al. Measuring and Evaluating Large-Scale CDNs. IMC 2008.
Akamai vs. Limelight
Methodology
- Connect to port 80 at regular intervals
- Failure = two consecutive connection errors

Results
- Server and cluster availability is higher for Limelight
- But service availability may be different!
[1] Huang et al. Measuring and Evaluating Large-Scale CDNs. IMC 2008.
Akamai Failure Model
Failure model: “We assume that a significant and constantly changing number of components or other failures occur at all times in the network.”[1]
Components: link, machine, rack, data center, multi-network, ...
This leads to having small clusters scattered across diverse ISPs and geographic locations.
[1] Afergan et al. Experience with some Principles for Building an Internet-Scale Reliable System. USENIX WORLDS 2005.
Akamai: scalability and availability
- Use two-level DNS to direct clients to servers
  - The top level directs clients to a region, e.g., g.akamai.net
  - The region resolves the lower-level queries
  - The top level returns low-level name servers in multiple regions
  - The low level returns a short DNS TTL (20 seconds)
- Servers use ARP to take over for a failed server
- Use the internet for inter-cluster communication, with multi-path routing directed by software logic (= an overlay network?)

[1] Afergan et al. Experience with some Principles for Building an Internet-Scale Reliable System. USENIX WORLDS 2005.
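The effect of the short 20-second TTL on failover can be sketched from the client's side (illustrative; the `Resolver` class and `lookup` callback are my assumptions, not Akamai's implementation):

```python
import time

class Resolver:
    """Caches a DNS answer only for its TTL. A short TTL (20 s for the
    low-level names) bounds how long clients keep being directed to a
    server that has since failed."""
    def __init__(self, lookup, ttl=20):
        self.lookup, self.ttl = lookup, ttl
        self.cached, self.expires = None, 0.0

    def resolve(self, name, now=None):
        now = time.monotonic() if now is None else now
        if self.cached is None or now >= self.expires:
            self.cached = self.lookup(name)       # re-query past the TTL
            self.expires = now + self.ttl
        return self.cached

# The authoritative answer changes after the first lookup (a server failed):
answers = iter(["203.0.113.10", "203.0.113.11"])
r = Resolver(lambda name: next(answers))
print(r.resolve("g.akamai.net", now=0))   # -> 203.0.113.10
print(r.resolve("g.akamai.net", now=10))  # -> 203.0.113.10 (still cached)
print(r.resolve("g.akamai.net", now=25))  # -> 203.0.113.11 (TTL expired)
```

This is why client-side caching of DNS records, listed earlier as a limitation of the distributed Web model, is tolerable here: the TTL caps the stale window at 20 seconds.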
Contents
DNS Root Servers[1,2]

Redundancy
- Redundant hardware that takes over for a failed unit, with or without human intervention
- At least 3 servers recommended, with one at a remote site[3]
- Backups of the zone file stored at off-site locations
- Connectivity to the internet

Diversity
- Geographically located in 130 places in 53 countries
- Topological diversity matters more
- Hardware, software, and operating systems of the servers
- Diverse organizations, personnel, and operational processes
- Distribution of zone files within each root server operator

[1] Bush et al. Root Name Server Operational Requirements. RFC 2870. IETF 2000.
[2] http://www.icann.org/en/committees/security/dns-security-update-1.htm
[3] Elz et al. Selection and Operation of Secondary DNS Servers. RFC 2182. IETF 1997.
The use of anycast for availability[1]

Basic anycast
- Announce an identical IP address from multiple nodes
- The routing system takes the client's request to the closest node

Hierarchical anycast
- Global vs. local nodes
- If any node fails, it stops its announcement
- A global node takes over automatically
[1] Abley. Hierarchical Anycast for Global Service Distribution. ISC Technical Note 2003-1. 2003.
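Hierarchical anycast failover can be sketched as a preference order over announcing nodes (an illustration of the routing behavior, not router code; node names are mine):

```python
def serving_node(local_nodes, global_nodes, is_announcing):
    """Hierarchical anycast failover: the client reaches the nearest
    local node still announcing the prefix; if all local nodes have
    withdrawn, a global node picks up the traffic automatically."""
    for node in local_nodes + global_nodes:  # ordered by routing proximity
        if is_announcing(node):
            return node
    return None

# local1 has failed and withdrawn its announcement, so local2 serves;
# if both locals withdraw, the global node takes over automatically.
print(serving_node(["local1", "local2"], ["global1"], {"local2", "global1"}.__contains__))  # -> local2
print(serving_node(["local1", "local2"], ["global1"], {"global1"}.__contains__))            # -> global1
```

The failover requires no coordination beyond the announcement withdrawal itself: routing convergence does the rest.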
Is anycast good for everyone?[1]
Not really…
- Packets of long sessions may go to another node if the routing dynamics change (service time vs. stability of routing)
- Many routing considerations: aggregated prefixes, multiple services from one prefix, and the route propagation radius

[1] Abley and Lindqvist. Operation of Anycast Services. RFC 4786. IETF 2006.
Contents
Conclusion
The techniques presented here attempt to recover from either network failure or server failure through redundancy:
- There is redundancy in network paths
- Server redundancy can be handled

An application service provider may have to use a combination of these techniques.
Conclusion
- CDNs and the DNS have been good so far… can these techniques be applied to other applications?
- A system designed for availability is only as good as its failure model.
- Further research on availability is needed: there are few techniques, and not much about availability in the literature.