Availability of Network Application Services
Candidacy Exam
Jong Yul Kim
February 3, 2010
Motivation
Critical systems are being deployed on the internet, for example:
- Next Generation 9-1-1
- Business transactions
- Smart grid
How do we enhance the availability of these critical services?
Scope
What can application service providers do to enhance end-to-end service availability?
The problems are that:
- The underlying internet is unreliable.
- Servers fail.
Scope
Level of abstraction: system design
- On top of networks as clouds (but we can probe them)
- Using servers as nodes

The following are important for availability, but not in scope:
- Techniques that ISPs use
- Techniques to enhance the availability of individual components of the system
- Defense against security vulnerability attacks
Contents
The underlying internet is unreliable.
Symptoms of unavailability in the application:
- Web: “Unable to connect to server.”
- VoIP: call establishment failure, call drop[2]

Symptoms of the network during unavailability:
- High percentage of packet loss
- Long bursts of packet loss[1,2] (23% of lost packets belong to outages[2])
- Packet delay[2]
[1] Dahlin et al. End-to-End WAN Service Availability. IEEE/ACM TON 2003.
[2] Jiang and Schulzrinne. Assessment of VoIP Service Availability in the Current Internet. PAM 2003.
The underlying internet is unreliable.
Network availability figures:
- Paths to popular web servers: 99.6%[1]
- Call success probability: 99.53%[2]
[1] Gummadi et al. Improving the Reliability of Internet Paths with One-hop Source Routing. OSDI 2004.
[2] Jiang and Schulzrinne. Assessment of VoIP Service Availability in the Current Internet. PAM 2003.
The underlying internet is unreliable.
Possible causes of these symptoms:
- Network congestion
- BGP updates: 90% of lost packets near BGP updates are part of a burst of 1500 packets or longer, i.e., at least 30 seconds of continuous loss[1]
- Intra-domain link failure[1,2]
- Software bugs in routers[2]
[1] Kushman et al. Can you hear me now?!: it must be BGP. SIGCOMM CCR 2007.
[2] Jiang and Schulzrinne. Assessment of VoIP Service Availability in the Current Internet. PAM 2003.
The underlying internet is unreliable.
Challenge: quick re-routing around network failures, with only limited control of the network.
Contents
Resilient Overlay Network[1]
An overlay of cooperative nodes inside the network; each node is like a router:
- It has its own forwarding table
- A link-state algorithm is used to construct a map of the RON network
- Nodes actively probe the paths between themselves
- Fully meshed = fast detection of path outages
[1] Andersen et al. Resilient Overlay Networks. SOSP 2001.
Resilient Overlay Network
Outage defined as: outage(r, p) = 1 when the observed packet loss rate, averaged over an interval r, is larger than p on the path.
“RON win” defined as: the average loss rate on the Internet ≥ p while the RON loss rate < p.
Results (figure omitted): r = 30 minutes.
[1] Andersen et al. Resilient Overlay Networks. SOSP 2001.
Datasets:
- RON1: 12 nodes, >64 hours
- RON2: 16 nodes, >85 hours
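The outage and “RON win” definitions above can be made concrete with a small sketch (my own illustration, not code from the RON paper; loss rates are given as a list of samples over the interval r):

```python
def outage(loss_rates, p):
    """outage(r, p) = 1 when the packet loss rate, averaged over the
    measurement interval r (here: a list of samples), exceeds the
    threshold p on the path."""
    avg = sum(loss_rates) / len(loss_rates)
    return 1 if avg > p else 0

def ron_win(internet_loss, ron_loss, p):
    """A "RON win": the average loss rate on the direct Internet path
    is >= p while the loss rate on the RON path stays below p."""
    internet_avg = sum(internet_loss) / len(internet_loss)
    ron_avg = sum(ron_loss) / len(ron_loss)
    return internet_avg >= p and ron_avg < p

# 30% average loss on the direct path vs. 1% via RON, threshold p = 0.2:
print(outage([0.4, 0.2], 0.2))                 # -> 1 (the path is in outage)
print(ron_win([0.4, 0.2], [0.01, 0.01], 0.2))  # -> True
```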
Resilient Overlay Network
Merits
- Re-routes quickly around network failures
- Good recovery from core network failures

Limitations
- Does not recover from edge failures or last-hop failures
- Active probing is expensive: trades off scalability for reliability[1]
- Clients and servers cannot use RON directly
- Need a strategy to deploy RON nodes; may violate ISP contracts
[1] Andersen et al. Resilient Overlay Networks. SOSP 2001.
Contents
One-Hop Source Routing
Main idea: try another path through an intermediary if your own does not work.
“Spray and pray”: issue a random packet to 4 other nodes every 5 seconds and see if they succeed (called random-4).
(Figure omitted: a client rerouting through intermediaries when its direct path fails.)
[1] Gummadi et al. Improving the Reliability of Internet Paths with One-hop Source Routing. OSDI 2004.
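The “spray and pray” recovery step can be sketched as follows (a hypothetical illustration; `try_recover` and `send_via` are my names, not the paper's API):

```python
import random

def try_recover(packet, intermediaries, send_via, k=4):
    """random-k recovery: when the direct path fails, relay the packet
    through k randomly chosen intermediaries and report whether any of
    them could reach the destination."""
    chosen = random.sample(intermediaries, min(k, len(intermediaries)))
    return any(send_via(node, packet) for node in chosen)

# Toy demo: only node "c" currently has a working path to the destination.
# With four candidates, random-4 samples all of them, so recovery succeeds.
print(try_recover("pkt", ["a", "b", "c", "d"], lambda n, p: n == "c"))  # -> True
```

Recovery is probabilistic: if none of the k sampled intermediaries has a working path, the attempt fails, which is why the technique carries no path guarantee.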
One-Hop Source Routing
Results
[1] Gummadi et al. Improving the Reliability of Internet Paths with One-hop Source Routing. OSDI 2004.
One-Hop Source Routing
Merits
- Simple, stateless
- Good for applications where users are tolerant of faults
- Good for rerouting around core network failures

Limitations
- Not always guaranteed to find an alternate path
- Cannot reroute around edge network failures
- Intermediaries: how does the client find them? Where should they be placed? Will they cooperate?

[1] Gummadi et al. Improving the Reliability of Internet Paths with One-hop Source Routing. OSDI 2004.
Contents
Multi-homing
Use multiple ISPs to allow network path redundancy at the edge network.
[1] Akella et al. A Comparison of Overlay Routing and Multihoming Route Control. SIGCOMM 2004.
Figure: Availability gains by adding best 2 and 3 ISPs based on RTT performance. (Overlay routing achieved 100% availability during the 5-day testing period.)[1]
Multi-homing
Availability depends on choosing:[1]
- Good upstream ISPs
- ISPs that do not have overlapping upstream paths (hints at the need for path diversity)

Some claim that the gains are comparable to overlay routing.[2]
[1] Akella et al. A Measurement-Based Analysis of Multihoming. SIGCOMM 2003.
[2] Akella et al. A Comparison of Overlay Routing and Multihoming Route Control. SIGCOMM 2004.
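Multihoming route control at the edge reduces to probing each upstream ISP and steering traffic onto a working link; a minimal sketch (my own illustration, choosing by RTT among live links):

```python
def pick_link(isps, is_up, rtt):
    """Route control for a multihomed edge site: among the upstream
    ISPs whose path currently works, pick the one with the lowest RTT.
    Returns None when every upstream path is down."""
    alive = [isp for isp in isps if is_up(isp)]
    return min(alive, key=rtt) if alive else None

# Example probe results: isp_a's path is down, so the site fails over
# to the faster of the two remaining upstreams.
status = {"isp_a": False, "isp_b": True, "isp_c": True}
rtts = {"isp_a": 10, "isp_b": 40, "isp_c": 25}
print(pick_link(["isp_a", "isp_b", "isp_c"], status.get, rtts.get))  # -> isp_c
```

Note the dependence on path diversity: if isp_b and isp_c share an upstream that fails, both probes fail together and no choice helps.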
Comparison: Resilient Overlay Network vs. one-hop source routing vs. multihoming

- Recovery mechanism: RON re-routes through another node with a path to the destination; one-hop source routing sprays to other nodes and hopes for the best; multihoming uses route control.
- Recoverable failure location: core network (RON); core network (one-hop source routing); edge network (multihoming).
- Management overhead: RON is costly, requiring active measurements; one-hop source routing requires nothing from the requestor and only relaying from the intermediary; multihoming requires measurements on links.
- Recovery time: seconds (RON); seconds (one-hop source routing); N/A (multihoming).
- Path guarantee: deterministic (RON); probabilistic (one-hop source routing); N/A (multihoming).
- Scalability: RON handles 2 to 50 nodes; one-hop source routing scales well; multihoming does not scale well.
- Client modification: yes (RON); yes (one-hop source routing); no (multihoming).
- Supported applications: all IP-based services, for all three.
- Limitations: RON is not directly usable by clients and servers and may violate ISP contracts; one-hop source routing has a bootstrap problem; multihoming needs good ISPs with non-overlapping paths and will probably suffer from slow BGP convergence.
Contents
Servers fail.
Hardware failures:
- More use of commodity hardware and components
- “Well designed and manufactured HW: >1% fail/year.”[1]
Software failures:
- Concurrent programs are prevalent, and so are their bugs[2]
- There are cases of introducing another bug while trying to fix one[2]
Environmental failures:
- Electricity outage
- Operator error
[1] Patterson. Recovery oriented computing: A new research agenda for a new century. Keynote address, HPCA 2002.
[2] Lu et al. Learning from Mistakes – A comprehensive study on real world concurrency bug characteristics. ASPLOS 2008.
Contents
One-site Clustering
Three models for organizing servers in a cluster:[1]
- Cluster-based Web system: one virtual IP address exposed to the client
- Virtual Web cluster: one virtual IP address shared by all servers
- Distributed Web system: IP addresses of the servers exposed to the client

[1] Cardellini et al. The State of the Art in Locally Distributed Web-Server Systems. ACM Computing Surveys 2002.
Cluster-based Web System
- One IP address exposed to the client
- Request routing: by the web switch
- Server selection: content-aware, client-aware, or server-aware

[1] Cardellini et al. The State of the Art in Locally Distributed Web-Server Systems. ACM Computing Surveys 2002.
Virtual Web System
- All servers share the same IP address
- Request routing: all servers have the same MAC address, or layer-2 multicast is used
- Server selection: by hash function

[1] Cardellini et al. The State of the Art in Locally Distributed Web-Server Systems. ACM Computing Surveys 2002.
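Hash-based server selection in the virtual Web cluster can be sketched as follows (an illustration; the actual hash function in such systems is implementation-specific):

```python
def hash_ip(ip):
    # A simple deterministic hash over the dotted-quad address
    # (an assumption for illustration; real clusters choose their own).
    return sum(int(octet) << (8 * i) for i, octet in enumerate(ip.split(".")))

def select_server(client_ip, num_servers):
    """Every node in the shared-IP cluster computes the same hash over
    the client address and accepts the packet only when the result
    points at itself, so exactly one server answers each client."""
    return hash_ip(client_ip) % num_servers

print(select_server("10.0.0.7", 4))  # -> 2 (only server 2 accepts this client)
```

Because every node runs the same computation, the servers agree without any coordination; but because the hash is static, a failed server's share of clients is simply lost rather than redistributed.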
Distributed Web System
- Multiple IP addresses exposed to the client
- Request routing: primarily by DNS, secondarily by the servers
- Server selection: by DNS

[1] Cardellini et al. The State of the Art in Locally Distributed Web-Server Systems. ACM Computing Surveys 2002.
Effects of design on service availability

- Server selection: web switch (cluster-based); hash function (virtual Web); authoritative DNS and servers (distributed Web)
- Request routing: IP or MAC address (cluster-based); MAC uni-/multicast (virtual Web); IP address (distributed Web)
- Load balancing: web switch (cluster-based); none (virtual Web); using DNS (distributed Web)
- Group membership: not necessary (cluster-based); not possible without another network (virtual Web); possible (distributed Web)
- Single point of failure: web switch, which can be made redundant (cluster-based); none, since all servers answer (virtual Web); DNS server, which can be made redundant (distributed Web)
- Failover method: the web switch decides (cluster-based); N/A (virtual Web); take over the IP using ARP (distributed Web)
- Fault detection: heartbeat (cluster-based); N/A (virtual Web); heartbeat (distributed Web)
- Other limitations: none (cluster-based); the hash function is static (virtual Web); DNS records are cached by clients (distributed Web)
Improving Availability of PRESS[1]

PRESS Web server:
- Cooperative nodes
- Serve from cache instead of disk
- All servers have a map of where cached files are

Availability mechanism:
- Servers are organized in a ring structure; each sends a heartbeat to the next one in the ring
- Three missing heartbeats: the predecessor has crashed
- A restarted node broadcasts its IP address to join the cluster again

[1] Nagaraja et al. Quantifying and Improving the Availability of High-Performance Cluster-Based Internet Services. SC 2003.
(This is the distributed Web model.)
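The ring-based failure detection can be sketched as follows (illustrative; names are mine, and the bookkeeping of per-predecessor missed heartbeats is an assumption):

```python
def detect_crashed_predecessors(missed, threshold=3):
    """Each node counts consecutive heartbeats missed from its
    predecessor in the ring; once `threshold` heartbeats are missing,
    the predecessor is declared crashed."""
    return sorted(node for node, count in missed.items() if count >= threshold)

# Consecutive heartbeats missed from each predecessor, as counted by
# its successor in the ring: only C has hit the three-heartbeat limit.
print(detect_crashed_predecessors({"A": 0, "B": 1, "C": 3}))  # -> ['C']
```

The threshold of three trades detection speed against false positives: a single delayed heartbeat does not trigger a (costly) membership change.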
Improving Availability of PRESS[1]

To enhance availability:
1. Add a front-end node to mask failures: a layer-4 switch (LVS) with IP tunneling
2. Robust group membership
3. Application-level heartbeats
4. Fault model enforcement: if there is a fault, crash the whole node and restart it

Availability improved from 99.5% to 99.99%. (The result is effectively the cluster-based Web model!)

[1] Nagaraja et al. Quantifying and Improving the Availability of High-Performance Cluster-Based Internet Services. SC 2003.
Effects of design on service availability (revisited): the cluster-based Web model is the best choice!
Contents
Reliable Server Pooling (RSerPool)
“RSerPool provides an application-independent set of services and protocols for building fault-tolerant and highly-available client/server applications.”[1]

- Specifies the behavior of registrars, servers, and clients
- Two protocols designed over SCTP:
  - Aggregate Server Access Protocol (ASAP): used between server and registrar, and between client and registrar
  - Endpoint haNdlespace Redundancy Protocol (ENRP): used among registrars

[1] Lei et al. An Overview of Reliable Server Pooling Protocols. RFC 5351. IETF 2008.
Reliable Server Pooling (RSerPool)
- Servers add themselves to a pool by registering with a registrar; that registrar becomes their home registrar
- The home registrar probes server status
- Clients cache a list of servers from a registrar
- Clients also report failures to the registrar
- Each registrar is the home registrar for a set of servers; each maintains a complete view of the pools by synchronizing with the others
- Servers provide service through a normal application protocol such as HTTP
Registrars
- Announce themselves to servers, clients, and other registrars using IP multicast or static configuration
- Share all knowledge about the pool with other registrars: all server updates (registration, re-registration, deregistration) are announced by the home registrar
- Maintain connections to peer registrars using SCTP
- A checksum mechanism is used to audit the consistency of the handlespace
- Each keeps a count of unreachability reports for every server in its pool; once a threshold is reached, the server is removed from the pool
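The unreachability-report bookkeeping might look like the following sketch (illustrative only; the class, method names, and threshold value are my assumptions, not RFC 5351 API):

```python
class Registrar:
    """Tracks client unreachability reports per pooled server and
    evicts a server once the reports reach a threshold (assumed value)."""
    def __init__(self, pool, threshold=3):
        self.pool = set(pool)
        self.threshold = threshold
        self.reports = {}

    def report_unreachable(self, server):
        # Clients call this when they cannot reach a pool element.
        if server not in self.pool:
            return
        self.reports[server] = self.reports.get(server, 0) + 1
        if self.reports[server] >= self.threshold:
            self.pool.discard(server)  # remove the server from the pool

reg = Registrar(["s1", "s2"])
for _ in range(3):
    reg.report_unreachable("s1")
print(sorted(reg.pool))  # -> ['s2']
```

The threshold guards against a single client's local connectivity problem evicting a healthy server, though (as noted later) correlated client-side failures can still drain a whole site from the pool.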
Fault detection mechanisms[1]

Failed component | Detected by | Detection mechanism
Client    | Server    | Application-specific
Client    | Registrar | Broken connection
Server    | Client    | Application-specific
Server    | Registrar | ASAP Keep-Alive timeout
Server    | Registrar | ASAP Endpoint Unreachable
Registrar | Client    | ASAP request timeout
Registrar | Server    | ASAP request timeout
Registrar | Registrar | ENRP Presence timeout
Registrar | Registrar | ENRP request timeout
[1] Dreibholz. Reliable Server Pooling – Evaluation, Optimization, and Extension of a Novel IETF Architecture. Ph.D. Thesis 2007.
Reliability mechanisms[1]
Registrar takeover:
- Each registrar already has a complete view of the handlespace
- Registrars volunteer to take over for the failed one
- Conflicts are resolved by comparing registrar IDs (the lowest one wins)
- The winner sends probes to the servers under the failed registrar
[1] Dreibholz. Reliable Server Pooling – Evaluation, Optimization, and Extension of a Novel IETF Architecture. Ph.D. Thesis 2007.
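The takeover steps above reduce to a deterministic arbitration plus a probe pass; a minimal sketch (my own illustration, not the ENRP wire protocol):

```python
def take_over(volunteer_ids, failed_servers, probe):
    """Registrar takeover: the volunteering registrar with the lowest
    ID wins the arbitration, then probes the failed registrar's servers
    and keeps only the ones that still respond."""
    winner = min(volunteer_ids)
    alive = [s for s in failed_servers if probe(s)]
    return winner, alive

# Registrars 42, 7, and 19 all volunteer; only server s1 answers probes.
winner, alive = take_over([42, 7, 19], ["s1", "s2"], lambda s: s == "s1")
print(winner, alive)  # -> 7 ['s1']
```

Because every volunteer computes the same minimum over the same set of IDs, all registrars agree on the winner without a further round of messages.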
Reliability mechanisms[1]
Server failover:
- Uses a client-side state-keeping mechanism
- A state cookie is sent to the client via the control channel; the cookie serves as a checkpoint
- When a server fails, the client connects to a new server and sends the state cookie; the new server picks up from that state
- Uses the ASAP session layer between client and server
- The data and control channels are multiplexed over a single connection to enforce the correct ordering of transactions and cookies
[1] Dreibholz. Reliable Server Pooling – Evaluation, Optimization, and Extension of a Novel IETF Architecture. Ph.D. Thesis 2007.
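The cookie-based failover can be sketched end to end (illustrative; in RSerPool the ASAP session layer handles the cookie transparently, and the class and state layout here are my assumptions):

```python
class PoolElement:
    """A server that checkpoints session state into a cookie held by
    the client, so any peer server can resume the session after a
    failure."""
    def handle(self, cookie, amount):
        state = dict(cookie)                       # resume from the checkpoint
        state["total"] = state.get("total", 0) + amount
        return state, state["total"]               # (new cookie, response)

cookie = {}
s1, s2 = PoolElement(), PoolElement()
cookie, _ = s1.handle(cookie, 5)   # normal operation on server 1
# Server 1 fails; the client replays its latest cookie to server 2:
cookie, total = s2.handle(cookie, 2)
print(total)  # -> 7 (the session continues where the checkpoint left off)
```

Keeping the checkpoint on the client is what makes the servers stateless with respect to each other: no server-to-server state replication is needed for failover.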
Evaluation of RSerPool
Merits
- Includes the client in the system design
- Every element keeps an eye on the others
- There are mechanisms for registrar takeover and server failover
- Multiple layers of failure detection (e.g., SCTP level and application level)

Limitations
- The client does not do much to help when unavailability problems occur
- Too much overhead in maintaining a consistent view of the pools; not scalable to many servers
- Use of IP multicast confines the deployment of registrars and servers to a multicast domain; clients would most likely have to be statically configured
- Assumes that a client connectivity problem is a server problem: servers at one site will all get dropped if there is a big network problem
Contents
Distributed Clusters
- Distributed clusters intrinsically have both network-path and server redundancy
- How can we utilize these redundancies?
- Study of the availability techniques of Content Distribution Networks and the Domain Name System
Contents
Akamai vs. Limelight[1]
Akamai philosophy
- Go closer to the client
- Scatter small clusters all over the place
- Scale: 27,000 servers, 65 countries, 656 ASes
- Two-level DNS

Limelight philosophy
- Go closer to ISPs
- Large clusters at a few key locations near many ISPs
- Scale: 4,100 servers, 18 countries, has its own AS
- Flat DNS, uses anycast

[1] Huang et al. Measuring and Evaluating Large-Scale CDNs. IMC 2008.
Akamai vs. Limelight
Methodology
- Connect to port 80 at regular intervals
- Failure = two consecutive connection errors

Results
- Server and cluster availability is higher for Limelight
- But service availability may be different!
[1] Huang et al. Measuring and Evaluating Large-Scale CDNs. IMC 2008.
Akamai Failure Model
Failure model: “We assume that a significant and constantly changing number of components or other failures occur at all times in the network.”[1]
Components: link, machine, rack, data center, multi-network, ...
This leads to having small clusters scattered across diverse ISPs and geographic locations.
[1] Afergan et al. Experience with some Principles for Building an Internet-Scale Reliable System. USENIX WORLDS 2005.
Akamai: scalability and availability
- Use two-level DNS to direct clients to servers
  - The top level directs clients to a region, e.g., g.akamai.net
  - The region resolves the lower-level queries
  - The top level returns low-level name servers in multiple regions
  - The low level returns a short DNS TTL (20 seconds)
- Servers use ARP to take over for a failed server
- Use the internet for inter-cluster communication, with multi-path routing directed by software logic (= an overlay network?)

[1] Afergan et al. Experience with some Principles for Building an Internet-Scale Reliable System. USENIX WORLDS 2005.
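The effect of the short 20-second TTL on failover can be sketched from the client's side (illustrative; the `Resolver` class and `lookup` callback are my assumptions, not Akamai's implementation):

```python
import time

class Resolver:
    """Caches a DNS answer only for its TTL. A short TTL (20 s for the
    low-level names) bounds how long clients keep being directed to a
    server that has since failed."""
    def __init__(self, lookup, ttl=20):
        self.lookup, self.ttl = lookup, ttl
        self.cached, self.expires = None, 0.0

    def resolve(self, name, now=None):
        now = time.monotonic() if now is None else now
        if self.cached is None or now >= self.expires:
            self.cached = self.lookup(name)       # re-query past the TTL
            self.expires = now + self.ttl
        return self.cached

# The authoritative answer changes after the first lookup (a server failed):
answers = iter(["203.0.113.10", "203.0.113.11"])
r = Resolver(lambda name: next(answers))
print(r.resolve("g.akamai.net", now=0))   # -> 203.0.113.10
print(r.resolve("g.akamai.net", now=10))  # -> 203.0.113.10 (still cached)
print(r.resolve("g.akamai.net", now=25))  # -> 203.0.113.11 (TTL expired)
```

This is why client-side caching of DNS records, listed earlier as a limitation of the distributed Web model, is tolerable here: the TTL caps the stale window at 20 seconds.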
Contents
DNS Root Servers[1,2]

Redundancy
- Redundant hardware that takes over for a failed unit, with or without human intervention
- At least 3 servers recommended, with one at a remote site[3]
- Backups of the zone file stored at off-site locations
- Connectivity to the internet

Diversity
- Geographically located in 130 places in 53 countries
- Topological diversity matters more
- Hardware, software, and operating systems of the servers
- Diverse organizations, personnel, and operational processes
- Distribution of zone files within each root server operator

[1] Bush et al. Root Name Server Operational Requirements. RFC 2870. IETF 2000.
[2] http://www.icann.org/en/committees/security/dns-security-update-1.htm
[3] Elz et al. Selection and Operation of Secondary DNS Servers. RFC 2182. IETF 1997.
The use of anycast for availability[1]

Basic anycast
- Announce an identical IP address from multiple nodes
- The routing system takes the client's request to the closest node

Hierarchical anycast
- Global vs. local nodes
- If any node fails, it stops its announcement
- A global node takes over automatically
[1] Abley. Hierarchical Anycast for Global Service Distribution. ISC Technical Note 2003-1. 2003.
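Hierarchical anycast failover can be sketched as a preference order over announcing nodes (an illustration of the routing behavior, not router code; node names are mine):

```python
def serving_node(local_nodes, global_nodes, is_announcing):
    """Hierarchical anycast failover: the client reaches the nearest
    local node still announcing the prefix; if all local nodes have
    withdrawn, a global node picks up the traffic automatically."""
    for node in local_nodes + global_nodes:  # ordered by routing proximity
        if is_announcing(node):
            return node
    return None

# local1 has failed and withdrawn its announcement, so local2 serves;
# if both locals withdraw, the global node takes over automatically.
print(serving_node(["local1", "local2"], ["global1"], {"local2", "global1"}.__contains__))  # -> local2
print(serving_node(["local1", "local2"], ["global1"], {"global1"}.__contains__))            # -> global1
```

The failover requires no coordination beyond the announcement withdrawal itself: routing convergence does the rest.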
Is anycast good for everyone?[1]
Not really…
- Packets of long sessions may go to another node if the routing dynamics change (service time vs. stability of routing)
- Many routing considerations: aggregated prefixes, multiple services from one prefix, and the route propagation radius

[1] Abley and Lindqvist. Operation of Anycast Services. RFC 4786. IETF 2006.
Contents
Conclusion
The techniques presented here attempt to recover from either network failure or server failure through redundancy:
- There is redundancy in network paths
- Server redundancy can be handled

An application service provider may have to use a combination of these techniques.
Conclusion
- CDNs and the DNS have been good so far… can these techniques be applied to other applications?
- A system designed for availability is only as good as its failure model.
- Further research on availability is needed: there are few techniques, and not much about availability in the literature.