Transcript
1. A Highly Available Network File Server
Bhide et al. (1991)
Presented By: Anand Janjal (CS 8631)
2. Network File Server Reliability
The problem of network file server reliability is divided into three
sub-problems:
1) Server reliability
2) Disk reliability
3) Network reliability
3. Contd.
1) Server reliability: dual-ported disks and impersonation
Dual-ported disks: allow the drive to continue functioning when
one port becomes nonfunctional, eliminating a single point of
failure
2) Disk reliability: disk mirroring
Disk mirroring: the replication of logical disk volumes onto
separate physical disks in real time to ensure continuous
availability
3) Network reliability: Network replication
Source:
http://wiki.answers.com/Q/Single_port_hard_drive_vs_dual_port_hard_drive
http://en.wikipedia.org/wiki/Disk_mirroring
4. Mirroring
Fast recovery from disk failures is achieved by mirroring files on
different disks
All copies of a file are kept on disks controlled by the same
server, eliminating the overhead of maintaining consistency and
coherence between two servers
Mirroring is used for applications that require continuous
availability; otherwise archival backup is used (a minimal sketch of
the mirrored-write idea follows this slide)
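The following is a minimal Python sketch of the mirrored-write idea
described above: the same block is written and forced to every mirror
before the client is acknowledged. The file paths, block size, and
function name are illustrative placeholders, not the paper's
implementation.

    import os

    MIRRORS = ["/mirror0/vol.img", "/mirror1/vol.img"]   # placeholder paths on two physical disks
    BLOCK_SIZE = 4096                                     # 4 KB blocks, as in the measured system

    def mirrored_write(offset, block):
        """Write the same block to every mirror and force it to disk
        before the server acknowledges the client."""
        assert len(block) == BLOCK_SIZE
        for path in MIRRORS:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
            try:
                os.pwrite(fd, block, offset)
                os.fsync(fd)        # data must be stable before the reply is sent
            finally:
                os.close(fd)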
5. Network Failures
Network failures are tolerated by optional replication of network
components, including the transmission medium
Packets are NOT replicated over the two networks
Network load is distributed over the networks
6. Contd.
Providing NFS reliability through server replication suffers from
resource overhead, performance degradation, and increased complexity
Replicated servers use expensive protocols to maintain consistency
and coherence, leading to performance degradation
Complex protocols are needed to bring a stale replica up to date
when it is repaired after a failure
Handling network partitions requires quorum management, which
increases system complexity
7. HA-NFS
HA-NFS adheres to the semantics of Sun NFS
Server failures are tolerated by using dual-ported disks accessible
to both servers, each acting as a backup for the other
Disks are divided into sets, each served by one server during
normal operation
Each server maintains on its disks enough information to reconstruct
its current volatile state
Servers exchange liveness-checking messages (a sketch of such an
exchange follows this slide)
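Below is a minimal Python sketch of a liveness-checking exchange
between the two servers over UDP. The addresses, port, interval, and
timeout are invented for illustration; the paper does not specify the
details of this protocol.

    import socket, time

    PEER = ("192.0.2.2", 5001)       # the partner server (placeholder address and port)
    LISTEN = ("0.0.0.0", 5001)
    INTERVAL = 2.0                   # seconds between liveness messages (illustrative)
    TIMEOUT = 10.0                   # silence longer than this suggests the peer has failed

    def liveness_loop(on_peer_failure):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(LISTEN)
        sock.settimeout(INTERVAL)
        last_heard = time.monotonic()
        while True:
            sock.sendto(b"alive", PEER)      # tell the peer we are still up
            try:
                sock.recvfrom(64)            # any datagram counts as a liveness message
                last_heard = time.monotonic()
            except socket.timeout:
                pass
            if time.monotonic() - last_heard > TIMEOUT:
                on_peer_failure()            # e.g. begin taking over the peer's disks
                return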
8. Design goals of HA-NFS
1) Failures and recovery must be transparent to the applications
running on the file server's clients; a failure must not force an
operation in progress to terminate
2) Failure-free performance must not be penalized to provide high
availability
3) The NFS client protocol implementation should not require
modification to use HA-NFS servers
9. Contd.
HA-NFS is implemented on top of the AIXv3 journaled file system
AIXv3 provides serializable and atomic modification of file system
metadata by using transactional locking and logging techniques
In the event of a failure, metadata are restored to a consistent
state by applying the changes contained in the log (a toy sketch of
this log-and-replay idea follows this slide)
File reliability is ensured by NFS semantics: data are forced to
disk before an acknowledgement is sent to the client
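Here is a toy Python illustration of the log-and-replay idea: each
metadata change is appended to a log and forced to disk, and after a
crash the log is replayed to restore consistency. The record format,
file location, and helper names are assumptions for illustration only;
AIXv3's journaled file system works at a much lower level.

    import json, os

    LOG_PATH = "/var/hanfs/meta.log"        # hypothetical log location

    def log_metadata_change(record):
        """Append a metadata change to the log and force it to disk
        before the operation is acknowledged."""
        with open(LOG_PATH, "a") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def replay_log(apply_change):
        """After a crash, restore the metadata to a consistent state
        by re-applying every change recorded in the log."""
        if not os.path.exists(LOG_PATH):
            return
        with open(LOG_PATH) as log:
            for line in log:
                apply_change(json.loads(line))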
10. Contd.
AIXv3 supports logical volumes, which can be mirrored to provide
disk reliability
Though NFS is a stateless file server protocol, most
implementations maintain a small amount of state information
An NFS server maintains a reply cache that records successful
non-idempotent RPCs
HA-NFS records changes to this volatile state in the AIXv3 disk log
so that the reply cache can be reconstructed in the event of a failure
11. HA-NFS Architecture
Consists of two NFS servers sharing a number of SCSI buses
Each shared SCSI bus and the disks connected to it have a designated
primary server (to balance the load across the servers)
During normal operation, disks are served by their corresponding
primary server
Each server has two network interfaces and two IP addresses
The primary interface is used during normal operation; the secondary
interface is used to impersonate the other server during a failure
[Figure 1: HA-NFS architecture (not reproduced in this transcript)]
12. Normal operation
The server performs the operation described in each NFS RPC it
receives
Upon success, the metadata changes are recorded in the AIXv3 log
An entry for the RPC is added to the reply cache
If the operation fails, the server checks whether there is an entry
in the reply cache corresponding to the RPC
If an entry is found, the RPC is a retry of a non-idempotent
operation that succeeded earlier, so the saved reply is returned;
otherwise the server replies with an error code to the client (a
rough sketch of this flow follows this slide)
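The following rough Python sketch follows the per-RPC flow above:
execute the operation, log the metadata change with the reply-cache
entry piggybacked on it, and use the cache to recognise retries of
non-idempotent operations. Keying the cache on (client, transaction
id) and all the names here are assumptions, not the paper's code.

    reply_cache = {}    # (client_id, xid) -> saved reply for non-idempotent RPCs

    def handle_rpc(client_id, xid, operation, log_metadata_change):
        key = (client_id, xid)
        try:
            reply = operation()                      # perform the requested NFS operation
        except OSError as err:
            # The operation failed: if this (client, xid) succeeded before, the RPC
            # is a retry of a non-idempotent operation, so return the saved reply.
            if key in reply_cache:
                return reply_cache[key]
            return {"status": "error", "errno": err.errno}
        # On success, log the metadata change with the reply-cache entry
        # piggybacked on the same record, then remember the reply.
        log_metadata_change({"client": client_id, "xid": xid, "reply": reply})
        reply_cache[key] = reply
        return reply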
13. Take-over
If a server fails, its disks are taken over by the surviving server
The surviving server uses the disk log to retrieve the reply-cache
entries of the failed server
It impersonates the failed server by switching its secondary network
interface to the failed server's primary address
Packets destined for the failed server are then received by the live
server on its secondary interface (a sketch of these steps follows
this slide)
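A sketch of the take-over steps, assuming a Linux-style `ip addr`
command and a hypothetical `read_log_records` helper; the real system
reconfigured an AIX network interface and read the AIXv3 disk log
directly.

    import subprocess

    def take_over(failed_server_ip, secondary_iface, read_log_records):
        # 1. Rebuild the failed server's reply cache from the shared disk log.
        recovered_cache = {}
        for rec in read_log_records():
            if "reply" in rec:
                recovered_cache[(rec["client"], rec["xid"])] = rec["reply"]

        # 2. Impersonate the failed server: assign its primary IP address to the
        #    live server's secondary interface so the failed server's clients
        #    now reach the live server.
        subprocess.run(
            ["ip", "addr", "add", failed_server_ip + "/24", "dev", secondary_iface],
            check=True,
        )
        return recovered_cache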
14. Alternative method
If network interfaces that can change their hardware addresses are
not available, the ARP protocol is used instead
HA-NFS sends an ARP request to query a hardware address
The query appears to have been sent from the failed server's IP
address but carries the hardware address of the live server's
secondary interface, so clients update their ARP caches to map the
failed server's IP address to the live server (a rough illustration
follows this slide)
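For illustration only, here is how such an ARP packet could be built
on a modern Linux host with raw-socket privileges. All names and
addresses are placeholders, and the 1991 system obviously used a
different mechanism; only the ARP framing itself is standard.

    import socket, struct

    def send_impersonating_arp(iface, live_mac, failed_ip, probe_ip):
        """Broadcast an ARP request whose sender IP is the failed server's address
        but whose sender hardware address is the live server's secondary interface,
        so hosts on the LAN update their ARP caches to point at the live server."""
        sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(0x0806))
        sock.bind((iface, 0))
        mac = bytes.fromhex(live_mac.replace(":", ""))
        ether = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)   # broadcast Ethernet frame, ARP type
        arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)         # Ethernet/IPv4, ARP request
        arp += mac + socket.inet_aton(failed_ip)                # sender: live MAC, failed server's IP
        arp += b"\x00" * 6 + socket.inet_aton(probe_ip)         # target: unknown MAC, queried address
        sock.send(ether + arp)
        sock.close()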
15. Re-integration
When a server comes back up, it keeps its primary network interface
turned off and sends a reintegration request to the backup server
over its secondary network interface
The two servers periodically check each other's status through
liveness messages until the second server has reintegrated itself
into the system
16. Network Failure
The network is replicated to tolerate network failures
Recovery from a server failure does not require any changes to the
client
Recovery from a network failure requires a daemon running on the
client that observes the status of each network and reroutes
requests to the operational network
[Figure 2 (not reproduced in this transcript)]
17. Contd.
The server broadcasts heartbeat messages
When the daemon on the client does not receive a heartbeat message
within the timeout period, it concludes that the path to the
server's primary interface is broken
The daemon then updates the client's routing table to use the
alternative path to the server (a minimal sketch of such a daemon
follows this slide)
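A minimal Python sketch of such a client-side daemon: it listens for
heartbeat datagrams and, after a silent timeout, switches the route to
the server onto the alternative network. The port, timeout, and the
Linux `ip route` command are illustrative assumptions.

    import socket, subprocess

    HEARTBEAT_PORT = 5002       # placeholder port on which server heartbeats arrive
    TIMEOUT = 15.0              # seconds of silence before switching paths

    def monitor_heartbeats(server_ip, alternate_gateway):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("0.0.0.0", HEARTBEAT_PORT))
        sock.settimeout(TIMEOUT)
        while True:
            try:
                sock.recvfrom(64)      # heartbeat received: the current path is alive
            except socket.timeout:
                # No heartbeat within the timeout: assume the path to the server's
                # primary interface is broken and reroute via the alternative network.
                subprocess.run(
                    ["ip", "route", "replace", server_ip, "via", alternate_gateway],
                    check=True,
                )
                return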
18. Performance
The performance of HA-NFS was measured by running a set of
experiments on RISC System/6000 workstations connected by a
10 Mbit/s Ethernet
The underlying system uses 4 KB disk blocks
19. Effect of disk logging
Comparison between HA-NFS and a traditional implementation of NFS
that doesn't use disk logging
Traditional NFS forces data and metadata to disk before responding
to an RPC
HA-NFS records metadata modifications as log records, requiring no
disk-arm movement
Reply-cache entries are piggybacked on the normal disk-log records,
so saving the volatile state on disk incurs no additional overhead
Disk logging improves the response time of all RPCs that modify the
file system structure
20. Performance of HA-NFS
21. Contd.
The improvement from disk logging ranges from 33% for the SETATTR
and WRITE RPCs up to 75% for the MKDIR RPC
Placing the log on the same disk as the data reduces performance
because of the additional disk-arm movement
22. Contd.
The overhead introduced by mirroring is a 17% slowdown for the WRITE
RPC
This is due to variation in disk-arm position among the mirrors
It takes 15 seconds for a backup to perform all take-over tasks
(excluding failure detection), and 30 seconds including failure
detection
It takes 60 seconds for a server to reintegrate into the system
after repair or maintenance
23. Conclusions/Future work
Replicated file servers are well suited to WANs, where a client can
access a file from the nearest replica
HA-NFS provides server reliability through dual-ported disks and
impersonation, disk reliability through mirroring, and network
reliability through replication
Impersonation prevents clients from hanging during a failure
HA-NFS is not flexible: it cannot tolerate more than one server
failure
During a failure, disks are unavailable for a period of 30 seconds
Servers must be physically close because of the restriction on SCSI
bus length (the use of optical links is under consideration)