This page was originally created by GuochunShi but is now maintained primarily by DaveDykstra.
Please refer to the following papers for both introductory and detailed documentation:
AlanRobertson provided a number of HeartbeatTutorials that cover a lot of ground on HA, including HA-NFS, and are a great way to start. NFS is covered at the end of the 2008 tutorials.
AlanRobertson also wrote an article entitled Highly-Affordable High-Availability, which appeared in the November, 2003 issue of the US publication Linux Magazine.
He also wrote High-Availability NFS Server with Linux Heartbeat (pdf), which appeared in the August, 2003 issue of the European publication Linux Magazine.
Also the How-To Setting Up A Highly Available NFS Server by Falko Timme may be helpful.
HA-NFS requires the following for transparent failover for clients:
identical /etc/exports, and
a shared /var/lib/nfs directory between the server nodes.
When an active/passive setup such as DRBD is used for the shared storage, no service should hold /var/lib/nfs open during failover (or STONITH should be used with an iron fist).
Therefore, the following resources should be under HA control, i.e., configured under heartbeat (not covered here) for migration as a group between the servers:
the nfslock resource
the nfsserver resource (on some distros named simply nfs)
the resources managing rpc.gssd (client) and rpc.svcgssd (server) for the RPCSEC GSS daemon (cf. RFC 2203)
the rpc.idmapd resource
the rpc_pipefs file system mount
The device major/minor numbers are embedded in the NFS filehandle, so the filehandle goes stale if the major/minor numbers change during failover. If that is the case, you can either change your configuration so that the numbers come out the same on both nodes (with DRBD this happens naturally, since both servers use the same /dev/drbdX device), or, if you have shared disks, use either EVMS or LVM to create a shared volume that will have the same major/minor numbers. You can download EVMS from http://evms.sourceforge.net/ and LVM from http://www.sistina.com/products_lvm.htm
Another way to deal with the NFS device numbering problem is offered by later versions of NFS (commonly included with 2.6 Linux kernels): through the fsid export option you can specify an integer to be used in place of the major/minor numbers of the mount device. For more details see exports(5).
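For example, the data export used in the walkthrough below could be pinned to a fixed filehandle with an entry like this in /etc/exports (the value 1 is arbitrary, but must be identical on both servers and unique among your exports):
/hafs/data *(rw,sync,fsid=1)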
For those who want a quick setup, here are the steps to prepare manual NFS failover. Once this is tested and working, you should consult Heartbeat for setting up automatic failover.
For NFS v4, rather than having to take down the rpc_pipefs file system mount and all resources that use it on every failover, you can keep the rpc_pipefs directory local to each server to get started. Do this before setting up the sharing of /var/lib/nfs described in the next section.
Note: This is a shortcut, which may not work in complex environments using all the services mentioned above.
The following procedure was tested on Redhat Enterprise Linux 5.2. The servers were not NFS clients, and did not use GSS. You may need to adjust details for your own distribution. The changes are to be made on both nodes.
Change all occurrences of /var/lib/nfs/rpc_pipefs to /var/lib/rpc_pipefs:
vi /etc/modprobe.d/modprobe.conf.dist /etc/idmapd.conf
Suggestion: keep and comment out the original lines.
Here is a simple but not bullet-proof automated version:
perl -i.orig -pe \
    ' if (m,^[^#], and m,/var/lib/nfs/rpc_pipefs,) {print "#", $_; s,,/var/lib/rpc_pipefs,;} ' \
    /etc/modprobe.d/modprobe.conf.dist \
    /etc/idmapd.conf
This was tested against the files from the package versions module-init-tools-3.3-0.pre3.1.37.el5 and nfs-utils-1.0.9-33.el5, respectively.
Note that /etc/modprobe.d/modprobe.conf.dist has two occurrences for sunrpc (an install and a remove line).
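For illustration, the edited stanza in /etc/idmapd.conf then ends up looking roughly like this (the exact option name and surroundings may differ between nfs-utils versions, so check your own file):
[General]
# Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Pipefs-Directory = /var/lib/rpc_pipefs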
Take down services:
service nfslock stop
service nfs stop
service rpcidmapd stop
/bin/umount -a -t rpc_pipefs
Move the directory:
mv /var/lib/nfs/rpc_pipefs /var/lib/
Restart services:
/bin/mount -t rpc_pipefs sunrpc /var/lib/rpc_pipefs
service rpcidmapd start
The first of these two steps is the same as is normally performed by modprobe as specified in /etc/modprobe.d/modprobe.conf.dist.
Make sure you are using a shared disk device, either shared by hardware between your two servers or by DRBD.
Mount your shared device in one of the machines. For example, if the device is /dev/sdb1 and the directory you want to mount to is /hafs:
mount -t ext3 /dev/sdb1 /hafs
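You can confirm the mount and, if the filehandle issue discussed earlier concerns you, compare the device's major/minor numbers on both servers (assuming GNU stat):
mount | grep /hafs
stat -c '%t:%T' /dev/sdb1    # major:minor in hex; should be identical on both servers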
If the directory you will export is /hafs/data, create it if it is not there yet:
mkdir /hafs/data
Stop services (use the appropriate service names for your distro):
service nfs stop        # or: nfsserver
service nfslock stop    # or: nfs-common
Move /var/lib/nfs to your shared disk and symlink the original location to it:
mv /var/lib/nfs /hafs
ln -sn /hafs/nfs /var/lib/nfs
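A quick sanity check that the link is in place and resolves:
ls -ld /var/lib/nfs    # should show: /var/lib/nfs -> /hafs/nfs
ls /var/lib/nfs/       # should still list the usual contents (statd, etab, rmtab, ...)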
On the other machine, remove the /var/lib/nfs directory and create a link instead:
rm -rf /var/lib/nfs
ln -sn /hafs/nfs /var/lib/nfs
If you get an error,
rm: cannot remove directory `/var/lib/nfs/rpc_pipefs/statd': Operation not permitted
follow the instructions in the previous section.
For lock recovery to succeed, you need to supply rpc.statd with the "floating" (common) host name (foo-ha in the examples below) or IP address under which the HA services are to be reached.
On Debian, the locking service is started by /etc/init.d/nfs-common. You can configure it by putting into /etc/default/nfs-common:
STATDOPTS="-n foo-ha"
On Redhat/RHEL, the appropriate place to configure is /etc/sysconfig/nfs:
STATD_HOSTNAME=foo-ha
Otherwise, you could hardcode the host name in the /etc/init.d/nfslock script on many Linux systems:
start() {
    ...
    echo -n $"Starting NFS statd: "
    ...
    daemon rpc.statd "$STATDARG" -n foo-ha
    ...
}
This is of course not recommended. Rather, scan the top of the file for any configuration files that are source'd and edit those instead.
Start services, on the primary node only:
service nfs start
service nfslock start
For heartbeat integration, make sure that both of these services are removed from the native init.d startup sequence and instead are placed under heartbeat control, i.e., in the haresources file for heartbeat v.1 or the CIB (v.2). The order of the services is not entirely clear. Most examples have nfslock subordinate to nfs, being started after nfs, and stopped before nfs, though some init.d configurations may have the order reversed (notably RHEL5.x).
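For illustration only, a heartbeat v1 haresources entry for such a group might look roughly like the following; the node name node1, the DRBD resource name r0, the device /dev/drbd0, and the address 192.168.1.100 are placeholders, and the nfs/nfslock script names should match your distribution:
node1 drbddisk::r0 Filesystem::/dev/drbd0::/hafs::ext3 IPaddr::192.168.1.100 nfs nfslock
Resources on a haresources line are started left to right and stopped right to left, so nfslock here comes up after nfs and goes down before it, matching the common ordering described above.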
Export the user directory. Add the following line to /etc/exports:
/hafs/data *(rw,sync)
and run
exportfs -va
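To verify, you can query the export list from a client (or from the server itself):
showmount -e foo-ha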
Either sync /etc/exports to the other node, or symlink it on both nodes into the shared disk, perhaps into /var/lib/nfs:
mv /etc/exports /var/lib/nfs/       # one node only
ln -sf /var/lib/nfs/exports /etc    # both nodes
Heartbeat is set up and minimally configured to manage:
DRBD resource demotion and promotion (if used)
migration of the shared file system (/hafs in the examples above)
migration of the shared IP for host foo-ha
On a client, mount the exported file system:
mount foo-ha:/hafs/data /mnt/data
Write some files to it, and watch, e.g.:
i=0; while :; do date +"%s %c -- $i" | tee -a /mnt/data/nfstest.log; sleep 1; (( ++i )); done
On the primary server: take down NFS services:
service nfslock stop
service nfs stop
Initiate heartbeat migration for all resources in the previous section, e.g., for a heartbeat v2 configuration:
crm_resource -r drbdgroup -M
crm_mon; sleep 5; crm_mon
On the secondary server: bring up NFS services:
service nfslock start
service nfs start
On the client, watch the above loop.
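Since each log line begins with the epoch time from date +%s, a rough way to measure the interruption afterwards is to look for gaps between consecutive timestamps, e.g.:
awk 'NR > 1 && $1 - prev > 2 { print "gap of " $1 - prev " s before: " $0 } { prev = $1 }' /mnt/data/nfstest.log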
To migrate back, perform the analogous:
On the secondary server: take down NFS services:
service nfslock stop
service nfs stop
On the primary server:
crm_resource -r drbdgroup -U
crm_mon; sleep 5; crm_mon
... and bring up NFS services:
service nfslock start
service nfs start
NFS-mounting any filesystem on your NFS servers is highly discouraged. DaveDykstra wanted both servers to NFS-mount the replicated filesystem from the active server; after a lot of trouble he mostly got it working, but still saw scenarios where "NFS server not responding" errors could interfere with heartbeat failovers, and he finally gave up on it. The biggest problem was the fuser command hanging. For more details see the archives of a mailing list discussion from 2005 and another from 2006.
On Redhat Enterprise Linux 5.x, nfsd occasionally refuses to die upon service nfs stop. To remedy the situation, apply the following patch:
--- /etc/init.d/nfs	2008/05/24 23:02:19	1.1
+++ /etc/init.d/nfs	2008/09/19 07:30:12
@@ -109,6 +109,10 @@
 	echo
 	echo -n $"Shutting down NFS daemon: "
 	killproc nfsd -2
+
+	# Need to be more persuasive for HA
+	sleep 5
+	killproc nfsd -9
 	echo
 	if [ -n "$RQUOTAD" -a "$RQUOTAD" != "no" ]; then
 		echo -n $"Shutting down NFS quotas: "
This patch is for /etc/init.d/nfs from nfs-utils-1.0.9-33.el5. Adapt as necessary for your distribution.
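Assuming the diff is saved as, say, nfs-stop.patch, it can be applied with a backup of the original file:
patch -b /etc/init.d/nfs < nfs-stop.patch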
NFS locking is a cooperative enterprise. Lock migration is coordinated by rpc.statd(8). Locks are stored in /var/lib/nfs/statd/sm and /var/lib/nfs/statd/sm.bak. Having /var/lib/nfs on shared storage as outlined above will enable cooperative lock migration upon failover, as initiated by rpc.statd on the new server.
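To see which clients the server will notify for lock recovery, you can inspect those directories directly:
ls -l /var/lib/nfs/statd/sm /var/lib/nfs/statd/sm.bak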
For debugging, rpc.statd(8) allows signal-initiated recovery.
On some kernels or in some situations, locks may not survive HA failovers without the steps above. As a workaround for those situations, it is recommended that you mount NFS filesystems with the "nolock" option. For more details see a mailing list post from 2005 and a confirmation from 2007 using a more recent kernel.
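On a client, that workaround looks like the following (using the test mount from above):
mount -o nolock foo-ha:/hafs/data /mnt/data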
For further information about locking, which includes testing using Connectathon, see HaNFSOldLocking.
The above directions and comments apply mainly to ActivePassive arrangements. For information on ActiveActive setups, please refer to Matt Schillinger's NFS page (Note: old page).