How to setup redundant NFS using DRBD and Heartbeat on Linux Easily – Centos Fedora SuSe Redhat RHEL

Hi all, hope you are enjoying series of articles published on this website for free.. totally free.

In this article you will learn how to setup redundant NFS servers using DRBD (Distributed-Replicated-Block-Devices) Technology. Complete step-by-step procedure is listed below. It works for me so hopefully it will work for your also. No gurantees are given.

1. Two servers with similar storage disk (harddrive) setup (To create a redundant nfs server)
2. One client system/server where the nfs share will be mounted.
3. Static IPs for all servers.

First step is to install CentOS on both machines. During the install process, create a separate blank partition on both machines to be used as your nfs mount. Make sure you creat exact same size partitions on both servers. Set the mount point to /nfsdata during installation.

From this point on i’m going to be referring to both nfs servers by their IPs and hostnames.
server1 will be nfs1 with ip and server2 will be nfs2 with ip Your may use different range of private IPs, so make sure to put in the correct IPs where necessary within this how-to.

Do the following on nfs1( and nfs2(

To view mount points on your system:

vi /etc/fstab

Search for the /nfsdata mount point and comment it out to prevent it form automatically being mounted on boot. Take note of the device for the /nfsdata mount point. Here is what my fstab looks like.

LABEL=/boot /boot ext3 defaults 1 2
#/dev/VolGroup00/LogVol04 /nfsdata ext3 defaults 1 2


If you are going to use external storage as NFS partition then you can set that up after base os installation. It is not mandatory to create nfs partition during OS installation. Also it is best to use LVM for partitioning instead of fix size device. LVM gives greater flexibility.

Now unmount /nfsdata partition if it is mounted as heartbeat will take care of mounting that.

umount /nfsdata

Make sure that ntp and ntpdate are installed on both of the nfs servers.

yum install ntp ntpdate

The time on both servers must be identical. Edit your /etc/ntp.conf file and verify settings.

Now lets check and make sure that the nfs service is not running on startup and that selinux is also turned off.


If you have not installed full operating systems or all administration tools then you can use


and disable firewall and selinux.

Now We will install DRBD and DRBD-Kernel module

Note: If you are installing DRBD and DRBD-kernel on physical system then you will need to install:
drbd-8.0.16-5.el5.centos.x86_64.rpm AND kmod-drbd-8.0.16-5.el5_3.x86_64.rpm
And if you are using DRBD and DRBD kernel module on virtual machine (VM) then you need to install:
drbd-8.0.16-5.el5.centos.x86_64.rpm AND kmod-drbd-xen-8.0.16-5.el5_3.x86_64.rpm
You can choose version of package according to your choice.
Along with DRBD and DRBD-kernel module, we will install following rpm’s also.

yum install Perl-TimeDate net-snmp-libs-x86_64

rpm -Uvh heartbeat-pils-2.1.4-2.1.x86_64.rpm

rpm -Uvh heartbeat-stonith-2.1.4-2.1.x86_64.rpm

rpm -Uvh heartbeat-2.1.4-2.1.x86_64.rpm

Now its time to edit drbd.conf file. /etc/drbd.conf is default location for that config file.

common { syncer { rate 100M; al-extents 257; } }

resource r0 {
protocol C;
handlers { pri-on-incon-degr “halt -f”; }
disk { on-io-error detach; }
startup { degr-wfc-timeout 60; wfc-timeout 60; }

on nfs01 {
device /dev/drbd0;
disk /dev/LogVol04/nfsdrbd;
meta-disk internal;
on nfs02 {
device /dev/drbd0;
disk /dev/LogVol04/nfsdrbd;
meta-disk internal;

Let me give you little bit more information about parameters used in above config file.
So lets start from the top.

  • Protocol – This is the method that drbd will use to sync both of the nfs servers. There are 3 available options here, Protocol A, Protocol B and Protocol C.Protocol A is an asynchronous replication protocol. The manual states, “local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has been placed in the local TCP send buffer. In the event of forced fail-over, data loss may occur. The data on the standby node is consistent after fail-over, however, the most recent updates performed prior to the crash could be lost.”Protocol B is a memory synchronous (semi-synchronous) replication protocol. The manual states, “local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has reached the peer node. Normally, no writes are lost in case of forced fail-over. However, in the event of simultaneous power failure on both nodes and concurrent, irreversible destruction of the primary’s data store, the most recent writes completed on the primary may be lost.”
  • Protocol C is a synchronous replication protocol. The manual states, “local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed. As a result, loss of a single node is guaranteed not to lead to any data loss. Data loss is, of course, inevitable even with this replication protocol if both nodes (or their storage subsystems) are irreversibly destroyed at the same time.

    You may choose your desired protocol but Protocol C is the most commonly used one and it is the safest method.

  • rate – The rate is the maximum speed at which data will be sent from one nfs server to the other while syncing. This should be about a third of your maximum write speed. In my case, I have only a single disk that can write about 45mb/sec so a third of that would be 15mb. This number will usually be much higher for people with raid setups. In some large raid setups, the bottleneck would be the network and not the disks so set the rate accordingly.
  • al-extent – This data on the disk are cut up into slices for synchronization purposes. For each slice there is an al-extent that is used to indicate any changes to that slice. Larger al-extent values make synchronization slower but benefit from less writes to the metadata partition. In my case, I’m using an internal metadata which means the drbd metadata is written to the same parition that my nfs data is on. It would benefit me to have less metadata writes to prevent the disk arm from constantly moving back and forth and degrading performance. If you are using a raid setup and a separate partition for the metadata then set this number lower to benefit from faster synchronization. This number MUST be a prime to gain the most possible performance because it is used in specific hashes that benefit from prime number sized structures.
  • pri-on-incon-degr – The “halt -f” command is executed if the node is primary, degraded and if the data is inconsistent. I use this to make sure drbd is halted when there is some sort of data inconsistency to prevent a major mess from occuring.
  • on-io-error – This allows you to handle low level I/O errors. The method I use is the “detach” method. This is the recommended option by On the occurrence of a lower-level I/O error, the node drops its backing device, and continues in diskless mode.
  • degr-wfc-timeout – This is the amount of time in seconds that is allowed before a connection is timed out. In case a degraded cluster (cluster with only one node left) is rebooted, this timeout value is used instead of wfc-timeout, because the peer is less likely to show up in time, if it had been dead before.

The rest of the config is pretty self explanatory. Replace nfs1 and nfs2 with the hostnames of your nfs servers. To get the hostnames use the following command on both servers:

uname -n

Then replace the disk value with the device name from your fstab file that you commented out. Enter the IP address of each server and use port 7789. The last part is the meta-disk. I used an internal meta-disk because I only have one hard disk in the server and it would not give me any benefit to create a separate partition for the metadata. If you have a raid setup or a separate disk from your data partition that you can use for the meta data than go ahead and create a 150mb partition. Replace the word “internal” in the config file with your device name that you used for the meta data partition.

Now that we finally have our drbd.conf file ready we can move on. Lets go ahead and enable the drbd kernel module.

modprobe drbd

Now that the kernel module is enabled lets start up drbd.

drbdadm up all

This will start drbd, now lets check its status.

cat /proc/drbd

You can always use the above command to check the status of drbd. The above command should show you something like this.

0: cs:Connected st:Secondary/Secondary ld:Inconsistent
ns:0 nr:0 dw:0 dr:0 al:0 bm:1548 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured

You should get some more data before it but the above part is what we are interested in. If you notice it shows that drbd is connected and both nodes are in secondary mode. This is because we have not assigned which node is going to be the primary yet. It also says the data is inconsistent because we have not done the initial sync yet.

I am going to set nfs1 to be my primary node and nfs2 to be my secondary node. If nfs1 fails, nfs2 will takeover but if nfs1 comes back online then all the data from nfs2 will be synced back to nfs1 and nfs1 will take over again.

First of all lets go ahead and delete any data that was created on the /data partition that we setup during our intial OS installation. Be very careful with the command below. Make sure to use the appropriate device because all data on that device will be lost.

dd if=/dev/zero bs=1M count=1 of=/dev/VolGroup00/LogVol04; sync

Instead of “/dev/VolGroup00/LogVol04″, replace it with your device for the /data parition. Now that the partition is completely erased on both servers, lets create the meta data.

drbdadm create-md r0

Do the following ONLY on nfs1(

Now that the metadata is created, we can move onto assigning a primary node and conducting the initial sync. It is absolutely important that you only execute the following command on the primary node. It doesn’t matter which node you choose to be the primary since they should be identical. In my case, I decided to use nfs1 as the primary.

drbdadm — –overwrite-data-of-peer primary r0

Ok, now we just have to sit back and wait for the initial sync to finish. This is going to take some time to finish even though there is no data on each device, drbd has to sync every single block on /data partition from nfs1 to nfs2. You can check the status by using the following command.

cat /proc/drbd

Do the following on nfs1( and nfs2(

After the initial sync is finished, “cat /proc/drbd” should show something like this.

0: cs:Connected st:Primary/Secondary ld:Consistent
ns:12125 nr:0 dw:0 dr:49035 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured

If you notice, we are still connected and have a primary and secondary node with consistent data.

Do the following ONLY on nfs1( :

Now lets make an ext3 file system on our drbd device and mount it. Since drbd is running, the ext3 file system will also be created on the secondary node.

mkfs.ext3 /dev/drbd0

The above command will create an ext3 file system on the drbd device. Now lets go ahead and mount it.

mount -t ext3 /dev/drbd0 /data

We know that NFS stores important information in /var/lib/nfs by default which is required to function correctly. In order to preserve file locks and other important information, we need to have that data stored on the drbd device so that if the primary node failes, NFS on the secondary node will continue from right where the primary node left off.

mv /var/lib/nfs/ /data/
ln -s /data/nfs/ /var/lib/nfs
mkdir /data/export
umount /data

So lets go over what we just did.

  • We have now moved the nfs folder from /var/lib to /data.
  • We created a symbolic link from /var/lib/nfs to /data/nfs since the operating system is still going to look for /var/lib/nfs when nfs is running.
  • We created an export directory in /data to store all the actual data that we are going to use for our nfs share.
  • Finally, we un-mounted the /data partition since we finished what we were doing.

Do the following ONLY on nfs2(
Since we moved the nfs folder to /data, that was synced over to the secondary node as well. We just need to create the symbolic link so that when the /data partition is mounted on nfs2 we have a link to the nfs data.

rm -rf /var/lib/nfs/
ln -s /data/nfs/ /var/lib/nfs

So we removed the nfs folder and created a symbolic link from /var/lib/nfs to /data/nfs. The symbolic link will be broken since the /data parition is not mounted. Don’t worry about that because in the event of a failover that partiton will be mounted and everything will be fine.

Now we need to configure heartbeat on both nfs servers ns1 and nfs2. we have already installed required softwares.
Create /etc/ha.d/ on both nfs servers with following contents in it

keepalive 2
deadtime 30
bcast eth0
node ukibinfs01 ukibinfs02

Replace names for “node” as per your hostnames. to find out your hostname use uname -n.

Now we need to create “/etc/ha.d/haresources” configuration file on both nfs servers with following configuration

nfs1 IPaddr:: drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 nfslock nfs
IP address used in haresources config file is floating IP. Out of both NFS server which ever is primary, will have that IP configured on eth0:0.

Share this post

Post Comment