This web page is no longer maintained. Information presented here exists only to avoid breaking historical links.
The Project stays maintained, and lives on: see the Linux-HA Reference Documentation.

Last site update:
2014-10-23 05:52:08

Frequently Asked Questions about DRBD

Contents

  1. Frequently Asked Questions about DRBD
  2. General Issues
    1. What is DRBD, to begin with?
    2. Which license conditions apply to DRBD?
    3. Where do I get Support for DRBD
    4. Where can I download DRBD, or get more information?
  3. Setup and Installation
    1. Can I encrypt/compress the exchanged data?
    2. Can I mount the secondary at least readonly?
    3. Why does DRBD not allow concurrent access from all nodes? I'd like to use it with GFS/OCFS2...
    4. Can DRBD use two devices of different size?
    5. Can XFS be used with DRBD?
    6. How do the "local machine" and the "remote machine" need to be connected?
    7. When I try to load the drbd module, I am getting the following error: compiled for kernel version ''some version'' while this kernel is ''some other version''
    8. Can I use DRBD with LVM?
    9. I use DRBD and Linux-vServer, and I cannot umount anymore: "file system in use"
    10. What about Xen, DRBD and iSCSI?
    11. Can I use DRBD with OpenVZ?
  4. Operation Issues
    1. drbdadm create-md fails with "Operation refused." - what can I do?
    2. I get "access beyond end of device" errors, drbd is broken!?
    3. Why is Synchronization (SyncingAll) so slow?
    4. How can I speed up the Synchronization performance?
    5. How can I speed up write throughput?
    6. Why is my "load average" that high?
    7. What is warning: ''Return code 255 from /etc/ha.d/resource.d/datadisk'' telling me when using the datadisk script with heartbeat?
    8. When the node goes from secondary to primary the drbd device will not be mounted on the primary. Manually mounting works.
    9. What is warning: ''out of vmalloc space''
    10. What do the fields like st, ns, nr, dw, dr etc. in /proc/drbd mean?

General Issues

  • Please have a look at some of the publications and documentation.

What is DRBD, to begin with?

  • DRBD, developed by Philipp Reisner and Lars Ellenberg, is a Distributed Replicated Block Device for the Linux operating system. It lets you keep a real-time mirror of your local block devices on a remote machine. In conjunction with Heartbeat, it lets you build HA (high availability) Linux clusters.

Which license conditions apply to DRBD?

  • DRBD is released under the GNU General Public License Version 2, June 1991 (GPL). Thus, within the conditions of this license, it can be freely distributed and modified.

Where do I get Support for DRBD

Where can I download DRBD, or get more information?

Setup and Installation

Can I encrypt/compress the exchanged data?

  • Of course, but not within DRBD itself. You 'just' need to set up a VPN, and the network stack will take care of the rest. For a lightweight solution, have a look at the CIPE project. Of course, IPsec or OpenVPN will do, too.

Can I mount the secondary at least readonly?

  • Short answer: No! But see also the next question and answer.

  • DRBD would not care, but most likely your filesystem will be confused, because it is not aware of changes in the underlying device. In general this means that it cannot work, whether with ext2, ext3, reiserFS, JFS or XFS.

    Thus, if you want to mount the secondary, make it the primary first. Mounting both devices at the same time does not work with these filesystems. DRBD v8 does support two Primaries, though; see the next answer. If you need access to the data from both nodes, and from an arbitrary number of other clients, consider using HaNFS.

Why does DRBD not allow concurrent access from all nodes? I'd like to use it with GFS/OCFS2...

  • Actually, DRBD version 8.0.x and later support this.

    If you need not just a mirrored but a shared filesystem, use OCFS2 or GFS2, for example. But these are much slower, and they typically expect write access on all nodes in question.

    If more than one node concurrently modifies distributed devices, there are some "interesting" problems in deciding which part of the device is up-to-date on which node, and which blocks need to be resynchronized in which direction. These problems have been solved. You need to set net { allow-two-primaries; } to activate this mode. But handling DRBD in "cluster fs mode" is still more complex and cumbersome than "classical" one-node-at-a-time access.

  • Another option would be to have only one node active, export that device via iSCSI, then run OCFS2 on iSCSI.
  • Also have a look at the DRBD Changelog.

Can DRBD use two devices of different size?

  • Generally yes, but there are some issues to consider:

    Locally, DRBD uses the configured disk-size, which has to be <= the physical size; if it is not given, it is set to the physical size. On connect, the device size is set to the minimum of both nodes.

    Here you can run into problems if you do things without common sense: if you first use drbd on one node only, without disk-size configured properly, and later connect a node with a smaller device, then the drbd device shrinks at runtime. In that case you should find a message like Your size hint is bogus, please change to <some value> in the syslog. The shrink will confuse the file system on top of your device. Thus, if your device sizes differ, set the size to be used for DRBD explicitly.

    Note: DRBD-0.7 stores information about the peer's device size in its local meta data; therefore usage of disk-size is deprecated (and is disallowed in the configuration file).

Can XFS be used with DRBD?

  • XFS uses dynamic block size, thus DRBD 0.7 or later is needed.

How do the "local machine" and the "remote machine" need to be connected?

Can I put one machine in Los Angeles and the other machine in New York, connected only by a VPN link over the Internet? Or do they both need to be connected to the same local Ethernet network?

When I try to load the drbd module, I am getting the following error: compiled for kernel version ''some version'' while this kernel is ''some other version''

  • The settings of your running kernel and the .config of the kernel source against which drbd was built do not match. On SuSE Linux you can get the right config with the following commands:
    cd /usr/src/linux/ && make cloneconfig && make dep
    Usually you do not have to recompile your kernel, just drbd. But read INSTALL in the drbd tgz to learn how to do it the proper way.

Can I use DRBD with LVM?

I use DRBD and Linux-vServer, and I cannot umount anymore: "file system in use"

What about Xen, DRBD and iSCSI?

Can I use DRBD with OpenVZ?

Operation Issues

drbdadm create-md fails with "Operation refused." - what can I do?

  • the actual error message looks like
    Found $some filesystem which uses $somuch kB 
    current configuration leaves usable $less kB
    
    Device size would be truncated, which
    would corrupt data and result in
    'access beyond end of device' errors.
    You need to either
      * use external meta data (recommended)
      * shrink that filesystem first
      * zero out the device (destroy the filesystem)
    Operation refused.
  • which means
    • you created your filesystem before you created your DRBD resource, or
    • you created your filesystem on your backing device, rather than your DRBD,
  • neither of which is a problem by itself, except - as the error message tries to hint - you need to enlarge the device (e.g. lvextend), shrink the filesystem (e.g. resize2fs), or place the DRBD metadata somewhere else (external meta data).
  • DRBD tries to detect an existing use of the block device in question. E.g. if it detects an existing file system that uses all the available space (as is the default for most filesystems), and you try to use DRBD with internal meta data, there is no room for the internal meta data: creating it would corrupt the last few MiB of the existing file system.
  • If re-creating the filesystem on the DRBD is an option, one way to "zero out the device (destroy the filesystem)", and then recreate it on the DRBD is
    dd if=/dev/zero bs=1M count=1 of=/dev/sdXYZ; sync 
    drbdadm create-md $r
    drbdadm -- -o primary $r
    mkfs /dev/drbdY
  • If drbdadm did not refuse, you would soon be back here reading the next answer.

I get "access beyond end of device" errors, drbd is broken!?

  • Your kernel log fills with
    attempt to access beyond end of device 
    drbd0: rw=1, want=211992584, limit=211986944
    Buffer I/O error on device drbd0, logical block 26499072
    Your file system then remounts read-only, panics or similar. When you try to fsck, you get something like
    The filesystem size (according to the superblock) is ... blocks. 
    The physical size of the device is ...+x blocks.
    Envision this:
    |-- usable area with drbd and internal meta data --|-IMD-| 
    |-- real device -----------------------------------------|
    IMD is "internal meta data". Once created, it has a fixed size. With drbd 0.7 it was a fixed 128MB. With drbd 8.0 it is approximately [total storage of real device]/4/8/512/2 rounded up, +36k, rounded up to the next 4k.
    example:
     grep -e hda4 -e drbd0 /proc/partitions
       3     4  105996744 hda4
     147     0  105993472 drbd0
     ceil(105996744 kB / 32768) == 3235 kB
                                +    36 kB
                                == 3271 kB
                    4k  aligned == 3272 kB
         105996744 kB - 3272 kB == 105993472 kB
  • If you did mkfs /real/device and later mount through DRBD, the file system either recognizes the mismatch between the size in its superblock and the actual block device size on the spot and refuses to mount (xfs does this, iirc).
  • Or the file system mounts all right, because it skips the check for the block device size (ext3, or at least certain versions of it, apparently does this; it is ok for a file system to assume that its superblock contains valid data), and then thinks it can use the space that is no longer available because it is occupied by the IMD.
  • There are various ways to find out what your file system thinks about the usable space it occupies. For ext3, you can find out with e.g.
    tune2fs -l /dev/whatever | 
      awk '/^Block.size:/ { bs=$NF }
           /^Block.count:/ { bc=$NF }
           END { print bc * bs / 1024, "kB" }'
  • As long as the file system does not want to use that area, it won't notice. If the file system eventually decides to use that area, whoops, surprise: it gets an access beyond end of device error. When the file system will start using that area is nearly impossible to predict. So it may appear to work fine for months, and then suddenly break again and again.
  • Note: This is not a problem with drbd. It is a problem with using drbd incorrectly.

  • Also see http://thread.gmane.org/gmane.linux.network.drbd/12690/focus=12692 or search the list archives for more ASCII art and explanations.
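
The DRBD 8.0 size rule quoted above can be sketched in a few lines of Python. This is only an illustration of the "approximately" formula from this answer, not the code drbdadm itself uses; the function names are made up for this example, and all sizes are in kB as in /proc/partitions.

```python
def internal_meta_data_kb(device_kb: int) -> int:
    """Approximate DRBD 8.0 internal meta data size in kB (rule of thumb from this FAQ)."""
    # [total storage]/4/8/512/2 is the same as dividing by 32768; round up.
    bitmap_kb = -(-device_kb // 32768)
    # Add 36 kB, then round up to the next multiple of 4 kB.
    raw_kb = bitmap_kb + 36
    return -(-raw_kb // 4) * 4


def usable_kb(device_kb: int) -> int:
    """Size left for the file system once internal meta data is reserved."""
    return device_kb - internal_meta_data_kb(device_kb)


def filesystem_fits(fs_kb: int, device_kb: int) -> bool:
    """True if a file system of fs_kb stays clear of the internal meta data area."""
    return fs_kb <= usable_kb(device_kb)


if __name__ == "__main__":
    hda4_kb = 105996744  # backing device size from the example above
    print(internal_meta_data_kb(hda4_kb))  # 3272
    print(usable_kb(hda4_kb))              # 105993472, the drbd0 size shown above
```

Feeding the result of the tune2fs/awk snippet above into filesystem_fits() as fs_kb gives a quick check of whether an existing ext3 file system would reach into the meta data area.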

Why is Synchronization (SyncingAll) so slow?

  • Note: Outdated; applies only to drbd versions prior to drbd-0.6.4. For historical reasons, replication used to work backwards. Most physical devices have pretty slow throughput when writing data backwards.

How can I speed up the Synchronization performance?

  • double check the value of sync-max in the net {} section (drbd-0.6), or of rate in the syncer {} section (drbd-0.7). Keep in mind that the default value is very low, and the default unit is kByte/sec!

  • if you run on top of some local RAID, make sure it is not reconstructing at the same time
  • check whether DMA is enabled ;-)
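
To get a feel for what a given rate means in practice, here is a minimal Python sketch that estimates how long a full synchronization takes at a configured rate. The rates used in the example are hypothetical values picked for illustration, not DRBD defaults, and the function name is made up for this example.

```python
def full_sync_seconds(device_kb: int, rate_kb_per_s: int) -> float:
    """Lower bound for a full sync: device size divided by the configured rate.

    Both arguments use kB, matching the default unit (kByte/sec) of the
    sync rate setting. Real syncs can be slower if disk or network is the limit.
    """
    if rate_kb_per_s <= 0:
        raise ValueError("rate must be positive")
    return device_kb / rate_kb_per_s


if __name__ == "__main__":
    device_kb = 105993472  # roughly 100 GB, as in the drbd0 example earlier
    for rate in (250, 10240):  # hypothetical rates in kB/s
        hours = full_sync_seconds(device_kb, rate) / 3600
        print(f"{rate} kB/s -> about {hours:.1f} hours")
```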

How can I speed up write throughput?

  • First you need to find the bottleneck. This can be your local disk, the network, the remote disk, latency caused by excessive seeks, or the summed up latency of those components.

    You may want to play with the values of protocol and sndbuf-size. If your NIC supports it, you may want to enable "jumbo frames" (raise the MTU). If nothing helps, ask the list for known good and well-performing setups...

Why is my "load average" that high?

  • Load average is defined as the average number of processes in the run queue during a given interval. A process is in the run queue if it is
    • not waiting for external events (e.g. select on some fd)
    • not waiting on its own (not called "wait" explicitly)
    • not stopped :)

    Note that all processes waiting for disk io are counted as runnable! Therefore, if a lot of processes wait for disk io, the "load average" goes straight up, though the system may actually be almost idle cpu-wise. E.g. crash your nfs server, and start 100 ls /path/to/non-cached/dir/on/nfs/mount-point on a client: you get a "load average" of 100+ for as long as the nfs timeout, which might be weeks, though the cpu does nothing.

    Verify your system load by other means, e.g. vmstat or sysstat/sar. This will give you an idea of the bottleneck of your system. Some ideas are using multiple disks (not just partitions!), or even a RAID with 10,000 rpm SCSI disks, and probably even Gigabit Ethernet. On a Fast Ethernet device you will rarely see more than 6 MByte per second (100 MBit/s is at most 12.5 MByte/s, minus protocol overhead, latency etc.).

What is warning: ''Return code 255 from /etc/ha.d/resource.d/datadisk'' telling me when using the datadisk script with heartbeat?

  • Note: DRBD-0.6 only.
    Exit code 255 most likely comes from a die generated by the script, which includes a verbose error message. Capture the output of that script; this is the debugfile directive in your ha.cf, iirc. If that does not help, run the script by hand and see what error message it gives. datadisk says something like cannot promote to primary, synchronization running or fsck failed or ...

When the node goes from secondary to primary the drbd device will not be mounted on the primary. Manually mounting works.

  • Note: Feature ...
    DRBD does not automatically mount the partition. The script datadisk (or drbddisk since 0.7) is made for that purpose. It is intended to be called by heartbeat.

What is warning: ''out of vmalloc space''

For each device, drbd will (try to) allocate X MB of bitmap, plus some constant amount (<1MB). X = storage_size_in_GB/32, so 1 TB storage -> 32 MB bitmap.

By default Linux allocates 128MB to vmalloc. For systems using more than 4TB of DRBD storage, this may cause an issue.

If you get the following error message in /var/log/messages, try a Linux 2.6 hugemem kernel.

kernel: allocation failed: out of vmalloc space - use vmalloc=<size> to increase size.
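
The bitmap arithmetic above can be sketched in Python as follows. This is a rough estimate only: it ignores the small constant per device (<1MB) mentioned above, it assumes the 128MB default stated in this answer, and the function names are made up for this example. Other kernel users also draw on vmalloc space, so real systems may run out earlier than this estimate suggests.

```python
def bitmap_mb(storage_gb: float) -> float:
    """Bitmap size in MB for one device: X = storage_size_in_GB / 32."""
    return storage_gb / 32


def total_bitmap_mb(devices_gb) -> float:
    """Sum of the bitmap sizes of all DRBD devices on this node."""
    return sum(bitmap_mb(gb) for gb in devices_gb)


def bitmaps_fit(devices_gb, vmalloc_mb: float = 128) -> bool:
    """True if all bitmaps together stay below the vmalloc space.

    The comparison is strict because the per-device constant and other
    vmalloc users are not counted here.
    """
    return total_bitmap_mb(devices_gb) < vmalloc_mb


if __name__ == "__main__":
    print(bitmap_mb(1024))          # 32.0 MB for 1 TB, as stated above
    print(bitmaps_fit([1024] * 5))  # False: 5 TB needs 160 MB of bitmap
```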

What do the fields like st, ns, nr, dw, dr etc. in /proc/drbd mean?

 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
 ^   ^            ^                    ^                   ^ ^-[*]
 |   |            |                    |                   |
 |   |            |                    |    wire protocol -'
 |   |            |                    `- disk state
 |   |            `- state (should be named role, but historically)
 |   `- connection state
 `- minor number

    ns:67582830 nr:1293290 dw:68880243 dr:124296536
    net sent    net read   disk write  disk read

                              al:13693       bm:101
    meta data updates for     activity log   bitmap

                                          ,--------- lo:0 pe:0 ua:0 ap:0
    gauges of currently pending requests, see below.

     ,- resync: used:0/31 hits:335 misses:109 starving:0 dirty:0 changed:109
     |- act_log: used:0/1801 hits:6527480 misses:13790 starving:0 dirty:97 changed:13693
   cache statistics for the resync and activity log in memory cache.
   you can safely ignore these.

[*]: four characters showing certain flag bits

  • The first character is r or s: io running (resumed) or suspended.
    • See drbdadm suspend-io/resume-io. Also temporarily set implicitly by fencing resource-and-stonith.
  • The next three characters show details of sync-after dependencies. They all say '-' if currently unset. If a resync is running, but you have serialized the resync of your devices (because they share some resources, e.g. live on the same "spindle", or because you want some more important ones to be resynced first), there are several ways in which this resync can be suspended:
    • a: implicitly paused because of sync-after dependency on this node
    • p: implicitly paused because of sync-after dependency on the peer node
    • u: explicitly suspended by the user, see drbdadm pause-sync/resume-sync
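
For scripts that need these fields, the per-device status line can be picked apart with a regular expression. The following Python sketch assumes exactly the DRBD 8 layout shown in the diagram above (with ds:; DRBD-0.7 prints ld: instead and is not covered). The function name and the dictionary keys are made up for this example.

```python
import re

# First line per device, DRBD 8 style:
#  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+st:(?P<st_local>[^/\s]+)/(?P<st_peer>\S+)"
    r"\s+ds:(?P<ds_local>[^/\s]+)/(?P<ds_peer>\S+)"
    r"\s+(?P<protocol>\S)\s+(?P<flags>\S{4})"
)


def parse_status(line: str):
    """Return the fields of a status line as a dict, or None if it does not match."""
    m = STATUS_RE.match(line)
    if m is None:
        return None
    info = m.groupdict()
    info["minor"] = int(info["minor"])
    flags = info["flags"]
    # First flag character: r = io running, s = io suspended.
    info["io_suspended"] = flags[0] == "s"
    # Remaining three characters: a / p / u if the resync is paused, '-' otherwise.
    info["sync_paused_by"] = [c for c in flags[1:] if c in "apu"]
    return info


if __name__ == "__main__":
    line = " 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---"
    print(parse_status(line))
```

Lines that do not follow this layout, such as a device that is not fully configured, simply yield None, so a caller can skip them.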
cs
connection state
  • Unconfigured

    Device waits for configuration.

    StandAlone

    Not trying to connect to peer, IO requests are only passed on locally.

    Unconnected

    Transitory state, while bind() blocks.

    WFConnection

    Device waits for configuration of other side.

    WFReportParams

    Transitory state, while waiting for first packet on a new TCP connection.

    Connected

    Everything is fine.

    Timeout, BrokenPipe, NetworkFailure

    Transitory states when connection was lost.

    DRBD-0.6 specific:

    SyncingAll

    All blocks of the primary node are being copied to the secondary node.

    SyncingQuick

    The secondary is updated, by copying the blocks which were updated since the now secondary node has left the cluster.

    SyncPaused

    Sync of this device has paused while higher priority (lower sync-group value) device is resyncing.

    DRBD-0.7 / DRBD-8 (a trailing S or T indicates this node is SyncSource or SyncTarget, respectively):

    WFBitMap{S,T}

    Transitory state when synchronization starts; "dirty"-bits are exchanged.

    SyncSource

    Synchronization in progress, this node has the good data.

    SyncTarget

    Synchronization in progress, this node has inconsistent data.

    PausedSync{S,T}

    see SyncPaused.

    SkippedSync{S,T}

    you should never see this. "Developers only" ;-)

st:Local/Remote
state, the respective node's role for this device.
  • Primary

    the active node; may access the device.

    Secondary

    the passive node; must not access the device; expects mirrored writes from the other node.

    Unconfigured

    this is not a role, obviously.

ld
local data consistency (DRBD-0.7)
ds
disk state (DRBD 8)
  • Diskless

    No storage attached, or storage had IO errors previously and got detached.

    Attaching

    in the process of attaching the local storage

    Failed

    storage had io errors

    Negotiating

    storage attached, but it is not yet decided whether it is UpToDate

    Inconsistent

    storage is Inconsistent (e.g. half way during bitmap based resync)

    Outdated

    storage is consistent, but not UpToDate

    DUnknown

    (peer's) storage state is not known

    Consistent

    storage is consistent, not yet decided whether it is UpToDate or Outdated

    UpToDate

    storage is good

ns,nr,dw,dr,...
statistics counters, in number of blocks (1KB) or number of requests, respectively
  • ns

    network send

    nr

    network receive

    dw

    disk write

    dr

    disk read

    al

    activity log updates (0.7 and later)

    bm

    bitmap updates (0.7 and later)

    lo

    reference count on local device

    pe

    pending (waiting for ack)

    ua

    unack'd (still need to send ack)

    ap

    application requests expecting io-completion
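
The counters can be collected into a dictionary with a short Python sketch like the one below. It assumes the name:value layout shown in the diagram above; the function names and the SAMPLE string (assembled from the values in that diagram) are only for illustration.

```python
import re

# Counter names as listed above; anything else on the line is ignored.
COUNTER_RE = re.compile(r"\b(ns|nr|dw|dr|al|bm|lo|pe|ua|ap):(\d+)")

SAMPLE = (
    "ns:67582830 nr:1293290 dw:68880243 dr:124296536 "
    "al:13693 bm:101 lo:0 pe:0 ua:0 ap:0"
)


def parse_counters(text: str) -> dict:
    """Map each counter name found in text to its integer value."""
    return {name: int(value) for name, value in COUNTER_RE.findall(text)}


def in_flight(counters: dict) -> int:
    """Sum of the gauges of currently pending requests (lo, pe, ua, ap)."""
    return sum(counters.get(name, 0) for name in ("lo", "pe", "ua", "ap"))


if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    print(counters["ns"], counters["dr"])
    print(in_flight(counters))
```

The resync and act_log cache statistics lines use other field names (used, hits, misses and so on), so they are not picked up by this pattern.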