Heartbeat

NOTE: We no longer run heartbeat, as it was causing more problems than it was solving. The information on this page will be helpful if you try setting up heartbeat again, although some of the software has changed a lot over the past few years.

Heartbeat is a daemon that handles all the failover in the cluster. For starters, see http://linux-ha.org/. We are running Heartbeat V2.

Basics
Heartbeat works by managing resources on nodes. A node is a machine in the cluster that can run resources. A resource is any type of service that gets moved around. Examples of resources include failover IPs, drbd disks (which node is primary/secondary), filesystems (you use these to mount drbd devices), and services. Resources are usually put into "Resource Groups". All the resources in a resource group run on the same host, and they are started and stopped sequentially.

There are also rules that help specify where services can be run. The most common is a "location" rule.

drbddisk
Parameter: 1, value: 

Filesystem

 * Parameter: device, value: /dev/whatever
 * Parameter: directory, value: /mountpoint
 * Parameter: fstype, value: xfs|reiserfs|ext3|etc
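
As a rough illustration of what these three parameters mean, starting a Filesystem resource amounts to mounting the device on the directory with the given fstype. The sketch below only prints the equivalent mount command and mounts nothing. The device /dev/drbd0 and the mount point /mnt/data are hypothetical values, not taken from our configuration.

```shell
#!/bin/sh
# Print the mount command that corresponds to a Filesystem resource's
# parameters. This only echoes the command; it does not mount anything.
fs_mount_cmd() {
    device=$1      # "device" parameter, e.g. a drbd device
    directory=$2   # "directory" parameter (the mount point)
    fstype=$3      # "fstype" parameter
    printf 'mount -t %s %s %s\n' "$fstype" "$device" "$directory"
}

# Hypothetical example values.
fs_mount_cmd /dev/drbd0 /mnt/data ext3
```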

IPaddr
Parameter: 1, value:

crm_resource
crm_resource is a command that lets you manage resources in the cluster. To use it, you must be a member of the haclient group.

crm_resource -W -r Tells you where the specified resource is running.

crm_resource -M -r [-h host] Migrates the specified resource off of its current host. If -h is specified, it moves it to that host. This adds a location constraint with a score of -INFINITY for the resource and its current host (translation: the resource will never be run on its current host again), so you probably want to run crm_resource -U -r to remove this rule.
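
The migrate-then-release sequence can be sketched as a pair of small shell helpers. This is a minimal sketch, not a tested procedure: the run wrapper, the function names, and the resource name my_resource are all hypothetical. By default it only prints the commands; set DRY_RUN=0 on a real cluster node to execute them.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Show where the resource runs, then migrate it off that host.
migrate_away() {
    run crm_resource -W -r "$1" &&
    run crm_resource -M -r "$1"
}

# Remove the -INFINITY location constraint left behind by the migration.
# Run this only after the resource has finished moving.
release_migration() {
    run crm_resource -U -r "$1"
}

# "my_resource" is a hypothetical resource name.
migrate_away my_resource
release_migration my_resource
```

Keeping the release step separate is deliberate: removing the constraint before the resource has settled on its new host may let the cluster move it straight back.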

crm_resource -r -H -C "Cleans up" a resource. You must use a real resource name, not a resource group.

crm_resource -r -p target_role -v (started|stopped) Sets a resource's target role to either started or stopped.
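
A small wrapper can guard against typos in the role value. This is a sketch under assumptions: the run wrapper, the function name set_target_role, and the resource name my_resource are hypothetical. It prints the command by default; set DRY_RUN=0 to actually run it.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Set a resource's target_role, accepting only "started" or "stopped".
set_target_role() {
    rsc=$1
    role=$2
    case $role in
        started|stopped) ;;
        *) echo "role must be started or stopped" >&2; return 1 ;;
    esac
    run crm_resource -r "$rsc" -p target_role -v "$role"
}

# "my_resource" is a hypothetical resource name.
set_target_role my_resource stopped
```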

hb_gui
hb_gui is a graphical interface to the heartbeat cluster. It's quite nice, and is also very useful for configuring services.

crm_mon
crm_mon is a command-line program that pretty-prints the current cluster status. You may also want to try crm_mon -n to show resources by node, or crm_mon -1 to print the status once and exit (instead of refreshing it every 15 seconds or so).

crm_standby
crm_standby allows you to set/clear the standby status of a machine.

To put a machine into standby:

 * crm_standby -U -v on

To take it out of standby, use either of the following commands:

 * crm_standby -U -D
 * crm_standby -U -v off
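
The standby commands can be wrapped as follows. This sketch assumes that -U takes the node's uname as its argument (the commands on this page do not show one). The run wrapper, the function names, and the node name node1 are hypothetical. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Put a node into standby (assumes -U takes the node's uname).
standby_on() {
    run crm_standby -U "$1" -v on
}

# Take a node out of standby by deleting the standby attribute.
# "crm_standby -U <node> -v off" is the alternative listed on this page.
standby_off() {
    run crm_standby -U "$1" -D
}

# "node1" is a hypothetical node name.
standby_on node1
standby_off node1
```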