
Building Elastix-1.3 High Availability Clusters with Redfone foneBRIDGE2, DRBD and Heartbeat

Disclaimer

DRBD and Heartbeat are not programs maintained or supported by Redfone Communications LLC. Do not contact Redfone for support on these programs.

Use the information in this document at your own risk. Redfone disavows any potential liability for the contents of this document. Use of the concepts, examples, and/or other content of this document is entirely at your own risk.

All copyrights are owned by their respective owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark.

You are strongly advised to take a backup of your system before any major installation, and to make backups at regular intervals.

Credits

Special thanks to Telesoft Integrando Technologies, whose earlier documentation on DRBD with Elastix was a great reference for this work:

http://asterisk.aplitel.info/files/asteriskcluster.pdf


Operational Overview

What is DRBD?

DRBD® refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via an assigned network. DRBD can be thought of as network-based RAID-1.

In the illustration above, the two orange boxes represent two servers that form an HA cluster. The boxes contain the usual components of a Linux™ kernel: file system, buffer cache, disk scheduler, disk drivers, TCP/IP stack and network interface card (NIC) driver. The black arrows illustrate the flow of data between these components.

The orange arrows show the flow of data as DRBD mirrors the data of a highly available service from the active node of the HA cluster to the standby node of the HA cluster. (Source: www.drbd.org)

In our implementation we will be creating a DRBD-synchronized partition on /dev/sda3 called "replica". This partition will contain only those directories and files we want synchronized between our primary and secondary servers, namely the important Asterisk and Elastix related directories and files.


What is Heartbeat?

The upper part of this picture shows a cluster where the left node is currently active, i.e., the service's IP address is currently on the left node, and the client machines are talking to the service via that IP address on the active (left) node.

The service, including its IP address, can be migrated to the other node at any time, either due to a failure of the active node or as an administrative action. The lower part of the illustration shows a degraded cluster. In HA speak the migration of a service is called failover, the reverse process is called failback, and when the migration is triggered by an administrator it is called switchover. (Source: www.drbd.org)

In our implementation we will utilize Heartbeat to monitor the state of the two servers and, during a failover, mount our synchronized partition on the secondary server and start up the following resources/applications: fonulator, asterisk, mysqld and httpd. The fonulator utility will re-configure the foneBRIDGE to reroute T1/E1 traffic to the secondary server as specified in /etc/redfone.conf, while asterisk, mysqld and httpd start up all aspects of Elastix on the secondary. During failover our floating IP address will move from the primary to the secondary server. SIP and other VoIP endpoints should be registered against this IP address.


Equipment Overview

This installation scenario assumes two servers, each with three Ethernet interfaces and a single SATA hard drive. You may have a different type of hard drive (IDE, SCSI, etc.), and therefore some of these steps may need to be modified to better reflect your environment. A foneBRIDGE2 (providing T1/E1 connectivity) is interconnected with the two servers via a dedicated Ethernet switch, or on a shared managed switch with an isolated VLAN for foneBRIDGE-to-Asterisk traffic. It is also possible to implement this with only two Ethernet interfaces; under those circumstances it would be advisable to use one interface for the foneBRIDGE and the other for IP, Heartbeat and DRBD traffic.

Network Diagram


DRBD Install and Configuration

The following steps are to be performed on both the primary and secondary servers:

1. Boot Elastix-1.3 Install CD

2. From the boot menu, type 'advanced' and press Enter.

3. During the install routine, choose to manually partition the hard drive. The following is based on a 160 GB SATA drive:

• Create root (/), ext3 partition with 6144MB (sda1)

• Create swap partition with 3072MB (sda2)

4. The remainder of the install routine is standard.

5. After installation and booting, perform an upgrade:

• yum -y update

6. Edit /boot/grub/menu.lst to boot the non-Xen kernel, unless you need the Xen kernel.

• default=1

7. Create the partition that will contain the replicated data (an illustrative fdisk session follows this list):

• fdisk /dev/sda

• Add a new partition (n)

• Primary (p)

• Partition number (3)

• Press enter until returned to fdisk command prompt

• NOTE: if your servers have two different sized hard drives it is imperative that the third partition is identical in size on both, or they will never synchronize over DRBD. Do this by accepting the default first cylinder and then specifying the last cylinder with the +sizeM option, e.g. +6048M. Make these same specifications on both servers.

• Press “t” to change the partition system ID

• Press “3” to choose partition number

• Choose HEX 83 for type

• Press “w” to save changes

• RESTART SERVER
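
For reference, an interactive fdisk session for this step might look roughly like the following. This is only an illustrative sketch: the cylinder numbers shown are placeholders and will differ on your drive, and the +6048M size is only needed when the two drives differ in size (otherwise accept the defaults as described above).

# fdisk /dev/sda
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (1217-19457, default 1217): <Enter>
Last cylinder or +size or +sizeM or +sizeK (default 19457): +6048M
Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): 83
Command (m for help): w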

8. Format the newly created partition:

• mke2fs -j /dev/sda3

9. Now we delete the file system from the partition we just created:

• dd if=/dev/zero bs=1M count=1 of=/dev/sda3; sync

10. Install DRBD, Heartbeat and their dependencies with yum (or use the install_utilities script). A single-command equivalent is shown after this list.

• yum install drbd

• yum install kmod-drbd82

• yum install OpenIPMI-libs

• yum install heartbeat-pils

• yum install openhpi


• yum install heartbeat

• yum install heartbeat-stonith
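
If you prefer, the same set of packages can be installed in a single command (equivalent to the list above):

• yum -y install drbd kmod-drbd82 OpenIPMI-libs heartbeat-pils openhpi heartbeat heartbeat-stonith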

11. To ensure proper host name to IP resolution it is recommended that you manually update the /etc/hosts file to reflect the proper host-to-IP mapping:

10.1.1.1 primary.yourdomain.com
10.1.1.2 secondary.yourdomain.com
127.0.0.1 primary.yourdomain.com
::1 localhost6.localdomain6 localhost6
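
As a quick, optional sanity check that the names resolve and that the servers can reach each other on this network:

• ping -c 1 secondary.yourdomain.com (from the primary, and vice versa)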

12. DRBD allows us to write to the declared partition, in our case /dev/sda3, on both members of the cluster. To make this happen we need to create a virtual partition called /dev/drbd0.

13. Edit /etc/drbd.conf on both the primary and secondary servers. The drbd.conf file must be identical on both servers. Modify this sample to meet your particular needs:

resource "r0" { protocol A; disk { on-io-error pass_on; } startup { wfc-timeout 5; degr-wfc-timeout 3; } syncer { rate 100M; } on primary.yourdomain.com { device /dev/drbd0; disk /dev/sda3; address 10.1.1.1:7789; meta-disk internal; } on secondary.yourdomain.com { device /dev/drbd0; disk /dev/sda3; address 10.1.1.2:7789; meta-disk internal; } }

14. Before continuing, change the name of the server in the Elastix web interface.

15. Now we create the virtual partition /dev/drbd0 on both servers:

• drbdadm create-md r0

16. Start drbd service on both servers to begin synchronization process.

• service drbd start

17. Verify sync process with the following command

• service drbd status


18. Initially, both servers will be 'Secondary'. We need to assign which one is the 'Primary'. Enter the following command on the primary server only:

• drbdsetup /dev/drbd0 primary -o

19. Check the drbd status again with 'service drbd status'. You should obtain a status similar to the following:

# service drbd status
drbd driver loaded OK; device status:
version: 8.0.3 (api:86/proto:86)
SVN Revision: 2881 build by buildsvn@c5-i386-build, 2007-05-13 08:22:43
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate A r---
    ns:464976 nr:0 dw:0 dr:464976 al:0 bm:170 lo:0 pe:0 ua:0 ap:0
    resync: used:0/31 hits:28977 misses:85 starving:0 dirty:0 changed:85
    act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0

20. We can determine the role of a server by executing the following:

• drbdadm state r0

• The primary server should return:

# drbdadm state r0
Primary/Secondary

21. Primary Server Only: Now we can mount the virtual partition /dev/drbd0, but first we must format the partition with ext3 using the following commands:

• mke2fs -j /dev/drbd0

• mkdir /replica

• mount /dev/drbd0 /replica

22. Primary Server Only: Now we will copy all of the directories we want synchronized between the two servers to our new partition, remove the original directories, and then create symbolic links to replace them. The directories of interest are:

• /etc/asterisk

• /var/lib/asterisk

• /usr/lib/asterisk

• /var/spool/asterisk

• /var/lib/mysql

• /var/log/asterisk

• /var/www

23. Copy, remove and link (or use the create_directories script, which automates this step). A compact loop equivalent is sketched after this list.

• cd /replica

• tar -zcvf etc-asterisk.tgz /etc/asterisk

• tar -zxvf etc-asterisk.tgz


• tar -zcvf var-lib-asterisk.tgz /var/lib/asterisk

• tar -zxvf var-lib-asterisk.tgz

• tar -zcvf usr-lib-asterisk.tgz /usr/lib/asterisk/

• tar -zcvf var-www.tgz /var/www/

• tar -zxvf usr-lib-asterisk.tgz

• tar -zcvf var-spool-asterisk.tgz /var/spool/asterisk/

• tar -zxvf var-spool-asterisk.tgz

• tar -zcvf var-lib-mysql.tgz /var/lib/mysql/

• tar -zxvf var-lib-mysql.tgz

• tar -zcvf var-log-asterisk.tgz /var/log/asterisk/

• tar -zxvf var-log-asterisk.tgz

• tar -zxvf var-www.tgz

• rm -rf /etc/asterisk

• rm -rf /var/lib/asterisk

• rm -rf /usr/lib/asterisk/

• rm -rf /var/spool/asterisk

• rm -rf /var/lib/mysql/

• rm -rf /var/log/asterisk/

• rm -rf /var/www/

• ln -s /replica/etc/asterisk/ /etc/asterisk

• ln -s /replica/var/lib/asterisk/ /var/lib/asterisk

• ln -s /replica/usr/lib/asterisk/ /usr/lib/asterisk

• ln -s /replica/var/spool/asterisk/ /var/spool/asterisk

• ln -s /replica/var/lib/mysql/ /var/lib/mysql

• ln -s /replica/var/log/asterisk/ /var/log/asterisk

• ln -s /replica/var/www /var/www
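
The same copy/remove/link sequence can also be written as a short shell loop. This is only a sketch of the idea (assuming the directory list above and that /dev/drbd0 is already mounted on /replica), not part of the original procedure:

for d in etc/asterisk var/lib/asterisk usr/lib/asterisk var/spool/asterisk \
         var/lib/mysql var/log/asterisk var/www; do
    tar -C / -cf - "$d" | tar -C /replica -xf -   # copy, preserving the path under /replica
    rm -rf "/$d"                                  # remove the original directory
    ln -s "/replica/$d" "/$d"                     # link it back into place
done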

24. Restart mysql server

• service mysqld restart

Heartbeat Configuration

1. Remember to disable any boot-up services that should be controlled by heartbeat. These services will be controlled by heartbeat on the server that is in control (a quick check follows this list).

• chkconfig asterisk off

• chkconfig mysqld off

• chkconfig httpd off
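
As a quick, optional check that none of these services will start on their own at boot:

• chkconfig --list | egrep 'asterisk|mysqld|httpd'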


2. Edit the /etc/ha.d/ha.cf file. This is a basic config file that provides all of our required functionality:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 20
warntime 10
initdead 40
udpport 694
bcast eth1
auto_failback off
node primary.yourdomain.com
node secondary.yourdomain.com

3. Edit /etc/ha.d/haresources. This is where we specify which server is our primary, the name of the drbd resource we will mount and where, our floating IP address (10.1.1.3 in this example, on eth1 with a 24-bit subnet mask) and the init.d scripts that will be started during a failover or switchover (fonulator, asterisk, mysqld, httpd):

primary.yourdomain.com drbddisk::r0 Filesystem::/dev/drbd0::/replica::ext3 IPaddr::10.1.1.3/24/eth1/10.1.1.255 fonulator asterisk mysqld httpd

4. Edit /etc/ha.d/authkeys. This provides a level of security/authentication between the nodes:

auth 1
1 sha1 B!gb@dUndUglIp@$$werd

5. Change permissions on the /etc/ha.d/authkeys file

• chmod 600 /etc/ha.d/authkeys

6. Configure heartbeat to initialize at boot

• chkconfig --add heartbeat

7. Restart drbd on both servers

• service drbd restart

8. Verify that both servers are currently in secondary state

• drbdadm state r0

• Result should be ‘Secondary/Secondary’ on both

9. Restart heartbeat

• service heartbeat restart


10. Wait a few moments and re-check the state of the servers.

• drbdadm state r0

• Result this time should be ‘Primary/Secondary’ on the primary server

11. Execute 'df -h' on the primary to confirm that our /dev/drbd0 partition is mounted and in use.

• # df -h

Filesystem Size Used Avail Use% Mounted on

/dev/sda1 5.9G 1.9G 3.8G 33% /

tmpfs 1.5G 0 1.5G 0% /dev/shm

/dev/drbd0 138G 305M 131G 1% /replica

12. You can also monitor the synchronization process with the following command

• watch -n 1 cat /proc/drbd
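
Before putting the cluster into production it is worth testing a manual switchover. One simple approach (a suggestion, not part of the original procedure) is to stop heartbeat on the primary and watch the resources move:

• service heartbeat stop (on the primary; the secondary should take over)

• drbdadm state r0 (on the secondary; it should now report Primary/Secondary)

• df -h (on the secondary; /dev/drbd0 should be mounted on /replica)

• service heartbeat start (on the primary; with auto_failback off the resources remain on the secondary)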

Install and configure fonebridge Support

1. Install the fonulator utility (or use the install_utilities script):

• rpm -ivh http://support.red-fone.com/downloads/fonulator/fonulator-2.0.0-36.i386.rpm

2. Download the sample redfone.conf file:

• cd /etc/

• wget http://support.red-fone.com/downloads/fonulator/redfone.conf

3. Edit redfone.conf to match your requirements

• Refer to fonebridge general install guide for details and options

4. Download the fonulator init.d script:

• cd /etc/init.d

• wget http://support.red-fone.com/downloads/fonulator/old/fonulator_initd_script

• mv fonulator_initd_script fonulator

• chmod +x fonulator

5. Reboot both servers and confirm functionality. Under normal operation your primary server should boot, mount the drbd partition and start up the init scripts specified in /etc/ha.d/haresources.


6. Verify that the drbd partition is mounted by executing 'df -h'. You should see something similar to the following.

# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/sda1 5.9G 1.8G 3.8G 32% /

tmpfs 125M 0 125M 0% /dev/shm

/dev/drbd0 5.7G 409M 5.0G 8% /replica

Troubleshooting

1. Split brain, or nodes running in "StandAlone" mode, e.g.:

cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown

• drbdadm -- --discard-my-data connect all (run on the node whose data is to be discarded; a fuller recovery sequence is sketched below)
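
A slightly more explicit recovery sequence (a sketch based on the DRBD 8.x manual split-brain procedure; substitute your resource name if it is not r0):

• drbdadm secondary r0 (on the node whose changes you are willing to discard)

• drbdadm -- --discard-my-data connect r0 (on that same node)

• drbdadm connect r0 (on the surviving node, if it is also in StandAlone)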

DRBD documentation

http://www.drbd.org/docs/about/

Heartbeat documentation

http://www.linux-ha.org/Heartbeat

Centos DRBD/Heartbeat Howtos

http://wiki.centos.org/HowTos/Ha-Drbd

http://www.howtoforge.com/drbd-on-centos-4.5

Last updated

December 23, 2008