Clustered KVM setup for Live Migration with gfs2 on CentOS 6



Let’s start by configuring networking and then install a bunch of packages for clustering and KVM. After that we’ll configure the services and create a gfs2 cluster file system. Finally, we’ll relocate libvirt to the clustered storage so live migration will just work right out of the box.
Ready, set, go.

Networking

If you only have one or two network interfaces, you can keep this part pretty simple by skipping the bonding setup. You should have at least two interfaces so that all internal cluster and KVM migration traffic can stay on its own separate network, but even that isn’t strictly necessary for testing.

If your switch supports LACP, use bonding mode 802.3ad. I found it to be the best for performance and redundancy. If your switch does not have special channel bonding support, consider using adaptive load balancing (balance-alb, mode 6) instead.

For the nitty-gritty bonding details, see http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding

Load the bonding module and specify some options.
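The post’s actual module options aren’t preserved. As a minimal sketch, assuming LACP and a 100 ms MII link check, this hypothetical helper writes a modprobe.d fragment into whatever directory you pass it (normally /etc/modprobe.d), so you can build it once and copy it to each node.

```shell
# Hypothetical helper: write the bonding module config into a target directory
# (normally /etc/modprobe.d). The mode and miimon values are assumptions.
write_bonding_conf() {
  local dir=${1:-/etc/modprobe.d} mode=${2:-802.3ad}
  mkdir -p "$dir"
  # alias loads the bonding driver for bond0; options passes it the settings.
  # Use mode balance-alb instead if the switch has no LACP support.
  printf '%s\n' \
    "alias bond0 bonding" \
    "options bond0 mode=$mode miimon=100" > "$dir/bonding.conf"
}

# Example: write_bonding_conf /etc/modprobe.d
```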

Configure the files in /etc/sysconfig/network-scripts/ to bring up the network at boot time. My configuration uses three interfaces: two of them (eth0 and eth1) are bonded into bond0, which is attached to a bridge (br0) for the public network, and a single interface (eth2) handles private communication between the nodes in the cluster.

That was for node1. For the rest of your nodes, do the same thing but increment the IP addresses. I use addresses ending in 101, 102, and 103 for a three-node cluster.
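The ifcfg files themselves weren’t preserved either. Here is a sketch of a hypothetical generator that writes all five files for a given node number. The subnets (192.168.1.0/24 public with gateway .1, 10.0.0.0/24 private) are assumptions; only the 100 + node numbering comes from the text.

```shell
# Hypothetical generator for node N's ifcfg files. Assumed addressing:
#   public  (br0):  192.168.1.(100+N)/24, gateway 192.168.1.1
#   private (eth2): 10.0.0.(100+N)/24
write_ifcfg() {
  local dir=$1 node=$2 nic
  local last=$((100 + node))
  mkdir -p "$dir"

  # eth0 and eth1: slaves of bond0, no addresses of their own
  for nic in eth0 eth1; do
    printf '%s\n' "DEVICE=$nic" "ONBOOT=yes" "BOOTPROTO=none" \
      "MASTER=bond0" "SLAVE=yes" "NM_CONTROLLED=no" > "$dir/ifcfg-$nic"
  done

  # bond0: enslaved to the bridge, also without an address
  printf '%s\n' "DEVICE=bond0" "ONBOOT=yes" "BOOTPROTO=none" \
    "BRIDGE=br0" "NM_CONTROLLED=no" > "$dir/ifcfg-bond0"

  # br0: the host's public address goes on the bridge, not on bond0
  printf '%s\n' "DEVICE=br0" "TYPE=Bridge" "ONBOOT=yes" "BOOTPROTO=static" \
    "IPADDR=192.168.1.$last" "NETMASK=255.255.255.0" "GATEWAY=192.168.1.1" \
    "DELAY=0" "NM_CONTROLLED=no" > "$dir/ifcfg-br0"

  # eth2: dedicated private network for cluster and migration traffic
  printf '%s\n' "DEVICE=eth2" "ONBOOT=yes" "BOOTPROTO=static" \
    "IPADDR=10.0.0.$last" "NETMASK=255.255.255.0" "NM_CONTROLLED=no" > "$dir/ifcfg-eth2"
}

# Example for node2: write_ifcfg /etc/sysconfig/network-scripts 2
```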

Set up your hosts file so the nodes can talk to each other on your private network. The file is the same on every host, so create it once and copy it to the other nodes. For all configuration files from here on, create them on one node and copy them to the rest.

The second name and IP address for each host is for its dedicated baseboard management controller (BMC). I use the “-ipmi” names for fencing in cluster.conf later.
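A sketch of the hosts entries, assuming the private network is 10.0.0.0/24 with nodes at 100 + n and BMCs at 200 + n; the node and “-ipmi” naming follows the text, the addresses don’t. The node names should resolve to private addresses because, as far as I know, cman uses them to decide which local interface carries cluster traffic. The helper name is hypothetical.

```shell
# Hypothetical helper: print /etc/hosts entries for an N-node cluster.
# Assumed addresses: node i at 10.0.0.(100+i), its BMC at 10.0.0.(200+i).
cluster_hosts() {
  local count=$1 i=1
  while [ "$i" -le "$count" ]; do
    printf '10.0.0.%d\tnode%d\n' $((100 + i)) "$i"
    printf '10.0.0.%d\tnode%d-ipmi\n' $((200 + i)) "$i"
    i=$((i + 1))
  done
}

# Example: cluster_hosts 3 >> /etc/hosts, then copy /etc/hosts to every node.
```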

Install Cluster Packages

This list pulls in all the packages and dependencies needed to get up and running. You can add things like snmp and foghorn later. Skip OpenIPMI and ipmitool if you don’t want to use a BMC, such as HP’s Integrated Lights-Out (iLO) controller, for fencing.
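The original package list didn’t survive, so treat this one as an assumption: the usual CentOS 6 packages for cman/rgmanager, clustered LVM, gfs2, KVM/libvirt, sanlock, and augtool (from augeas), with the IPMI tools optional as the text suggests. The function name and the WITH_IPMI switch are hypothetical.

```shell
# Hypothetical package list for this setup on CentOS 6 (not the author's list).
# Set WITH_IPMI=0 to leave out the BMC tools.
cluster_packages() {
  echo cman rgmanager lvm2-cluster gfs2-utils
  echo qemu-kvm libvirt python-virtinst bridge-utils dnsmasq
  echo sanlock libvirt-lock-sanlock augeas
  if [ "${WITH_IPMI:-1}" = 1 ]; then
    echo OpenIPMI ipmitool
  fi
}

# Example: yum -y install $(cluster_packages)
```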

I avoid SELinux like the plague and I don’t have QLogic hardware, yet packages for both get in my way, so I’ll remove a few of them and be done with it.

Disable SELinux and the firewall. You can add the firewall back later when you’re no longer testing on a private network.
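A sketch of the disable step. The SELinux config path is a parameter so you can try it on a copy first; the service commands go through a RUN prefix that only prints them unless RUN is set to empty. The function name and the RUN convention are illustration devices, not from the original post.

```shell
# Sketch: disable SELinux and the CentOS 6 firewall. The SELinux config is
# edited directly (pass a copy to try it out); the service commands are
# printed rather than run unless RUN is set to empty.
RUN=${RUN-echo}

disable_selinux_and_firewall() {
  local selinux_conf=${1:-/etc/selinux/config} fw
  # Persistent setting, fully effective after the next reboot
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$selinux_conf"
  # Switch to permissive mode for the current boot
  $RUN setenforce 0
  for fw in iptables ip6tables; do
    $RUN service "$fw" stop
    $RUN chkconfig "$fw" off
  done
}

# Example: RUN= disable_selinux_and_firewall
```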

Sanlock and watchdog

When running augtool, give each host a unique sanlock host_id between 1 and 2000. See http://libvirt.org/locking.html for more information. I let sanlock’s dependency (the wdmd watchdog daemon) start on its own via chkconfig, but I don’t want sanlock itself starting independently, so I put it in the cluster stack instead.
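A sketch of the augtool calls, following the pattern in http://libvirt.org/locking.html, wrapped with the 1 to 2000 range check. disk_lease_dir is set to libvirt’s default, /var/lib/libvirt/sanlock, which is an assumption here; whatever directory you use has to be on storage all nodes share. The configure_sanlock name and the RUN dry-run prefix are hypothetical.

```shell
# Sketch: point libvirt's QEMU driver at sanlock and give this host its ID.
# Commands are printed rather than run unless RUN is set to empty.
RUN=${RUN-echo}

configure_sanlock() {
  local host_id=$1
  case $host_id in
    ''|*[!0-9]*) echo "host_id must be a number" >&2; return 1 ;;
  esac
  if [ "$host_id" -lt 1 ] || [ "$host_id" -gt 2000 ]; then
    echo "host_id must be between 1 and 2000" >&2
    return 1
  fi
  $RUN augtool -s set /files/etc/libvirt/qemu.conf/lock_manager sanlock
  $RUN augtool -s set /files/etc/libvirt/qemu-sanlock.conf/host_id "$host_id"
  # Let libvirt create a lease for every disk automatically
  $RUN augtool -s set /files/etc/libvirt/qemu-sanlock.conf/auto_disk_leases 1
  $RUN augtool -s set /files/etc/libvirt/qemu-sanlock.conf/disk_lease_dir /var/lib/libvirt/sanlock
}

# Example on node2: RUN= configure_sanlock 2
```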

The softdog module must be loaded at boot for sanlock to work (it supplies the /dev/watchdog device that wdmd uses). Make a script and put it in /etc/sysconfig/modules/; CentOS 6 runs every executable *.modules file in that directory at boot.

If you’re using an IPMI interface for fencing, the IPMI kernel modules must also load for ipmitool to work. I needed them to configure the BMC’s network interface and for general probing from the host.
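On CentOS 6, rc.sysinit runs each executable *.modules file in /etc/sysconfig/modules/ at boot. A minimal sketch of a hypothetical helper that writes one such script; the IPMI module names are the standard in-kernel ones and may not be exactly what the author loaded.

```shell
# Hypothetical helper: write <dir>/<name>.modules that loads each module given.
write_modules_script() {
  local dir=$1 name=$2 mod
  shift 2
  mkdir -p "$dir"
  {
    echo '#!/bin/sh'
    for mod in "$@"; do
      echo "/sbin/modprobe $mod >/dev/null 2>&1"
    done
  } > "$dir/$name.modules"
  # rc.sysinit skips scripts that are not executable
  chmod 755 "$dir/$name.modules"
}

# Examples:
#   write_modules_script /etc/sysconfig/modules softdog softdog
#   write_modules_script /etc/sysconfig/modules ipmi ipmi_msghandler ipmi_devintf ipmi_si
```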

Set up IPMI with a user and password for cluster fencing, and use ipmitool to give the BMC an IP address. I don’t plan on touching the BMC from the public side at all, so I put it on my private network.

For more details on ipmitool, see the project’s home page at http://ipmitool.sourceforge.net/
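A sketch of the BMC setup. Channel 1 and user ID 2 are common defaults but vary by vendor (ipmitool lan print and ipmitool user list will show yours), and the address, user, and password are placeholders. The configure_bmc name is hypothetical; commands are printed rather than run unless RUN is set to empty.

```shell
# Sketch of BMC network and user setup with ipmitool.
RUN=${RUN-echo}

configure_bmc() {
  local ip=$1 user=$2 pass=$3
  local ch=${IPMI_CHANNEL:-1} uid=${IPMI_UID:-2}
  # Static address on the private network
  $RUN ipmitool lan set "$ch" ipsrc static
  $RUN ipmitool lan set "$ch" ipaddr "$ip"
  $RUN ipmitool lan set "$ch" netmask 255.255.255.0
  # Login that the fence agent will use
  $RUN ipmitool user set name "$uid" "$user"
  $RUN ipmitool user set password "$uid" "$pass"
  # privilege=4 is ADMINISTRATOR
  $RUN ipmitool channel setaccess "$ch" "$uid" link=on ipmi=on callin=on privilege=4
  $RUN ipmitool user enable "$uid"
}

# Example for node1 (placeholder values): RUN= configure_bmc 10.0.0.201 fence secret
```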

Turn on the services that should start at boot to bring up the cluster.

Use chkconfig to turn off any service the cluster will start itself. It’s also important to include services that keep files open on your clustered storage. If you don’t let the cluster handle starting and stopping dnsmasq, for example, the storage won’t be able to unmount while that service is still running.
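A sketch of the on/off split, assuming the cluster manages clvmd, gfs2, sanlock, dnsmasq, and libvirtd (the services this post checks later) while init starts cman, rgmanager, and sanlock’s wdmd dependency. Adjust the lists to your own stack. The function name is hypothetical; commands are printed unless RUN is set to empty.

```shell
# Sketch: which services init starts and which are left to the cluster.
RUN=${RUN-echo}

set_boot_services() {
  local svc
  # Started by init at boot
  for svc in cman rgmanager wdmd; do
    $RUN chkconfig "$svc" on
  done
  # Started and stopped by the cluster, so init must leave them alone
  for svc in clvmd gfs2 sanlock dnsmasq libvirtd; do
    $RUN chkconfig "$svc" off
  done
}

# Example: RUN= set_boot_services
```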

Reconfigure LVM for clustering. You could create the file system with local locking and change it later, but why? I do this before creating the gfs2 volumes so I know they’re configured correctly from the beginning. Change locking_type to 3 for built-in clustered locking. I also disable fallback to local locking to avoid any kind of split-brain problem with two hosts writing independently and screwing up the gfs2 volume. If a host can’t play nice with the others, it’s safer not to let the storage mount at all.
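A sketch of the two lvm.conf edits, applied to whatever file you pass and keeping a .bak copy. The helper name is hypothetical. lvmconf --enable-cluster (from lvm2-cluster) is the packaged way to set locking_type = 3, so check fallback_to_local_locking afterwards whichever route you take.

```shell
# Sketch: clustered locking on, fallback to local locking off.
set_cluster_locking() {
  local conf=${1:-/etc/lvm/lvm.conf}
  cp -p "$conf" "$conf.bak"
  # \1 keeps "locking_type = " with its indentation; the digit after it
  # (3 or 0) is the new value. Commented-out lines are left untouched.
  sed -i \
    -e 's/^\([[:space:]]*locking_type[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\13/' \
    -e 's/^\([[:space:]]*fallback_to_local_locking[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\10/' \
    "$conf"
}

# Example: set_cluster_locking /etc/lvm/lvm.conf
```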

Cluster Config

cluster.conf (/etc/cluster/cluster.conf) controls how your cluster stack loads, unloads, fences, and so on. Each time you change cluster.conf, you must increment config_version for the change to take effect.
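The post’s cluster.conf wasn’t preserved, so here is a hypothetical three-node version built with a small generator. Assumptions: a cluster named kvmcluster, nodes node1 to node3, fence_ipmilan pointed at the “-ipmi” names with placeholder credentials, and one restricted service per node that nests clvmd, gfs2, sanlock, dnsmasq, and libvirtd. rgmanager starts a parent resource before its children and stops them in reverse, which gives the ordering described later. bump_config_version covers the increment rule.

```shell
# Hypothetical cluster.conf generator. Names and credentials are placeholders.
write_cluster_conf() {
  local out=$1 nodes=$2 name=${3:-kvmcluster} i svc
  {
    echo '<?xml version="1.0"?>'
    echo "<cluster config_version=\"1\" name=\"$name\">"
    echo '  <clusternodes>'
    for i in $(seq 1 "$nodes"); do
      echo "    <clusternode name=\"node$i\" nodeid=\"$i\">"
      echo "      <fence><method name=\"ipmi\"><device name=\"ipmi-node$i\"/></method></fence>"
      echo '    </clusternode>'
    done
    echo '  </clusternodes>'
    echo '  <fencedevices>'
    for i in $(seq 1 "$nodes"); do
      echo "    <fencedevice agent=\"fence_ipmilan\" name=\"ipmi-node$i\" ipaddr=\"node$i-ipmi\" login=\"fence\" passwd=\"secret\" lanplus=\"1\"/>"
    done
    echo '  </fencedevices>'
    echo '  <rm>'
    echo '    <failoverdomains>'
    for i in $(seq 1 "$nodes"); do
      echo "      <failoverdomain name=\"only-node$i\" restricted=\"1\"><failoverdomainnode name=\"node$i\"/></failoverdomain>"
    done
    echo '    </failoverdomains>'
    echo '    <resources>'
    for svc in clvmd gfs2 sanlock dnsmasq libvirtd; do
      echo "      <script file=\"/etc/init.d/$svc\" name=\"$svc\"/>"
    done
    echo '    </resources>'
    # One storage service per node; nesting sets the start order (parent first)
    for i in $(seq 1 "$nodes"); do
      echo "    <service name=\"storage-node$i\" domain=\"only-node$i\" autostart=\"1\" recovery=\"restart\">"
      echo '      <script ref="clvmd"><script ref="gfs2"><script ref="sanlock">'
      echo '        <script ref="dnsmasq"><script ref="libvirtd"/></script>'
      echo '      </script></script></script>'
      echo '    </service>'
    done
    echo '  </rm>'
    echo '</cluster>'
  } > "$out"
}

# Increment config_version in place, as required after every edit.
bump_config_version() {
  local conf=$1 v
  v=$(sed -n 's/.*config_version="\([0-9]*\)".*/\1/p' "$conf" | head -n 1)
  sed -i "s/config_version=\"$v\"/config_version=\"$((v + 1))\"/" "$conf"
}

# Example: write_cluster_conf /etc/cluster/cluster.conf 3, then run ccs_config_validate
```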

Check the config for errors with ccs_config_validate.

And just for future reference: to list the raw values currently in effect, much like sysctl -a does for kernel settings, run corosync-objctl. Since your cluster isn’t running yet, ignore this for now.

Cluster Storage

Create the clustered storage volumes with clvmd running so you know they’ll work properly. Start cman and clvmd manually for now.

Create a physical volume, a volume group, then a logical volume, and finally the gfs2 file system in that order.
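A sketch of the storage build. The device (/dev/sdb), the vg_cluster/lv_vol1 names, and the cluster name kvmcluster are assumptions; the cluster name given to mkfs.gfs2 -t must match cluster.conf, and you need one journal per node that mounts the file system. Start cman and clvmd on every node, but run the LVM and mkfs steps on one node only. Function names are hypothetical; commands are printed unless RUN is set to empty.

```shell
# Sketch of the clustered storage build.
RUN=${RUN-echo}

# On every node: bring up membership and clustered LVM locking
start_cluster_locking() {
  $RUN service cman start
  $RUN service clvmd start
}

# On ONE node only: PV, then clustered VG, then LV, then gfs2
create_cluster_storage() {
  local dev=$1 journals=$2 cluster=${3:-kvmcluster}
  $RUN pvcreate "$dev"
  # -cy marks the volume group as clustered, so clvmd coordinates changes
  $RUN vgcreate -cy vg_cluster "$dev"
  $RUN lvcreate -l 100%FREE -n lv_vol1 vg_cluster
  # -p lock_dlm: cluster-wide locking; -t <cluster>:<fs name> must use the
  # cluster.conf name; -j: one journal per node that will mount it
  $RUN mkfs.gfs2 -p lock_dlm -t "$cluster:vol1" -j "$journals" /dev/vg_cluster/lv_vol1
}

# Example for three nodes: RUN= create_cluster_storage /dev/sdb 3
```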

# mkdir /vol1

Add entries to /etc/fstab so gfs2 will mount the volume when the cluster tells it to.
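A sketch of a hypothetical helper that appends the fstab line only once. The device path /dev/vg_cluster/lv_vol1 is an assumption, noatime is a common gfs2 choice rather than something from the post, and the final 0 keeps boot-time fsck away from the shared volume. On CentOS 6 the gfs2 init script mounts the gfs2 entries in /etc/fstab, which is how the cluster can mount the volume by starting that script.

```shell
# Hypothetical helper: add the gfs2 mount to fstab if /vol1 isn't there yet.
add_gfs2_fstab() {
  local fstab=${1:-/etc/fstab}
  local entry='/dev/vg_cluster/lv_vol1  /vol1  gfs2  defaults,noatime  0 0'
  if ! grep -qs '[[:space:]]/vol1[[:space:]]' "$fstab"; then
    echo "$entry" >> "$fstab"
  fi
}

# Example: add_gfs2_fstab /etc/fstab (on every node)
```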

You should be able to mount it manually now or let the cluster start it after rebooting.

With the storage mounted, move the libvirt directory onto the cluster storage from one node and create a symlink to it in the original location. You must do this for live migration to work. On the rest of the nodes, delete the local directory and just create the link. The other option is to change the location where libvirt keeps all its files.

From host1:
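The post doesn’t say which directory it moves. A minimal sketch, assuming /var/lib/libvirt (where libvirt keeps disk images and, by default, its sanlock leases) and the /vol1 mount point; the helper name is hypothetical. Stop libvirtd before running it.

```shell
# Sketch for the first node: move the directory onto shared storage and
# leave a symlink behind. Run "service libvirtd stop" first.
relocate_to_shared() {
  local src=${1:-/var/lib/libvirt} shared=${2:-/vol1}
  if [ -L "$src" ]; then
    echo "$src is already a symlink" >&2
    return 1
  fi
  mv "$src" "$shared/" && ln -s "$shared/$(basename "$src")" "$src"
}

# Example: relocate_to_shared /var/lib/libvirt /vol1
```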

From host2 and on:
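A sketch for the remaining nodes, assuming the directory is /var/lib/libvirt and the shared volume is mounted on /vol1. It refuses to delete anything unless the shared copy is visible, then replaces the local directory with a link. The helper name is hypothetical; stop libvirtd first.

```shell
# Sketch for every other node: drop the local copy and link to the shared one.
link_to_shared() {
  local src=${1:-/var/lib/libvirt} shared=${2:-/vol1}
  local target
  target="$shared/$(basename "$src")"
  # Guard: an unmounted /vol1 would otherwise leave a dangling link
  if [ ! -d "$target" ]; then
    echo "$target not found; is the gfs2 volume mounted?" >&2
    return 1
  fi
  rm -rf "$src"
  ln -s "$target" "$src"
}

# Example: link_to_shared /var/lib/libvirt /vol1
```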

All done!

Reboot all your nodes and check to make sure they come up. Use clustat to verify that services are running. If you see your cluster storage services listed with the state shown as “started”, then everything worked. If not, check each service individually to figure out where the problem is. Make sure the storage is mounted, then check libvirt, dnsmasq, sanlock, gfs2, and then clvmd.
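A sketch of those manual checks, in the order given above, for use after clustat. The check_stack name is hypothetical, and STATUS_CMD and MOUNTS_FILE are hooks that exist only so the function can be tried without a cluster; they default to service and /proc/mounts.

```shell
# Sketch: report each layer of the storage stack, top to bottom.
check_stack() {
  local status_cmd=${STATUS_CMD:-service} svc rc=0
  if grep -qs ' /vol1 gfs2 ' "${MOUNTS_FILE:-/proc/mounts}"; then
    echo "OK   /vol1 mounted"
  else
    echo "FAIL /vol1 not mounted"; rc=1
  fi
  for svc in libvirtd dnsmasq sanlock gfs2 clvmd; do
    if $status_cmd "$svc" status >/dev/null 2>&1; then
      echo "OK   $svc"
    else
      echo "FAIL $svc"; rc=1
    fi
  done
  return $rc
}

# Example: clustat; check_stack
```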

If you find that services are missing or hanging, use rg_test to do a dry run and make sure all of the individual components of your service load, and in the proper order. You’re not going to get gfs2 mounted if clvmd isn’t started first. Likewise, in reverse for shutdown, the cluster isn’t going to unmount the storage if libvirtd doesn’t stop first.
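rg_test ships with rgmanager: rg_test test checks the configuration, and rg_test noop prints what a service would start or stop, in order, without doing it. A sketch wrapping those calls; the preview_service name and the service name in the example are placeholders, and the commands are printed unless RUN is set to empty.

```shell
# Sketch: check cluster.conf, then preview a service's start and stop order.
RUN=${RUN-echo}

preview_service() {
  local svc=$1 conf=${2:-/etc/cluster/cluster.conf}
  $RUN rg_test test "$conf"
  $RUN rg_test noop "$conf" start service "$svc"
  $RUN rg_test noop "$conf" stop service "$svc"
}

# Example: RUN= preview_service storage-node1
```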

If everything looks good so far but the cluster won’t unmount cleanly, try running lsof and grepping for anything in /vol1. If a file is open and you’re waiting for a gfs2 unmount, your cluster is gonna have a bad time.
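A small hypothetical filter for that lsof check: it keeps lines whose path is /vol1 itself or something under it, without matching lookalikes such as /vol10.

```shell
# Hypothetical filter: read lsof output on stdin, keep entries under a mount.
files_open_under() {
  local mnt=${1:-/vol1}
  # Match the mount point followed by "/", whitespace, or end of line
  grep -E "[[:space:]]$mnt(/|[[:space:]]|\$)"
}

# Usage: lsof -n 2>/dev/null | files_open_under /vol1
# fuser -vm /vol1 gives a similar per-process list.
```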


