The cluster service won't start because it's in a failed state. That happens when the service attempts to start but fails too many times in too short a period (based on the restart limits and recovery policy in your cluster.conf) or, more likely, when it fails to shut down cleanly.
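For reference, those restart limits live on the service definition in cluster.conf. Here's a hedged sketch of what that looks like under rgmanager; the service and script names are illustrative, and the max_restarts/restart_expire_time attributes may not exist on older releases:

```xml
<!-- Illustrative cluster.conf fragment; names and values are examples.
     max_restarts: failures tolerated within restart_expire_time (seconds)
     before rgmanager marks the service failed instead of restarting it. -->
<rm>
  <service name="host1service" autostart="1" recovery="restart"
           max_restarts="3" restart_expire_time="300">
    <script ref="ClusterService"/>
  </service>
</rm>
```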
Manually enabling the service fails, right?
# clusvcadm -e host1service
Checking syslog, you will see the service refusing to start and the reason why. In this case it's complaining that it failed to stop cleanly, but the real roadblock is that the service is in a failed state.
# tail /var/log/messages
rgmanager: Service:host1service has failed; can not start.
rgmanager: Service:host1service failed to stop cleanly
Assuming cluster.conf is valid (you ran ccs_config_validate and it told you so), the problem is probably simple: if a service is in a failed state, you can't enable it until you disable it first!
Presumably this behavior exists to stop you from blindly restarting a failed service: it has already tried to start a number of times and couldn't, so what's the point of another try? In my case, though, what's the harm? It has already failed again and again. Why not just retry when I step in and manually ask for the service to start?!
Oh well, it’s just one extra step.
Disable the service, then enable it. If you really have fixed the problem that caused the failure in the first place, the service should start right up.
# clusvcadm -d host1service
# clusvcadm -e host1service
Checking syslog once again, you should see the service disable, re-enable, and fire off the startup script.
Stopping service service:host1service
rgmanager: [script] Executing /etc/init.d/ClusterService stop
rgmanager: [script] Executing /etc/init.d/ClusterService start
rgmanager: Service service:host1service started
Check the cluster status to make sure it started.
# clustat
Cluster Status for hacluster @ Fri Nov  9 02:15:00 2012
Member Status: Quorate

 Member Name            ID   Status
 ------ ----            ---- ------
 host1                  1    Online, Local, rgmanager
 host2                  2    Online, rgmanager
 host3                  3    Online, rgmanager

 Service Name           Owner    State
 ------- ----           -----    -----
 service:host1service   host1    started
If the service is relocatable, now is a good time to test relocation and make sure it works properly and doesn't fail on the other available nodes (this example relocates a dnsmasq service).
# clusvcadm -r dnsmasq -m host2
Trying to relocate service:dnsmasq to host2…Success
service:dnsmasq is now running on host2
If relocation fails on any particular node, fix the problem on that node or remove the node from the service's failover domain in cluster.conf.
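For context, a failover domain in cluster.conf looks roughly like the sketch below (domain and node names are illustrative, not from this cluster). Dropping the failoverdomainnode entry for the misbehaving host keeps the service off that node:

```xml
<!-- Illustrative failover domain fragment.
     restricted="1" confines the service to the listed nodes;
     ordered="1" makes rgmanager prefer lower priority numbers. -->
<rm>
  <failoverdomains>
    <failoverdomain name="host1domain" ordered="1" restricted="1">
      <failoverdomainnode name="host1" priority="1"/>
      <failoverdomainnode name="host2" priority="2"/>
      <!-- host3 left out because relocation to it keeps failing -->
    </failoverdomain>
  </failoverdomains>
</rm>
```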