Last weekend I was at a customer that was very enthusiastic about virtualizing their servers with Hyper-V.
They immediately started to build their clusters and virtualize their servers. They were rather shocked when cluster resources were not brought online when a node failed, migrations did not work, etc. Also, SCVMM showed a status of ‘Unsupported cluster configuration’ for the VMs that were experiencing issues.
The first thing I do in situations like these is to run a cluster validation report, which showed me that either something was changed after the cluster was created or no cluster validation was run at all.
Next, I look for similarities… which can become a journey in itself.
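For reference, a validation run can also be started from PowerShell. This is a minimal sketch, assuming the FailoverClusters module is available where you run it; the node names HYPER1 and HYPER2 are the two nodes I end up looking at later in this post, so replace them with your own.

```powershell
# Load the failover clustering cmdlets.
Import-Module FailoverClusters

# Run the validation tests against both nodes.
# The node names are used for illustration only; substitute your own.
Test-Cluster -Node HYPER1, HYPER2
```

Test-Cluster produces a validation report file that you can review afterwards. Check which tests are safe to run before validating a cluster that is in production.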
The Hyper-V Manager told me that nothing was wrong with the virtual machine configurations… and Failover Cluster Manager also reported no errors.
Although this customer had SCVMM installed and configured, I was not allowed access to the console… but that’s a discussion for another day.
I still wanted to see if the issues were related to a specific host or cluster… or if there were VMs without this status.
Since I had no access to the SCVMM console, I tried the SCVMM PowerShell module… and guess what? I did have access to it. Since I can do way more with PowerShell than with the management console, I’m gonna have a nice discussion with the IT guy who denied me access to the console…
Here’s the command to get the VM names, their status and the node they are running on:
Get-VM | Select-Object Name, VMHost, Status | Sort-Object Status | ConvertTo-Html | Out-File D:\Temp\VMStatusList.html
And the output was as follows:
As you can see, my guess was right… the VMs running on the HV-cluster were just fine but those running on the HYPER-cluster were reporting the ‘Unsupported Cluster Configuration’ status.
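If the list is long, counting the VMs per host and status makes the pattern easier to spot than reading the table row by row. This is a rough sketch, not the command I ran on site; it assumes the same SCVMM PowerShell session as the Get-VM command earlier, and that the VMHost value converts to the host name when turned into a string.

```powershell
# Assumes the VMM cmdlets are loaded and connected, as for the Get-VM command earlier.
# Group the VMs by host (as text) and status, then show how many VMs fall in each group.
Get-VM |
    Group-Object -Property { "$($_.VMHost)" }, Status |
    Sort-Object Name |
    Format-Table Count, Name -AutoSize
```

Each output row is one host and status combination, so a host whose VMs all share the same bad status stands out immediately.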
Just to make sure, I migrated a VM to the HV-cluster… and the status changed from ‘Unsupported Cluster Configuration’ to ‘Running’.
So, I logged on to the HYPER1 and HYPER2 nodes and walked through the basics of a cluster configuration.
Suddenly everything became clear… 4 NICs in one team for all the network traffic (yes, this includes the heartbeat traffic), VHDs stored on both CSV and on local disk for the same virtual machine, etc.
So we rebuilt the cluster and everything is running smoothly.
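The network part of this can also be checked from PowerShell. The sketch below assumes it is run on one of the cluster nodes with the FailoverClusters module available; it is not what I used on site, where I simply walked through the configuration by hand.

```powershell
# Load the failover clustering cmdlets.
Import-Module FailoverClusters

# List the networks the cluster knows about, one row per network.
# Role: 0 = not used by the cluster, 1 = cluster traffic only, 3 = cluster and client traffic.
Get-ClusterNetwork | Format-Table Name, Role, Address, State -AutoSize
```

With all NICs in a single team you would expect to see only one cluster network here, which means the heartbeat has no separate path of its own.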
Now what have we learned from all of this?
Step 1: Make sure you have a clear problem definition.
Step 2: Get access to the tools and servers you need.
Step 3: Check if the problem still exists.
Step 4: Run a cluster validation report and solve any warnings/errors.
Step 5: Look for similarities.
Step 6: ……………..