Strange Networking Issues in vSAN Environment
As an IT professional, I have encountered my fair share of strange issues in virtualized environments. However, a recent incident in one of my lab environments left me baffled. I experienced networking connectivity issues with one of the hosts, which led to some unexpected discoveries. In this blog post, I will share the events that transpired and the workarounds I employed to resolve the issue.
Host Lost Networking Connectivity
---------------------------------
Over a weekend, I noticed that vSAN Health was reporting that one of the hosts had lost network connectivity to the rest of the cluster. This is not uncommon, and I have seen it happen intermittently in the past. This time, however, when I went to put the host into maintenance mode, I found something strange.
No VMkernel Adapters Listed
---------------------------
When I opened the vSphere Web Client to view the state of the VMkernel adapters, I was shocked to find that there were no adapters listed! The host was still reported as being connected, and I could still access it via the Web UI and SSH. This was a very unusual situation, and I suspected that there might be an issue with the vSAN configuration.
No Interface Listed from CLI
----------------------------
I tried to troubleshoot the issue by running esxcli commands to list the VMkernel interfaces, but none were listed! This was very strange, as I had never seen it before. I tried restarting the core services, but that didn’t resolve the issue. The host was still up and running, yet it had no network interfaces configured.
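For anyone who wants to script the same check, here is a minimal sketch in Python. It assumes the listing comes from `esxcli network ip interface list` in its default text format, where each interface block carries an indented `Name:` field. The function name `parse_vmk_names` is my own, and the sample text is a shortened, made-up illustration, not output captured from the affected host.

```python
def parse_vmk_names(output):
    """Return the VMkernel interface names found in the text output of
    `esxcli network ip interface list` (default formatter assumed)."""
    names = []
    for line in output.splitlines():
        field = line.strip()
        # Match only the "Name:" field, not longer labels such as "VDS Name:".
        if field.startswith("Name:"):
            names.append(field.split(":", 1)[1].strip())
    return names


# Illustrative, shortened sample with made-up values (an assumption,
# not real output from the host described in this post).
SAMPLE = """\
vmk0
   Name: vmk0
   Enabled: true
   Portgroup: Management Network
   VDS Name: N/A
vmk1
   Name: vmk1
   Enabled: true
   Portgroup: vSAN
   VDS Name: N/A
"""

if __name__ == "__main__":
    for label, text in (("healthy sample", SAMPLE), ("empty output", "")):
        names = parse_vmk_names(text)
        if names:
            print(label + ": " + ", ".join(names))
        else:
            print(label + ": no VMkernel interfaces listed")
```

An empty result from this parser corresponds to what I saw on the host: the command returned, but there was nothing to list.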
Rebooting the Server
--------------------
My only option at this point was to reboot the server. The host did come back up after the reboot, but the console still reported the management address as 0.0.0.0/0. With no working network configuration the host was unreachable over the network, and I couldn’t reset the management network.
Last Known Good Configuration
-----------------------------
I decided to reboot using the last known good configuration, which restored all previous network settings. When I opened the vSphere Web Client again, I found that all VMkernel interfaces were present and functioning correctly. The cluster took some time to get back into working order, but once the vSAN re-sync had completed, all VMs were back up and operational.
Workaround to Bring Back Host Networking
----------------------------------------
The workaround I employed to bring back the host networking was to reboot using the last known good configuration. This restored all previous network settings, and the host was back online with proper networking configured.
Root Cause Unknown
------------------
I have an open case with VMware Support, who are analyzing the logs to determine the root cause. I will update this post when the results come through.
Conclusion
----------
In conclusion, this was a very strange issue that I encountered in my vSAN environment. The host lost all of its network config, and I couldn’t find any interface listed from the CLI or the Web Client. Rebooting the server using the last known good configuration restored all previous network settings, but the root cause of this issue remains unknown. I will update this post with the results of the VMware Support investigation.
ESXi Version: 6.7.0.13006603
vSphere Version: 6.7.0.30000
NSX-V: 6.4.4.11197766