Network Mayhem

Troubleshooting Network Issues with Windows NLB and HP ProVision Switches

As an IT professional, I recently encountered a challenging network issue while setting up a small VMware vSphere cluster at a customer’s site. The setup involved an HP BladeSystem c7000 with 7 HP ProLiant BL460 Gen8 servers, and the goal was to bring the Windows NLB (Network Load Balancing) cluster to life. However, I hit a roadblock right away: neither of the two Onboard Administrator (OA) network interfaces was reachable.

After switching from static IP addresses to DHCP, I noticed that both interfaces were reachable only when I connected my notebook directly to them. Furthermore, the Insight Display became unresponsive as soon as one or both OAs were connected to the network. The customer reported that they had experienced network-related issues with both physical and virtual machines over the past few days, including short outages and lost pings.

To troubleshoot the issue, I began by checking the switches and found an enormous number of “Drops TX” on every active port. However, I did not find any loops or errors in the network configuration. The network was flat, with a single VLAN and a /16 network.

To gain more insight into the network activity, I asked the customer to start Wireshark. The capture immediately showed traffic that should not have been there. On a switched network, I would normally expect to see only broadcasts, ARP, and traffic to or from my own client. Instead, I saw traffic from a domain controller to a Windows NLB cluster and Citrix traffic to a Windows NLB cluster.
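The check I did by eye in Wireshark can be sketched in a few lines of Python. This is only an illustration, not a capture tool: the function names and the sample MAC addresses are hypothetical, and it assumes the capture was taken on an ordinary (non-mirrored) switch port, where a unicast frame addressed to someone else can only show up because the switch flooded it.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def normalize(mac: str) -> str:
    """Accept both 00-11-22-... and 00:11:22:... notation."""
    return mac.replace("-", ":").lower()


def is_group_address(mac: str) -> bool:
    """True if the I/G bit (lowest bit of the first octet) is set."""
    first_octet = int(normalize(mac).split(":")[0], 16)
    return bool(first_octet & 0x01)


def classify_frame(dst_mac: str, my_mac: str) -> str:
    """Classify a captured frame by its destination MAC address."""
    dst = normalize(dst_mac)
    if dst == BROADCAST:
        return "broadcast"
    if is_group_address(dst):
        return "multicast"
    if dst == normalize(my_mac):
        return "mine"
    # Unicast, but not for this host: the switch must have flooded it.
    return "flooded-unicast"


if __name__ == "__main__":
    my_mac = "00:11:22:33:44:55"  # hypothetical capture host
    for dst in ("ff:ff:ff:ff:ff:ff", "00-11-22-33-44-55", "02-BF-0A-00-00-64"):
        print(dst, "->", classify_frame(dst, my_mac))
```

A capture dominated by the "flooded-unicast" class is the pattern I was looking at: unicast frames for other hosts arriving on my port.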

After further investigation, I discovered that the issue was related to the Windows NLB running in unicast mode. Unicast mode is generally not recommended on switched networks, because it causes flooding. In unicast mode, the MAC address of the cluster adapter, which is used for cluster communication, is shared by all cluster members, and NLB masks the source MAC address of outgoing frames so that the switch never learns the cluster MAC address on any port. As a result, the switch treats every frame addressed to the cluster as unknown unicast and sends it out on all ports. All traffic destined for the Windows NLB nodes was therefore flooded across the entire network.
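To make the flooding mechanism concrete, here is a minimal sketch of a learning switch in Python. It is a toy model under stated assumptions (no MAC aging, no VLANs, four ports), and every address in it is made up for illustration; it is not how any particular switch is implemented. The point is only that a destination MAC address that never appears as a source address is never learned, so frames to it are always flooded.

```python
class LearningSwitch:
    """Minimal model of a transparent learning bridge (no aging, no VLANs)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # learned source MAC -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source, then return the ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        return [] if out_port == in_port else [out_port]


# Hypothetical addresses, for illustration only.
CLUSTER_MAC = "02:bf:0a:00:00:64"   # shared by both NLB nodes, never used as source
NODE1_MASKED = "02:01:0a:00:00:64"  # masked source MAC that node 1 transmits with
NODE2_MASKED = "02:02:0a:00:00:64"  # masked source MAC that node 2 transmits with
CLIENT_MAC = "00:11:22:33:44:55"


def simulate_unicast_nlb():
    """NLB nodes on ports 1 and 2, a client on port 3, a bystander on port 4."""
    sw = LearningSwitch(ports=[1, 2, 3, 4])
    results = [
        sw.receive(3, CLIENT_MAC, CLUSTER_MAC),   # first request to the cluster
        sw.receive(1, NODE1_MASKED, CLIENT_MAC),  # reply from node 1
        sw.receive(2, NODE2_MASKED, CLIENT_MAC),  # reply from node 2
        sw.receive(3, CLIENT_MAC, CLUSTER_MAC),   # next request to the cluster
    ]
    return results, sw.mac_table


if __name__ == "__main__":
    results, table = simulate_unicast_nlb()
    for out_ports in results:
        print(out_ports)
    print("cluster MAC learned:", CLUSTER_MAC in table)
```

In this model the replies to the client are forwarded to a single port, while every request to the cluster MAC address also reaches the bystander on port 4, no matter how much traffic has already passed.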

To resolve the issue, I recommended switching the NLB to multicast mode. However, this requires support from the switches, and not all models accept it. In multicast mode, the cluster answers ARP requests for its unicast IP address with a multicast MAC address, and many switches discard such replies by default. On HP ProVision-based switches, you can allow them by entering the following command in config mode: “core-sw-01(config)# ip arp-mcast-replies”. This command allows the switch to accept a multicast MAC address in an ARP reply.

Alternatively, you can use IGMP (Internet Group Management Protocol) multicast mode, which is supported by all HP ProVision switches. To enable IGMP on VLAN 1, run the following command: “core-sw-01(config)# vlan 1 ip igmp”. With IGMP enabled, the switch forwards the cluster traffic only to ports whose hosts have joined the multicast group, instead of to every port.
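When reading a capture or a switch MAC table, it helps to know which cluster MAC address to look for in each NLB mode. The sketch below derives them from the cluster IP address, following the scheme as I understand it from Microsoft's NLB documentation: 02-BF plus the four IP octets for unicast, 03-BF plus the four IP octets for multicast, and 01-00-5E-7F plus the last two IP octets for IGMP multicast. The function name and the sample IP address are hypothetical, and the result should be checked against what NLB Manager shows for the actual cluster.

```python
import ipaddress


def nlb_cluster_macs(cluster_ip: str) -> dict:
    """Derive the NLB cluster MAC address for each operation mode."""
    w, x, y, z = ipaddress.IPv4Address(cluster_ip).packed

    def fmt(octets):
        return "-".join(f"{octet:02X}" for octet in octets)

    return {
        "unicast": fmt([0x02, 0xBF, w, x, y, z]),
        "multicast": fmt([0x03, 0xBF, w, x, y, z]),
        "igmp-multicast": fmt([0x01, 0x00, 0x5E, 0x7F, y, z]),
    }


if __name__ == "__main__":
    # 10.0.0.100 is a made-up cluster IP address.
    for mode, mac in nlb_cluster_macs("10.0.0.100").items():
        print(f"{mode:15} {mac}")
```

A destination MAC address starting with 02-BF in the flooded traffic points to a cluster in unicast mode, while 03-BF or 01-00-5E-7F indicates that the cluster has already been switched to one of the multicast modes.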

In summary, troubleshooting network issues with Windows NLB and HP ProVision switches can be challenging, and understanding the underlying cause is crucial to finding a solution. In this case, the Windows NLB was running in unicast mode, which flooded the network and caused the outages. Switching the cluster to multicast mode or IGMP multicast mode, with the matching switch configuration, resolved the issue and restored proper network functionality.