NSX-V 6.4.13 Resolves High Network Latency and Jitter Issue
Introduction:
In one of my customer environments, we recently experienced high network latency and jitter after upgrading to NSX Data Center for vSphere 6.4.12. The issue started two days after the upgrade and affected multiple tenants. One of our edges became unstable, with latencies of up to 5 seconds, and performed an automatic failover after a few days of uptime. In this blog post, I will discuss the root cause of the issue and the workaround that helped us resolve it.
Issue Overview:
After the upgrade to NSX Data Center for vSphere 6.4.12, the customer environment experienced high network latency and jitter. The affected edge became unstable, with latencies of up to 5 seconds, and performed an automatic failover after a few days of uptime. The NSX Edge logs showed the following kernel exception:
kernel[]: []: [kern.alert] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
kernel[]: []: [kern.alert] IP: []
This kernel exception appears to have disrupted the stable operation of the NSX Edge.
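To check whether an edge shows the same signature, the log can be searched for the exception text quoted above. The following is a minimal sketch, assuming the edge log has already been exported to a plain-text file; the function name count_null_deref is hypothetical and the log path is supplied by the caller.

```shell
#!/bin/sh
# Count the lines in a log file that contain the kernel exception
# signature quoted above. No NSX-specific log path is assumed here;
# the caller passes the path of an exported log file.
count_null_deref() {
    log_file="$1"
    # -F treats the pattern as a fixed string, -c prints the number of matching lines.
    grep -c -F "unable to handle kernel NULL pointer dereference" "$log_file"
}
```

Called as count_null_deref followed by the path of the exported log, it prints the number of matching lines. Note that grep exits with a non-zero status when it finds no match, even though it still prints 0.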
Workaround:
To work around the issue, we had to log in as root on the affected edge node. Here are the steps to follow:
1. Log in to the NSX Manager via SSH or console.
2. Run: debug engineering mode enable
3. Run: st eng
4. Enter the password: IAmOnThePhoneWithTechSupport
5. Copy the root passwords of the relevant edges from /home/secureall/secureall/sem/WEB-INF/classes/GetSpockEdgePassword.sh.
6. Log in to the affected NSX Edge using the admin account on the VM console.
7. Run: enable (use the admin password).
8. Run: debug engineering mode enable
9. Run: st eng (use the root password collected from NSX Manager).
10. Run: cd /opt/vmware/vshield/Framework
11. Back up the file so that the original is kept: cp config_manager.pm config_manager.pm.orig
12. Open the file: vi config_manager.pm
13. Search for configManagerDone (the match should be around line 227).
14. Enter insert mode and add '#' at the beginning of the line containing configManagerDone($configManagerData->{"highAvailability"}, $configManagerData->{"iptables"}{"changed"}))).
15. Save and close the file with :wq!.
16. Verify with less config_manager.pm that the change was applied.
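Steps 11 to 16 can also be sketched as a small shell function instead of a manual vi session. This is only an illustration under stated assumptions: the helper name comment_out_line is hypothetical, and it assumes that sed, grep and cp are available in the root shell of the edge. The line number is passed in explicitly and the edit is refused unless that line contains configManagerDone, so that no other occurrence of the name is touched.

```shell
#!/bin/sh
# Comment out one line of a file after backing it up.
# Usage: comment_out_line <file> <line-number>
comment_out_line() {
    file="$1"
    line_no="$2"
    # Refuse to edit unless the chosen line actually contains the call.
    sed -n "${line_no}p" "$file" | grep -q "configManagerDone" || return 1
    # Step 11: keep the original file next to the edited one.
    cp "$file" "$file.orig" || return 1
    # Steps 12 to 15: prefix the chosen line with '#', reading from the backup.
    sed "${line_no}s/^/#/" "$file.orig" > "$file" || return 1
    # Step 16: print the edited line for a visual check.
    sed -n "${line_no}p" "$file"
}
```

The line number can be looked up first with grep -n configManagerDone config_manager.pm, after which the call would be comment_out_line config_manager.pm followed by that number. Reading from the .orig copy and redirecting into the existing file avoids relying on an in-place edit option of sed.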
Update: This issue has been resolved with NSX-V 6.4.13.
Conclusion:
In this blog post, we discussed a high network latency and jitter issue that arose after upgrading to NSX Data Center for vSphere 6.4.12. The issue was caused by a kernel exception that disrupted the stable operation of the NSX Edge. The workaround involved logging in as root on the edge node, backing up config_manager.pm, and commenting out the configManagerDone line in that file. The issue has been resolved with NSX-V 6.4.13. If you are experiencing similar issues, please try these steps and update me with your results.