Resolving Alarms and BGP Issues in NSX-T 3.1.3.7
We recently ran into an issue with NSX-T 3.1.3.7 and want to share our experience and the workaround for anyone who hits the same problem. The issue involved alarms about BGP connectivity between sites. In this post, we walk through how we troubleshot and resolved it.
Background
----------
After we upgraded from NSX-T 3.1.3.6 to 3.1.3.7, one of our sites started showing alarms in the alarm section of NSX-T Manager. The alarms reported BGP connectivity issues between the sites. When we investigated, all BGP sessions were established and pings were answered, yet the alarms stayed active.
Troubleshooting Steps
---------------------
To troubleshoot the issue, we followed these steps:
1. Check the connection: We logged into the Edges and looked up the VRF ID of the RTEP tunnel. We then checked BGP and ran pings between the RTEP IP addresses of both sites. All BGP sessions were established, and the pings were answered.
2. Check from Postman: We opened Postman and sent a GET API call to the NSX-Manager to retrieve the edge ID needed for the next API call. We selected Basic Auth under the Authorization tab and filled in the admin credentials. In the response body, we searched for the edge name and noted the corresponding ID.
3. Get RTEP status: We then used this ID in a second GET API call to retrieve the RTEP status.
The output showed that BGP to one of the peers was established, yet the alarm persisted. We noticed that the alarm was resolved on one of the manager nodes but still showed on the other nodes, which kept it active.
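The Postman lookup in step 2 can also be scripted. The following is a minimal Python sketch, not the exact call we made: it assumes the edge list comes from `/api/v1/transport-nodes` and that each entry in `results` carries `display_name` and `id` fields. The edge names, IDs, and password are made up for illustration, and the sample body stands in for a real reply so the snippet runs without a manager.

```python
import base64
import json

# Assumed list endpoint; substitute the call you actually use.
TRANSPORT_NODES_PATH = "/api/v1/transport-nodes"


def basic_auth_header(username: str, password: str) -> dict:
    """Build the header Postman sends when Basic Auth is selected."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def find_edge_id(response_body: str, edge_name: str):
    """Return the ID of the entry whose display_name matches, or None."""
    payload = json.loads(response_body)
    for node in payload.get("results", []):
        if node.get("display_name") == edge_name:
            return node.get("id")
    return None


# Hypothetical reply body, trimmed to the two fields the lookup needs.
sample_body = json.dumps({
    "results": [
        {"display_name": "edge-site-a-01", "id": "1111-aaaa"},
        {"display_name": "edge-site-b-01", "id": "2222-bbbb"},
    ]
})

headers = basic_auth_header("admin", "example-password")
edge_id = find_edge_id(sample_body, "edge-site-b-01")
print(TRANSPORT_NODES_PATH, edge_id)
```

In a real run, the header and path would be passed to an HTTP client of your choice, and the returned ID would go into the follow-up RTEP status call.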
Workaround
----------
To resolve the issue, we performed the following workaround:
1. Restart the Proton service on all manager nodes:
We connected over SSH as the admin user to each NSX-T manager node and ran the following commands:
stop service proton
start service proton
After the restart, the alarm cleared and the issue was resolved.
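Because the alarm was resolved on one manager node but still active on the others, it can help to compare the alarm state per node before and after the restart. Below is a minimal Python sketch of that comparison. It assumes the alarm list of each manager node has already been fetched separately and that each alarm carries `event_type` and `status` fields. The node names, the event type `rtep_bgp_down`, and the set of statuses treated as active are placeholders, not values taken from our environment.

```python
from typing import Dict, List

# Statuses treated as "still active" (an assumption; adjust to your setup).
ACTIVE_STATUSES = {"OPEN", "ACKNOWLEDGED"}


def nodes_with_active_alarm(alarms_by_node: Dict[str, List[dict]],
                            event_type: str) -> List[str]:
    """Return the manager nodes that still report an active alarm of this type."""
    stale = []
    for node, alarms in alarms_by_node.items():
        still_active = any(
            alarm.get("event_type") == event_type
            and alarm.get("status") in ACTIVE_STATUSES
            for alarm in alarms
        )
        if still_active:
            stale.append(node)
    return sorted(stale)


# Hypothetical per-node alarm lists mirroring what we saw:
# resolved on one node, still open on the other two.
alarms_by_node = {
    "nsx-mgr-01": [{"event_type": "rtep_bgp_down", "status": "RESOLVED"}],
    "nsx-mgr-02": [{"event_type": "rtep_bgp_down", "status": "OPEN"}],
    "nsx-mgr-03": [{"event_type": "rtep_bgp_down", "status": "OPEN"}],
}

print(nodes_with_active_alarm(alarms_by_node, "rtep_bgp_down"))
```

An empty list after the Proton restart would indicate that no node is still holding the alarm open.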
Conclusion
----------
If you see BGP connectivity alarms in NSX-T 3.1.3.7 that stay active even though the BGP sessions are established, the troubleshooting steps above can help confirm the cause. The workaround is to restart the Proton service on all manager nodes. This is a known issue in version 3.1.3.7 with a three-node manager cluster, and it is fixed in version 3.2.1.
Note: Always check the NSX-T documentation and official support channels for the latest information and updates before troubleshooting and resolving any issues.