The Story of a Virtualization and Storage Guy’s Journey to Resolve Random Exchange Disconnections and Delivery Delays
As a virtualization and storage expert, I have found myself pulled into every performance troubleshooting session and new project lately. Recently, my boss approached me with a problem in our new Exchange 2010 environment. Users worldwide were experiencing random disconnections and delivery delays of up to four hours. The issue was not limited to any specific device or platform, although ActiveSync devices and OWA users were unaffected by the delays.
I began my investigation by reviewing the event logs on all servers in the Exchange 2010 environment. I found numerous errors that came from running Exchange 2010 SP1 without any update rollups, and the corresponding Microsoft KB articles confirmed that they are fixed in various update rollups. One event that stood out was Event ID 2915 on our CAS servers, which reported “Session Limit Over Budget” and is caused by the default throttling policy.
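For anyone who wants to repeat that log review without clicking through Event Viewer on every server, here is a minimal sketch of how the 2915 events could be pulled remotely. The server names are hypothetical, and I am assuming the event is written to the Application log; adjust both for your environment.

```powershell
# Hypothetical CAS server names; replace with your own.
$casServers = 'CAS01', 'CAS02'

foreach ($server in $casServers) {
    # Assumption: Event ID 2915 is logged in the Application log.
    # -ErrorAction SilentlyContinue suppresses the error Get-WinEvent
    # raises when a server has no matching events.
    Get-WinEvent -ComputerName $server -MaxEvents 50 -ErrorAction SilentlyContinue `
        -FilterHashtable @{ LogName = 'Application'; Id = 2915 } |
        Select-Object MachineName, TimeCreated, Id, ProviderName, Message
}
```

Piping the result to `Group-Object MachineName` gives a quick count per server, which helps show whether one CAS server is hit harder than the others.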
To better understand the default throttling policy, I recommend reading Microsoft's article “Understanding Client Throttling Policies”. To resolve the issue, I created a PowerShell script that set the limits in the default throttling policy to $null, effectively removing any restrictions. After making this change, the reported disconnections stopped, but the delivery delays continued globally.
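The change itself looks roughly like the sketch below. This is not my exact script, only an illustration of the approach: it assumes the Exchange Management Shell on Exchange 2010 and clears just the RPC Client Access (RCA) limits, which are the ones I would expect to matter for Outlook sessions. The same pattern applies to the EWS, EAS, and OWA parameters.

```powershell
# Run in the Exchange Management Shell (Exchange 2010).
# Find the default throttling policy.
$policy = Get-ThrottlingPolicy | Where-Object { $_.IsDefault -eq $true }

# Record the current values before changing anything.
$policy | Format-List Name, RCA*

# Setting a throttling parameter to $null removes that limit.
Set-ThrottlingPolicy -Identity $policy.Identity `
    -RCAMaxConcurrency $null `
    -RCAPercentTimeInAD $null `
    -RCAPercentTimeInCAS $null `
    -RCAPercentTimeInMailboxRPC $null
```

Keep the output of the `Format-List` step somewhere safe, so the original limits can be restored once the underlying problem is fixed.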
I decided to go back to basics and began troubleshooting by sending test messages to colleagues. All messages were delivered promptly, except those to one colleague, who was experiencing delivery delays of up to four hours. When I turned off Cached Exchange Mode in their Outlook client, the problem magically disappeared. This led me to look at the differences between the two servers.
To keep the global issue from affecting users while I worked on the cause, I failed all mailbox databases in the DAG over to the server that did not seem to be having the problem. Reports quickly confirmed that the delays had stopped. I then compared the two servers and found one significant difference: Microsoft KB2393802 was applied to one server but not the other. After I removed the patch and rebooted, testing with a test mailbox database showed that the problem was fixed.
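Both steps can be scripted. The sketch below uses hypothetical server names (EX01 as the problem server, EX02 as the healthy one) and assumes the Exchange Management Shell for the failover; the patch comparison uses plain Windows PowerShell cmdlets and only sees updates that Windows reports through Get-HotFix.

```powershell
# Hypothetical names: EX01 shows the delays, EX02 does not.
$problemServer = 'EX01'
$healthyServer = 'EX02'

# Move every active mailbox database off the problem server.
Move-ActiveMailboxDatabase -Server $problemServer `
    -ActivateOnServer $healthyServer -Confirm:$false

# List the updates installed on one server but not on the other.
$a = Get-HotFix -ComputerName $problemServer
$b = Get-HotFix -ComputerName $healthyServer
Compare-Object -ReferenceObject $a -DifferenceObject $b -Property HotFixID |
    Sort-Object HotFixID
```

In the `Compare-Object` output, a SideIndicator of `<=` means the update exists only on the first server and `=>` means it exists only on the second.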
Despite my research, I could not find any information from Microsoft about this patch causing mail delivery issues in Exchange 2010. If you have an idea about what could be causing this problem, please comment and share your thoughts. I have contacted Microsoft about the issue, but they have not replied yet.
In conclusion, my journey to resolve random Exchange disconnections and delivery delays was a fruitful one, with many lessons learned along the way. As virtualization and storage experts, we must be prepared to adapt to new challenges and embrace new technologies. In this case, I had to dig deeper into troubleshooting techniques and learn more about Exchange 2010, leading me to discover a potential issue with Microsoft KB2393802.