vSAN Datastore Full? Don’t Panic! Here’s What to Do Next

My Journey from Infrastructure Admin to Cloud Architect: Testing vSAN’s Hardening Capabilities during Capacity-Strained Scenarios

As an infrastructure administrator, I have always been curious about how vSAN handles capacity-strained scenarios. Recently, I had the opportunity to test vSAN’s hardening capabilities during such situations, and I was impressed with the results. In this blog post, I will share my experience and the lessons I learned from testing vSAN’s hardening features.

Testing vSAN’s Hardening Capabilities

I started by filling a datastore using HCIBench, which deploys multiple VMs with thin-provisioned VMDKs. As expected, vSAN Health warned that the datastore was running low on space, and the warning turned critical when one of the hosts went offline. The well-known vCenter datastore alarm, “Datastore usage on disk,” also fired to tell me that the datastore was nearly full.
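To make the alarm behavior concrete, here is a minimal Python sketch of how a usage percentage maps to an alarm state, and how losing a host can push a datastore from warning to critical. The 75% and 85% thresholds are assumptions (check the alarm definition in your own vCenter), the host-loss model is deliberately simplified, and all names and numbers are hypothetical rather than taken from my lab.

```python
from dataclasses import dataclass

# Assumed thresholds; verify them against the alarm definition in your vCenter.
WARNING_PCT = 75.0
CRITICAL_PCT = 85.0


@dataclass
class Datastore:
    capacity_gb: float
    used_gb: float

    @property
    def used_pct(self) -> float:
        return 100.0 * self.used_gb / self.capacity_gb


def alarm_state(ds: Datastore,
                warning: float = WARNING_PCT,
                critical: float = CRITICAL_PCT) -> str:
    """Classify a datastore by its used-space percentage."""
    if ds.used_pct >= critical:
        return "critical"
    if ds.used_pct >= warning:
        return "warning"
    return "normal"


def after_host_loss(ds: Datastore, host_capacity_gb: float) -> Datastore:
    """Simplified what-if: the host's capacity disappears while the data
    it held is rebuilt elsewhere, so the used amount stays the same."""
    return Datastore(ds.capacity_gb - host_capacity_gb, ds.used_gb)


# Hypothetical cluster: 8 hosts contributing 5,000 GB each.
cluster = Datastore(capacity_gb=40_000, used_gb=31_000)
degraded = after_host_loss(cluster, host_capacity_gb=5_000)
print(alarm_state(cluster), alarm_state(degraded))
```

With these made-up numbers the datastore sits at 77.5% before the host goes offline and at roughly 88.6% afterwards, which is the same warning-to-critical jump I saw in the test.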

At this point, some of the thin-provisioned VMs were stunned. I had deliberately run a 100% sequential write test on them to provoke exactly that. The test also affected my vRLI VM, which was probably writing new datastore-full log entries to its own VMDKs. Other VMs continued to run fine, but I could not create new VMs or clone existing ones because of the lack of free space.
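Thin-provisioned VMDKs only consume space as data is written, so the provisioned total can exceed what the datastore can actually hold. The sketch below shows that arithmetic under stated assumptions: the function name, the tuple layout, and the sample sizes are all hypothetical, and it ignores the extra raw capacity that a vSAN storage policy (mirroring or erasure coding) consumes.

```python
def overcommit_report(capacity_gb, vmdks):
    """Summarize thin-provisioning exposure.

    vmdks is a list of (provisioned_gb, written_gb) tuples.
    """
    provisioned = sum(p for p, _ in vmdks)
    written = sum(w for _, w in vmdks)
    return {
        # A ratio above 1.0 means the VMDKs could outgrow the datastore.
        "overcommit_ratio": provisioned / capacity_gb,
        # Space still available for new writes.
        "free_gb": capacity_gb - written,
        # Growth the datastore could never back if every VMDK filled up.
        "unbacked_growth_gb": max(0, provisioned - capacity_gb),
    }


# Hypothetical example: three 400 GB thin disks on a 1,000 GB datastore.
report = overcommit_report(1000, [(400, 300), (400, 350), (400, 250)])
print(report)
```

In this example only 100 GB is free while the disks could still grow by 300 GB in total. A VM whose write needs a new block after that free space is gone is the one that gets stunned, which matches what the sequential write test triggered.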

vSAN’s Response to Capacity-Strained Scenarios

I was impressed with how vSAN responded to the capacity-strained scenario. The system did not allow the datastore to become completely full, queuing some activities instead. Hosts remained fully responsive, and I could power off VMs, run management activities, and even download a VM from the vSAN datastore.

vSAN’s hardening features showed again when I added capacity disks to my vSAN disk groups. The cluster reacted immediately: I gained free space, paused resync jobs resumed, and rebalancing kicked in according to my disk balance policy.

Lessons Learned from Testing vSAN’s Hardening Capabilities

My experience testing vSAN’s hardening capabilities taught me several valuable lessons:

1. vSAN is designed to handle capacity-strained scenarios: In my tests the datastore never filled completely, hosts stayed responsive, and management operations kept working. The system is robust and can absorb a certain level of stress without compromising the health of the cluster.

2. Monitoring free space on every datastore is critical: As an infrastructure administrator, it is essential to monitor free space on every datastore to avoid running low on space. vSAN Health provides valuable insights into the health of your datastores and helps you identify potential issues before they become critical.

3. Hardening features are crucial during capacity-strained scenarios: vSAN’s hardening features, such as rebalancing and data placement policies, help ensure that the system remains healthy even when faced with capacity-strained scenarios. These features should be carefully configured to meet your specific needs.

4. Testing is essential to understanding vSAN’s capabilities: While vSAN’s documentation provides valuable information about its capabilities, testing is essential to gain a deeper understanding of how the system behaves under different conditions. This knowledge can help you make informed decisions about your virtual infrastructure.
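For the monitoring lesson (item 2), one simple way to spot trouble early is to project when usage will cross an alarm threshold. This is a rough sketch, not a vSAN feature: it assumes linear growth between the first and last sample, uses an assumed 85% threshold, and all names and numbers are hypothetical.

```python
def days_until_threshold(samples, capacity_gb, threshold_pct=85.0):
    """Estimate days from the last sample until usage crosses the threshold.

    samples is a non-empty list of (day, used_gb) tuples sorted by day.
    Returns 0.0 if the threshold is already crossed, and None if usage is
    flat, shrinking, or there is only one point in time to look at.
    """
    (d0, u0), (d1, u1) = samples[0], samples[-1]
    limit_gb = capacity_gb * threshold_pct / 100.0
    if u1 >= limit_gb:
        return 0.0
    if d1 == d0:
        return None
    rate_gb_per_day = (u1 - u0) / (d1 - d0)
    if rate_gb_per_day <= 0:
        return None
    return (limit_gb - u1) / rate_gb_per_day


# Hypothetical example: usage grew from 28,000 GB to 30,000 GB in 10 days
# on a 40,000 GB datastore.
print(days_until_threshold([(0, 28_000), (10, 30_000)], capacity_gb=40_000))
```

Real growth is rarely linear, so treat the result as a prompt to look closer rather than a forecast.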

Conclusion

Testing vSAN’s hardening capabilities during capacity-strained scenarios was an eye-opening experience that reinforced my confidence in the product. I learned valuable lessons about monitoring free space on every datastore, configuring hardening features correctly, and testing to gain a deeper understanding of vSAN’s capabilities.

As an infrastructure administrator, it is essential to stay up to date with the latest virtualization technologies and test their capabilities under various scenarios. By doing so, you can make informed decisions about your virtual infrastructure and ensure that it remains healthy and performs optimally.