vSAN Disk Fault Injection

My Journey from Infrastructure Admin to Cloud Architect: Remote Proof of Concept Testing for vSAN

As a cloud architect, I have had the opportunity to work with various technologies and solutions in the virtualization space. One of the most interesting parts of my job is conducting proof of concept (POC) testing remotely. Recently, I used remote POC testing to evaluate vSAN, VMware's software-defined storage solution. In this blog post, I share my experience and the challenges I faced along the way.

Why Remote POC Testing?

Traditionally, POC testing for vSAN involves setting up a test environment on-site, which can be time-consuming and costly. With remote POC testing, I can run the tests from my home lab instead, avoiding the time and cost of an on-site setup.

Remote POC testing also lets me exercise vSAN under realistic conditions, because I can simulate real-world scenarios such as disk failures and test the solution under different conditions. This helps me identify potential issues and bottlenecks before the solution is deployed in production.

Challenges of Remote POC Testing

One of the major challenges of remote POC testing is the lack of physical access to the hardware. On-site, I can physically hot unplug a drive or disconnect a network cable. Remotely, I have to rely on software-based tools to simulate these scenarios.

Another challenge is the limited visibility into the test environment. Without physical access to the hardware, it can be difficult to monitor the test environment and diagnose issues that may arise during the testing process.

vSAN Disk Fault Injection Script

To overcome these challenges, I used the vSAN Disk Fault Injection script, which is available on ESXi by default. The script allows me to simulate disk failures and test the resilience of the vSAN cluster.

The script has several options; I used -u, which injects a hot unplug. The script also needs the device ID of the target drive. To find it, I ran esxcli vsan storage list and took the device ID of the cache drive, which is the entry that reports "Is Capacity Tier: false". Failing the cache drive takes its whole disk group offline, so it is the harsher of the two tests.
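Picking the cache drive out of the esxcli vsan storage list output by hand is easy to get wrong on a host with several disk groups. The sketch below filters that output for entries with "Is Capacity Tier: false". It assumes the plain-text layout in which each device block starts with an unindented device ID line followed by indented "Key: value" lines; the sample text and the naa.* IDs are abbreviated and hypothetical, not real output from my lab.

```python
from typing import Dict, List, Optional


def parse_vsan_storage_list(text: str) -> Dict[str, Dict[str, str]]:
    """Return {device_id: {field: value}} for each device block.

    Assumes each block starts with an unindented device ID line and is
    followed by indented "Key: value" lines (layout assumption).
    """
    devices: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank separator lines
        if not raw[0].isspace():
            # An unindented line starts a new device block.
            current = raw.strip()
            devices[current] = {}
        elif current is not None and ":" in raw:
            # Split on the first colon only, so values may contain colons.
            key, _, value = raw.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


def cache_devices(text: str) -> List[str]:
    """Device IDs whose 'Is Capacity Tier' field is false (cache tier)."""
    return [
        dev
        for dev, fields in parse_vsan_storage_list(text).items()
        if fields.get("Is Capacity Tier", "").lower() == "false"
    ]


# Hypothetical, abbreviated sample of `esxcli vsan storage list` output.
SAMPLE = """\
naa.5000000000000001
   Device: naa.5000000000000001
   Is SSD: true
   Is Capacity Tier: false
naa.5000000000000002
   Device: naa.5000000000000002
   Is SSD: true
   Is Capacity Tier: true
"""

if __name__ == "__main__":
    for dev in cache_devices(SAMPLE):
        print(dev)  # prints naa.5000000000000001
```

On a real host you would feed it the captured command output instead of SAMPLE, and double-check the result against the vSphere Client before injecting anything.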

Testing vSAN with Remote POC

To test vSAN using remote POC, I followed these steps:

1. Configure the vSAN cluster in my home lab.

2. Connect to the ESXi host (for example, over SSH) so that esxcli and the script can be run, and keep the vSphere Client open for monitoring.

3. Run the vSAN Disk Fault Injection script with the appropriate options to simulate a disk failure.

4. Monitor the status of the data and the resync of objects listed with the reason “compliance”, that is, objects being rebuilt to meet their storage policy again.

5. After completing the test, rescan the host's storage adapters for new devices, which brings the unplugged drive back.
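Steps 3 to 5 boil down to three commands run on the host. The sketch below only assembles them as argument lists so they can be reviewed (or passed to subprocess.run) before anything touches a disk. Several details are assumptions not stated in this post: the script path, the use of "python" to launch it, the -d flag for the device ID, and the two esxcli namespaces (resync summary, adapter rescan). They match what I believe recent ESXi/vSAN builds ship, but verify each one on your own build.

```python
import shlex
from typing import List

# Assumed location of the fault injection script on ESXi; verify on your build.
FAULT_SCRIPT = "/usr/lib/vmware/vsan/bin/vsanDiskFaultInjection.pyc"


def hot_unplug_test_plan(device_id: str) -> List[List[str]]:
    """Return the commands for steps 3 to 5 as argument lists (dry run only)."""
    return [
        # Step 3: inject a hot unplug (-u) on the chosen device (-d, assumed flag).
        ["python", FAULT_SCRIPT, "-u", "-d", device_id],
        # Step 4: summary of objects still resyncing (assumed esxcli namespace).
        ["esxcli", "vsan", "debug", "resync", "summary", "get"],
        # Step 5: rescan all storage adapters so the device is rediscovered.
        ["esxcli", "storage", "core", "adapter", "rescan", "--all"],
    ]


if __name__ == "__main__":
    # Hypothetical device ID; use the cache drive found with esxcli vsan storage list.
    for cmd in hot_unplug_test_plan("naa.5000000000000001"):
        print(shlex.join(cmd))
```

Keeping the plan as data rather than executing it directly makes it easy to eyeball the device ID first, which matters because the injection targets a live disk group.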

Results and Observations

During my remote POC testing for vSAN, I observed several things:

1. The vSAN Disk Fault Injection script is a powerful tool for testing the resilience of the vSAN cluster. It allowed me to simulate disk failures and observe how the cluster responded to the failure.

2. After the fault was injected, the vSphere Client provided detailed information about the status of the data and the resync of objects listed with the reason “compliance”.

3. The remote POC testing environment closely mimicked a real-world production environment, which let me identify potential issues and bottlenecks before deployment.

4. The lack of physical access to the hardware did not significantly impact my ability to test vSAN. The software-based tools provided by VMware allowed me to simulate physical failures and test the resilience of the cluster.

5. The resync of objects flagged “compliance” ran smoothly and efficiently, which gave me confidence that the data remained protected.

Conclusion

Remote POC testing for vSAN is a valuable tool for cloud architects and infrastructure admins who want to test the resilience of their vSAN clusters. The vSAN Disk Fault Injection script provides a powerful way to simulate disk failures and test the cluster’s ability to recover from them.

Remote POC testing has its challenges, such as limited visibility into the test environment and reliance on software-based tools, but careful planning and execution can overcome them. By leveraging remote POC testing, cloud architects and infrastructure admins can identify potential issues and bottlenecks before deploying vSAN in production, improving the odds of a successful implementation and minimizing downtime.