Optimize Your VMware VCSA 6.5u0 or PSC Appliance with SCSI Block Timeout Adjustments

Increasing SCSI Timeout Value on vCSA 6.5u0 and PSC Appliance

In a previous post, I discussed an issue in a lab environment where a vCSA 6.5u0 or PSC appliance won’t boot after a hard shutdown. As the issue became more frequent over time, I tried to find the root cause. The system logs reported SCSI timeouts on write operations, which reminded me that the default 30-second timeout can be insufficient in some virtualized environments. The proposed fix is therefore to raise the timeout to a higher value.

We can display the current (default at this time) SCSI timeout value of every block device on the system with the following command (based on sysfs, a pseudo file system provided by the Linux kernel since version 2.6):

```bash
find /sys/class/scsi_generic/*/device/timeout -exec grep -H . '{}' \;
```

As mentioned in KB #1009465, “Increasing the disk timeout values for a Linux 2.6 virtual machine”, VMware Tools creates a udev rule at /etc/udev/rules.d/99-vmware-scsi-udev.rules that sets the timeout to 180 seconds for each VMware virtual disk device, and reloads the udev rules so that the change takes effect immediately. On the Photon-based appliance, however, this udev rule no longer exists.

For comparison: on a “non-Photon based” Linux VM, the /etc/udev/rules.d/99-vmware-scsi-udev.rules file exists (created by the VMware Tools installer) and contains:

```bash
KERNEL=="sd[a-z]*", RUN+="/usr/bin/vmware-toolbox --set-disk-timeout 180"
```

So we probably need to raise the value ourselves at each system startup. One way to do this is with an rc.local file.

According to NetApp recommendations on disk timeouts for virtualized guest operating systems, the expected value is 180 seconds, as configured in VCSA 6.0 build 3339084.

There are multiple ways to fix the SCSI timeout value:

1. Upgrade. It’s not mentioned in the Release Notes, but VCSA 6.5 build 5973321 includes a fix for the missing udev rule with open-vm-tools, so an upgrade is the best way to avoid this issue.

2. Manually add the missing udev rule and apply it. A reboot is necessary to apply the new rule (the hot command `udevadm control --reload-rules && udevadm trigger` didn’t work for me).

3. Use an rc.local file. By default, no rc.local file exists on the Photon-based appliance to run simple commands at every system startup, but it’s easy to find out where to create it by displaying the systemd rc-local service configuration:

```bash
systemctl cat rc-local
```

As the service configuration indicates, the `/etc/rc.d/rc.local` file must exist and be executable. Let’s do it!

```bash
vi /etc/rc.d/rc.local
```
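The contents of the file are not reproduced here. A minimal sketch of what it could contain, assuming the same sysfs path as the display command earlier and the 180-second target, might look like this. The `SYSFS_ROOT` and `SCSI_TIMEOUT` variables and the `set_scsi_timeouts` function name are illustrative choices, not something shipped with the appliance:

```bash
#!/bin/bash
# /etc/rc.d/rc.local (sketch): raise the SCSI command timeout at boot.
# SYSFS_ROOT and SCSI_TIMEOUT are illustrative knobs, not appliance settings.

set_scsi_timeouts() {
    local root="${SYSFS_ROOT:-/sys}"
    local value="${SCSI_TIMEOUT:-180}"
    local f
    for f in "$root"/class/scsi_generic/*/device/timeout; do
        # Skip the unexpanded glob (no devices) and non-writable attributes.
        if [ -w "$f" ]; then
            echo "$value" > "$f"
        fi
    done
    return 0
}

set_scsi_timeouts
```

Writing to sysfs only changes the running kernel's value, which is why the script has to run again at every boot.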

Once the file is saved, we change its permissions to make it executable:

```bash
chmod +x /etc/rc.d/rc.local
```

Then we enable the rc-local service at system startup:

```bash
systemctl enable rc-local
```

And we test it:

```bash
systemctl start rc-local
```
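To confirm that the service actually changed the values, a small check can compare each timeout with the expected one. This is only a sketch: `check_scsi_timeouts` and the `SYSFS_ROOT` override are hypothetical names introduced for illustration, and the path is the same sysfs location used by the display command earlier.

```bash
#!/bin/bash
# Report every SCSI device whose timeout differs from the expected value.
# Usage: check_scsi_timeouts [expected_seconds]   (defaults to 180)
check_scsi_timeouts() {
    local expected="${1:-180}"
    local root="${SYSFS_ROOT:-/sys}"
    local f current status=0
    for f in "$root"/class/scsi_generic/*/device/timeout; do
        [ -r "$f" ] || continue          # glob did not match any device
        current="$(cat "$f")"
        if [ "$current" != "$expected" ]; then
            echo "MISMATCH $f: $current (expected $expected)"
            status=1
        fi
    done
    return "$status"
}

# Example call: prints nothing when every device already matches.
check_scsi_timeouts 180 || echo "some devices still use another timeout"
```

A non-zero return code makes the function easy to reuse in a monitoring script or a post-boot sanity check.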

No restart is needed to apply the new timeout settings. At every system startup, the rc.local file will be executed and the timeout value raised from 30 seconds to 180. Each block device should now use a 180-second timeout for SCSI commands.
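For the udev option listed earlier, the rule has to be recreated by hand. The sketch below writes one to a rules file. It is modeled on the form open-vm-tools is understood to ship, which writes the value straight to sysfs instead of calling `vmware-toolbox`. The match keys and the `RULES_FILE` variable are assumptions to verify against your own build before use.

```bash
#!/bin/bash
# Sketch: recreate a udev rule that sets a 180-second SCSI timeout.
# RULES_FILE is an illustrative override; the default is the path named in the KB.
write_scsi_timeout_rule() {
    local target="${RULES_FILE:-/etc/udev/rules.d/99-vmware-scsi-udev.rules}"
    # The quoted 'EOF' stops this shell from expanding $DEVPATH while writing the file.
    cat > "$target" <<'EOF'
# Set the SCSI command timeout for VMware virtual disks (assumed match keys).
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware*", ENV{DEVTYPE}=="scsi_device", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/timeout'"
EOF
}

# Run as root on the appliance, then reboot so the rule is applied:
# write_scsi_timeout_rule
```

The function is deliberately not called automatically, since it overwrites the rules file at the target path.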

To conclude, the SCSI timeout value on vCSA 6.5u0 and PSC appliances can be increased by adding the missing udev rule or by using an rc.local file. An upgrade is the best way to avoid this issue. If you prefer to modify the configuration manually, note that the udev rule needed a reboot to take effect in my case, while the rc.local approach did not.