Today I was upgrading my non-production vRealize Operations Manager (vROps) appliance from v6.7 to v7.0. However, when I applied the initial OS update .pak file via the admin interface, the upgrade immediately failed with the error message:

"source ./pak_python_wrapper.sh validate.py" failed.
Initially, a colleague of mine pointed me to the well-known disk space issues (KB article), but a quick look at the disk stats showed plenty of free space on all partitions:
vrops.example.com:~ # df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/sda3               16G  4.6G   11G  31% /
udev                   7.9G  124K  7.9G   1% /dev
tmpfs                  7.9G   16K  7.9G   1% /dev/shm
/dev/sda1              128M   39M   83M  32% /boot
/dev/mapper/data-core   20G  225M   19G   2% /storage/core
/dev/mapper/data-log    20G  9.1G  9.7G  49% /storage/log
/dev/mapper/data-db    217G  115G   91G  56% /storage/db
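If you want to rule out the disk space problem quickly, a small filter over `df -P` can flag any filesystem above a usage threshold. This is a minimal sketch; the function name `check_disk_usage` and the 80% default are my own assumptions, not anything taken from the VMware KB. The `-P` (POSIX) output format keeps each filesystem on a single line, so long `/dev/mapper/...` names do not break the column positions.

```shell
#!/bin/sh
# Hypothetical helper: print "mount-point Use%" for every filesystem whose
# usage is at or above the threshold (default 80). Reads `df -P` output on
# stdin so it can also be run against captured output.
# Usage: df -P | check_disk_usage 80
check_disk_usage() {
  threshold="${1:-80}"
  # Skip the header line; field 5 is the capacity (e.g. "56%"), field 6 the mount point.
  awk -v t="$threshold" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= t + 0) print $6 " " $5
    }'
}
```

On the appliance above, the fullest filesystem is `/storage/db` at 56%, so a check at 80% would print nothing, which matches the conclusion that disk space was not the problem.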
Then I stumbled across an article that pointed me at the /storage/log/vcops/log/pakManager/vcopsPakManager.root.post_validate.log log file. In this log I found the following errors about swap space:
INFO - ***swap_space_check*** --- Validating swap space on all nodes in the cluster.
DEBUG - Updating state file with check: swap_space_check, start_time: True, stop_time: False, result: None, result_desc: None
DEBUG - state file: "/storage/db/pakRepoLocal/vRealizeOperationsManagerEnterpriseVAOSUpgrade-70010098132/vRealizeOperationsManagerEnterpriseVAOSUpgrade-70010098132_validate.json"
DEBUG - Swap: 0 0 0
DEBUG - total: 0mb, used: 0mb
ERROR - swap_space_check failed, float division by zero
Traceback (most recent call last):
  File "validate.py", line 1974, in main
    result, result_description = getattr(envInfo, check_name)()
  File "validate.py", line 1393, in swap_space_check
    swap_percent = float(used)/float(total)
ZeroDivisionError: float division by zero
ERROR - float division by zero
DEBUG - Updating state file with check: swap_space_check, start_time: False, stop_time: True, result: False, result_desc: float division by zero
DEBUG - state file: "/storage/db/pakRepoLocal/vRealizeOperationsManagerEnterpriseVAOSUpgrade-70010098132/vRealizeOperationsManagerEnterpriseVAOSUpgrade-70010098132_validate.json"
ERROR - Failed running upgrade: ZeroDivisionError('float division by zero',) is not JSON serializable
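Rather than scrolling through the whole log, you can pull out just the validation checks that failed. The sketch below assumes failed checks are always logged in the form `ERROR - <check_name> failed`, as `swap_space_check` is above; the function name `failed_checks` is hypothetical, and the log path is the one from this post.

```shell
#!/bin/sh
# Log path from the post; pass a different file as $1 to override it.
LOG=/storage/log/vcops/log/pakManager/vcopsPakManager.root.post_validate.log

# Hypothetical helper: list each check reported as "ERROR - <check> failed",
# one name per line, without duplicates. LC_ALL=C keeps [a-z] to plain ASCII.
failed_checks() {
  LC_ALL=C grep -o 'ERROR - [a-z0-9_]* failed' "${1:-$LOG}" \
    | awk '{ print $3 }' \
    | sort -u
}
```

Running `failed_checks` against the log above would print `swap_space_check`; from there, `grep -n swap_space_check "$LOG"` shows the surrounding DEBUG lines such as `total: 0mb, used: 0mb`.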
I then ran top to check swap usage and, sure enough, it showed no swap at all:
vrops.example.com:~ # top
top - 15:06:30 up 36 min, 1 user, load average: 0.24, 0.22, 0.29
Tasks: 107 total, 1 running, 106 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16081M total, 3514M used, 12567M free, 52M buffers
Swap: 0M total, 0M used, 0M free, 2782M cached
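The traceback shows why this breaks the validator: `swap_percent = float(used)/float(total)` with `total` equal to 0. If you want a non-interactive pre-flight check before applying a .pak file, the sketch below computes the same used/total ratio from `SwapTotal` and `SwapFree` in `/proc/meminfo`, but reports zero swap as a failure instead of dividing by it. The function name and output format are my own assumptions; `free -m` or `swapon -s` would give you the same information by eye.

```shell
#!/bin/sh
# Hypothetical pre-flight check. Reads a meminfo-style file (default
# /proc/meminfo, values in kB), prints a one-line verdict, and returns
# non-zero when no swap is configured.
swap_check() {
  awk '
    /^SwapTotal:/ { total = $2 }
    /^SwapFree:/  { free  = $2 }
    END {
      # Guard against the case that crashed validate.py: total == 0.
      if (total == 0) { print "FAIL: no swap configured"; exit 1 }
      printf "OK: %d MB total, %.1f%% used\n", total / 1024, (total - free) * 100 / total
    }' "${1:-/proc/meminfo}"
}
```

On the appliance in the state shown above this would print `FAIL: no swap configured` and return 1, which makes it easy to use as a gate in a script.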
Running fdisk -l showed which partition was meant to be used for swap (/dev/sda2, Id 82):
vrops.example.com:~ # fdisk -l /dev/sda

Disk /dev/sda: 21.5 GB, 21474836480 bytes
64 heads, 32 sectors/track, 20480 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000cdac4

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048      272383      135168   83  Linux
/dev/sda2          272384     8675327     4201472   82  Linux swap / Solaris
/dev/sda3   *     8675328    41943039    16633856   83  Linux
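To pick out the swap partition without reading the table by hand, you can filter the `fdisk -l` output on the Id column, where `82` means Linux swap. This is a sketch that assumes the DOS/MBR-style listing shown above (GPT disks are listed differently), and the function name is hypothetical. Note that the boot flag `*` on `/dev/sda3` adds an extra column, so the Id shifts from field 5 to field 6 on that line.

```shell
#!/bin/sh
# Hypothetical helper: print every partition whose fdisk Id is 82 (Linux swap).
# Reads `fdisk -l` output on stdin, e.g.: fdisk -l /dev/sda | swap_partitions
swap_partitions() {
  awk '$1 ~ /^\/dev\// {
    # A boot flag "*" in column 2 pushes the Id column one field to the right.
    id = ($2 == "*") ? $6 : $5
    if (id == "82") print $1
  }'
}
```

Fed the listing above, this prints `/dev/sda2`.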
I then ran the following to reinitialize the swap partition and re-enable it:
vrops.example.com:~ # mkswap -L SWAP-sda2 /dev/sda2
Setting up swapspace version 1, size = 4201468 KiB
LABEL=SWAP-sda2, UUID=f9eaa199-8121-4838-b277-81ca2820f64a
vrops.example.com:~ # swapon -a
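Because `mkswap` overwrites whatever is on the target partition, it can be worth wrapping the two commands so they are printed before they are actually run. The sketch below does that with a dry-run switch that defaults to on; `DEV` and `LABEL` mirror the values from this post, while `DRY_RUN`, `run` and `enable_swap` are hypothetical names of my own. Also keep in mind that `swapon -a` only activates swap entries listed in `/etc/fstab`; if the partition is not listed there, `swapon /dev/sda2` activates that one device directly.

```shell
#!/bin/sh
# WARNING: mkswap destroys any data on $DEV. Double-check the device first
# (fdisk Id 82, not mounted) before running with DRY_RUN=0 as root.
DEV="${DEV:-/dev/sda2}"
LABEL="${LABEL:-SWAP-sda2}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

enable_swap() {
  run mkswap -L "$LABEL" "$DEV"   # write a fresh swap signature with a label
  run swapon -a                   # activate every swap entry listed in /etc/fstab
  run cat /proc/swaps             # list active swap devices to confirm the result
}
```

Calling `enable_swap` on its own only prints the three commands; `DRY_RUN=0 enable_swap` executes them.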
Finally, I checked top again, and as you can see, 4102M of swap is now available:
vrops.example.com:~ # top
top - 15:07:13 up 37 min, 1 user, load average: 0.21, 0.22, 0.29
Tasks: 106 total, 1 running, 105 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16081M total, 3515M used, 12566M free, 52M buffers
Swap: 4102M total, 0M used, 4102M free, 2782M cached
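Since `swapon -a` only picks up devices listed in `/etc/fstab`, it may also be worth confirming that the swap partition has an active (uncommented) fstab entry, so that it comes back after a reboot. The sketch below checks for an entry by device path or by `LABEL=`; the function name is hypothetical. It does not match `UUID=` entries, and if your fstab refers to swap by UUID, note that `mkswap` writes a new UUID (the one printed in its output), so such an entry would need updating.

```shell
#!/bin/sh
# Hypothetical check: succeed if an fstab-style file (default /etc/fstab)
# has an uncommented swap entry for the given device or label.
# Usage: has_swap_entry /dev/sda2 SWAP-sda2 [fstab-file]
has_swap_entry() {
  awk -v dev="$1" -v label="LABEL=$2" '
    /^[ \t]*#/ { next }                       # ignore commented-out lines
    $3 == "swap" && ($1 == dev || $1 == label) { found = 1 }
    END { exit (found ? 0 : 1) }' "${3:-/etc/fstab}"
}
```

For example, `has_swap_entry /dev/sda2 SWAP-sda2 && echo "swap is persistent"` prints the message only when a matching entry exists.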
Sure enough, when I retried the update it completed without issue. Hopefully this saves someone the couple of hours of digging around online that I ended up doing today.