A system should be put in place to record, and regularly update, the contents of the server room racks.
The configuration of the various disk arrays needs to be dumped to a secure location at regular intervals.
Service data should be backed up, wherever possible, directly from the host providing the service.
A supply of spare server and fibre hardware needs to be kept available for cases where a VM cannot act as a suitable replacement.
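As an illustration of the rack-contents record recommended above, the following is a minimal sketch, not a proposed implementation. The field names, the 90-day review interval and the example host names are all assumptions made for illustration; nothing here reflects an existing tool.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class RackEntry:
    # All field names are hypothetical; adjust to whatever is actually recorded.
    rack: str          # rack label
    position_u: int    # lowest rack unit occupied
    height_u: int      # number of rack units occupied
    hostname: str
    role: str
    last_checked: str  # ISO date on which the entry was last verified


def stale_entries(entries, today, max_age_days=90):
    """Return hostnames whose record has not been verified within max_age_days."""
    return [
        e.hostname
        for e in entries
        if (today - date.fromisoformat(e.last_checked)).days > max_age_days
    ]


def to_json(entries):
    """Serialise the record so it can be copied to a secure location."""
    return json.dumps([asdict(e) for e in entries], indent=2)
```

Running stale_entries on a schedule would give a simple reminder of which racks are due for a physical check.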
Actions
Instigate a rack content recording system
Arrange for regular dumping of disk configuration information (services unit)
Arrange for service data to be backed up directly (services unit)
Carry out test restore of entire AFS partition (services unit)
Experiment with mounting AFS volumes on a different server (services unit)
Arrange for supply of replacement server and fibre hardware
Consider the best way to restore the AFS service to normal after promoting offsite RO volumes (services unit)
Consider what happens if ext3/4 journals are held on SSDs internal to the server while the “disks” are on a SAN elsewhere, and the machine and its SSDs (and so the journals) are destroyed but the file systems on the SAN survive. What would that mean when trying to bring the file systems back online on another machine without their journals?
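One approach worth testing for the lost-journal case is to force removal of the journal feature, run a full check, and then create a fresh journal. The sketch below only builds and prints the command sequence; it does not run anything. The device path is hypothetical, the function name is invented for illustration, and the procedure is an assumption to be verified on a scratch file system. Any transactions that existed only in the destroyed journal would be lost, so the check may find and repair inconsistencies.

```python
def lost_journal_recovery_commands(fs_device):
    """Build (but do not run) commands for an ext3/4 file system whose
    external journal device no longer exists.

    Assumed procedure, to be verified before relying on it:
      1. tune2fs with -f given twice forces removal of has_journal even
         though the journal is missing and may have needed replay.
      2. e2fsck -f forces a full consistency check.
      3. tune2fs -j adds a new internal journal.
    """
    return [
        ["tune2fs", "-f", "-f", "-O", "^has_journal", fs_device],
        ["e2fsck", "-f", fs_device],
        ["tune2fs", "-j", fs_device],
    ]


if __name__ == "__main__":
    # /dev/mapper/san_lun0 is a hypothetical SAN device path.
    for cmd in lost_journal_recovery_commands("/dev/mapper/san_lun0"):
        print(" ".join(cmd))
```

Keeping this as a dry run means the sequence can be reviewed and rehearsed against a test volume before being needed in a real recovery.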