web.inf.ed.ac.uk disaster recovery
The VM dripped.inf at KB is configured to take a mirror of the live
web.inf.ed.ac.uk site every 2 hours. Cron runs the script
/disk/data/scripts/update-edweb-dr
to do this. The script is mastered at the same location on the live
web.inf machine, so make any changes there and then either copy them
to the DR machine or wait for the DR machine to copy the script itself
at the next cron run.
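To confirm the 2-hourly mirror is still running, it can help to check how old the mirrored data is. The following is a minimal sketch, not part of the existing scripts: the function name is_stale is hypothetical, it relies on GNU stat, and which path the mirror script reliably touches on every run is an assumption you would need to verify.

```shell
#!/bin/sh
# Sketch only: succeed (exit 0) if PATH was last modified more than
# MAX_AGE seconds ago. is_stale is a hypothetical helper, and the
# choice of path to check is an assumption.
is_stale() {
    path=$1
    max_age=${2:-7200}                        # default: 2 hours, the mirror interval
    now=$(date +%s)
    mtime=$(stat -c %Y "$path") || return 2   # GNU stat; 2 = could not tell
    [ $((now - mtime)) -gt "$max_age" ]
}
```

For example, on the DR machine something like `is_stale /disk/data/edweb 9000 && echo "mirror looks stale"` would allow half an hour of slack beyond the cron interval, assuming that directory's modification time changes on each mirror run.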
While the DR machine is not standing in for web.inf, it responds as
web-dr.inf.ed.ac.uk. The external firewall holes are closed by
default.
If the main web.inf.ed.ac.uk site becomes unavailable for some reason,
switch to the DR mirror as follows:
- edit dripped's profile and
#define I_AM_WEB_INF_STANDIN
- update the DNS to point web.inf at dripped.inf
Do the above steps in that order. Editing the profile first means that,
once the profile has been pushed, dripped stops trying to do further
mirrors from web.inf (which it is about to become).
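After updating the DNS, you may want to confirm that web.inf now resolves to the DR machine. The sketch below is illustrative only: points_at, resolve and the RESOLVER hook are hypothetical names, and the fully qualified name dripped.inf.ed.ac.uk is an assumed expansion of dripped.inf.

```shell
#!/bin/sh
# Sketch only: succeed if two hostnames currently resolve to the same
# set of IPv4 addresses. All function names here are hypothetical.
resolve() {
    # dig +short prints any CNAME target followed by the addresses;
    # keep only the lines that look like IPv4 addresses.
    dig +short "$1" A | grep -E '^[0-9.]+$' | sort
}

points_at() {
    # RESOLVER can be set to another function for testing;
    # by default the dig-based resolve above is used.
    a=$(${RESOLVER:-resolve} "$1")
    b=$(${RESOLVER:-resolve} "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage might look like `points_at web.inf.ed.ac.uk dripped.inf.ed.ac.uk && echo "web.inf now points at the DR machine"`. The answer depends on which resolver you ask, so a cached record can still show the old address for a while.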
The content should be no more than 2 hours out of date, and the standin
will be a full service, i.e. editors will be able to edit content etc.
As the standin will be a full service, either people will have to be
asked NOT to make changes, so that you can simply switch back, or you
will have to migrate changes made on the DR machine back to the
regular live machine.
It is assumed that you are switching because the regular web.inf is
unavailable. You don't want it to be possible for people to update
both the live and standin sites, as content will be lost when you
switch back. If you are switching to the DR for convenience, e.g.
scheduled maintenance affecting the live service (rather than an
actual disaster), then make sure you stop apache on the live service.
If you need to migrate changes back to the live service
If changes have been made on the DR machine that need to be migrated
back to the live service, the basic steps are:
- stop apacheconf and mysql on both machines
- copy /var/lib/mysql/ to /disk/data/mysql/ on DR machine (this is because /var/lib/mysql is where the DB lives on the DR machine)
- copy /disk/data from DR machine back to the live machine
- start apacheconf and mysql on the live machine
- update the DNS so web.inf points at the live machine
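The steps above can be sketched as a dry-run script. This is an assumption-laden outline rather than a tested procedure: LIVE_HOST is a placeholder for the live machine's hostname, the use of `om <component> stop` and `om <component> start` to control the components is assumed, as is ssh access from the DR machine to the live machine, and rsync is just one way of doing the copies. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: the migrate-back sequence, run from the DR machine.
# With DRY_RUN=1 (the default) commands are printed, not executed.
# LIVE_HOST is a placeholder; set it to the live machine's hostname.
DRY_RUN=${DRY_RUN:-1}
LIVE_HOST=${LIVE_HOST:-LIVE_HOST_PLACEHOLDER}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_back() {
    # 1. Stop apacheconf and mysql on both machines.
    run om apacheconf stop
    run om mysql stop
    run ssh "$LIVE_HOST" om apacheconf stop
    run ssh "$LIVE_HOST" om mysql stop
    # 2. The DR database lives in /var/lib/mysql; copy it to where
    #    the live machine expects to find it.
    run rsync -a /var/lib/mysql/ /disk/data/mysql/
    # 3. Copy the data back to the live machine.
    run rsync -a /disk/data/ "$LIVE_HOST:/disk/data/"
    # 4. Start the services on the live machine only.
    run ssh "$LIVE_HOST" om mysql start
    run ssh "$LIVE_HOST" om apacheconf start
    # 5. The DNS update is a manual step and is not scripted here.
}
```

Calling `migrate_back` with the defaults prints the eight commands so they can be reviewed before running anything for real with DRY_RUN=0. The rsync calls deliberately omit --delete, so files that exist only on the live machine are left in place.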
While the DNS change propagates, browsers still resolving web.inf to
the DR machine will get a "not responding" error. If this is an issue,
you could restart apache on the DR machine, but any edits made there
will be lost when you switch the DR machine back into standby mode. To
avoid that, you could put the "old" (standin) site into maintenance
mode, so that people do at least get a web page:
cd /disk/data/edweb
drush vset maintenance_mode 1
Set it back to "0" to exit maintenance mode.
Once it all seems to be working on the live site again, remove the
#define I_AM_WEB_INF_STANDIN
from dripped's profile so that it starts
mirroring the live site again. Note that at this point any data on the
DR machine will be overwritten, so make sure you have copies of
anything you need. Also, if you put the standin site into maintenance mode as above, that will be undone by the first new copy of the data from the live site.
Bare "metal" restore
If you need to restore the service, and for some reason the DR site is
also unavailable, then you will need to use the last regular mirror
and/or tape backup of the data. In the steps below it is assumed the
lost data lived in /disk/data/ (the usual location), and would be
restored to the same place. This can be changed by setting
#define EDWEB_DATA_DIR
to the actual path.
The basic steps would be:
- Install a new machine (or VM) with a similar profile to the lost
machine, but without the live/edweb-school.h header active. A
standard small server with at least a 50GB disk should be fine.
- Once the new machine is up and running, restore the last mirror
or tape backup of the previous machine's /disk/data/
- It should now be safe to activate the live/edweb-school.h header.
- Run updaterpms. Try starting mysql and make sure things like
om mysql runcommand
work. Once they do, continue.
- Reboot the new machine, as it is the simplest way to make sure
all the correct components are started and configured.
- That should be it. Assuming the DNS for web.inf points at the
new machine, you should have your site back. If not, check that mysql
and apache are up and running, and check their error logs.
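A quick way to check the final step is to request the front page and look at the HTTP status. The sketch below is an illustration only: http_ok and check_site are hypothetical names, and the https URL is an assumption about how the site is served.

```shell
#!/bin/sh
# Sketch only: basic "is the site answering" check after a restore.
# Function names are hypothetical; the default URL is an assumption.
http_ok() {
    # Treat any 2xx or 3xx status as "the site is answering".
    case $1 in
        2??|3??) return 0 ;;
        *)       return 1 ;;
    esac
}

check_site() {
    url=${1:-https://web.inf.ed.ac.uk/}
    # curl reports 000 if it could not connect at all.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
    if http_ok "$code"; then
        echo "OK: $url returned $code"
    else
        echo "FAIL: $url returned $code; check mysql, apache and their error logs"
        return 1
    fi
}
```

Running `check_site` with no argument tests the assumed default URL. A site left in maintenance mode will normally answer with a 503 and so be reported as a failure, which is a useful reminder to switch maintenance mode off.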
--
NeilBrown - 05 Jun 2018