How to restore the service

The step-by-step, hand-holding documentation isn't here yet. It will be fleshed out "real soon now". For now, here are some brief pointers:

  • Find a suitable lump of hardware. It could be a VM, if you're happy about its resilience. It will need 100+GB of extra disk space (at the time of writing) to restore the data to. More would be better.
  • If possible, have this machine answer to the same IP addresses that wafer did; otherwise more fixups will be needed.
  • Copy the non-hardware-specific bits from wafer's profile (wafer is the current www.inf server) and reinstall. You may want to install a basic profile first, and then add the bits from wafer, especially all the extra network port stuff.
  • To keep things simple, have your extra 100+GB disk space mounted as /disk/data/. If you don't mount it there, then you will need to update some of the #defines in the profile, and update paths mentioned below to your location.
  • www.inf uses a "proper" Comodo SSL cert. The Infrastructure Unit is the keeper of the .crt and .key files, which need to be copied to /etc/httpd/conf/ssl.crt/, /etc/httpd/conf/ssl.key/ and /etc/httpd/conf/ssl.crt/comodo.chain. The offsite DR machine (see below) also has copies in those locations.
  • Now restore wafer's /disk/data to your machine, either from the mirror (currently) on lammasu:/disk/rmirror16/www.inf/ or from the tape backups. But see info on the offsite DR machine later.
  • The restored data will contain Postgres and Zope/Plone databases in a non-consistent state. You need to rebuild those from the regular snapshots that are taken (and are included in the restored data). See the sections below.
  • Once you've rebuilt Postgres and Zope, that should be it: all the data should be back, and apache should be serving the correct content. I'd reboot, just to make sure, then try some URLs to check.
  • Your new machine should have an AFS id, and be part of the AFS group system:infmainweb, or some small parts will not be accessible. This is mainly for the RAT unit.
  • If all looks well, and www.inf has been off the air while you were recovering the system, then it should just work. If you've updated the DNS to point www.inf at a stand-in service (see ServiceUnitWwwOffsiteDrPlan), or if your new machine is listening on different IP addresses, then you'll need to update the DNS. If the IP addresses are different, you'll also need to update the apache config and the zope/plone config to use those IP addresses.
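
Before starting the restore it may be worth confirming that the data area really has the space mentioned above. The following is only a sketch: has_free_gb is a hypothetical helper name (not part of any existing tooling), and the /disk/data path and the 100GB figure are simply taken from the pointers above.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not existing tooling):
# succeed if the filesystem holding directory $1 has at least $2 GB free.
has_free_gb() {
    dir=$1
    want_gb=$2
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    # If df failed (for example the directory is missing), treat it as "not enough".
    [ -n "$avail_kb" ] || return 1
    [ "$avail_kb" -ge $((want_gb * 1024 * 1024)) ]
}

# Example use on the new machine:
#   has_free_gb /disk/data 100 || echo "less than 100GB free under /disk/data" >&2
```

If you mounted the extra disk somewhere other than /disk/data/, pass that location instead.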

Rebuilding PostgreSQL

The mirrors and tape backups contain a non-quiescent copy of the database, and a postgres dump of the database. You need to throw away the non-quiescent live copy and rebuild it from the dump. As root on the new machine do:

  • dropdb -U postgres osdb
  • Find the most recent dump file in /disk/data/infweb/osdb/db_dump/, say "osdb-Mon.sql" in this example.
  • Restore the DB with: grep -Ev "^(CREATE|DROP) ROLE " /disk/data/infweb/osdb/db_dump/osdb-Mon.sql | psql -U postgres

That should be it.
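
The "find the most recent dump" and "filter out the role statements" steps above could be wrapped up roughly as follows. This is a sketch only: latest_dump and strip_role_statements are hypothetical helper names, and it assumes the dump files end in .sql (as in the osdb-Mon.sql example) and that file modification time is a fair guide to which dump is newest.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified *.sql file in directory $1.
# Assumes dump file names contain no newlines.
latest_dump() {
    ls -t "$1"/*.sql 2>/dev/null | head -n 1
}

# The same filter as in the restore step above: drop lines that start with
# "CREATE ROLE " or "DROP ROLE " and pass everything else through unchanged.
strip_role_statements() {
    grep -Ev '^(CREATE|DROP) ROLE '
}

# Example use, as root on the new machine, after the dropdb step:
#   dump=$(latest_dump /disk/data/infweb/osdb/db_dump)
#   echo "restoring from $dump"
#   strip_role_statements < "$dump" | psql -U postgres
```

Check the file name it prints before piping anything into psql; if the restored files all have the same timestamp, pick the dump by hand instead.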

Rebuilding Zope Data

Again, the mirrors and tape backups contain a non-quiescent copy of the Zope database, and a restorable dump of the database. You need to throw away the non-quiescent live copy and rebuild it from the dump. As root on the new machine, run the following (each bullet is a single command line):

  • om zope stop # make sure there are no zope processes running
  • rm -rf /disk/data/plone/zope
  • /usr/lib/zope/bin/ -d /disk/data/plone/zope -u foo:bar
  • cp -af /disk/data/plone/zopebackup/_disk_data_plone_zope/instance/{etc,Extensions,import,Products} /disk/data/plone/zope
  • /usr/lib/zope/bin/ -R -r /disk/data/plone/zopebackup/_disk_data_plone_zope/repozo-backup -o /disk/data/plone/zope/var/Data.fs
  • chown -R zope /disk/data/plone/zope
  • om zope start # check it started up OK

That should be it.
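
Since the second step removes the live Zope tree, it may be worth checking first that the restored backup actually contains everything the later steps read from. A minimal sketch, assuming the layout implied by the commands above; zope_backup_complete is a hypothetical helper name, and it only checks that each path exists, not that the contents are usable.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the backup tree under $1
# contains every path that the copy and restore steps above read from.
zope_backup_complete() {
    backup=$1
    for sub in instance/etc instance/Extensions instance/import \
               instance/Products repozo-backup; do
        if [ ! -e "$backup/$sub" ]; then
            echo "missing: $backup/$sub" >&2
            return 1
        fi
    done
    return 0
}

# Example use, before the rm -rf step:
#   zope_backup_complete /disk/data/plone/zopebackup/_disk_data_plone_zope \
#       || echo "backup looks incomplete, do not remove the live copy yet" >&2
```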

Off Site DR Copy

If the offsite DR copy of www.inf (see ServiceUnitWwwOffsiteDrPlan) is available, the mirrors it takes will hold more up-to-date copies of the www.inf data than the nightly mirror/backup of www.inf. So you may want to copy the restore data from there, currently canny:/disk/data/mirror/wwwinf/. Also look at the "fixup" scripts it runs to automate the restore of the Zope and Postgres databases. It also copes with IP address changes.

-- NeilBrown - 06 Mar 2012

Topic revision: r2 - 26 Mar 2012 - 16:29:08 - NeilBrown