Tobermory sl6_64 Upgrade Plan
This is the detailed plan for the upgrade of tobermory to SL6.
Preparation
- Open a bunch of logins to tobermory, both by ssh and on the console.
- Run nsu in some of them, including the console one.
- Nagios has already been pacified.
Shut Down The Service
- Uncomment the following in lcfg/tobermory:
/* Restrict access to subversion to MPU only */
!subversion.authzentries_lcfg mSET(root)
!subversion.authzallow_lcfg_root mSET(mpu ALL)
!subversion.authzallowperms_lcfg_root_mpu mSET(rw)
!subversion.authzentries_dice mSET(root)
!subversion.authzallow_dice_root mSET(mpu ALL)
!subversion.authzallowperms_dice_root_mpu mSET(rw)
!subversion.authzentries_source mSET(root)
!subversion.authzallow_source_root mSET(mpu ALL)
!subversion.authzallowperms_source_root_mpu mSET(rw)
/* Stop rsync and rfe from starting automatically */
!boot.services mREMOVE(lcfg_rsync)
!boot.services mREMOVE(lcfg_rfe)
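For reference, the intent of the authz resources above is roughly the following per-repository access rules (a sketch of the expected effect, not the component's literal output; the mapping of ALL to '*' is an assumption):

```
[/]
@mpu = rw
* =
```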
- Wait for tobermory's new profile to reach it.
At this point non-MPU access to svn stops.
- Stop the LCFG client on tobermory:
- om client stop
- less /var/lcfg/log/client
This protects tobermory from the following step.
- Alter lcfg/tobermory to use os/sl6_64.h.
- Wait for pxeserver (/var/lcfg/log/pxeserver on schiff) and dhcpd (/etc/dhcpd.conf on abbado) to update.
Logins to tobermory may break at this point (though probably not, since we've stopped the client), but existing services and sessions will keep running.
Access to rfe lcfg/foo has now stopped.
- Run om server stop on all slaves (mousa, trondra, vole, circlevm12, sernabatim, benaulim), but keep apacheconf running.
Profile building has now stopped. Slaves will continue to serve profiles but the profiles won't change.
Make Final Backups
- On tobermory as root:
- /usr/bin/om subversion dumpdb -- -r lcfg -d /var/lcfg/svndump/lcfg -o lcfg.sl5_final
- /usr/bin/om subversion dumpdb -- -r dice -d /var/lcfg/svndump/dice -o dice.sl5_final
- /usr/bin/om subversion dumpdb -- -r source -d /var/lcfg/svndump/source -o source.sl5_final
All three subversion repositories will now have been dumped to a copyable format.
- On tobermory as postgres:
- pg_dump orders > /var/rfedata/orders.sl5_final_backup
The orders database is now dumped.
- Run rmirror on sauce.
- om rmirror run lcfghdrs lcfgrfedata lcfgstablerelease lcfgtestingrelease svndatadir lcfgsvn autocheckout lcfgreleases
- less /var/lcfg/log/rmirror
This will have backed up these directories (rsync modules) from tobermory to sauce:
/var/rfedata (lcfgrfedata)
/var/lcfg/releases/stable (lcfgstablerelease)
/var/lcfg/releases/testing (lcfgtestingrelease)
/var/svn (svndatadir)
/var/lcfg/svndump (lcfgsvn)
/var/lib/autocheckout (autocheckout)
/var/cache/lcfgreleases (lcfgreleases)
- Stop the DR mirroring.
- ssh sauce
- nsu
- crontab -e
- and remove the '0,15,30,45 * * * * /usr/bin/om rmirror run' line.
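If an interactive edit is inconvenient, the line can be filtered out non-interactively (a hedged sketch; filter_rmirror is a helper invented here, and it assumes the crontab contains the single rmirror line quoted above):

```shell
# Sketch: remove the rmirror entry without opening an editor.
# filter_rmirror is a hypothetical helper, not an existing command.
filter_rmirror() { grep -v '/usr/bin/om rmirror run'; }

# In production this would be:  crontab -l | filter_rmirror | crontab -
# Demonstration on a sample crontab:
printf '%s\n' \
    'MAILTO=root' \
    '0,15,30,45 * * * * /usr/bin/om rmirror run' \
  | filter_rmirror
```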
- Back up tobermory's / and /var partitions completely.
They have rsync modules defined as follows:
[root]
readonly=yes
hosts allow=sauce.inf.ed.ac.uk
hosts deny=*
path=/
uid=0
[var]
readonly=yes
hosts allow=sauce.inf.ed.ac.uk
hosts deny=*
path=/var
uid=0
using these resources:
!rsync.modules mEXTRA(root)
rsync.mentries_root readonly allow deny path uid
rsync.mentry_root_readonly readonly=yes
rsync.mentry_root_allow hosts allow=sauce.inf.ed.ac.uk
rsync.mentry_root_deny hosts deny=*
rsync.mentry_root_path path=/
rsync.mentry_root_uid uid=0
!rsync.modules mEXTRA(var)
rsync.mentries_var readonly allow deny path uid
rsync.mentry_var_readonly readonly=yes
rsync.mentry_var_allow hosts allow=sauce.inf.ed.ac.uk
rsync.mentry_var_deny hosts deny=*
rsync.mentry_var_path path=/var
rsync.mentry_var_uid uid=0
and are being backed up to sauce:/disk/useful/tobermory/backups using this script:
- As root on sauce, run:
- /disk/useful/tobermory/backups/run-backups
The script contains this:
#!/bin/bash
/usr/bin/rsync -v -a -A -X -x -x -S tobermory.inf.ed.ac.uk::root/ /disk/useful/tobermory/backups/root/
/usr/bin/rsync -v -a -A -X -x -x -S tobermory.inf.ed.ac.uk::var/ /disk/useful/tobermory/backups/var/
# -v verbose
# -a do the sensible stuff
# -A preserve ACLs
# -X preserve extended attributes
# -x don't cross filesystem boundaries
# -x and omit mountpoints
# -S handle sparse files properly
After running the script, all data has now been backed up.
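Before wiping the machine it may be worth confirming that the copy really matches the source, for example with a recursive diff (a hedged sketch; verify_backup is a helper invented here and the paths below are throwaway examples - on sauce the arguments would be the module contents and /disk/useful/tobermory/backups/root):

```shell
# Sketch: confirm that two directory trees match before trusting a backup.
# verify_backup is a hypothetical helper; paths below are temporary examples.
verify_backup() {
    diff -r "$1" "$2" >/dev/null   # exit 0 only when contents are identical
}

src=$(mktemp -d); dst=$(mktemp -d)
echo canary > "$src/f"
cp -a "$src/f" "$dst/f"
verify_backup "$src" "$dst" && echo "backup matches"
```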
Installation
- Install SL6_64 on tobermory.
Recover The Data
- Log in to tobermory
- Make /var/lcfg/svndump (and other important directories) if not already done:
- om file configure
- less /var/lcfg/log/file
- Restore all of /var/lcfg/svndump:
- rsync -v -a -A -X -x -x -S sauce::tobermoryvar/lcfg/svndump/ /var/lcfg/svndump/
Subversion dumps are now present.
- Restore /var/rfedata:
- rsync -v -a -A -X -x -x -S sauce::tobermoryvar/rfedata/ /var/rfedata/
The LCFG source files exist again.
- Restore /var/lcfg/releases:
- rsync -v -a -A -X -x -x -S sauce::tobermoryvar/lcfg/releases/ /var/lcfg/releases/
The stable and testing releases are now there.
- Restore /var/cache/lcfgreleases:
- rsync -v -a -A -X -x -x -S sauce::tobermoryvar/cache/lcfgreleases/ /var/cache/lcfgreleases/
Stable releases cache restored.
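The four restores above follow a single pattern, so they could equally be driven by a small loop (a hedged sketch; restore_dirs is a helper invented here, and setting RSYNC=echo prints the commands instead of running them):

```shell
# Sketch: one loop for the four restores above.
# restore_dirs is hypothetical; override RSYNC to dry-run, as demonstrated.
RSYNC=${RSYNC:-/usr/bin/rsync}

restore_dirs() {
    local d
    for d in "$@"; do
        $RSYNC -v -a -A -X -x -x -S "sauce::tobermoryvar/$d/" "/var/$d/"
    done
}

# Dry run: print the commands that would be executed.
RSYNC=echo restore_dirs lcfg/svndump rfedata lcfg/releases cache/lcfgreleases
```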
- Start the subversion component if not already started:
- om subversion start
The repository exists once more, though it's empty as yet.
- Reload the lcfg repository:
- svnadmin load /var/svn/lcfg < /var/lcfg/svndump/lcfg/lcfg.sl5_final
The lcfg repository now contains our data.
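After each load it is worth sanity-checking the repository before opening it up again (a hedged sketch; check_repo is a helper invented here, and it skips quietly on machines where svnadmin or the repository is unavailable):

```shell
# Sketch: sanity-check a repository restored with 'svnadmin load'.
# check_repo is hypothetical; it skips when svnadmin or the repo is absent.
check_repo() {
    local repo=$1
    if command -v svnadmin >/dev/null 2>&1 && [ -d "$repo" ]; then
        svnadmin verify "$repo"     # walks every revision, checking checksums
        svnlook youngest "$repo"    # prints the head revision number
    else
        echo "check skipped: svnadmin or $repo not available"
    fi
}

check_repo /var/svn/lcfg
```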
- Recreate the pre-commit and post-commit hooks:
This should make links from /var/svn/lcfg/hooks/pre-commit to /usr/lib/lcfg/lcfg-svn-hooks/pre-commit, and from /var/svn/lcfg/hooks/post-commit to /usr/lib/lcfg/lcfg-svn-hooks/post-commit.
The lcfg repository hooks have now been restored.
Start The Service Running
- Restart apacheconf on tobermory.
- om apacheconf restart and check the log.
Apache may previously have failed to start because the svn repositories were absent. If apacheconf fails to start at this point, that may indicate a problem with the restored data.
- Check that autocheckout is working.
- Check out something from the repository; change it; commit it.
- Look for your change (or anything) in
/var/lib/autocheckout/lcfg
.
- If this doesn't work, check that the permissions and ownership on the autocheckout directory match those on the svn repository sufficiently to allow the apache account permission to do a check out, e.g.:
[tobermory]root: ls -ld /var/svn/lcfg /var/lib/autocheckout/lcfg /var/lib/autocheckout/lcfg/lcfg
drwxrwsr-x 3 apache lcfgsvn 4096 Mar 17 2008 /var/lib/autocheckout/lcfg
drwxrws--- 7 root apache 4096 Mar 17 2008 /var/svn/lcfg
drwxrwxr-x 5 apache lcfgsvn 4096 Mar 17 2008 /var/lib/autocheckout/lcfg/lcfg
Also check
/var/lib/autocheckout
itself - it should have this owner, group & permissions:
drwxrwxr-x 3 root lcfg 4096 Mar 17 2008 /var/lib/autocheckout
Make a functioning new one as follows:
cd /var/lib/
mv autocheckout autocheckout-aside
mkdir autocheckout
chown root:lcfg autocheckout
chmod 775 autocheckout
om file configure
/var/lib/autocheckout should now be populated.
The develop and default releases should now be there.
The MPU should now have full access to svn.
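The mode and ownership checks above can also be asserted mechanically (a hedged sketch using GNU stat; expect_mode is a helper invented here, demonstrated on a throwaway directory - on tobermory the call would be expect_mode /var/lib/autocheckout "775 root:lcfg", matching the listing above):

```shell
# Sketch: assert a directory's mode and ownership in one check.
# expect_mode is a hypothetical helper; the directory below is a temp example.
expect_mode() {
    [ "$(stat -c '%a %U:%G' "$1")" = "$2" ]
}

d=$(mktemp -d)
chmod 775 "$d"
expect_mode "$d" "775 $(id -un):$(id -gn)" && echo "permissions ok"
```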
- Start the rfe component on tobermory
CO access to lcfg/foo is restored.
- Start the rsync component on tobermory
Changes are now available to slaves again.
- Remove the boot.services alterations from lcfg/tobermory
- Remove subversion.authzallow restrictions from lcfg/tobermory
This will set the repository access back to normal once tobermory has its new profile.
- Delete caches on main LCFG slaves mousa and trondra to speed up rebuilds:
- rm -f /var/lcfg/conf/server/cache/*
- Gentlemen, start your engines.
- om mousa.server start
- om trondra.server start
- om vole.server start
- om sernabatim.server start
- om benaulim.server start
- om circlevm12.server start
Mammoth rebuilds now hopefully start.
- Mail a progress report to cos
COs will now have access to rfe but not yet to subversion; that must wait until tobermory has its new profile.
Restore the dice and source repositories too
- Reload the dice repository:
- svnadmin load /var/svn/dice < /var/lcfg/svndump/dice/dice.sl5_final
The dice repository has now been restored. There are no commit hooks to restore.
- Reload the source repository:
- svnadmin load /var/svn/source < /var/lcfg/svndump/source/source.sl5_final
- Restore svn hooks:
- om file configure
- Check the hooks in /var/svn/source/hooks.
Final Touches
- Wait for the rebuilds to finish. This is expected to take 80 minutes or so, provided the caches are empty.
CO access to svn has been restored.
The LCFG service is now functional.
Profile-building is now back to normal.
- Re-enable the 15-minute rmirror cron job on sauce if LCFG hasn't already done it.
- ssh sauce
- nsu
- crontab -l
- and check that the '0,15,30,45 * * * * /usr/bin/om rmirror run' line is there.
- Check logs for errors after first rmirror run.
The DR arrangements are back in place.
- Announce to COs & LCFG deployers
- Ask Alastair to restore the ordershost database.
--
ChrisCooke - 10 Apr 2012