Tobermory sl6_64 Upgrade Plan

This is the detailed plan for the upgrade of tobermory to SL6.

Preparation

  • Open a bunch of logins to tobermory, both by ssh and on the console.
  • nsu in some of them, including the console one.
  • Nagios has already been pacified.

Shut Down The Service

  • Uncomment the following in lcfg/tobermory:
/* Restrict access to subversion to MPU only */
!subversion.authzentries_lcfg                mSET(root)
!subversion.authzallow_lcfg_root               mSET(mpu ALL)
!subversion.authzallowperms_lcfg_root_mpu  mSET(rw)

!subversion.authzentries_dice                mSET(root)
!subversion.authzallow_dice_root               mSET(mpu ALL)
!subversion.authzallowperms_dice_root_mpu  mSET(rw)

!subversion.authzentries_source                mSET(root)
!subversion.authzallow_source_root               mSET(mpu ALL)
!subversion.authzallowperms_source_root_mpu  mSET(rw)

/* Stop rsync and rfe from starting automatically */
!boot.services mREMOVE(lcfg_rsync)
!boot.services mREMOVE(lcfg_rfe)
  • Wait for tobermory's new profile to reach it.

At this point non-MPU access to svn stops.

  • Stop the LCFG client on tobermory:
    • om client stop
    • less /var/lcfg/log/client

This protects tobermory from the following step.

  • Edit lcfg/tobermory to include os/sl6_64.h in place of the current OS header.

  • Wait for pxeserver (/var/lcfg/log/pxeserver on schiff) and dhcpd (/etc/dhcpd.conf on abbado) to update.

Logins to tobermory may break at this point (though probably not, since we've stopped the client), but existing services & sessions will keep running.

  • om rfe stop on tobermory

Access to rfe lcfg/foo has now stopped.

  • om server stop on all slaves (mousa, trondra, vole, circlevm12, sernabatim, benaulim) - but keep apacheconf running

Profile building has now stopped. Slaves will continue to serve profiles but the profiles won't change.

Make Final Backups

  • on tobermory as root:
    • /usr/bin/om subversion dumpdb -- -r lcfg -d /var/lcfg/svndump/lcfg -o lcfg.sl5_final
    • /usr/bin/om subversion dumpdb -- -r dice -d /var/lcfg/svndump/dice -o dice.sl5_final
    • /usr/bin/om subversion dumpdb -- -r source -d /var/lcfg/svndump/source -o source.sl5_final

All three subversion repositories will now have been dumped to a copyable format.
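As a quick sanity check before moving on, each dump file can be inspected for the standard header. This is a sketch: check_svn_dump is a hypothetical helper, not an existing om command, but the "SVN-fs-dump-format-version" first line is what svnadmin dump actually emits.

```shell
# Sketch: a svnadmin dump file must be non-empty and start with the
# standard "SVN-fs-dump-format-version" header; anything else suggests
# the dump failed or was truncated.
check_svn_dump() {
  dump="$1"
  [ -s "$dump" ] || { echo "EMPTY: $dump"; return 1; }
  head -n 1 "$dump" | grep -q '^SVN-fs-dump-format-version' \
    || { echo "BAD HEADER: $dump"; return 1; }
  echo "OK: $dump"
}

# For example, on tobermory:
# check_svn_dump /var/lcfg/svndump/lcfg/lcfg.sl5_final
```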

  • on tobermory as postgres:
    • pg_dump orders > /var/rfedata/orders.sl5_final_backup

The orders database is now dumped.
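The same kind of spot-check works for the orders dump. A sketch, with check_pg_dump as a hypothetical helper: plain-format pg_dump output opens with a "-- PostgreSQL database dump" comment, so its absence suggests a failed or truncated dump.

```shell
# Sketch: confirm the file is non-empty and carries pg_dump's
# "PostgreSQL database dump" banner in its opening comment lines.
check_pg_dump() {
  dump="$1"
  [ -s "$dump" ] && head -n 5 "$dump" | grep -q 'PostgreSQL database dump' \
    && echo "OK: $dump" || { echo "SUSPECT: $dump"; return 1; }
}

# For example, on tobermory:
# check_pg_dump /var/rfedata/orders.sl5_final_backup
```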

  • Run rmirror on sauce.
    • om rmirror run lcfghdrs lcfgrfedata lcfgstablerelease lcfgtestingrelease svndatadir lcfgsvn autocheckout lcfgreleases
    • less /var/lcfg/log/rmirror

This will have backed up these directories (rsync modules) from tobermory to sauce:

/var/rfedata (lcfgrfedata)
/var/lcfg/releases/stable (lcfgstablerelease)
/var/lcfg/releases/testing (lcfgtestingrelease)
/var/svn (svndatadir)
/var/lcfg/svndump (lcfgsvn)
/var/lib/autocheckout (autocheckout)
/var/cache/lcfgreleases (lcfgreleases)

  • Stop the DR mirroring.
    • ssh sauce
    • nsu
    • crontab -e
      • and remove the '0,15,30,45 * * * * /usr/bin/om rmirror run' line.

  • Back up tobermory's / and /var partitions completely.
    They have rsync modules defined as follows:
[root]
readonly=yes
hosts allow=sauce.inf.ed.ac.uk
hosts deny=*
path=/
uid=0

[var]
readonly=yes
hosts allow=sauce.inf.ed.ac.uk
hosts deny=*
path=/var
uid=0
using these resources:
!rsync.modules 			mEXTRA(root)
rsync.mentries_root 		readonly allow deny path uid
rsync.mentry_root_readonly	readonly=yes
rsync.mentry_root_allow		hosts allow=sauce.inf.ed.ac.uk
rsync.mentry_root_deny		hosts deny=*
rsync.mentry_root_path		path=/
rsync.mentry_root_uid		uid=0

!rsync.modules 			mEXTRA(var)
rsync.mentries_var			readonly allow deny path uid
rsync.mentry_var_readonly	readonly=yes
rsync.mentry_var_allow		hosts allow=sauce.inf.ed.ac.uk
rsync.mentry_var_deny		hosts deny=*
rsync.mentry_var_path		path=/var
rsync.mentry_var_uid		uid=0
and are being backed up to sauce:/disk/useful/tobermory/backups using this script:
  • as root on sauce:
    • /disk/useful/tobermory/backups/run-backups
The script contains this:
#!/bin/bash
/usr/bin/rsync -v -a -A -X -x -x -S tobermory.inf.ed.ac.uk::root/ /disk/useful/tobermory/backups/root/
/usr/bin/rsync -v -a -A -X -x -x -S tobermory.inf.ed.ac.uk::var/ /disk/useful/tobermory/backups/var/

# -v    verbose
# -a    do the sensible stuff
# -A    preserve ACLs
# -X    preserve extended attributes
# -x    don't cross filesystem boundaries
# -x    and omit mountpoints
# -S    handle sparse files properly

After running the script, all data has now been backed up.

Installation

  • Install SL6_64 on tobermory.

Recover The Data

  • Login to tobermory
  • Make /var/lcfg/svndump (and other important dirs) if not already done:
    • om file configure
    • less /var/lcfg/log/file
  • Restore all of /var/lcfg/svndump:
    • rsync  -v -a -A -X -x -x -S sauce::tobermoryvar/lcfg/svndump/ /var/lcfg/svndump/

Subversion dumps are now present.

  • Restore /var/rfedata:
    • rsync  -v -a -A -X -x -x -S sauce::tobermoryvar/rfedata/ /var/rfedata/

The LCFG source files exist again.

  • Restore /var/lcfg/releases:
    • rsync  -v -a -A -X -x -x -S sauce::tobermoryvar/lcfg/releases/ /var/lcfg/releases/

The stable and testing releases are now there.

  • Restore /var/cache/lcfgreleases:
    • rsync -v -a -A -X -x -x -S sauce::tobermoryvar/cache/lcfgreleases/ /var/cache/lcfgreleases/

The stable releases cache has now been restored.

  • Start the subversion component if not already started:
    • om subversion start

The repository exists once more, though it's empty as yet.

  • Reload the lcfg repository:
    • svnadmin load /var/svn/lcfg < /var/lcfg/svndump/lcfg/lcfg.sl5_final

The lcfg repository now contains our data.
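Before opening the service back up it's worth confirming the load produced a usable repository. svnlook and svnadmin are standard Subversion tools; verify_repo is a hypothetical wrapper, and the head revision it reports should match whatever the old repository ended on.

```shell
# Sketch: report the head revision and walk every revision with
# "svnadmin verify" to catch a corrupt or partial load.
verify_repo() {
  repo="$1"
  echo "head revision: $(svnlook youngest "$repo")"
  svnadmin verify -q "$repo" && echo "verified: $repo"
}

# For example:
# verify_repo /var/svn/lcfg
```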

  • Recreate the pre-commit and post-commit hooks:
    • om file configure
This should make symlinks from /var/svn/lcfg/hooks/pre-commit to /usr/lib/lcfg/lcfg-svn-hooks/pre-commit, and from /var/svn/lcfg/hooks/post-commit to /usr/lib/lcfg/lcfg-svn-hooks/post-commit.

The lcfg repository hooks have now been restored.
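The symlinks can be checked directly. A sketch, with check_hooks as a hypothetical helper; the two paths are the ones described in the step above.

```shell
# Sketch: confirm each hook is a symlink resolving to the matching
# script under the lcfg-svn-hooks directory.
check_hooks() {
  hooksdir="$1"; target="$2"
  for hook in pre-commit post-commit; do
    if [ "$(readlink "$hooksdir/$hook")" = "$target/$hook" ]; then
      echo "OK: $hook"
    else
      echo "BAD: $hook"
    fi
  done
}

# For example:
# check_hooks /var/svn/lcfg/hooks /usr/lib/lcfg/lcfg-svn-hooks
```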

Start The Service Running

  • Restart apacheconf on tobermory.
    • om apacheconf restart and check the log.
Apache may have failed to start earlier because the svn repositories were absent. If apacheconf fails to start at this point, that may indicate a problem with the restored data.

  • Check that autocheckout is working.
    • Check out something from the repository; change it; commit
    • Look for your change (or anything) in /var/lib/autocheckout/lcfg.
    • If this doesn't work, check that the permissions and ownership on the autocheckout directory match those on the svn repository sufficiently for the apache account to perform a checkout, e.g.:
[tobermory]root: ls -ld /var/svn/lcfg /var/lib/autocheckout/lcfg /var/lib/autocheckout/lcfg/lcfg
drwxrwsr-x 3 apache lcfgsvn 4096 Mar 17  2008 /var/lib/autocheckout/lcfg
drwxrws--- 7 root   apache  4096 Mar 17  2008 /var/svn/lcfg
drwxrwxr-x 5 apache lcfgsvn 4096 Mar 17  2008 /var/lib/autocheckout/lcfg/lcfg

Also check /var/lib/autocheckout itself - it should have this owner, group & permissions:

drwxrwxr-x 3 root lcfg 4096 Mar 17  2008 /var/lib/autocheckout
If it doesn't, make a functioning new one as follows:
cd /var/lib/
mv autocheckout autocheckout-aside
mkdir autocheckout
chown root:lcfg autocheckout
chmod 755 autocheckout
om file configure

/var/lib/autocheckout should now be populated. The develop and default releases should now be there.

The MPU should now have full access to svn.

  • Start the rfe component on tobermory
    • om rfe start

CO access to lcfg/foo is restored.

  • Start the rsync component on tobermory
    • om rsync start

Changes are now available to slaves again.

  • Remove the boot.services alterations from lcfg/tobermory
  • Remove subversion.authzallow restrictions from lcfg/tobermory

This will set the repository access back to normal once tobermory has its new profile.

  • Delete caches on main LCFG slaves mousa and trondra to speed up rebuilds:
    • rm -f /var/lcfg/conf/server/cache/*

  • Gentlemen, start your engines.
    • om mousa.server start
    • om trondra.server start
    • om vole.server start
    • om sernabatim.server start
    • om benaulim.server start
    • om circlevm12.server start

Mammoth rebuilds now hopefully start.

  • Mail a progress report to cos

COs now have access to rfe, but not to subversion until tobermory has its new profile.

Restore the dice and source repositories too

  • Reload the dice repository:
    • svnadmin load /var/svn/dice < /var/lcfg/svndump/dice/dice.sl5_final

The dice repository has now been restored. There are no commit hooks to restore.

  • Reload the source repository:
    • svnadmin load /var/svn/source < /var/lcfg/svndump/source/source.sl5_final

  • Restore svn hooks:
    • om file configure
    • Check the hooks in /var/svn/source/hooks.
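As a final check, all three restored repositories can be verified in one pass. This is a sketch (verify_all_repos is a hypothetical helper, and svnadmin verify walks every revision, so it may take a while on the larger repositories).

```shell
# Sketch: run "svnadmin verify" over each named repository under a
# common base directory and report the result per repository.
verify_all_repos() {
  base="$1"; shift
  for repo in "$@"; do
    svnadmin verify -q "$base/$repo" && echo "OK: $repo" || echo "FAILED: $repo"
  done
}

# For example:
# verify_all_repos /var/svn lcfg dice source
```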

Final Touches

  • Wait for the rebuilds to finish. This is expected to take 80 minutes or so, provided the caches are empty.

CO access to svn has been restored. The LCFG service is now functional. Profile-building is now back to normal.

  • Re-enable the 15 minute rmirror cron job on sauce if LCFG hasn't already done it.
    • ssh sauce
    • nsu
    • crontab -l
      • and check that the 0,15,30,45 rmirror line is there.
    • Check logs for errors after first rmirror run.

The DR arrangements are back in place.

  • Announce to COs & LCFG deployers

  • Ask Alastair to restore the ordershost database.

-- ChrisCooke - 10 Apr 2012

Topic revision: r7 - 15 Apr 2012 - 08:01:15 - ChrisCooke
 