LCFG master SL7 upgrade plan

Previous upgrade notes: SL6, SL5

Preparation

Apply all firmware upgrades ahead of time; it is too risky to roll everything into one big upgrade event.

As the lcfg-master is so critical it is essential to have a complete backup available. That way if anything is lost in the upgrade process it can be easily and quickly restored.

Well ahead of time, find a suitable server with sufficient space to take a complete copy of the / and /var partitions (currently this needs a minimum of about 22 GB).
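
The space requirement can be checked before any data is copied. The following is a minimal sketch rather than part of the original procedure; the function name has_free_space is hypothetical, and it relies only on the POSIX output format of df.

```shell
#!/bin/sh
# Sketch: succeed only if the filesystem holding DIR has at least MIN_GB free.
# Usage: has_free_space DIR MIN_GB   (has_free_space is a hypothetical helper)
has_free_space() {
    dir=$1
    min_gb=$2
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 2
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

On the backup server this might be run as `has_free_space /disk/dr 22 || echo "not enough space"`, assuming /disk/dr is the filesystem that will hold the copy.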

Add a new rsync module to the lcfg-master which allows backups of all the filesystems (using salamanca as the backup server):

!rsync.modules          mEXTRA(root)
rsync.mentries_root       readonly allow deny path uid list
rsync.mentry_root_readonly   readonly=yes
rsync.mentry_root_allow      hosts allow=salamanca.inf.ed.ac.uk
rsync.mentry_root_deny      hosts deny=*
rsync.mentry_root_path      path=/
rsync.mentry_root_uid      uid=0
rsync.mentry_root_list          list=no
rsync.monitor_root              no

Then run the backups on salamanca (as root):

mkdir /disk/dr/lcfg-master/
rsync -v -a -A -X -x -x -S --delete --exclude var/ lcfg-master::root/ /disk/dr/lcfg-master/
rsync -v -a -A -X -x -x -S --delete lcfg-master::root/var/ /disk/dr/lcfg-master/var/

# -v     verbose
# -a     archive mode (recursive; preserves permissions, times, ownership, symlinks)
# -A     preserve ACLs
# -X     preserve extended attributes
# -x -x  don't cross fs boundaries and omit mount-point dirs from the copy
# -S     handle sparse files properly

If crossing filesystem boundaries were permitted then all of AFS would be backed up as well. Note that this means /var has to be copied separately.

This needs to be done ahead of time (maybe an hour or so before), with an extra final (hopefully quick) backup just after all editing is blocked.
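
Since the same pair of rsync commands is run at least twice, it may help to wrap them. This is only a sketch: the function name backup_master and the RSYNC override variable are hypothetical, while the rsync options and paths are the ones listed above. Passing -n (rsync's dry-run option) shows what the final backup would transfer without copying anything.

```shell
#!/bin/sh
# RSYNC can be overridden (e.g. RSYNC=echo) to preview the command lines.
RSYNC=${RSYNC:-rsync}

# Run the two backup passes (everything except /var, then /var).
# Extra arguments, such as -n for a dry run, are passed through to rsync.
backup_master() {
    $RSYNC -v -a -A -X -x -x -S --delete "$@" --exclude var/ \
        lcfg-master::root/ /disk/dr/lcfg-master/ &&
    $RSYNC -v -a -A -X -x -x -S --delete "$@" \
        lcfg-master::root/var/ /disk/dr/lcfg-master/var/
}
```

For example, `backup_master -n` previews the changes and `backup_master` performs the real copy; the second pass only runs if the first succeeds.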

Add the reverse rsync access on the backup server for the lcfg-master so that it is easy to restore files if necessary:

#include <dice/options/rsync.h>

!rsync.modules          mEXTRA(backup)
rsync.mentries_backup       readonly allow deny path uid list
rsync.mentry_backup_readonly   readonly=yes
rsync.mentry_backup_allow   hosts allow=steen.inf.ed.ac.uk
rsync.mentry_backup_deny   hosts deny=*
rsync.mentry_backup_path   path=/disk/dr/lcfg-master/
rsync.mentry_backup_uid      uid=0
rsync.mentry_backup_list        list=no
rsync.monitor_backup            no

Remember to delete all this backup data once the upgrade is finished.

Also allow rsync access on the DR server (salamanca) for all DR rsync modules for the lcfg master:

!rsync.mentry_autocheckout_hallow        mADD(steen.inf.ed.ac.uk)
!rsync.mentry_infinv_allow               mADD(steen.inf.ed.ac.uk)
!rsync.mentry_lcfgdefaults_allow         mADD(steen.inf.ed.ac.uk)
!rsync.mentry_lcfginf_allow              mADD(steen.inf.ed.ac.uk)
!rsync.mentry_lcfgpreviousrelease_allow  mADD(steen.inf.ed.ac.uk)
!rsync.mentry_lcfgreleases_allow         mADD(steen.inf.ed.ac.uk)
!rsync.mentry_lcfgrfedata_allow          mADD(steen.inf.ed.ac.uk)
!rsync.mentry_lcfgstablerelease_allow    mADD(steen.inf.ed.ac.uk)
!rsync.mentry_lcfgsvn_hallow             mADD(steen.inf.ed.ac.uk)
!rsync.mentry_lcfgtest_allow             mADD(steen.inf.ed.ac.uk)
!rsync.mentry_lcfgtestingrelease_allow   mADD(steen.inf.ed.ac.uk)
!rsync.mentry_svndatadir_hallow          mADD(steen.inf.ed.ac.uk)

This will make it easy to restore anything which might be required.

Ensure that important services are not automatically started as soon as the machine finishes upgrading:

#ifdef LINUX_SL7
#ifdef FIRST_INSTALL

/* Avoid starting some services immediately after the upgrade */

!systemd.wanted_units_multiusertarget mREMOVE(httpd.service)
!systemd.wanted_units_multiusertarget mREMOVE(lcfg-apacheconf.service)

!systemd.wanted_units_multiusertarget mREMOVE(rfed.service)
!systemd.wanted_units_lcfgmultiuser   mREMOVE(lcfg-rfe.service)

!systemd.wanted_units_lcfgmultiuser   mREMOVE(lcfg-rsync.service)

!systemd.wanted_units_lcfgmultiuser   mREMOVE(lcfg-subversion.service)
#endif /* FIRST_INSTALL */
#endif /* LINUX_SL7 */
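
Once the machine has been reinstalled, it is worth confirming that none of these services came up by themselves. A possible check is sketched below; the function name report_active_units is hypothetical, the unit names are copied from the profile fragment above, and `systemctl is-active --quiet` is used only for its exit status.

```shell
#!/bin/sh
# Unit names are taken from the profile fragment above.
UNITS="httpd.service lcfg-apacheconf.service rfed.service lcfg-rfe.service lcfg-rsync.service lcfg-subversion.service"

# Print any unit that is currently active; return non-zero if at least one is.
report_active_units() {
    found=0
    for unit in $UNITS; do
        # is-active exits 0 only when the unit is active.
        if systemctl is-active --quiet "$unit"; then
            echo "unexpectedly active: $unit"
            found=1
        fi
    done
    return "$found"
}
```

Running `report_active_units` straight after the first boot should print nothing if the FIRST_INSTALL block took effect.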

Announce the downtime to cos and LCFG users.

Pre-Upgrade

1. Schedule downtime for steen with nagios

2. Stop apache to block any further svn access

om steen.apacheconf stop

3. Stop mirroring from the inventory to the lcfg master

om steen.rmirror stop

4. Final subversion dumps (can take a while)

om steen.subversion dumpdb -- -r lcfg -d /var/lcfg/svndump/lcfg -g
om steen.subversion dumpdb -- -r source -d /var/lcfg/svndump/source -g
om steen.subversion dumpdb -- -r dice -d /var/lcfg/svndump/dice -g
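
Because these dumps are the last ones taken before the reinstall, a quick check that each dump directory actually received fresh files can be reassuring. The sketch below assumes that dumpdb writes regular files somewhere under the -d directory (the exact file names are not known here); the helper count_newer and the marker file are hypothetical.

```shell
#!/bin/sh
# Print how many regular files under DIR are newer than the MARKER file.
# Usage: count_newer MARKER DIR   (count_newer is a hypothetical helper)
count_newer() {
    marker=$1
    dir=$2
    # -newer is a POSIX find primary: true if the file was modified after MARKER.
    find "$dir" -type f -newer "$marker" | wc -l | tr -d ' '
}
```

One way to use it: `touch /tmp/dump-start` before starting the dumps, then `count_newer /tmp/dump-start /var/lcfg/svndump/lcfg` afterwards; a result of 0 would suggest that nothing was written.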

5. Stop client on lcfg master (this is VERY IMPORTANT!)

om steen.client stop

6. Change OS in profile

rfe lcfg/steen

replace os/sl6_64.h with os/sl7.h

Also, ensure the FIRST_INSTALL macro is defined at the start of the profile.

Check the changes have reached the DHCP server (dutoit) and the PXE server (hare).

7. Stop slaves from processing changes

rfe lcfgstop

8. Stop rfe to block further profile edits

om steen.rfe stop

9. Stop lcfg slaves completely

om leonardo.server stop
om rembrandt.server stop
om mole.server stop
om vole.server stop
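
The same four hosts are handled again when the slaves are started after the upgrade, so a small loop avoids missing one. This is a sketch only: slaves_do and the OM override variable are hypothetical names, and the om invocation mirrors the commands above.

```shell
#!/bin/sh
# OM can be overridden (e.g. OM=echo) to preview the commands.
OM=${OM:-om}
SLAVES="leonardo rembrandt mole vole"

# Run "om <host>.server <action>" for each slave, stopping at the first failure.
slaves_do() {
    action=$1
    for host in $SLAVES; do
        $OM "$host.server" "$action" || {
            echo "om $host.server $action failed" >&2
            return 1
        }
    done
}
```

Here `slaves_do stop` is equivalent to the four commands above, and `slaves_do start` covers the matching post-upgrade step.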

10. Mirror everything to the DR server

om salamanca.rmirror run

11. Stop mirroring on DR server

om salamanca.rmirror stop

Then manually remove all rmirror cron jobs (use crontab -e as root).

Check /var/lcfg/log/rmirror to ensure there are no errors.
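
The log check can be scripted roughly as follows. The format of the rmirror log is not documented here, so matching on the words "error" and "fail" is an assumption, and log_is_clean is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if LOGFILE contains no suspicious lines, 1 if it does
# (printing them with line numbers), and 2 if the file cannot be read.
log_is_clean() {
    [ -r "$1" ] || return 2
    # Assumption: problems are reported with the words "error" or "fail".
    if grep -i -n -E 'error|fail' "$1"; then
        return 1
    fi
    return 0
}
```

For example: `log_is_clean /var/lcfg/log/rmirror && echo "rmirror log looks clean"`. A manual read of the log is still advisable, since the pattern is only a guess.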

12. Final rsync backup to salamanca

See preparation notes above for details.

Upgrade

Reinstall machine as SL7.

Post-Upgrade

1. Ensure file permissions are correct; some might be missed because they use groups which are in LDAP.

om file configure

2. Restore the following from backups on the DR server salamanca:

  • lcfgrfedata - Source profiles
  • svndatadir - All the subversion data files
  • lcfgsvn - All the nightly subversion dumps
  • lcfgreleases - All the weekly lcfg releases
  • lcfgstablerelease lcfgtestingrelease lcfgpreviousrelease - Current releases

rsync -av lcfg-dr::lcfgrfedata/ /var/rfedata/
rsync -av lcfg-dr::svndatadir/ /var/svn/
rsync -av lcfg-dr::lcfgsvn/ /var/lcfg/svndump/
rsync -av lcfg-dr::lcfgreleases/ /var/cache/lcfgreleases/
rsync -av lcfg-dr::lcfgstablerelease/ /var/lcfg/releases/stable/
rsync -av lcfg-dr::lcfgtestingrelease/ /var/lcfg/releases/testing/
rsync -av lcfg-dr::lcfgpreviousrelease/ /var/lcfg/releases/previous/
rsync -av lcfg-dr::backup/etc/lcfgbuilder.keytab /etc/
rsync -av lcfg-dr::backup/var/cvs/dice_archive/ /var/cvs/dice_archive/

Note that we are restoring the subversion repositories directly from the backups because the svn dump/load process is far too slow.

3. Start and test the subversion service

om subversion start
om apacheconf start

Make a minor change (e.g. whitespace only) to some header and commit. This will take a minute or so as the autocheckout stuff is created for the first time. Once the commit has completed, check that the /var/lib/autocheckout/lcfg/lcfg directory exists and that the contents look like:

ls -la /var/lib/autocheckout/lcfg/lcfg
total 20
drwxrwsr-x  5 apache lcfgsvn 4096 Jan 12 13:52 .
drwxrwsr-x  3 apache lcfgsvn 4096 Jan 12 13:52 ..
drwxrwsr-x 42 apache lcfgsvn 4096 Jan 12 13:53 branches
drwxrwsr-x  7 apache lcfgsvn 4096 Jan 12 13:52 core
drwxrwsr-x  5 apache lcfgsvn 4096 Jan 12 13:52 live
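
The presence of the three expected subdirectories can also be checked with a short script. This sketch only tests that the directories exist; ownership and the setgid bit still need to be compared against the listing above by eye. The function name check_autocheckout is hypothetical.

```shell
#!/bin/sh
# Report any of the expected subdirectories that is missing under BASE.
# BASE defaults to the autocheckout path given above.
check_autocheckout() {
    base=${1:-/var/lib/autocheckout/lcfg/lcfg}
    status=0
    for sub in branches core live; do
        if [ ! -d "$base/$sub" ]; then
            echo "missing: $base/$sub" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running `check_autocheckout` with no arguments after the test commit returns non-zero and names any directory that was not created.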

4. Restart and test the rfe service

om rfe start

Edit an LCFG profile with rfe and check that the process works correctly.

5. Restart and test the rsync service

om rsync start

6. Start the lcfg slaves

om leonardo.server start
om rembrandt.server start
om mole.server start
om vole.server start

7. Remove lcfgstop

rfe lcfgstop

8. Restart mirroring on DR server

Also fix the crontab, since the rmirror cron jobs were removed manually before the upgrade:

om salamanca.rmirror start
om salamanca.cron configure

9. Tidy profile

Remove the hacks added to the LCFG profiles as part of the upgrade process (including the FIRST_INSTALL macro in the profile for steen).

rfe lcfg/steen
rfe lcfg/salamanca

10. After a week or two, once we're confident that nothing is missing, remove the backup of the SL6 master from the DR server.

rm -rf /disk/dr/lcfg-master/

-- StephenQuinney - 06 Jan 2017

Topic revision: r5 - 26 Jan 2017 - 15:01:26 - StephenQuinney
 