Plan for moving the LCFG master to a new server

The LCFG master service will be moved from tobermory to schiff on Tuesday 30th October. Here's the plan:

0. Preparation

a) Verify that none of our LCFG source profiles or headers refer to the tobermory host name when they should be referring to the lcfg-master alias.
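A whole-word grep over a local copy of the profiles and headers is a quick way to find candidates. This is a minimal sketch. The directory it searches is an assumption (any checkout or copy of the source profiles), and `find_host_refs` is a hypothetical helper name. Every hit still needs a human decision, because tobermory's own profile will legitimately name it.

```shell
#!/usr/bin/env bash
# Sketch: list files under a directory that mention a host name as a word.
# ASSUMPTION: the LCFG source profiles and headers are available in a local
# directory tree; the path is passed in rather than hard-coded.

# find_host_refs DIR [HOST]
#   Prints every file under DIR that contains HOST (default: tobermory) as a
#   whole word. Returns 0 if any were found, 1 if none.
find_host_refs() {
    local dir=$1 host=${2:-tobermory}
    # -r: recurse, -l: print file names only, -w: whole-word match,
    # -I: skip binary files.
    grep -rlwI -- "$host" "$dir"
}
```

For example, `find_host_refs ~/lcfg-profiles` (a hypothetical path) prints one file name per match.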

b) At least 24 hours beforehand, alter the DNS TTL for svn.lcfg.org to 30 minutes (1800 seconds) so that users do not have a long wait for the DNS change to propagate.

rfe dns/lcfg_org
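Once the change has propagated, `dig` can check which TTL is being served. This is a minimal sketch that assumes dig is installed. `published_ttl` and `parse_ttl` are hypothetical helper names. If possible, query an authoritative server for the zone. A caching resolver reports the time remaining on its cached copy, not the configured value.

```shell
#!/usr/bin/env bash
# Sketch: report the TTL currently published for a DNS name.
# ASSUMPTION: dig is installed. Pass an authoritative server as "@server" to
# see the configured TTL; a caching resolver shows a counting-down value.

# parse_ttl: read dig "+noall +answer" lines on stdin and print the TTL
# (second field) of the first answer record.
parse_ttl() {
    awk 'NF >= 5 { print $2; exit }'
}

# published_ttl NAME [@SERVER]
published_ttl() {
    local name=$1 server=${2:-}
    # +noall +answer restricts dig's output to the answer section.
    dig +noall +answer ${server:+"$server"} "$name" | parse_ttl
}
```

For example, `published_ttl svn.lcfg.org @ns.example.org` (the nameserver is a placeholder) should print 1800 before the move. After step 13b it should print 86400.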

c) Make rsync copies of all the data (see section 6 for details) so that very little data remains to be transferred on the day.

d) Email Informatics COs and external users clearly stating when the downtime will occur and what services will be affected.

1. Disable nagios

Mark downtime for both tobermory and schiff so that we do not get any unnecessary nags for the duration of the transfer process.
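Nagios accepts `SCHEDULE_HOST_DOWNTIME` and `SCHEDULE_HOST_SVC_DOWNTIME` external commands, so the downtime can also be scheduled from the command line. This sketch only formats the commands. Several details are assumptions: the path of the command file on cockerel, the host names as Nagios knows them, and the `downtime_cmds` helper itself.

```shell
#!/usr/bin/env bash
# Sketch: build Nagios external commands that schedule fixed downtime for a
# host and for all of its services.
# ASSUMPTION: the output is appended to the Nagios external command file on
# the Nagios server; that path is installation-specific and not set here.

# downtime_cmds HOST START END AUTHOR COMMENT
#   START and END are Unix timestamps. Prints one command per line.
downtime_cmds() {
    local host=$1 start=$2 end=$3 author=$4 comment=$5
    local now
    now=$(date +%s)
    # fixed=1 (downtime runs exactly from START to END), trigger_id=0 (none);
    # the duration field only matters for flexible downtime but is required.
    local args="$host;$start;$end;1;0;$((end - start));$author;$comment"
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s\n' "$now" "$args"
    printf '[%s] SCHEDULE_HOST_SVC_DOWNTIME;%s\n' "$now" "$args"
}
```

For example, run `downtime_cmds tobermory "$(date +%s)" "$(date -d '+4 hours' +%s)" admin 'LCFG master move' >> "$NAGIOS_CMD_FILE"`, then repeat it for schiff. The four-hour window, author and `NAGIOS_CMD_FILE` are all placeholders.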

2. Disable services on schiff

Some services on schiff need to be disabled to ensure people do not get access too early.

om schiff.apacheconf stop
om schiff.rfe stop
om schiff.rsync stop
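Several later steps repeat the same pattern: one om method applied to several components on one host. A small helper could wrap it. This is a sketch only. `om_each` is a hypothetical name, and the helper assumes om exits non-zero when a component method fails.

```shell
#!/usr/bin/env bash
# Sketch: run one om method on several components of a host, stopping at the
# first failure so that a problem is noticed before moving on.
# ASSUMPTION: om returns a non-zero exit status when a method fails.

# om_each HOST METHOD COMPONENT...
om_each() {
    local host=$1 method=$2
    shift 2
    local comp
    for comp in "$@"; do
        echo "==> om $host.$comp $method"
        if ! om "$host.$comp" "$method"; then
            echo "om $host.$comp $method failed" >&2
            return 1
        fi
    done
}
```

`om_each schiff stop apacheconf rfe rsync` matches the three commands above, except that it halts at the first failure. `om_each schiff start apacheconf rfe rsync` would cover step 10.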

3. Block write access to data

It's probably worth warning people in the COs chatroom just before this step so that we don't upset anyone in the middle of committing a change.

om tobermory.apacheconf stop
om tobermory.rfe stop
om tobermory.rmirror stop

Stopping apache blocks write access from subversion clients, and stopping rfe prevents any modification of the LCFG source profiles. The rmirror component is stopped so that no new inventory headers are synced from the school DB server.

4. Stop LCFG slaves

om lcfg1.server stop
om lcfg3.server stop
om lcfgtest.server stop
om diydice.server stop

On each machine, check /var/lcfg/log/server and ensure everything is quiet before stopping the server. Clients will still be able to pull down LCFG XML profiles but, obviously, will not receive any changes.
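One way to judge "quiet" is to check that the log has not been written to for a few minutes. This is a minimal sketch. It assumes GNU find and assumes that an unchanged log means an idle server. `log_is_quiet` and the five-minute default are illustrative choices, not part of the plan.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a log file has been quiet, i.e. not modified for at
# least N minutes.
# ASSUMPTION: an unmodified log means the server is idle; relies on find's
# -mmin test against the file's modification time.

# log_is_quiet LOGFILE [MINUTES]
#   Succeeds if LOGFILE exists and was not modified in the last MINUTES
#   minutes (default 5).
log_is_quiet() {
    local log=$1 mins=${2:-5}
    [[ -e $log ]] || return 1
    # find prints the file only if it was modified less than $mins minutes
    # ago, so empty output means "quiet".
    [[ -z $(find "$log" -maxdepth 0 -mmin "-$mins") ]]
}
```

Run it on each slave before that slave's stop, for example `log_is_quiet /var/lcfg/log/server 5 && echo quiet`.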

5. Final backups

Take dumps of all the subversion repositories. We won't need these unless something goes badly wrong, as we plan to use the rsync copies of the data directories rather than going through the painfully slow dump and restore process.

om tobermory.subversion dumpdb -- -r lcfg -d /var/lcfg/svndump/lcfg -g -k 30
om tobermory.subversion dumpdb -- -r source -d /var/lcfg/svndump/source -g -k 30
om tobermory.subversion dumpdb -- -r dice -d /var/lcfg/svndump/dice -g -k 30
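The three dumps differ only in the repository name, so one loop can run them all. This sketch copies the om arguments exactly as given above and adds only the loop and an early exit on failure. The early exit assumes om reports failure through its exit status. `dump_all_repos` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: the three dumpdb calls above as one loop over repository names.
# The om arguments are copied verbatim; only the loop and the early exit
# (which assumes om exits non-zero on failure) are new.

dump_all_repos() {
    local repo
    for repo in lcfg source dice; do
        om tobermory.subversion dumpdb -- \
            -r "$repo" -d "/var/lcfg/svndump/$repo" -g -k 30 || return 1
    done
}
```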

Do a final mirror run on the DR server so that we have a complete snapshot, then stop the component so that we do not start mirroring from schiff too early. Once the run is complete, check /var/lcfg/log/rmirror on sauce to be confident there were no errors.

om lcfg-dr.rmirror run
om lcfg-dr.rmirror stop
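Part of the log check can be scripted. This is a minimal sketch to run on sauce. The pattern `error|fail` is a guess at how rmirror reports problems, since its log format is not described here. An empty result is therefore a hint, not proof. `log_errors` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: show log lines that look like errors.
# ASSUMPTION: problems are logged with words such as "error" or "fail"; the
# real rmirror log format has not been checked.

# log_errors LOGFILE [LINES]
#   Scans the last LINES lines (default 200). Prints matching lines and
#   returns 1 if any were found, 2 if the log cannot be read, else 0.
log_errors() {
    local log=$1 lines=${2:-200}
    local hits
    [[ -r $log ]] || { echo "cannot read $log" >&2; return 2; }
    # "|| true": grep exits 1 when nothing matches, which is the good case.
    hits=$(tail -n "$lines" -- "$log" | grep -iE 'error|fail' || true)
    if [[ -n $hits ]]; then
        printf '%s\n' "$hits"
        return 1
    fi
}
```

For example, `log_errors /var/lcfg/log/rmirror` prints any suspicious lines and returns a non-zero status if it finds some.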

6. Final data transfer

On schiff, do a final rsync copy of the data from tobermory:

rsync -av -A -X --delete tobermory::lcfgsvn/ /var/lcfg/svndump/
rsync -av -A -X --delete tobermory::autocheckout/ /var/lib/autocheckout/
rsync -av -A -X --delete tobermory::svndatadir/ /var/svn/
rsync -av -A -X --delete tobermory::lcfgrfedata/ /var/rfedata/
rsync -av -A -X --delete tobermory::lcfgstablerelease/ /var/lcfg/releases/stable/
rsync -av -A -X --delete tobermory::lcfgtestingrelease/ /var/lcfg/releases/testing/
rsync -av -A -X --delete tobermory::lcfgreleases/ /var/cache/lcfgreleases/
rsync -av -A -X --delete tobermory::infinv/ /var/lcfg/conf/informatics_inventory/
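Step 0c and this final copy use the same eight transfers, so a single table can drive both. This sketch is meant to run on schiff. The module names and destinations are copied from the commands above. `sync_all`, `MODULES` and the `DRY_RUN` switch are illustrative additions.

```shell
#!/usr/bin/env bash
# Sketch: run the rsync copies above from one module:destination table.
# Set DRY_RUN=1 to print the commands instead of running them.

MODULES=(
    lcfgsvn:/var/lcfg/svndump/
    autocheckout:/var/lib/autocheckout/
    svndatadir:/var/svn/
    lcfgrfedata:/var/rfedata/
    lcfgstablerelease:/var/lcfg/releases/stable/
    lcfgtestingrelease:/var/lcfg/releases/testing/
    lcfgreleases:/var/cache/lcfgreleases/
    infinv:/var/lcfg/conf/informatics_inventory/
)

# sync_all [SOURCE_HOST]: copy every module from SOURCE_HOST (default
# tobermory), stopping at the first rsync failure.
sync_all() {
    local src=${1:-tobermory} pair module dest
    for pair in "${MODULES[@]}"; do
        module=${pair%%:*}   # text before the first colon
        dest=${pair#*:}      # text after the first colon
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            echo rsync -av -A -X --delete "${src}::${module}/" "$dest"
        else
            rsync -av -A -X --delete "${src}::${module}/" "$dest" || return 1
        fi
    done
}
```

`DRY_RUN=1 sync_all` prints the eight commands for review, and `sync_all` runs them.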

7. Disable rsync access

Stop rsync on tobermory to prevent any further connections and to cut off any long-running connections, which will need to switch to schiff.

om tobermory.rsync stop
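Running `rsync HOST::` asks an rsync daemon for its module list, so another machine can use it to check that the daemon has really stopped. This sketch uses hypothetical helper names. A host that does not answer could also have a network problem, so treat the result as confirmation only when the om stop has also succeeded.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a host's rsync daemon no longer answers.
# "rsync HOST::" requests the daemon's module list and fails if nothing is
# listening (or if HOST is unreachable).

rsync_daemon_up() {
    rsync "${1}::" >/dev/null 2>&1
}

# check_rsync_stopped HOST: succeed only if HOST's daemon does not answer.
check_rsync_stopped() {
    if rsync_daemon_up "$1"; then
        echo "rsync daemon on $1 is still answering" >&2
        return 1
    fi
    echo "rsync daemon on $1 is not answering"
}
```

For example, run `check_rsync_stopped tobermory` on schiff.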

8. Change DNS

Edit the dns/inf map to move the aliases lcfg-master, lcfgsvn and ordershost over to schiff.

Edit the dns/lcfg_org map so that the entry for svn points to schiff.

Kick the DNS component on the following machines: schiff, dammers (x509 server) and cockerel (nagios server).

om schiff.dns update
om dammers.dns update
om cockerel.dns update
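Once the DNS components have been kicked, check each moved name. This is a minimal sketch resting on two assumptions about the local DNS setup. First, the moved entries are CNAME records, so `dig +short` prints the canonical name first. Second, the inf names live under inf.ed.ac.uk. `points_at` and `check_dns` are hypothetical helpers. Resolvers may keep returning the old answer until the previous TTL has expired.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the moved names now resolve via the new host.
# ASSUMPTIONS: the entries are CNAMEs (so "dig +short" prints the target name
# first) and the inf aliases are under inf.ed.ac.uk.

NAMES=(lcfg-master.inf.ed.ac.uk lcfgsvn.inf.ed.ac.uk ordershost.inf.ed.ac.uk svn.lcfg.org)

# points_at NAME HOST: succeed if the first line of "dig +short NAME" starts
# with HOST followed by a dot (e.g. "schiff.inf.ed.ac.uk.").
points_at() {
    local first
    first=$(dig +short "$1" | head -n 1)
    [[ $first == "$2".* ]]
}

# check_dns [HOST]: report each name; return 1 if any is not on HOST yet.
check_dns() {
    local host=${1:-schiff} name rc=0
    for name in "${NAMES[@]}"; do
        if points_at "$name" "$host"; then
            echo "OK   $name -> $host"
        else
            echo "FAIL $name does not point at $host" >&2
            rc=1
        fi
    done
    return $rc
}
```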

9. Configure schiff

Restart various components to ensure we have all the necessary configuration files.

om schiff.x509 restart
om schiff.file restart
om schiff.subversion restart

10. Start Services

om schiff.apacheconf start
om schiff.rfe start
om schiff.rsync start

Important: test each of these services to ensure it works correctly before going any further.
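Some of this testing can be scripted. The sketch below makes two checks. It verifies that schiff's rsync daemon now exports the expected modules, and it reports the HTTP status returned for a repository URL. The module list, the URL and the helper names are all assumptions. A status code alone does not prove that subversion commits work, and rfe still needs a manual test.

```shell
#!/usr/bin/env bash
# Sketch: quick post-start checks for schiff.
# ASSUMPTIONS: the modules passed in are those schiff should export, and the
# caller supplies a repository URL served by apache.

# missing_modules HOST MODULE...
#   "rsync HOST::" lists exported modules, name in the first column. Prints
#   each MODULE not in that list and returns 1 if any are missing.
missing_modules() {
    local host=$1
    shift
    local listed mod rc=0
    listed=$(rsync "${host}::" 2>/dev/null | awk '{print $1}')
    for mod in "$@"; do
        if ! grep -Fxq -- "$mod" <<<"$listed"; then
            echo "$mod"
            rc=1
        fi
    done
    return $rc
}

# http_status URL: print the HTTP status code for URL (000 if no connection).
http_status() {
    curl -sS -o /dev/null -w '%{http_code}' "$1"
}
```

For example, `missing_modules schiff lcfgsvn svndatadir lcfgrfedata` prints any module that is not exported. `http_status "$REPO_URL"` prints the status for a repository URL of your choosing.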

11. Start LCFG slaves

om lcfg1.server start
om lcfg3.server start
om lcfgtest.server start
om diydice.server start

After starting each LCFG server, check its log (/var/lcfg/log/server) to ensure there are no errors. If everything went well we will NOT get a full rebuild, but this is not guaranteed.
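The following sketch starts the slaves one at a time and inspects each log before moving on. It assumes ssh access to each slave by its short name and uses `error` as a guessed search pattern. `start_slaves`, `SLAVES` and the 30-second settle time are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: start each LCFG slave in turn and stop if its server log shows
# error-looking lines.
# ASSUMPTIONS: ssh access to each slave by short name; "error" as the search
# pattern is a guess at the log format; om exits non-zero on failure.

SLAVES=(lcfg1 lcfg3 lcfgtest diydice)
LOG=/var/lcfg/log/server

start_slaves() {
    local host hits
    for host in "${SLAVES[@]}"; do
        om "$host.server" start || { echo "start failed on $host" >&2; return 1; }
        # Give the server time to write to its log before inspecting it.
        sleep "${SETTLE:-30}"
        hits=$(ssh "$host" tail -n 100 "$LOG" | grep -i error || true)
        if [[ -n $hits ]]; then
            printf '%s: possible errors:\n%s\n' "$host" "$hits" >&2
            return 1
        fi
        echo "$host: started, no errors in last 100 log lines"
    done
}
```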

12. Start DR mirroring

Only do this when we are absolutely sure that schiff is working properly, otherwise we will overwrite our good backups with duff data.

om lcfg-dr.rmirror start

13. Tidying Up

a) Inform Informatics COs and external users from other schools that the service has been restored.

b) Revert the DNS TTL on svn.lcfg.org to the standard 86400 seconds.

rfe dns/lcfg_org

c) In the short term we want to retain tobermory with its configuration and data intact in case we've missed anything. However, we should close the firewall holes for tobermory; do this in its LCFG profile (lcfg/tobermory):

!ipfilter.export mSET()

d) Trawl the wiki for references to tobermory and, where appropriate, replace them with lcfg-master.

-- StephenQuinney - 29 Oct 2012

Topic revision: r1 - 29 Oct 2012 - 14:19:57 - StephenQuinney
 