Plan for moving the LCFG master to a new server

The LCFG master service will be moved from tobermory to schiff on Tuesday 30th October. The plan is as follows:

0. Preparation

a) Verify that none of our LCFG source profiles or headers refer to the tobermory host name when they should be referring to the lcfg-master alias.

b) At least 24 hours beforehand, alter the DNS TTL for svn.lcfg.org to 30 minutes (1800 seconds) so that users do not have a long wait for the DNS change to propagate.

rfe dns/lcfg_org

c) Do rsync copies of all the data (see section 7 for details) so that on the day there is very little data to transfer.

d) Email Informatics COs and external users clearly stating when the downtime will occur and what services will be affected.
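
For the check in step a), a recursive grep over a working copy of the source profiles and headers is probably enough. This is a minimal sketch only: the function name find_host_refs and the directory passed to it are assumptions rather than part of the LCFG tools, and it relies on the -r option provided by GNU and BSD grep.

```shell
# find_host_refs DIR NAME
# List every line under DIR that mentions NAME as a whole word,
# prefixed with the file name and line number.
# Returns 0 if any reference was found and 1 if the tree is clean.
find_host_refs() {
    grep -rnw -- "$2" "$1"
}

# Hypothetical usage against a checkout of the LCFG source:
#   find_host_refs ~/lcfg-checkout tobermory
```

Each hit still needs a human decision, since some references to tobermory (for example in its own profile) are legitimate.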

1. Disable nagios

Mark downtime for both tobermory and schiff so that we do not get any unnecessary nags for the duration of the transfer process.
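
If the downtime is scheduled through the Nagios external command file rather than the web interface, the command lines could be generated as in the sketch below. The function name downtime_cmd, the author and the comment text are assumptions, and the location of the command file is site-specific, so the sketch only prints the lines.

```shell
# downtime_cmd HOST START END AUTHOR COMMENT
# Print a Nagios SCHEDULE_HOST_DOWNTIME external command for a fixed window.
# START and END are Unix timestamps. The "fixed" flag is 1, the trigger id
# is 0 (not triggered by another downtime) and the duration field is simply
# END minus START.
downtime_cmd() {
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$(( $3 - $2 ))" "$4" "$5"
}

# Hypothetical usage, two hours of downtime starting now:
#   now=$(date +%s)
#   for h in tobermory schiff; do
#       downtime_cmd "$h" "$now" "$((now + 7200))" someuser "LCFG master move"
#   done
```

This covers host checks only. Nagios has a separate SCHEDULE_HOST_SVC_DOWNTIME command, taking the same arguments, for the services on a host.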

2. Disable services on schiff

Some services on schiff need to be disabled to ensure people do not get access too early.

om schiff.apacheconf stop
om schiff.rfe stop
om schiff.rsync stop

3. ordershost

Note: this is done ahead of everything else to avoid disrupting Sheila too much.

a) Alter the DNS entry for ordershost so it points to schiff and then update dns on tobermory and schiff

b) Disable the ordershost web interface on tobermory (rfe lcfg/tobermory)

!apacheconf.vhosts mREMOVE(ordershost)

c) Transfer the DB

On schiff create the new orders database.

nsu postgres
psql --dbname template1 --file /usr/lib/orders/bin/recreate_orders

On tobermory, dump the data into a file and copy it to schiff.

pg_dump --data-only --file /tmp/orders_data.sql orders
scp /tmp/orders_data.sql schiff:/tmp

Stop postgres on tobermory so that it cannot accept any more changes (e.g. from clientreports)

om tobermory.postgresql stop

On schiff load the data into the empty orders database.

nsu postgres
psql --dbname orders --file /tmp/orders_data.sql --single-transaction

The --single-transaction option is very useful here: no data will be inserted if an error occurs. The alternative is that you end up having to drop the db and start again.
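
A possible refinement of the load command, not part of the original plan, is to set the psql variable ON_ERROR_STOP as well. With --single-transaction alone psql keeps reading the file after the first failure (every later statement fails and nothing is committed); with ON_ERROR_STOP set it stops at the first error and exits with status 3, which is easier to spot. The helper name load_cmd is hypothetical and it only prints the command.

```shell
# load_cmd DB FILE
# Print (not run) a psql command that loads FILE into DB inside a single
# transaction and stops at the first failing statement.
load_cmd() {
    printf 'psql --dbname %s --file %s --single-transaction --set=ON_ERROR_STOP=1\n' \
        "$1" "$2"
}

# Usage (as the postgres user on schiff):
#   load_cmd orders /tmp/orders_data.sql
```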

4. Block write access to data

It's probably worth warning people on the cos chatroom just before this step to ensure we don't upset anyone in the middle of committing a change.

om tobermory.apacheconf stop
om tobermory.rfe stop
om tobermory.rmirror stop

Stopping apache blocks write access via subversion clients; stopping rfe prevents the modification of any LCFG source profiles. The rmirror component is stopped to avoid syncing any new inventory headers from the school DB server.

5. Stop LCFG slaves

om lcfg1.server stop
om lcfg3.server stop
om lcfgtest.server stop
om diydice.server stop

On each machine, check /var/lcfg/log/server and ensure everything is quiet before stopping the server. Clients will still be able to pull down LCFG XML profiles but will not receive any changes.
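
One rough way to judge whether a server is "quiet" is to look at how recently its log was written. The sketch below assumes that an unchanged modification time means the server is idle, which is only a proxy; the function name log_is_quiet and the five minute threshold are arbitrary choices. It uses find -mmin, which GNU and BSD find provide but POSIX does not.

```shell
# log_is_quiet FILE MINUTES
# Succeed if FILE exists and has not been modified within the last
# MINUTES minutes.
log_is_quiet() {
    [ -f "$1" ] && [ -z "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# Hypothetical usage on an LCFG slave before stopping the server:
#   log_is_quiet /var/lcfg/log/server 5 && echo "quiet, safe to stop"
```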

6. Final backups

Take dumps of all the subversion repositories. We won't need these unless something goes badly wrong, as we plan to use the rsync copies of the data directories rather than going through the painfully slow dump and restore process.

om tobermory.subversion dumpdb -- -r lcfg -d /var/lcfg/svndump/lcfg -g -k 30
om tobermory.subversion dumpdb -- -r source -d /var/lcfg/svndump/source -g -k 30
om tobermory.subversion dumpdb -- -r dice -d /var/lcfg/svndump/dice -g -k 30
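
Since the dumps are only a fallback, it is easy to forget to confirm that they were actually written. The sketch below is a quick sanity check under the assumption that each dumpdb run leaves at least one non-empty file in its target directory; the file naming used by the subversion component is not known here, and has_fresh_dump is a hypothetical helper.

```shell
# has_fresh_dump DIR MINUTES
# Succeed if DIR contains at least one non-empty regular file that was
# modified within the last MINUTES minutes.
has_fresh_dump() {
    [ -n "$(find "$1" -type f -size +0c -mmin "-$2" 2>/dev/null | head -n 1)" ]
}

# Hypothetical usage on tobermory after the three dumpdb runs:
#   for repo in lcfg source dice; do
#       has_fresh_dump "/var/lcfg/svndump/$repo" 120 || echo "no recent dump for $repo"
#   done
```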

Do a final mirror run on the DR server so we have a complete snapshot, then stop the component so that it does not start mirroring from schiff too early. Once the backup run is complete, check /var/lcfg/log/rmirror on sauce to confirm there were no errors.

om lcfg-dr.rmirror run
om lcfg-dr.rmirror stop

7. Final data transfer

On schiff do a final rsync copy of the data from tobermory

rsync -av -A -X --delete tobermory::lcfgsvn/ /var/lcfg/svndump/
rsync -av -A -X --delete tobermory::autocheckout/ /var/lib/autocheckout/
rsync -av -A -X --delete tobermory::svndatadir/ /var/svn/
rsync -av -A -X --delete tobermory::lcfgrfedata/ /var/rfedata/
rsync -av -A -X --delete tobermory::lcfgstablerelease/ /var/lcfg/releases/stable/
rsync -av -A -X --delete tobermory::lcfgtestingrelease/ /var/lcfg/releases/testing/
rsync -av -A -X --delete tobermory::lcfgreleases/ /var/cache/lcfgreleases/
rsync -av -A -X --delete tobermory::infinv/ /var/lcfg/conf/informatics_inventory/
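
Because every command above uses --delete, a mistyped destination could remove data on schiff. One way to reduce that risk is to generate the commands from a single table and preview them with the rsync --dry-run option first. This is only a sketch: rsync_plan is a hypothetical helper, and it prints the commands rather than running them.

```shell
# rsync_plan [EXTRA_OPTION...]
# Print one rsync command per module/destination pair from step 7.
# Any extra options (for example --dry-run) are added to every command.
rsync_plan() {
    while read -r module dest; do
        printf 'rsync -av -A -X --delete%s tobermory::%s/ %s\n' \
            "${1:+ $*}" "$module" "$dest"
    done <<'EOF'
lcfgsvn /var/lcfg/svndump/
autocheckout /var/lib/autocheckout/
svndatadir /var/svn/
lcfgrfedata /var/rfedata/
lcfgstablerelease /var/lcfg/releases/stable/
lcfgtestingrelease /var/lcfg/releases/testing/
lcfgreleases /var/cache/lcfgreleases/
infinv /var/lcfg/conf/informatics_inventory/
EOF
}

# Usage on schiff: preview first, then run for real.
#   rsync_plan --dry-run | sh
#   rsync_plan | sh
```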

8. Disable rsync access

Stop rsync on tobermory to prevent any further connections and cut off any long-running connections, which will need to switch to schiff.

om tobermory.rsync stop

9. Change DNS

Need to edit the dns/inf map to move the aliases lcfg-master, lcfgsvn and ordershost over to schiff.

Need to edit the dns/lcfg_org map to move the entry for svn to point to schiff.

Kick the DNS component on the following machines: schiff, dammers (x509 server), cockerel (nagios server)

om schiff.dns update
om dammers.dns update
om cockerel.dns update
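
After the updates it may be worth confirming what each name now resolves to and with what TTL. The sketch below summarises the output of dig +noall +answer; the helper name answer_for and the example.org names are placeholders, not the real zone contents. Note that a caching resolver reports the remaining TTL, so querying an authoritative server directly gives the configured value.

```shell
# answer_for NAME
# Read `dig +noall +answer` output on stdin and print "TTL TYPE RDATA"
# for every record whose owner is NAME (fully qualified, no trailing dot).
answer_for() {
    awk -v name="$1." '$1 == name { print $2, $4, $5 }'
}

# Hypothetical usage (the fully qualified name is a placeholder):
#   dig +noall +answer lcfg-master.example.org | answer_for lcfg-master.example.org
```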

10. Configure schiff

Restart various components to ensure we have all the necessary configuration files.

om schiff.x509 restart
om schiff.file restart
om schiff.subversion restart

11. Start Services

om schiff.apacheconf start
om schiff.rfe start
om schiff.rsync start

Important: test these to ensure they all work correctly before going any further.
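
The tests themselves are not spelled out in this plan. One way to make them repeatable is a tiny harness that runs a list of labelled commands and reports which ones failed, as sketched below. run_checks is a hypothetical helper and the example checks in the usage comment are assumptions about what is worth probing, not an agreed test list.

```shell
# run_checks
# Read "LABEL COMMAND..." lines on stdin, run each command through sh,
# print PASS or FAIL per label and return non-zero if any check failed.
run_checks() {
    failed=0
    while read -r label cmd; do
        [ -n "$label" ] || continue
        # </dev/null stops the command from swallowing the remaining lines
        if sh -c "$cmd" </dev/null >/dev/null 2>&1; then
            echo "PASS $label"
        else
            echo "FAIL $label"
            failed=1
        fi
    done
    return "$failed"
}

# Hypothetical usage:
#   run_checks <<'EOF'
#   rsync rsync schiff::
#   web curl -fsS -o /dev/null https://svn.lcfg.org/
#   EOF
```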

12. Start LCFG slaves

om lcfg1.server start
om lcfg3.server start
om lcfgtest.server start
om diydice.server start

After starting each LCFG server, check the log (/var/lcfg/log/server) to ensure there are no errors. If everything went well we will NOT get a full rebuild, but that is not guaranteed.
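
A simple scan can help with the log check, although it is no substitute for reading the log. The sketch below assumes that problems show up as lines containing "error" or "fail" in any case; the actual format of the LCFG server log is not described here, so the default pattern is a guess and log_errors is a hypothetical helper.

```shell
# log_errors FILE [PATTERN]
# Print the lines of FILE that match PATTERN (an extended regular
# expression, default "error|fail", case-insensitive) with line numbers.
# Returns 0 if nothing matched, 1 if something matched and 2 if FILE
# cannot be read.
log_errors() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    if grep -niE -- "${2:-error|fail}" "$1"; then
        return 1
    fi
    return 0
}

# Hypothetical usage on each slave after the start:
#   log_errors /var/lcfg/log/server || echo "check the log by hand"
```

The scan covers the whole file, so entries from before the restart will show up as well.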

13. Start DR mirroring

Only do this when we are absolutely sure that schiff is working properly; otherwise we will overwrite our good backups with duff data.

om lcfg-dr.rmirror start

14. Tidying Up

a) Inform Informatics COs and external users from other schools that the service has been restored.

b) Revert the DNS TTL on svn.lcfg.org to the standard 86400 seconds.

rfe dns/lcfg_org

c) In the short term we want to retain tobermory with its configuration and data intact in case we've missed anything. However, the firewall holes for tobermory should be closed. Do this in the LCFG profile (lcfg/tobermory):

!ipfilter.export mSET()

d) Trawl the wiki for references to tobermory and, where appropriate, replace them with lcfg-master

-- StephenQuinney - 29 Oct 2012


Topic revision: r2 - 30 Oct 2012 - 12:50:50 - StephenQuinney
 