LCFGMasterMove (30 Oct 2012, StephenQuinney)
---+ Plan for moving the LCFG master to a new server

The LCFG master service will be moved from _tobermory_ to _schiff_ on Tuesday 30th October. Here's the plan:

---++ 0. Preparation

a) Verify that none of our LCFG source profiles or headers refer to the _tobermory_ host name when they should be referring to the _lcfg-master_ alias.

b) At least 24 hours beforehand, alter the DNS TTL for _svn.lcfg.org_ to 30 minutes (1800 seconds) so that users do not have a long wait for the DNS change to propagate.

<verbatim>
rfe dns/lcfg_org
</verbatim>

c) Do rsync copies of all the data (see section 7 for details) so that on the day there is very little data to transfer.

d) Email Informatics COs and external users clearly stating when the downtime will occur and what services will be affected.

---++ 1. Disable nagios

Mark downtime for both _tobermory_ and _schiff_ so that we do not get any unnecessary nags for the duration of the transfer process.

---++ 2. Disable services on _schiff_

Some services on _schiff_ need to be disabled to ensure people do not get access too early.

<verbatim>
om schiff.apacheconf stop
om schiff.rfe stop
om schiff.rsync stop
</verbatim>

---++ 3. ordershost

_Note: doing this ahead of everything else to avoid disrupting Sheila too much_

a) Alter the DNS entry for ordershost so it points to _schiff_ and then update dns on _tobermory_ and _schiff_.

b) Disable the ordershost web interface on _tobermory_ (=rfe lcfg/tobermory=)

<verbatim>
!apacheconf.vhosts mREMOVE(ordershost)
</verbatim>

c) Transfer the DB

On _schiff_ create the new orders database.

<verbatim>
nsu postgres
psql --dbname template1 --file /usr/lib/orders/bin/recreate_orders
</verbatim>

On _tobermory_ dump the data into a file and copy it to _schiff_.

<verbatim>
pg_dump --data-only --file /tmp/orders_data.sql orders
scp /tmp/orders_data.sql schiff:/tmp
</verbatim>

Stop postgres on _tobermory_ so that it cannot accept any more changes (e.g. from clientreports).

<verbatim>
om tobermory.postgresql stop
</verbatim>

On _schiff_ load the data into the empty orders database.

<verbatim>
nsu postgres
psql --dbname orders --file /tmp/orders_data.sql --single-transaction
</verbatim>

The =--single-transaction= option is very useful here: if an error occurs, no data is inserted. Without it you end up having to drop the db and start again.

---++ 4. Block write access to data

It's probably worth warning people on the cos chatroom just before this step to ensure we don't upset anyone in the middle of committing a change.

<verbatim>
om tobermory.apacheconf stop
om tobermory.rfe stop
om tobermory.rmirror stop
</verbatim>

Stopping apache blocks write access via subversion clients; stopping rfe prevents the modification of any LCFG source profiles. The rmirror is stopped to avoid syncing any new inventory headers from the school DB server.

---++ 5. Stop LCFG slaves

<verbatim>
om lcfg1.server stop
om lcfg3.server stop
om lcfgtest.server stop
om diydice.server stop
</verbatim>

On each machine check =/var/lcfg/log/server= and ensure everything is quiet before doing the stop. Clients will continue to be able to pull down LCFG XML profiles but, obviously, will not receive any changes.

---++ 6. Final backups

Take dumps for all the subversion repositories. We won't need these unless something goes badly wrong, as we plan to just use the rsync copies of the data directories rather than going through the painfully slow dump and restore process.

<verbatim>
om tobermory.subversion dumpdb -- -r lcfg -d /var/lcfg/svndump/lcfg -g -k 30
om tobermory.subversion dumpdb -- -r source -d /var/lcfg/svndump/source -g -k 30
om tobermory.subversion dumpdb -- -r dice -d /var/lcfg/svndump/dice -g -k 30
</verbatim>

Do a final mirror run on the DR server so we have a complete snapshot, then stop the component so we do not start mirroring from _schiff_ too early. Need to check =/var/lcfg/log/rmirror= on _sauce_ once the backup run is complete so that we are confident there were no errors.

<verbatim>
om lcfg-dr.rmirror run
om lcfg-dr.rmirror stop
</verbatim>

---++ 7. Final data transfer

On _schiff_ do a final rsync copy of the data from _tobermory_.

<verbatim>
rsync -av -A -X --delete tobermory::lcfgsvn/ /var/lcfg/svndump/
rsync -av -A -X --delete tobermory::autocheckout/ /var/lib/autocheckout/
rsync -av -A -X --delete tobermory::svndatadir/ /var/svn/
rsync -av -A -X --delete tobermory::lcfgrfedata/ /var/rfedata/
rsync -av -A -X --delete tobermory::lcfgstablerelease/ /var/lcfg/releases/stable/
rsync -av -A -X --delete tobermory::lcfgtestingrelease/ /var/lcfg/releases/testing/
rsync -av -A -X --delete tobermory::lcfgreleases/ /var/cache/lcfgreleases/
rsync -av -A -X --delete tobermory::infinv/ /var/lcfg/conf/informatics_inventory/
</verbatim>

---++ 8. Disable rsync access

Stop rsync on _tobermory_ to prevent any further connections and cut off any long-running connections, which will need to switch to _schiff_.

<verbatim>
om tobermory.rsync stop
</verbatim>

---++ 9. Change DNS

Need to edit the =dns/inf= map to move the aliases _lcfg-master_, _lcfgsvn_ and _ordershost_ over to _schiff_.

Need to edit the =dns/lcfg_org= map to move the entry for _svn_ to point to _schiff_.

Kick the DNS component on the following machines: _schiff_, _dammers_ (x509 server), _cockerel_ (nagios server)

<verbatim>
om schiff.dns update
om dammers.dns update
om cockerel.dns update
</verbatim>

---++ 10. Configure _schiff_

Restart various components to ensure we have all the necessary configuration files.

<verbatim>
om schiff.x509 restart
om schiff.file restart
om schiff.subversion restart
</verbatim>

---++ 11. Start Services

<verbatim>
om schiff.apacheconf start
om schiff.rfe start
om schiff.rsync start
</verbatim>

Important: Need to test these to ensure they all work correctly before going any further.

---++ 12. Start LCFG slaves

<verbatim>
om lcfg1.server start
om lcfg3.server start
om lcfgtest.server start
om diydice.server start
</verbatim>

After starting each LCFG server check the log (=/var/lcfg/log/server=) to ensure there are no errors. If everything went well we will NOT get a full rebuild, but that is not guaranteed.

---++ 13. Start DR mirroring

Only do this when we are *absolutely* sure that _schiff_ is working properly, otherwise we will overwrite our good backups with duff data.

<verbatim>
om lcfg-dr.rmirror start
</verbatim>

---++ 14. Tidying Up

a) Inform Informatics COs and external users from other schools that the service has been restored.

b) Revert the DNS TTL on _svn.lcfg.org_ to the standard 86400 seconds.

<verbatim>
rfe dns/lcfg_org
</verbatim>

c) In the short term we want to retain _tobermory_ with its configuration and data intact in case we've missed anything. However, it is worth closing the firewall holes for _tobermory_; do this in the LCFG profile (=lcfg/tobermory=)

<verbatim>
!ipfilter.export mSET()
</verbatim>

d) Trawl the wiki for references to _tobermory_ and, where appropriate, replace them with _lcfg-master_.

-- Main.StephenQuinney - 29 Oct 2012
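A possible helper for the check in step 0 a): the sketch below lists every line in a checked-out copy of the LCFG source profiles and headers that still mentions the old host name. It is not part of the original plan. The function name =find_host_refs= and the checkout path in the usage line are hypothetical, and it assumes GNU grep (for =--exclude-dir=). Each hit still needs a human decision on whether it should become _lcfg-master_.

```shell
#!/bin/sh
# Sketch only: report lines in a source checkout that mention a host name.
# Assumptions (not from the plan): the checkout location, the function name,
# and the availability of GNU grep.

find_host_refs() {
    # $1: directory holding a checkout of the profiles/headers (hypothetical)
    # $2: host name to look for (defaults to tobermory)
    refs_dir=$1
    refs_host=${2:-tobermory}

    # -r  recurse into subdirectories
    # -n  print the line number of each match
    # -w  match whole words only, so "tobermory.example" matches
    #     but "tobermoryfoo" does not
    # -I  skip binary files
    # --exclude-dir=.svn  ignore Subversion metadata directories
    grep -rnwI --exclude-dir=.svn -- "$refs_host" "$refs_dir"
}

# Example use (path is hypothetical); grep exits non-zero when nothing matches:
#   find_host_refs /path/to/lcfg-checkout || echo "no references to tobermory"
```

Running the same function with a different second argument (for example the name of another host being retired) reuses the check unchanged.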
Topic revision: r2 - 30 Oct 2012 - 12:50:50 - StephenQuinney