AfsDbSwitch (revision 6)
The first AFS DB server to be moved to the new openafs component and new hardware will be afsdb1 / charybdis (129.215.64.17). Here is my reasoning:

  • It is unlikely to be the master (the Ubik sync site), since it does not have the lowest IP address
  • It is in the Forum server room along with the new host fenrir
  • It is not far to walk if something goes wrong

For simplicity we need to do this after the stable release on Thursday 10th December. The plan is to do the switch on Monday 14th December, probably at 8am to avoid disrupting too many users.

All clients using the DB server being turned off will hang for about 1 minute when it goes down. They will then switch to a different DB server. We should send something to sys-announce a week in advance to warn users that they might experience a short delay.

Here is the procedure. The first steps can be done well ahead of time.

1. Stop openafs component

On fenrir, stop the openafs component whilst we switch IP addresses.

2. Copy KeyFile

Grab the KeyFile from charybdis:

[charybdis]root: scp /usr/afs/etc/KeyFile squinney@fenrir:

Then move it into place on fenrir and ensure the ownership and permissions are correct:

[fenrir]root: mv /home/squinney/KeyFile /usr/afs/etc/KeyFile
[fenrir]root: chown root:root /usr/afs/etc/KeyFile
[fenrir]root: chmod 0600 /usr/afs/etc/KeyFile
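
Before moving on it is probably worth confirming that the copy is intact and locked down. The following is only a sketch: the function names are made up for illustration, and it assumes GNU coreutils (stat -c and sha256sum) on both hosts.

```shell
# Sketch only: helper names are hypothetical, GNU coreutils assumed.

# keyfile_perms_ok PATH OWNER:GROUP
# Succeeds only if PATH has exactly that owner/group and mode 600.
keyfile_perms_ok() {
    [ "$(stat -c '%U:%G %a' "$1")" = "$2 600" ]
}

# keyfile_digest PATH
# Prints just the SHA-256 digest, so the value can be compared by eye
# (or with test) against the same command run on the other host.
keyfile_digest() {
    sha256sum "$1" | cut -d' ' -f1
}

# Example use on fenrir (run keyfile_digest on charybdis too and compare):
#   keyfile_perms_ok /usr/afs/etc/KeyFile root:root && echo "perms ok"
#   keyfile_digest /usr/afs/etc/KeyFile
```

If the digests differ between charybdis and fenrir, the copy should be repeated before going any further.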

3. Dump the protection database

This is a safety net in case we hit a worst-case scenario and it all goes horribly wrong. Do this on one of the other DB servers (e.g. scylla):

/usr/afs/bin/pt_util -user -members -datafile /usr/afs/db/ptdb.dump

4. Shut down old server

Do a complete shutdown of charybdis. Once the interface has been brought up on fenrir the old DB server must NEVER be resurrected whilst attached to the network.

Once the old machine has been turned off, ensure that we still have quorum.
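
One way to check quorum is to ask each remaining DB server for its Ubik state with udebug and confirm that exactly one of them claims to be the sync site. The sketch below assumes that udebug reports this with a line starting "I am sync site" (and "I am not sync site" otherwise); the helper name and the second host name are placeholders. Port 7002 is the ptserver and 7003 is the vlserver.

```shell
# Sketch only: assumes udebug prints a line beginning "I am sync site"
# on the server that currently holds the sync site role.

# count_sync_sites
# Reads concatenated udebug output on stdin and prints how many servers
# claim to be the sync site. With a healthy quorum we expect 1.
count_sync_sites() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c '^I am sync site' || true
}

# Example use (other-db-server is a placeholder host name):
#   for h in scylla other-db-server; do udebug "$h" 7002; done | count_sync_sites
#   for h in scylla other-db-server; do udebug "$h" 7003; done | count_sync_sites
```

A count of 0 would suggest that no sync site has been elected yet; it may be worth waiting a minute or so and checking again before assuming quorum has been lost.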

5. Bring up network interface

Currently in the LCFG source profile for fenrir there is the following network interface:

!network.interfaces             mADD(bond01)
network.device_bond01           bond0:1
network.ipaddr_bond01           129.215.64.27
network.hostname_bond01         afstest0
!network.ipaddr_bond0           mSET(129.215.64.20)

This should be changed to:

!network.interfaces             mADD(bond01)
network.device_bond01           bond0:1
network.ipaddr_bond01           129.215.64.17
network.hostname_bond01         afsdb1
!network.ipaddr_bond0           mSET(129.215.64.20)

6. Configure openafs

The OPENAFS_TEST_SERVER macro must be removed from the LCFG source profile for fenrir. Once fenrir has accepted the new profile, run:

[fenrir]root: om openafs configure

This should change the /usr/afs/local/NetInfo file so that it contains the IP address of afsdb1. It is probably worth checking through all the files in /usr/afs/etc/ to ensure they are sane before going any further.
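
The NetInfo check can be done by hand, but a small helper makes it harder to misread the file. This is a sketch with a made-up function name, and it assumes the file holds one plain dotted address per line with no prefix or trailing whitespace.

```shell
# Sketch only: assumes NetInfo lists one plain IP address per line.

# netinfo_has FILE ADDR
# Succeeds if ADDR appears as a complete line in FILE. Matching the
# whole line (-x) as a fixed string (-F) avoids 129.215.64.17 being
# satisfied by, say, 129.215.64.170.
netinfo_has() {
    grep -qxF -- "$2" "$1"
}

# Example use on fenrir:
#   netinfo_has /usr/afs/local/NetInfo 129.215.64.17 && echo "afsdb1 address present"
```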

7. Reboot

On fenrir, wait for the network component to reconfigure successfully and then reboot. Once it is back up, check that all is well with /sbin/ifconfig and with ping.

8. Verification

Check all the logs in /usr/afs/logs to ensure there are no issues.

Use udebug on ptserver and vlserver to check that all is well.

In particular, we need to ensure that the new machine has the same DB version as the other machines and is being seen as part of the cluster (i.e. that it is included in the udebug output from the other DB servers).
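
The DB version comparison can be scripted roughly as follows. This is a sketch: it assumes udebug prints a line of the form "Local db version is <version>", the helper names are invented, and other-db-server is a placeholder for the remaining DB server host names.

```shell
# Sketch only: assumes udebug output contains "Local db version is X.Y".

# local_db_version
# Reads one server's udebug output on stdin and prints its DB version.
local_db_version() {
    sed -n 's/^Local db version is //p'
}

# versions_agree
# Reads one version per line on stdin. Succeeds only if at least one
# version was given and every line carries the same value.
versions_agree() {
    [ "$(sort -u | grep -c .)" = "1" ]
}

# Example use for the ptserver (repeat with port 7003 for the vlserver):
#   for h in afsdb1 scylla other-db-server; do
#       udebug "$h" 7002 | local_db_version
#   done | versions_agree && echo "DB versions match"
```

Note that a server which cannot be reached contributes no line at all, so it is still worth reading the full udebug output from each host as well.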

Look at nagios to see if the DB server is now being monitored.

9. Disposal of charybdis

To avoid any unfortunate problems with the old server coming back onto the network we will remove the disk and put the chassis into the cupboard for disposal. This way if we realise we need data off the old machine we can still get at it. At some later point the disk can go for secure disposal.

Remember to also remove the DNS entry in dns/inf for charybdis (it shares the address with afsdb1) and the LCFG profile.

-- StephenQuinney - 01 Dec 2009

Topic revision: r6 - 08 Dec 2009 - 15:42:53 - StephenQuinney
 