To facilitate the move from FH to AT, the second AFS DB server to be moved to the new openafs component and new hardware will be afsdb2 / symplegades (129.215.64.18). The new hardware is hati.

The plan is to do the switch on Thursday 25th February. It is probably best to do it at 8am to avoid disrupting too many users.

All clients using the DB server being turned off will hang for about 1 minute when it goes down. They'll then switch to using a different DB server.

Here is the procedure. The first steps can be done well ahead of time.

1. Stop openafs component

On hati, stop the openafs component whilst we switch IP addresses.

2. Copy KeyFile

Grab the KeyFile from symplegades and note its checksum:

[symplegades]root: scp /usr/afs/etc/KeyFile squinney@hati:
[symplegades]root: md5sum /usr/afs/etc/KeyFile
c05e14585d4354c063e38b23d09f017c  /usr/afs/etc/KeyFile

Move it into place on hati and ensure the permissions are correct:

[hati]root: mv /home/squinney/KeyFile /usr/afs/etc/KeyFile
[hati]root: chown root:root /usr/afs/etc/KeyFile
[hati]root: chmod 0600 /usr/afs/etc/KeyFile

Also sanity check that the checksum on hati matches the one from symplegades:

[hati]root: md5sum /usr/afs/etc/KeyFile
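
The comparison can be scripted rather than done by eye. This is a minimal sketch, assuming GNU md5sum and awk are available on hati. The function name keyfile_matches is hypothetical, and the expected checksum is the one printed on symplegades above.

```shell
# Hypothetical helper: succeed only if FILE has the EXPECTED md5 checksum.
keyfile_matches() {
    file=$1
    expected=$2
    # md5sum prints "<checksum>  <path>"; keep only the checksum field.
    actual=$(md5sum "$file" | awk '{print $1}')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example use on hati (not run here):
#   keyfile_matches /usr/afs/etc/KeyFile c05e14585d4354c063e38b23d09f017c \
#       && echo "KeyFile matches symplegades"
```

A non-zero exit status means the copy differs from the original (or is missing) and should not be used.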

3. Cleanup new server

For safety/paranoia, delete any files in the /usr/afs/db directory on hati so that no stale data can be replicated.
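
One way to do the cleanup while keeping a record of what was removed is sketched below. It assumes the openafs component is already stopped (step 1) and that find supports -mindepth/-maxdepth, as GNU find does. The function name clear_dir_files is hypothetical.

```shell
# Hypothetical helper: list, then delete, the regular files directly inside DIR.
clear_dir_files() {
    dir=$1
    # Refuse to run unless an existing directory was given.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    # Show what is about to be removed so there is a record in the terminal.
    ls -l "$dir"
    # Remove only regular files at the top level of DIR; subdirectories are left alone.
    find "$dir" -mindepth 1 -maxdepth 1 -type f -exec rm -f -- {} +
}

# Example use on hati (not run here):
#   clear_dir_files /usr/afs/db
```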

4. Dump the protection database

This is just in case we hit the worst-case scenario and it all goes horribly wrong. Do this on one of the other DB servers (e.g. fenrir):

[fenrir]root: /usr/afs/bin/pt_util -user -members -datafile /usr/afs/db/ptdb.dump

5. Shutdown old server

Do a complete shutdown of symplegades. Once the interface has been brought up on hati the old DB server must NEVER be resurrected whilst attached to the network.

Once the old machine has been turned off, use udebug to ensure that we still have quorum.
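
As a sketch of that check: udebug can be pointed at each remaining DB server on the ptserver port (7002) and the vlserver port (7003). While quorum holds, one of the remaining servers should report itself as the sync site for each database. The helper name is_sync_site is hypothetical, and the matched text assumes udebug's usual "I am sync site" / "I am not sync site" wording, which is worth confirming against real output first.

```shell
# Hypothetical helper: read udebug output on stdin and succeed if the
# queried server reports that it is the sync site.
# Assumption: udebug prints a line beginning "I am sync site" on the sync
# site and "I am not sync site" on the other servers.
is_sync_site() {
    grep -q '^I am sync site'
}

# Example use (not run here); repeat for each remaining DB server and for
# port 7003 (vlserver):
#   udebug -server fenrir -port 7002 | is_sync_site \
#       && echo "fenrir is the ptserver sync site"
```

If none of the remaining servers reports being the sync site for a database, quorum has been lost and the switch should not proceed.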

6. Bring up network interface

We need this network interface configuration in the LCFG source profile for hati:

!network.interfaces             mADD(bond01)
network.device_bond01           bond0:1
network.ipaddr_bond01           129.215.64.18
network.hostname_bond01         afsdb2
!network.ipaddr_bond0           mSET(129.215.64.22)
!openafs.netinfo_server         mSET(<%network.ipaddr_bond01%>)

These lines should just need uncommenting to make them live.

7. Configure openafs

In the LCFG source profile for hati uncomment the inclusion of the openafs-dbserver.h header. On hati wait for the new profile to be accepted and then run:

[hati]root: om openafs configure

This should change the /usr/afs/local/NetInfo file so that it contains the IP address of afsdb2. It is probably worth checking through all the files in /usr/afs/etc/ to ensure they are sane before going any further.
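
The NetInfo check can be done mechanically. This is a minimal sketch, assuming a grep that supports -F, -w and -q. The function name file_lists_addr is hypothetical.

```shell
# Hypothetical helper: succeed if ADDR appears as a whole word in FILE.
file_lists_addr() {
    file=$1
    addr=$2
    # -F: fixed string, so the dots are literal.
    # -w: whole word, so 129.215.64.18 does not match 129.215.64.180.
    # -q: quiet, report via exit status only.
    grep -Fwq -- "$addr" "$file"
}

# Example use on hati (not run here):
#   file_lists_addr /usr/afs/local/NetInfo 129.215.64.18 \
#       && echo "NetInfo lists the afsdb2 address"
```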

8. Reboot

On hati, wait for the network component to successfully reconfigure and then reboot. Once it is back up, check with /sbin/ifconfig and with ping that all is well.

9. Verification

Check all the logs in /usr/afs/logs to ensure there are no issues.

Use udebug on ptserver and vlserver to check that all is well.

In particular, we need to ensure that the new machine has the same DB version as the other machines and is being seen as part of the cluster (i.e. that it is included in the udebug output from the other DB servers).
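
The version comparison can be sketched as below. The helper names local_db_version and all_same are hypothetical, and the parsing assumes udebug prints a line of the form "Local db version is <version>", which should be confirmed against real output before relying on it.

```shell
# Hypothetical helper: read udebug output on stdin and print the local
# database version.
# Assumption: udebug prints a line "Local db version is <version>".
local_db_version() {
    sed -n 's/^Local db version is //p'
}

# Hypothetical helper: succeed if all arguments are identical, non-empty strings.
all_same() {
    [ -n "$1" ] || return 1
    first=$1
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
}

# Example use (not run here); 7002 is ptserver, repeat with 7003 for vlserver:
#   new=$(udebug -server 129.215.64.18 -port 7002 | local_db_version)
#   ref=$(udebug -server fenrir -port 7002 | local_db_version)
#   all_same "$new" "$ref" && echo "ptserver DB versions agree"
```

An empty version string counts as a failure, so a server that does not answer is not mistaken for one that agrees.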

Look at nagios to see if the DB server is now being monitored.

10. Disposal of symplegades

To avoid any unfortunate problems with the old server coming back onto the network we will remove the disk and put the chassis into the cupboard for disposal. This way if we realise we need data off the old machine we can still get at it. At some later point the disk can go for secure disposal.

Remember to also carefully remove the DNS entry in dns/inf for symplegades (it shares the address with afsdb2) and the LCFG profile.


Topic revision: r10 - 24 Feb 2010 - 15:15:12 - StephenQuinney