AfsDbSwitch (revision 9)
To facilitate the move from FH to AT, the second AFS DB server to be moved onto the new openafs component and new hardware will be afsdb0 / scylla (129.215.64.16). Its address will be taken over by the new machine, skoll.

The plan is to do the switch on Wednesday 17th February. It is probably best to do it at 8am to avoid disrupting too many users.

All clients using the DB server that is being turned off will hang for about a minute when it goes down, after which they will switch to a different DB server. We should send a notice to sys-announce a week in advance to warn users that they might experience a short delay.

Here is the procedure; the first few steps can be done well ahead of time.

1. Stop openafs component

On skoll, stop the openafs component whilst we switch IP addresses.

2. Copy KeyFile

Grab the KeyFile from scylla:

[scylla]root: scp /usr/afs/etc/KeyFile squinney@skoll:

Move it into place on skoll and ensure the permissions are correct:

[skoll]root: mv /home/squinney/KeyFile /usr/afs/etc/KeyFile
[skoll]root: chown root:root /usr/afs/etc/KeyFile
[skoll]root: chmod 0600 /usr/afs/etc/KeyFile
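A quick way to confirm the ownership and mode afterwards is sketched below as a hypothetical shell helper, `file_perms_ok`; it is not part of the openafs component. It assumes GNU `stat` (the `-c` format option), as found on Linux.

```shell
# Hypothetical helper: check that FILE has the expected owner:group and
# octal mode. Defaults suit the AFS KeyFile (root:root, 0600).
# Relies on GNU stat's -c format option.
file_perms_ok() {
    local file="$1" want_owner="${2:-root:root}" want_mode="${3:-600}"
    local actual
    # %U:%G is owner:group by name, %a is the mode in octal (e.g. 600)
    actual=$(stat -c '%U:%G %a' "$file") || return 2
    if [ "$actual" = "$want_owner $want_mode" ]; then
        echo "OK: $file is $actual"
        return 0
    fi
    echo "MISMATCH: $file is $actual, expected $want_owner $want_mode" >&2
    return 1
}
```

Once the function is defined in the root shell:

[skoll]root: file_perms_ok /usr/afs/etc/KeyFile

Running md5sum /usr/afs/etc/KeyFile on both scylla and skoll and comparing the sums is a cheap way to confirm the copy is intact.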

3. Clean up new server

For safety (paranoia, really), delete any files in the /usr/afs/db directory on skoll so that no stale database there can be replicated to the other DB servers.
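The deletion can be scripted so that it empties /usr/afs/db without removing the directory itself, listing each path as it goes. This is a sketch using GNU `find` (`-mindepth`, `-delete`); the helper name `clean_db_dir` is hypothetical. It is meant to be run on skoll only, while the openafs component is stopped.

```shell
# Hypothetical helper: empty DIR (e.g. /usr/afs/db) but keep DIR itself.
# Prints each path as it is removed, for the record.
# Relies on GNU find (-mindepth, -delete).
clean_db_dir() {
    local dir="$1"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "refusing: '$dir' is not a directory" >&2
        return 1
    fi
    # -delete implies -depth, so directory contents go before the directory
    find "$dir" -mindepth 1 -print -delete
}
```

[skoll]root: clean_db_dir /usr/afs/db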

4. Dump the protection database

This is insurance in case the worst happens and it all goes horribly wrong. Do this on one of the other DB servers (e.g. scylla):

[scylla]root: /usr/afs/bin/pt_util -user -members -datafile /usr/afs/db/ptdb.dump
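Before moving on, it is worth a trivial sanity check that the dump was actually written and is not empty. The hypothetical helper below does only that, plus a line count; it does not validate the pt_util output format.

```shell
# Hypothetical helper: confirm a pt_util dump exists and is non-empty,
# and report its size in lines. It does not check the dump's format.
dump_ok() {
    local dump="$1"
    if [ ! -s "$dump" ]; then
        echo "dump missing or empty: $dump" >&2
        return 1
    fi
    echo "$dump: $(awk 'END { print NR }' "$dump") lines"
}
```

[scylla]root: dump_ok /usr/afs/db/ptdb.dump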

5. Shut down old server

Do a complete shutdown of scylla. Once the interface has been brought up on skoll, the old DB server must NEVER be resurrected whilst attached to the network, since it would clash with skoll over the afsdb0 address (129.215.64.16).

Once the old machine has been turned off, use udebug against the remaining DB servers to ensure that we still have quorum.
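Reading udebug output by eye is easy to get wrong under time pressure, so a small filter can summarise it. This is a sketch that assumes the OpenAFS udebug wording, i.e. lines beginning "I am sync site", "Recovery state" and "Sync host"; the name `ubik_summary` is hypothetical and the raw output remains the authority. On the sync site a recovery state of 1f is generally regarded as healthy; on the other servers the filter just reports which host they believe is the sync site.

```shell
# Hypothetical filter: summarise one udebug report read on stdin.
# Assumes OpenAFS udebug lines such as "I am sync site ...",
# "Recovery state 1f" and "Sync host 1.2.3.4 was set ...".
# Exit status 0 means the report looks healthy, 1 means look closer.
ubik_summary() {
    awk '
        /^I am sync site/ { is_sync = 1 }
        /^Recovery state/ { state = $3 }
        /^Sync host/      { sync_host = $3 }
        END {
            if (is_sync) {
                print "this server is the sync site, recovery state " state
                exit (state == "1f" ? 0 : 1)
            }
            # 0.0.0.0 is treated here as "no sync site known" (assumption)
            if (sync_host != "" && sync_host != "0.0.0.0") {
                print "sync site is " sync_host
                exit 0
            }
            print "no sync site found"
            exit 1
        }'
}
```

For example, run udebug <db server> 7002 | ubik_summary for the ptserver and the same with port 7003 for the vlserver, against each remaining DB server.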

6. Bring up network interface

We need this network interface in the LCFG source profile for skoll:

!network.interfaces             mADD(bond01)
network.device_bond01           bond0:1
network.ipaddr_bond01           129.215.64.16
network.hostname_bond01         afsdb0
!network.ipaddr_bond0           mSET(129.215.64.21)

These lines should just need uncommenting to make the interface live.

7. Configure openafs

In the LCFG source profile for skoll, uncomment the inclusion of the openafs-dbserver.h header. On skoll, wait for the new profile to be accepted and then run:

[skoll]root: om openafs configure

This should change the /usr/afs/local/NetInfo file so that it contains the IP address of afsdb0. It is probably worth checking through all the files in /usr/afs/etc/ to ensure they are sane before going any further.
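Checking NetInfo by hand is easy, but a whole-line comparison avoids being fooled by a similar address (129.215.64.16 is a prefix of 129.215.64.160, for instance). A sketch, with the hypothetical helper name `netinfo_has_ip`, assuming NetInfo holds one address per line:

```shell
# Hypothetical helper: succeed if FILE (e.g. /usr/afs/local/NetInfo) has a
# line that is exactly IP, ignoring leading/trailing blanks. A whole-line
# match avoids 129.215.64.16 also matching 129.215.64.160.
netinfo_has_ip() {
    local file="$1" ip="$2"
    awk -v ip="$ip" '
        { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }
        line == ip { found = 1 }
        END { exit !found }
    ' "$file"
}
```

[skoll]root: netinfo_has_ip /usr/afs/local/NetInfo 129.215.64.16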

8. Reboot

On skoll, wait for the network component to reconfigure successfully and then reboot. Once it is back up, use /sbin/ifconfig and ping to check that all is well.
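The ifconfig check can be made less error-prone with a filter that looks for the exact address in the `inet` field rather than eyeballing the listing. This sketch accepts the older `inet addr:` format printed by net-tools ifconfig as well as the `inet <address>/<prefix>` form printed by `ip addr`; `has_inet_addr` is a hypothetical name.

```shell
# Hypothetical filter: succeed if interface listing on stdin shows IP.
# Understands "inet addr:1.2.3.4" (older net-tools ifconfig),
# "inet 1.2.3.4" (newer ifconfig) and "inet 1.2.3.4/24" (ip addr).
has_inet_addr() {
    local ip="$1"
    awk -v ip="$ip" '
        {
            for (i = 1; i < NF; i++) {
                if ($i != "inet") continue
                addr = $(i + 1)
                sub(/^addr:/, "", addr)   # strip old ifconfig prefix
                sub(/\/.*/, "", addr)     # strip CIDR suffix from ip addr
                if (addr == ip) found = 1
            }
        }
        END { exit !found }'
}
```

[skoll]root: /sbin/ifconfig | has_inet_addr 129.215.64.16 && ping -c 3 afsdb0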

9. Verification

Check all the logs in /usr/afs/logs to ensure there are no issues.
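A first pass over the logs can be done with grep. The keyword list in this sketch is only a guess at what deserves a closer look; the helper `scan_afs_logs` is hypothetical and defaults to /usr/afs/logs. It prints matching lines prefixed with the file name and returns non-zero if it found any.

```shell
# Hypothetical helper: print log lines matching a few worrying keywords.
# The keyword list is a guess; widen or narrow it as needed.
# Returns 0 if nothing matched, 1 if any matches were printed.
scan_afs_logs() {
    local dir="${1:-/usr/afs/logs}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 2
    fi
    # -H: prefix file name, -i: ignore case, -s: quiet about unreadable files
    if grep -H -i -s -E 'error|fail|warn|denied|quorum' "$dir"/*; then
        return 1
    fi
    echo "no matches in $dir"
}
```

[skoll]root: scan_afs_logs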

Use udebug against the ptserver (port 7002) and the vlserver (port 7003) to check that all is well.

In particular, we need to ensure that the new machine has the same DB version as the other machines and is seen as part of the cluster (i.e. that it is included in the output from the other DB servers).
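The DB-version comparison can be scripted once the udebug output from each server has been saved to a file. This sketch assumes udebug prints a line "Local db version is <version>" and that the sync site lists each peer on a line beginning "Server (<address>)"; both helper names, `same_db_version` and `lists_server`, are hypothetical.

```shell
# Hypothetical helpers working on saved udebug output, one file per server.
# Assumes lines "Local db version is <version>" and, on the sync site,
# "Server (<address>): (db <version>)" for each peer.

# Succeed if every file reports the same local db version.
same_db_version() {
    local f v first="" rc=0
    for f in "$@"; do
        v=$(awk '/^Local db version is/ { print $5; exit }' "$f")
        echo "$f: ${v:-unknown}"
        if [ -z "$v" ]; then
            rc=1
        elif [ -z "$first" ]; then
            first="$v"
        elif [ "$v" != "$first" ]; then
            rc=1
        fi
    done
    return "$rc"
}

# Succeed if a saved sync-site report lists IP as a peer.
lists_server() {
    grep -q -F "Server ($2)" "$1"
}
```

For example, save the output of udebug <db server> 7002 from each DB server into its own file, run same_db_version over those files, and run lists_server on the sync site's file with 129.215.64.16 (if skoll has itself become the sync site, it will not appear in its own peer list). Repeat with port 7003 for the vlserver.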

Look at nagios to see if the DB server is now being monitored.

10. Disposal of scylla

To avoid any unfortunate problems with the old server coming back onto the network, we will remove the disk and put the chassis into the cupboard for disposal. That way, if we realise we need data off the old machine, we can still get at it. At some later point the disk can go for secure disposal.

Remember also to carefully remove the DNS entry for scylla in dns/inf (it shares its address with afsdb0) and scylla's LCFG profile.

Topic revision: r9 - 16 Feb 2010 - 16:47:40 - StephenQuinney