Inf Unit "Gas" DR Test 2014

This is the Inf-Unit's response to the January 2014 "gas explosion" DR test.


The assumption is that all AT basement services would be out of action. This would include our links to the upstairs floors, and our link to the EdLAN AT core router.

The upstairs floors are interlinked with copper, and have fibre links to the basement. They would therefore form a disconnected island until such time as external connectivity could be restored. How this would be done is out of scope for the current exercise.

The loss of our AT core switches would take out our 10GBASE-SR link to the EdLAN AT core router. Routed IP traffic would re-route. Bridged IP traffic would break until such time as we restored bridged connectivity using the existing instructions. This would affect:

  • VoIP phones
  • Wireless
  • Wire-B connectivity to JCMB, which would affect Kerberos and AFSDB.

Restoring bridged connectivity can be done remotely; indeed, the setup was designed to allow exactly this. The underlying mechanism was tested recently as part of bringing up our new 10Gbps EdLAN-AT link. Fully testing it would be straightforward but disruptive to phones and wireless, and so we do not propose actually doing so.

There are no network infrastructure servers whose loss would affect any part of our network other than AT. Sites' switch and power-bar configurations are mirrored twice a day to off-site netinf machines, and so would be available when we were eventually in a position to restore the AT network service. The procedures to do this have been well tested, and the documentation is current. The DNS master is in the Forum. We would lose the AT OpenVPN endpoint, but could either bring up a new one easily, or else just ask users to connect to the Forum endpoint.

It would be straightforward to route "AT" subnets (e.g. 202 aka wire-AT1) at the Forum as necessary, should this turn out to be easier all round than just moving machines to one of the Forum subnets. If DHCP is not required, just add the VLAN if it's not already there, give one or more of the core routers an address on the subnet in question, and add the VLAN to the router-discovery list. If DHCP is also required, the Forum external router and network infrastructure machines could both be added, or alternatively DHCP set up on some third machine.
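
The decision described above can be written down as a short checklist generator. This is only an illustrative sketch: the function name, its parameters and the step wording are ours, not part of any existing tool, and it prints the steps rather than touching any router or switch configuration.

```python
def forum_routing_steps(vlan_present: bool, dhcp_required: bool) -> list[str]:
    """Return the ordered steps for routing an "AT" subnet at the Forum.

    Hypothetical helper: it only encodes the prose procedure as a
    checklist and makes no configuration changes itself.
    """
    steps = []
    # The VLAN only needs adding if it is not already there.
    if not vlan_present:
        steps.append("add the VLAN at the Forum")
    # These two steps are always needed.
    steps.append("give one or more core routers an address on the subnet")
    steps.append("add the VLAN to the router-discovery list")
    # DHCP is optional and has two alternatives.
    if dhcp_required:
        steps.append(
            "provide DHCP: add the Forum external router and network "
            "infrastructure machines, or set up DHCP on a third machine"
        )
    return steps


if __name__ == "__main__":
    # Example: VLAN not yet present at the Forum, DHCP wanted.
    for number, step in enumerate(forum_routing_steps(False, True), start=1):
        print(f"{number}. {step}")
```

With the VLAN already present and no DHCP needed, the list reduces to the two core-router steps.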

Note that IS's services would also be severely disrupted, and it would be as well to assume that we could not rely on whatever's based in AT working. In particular, their fibre paths are not as physically diverse as their diagrams would appear to show, and it's likely that damage to their AT comms room would result in large parts of EdLAN, including our link via Old College to JCMB and our VoIP phones, being partitioned.

Auth, Auth, Dir


Lost KDC slave/AFS db server - skoll

No urgent action required, but could bring up another KDC fairly swiftly if deemed necessary. Services-Unit would need to deal with AFS db. Would need to modify DNS.


Lost slave - blackwell

No urgent action required, could bring up another slave if needed. Likely to be necessary if/when we adopt LDAP client/server model. May well have more slaves anyway, to cope with load. Also slaves may be virtual. All tlsdir slaves currently hosted on non-AT machines. Would need to modify DNS.


Lost cosign/ifriend slave - mcintyre

Would be running with only one cosign server, so may well choose to bring another one up. ifriend database unaffected. Would need to modify DNS to remove weblogin1.


Unaffected - dammers is a VM on KB host northern


Unaffected - dammers is a VM on KB host northern, buchanan is a VM on IF host hammersmith


Unaffected - vandellas is a VM on IF host jubilee


Sites are mostly independent. A few Inf-Unit machines in the Forum are (deliberately) set up to use the AT console server, but these could easily be moved as and when necessary.

Monitoring (nagios)

The nagios master is in the Forum. The nagios secondary is on a VM in AT. We could easily bring up a new one on a VM at JCMB.


The jabber server is on a VM in the Forum.

-- GeorgeRoss - 13 Jan 2014

Topic revision: r3 - 16 Jan 2014 - 09:46:40 - GeorgeRoss