Gas explosion drill - MPU report

Conclusion

No new lessons were learnt from this exercise: the only services affected were ones which were already replicated.

Machines lost

The following MPU kit was affected by the simulated AT gas explosion:
  • kubelik (student.ssh)
  • KVM servers circle and waterloo
  • wildcat (RPM cache and backup PXE)
  • With waterloo we also lost vermeer (aka lcfg1), one of the two main LCFG slave servers.

The loss of waterloo knocks out these virtual machines:

VM        Unit            Purpose
bank      MPU             test
barking   RAT             Trac server
borges    Services        backup print server
capon     Infrastructure  secondary Nagios server
wobleg    Services        test
spadina   RAT             projects.inf
vermeer   MPU             lcfg1.inf

These VMs were lost along with circle:

VM          Unit      Purpose
argus       RAT       testing forumtracker
arlott      INF       testing sl6 KDC
armitage    Services  student labs monitoring test
cardus      INF       testing sl6 cosign
circlevm0   MPU       for testing
circlevm2   MPU       for testing
circlevm3   MPU       for testing
circlevm4   MPU       for testing
circlevm5   MPU       for testing
circlevm6   MPU       for testing
circlevm7   MPU       for testing
circlevm8   MPU       for testing
circlevm9   MPU       for testing
circlevm10  MPU       for testing
dilley      INF       testing sl6 KDC
ekcof       RAT       testing Coltex
engadine    RAT       gdutton test
idoru       Services  gordon's test VM
keele       RAT       test portal
littlebird  RAT       iainr test
monmouth    RAT       iainr test
monty       INF       testing sl6 prometheus
moody       ?         Moodle

Services affected

LCFG
One of the two main LCFG slave servers has been lost. The service will carry on more or less unaffected using the other slave server. The MPU is considering bringing up another slave server elsewhere. In the meantime, the DNS aliases lcfg1 and lcfg3 have been moved to the remaining slave server, rembrandt, which is safely in the Forum.
Package cache and updaterpms
We have lost wildcat, one of the two RPM cache servers behind cache.pkgs.inf.ed.ac.uk, the address from which updaterpms fetches its RPMs on most DICE machines. We have altered the DNS to remove wildcat's IP address from cache.pkgs. Running om dns update, or simply waiting an hour, should be enough to get updaterpms working again on DICE machines outside the Tower.
SSH
student.ssh.inf.ed.ac.uk (aka ssh.inf.ed.ac.uk) has gone. The MPU has brought its hot spare, shrew at KB, into use as the temporary student SSH server.
KVM service
  • We have recovered the backup of /etc/libvirt from the lost KVM server waterloo in case it comes in handy, though we hope the waterloo wiki page and LCFG give enough detail for people to restore their VMs elsewhere. The backup can be found in /etc/waterloo on oyster if anyone needs it. We have sufficient capacity on other KVM servers, partly because waterloo was underused: it hosted only seven VMs, two of which were test VMs.
  • We do not expect to provide a short-term replacement for circle, as it is only a test server.

Spare servers on offer

One of the MPU's functions is to keep spare machines for use in emergencies such as this. We don't have spares of every model, but we have these in the Forum:
Model                Name          Notes
Dell PowerEdge R805  central       Currently configured as a staff NX server but not yet in service.
Dell PowerEdge 850   figgy
HP DL 180 G6         juice
Dell PowerEdge R710  metropolitan  Currently configured as a KVM server but not in service.

-- ChrisCooke - 13 Jan 2014

Topic revision: r5 - 29 Jan 2014 - 09:28:16 - AlastairScobie