Updating Users during a Service Disruption

This page is intentionally verbose, to give people a clear background to the suggestions. Ultimately, I expect we would want a summary page with quick step-by-step bullet points that people could refer to in the event of a disaster.

Planned Service Disruption

In advance of the disruption, the following steps should be taken:

  • The CO planning the service disruption should ensure that Support are aware of the outage, including the reasons behind it and its expected duration, so that they can answer user queries.
  • Support should then consider what impact this might have on users (e.g. lab exams, deadlines) and discuss with the CO responsible for the outage.
  • The CO should be responsible for updating the Service Disruption page - http://computing.help/statuspage
  • The CO should also email sys-announce (or a smaller selected group if more appropriate) with details of the disruption, giving an estimated duration for the outage.
  • On the day of the disruption, the CO should put up the initial message on the support pages by editing the alert status page on computing.help.
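
As an illustration of the announcement step above, the following is a minimal Python sketch of how a CO might draft the advance email. The function name, the message layout and the recipient address used in the usage note are assumptions made for illustration only; they are not an existing tool or the real list address. The sketch only builds the message and does not send it.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage

# Status page URL as given on this page.
STATUS_PAGE = "http://computing.help/statuspage"


def draft_announcement(service, reason, start, expected_duration, recipient):
    """Build (but do not send) an advance notice for a planned disruption.

    All parameter names are hypothetical; adapt them to local practice.
    """
    expected_end = start + expected_duration
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Planned disruption: {service} from {start:%Y-%m-%d %H:%M}"
    msg.set_content(
        f"Service affected: {service}\n"
        f"Reason: {reason}\n"
        f"Start: {start:%Y-%m-%d %H:%M}\n"
        f"Expected end: {expected_end:%Y-%m-%d %H:%M}\n"
        f"Status updates: {STATUS_PAGE}\n"
    )
    return msg
```

For example, `draft_announcement("file server", "disk replacement", datetime(2024, 1, 10, 9, 0), timedelta(hours=2), "sys-announce@example.org")` returns a message whose body lists the reason, the start time and the expected end time. The address shown is a placeholder.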

During the disruption:

  • If the disruption looks as though it will exceed the planned downtime, the CO responsible for the outage should ensure that Support are updated, so that Support can then update web pages/notice boards. This should be done at regular intervals.
  • Where support are not kept informed, they should actively chase at regular intervals for updates.

Unplanned Service Disruption

The aim should be to let users know about the disruption as soon as possible, even if the reason for it has not been determined, and then to keep them regularly informed of progress.

  • If the disruption hasn't been reported to COs by support (e.g. a CO has identified the problem or a user has gone directly to a CO) support should be contacted straight away. If this happens out of hours .....
  • If support have been contacted first, they should notify the appropriate team and update http://computing.help.inf.ed.ac.uk/statuspage
  • The extent of the disruption should be assessed by the Unit(s) dealing with the problem and support to establish which users need to be contacted.
  • Support should then circulate the details amongst the team, including details of who to contact for updates. To avoid unnecessary disturbances, contact should be between one CO and one member of the support team.
  • Where possible, affected users should be emailed immediately. This can be done by support, but the CO(s) dealing with the outage should be prepared to spend a short while explaining the situation and, if possible, give an estimate of the time it might take to bring the service back.
  • An entry should be made in alertmessage.inc and this needs to be kept up to date. This is particularly important when the service has resumed.
  • Support should ensure that ISS are aware of potential disruption to students, particularly near deadlines.
  • For major disruptions, notice boards should be placed in the Forum. This is particularly important where it has not been possible to email users. Notices should also be placed in the Forum Support area and possibly printer areas.
  • If the disruption affects AT, appropriate notices should be put on all levels. As the AT support is not manned all the time, support should consider the need to send someone across. The AT lab bookings should be checked.
  • For major disruptions, support should arrange for offices to be visited. It may be that admin staff will be available to help with this.
  • The chatroom should be used where possible for keeping COs updated. Should this be unavailable, the alternative jabber service should be used (see the DICE pandemic planning page).
  • Additional emails should be sent to provide updates on progress. This can be done by support but again will require input from the CO dealing with the incident.
  • Support should ask for regular (half-hourly?) updates and keep all notices updated.
  • Once the disruption is resolved, all users need to be informed by email, notices should be taken down, and http://computing.help.inf.ed.ac.uk/statuspage should be updated.
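
The regular updates suggested above could be tracked with a simple list of check-in times. The following is a minimal Python sketch, assuming the tentative half-hourly interval mentioned in the list; the function name and its parameters are hypothetical and the interval should be adjusted to whatever is agreed.

```python
from datetime import datetime, timedelta


def update_times(start, interval=timedelta(minutes=30), count=4):
    """Return the next `count` times at which support should ask for an update.

    `start` is when the disruption was first reported; the 30-minute
    default reflects the tentative half-hourly suggestion only.
    """
    return [start + interval * i for i in range(1, count + 1)]


if __name__ == "__main__":
    # Example: a disruption reported at 09:00 on an arbitrary date.
    for t in update_times(datetime(2024, 1, 10, 9, 0)):
        print(t.strftime("%H:%M"), "ask CO for an update and refresh notices")
```

Keeping the schedule explicit makes it easier for the single support contact to know when to chase the CO and when emails, notices and the status page are next due to be refreshed.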

-- AlisonDownie - 15 Dec 2011

Topic revision: r6 - 26 Nov 2013 - 14:56:29 - AlisonDownie