Forum power-down on Monday 21st December 2009

Table of actions

| *Who* | *What* | *Comments* |
| Stephen | Shutdown options for DICE desktops | We have a plan |
| Alison | sys-announce message: shut down self-managed desktops | |
| Alison | Warn managers of machines in self-managed server room | Done |
| Ian | Warn E&B re. their machines | Done |
| Alison | Try to shut down Windows managed desktop machines automatically | |
| Unit Managers | Ensure that someone is around to shut down their unit's machines | |
| Alison | More desktops and screens/keyboards for server room | Will add 1 more desktop, which gives 3 in total, plus an extra 2 trolleys |
| inf-unit | Check IT closets | Power-down/up to reset switches and for firmware upgrades |
| services-unit | Arrange for .inf mail to go through | smtp.inf and virtualrelay.inf switched to KB (beano); mail.inf and postbox.inf switched to KB (chewy) |
| services-unit | Arrange AT print server provision | Done. Laney at FH will stand in. |
| AFS team | Move ITO to KB if necessary | Done |
| services-unit | Sort out web servers | Most will survive; those that won't are listed on the Wiki page |
| rat-unit | ITODB login server | |
| Stephen | Neutralise updaterpms, at least on servers | We have a plan |
| mp-unit | Prioritise tobermory afterwards | |
| ALL | Make sure fsck isn't a problem! | |
| Alison | Arrange for a status display | To be picked up on Friday from level 4 admin |
| Ian | Neutralise nagios | Plan (as per Alastair's suggestion) is to hand-edit the alert script Nagios uses simply to throw away alerts for the period; the Nagios web page will still reflect the truth |
| services-unit | Cosign-based "homedir-server" CGI | Done. Currently at https://www.inf.ed.ac.uk/systems/support/checkhomedir.html |

Notes from power-down brainstorming meeting

Present: George, Ian, Toby, Craig, Neil, Graham, Gordon, Alison, Lindsey, Alastair, Stephen

Notes are in roughly the order of discussion. See also Dec2009PowerDown.

Background: due to failures in the model of HV switchgear that's used in the Forum, power will be going off on Monday 21st at 09:30 to allow an inspection to take place. The server-room UPS doesn't have enough oomph to keep everything running for the expected couple of hours, and even if it could there wouldn't be any air-conditioning. So...

Almost everything in the server room will have to be turned off. The intention is to leave a couple of switches and network infrastructure machines running, particularly for AT's benefit, but the breakers will be thrown for everything else.

Office machines should be shut down in advance. Stephen'll look into options for making this happen. Self-managed machines should also be shut down -- a message to sys-announce should be sent to this effect.

The power will be turned off in the small server room. People with machines there will be expected to shut them down themselves. Anything which isn't will be summarily chopped. Those people will also be expected to bring their machines back up again afterwards.

E&B have a couple of machines in the building. Ian'll let them know. (Done: E&B have been informed and are dealing with these machines: there's nothing we (i.e. Inf) have to do in addition. -- idurkacz, 14.12)

Can the managed (Windows) desktops be made to shut down automatically? Alison'll ask IS.

Alastair wondered whether the fire stewards would be the right people to sweep the building to make sure it's empty.

For the main server room, at least one person from each Unit will be expected to be there to supervise their machines' shutdowns.

To make things simpler, we could do with at least one more desktop machine in the server room. Alison'll arrange. More screens/keyboards on trolleys would also be very useful.

Afterwards, as well as bringing things back up again, someone should go around the building to check the IT closets and the door locks.

Things which will be affected...

The mail relay. Mail to .inf addresses will still have to get through.

The ITO RT can probably be left down. (Post-meeting comment from Tim: the ITODB login server is currently on a Forum-hosted virtual machine, and will need to move.)

Printing at AT currently relies on Forum print servers. We'll need print-server support elsewhere, so that this can keep going.

Do the managed desktop machines need AFS? We could move the ITO to KB AFS servers.

Some web services are essential. Some are expendable. List?

Jabber? We can use the central provision.

Boot dependencies...

updaterpms would cause a delay during boots. Stephen'll disable it, at least on servers.

LDAP will grumble about replication failures, but shouldn't cause a delay.

There's an LCFG server in FH, so that's OK. However, we'll want tobermory (the master) back pretty quickly, in case configuration changes are needed.

The other rfe masters probably aren't so critical. achilles has a bunch of stuff, but probably not things we'll want to change. Other maps are hosted on the network machines, but they'll be up anyway.

We really really don't want machines to be doing big fscks. Everyone should take steps to avoid this happening.
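One low-effort precaution (a sketch, assuming ext2/ext3 filesystems and the standard e2fsprogs tools): check whether a filesystem is close to its periodic-check thresholds before the shutdown, and turn those checks off temporarily if so. The helper below just encodes the mount-count test; the actual counts come from `tune2fs -l`.

```shell
# Hypothetical helper: given the "Mount count" and "Maximum mount count"
# values reported by `tune2fs -l /dev/...`, say whether the next boot
# would trigger a full fsck. A maximum of 0 or -1 means periodic checks
# are already disabled.
needs_fsck() {
    mounts=$1
    max=$2
    if [ "$max" -gt 0 ] && [ "$mounts" -ge "$max" ]; then
        echo yes
    else
        echo no
    fi
}
```

If it says yes, `tune2fs -c 0 -i 0 /dev/whatever` disables the periodic checks (remember to re-enable them after the power-down).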

Information for users should be a priority, as it'll take a while to bring everything back up. The low-tech solution of a display board by the lifts might be best, though an on-line list of affected services and their states would be good too.

nagios: we don't want to be bombarded with messages about things we already know. It should be muted in some way, or else turned off.
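The hand-edit could be as small as a guard at the top of the alert script: inside the outage window, drop the notification; otherwise behave as before. A sketch (the window times and the `alert` function name are assumptions, not the real Nagios script):

```shell
# Sketch of a muted notification path. First argument is the current time
# as YYYYMMDDHHMM (the real script would use `date +%Y%m%d%H%M`); the rest
# is the alert text. Inside the power-down window the alert is discarded;
# outside it, it is passed on as normal.
alert() {
    now=$1; shift
    if [ "$now" -ge 200912210900 ] && [ "$now" -le 200912211800 ]; then
        return 0                      # mute: throw the alert away
    fi
    echo "ALERT: $*"                  # normal path: pass it on
}
```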

The VMs should be started after the AFS servers.

There are ssh hosts elsewhere for those who want to use them. There's an "ssh pool" which could be used.

Roughly half the users' home directories are still out at KB, so they'll be relatively unaffected. A cosign-based CGI which could say whether a user is or isn't affected would be very useful.
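The core of such a CGI is just a lookup from username to home-directory fileserver, plus a Forum/KB classification. A minimal sketch of the classification step (the `forum-*`/`kb-*` server naming is hypothetical, for illustration; the real list would come from the AFS volume location database or the passwd map):

```shell
# Classify a home-directory fileserver as Forum-hosted (affected by the
# power-down) or KB-hosted (unaffected). Server name patterns here are
# assumptions, not the real Informatics hostnames.
homedir_status() {
    case "$1" in
        forum-*) echo "affected" ;;
        kb-*)    echo "unaffected" ;;
        *)       echo "unknown" ;;
    esac
}
```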

cvs and svn can be left down. coltex users will still be able to edit their documents, just not collaboratively.

Services-unit would like to take the chance to do some evo firmware upgrades. Arrangements for this will be made separately.

-- GeorgeRoss - 09 Dec 2009

Topic revision: r16 - 21 Dec 2009 - 09:15:31 - NeilBrown
 