November 2010 Power Shutdown - Infrastructure Unit notes
For reference, Dec2009PowerDownInfUnit contains our notes from last time around.
Organisational things we might want to do
Beforehand
- As all the switches will be powered down, we should make sure in advance that they have the most recent firmware loaded in flash, even if they are not yet running it. That way, they will all come up on the new firmware when the power is restored.
- We should check marriner's BIOS settings as we shut it down, to ensure that it comes straight up when the power is restored.
- There might be a case for rebooting "important" machines in advance, to ensure that things which could delay the power-on (e.g. fsck, updaterpms) are done and out of the way.
- Can we do anything to nagios to keep it happy for the duration?
- Last time around we provided some TP leads for COs' laptops, though not many were used. This time we could pre-configure some ports on the sr99 switch on the dexion racking for that use (and possibly find a couple of C14-13A mains blocks too). s99.pdu is now powered from DB-0.14 via a 32A-commando extender (RT#50480), and the machines and switch are powered from it. s98.pdu remains on DB-0.20.
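One possible answer to the nagios question above is to schedule fixed host downtime for the affected machines before the shutdown. The following is a minimal sketch, assuming Nagios external commands are enabled; it only formats `SCHEDULE_HOST_DOWNTIME` command lines and does not write them anywhere. The author name, comment text and timestamps are illustrative assumptions, and the location of the Nagios command file on our servers is not given here.

```python
def downtime_command(host, start, end, author, comment, now=None):
    """Format one Nagios SCHEDULE_HOST_DOWNTIME external command line.

    start, end and now are Unix timestamps (integer seconds).
    """
    now = start if now is None else now
    duration = end - start
    # fixed=1: downtime runs exactly from start to end.
    # trigger_id=0: not triggered by another downtime entry.
    return (
        f"[{now}] SCHEDULE_HOST_DOWNTIME;{host};{start};{end};1;0;"
        f"{duration};{author};{comment}"
    )


def downtime_commands(hosts, start, end, author, comment, now=None):
    """Format one command line per host, in the order given."""
    return [downtime_command(h, start, end, author, comment, now) for h in hosts]
```

Each resulting line would then be appended to the Nagios external command file by whoever runs it; whether that is preferable to simply disabling notifications for the duration is a judgment call.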
Forum server rooms
Power
- Generally, we want to take the opportunity of a complete shutdown (and restart) to trace all power connections, and to confirm that the labelling which correlates the circuit-breakers in distribution boards DB-0.14 (main server room) and DB-0.20 (self-managed server room) with their corresponding power bars (or other connections) is both correct and complete.
- Trace the 5 power bars in use on the Beowulf racks. Label the corresponding circuit breakers on distribution board DB-0.14. They should all be on L3.
- Trace the 2 power bars in use on the Physics DR racks. Label the corresponding circuit breakers on distribution board DB-0.20. They should both be on L2.
- Trace the under-floor 13A sockets used by the floor-tiles-with-fans (in both server rooms) and label the breakers accordingly.
- Trace the 13A wall sockets and label the breakers accordingly.
- Move s07.pdu to a DB-0.14-L2 socket (it's currently on L1) to match its partner s06.pdu. Label s07.pdu, and its corresponding circuit breaker on distribution board DB-0.14, accordingly.
- Last time around we kept the core switches running for the benefit of Appleton Tower. This time we don't need to, so we should just shut everything down.
- ... and so there's no particular order to shut down the comms-rack servers, other than doing marriner last of all.
- The network servers' BIOS settings should all be correct. Rather than check them all, we can just watch what happens when the power comes back.
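The tracing and labelling checks in the list above could be recorded as a small table and compared mechanically on the day. This is a rough sketch only: the expected boards and phases are taken from the notes above, but the bar names for the Beowulf and Physics DR racks are hypothetical placeholders, and the "traced" data would have to be filled in by hand as each connection is followed.

```python
# Expected (distribution board, phase) for each power bar.
# Bar names other than s07.pdu are hypothetical placeholders.
EXPECTED = {
    "beowulf-bar-1": ("DB-0.14", "L3"),
    "beowulf-bar-2": ("DB-0.14", "L3"),
    "beowulf-bar-3": ("DB-0.14", "L3"),
    "beowulf-bar-4": ("DB-0.14", "L3"),
    "beowulf-bar-5": ("DB-0.14", "L3"),
    "physics-dr-bar-1": ("DB-0.20", "L2"),
    "physics-dr-bar-2": ("DB-0.20", "L2"),
    "s07.pdu": ("DB-0.14", "L2"),  # target after the move from L1
}


def mismatches(expected, traced):
    """Return human-readable problems found when comparing traced feeds
    against the expected ones."""
    problems = []
    for bar, want in sorted(expected.items()):
        got = traced.get(bar)
        if got is None:
            problems.append(f"{bar}: not traced yet (expected {want[0]}-{want[1]})")
        elif got != want:
            problems.append(
                f"{bar}: traced to {got[0]}-{got[1]}, expected {want[0]}-{want[1]}"
            )
    # Anything traced that we had no expectation for also needs a label.
    for bar in sorted(set(traced) - set(expected)):
        problems.append(f"{bar}: traced but has no expected entry")
    return problems
```

For example, passing a traced entry of `("DB-0.14", "L1")` for s07.pdu would flag that it has not yet been moved to L2.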
IT closets
We've had firmware upgrades pending for a while, and it would be handy to do a complete self-test, so we should power them all down beforehand and then bring them all back up one by one afterwards. (Probably excluding the basement closet, as it has the connections to the UPSes.) Don't forget the GPS-clock machine "farg" in the 5A closet.
Afterwards
- Do we want to turn everything on, or only the circuits where we think there should be something plugged in?
Infrastructure Unit machines
See also our kit list.
| Machine | Role | Comments |
| barrett | KDC master | remove _kerberos._udp SRV record to speed up authentication; om kerberos push to slaves; take dump of kerberos db |
| dunlin | Nagios secondary | |
| farg | GPS clock | Located in 5A IT closet |
| fenrir | KDC slave/AFSDB | down late, up early; remove _kerberos._udp record |
| franklin | LDAP master | take dump of ldap db |
| harnoncourt | Forum extRt | |
| hickox | Forum netServ | |
| hogwood | Forum netInf | |
| kingsmen | toby test machine | |
| linnaeus | Forum extNS | |
| marriner | Forum consoles | Last down, first up. (Note that verte, in AT, is the console server for Forum network machines hogwood, harnoncourt and marriner.) |
| mckinley | LDAP slave | move infdir alias; remove from dir round-robin; remove _ldap._tcp SRV record |
| osprey | cosign/ifriend KDC master | remove from weblogin alias the night before; om kerberos push to slave; take dump of ifriend kerberos db |
| panther | prometheus master | take paranoid dump of everything, can be shut down early |
| reeves | new prometheus machine | |
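The ordering hints in the table (panther can go down early, fenrir goes down late and comes up early, marriner is last down and first up) can be turned into an explicit list. The sketch below assumes that machines with no stated constraint share a single middle phase, and that power-up is simply the reverse of shutdown; neither assumption is stated in the notes, and the phase numbers are illustrative.

```python
# Shutdown phases: lower numbers go down earlier.
# Only these three constraints come from the table; the numbers are arbitrary.
SHUTDOWN_PHASE = {
    "panther": 0,   # can be shut down early
    "fenrir": 2,    # down late, up early
    "marriner": 3,  # last down, first up
}
DEFAULT_PHASE = 1   # assumption: everything else has no particular order


def shutdown_order(machines):
    """Sort machines by shutdown phase, then by name for a stable listing."""
    return sorted(machines, key=lambda m: (SHUTDOWN_PHASE.get(m, DEFAULT_PHASE), m))


def power_up_order(machines):
    """Assume power-up is the reverse of shutdown (an assumption, see above)."""
    return list(reversed(shutdown_order(machines)))
```

Calling `shutdown_order` on the machine names from the table gives a checklist to work through on the day, with marriner at the end.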
-- TobyBlake - 29 Oct 2010
-- IanDurkacz - 28 Oct 2010
-- GeorgeRoss - 28 Oct 2010
Topic revision: r10 - 05 Nov 2010 - 14:34:56 - TobyBlake