November 2010 Power Shutdown - Infrastructure Unit notes

For reference, Dec2009PowerDownInfUnit contains our notes from last time around.

Organisational things we might want to do

Beforehand

  • As all the switches will be powered down, we should make sure in advance that they have the most recent firmware loaded in flash, even though they might not actually be running it yet. That way, they'll all come back up on the new firmware when the power is restored.

  • We should check marriner's BIOS settings as we shut it down, to ensure that it comes straight up when the power is restored.

  • There might be a case for rebooting "important" machines in advance, to ensure that things which could delay the power-on (e.g. fsck, updaterpms) are done and out of the way.

  • Can we do anything to nagios to keep it happy for the duration?

  • Last time around we provided some TP leads for COs' laptops, though not many were used. This time we could pre-configure some ports on the sr99 switch on the dexion racking for that use (and possibly find a couple of C14-13A mains blocks too). s99.pdu is now powered from DB-0.14 via a 32A-commando extender (RT#50480), and the machines and switch are powered from it. s98.pdu remains on DB-0.20.
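On the nagios question above, one option might be to schedule host downtime for the affected machines. The following is only a sketch: it formats lines for the standard Nagios SCHEDULE_HOST_DOWNTIME external command, but the timestamps, the author name and the comment are made-up placeholders, and the location of the Nagios command file (where such lines would be written) depends on the local installation and is not shown here.

```python
def downtime_command(host, start, end, author, comment):
    """Format one Nagios SCHEDULE_HOST_DOWNTIME external command line.

    start and end are Unix timestamps (seconds). The downtime is "fixed"
    (flag 1) and not triggered by another downtime entry (trigger id 0).
    """
    duration = end - start
    return (f"[{start}] SCHEDULE_HOST_DOWNTIME;{host};{start};{end};"
            f"1;0;{duration};{author};{comment}")


# Placeholder window and author; real values would come from the
# shutdown timetable, which these notes do not give.
START, END = 1000, 4600
for host in ["franklin", "barrett", "marriner"]:
    print(downtime_command(host, START, END, "infunit", "power shutdown"))
```

Each printed line would then need to be appended to the Nagios command file by whoever runs the monitoring server.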

Forum server rooms

Power

  • Generally, we want to take the opportunity of a complete shutdown (and restart) to trace all power connections, and to confirm that the labelling which correlates the circuit-breakers in distribution boards DB-0.14 (main server room) and DB-0.20 (self-managed server room) with their corresponding power bars (or other connections) is both correct and complete.

  • Trace the 5 power bars in use on the Beowulf racks. Label the corresponding circuit breakers on distribution board DB-0.14. They should all be on L3.

  • Trace the 2 power bars in use on the Physics DR racks. Label the corresponding circuit breakers on distribution board DB-0.20. They should both be on L2.

  • Trace the under-floor 13A sockets used by the floor-tiles-with-fans (in both server rooms) and label the breakers accordingly.

  • Trace the 13A wall sockets and label the breakers accordingly.

  • Move s07.pdu to a DB-0.14-L2 socket (it's currently on L1) to match its partner s06.pdu. Label s07.pdu, and its corresponding circuit breaker on distribution board DB-0.14, accordingly.

  • Last time around we kept the core switches running for the benefit of Appleton Tower. This time we don't need to, so we should just shut everything down.

  • ... and so there's no particular order to shut down the comms-rack servers, other than doing marriner last of all.

  • The network servers' BIOS settings should all be correct. Rather than check them all, we can just watch what happens when the power comes back.
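The phase expectations in the tracing items above could be written down as a small checklist and compared against what is actually found on the day. This is a rough sketch only: the group names and the record layout are invented for illustration, and it assumes s06.pdu is on DB-0.14 as well as on L2.

```python
# Expected (distribution board, phase) per group, as stated in the notes.
EXPECTED = {
    "Beowulf": ("DB-0.14", "L3"),
    "Physics DR": ("DB-0.20", "L2"),
    "s06/s07 pair": ("DB-0.14", "L2"),
}


def mismatches(traced):
    """Return a description of every traced bar that is not where expected.

    traced is a list of (bar, group, board, phase) tuples.
    """
    problems = []
    for bar, group, board, phase in traced:
        want = EXPECTED.get(group)
        if want is None:
            problems.append(f"{bar}: no expectation recorded for {group!r}")
        elif (board, phase) != want:
            problems.append(
                f"{bar}: found {board}-{phase}, expected {want[0]}-{want[1]}")
    return problems


# State described in the notes before s07.pdu is moved.
traced = [
    ("s06.pdu", "s06/s07 pair", "DB-0.14", "L2"),
    ("s07.pdu", "s06/s07 pair", "DB-0.14", "L1"),
]
print(mismatches(traced))
```

Run against the state described above, this reports only s07.pdu, which is the bar the notes say needs moving.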

IT closets

We've had firmware upgrades pending for a while, and it would be handy to do a complete self-test, so we should power them all down beforehand and then bring them all back up one by one afterwards. (Probably excluding the basement closet, as it has the connections to the UPSes.) Don't forget the GPS-clock machine "farg" in the 5A closet.

Afterwards

  • Do we want to turn everything on, or only the circuits where we think there should be something plugged in?

Infrastructure Unit machines

See also our kit list.

| Machine | Role | Comments |
| franklin | LDAP master | take dump of ldap db |
| barrett | KDC master | remove _kerberos._udp SRV record to speed up authentication; om kerberos push to slaves; take dump of kerberos db |
| osprey | cosign/ifriend KDC master | remove from weblogin alias the night before; om kerberos push to slave; take dump of ifriend kerberos db |
| mckinley | LDAP slave | move infdir alias; remove from dir round-robin; remove _ldap._tcp SRV record |
| panther | prometheus master | take paranoid dump of everything, can be shut down early |
| reeves | new prometheus machine | |
| fenrir | KDC slave/AFSDB | down late, up early; remove _kerberos._udp record |
| kingsmen | toby test machine | |
| harnoncourt | Forum extRt | |
| hogwood | Forum netInf | |
| hickox | Forum netServ | |
| linnaeus | Forum extNS | |
| marriner | Forum consoles | Last down, first up. (Note that verte - in AT - is the console server for Forum network machines hogwood, harnoncourt and marriner.) |
| dunlin | Nagios secondary | |
| ventura | Self-managed consoles | |
| farg | GPS clock | Located in 5A IT closet |
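The few ordering hints in the table (panther can go down early, fenrir down late and up early, marriner last down and first up) can be turned into a simple sort. This is an illustrative sketch rather than an agreed procedure: the priority numbers are arbitrary, and machines without a stated constraint simply keep their listed order.

```python
# Lower number = earlier. Machines not listed get the default priority.
DOWN_PRIORITY = {"panther": 0, "fenrir": 2, "marriner": 3}
DOWN_DEFAULT = 1
UP_PRIORITY = {"marriner": 0, "fenrir": 1}
UP_DEFAULT = 2


def shutdown_order(machines):
    """Order machines for power-down; sorted() is stable, so ties keep list order."""
    return sorted(machines, key=lambda m: DOWN_PRIORITY.get(m, DOWN_DEFAULT))


def startup_order(machines):
    """Order machines for power-up, using the same stable-sort approach."""
    return sorted(machines, key=lambda m: UP_PRIORITY.get(m, UP_DEFAULT))


machines = ["franklin", "barrett", "panther", "fenrir", "marriner", "dunlin"]
print(shutdown_order(machines))
print(startup_order(machines))
```

With this sample list, panther comes first and marriner last on the way down, and marriner then fenrir come first on the way up.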

-- TobyBlake - 29 Oct 2010 -- IanDurkacz - 28 Oct 2010 -- GeorgeRoss - 28 Oct 2010

Topic revision: r11 - 05 Nov 2010 - 14:40:59 - IanDurkacz
 