IPMI Problems

Machines affected on 2015/08/10

  • MPU: hammersmith (r720 - iDRAC enterprise) - 23:50:56
  • RAT: flapjack (r320 - BMC) - 00:05:26, craggy (r320) - 00:05:26, shortbread (r720) - 00:10:26
  • Services: belter (r620 - iDRAC express) - 00:02:39, nessie (r720) - 00:10:26, kraken (r720) - 00:04:26 and yeti (r720) - 00:13:26 (note that huldra, which should be identical to those r720s, was not affected)
  • Inf: bevan (r320) - 00:02:26

All of these machines are in the Forum server room.

Separately, hyde (r420) is in AT and has had the same problem since 2015/08/05 15:56:25.

Not all machines of these models were affected.

These DICE R720s were not affected: nuggle oyster amarela jubilee haight ashbury hubel sprinkles wiesel huldra nix pergamon kelpie hobgoblin majestic trout

These DICE R320s were not affected: knussen gatti rattle norrington elder blatiere sponge grepon teasel salamanca runnicles

These are the versions of iDRAC firmware on some of the R720s, the R320s and the R620. Machines affected on 2015/08/10 (as listed above) are marked "yes" in the last column; "-" means no value was recorded or the machine is not marked:

iDRAC7 firmware   Web interface   Host          Model   Affected
1.06              -               pergamon      R720    -
1.06              1.06.06         belter        R620    yes
1.6               1.06.06         nessie        R720    yes
1.23              1.23.23         kraken        R720    yes
1.23              -               jubilee       R720    -
1.23              -               hammersmith   R720    yes
-                 1.23.23         yeti          R720    yes
1.35              -               nix           R720    -
1.35              -               waterloo      R720    -
1.35              -               kelpie        R720    -
1.35              -               nuggle        R720    -
1.35              -               oyster        R720    -
1.40              -               sprinkles     R720    -
1.56              -               majestic      R720    -
1.57              -               hobgoblin     R720    -
1.57              -               haight        R720    -
1.57              -               wiesel        R720    -
1.57              -               huldra        R720    -
1.57              -               ashbury       R720    -
1.57              -               vermelha      R720    -
1.57              -               amarela       R720    -
1.57              -               hubel         R720    -
1.57              -               trout         R720    -
1.10              -               elder         R320    -
1.10              -               norrington    R320    -
1.20              -               bevan         R320    yes
1.35              -               rattle        R320    -
1.35              -               blatiere      R320    -
1.55              -               grepon        R320    -
1.55              -               knussen       R320    -
1.55              -               gatti         R320    -
1.57              -               sponge        R320    -
2.0               -               runnicles     R320    -
2.0               -               salamanca     R320    -
2.0               -               teasel        R320    -
1.10              -               flapjack      R320    yes
1.40              -               craggy        R320    yes
1.23              -               shortbread    R720    yes
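
One way to check whether firmware version separates the affected machines from the rest is to compare the two groups directly. The following is only a rough sketch: the data is transcribed by hand from the table and lists above, waterloo and vermelha are left out because their status is not stated, and the names AFFECTED, UNAFFECTED and version_key are made up for this example.

```python
# Firmware versions transcribed from the table above (assumption: no typos
# were introduced while copying). yeti uses the web-interface value.
AFFECTED = {
    "hammersmith": "1.23", "flapjack": "1.10", "craggy": "1.40",
    "shortbread": "1.23", "belter": "1.06", "nessie": "1.6",
    "kraken": "1.23", "yeti": "1.23.23", "bevan": "1.20",
}
UNAFFECTED = {
    "pergamon": "1.06", "jubilee": "1.23", "nix": "1.35", "kelpie": "1.35",
    "nuggle": "1.35", "oyster": "1.35", "sprinkles": "1.40",
    "majestic": "1.56", "hobgoblin": "1.57", "haight": "1.57",
    "wiesel": "1.57", "huldra": "1.57", "ashbury": "1.57", "amarela": "1.57",
    "hubel": "1.57", "trout": "1.57", "elder": "1.10", "norrington": "1.10",
    "rattle": "1.35", "blatiere": "1.35", "grepon": "1.55", "knussen": "1.55",
    "gatti": "1.55", "sponge": "1.57", "runnicles": "2.0",
    "salamanca": "2.0", "teasel": "2.0",
}


def version_key(version):
    """Turn '1.23.23' into (1, 23) so versions compare numerically."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor))


# Newest firmware seen on any affected machine.
newest_affected = max(version_key(v) for v in AFFECTED.values())

# Unaffected machines running firmware no newer than that.
old_but_unaffected = sorted(
    host for host, v in UNAFFECTED.items()
    if version_key(v) <= newest_affected
)

print("newest affected firmware:", ".".join(map(str, newest_affected)))
print("unaffected at or below it:", ", ".join(old_but_unaffected))
```

With the data as transcribed, no affected machine runs firmware newer than 1.40, but eleven unaffected machines also run 1.40 or older, so firmware version alone does not explain which machines were hit.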

Probably unrelated, but the logs on the Forum console server (blatiere) showed DHCP request storms on VLAN 468 involving dyatlov, kinloch and rockall (all r815).

iDRAC hardware reset via the 'i' button

Dell iDRACs/BMCs can be hardware reset without having to reset/reboot their 'hosting' machines. This is a good way to sort out a 'wedged' or unresponsive iDRAC if you have convenient physical access to the machine.

To do so: hold down the blue 'i' button on the rear of the machine for about ten seconds (at least until it ceases to be illuminated) and then release it.


iDRAC soft reset via HTTP

The iDRACs/BMCs of some machines also offer a web interface, some of which offer the possibility of a 'soft' reset. This can be useful to try remotely to sort out an iDRAC on which IPMI functionality has become wedged, but which is still capable of communication via HTTP.

Note that not all our iDRACs offer this possibility: e.g. those on our R620 and R720 machines probably do, but most/all of those on our R320s don't.

To try this, get to the web interface of the BMC in question. The way I did it was to run 'ssh -D 8888 kbconsoles' to set up a SOCKS proxy on localhost:8888 on my desktop, then, in Firefox's Preferences -> Advanced -> Network -> Connection Settings, configure a manual proxy with SOCKS host localhost and port 8888.

You'll now be able to go to http://machine.bmc.inf.ed.ac.uk/ (or you may have to use the IP address depending on the DNS on your machine).
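
If you would rather check from the command line that the tunnel works before changing browser settings, something like the sketch below may help. It only builds and prints the commands. The -N flag (don't run a remote command) and the use of curl are additions not in the recipe above, and the function names are made up for this example. curl's --socks5-hostname option asks the proxy to resolve the name, which may avoid the local DNS issue.

```python
import shlex


def socks_tunnel_command(gateway="kbconsoles", port=8888):
    """ssh command that opens a SOCKS proxy on localhost:<port>."""
    # -D: dynamic (SOCKS) port forwarding, as in the recipe above.
    # -N: do not run a remote command (an addition, not in the original).
    return ["ssh", "-N", "-D", str(port), gateway]


def bmc_url(machine):
    """Web interface URL for a machine's BMC, following the naming above."""
    return f"http://{machine}.bmc.inf.ed.ac.uk/"


def curl_check_command(machine, port=8888):
    """curl command that fetches only the headers of the BMC front page."""
    # --socks5-hostname: send the host name to the proxy for resolution,
    # so the DNS configuration of the local desktop is not involved.
    return [
        "curl", "--silent", "--head",
        "--socks5-hostname", f"localhost:{port}",
        bmc_url(machine),
    ]


# Print the commands so they can be pasted into two terminals.
print(shlex.join(socks_tunnel_command()))
print(shlex.join(curl_check_command("kraken")))
```

Run the first command in one terminal and leave it open, then run the second in another. Any HTTP response at all suggests the BMC's web interface is still alive.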

You'll need to log in as root and use the conserver secret password. Infrastructure can tell you how to discover that (or we could document it here).

You should now be logged in to the web interface for the iDRAC. You may see "Reset iDRAC" in the "Quick launch tasks" section of the front page. If not, you can find it under Troubleshooting -> Diagnostics -> Reset iDRAC. Reset it and wait a couple of minutes; your running OS will be fine. Honest.
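
Rather than guessing when the iDRAC is back, you could poll it until IPMI answers again. The sketch below is one possible way to do that, and all of it is assumption: that ipmitool is installed where you run it, that the BMC is reachable from there, that the lanplus interface and the root user are right for our setup, and that the password is in the IPMI_PASSWORD environment variable (which is what ipmitool's -E option reads). The function names are made up for this example.

```python
import subprocess
import time


def ipmi_check_command(bmc_host, user="root"):
    """ipmitool command that asks the BMC for its controller info."""
    # -I lanplus: IPMI v2.0 over LAN (assumed to suit our BMCs).
    # -E: take the password from the IPMI_PASSWORD environment variable.
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-E", "mc", "info"]


def ipmi_responds(bmc_host):
    """Return True if the BMC answers 'mc info' within 30 seconds."""
    try:
        result = subprocess.run(ipmi_check_command(bmc_host),
                                capture_output=True, timeout=30)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Wedged BMC, or ipmitool is not installed on this machine.
        return False
    return result.returncode == 0


def wait_until(check, timeout=300.0, interval=15.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns True.

    Returns True on success, or False once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until(lambda: ipmi_responds("kraken.bmc.inf.ed.ac.uk"))` would retry for up to five minutes after a reset.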

-- neilb

-- StephenQuinney - 10 Aug 2015

Topic revision: r11 - 14 Mar 2016 - 15:19:25 - IanDurkacz