DICE servers are protected from overheating by a small script called toohot.

It is called from cron once a minute. The script runs ipmi-sensors to discover:

  • the room temperature as detected by the machine
  • the temperature at which the machine should be shut down.

If the former is not less than the latter (that is, the room temperature has reached or exceeded the shutdown temperature), the script initiates a machine shutdown.
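The comparison described above can be sketched in a few lines of Python. This is an illustration only, not the toohot script itself. The names should_shut_down, check_and_shut_down and read_temperatures are hypothetical. The parsing of ipmi-sensors output is deliberately left to a caller-supplied function, since that output varies between machines, and the use of /sbin/shutdown -h now is an assumption about how the shutdown might be initiated.

```python
import subprocess


def should_shut_down(ambient_c, shutdown_threshold_c):
    """Return True unless the ambient reading is strictly below the threshold."""
    return not (ambient_c < shutdown_threshold_c)


def check_and_shut_down(read_temperatures,
                        shutdown_command=("/sbin/shutdown", "-h", "now"),
                        run=subprocess.run):
    """Perform one check, as a once-a-minute cron job might.

    read_temperatures is a hypothetical callable returning a tuple
    (ambient_c, shutdown_threshold_c); in practice it would wrap a call to
    ipmi-sensors and parse its output. shutdown_command is an assumption,
    not necessarily the command the real script uses.
    """
    ambient_c, threshold_c = read_temperatures()
    if should_shut_down(ambient_c, threshold_c):
        # Ask the operating system for a clean shutdown.
        run(shutdown_command, check=True)
        return True
    return False
```

Keeping the comparison in its own function makes the boundary case easy to check: a reading equal to the threshold triggers a shutdown, because it is not less than the threshold.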

The idea is to shut the machine down cleanly shortly before the temperature rises to the point at which server hardware components start performing their own emergency shutdowns for self-preservation. Those emergency shutdowns are abrupt, so data loss can occur.

The script is installed by an RPM called toohot. The RPM and the cron job are installed by a header, dice/options/toohot.h. IPMI doesn't work for our oldest server models (650, 750, 2650), so dice/options/toohot.h is included via the model-specific headers in dice/hw rather than for all servers.

TooHot is a product of the Server Hardware Interaction project.

-- ChrisCooke - 08 Feb 2010

Topic revision: r1 - 08 Feb 2010 - 10:05:25 - ChrisCooke