Investigate Power Management Options for DICE Desktops

This is the final report for School of Informatics computing development project 34

Introduction

This report looks for ways of making DICE desktop computers use electricity more efficiently than at present.

It focuses mainly on ways in which computers could be made to enter a sleep state when not needed for use. A secondary motive for encouraging the use of sleep states is noise: DICE users have said that a sleep facility for their machines would make their offices less noisy. Some attention is also paid to ways in which computers may be made to operate more efficiently while in active use.

Note that a pursuit of efficiency differs slightly from the original proposal, which promised to “investigate what power management techniques we could apply to DICE desktops to reduce the school's energy consumption”. An easy way to reduce a computer's energy consumption would be to reduce the use made of the computer. The logical extreme would be to switch off and unplug all computers, saving a great deal of electricity - at some cost to productivity.

This report does not consider the issue of how much a computer ought or ought not to be used. Instead, it seeks to offer ways of operating the computer more efficiently no matter what its level of use may be. However, all other things being equal, the potential does exist for a significant reduction in the amount of electricity used by our desktop computers.

Greater Efficiency While Running

Speedstep

A small but significant difference can be made by enabling "Enhanced SpeedStep" in the BIOS on our Dells. Enhanced SpeedStep is an Intel technology which makes the CPU frequency (slightly) adjustable. On the test Dell 745 with Enhanced SpeedStep enabled the possible operating frequencies were 1.60GHz and 1.87GHz. Measured with a power meter, without Enhanced SpeedStep the machine consumed a steady 63.5W when idle; with Enhanced SpeedStep, the idle machine consumed between 62.1W and 64.8W, with the reading on the meter changing once or twice a second. The CPU frequency information comes from the cpufreq-info command, which uses the cpufreq system; cpufreq is enabled automatically on reboot once Enhanced SpeedStep is enabled. The cpufreq startup script is /etc/init.d/cpuspeed, and when cpufreq is running the process /usr/libexec/hald-addon-cpufreq exists. When cpufreq is not available, cpufreq-info reports for each CPU that "no or unknown cpufreq driver is active on this CPU".
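
The same information can be read directly from sysfs, which is where cpufreq-info gets it. The sketch below is illustrative only: the sysfs root is parameterised so the logic can be pointed at a scratch directory, and the file names are the standard cpufreq ones on kernels of this era, though exact paths can vary between kernel versions.

```shell
#!/bin/sh
# List each CPU's current and available frequencies from the cpufreq
# files in sysfs. The root is parameterised so the logic can be tried
# against a scratch tree instead of the live /sys.
list_cpufreq() {
    sysroot="${1:-}"
    found=0
    for dir in "$sysroot"/sys/devices/system/cpu/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        found=1
        cpu=$(basename "$(dirname "$dir")")
        cur=$(cat "$dir/scaling_cur_freq" 2>/dev/null)
        avail=$(cat "$dir/scaling_available_frequencies" 2>/dev/null)
        echo "$cpu: current ${cur}kHz, available: ${avail}"
    done
    [ "$found" -eq 1 ] || echo "no cpufreq driver active"
}
# usage: list_cpufreq          (inspect the running system's /sys)
```

On the test machine this would be expected to report available frequencies of 1600000kHz and 1862000kHz, matching cpufreq-info.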

Comment from ascobie: What was the full-speed clock rating of this particular 745? Perhaps specify the processor model (/proc/cpuinfo).

Chris: Thanks; /proc/cpuinfo is now included in its entirety, for interest.

The /proc/cpuinfo file on the test machine contained:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz
stepping        : 2
cpu MHz         : 1600.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 3723.45

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz
stepping        : 2
cpu MHz         : 1600.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 3721.29

Kernel Improvements

For some years now kernel developers have been making substantial improvements to the efficiency of the Linux kernel. One major driver of this has been the desire to use Linux on laptops; another is the desire to use it on small hand-held devices, where both power use and heat production have to be kept to a minimum. Many kernel patches have contributed. A number of these concentrate on switching system components such as the CPU into low power modes wherever appropriate. This can be a complex area: typically several low power modes are available, with the amount of power which can be saved being roughly proportional to the time taken to enter or exit the mode. The kernel therefore has to make a reasonably accurate forecast of how long a component's "rest" period is likely to last before the component is needed again and has to be switched back into its normal higher power running mode.

An important principle here is "the race to idle". This principle holds that the greatest efficiency is gained by doing computing tasks as quickly as possible, so that the computer can return to an idle state as soon as possible; the idle periods are then longer, which in turn makes it economical to enter a deeper low power mode than would otherwise have been possible. In other words, it's a bad idea to switch a CPU to a low power mode when you actually want it to do some work for you; that just makes your computer run slower and shortens the idle periods between bursts of activity.

Kernel development is a highly specialised area, and power efficiency enhancements can interact with each other in unexpected ways; and anyway, we use vendor-provided kernels wherever possible rather than rolling our own. For these reasons I don't suggest any enhancements to our kernels. We expect to gain from efficiency enhancements as and when they are added to our vendor-supported kernels from time to time.

Sleeping

The ACPI specification (official spec, wikipedia summary) defines a number of power states.

As well as the obvious "on" (ACPI's G0 / S0) and "off" (G2 / S5, G3) states, Linux also supports "sleep" (S3) and "hibernate" (S4) states. Both of these are reliably supported under SL5, and they are what we hope to use during idle periods in order to save power.

In hibernation the machine's state is saved to non-volatile storage, and so survives a termination of the electricity supply, for example an unplanned power cut or a move of the machine to a different building. In sleep the machine's state is saved to RAM. In tests a machine in hibernation used 12W (exactly the same as it uses when "powered down" and plugged in, that is in ACPI state G2) whereas one in sleep used 14.5W. By comparison a running machine uses 62-64W when totally idle and up to 100W or so while busy.

Times Taken to Change State

In a recent test on a Dell 745 running SL5 DICE the following times were recorded:

  • To shut down, from initiation of the command to power down: 69 seconds
  • To boot, from pressing the power button to seeing the GDM login screen: 130 seconds
  • To hibernate, from initiation of the command to power down: 14 seconds
  • To thaw from hibernation, from pressing the power button to getting the screensaver password prompt: 44 seconds
  • To sleep, from initiation of the command to power down: 6 seconds
  • To resume from sleep, from pressing the power button to getting the screensaver password prompt: 7 seconds

These times will vary depending on whether anyone is logged in, how many processes are running and how demanding they are, the power of the machine, whether any software is due to be installed at boot, and so on; but they indicate quite well the dramatic differences between the times taken to boot, to thaw from hibernation and to resume from sleep.

When waking a machine from sleep it may often be the case that a user or would-be user is in front of the machine waiting for it; so it seems important that we attempt to minimise the time taken to regain normal running. With this in mind sleep (S3) seems like the best power-saving state to use.

ascobie: One potential problem with using S3 is that an end-user may consider it safe to disconnect a sleeping machine?

cc: Yes, possibly - although the power button will still be flashing green, indicating some ongoing activity? We could perhaps see with experience whether that was in practice more or less of a problem than the greatly increased wake-up time from S4. We have the choice, anyway, as both work reliably.

Power Management Software

Power Management on Linux is done by a collection of software operating at various levels. Here's how Simon Wilkinson explained it to me:

Basically, you have things like GNOME Power Manager, which talk, over DBus to HAL. HAL is an abstraction layer - it doesn't actually do anything, it just sits there and interfaces between the desktop applications, and the utilities that control the hardware. On RedHat, these utilities are the pm-utils set of tools, which instruct the kernel when to start/stop/suspend etc. The ACPI code in the kernel then takes these commands from userspace, and passes them on to the BIOS.

There is another mechanism for passing power management instructions to the kernel: various files in /proc or /sys (a machine will have one or the other, depending on kernel version) can be written to in order to effect power management changes. For an example of this see the section on Wake Alarms.
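
As an example of this interface: on a 2.6-kernel machine the file /sys/power/state lists the supported sleep states, and writing one of them back ("mem" for suspend-to-RAM, "disk" for hibernate) requests that state. This is only a sketch - the file path is parameterised so the logic can be exercised against a scratch file rather than the real one:

```shell
#!/bin/sh
# Request a sleep state by writing to the kernel's power state file.
# "mem" = suspend-to-RAM (S3), "disk" = hibernate (S4). The file path
# defaults to the real /sys/power/state but can be overridden so the
# sketch can be tried safely.
request_sleep() {
    state="$1"
    file="${2:-/sys/power/state}"
    if grep -qw "$state" "$file" 2>/dev/null; then
        echo "$state" > "$file"    # on the real file this blocks until resume
    else
        echo "state '$state' not supported" >&2
        return 1
    fi
}
# usage (as root): request_sleep mem
```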

At the end of the day /proc and /sys are just ways of communicating between userspace and kernel. The pm-utils utilities achieve their suspending by using a different mechanism (the ioctl) for talking to the kernel. Ultimately it's a set of shell scripts that calls pm-pmu to do the actual suspending operation - the shell scripts take care of starting and stopping any necessary services before and after the suspend.

The pm-utils utilities offer a hook mechanism which can be used to execute scripts on power management events (sleep, resume, etc.) - scripts are simply put into the /etc/pm/hooks/ directory.

Using the Power Management Software

In Gnome, add a power management applet to your panel by selecting (from the Gnome panel menu) System -> Preferences -> More Preferences -> Power Management then clicking the General tab and selecting "Always display icon". The applet lets you "Suspend" or "Hibernate" the machine.

Alternatively run pm-suspend or pm-hibernate as root.

Alternatively insert the number of the ACPI sleep state desired into /proc/acpi/sleep as per the documentation at http://acpi.sourceforge.net/documentation/sleep.html

While the machine is in sleep mode the power button on the system box will flash green. To wake the machine from sleep manually, press the power button.

While the machine is in hibernate mode the power button is unlit and the machine appears identical to one that has been normally powered down. When the power button is pressed, instead of initiating the normal Linux kernel boot procedure the saved state will be restored and the machine will thaw from hibernation.

What is affected when a machine sleeps?

When a machine sleeps the state of the machine is saved exactly, so that when the sleep is terminated the machine resumes into the same state as it had before sleep was initiated. Most processes on the machine should cope perfectly well with sleep and need not be aware of it.

When a machine resumes from sleep with a running GDM session, a screensaver password prompt comes up; typing in the password unlocks the pre-sleep session and also renews Kerberos credentials, judging by the changed times in the output of klist pre-sleep and post-sleep.

The Hook Mechanism

Those processes which do need to be made aware of sleep can drop a "hook script" into the power management hook script directory /etc/pm/hooks. Such a script can be made to run commands on sleep, on resume or at other power management events.

LCFG components can use this same mechanism. It should not be necessary to revive the old APM era "suspend" and "resume" component methods, because it will be easier, more lightweight and more standard to use the OS's own power management hook mechanism.

AMD: an Example Hook Script

When trying out suspending and hibernating on an FC6 DICE machine one problem became evident fairly quickly: if a machine had been sleeping for a longer period - hours or days rather than minutes - the amd daemon would crash on resume. This doesn't happen on SL5 DICE. We use the same version of amd on both platforms, so we can guess that the difference in behaviour is caused by a bug in the Fedora kernel.

Pending the arrival of an amd or kernel bug fix (unlikely for the FC6 kernel at least!) we can get round the problem on FC6 by putting a hook script in /etc/pm/hooks which simply restarts the amd component when the machine wakes up at the end of a sleep period. The example script which achieved this was called /etc/pm/hooks/25amd and contained the following:

#!/bin/bash

. /etc/pm/functions

# On resume/thaw, restart the amd component if it was running, because
# amd crashes on resume if the machine has been suspended for more than
# a few minutes.
# We have to test for amd.run because we only want to restart amd if
# it was actually running in the first place.
case "$1" in
     hibernate|suspend)
          ;;
     thaw|resume)
          [ -f /var/lcfg/tmp/amd.run ] && /usr/bin/om amd restart
          ;;
     *)
          ;;
esac

exit $?

Cron and Anacron

The cron daemon executes commands in users' crontabs in /var/spool/cron, which is the directory in which the LCFG cron component puts files. It also executes the system crontab /etc/crontab, and on Red Hat-based systems this in turn executes scripts in several other directories, each named for its frequency of execution: /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly and /etc/cron.monthly.

This is fine for machines which run 24 hours per day. However a machine which sleeps for much of the time might miss most or all of its cron jobs if they run at a particular time of day, as ours do. To get round this we could make use of a command called anacron. This is a cron replacement which is designed for machines which are not necessarily up all the time. It can be run at arbitrary times - I suggest that we could trigger it from a power management hook script to run jobs when a machine wakes - and it uses a system of time stamps to ensure that however many times anacron is run, its jobs are run only as often as specified. The only granularities on offer are daily, weekly and monthly. Is this a problem?

ascobie: The problem with doing this is that the machine will likely be very sluggish for the first 10-15 minutes whilst it runs all the daily cron jobs (and runs updaterpms); just when the user is most likely to want a responsive machine.

cc: Sorry, I probably wasn't clear enough there. I envisage waking each machine up (if sleeping) in the middle of the night to get it to perform as much as possible of its daily maintenance operations then - especially updaterpms. Only a minimum of additional system admin activity would be performed at any other wake-up: perhaps just a quick addition of new openldap data, and a DNS update? There certainly shouldn't be any need to run things like updaterpms at any time other than at night (except that updaterpms also runs when the machine does a full reboot - that happens now and I don't envisage us needing to change it). I agree that the machine would be sluggish if it was performing lots of system admin tasks during a day-time wake-up, but I don't think it would need to do all those tasks, as they would already have run at night.
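
The wake-up trigger could itself be a pm-utils hook which simply invokes anacron on resume. The sketch below is purely illustrative - the file name 25anacron is hypothetical, and the ANACRON variable is overridable only so the logic can be tested. Anacron's own timestamps then ensure that each job runs at most once per period, no matter how often the machine wakes.

```shell
#!/bin/bash
# Hypothetical /etc/pm/hooks/25anacron: run anacron when the machine
# wakes so that daily jobs missed while asleep are caught up.
ANACRON="${ANACRON:-/usr/sbin/anacron}"

run_hook() {
    case "$1" in
        thaw|resume)
            # -s serialises the jobs so they run one at a time
            "$ANACRON" -s
            ;;
    esac
}

run_hook "$@"
```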

I went through a typical day's entries in /var/lcfg/log/syslog. Taking 'Mar 10' as the example day:

[gothenburg+]cc: grep 'Mar 10' syslog|wc -l
496
[gothenburg+]cc: grep 'Mar 10' syslog|grep crond|wc -l
489

Out of interest, these are the non-crond entries for that day:

[gothenburg+]cc: grep 'Mar 10' syslog|grep -v crond
Mar 10 04:02:01 gothenburg anacron[24411]: Updated timestamp for job `cron.daily' to 2008-03-10
Mar 10 04:02:04 gothenburg runuser: pam_unix(runuser:session): session opened for user news by (uid=0)
Mar 10 04:02:04 gothenburg runuser: pam_unix(runuser:session): session closed for user news
Mar 10 09:30:40 gothenburg sshd[3586]: nss_ldap: reconnected to LDAP server ldap://127.0.0.1 after 1 attempt
Mar 10 09:30:40 gothenburg sshd[3586]: pam_unix(sshd:session): session closed for user cc
Mar 10 09:30:47 gothenburg sshd[3392]: nss_ldap: reconnected to LDAP server ldap://127.0.0.1 after 1 attempt
Mar 10 09:30:47 gothenburg sshd[3392]: pam_unix(sshd:session): session closed for user cc

This is how the 489 crond entries in syslog break down:

  • /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok (288 runs): This is a completely vanilla unconfigured install of mrtg and despite running every 5 minutes it produces no results in the results dir /var/www/mrtg. It wouldn't matter to us if this was never run.
  • /usr/lib/sa/sa1 1 1 (144 runs): This collects system accounting data every ten minutes; it's really not going to matter if it doesn't run while the machine is sleeping.
  • Scripts in the cron.hourly directory (24 runs)
  • Scripts in the cron.daily directory (1 run)
  • /usr/lib/sa/sa2 -A (1 run): This one IS a bit of a problem. It's run from /etc/cron.d/sysstat and runs at 23:53 every day. It could be switched to run from anacron, but running at variable times of day would tend to invalidate the results somewhat perhaps? It depends what, if anything, we want from these stats. (And if we don't want them, maybe we should stop collecting them?)
  • /usr/bin/om openldap kick -q |egrep -v '^\[OK\]' (24 runs): Could be run from anacron?
  • (sleep 1800; /usr/bin/om openldap kick -q hard |egrep -v '^\[OK\]') (1 run): We should wake the machine at night in order to run this.
  • /usr/bin/om openldap run -q |egrep -v '^\[OK\]' (1 run): Could be run from anacron?
Comment from Toby Blake: om openldap kick - this runs hourly and is used to keep the local LDAP database in sync with the master by checking for changes since the last run and applying the modifications to the local db. Obviously if a machine is asleep/hibernating/off then it doesn't need to be run, as there's no real need to keep the ldap up-to-date if nothing is using it. When a machine is up, however, then it does need to be run regularly, so if anacron can only run things at most daily, this could present a problem.
Chris Cooke: Hmm, I suppose anacron itself will need to be triggered somehow though and it wouldn't be enough just to run it when the machine wakes up! So we'll need to run cron too, and trigger anacron from that - we just won't be able to rely on cron running 24 hours a day. But it should be running whenever the machine's up. We could also trigger this job to run when the machine wakes maybe, does that sound necessary or a good idea?
Toby Blake: Yes, I think that sounds sensible - it should definitely run when it wakes up, to sync ldap and then hourly thereafter.
om openldap kick hard - this is an extension to the above, but does deletes as well. Again, the need for it will disappear if we move to a proxy-caching solution.
Chris Cooke: Presumably it wouldn't matter exactly when in the day or night this happened, as long as it happened with roughly the right frequency? So it could be run once a day by anacron?
Toby Blake: om openldap run - this removes unneeded openldap transaction log files - it should be run regularly to stop them building up in number and taking up disk space, but this is primarily a tidying operation. It'll still be needed if we move to proxy-caching.
Chris Cooke: OK. Ditto this one?
Toby Blake: Yes to both of these.
  • /usr/lib/orders/bin/clientreport > /var/lcfg/tmp/clientreport.log 2>&1 (2 runs): Could be run from anacron?
  • /usr/bin/om boot run 2>&1 |egrep -v '^\[OK\]' (1 run): We should wake the machine at night in order to run this.
  • om dns fixperms -q (1 run): Is this still necessary?
Comment from George Ross: Yes, or something like it. It's there because rpm updates have a nasty habit of changing the ownership of various files, which generally won't break things immediately but will definitely do so several days down the line when the rpm update has been completely forgotten about. If there were a better way to trigger this from an rpm update then that would definitely be a better way to go.
Paul Anderson: The dns component RPM should be able to define a trigger so that it gets called whenever the RPM is updated (provided you know what RPM updates you are interested in ....)
Chris Cooke: If/when this is still needed it could presumably be done at an arbitrary time each day rather than the same time each night, so we could trigger it from anacron?
George Ross: If it can't be triggered in some way (which might be better) then it ideally wants to run reasonably soon after updaterpms. Perhaps I should give lcfg-dns a Run() method, which could be added to the boot component's list of things to run somewhere after updaterpms, assuming that's still kicked off in the same way; though there was some reason I didn't do it that way in the first place...
  • /usr/bin/printers |sed 's/\t/|/' > /etc/printcap (1 run): Could be run from anacron?

cron.daily scripts

  • 00webalizer: analyses /var/log/httpd/access_log. Could run from anacron. Do we even use webalizer? Irrelevant for desktops anyway.
  • 0anacron: updates anacron timestamps.
  • 0logwatch: system log analyser and reporter. Could run from anacron. Do we even use logwatch?
  • certwatch: warns if SSL certificates expire. Could run from anacron. Irrelevant for desktops anyway.
  • cups: tidies /var/spool/cups/tmp. Could run from anacron.
  • inn-cron-expire: expires news articles for the innd daemon if running. innd is not running so we don't need this.
  • logrotate: rotates log files. We use this. Run from anacron.
  • makewhatis.cron: updates the whatis DB with new man pages. We use this. Run from anacron.
  • mlocate.cron: (ascobie: populates the database used by the very useful "locate" command.)
  • prelink: decreases command startup time. Now irrelevant?
  • rpm: constructs /var/log/rpmpkgs. Run from anacron.
  • slrnpull-expire: pulls news articles for offline reading. Unconfigured, so currently works only on half a dozen example groups. Could run from anacron anyway.
  • squirrelmail.cron: tidies /var/spool/squirrelmail/attach/. Could run from anacron.
  • tetex.cron: tidies /var/lib/texmf. Could run from anacron.
  • tmpwatch: tidies tmp directories. Could run from anacron.
  • yum.cron: yum maintenance. Unused.

The cron.hourly directory contains a couple of scripts which help to run the Usenet News system. Both could be run from anacron. In addition there are cron.weekly and cron.monthly script directories. These were not triggered on the sample day above.

Wake On LAN

A large part of the project's effort went into trying to get Wake On LAN working, because for a while it seemed that this would be the only LCFG-configurable way of waking a sleeping computer at a desired arbitrary time. Later this all became irrelevant when we got Wake Alarms working.

Wake On LAN is a system whereby a computer in a "suspend" or "hibernate" or "powered down" low power state can be "woken" or powered up into its normal operating state by sending it a wake up "Magic Packet" over the network from another computer.
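
The packet's structure is simple: six bytes of 0xFF followed by the target's six-byte MAC address repeated sixteen times. The helper below is purely illustrative (it is not part of ether-wake); it builds the 102-byte payload as a hex string just to show the format:

```shell
#!/bin/sh
# Build the 102-byte Wake On LAN Magic Packet payload as hex:
# 6 bytes of 0xFF, then the MAC address repeated 16 times.
magic_packet_hex() {
    # strip separators and lower-case the MAC
    mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    printf 'ffffffffffff'            # 6 x 0xFF synchronisation bytes
    i=0
    while [ "$i" -lt 16 ]; do        # 16 repetitions of the MAC
        printf '%s' "$mac"
        i=$((i + 1))
    done
    echo
}
# usage: magic_packet_hex 00:11:22:33:44:55
```

ether-wake constructs exactly this payload and broadcasts it on the local Ethernet segment, which is why (without special routing) the sender normally has to be on the same subnet as the target.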

I got Wake On LAN working with a powered down computer, as follows:

  1. Enable Wake On LAN on the target machine:
         # /usr/sbin/ethtool -s eth0 wol g
  2. Power down the target machine in the normal way.
  3. From another machine on the same subnet:
         # /sbin/ether-wake  MAC address of target machine

The first step ensures both that the target machine's network hardware will remain powered up during shutdown and that it will listen for Magic Packets addressed to it. The network hardware remains prepared for Wake On LAN even when the machine is deprived of mains power for short periods after being shut down.

You also need to check the target machine's BIOS settings. On the test machine, a Dell Optiplex 745, two BIOS settings in the Power Management section needed to be set appropriately for Wake On LAN to work:

  • Low Power Mode had to be turned Off. This was the default setting on the test computer.
  • Remote Wake Up had to be turned On. The default setting was Off on the test computer.

The Magic Packet has to be sent from a machine on the same subnet, because by default the packets are not routed. They can be routed, however: a number of web pages offer a free internet-wide wake-up service based on sending Magic Packets over IP, and IS is looking into routing its Magic Packets. An alternative way of getting a machine on one subnet to wake a second machine on a different subnet would be to use VLANs to connect the sending machine to the target machine's subnet.

I didn't manage to get Wake On LAN working with machines in suspend or hibernate states. This is either impossible or exceptionally tricky with our current hardware and software. Some subscribers to the Discuss mailing list at lesswatts.org went to great lengths to help me get this to work, with no success. One subscriber to the list has reported success with a different hardware and software combination. If this facility is ever needed the problem could be revisited.

Wake Alarms

These wake the computer at a set time in the future. They're easy enough to use and work on a machine in any of the available low power states - sleep, hibernation and normal power-down. Just do this:

# echo "+00-00-00 00:05:00" > /proc/acpi/alarm

This would wake the machine five minutes in the future (so obviously it would have to be put into sleep mode or shut down during those five minutes).

Documentation can be found at http://acpi.sourceforge.net/documentation/alarm.html
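
A hook or daemon would set such an alarm programmatically. This sketch formats a relative alarm a given number of minutes ahead; the alarm file path is parameterised so the formatting can be checked against a scratch file. (On the SL5 test machine the file is /proc/acpi/alarm; newer kernels replace it with /sys/class/rtc/rtc0/wakealarm, which instead takes a time in seconds since the epoch.)

```shell
#!/bin/sh
# Arm the ACPI wake alarm a given number of minutes in the future,
# using the relative "+YY-MM-DD HH:MM:SS" form shown above.
set_wake_alarm() {
    mins="$1"
    file="${2:-/proc/acpi/alarm}"
    printf '+00-00-00 %02d:%02d:00\n' "$((mins / 60))" "$((mins % 60))" > "$file"
}
# usage (as root): set_wake_alarm 300     # wake five hours from now
```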

Wake alarms would be fundamental to the operation of any LCFG-controlled automated sleep system; we would want to wake the machine at night to run updaterpms; we might want to wake it at the opening of a student lab session or working day; we might want to wake it every few hours to check for eligible queued Condor jobs waiting for a host to run on.

Recommendation: LCFG-Configured Automated Sleep

These are the circumstances in which one might want a computer to go to sleep:

  • it might be configured to sleep between set times when not in use - for example a student lab machine could sleep during the night when the lab was closed.
  • it might be sent to sleep by some kind of idleness monitoring process once a set of idleness criteria was matched - an example of such criteria being that the computer had had nobody logged in for a while, had no user jobs running and wasn't then performing system configuration activities such as automated software package management.
  • the machine's user might send it to sleep - for example an office user reading a book or having a meeting might want to make their office quieter by having their machine sleep when not in use.

These are the circumstances in which we might want a computer to wake up from sleep:

  • to perform automated system configuration tasks, for instance the nightly adjustment of software packages
  • when a physically present user wants to use the computer and initiates wake-up (presses the power button)
  • when a remote user wants to use the computer and initiates wake-up (somehow)
  • at a certain time, for instance in a student lab just before opening time
  • to check to see if Condor needs the machine to run jobs

LCFG-configured software would need to:

  • send the machine to sleep when appropriate (e.g. when no user activity had been detected in a while and no LCFG-initiated maintenance was running and no Condor jobs were running or about to run on the machine)
  • wake the machine for its automated maintenance period, and initiate that maintenance on wake-up
  • if the machine was part of a Condor pool, wake it every few hours to check for Condor jobs needing to be run - letting Condor and automated sleep easily coexist, and letting the Condor pool expand and contract according to demand
  • cooperate cleanly with the autoreboot component

Any automated sleep system would need to be highly configurable by LCFG, including for instance factors such as wake-up times; sleep duration; the period during which automated sleep instigation was allowable; whether automated sleep instigation was permissible at all; the idleness criteria; and the degree of aggressiveness of such criteria.

Different wake-up behaviour would conceivably be needed for user-initiated sleep and automatically initiated sleep: if a user sends her computer to sleep because she wants some peace and quiet, she's probably not going to want the machine to wake up every now and then during the day. On the other hand, if a machine had gone to sleep because it was idle and unused, it would be perfectly reasonable for it to wake up periodically, for instance to check for Condor jobs. In either case, and probably in virtually all circumstances, we would want sleeping machines to wake up in order to perform regular essential system configuration tasks such as software installation.

We could accommodate both wake-up methods easily by running DICE machines in normal operation with a wake alarm set to wake the computer for its next nightly maintenance (boot.run, updaterpms, etc.), so that if the computer is "slept" by its user it will still wake at the right time. If our software decided that the computer was idle and should be made to sleep, the wake alarm could then be changed if necessary to wake the computer at some other time, for instance 2-3 hours in the future to check for free Condor jobs.
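
The core decision such a system would repeatedly take can be sketched as below. Everything here is a hypothetical illustration - the inputs, the threshold and its default would in reality be LCFG resources on a proper component:

```shell
#!/bin/sh
# Hypothetical sketch of the automated-sleep decision. Inputs: minutes
# the machine has been idle, the number of Condor jobs running or
# about to run, and whether LCFG-initiated maintenance is in progress.
should_sleep() {
    idle_mins="$1"; condor_jobs="$2"; maintenance="$3"
    threshold="${IDLE_THRESHOLD:-30}"    # minutes of idleness required
    [ "$idle_mins" -ge "$threshold" ] &&
        [ "$condor_jobs" -eq 0 ] &&
        [ "$maintenance" = no ]
}
# A wrapper would call this periodically; before actually running
# pm-suspend it would also set the wake alarm for the next maintenance
# slot or Condor check.
```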

Possible Financial Savings

The test Dell 745 used for this project used between 64.5W and roughly 90W when running normally. The lower figure was observed when the machine was running no user jobs and the higher figure when it was busy with a computationally intensive task. The lower figure was also typical of a computer running light user jobs such as reading mail or editing documents - normal commodity computing activities.

A machine like this running normally would rarely rise much above its "resting" power consumption in light commodity use. If we say that the typical DICE desktop uses an average of 65W, and that it is switched on for 50 weeks of the year, then we can say that it would use about (65 x 50 x 7 x 24) Wh = 546000Wh, or 546kWh.

The current price of electricity seems to be roughly 11p per kWh, so the electricity needed to run the average DICE desktop for a year costs about £60.

We have about 600 DICE desktop computers, so electricity for our DICE desktops costs about £36,000 each year.

If we conservatively assume that a DICE desktop with an LCFG-controlled automated sleep system slept only at night, say from 10pm to 7am with an hour of normal running in the middle for automated software maintenance - say 8 hours per night - then the average daily power consumption would drop to ( 8 hours x 15W + 16 hours x 65W ) / 24 hours giving 48.3W average consumption; the annual electricity cost for DICE desktops would then be £26,750, a saving of £9250.

If we assume more aggressively that computers are in use for an average of only 8 hours per day plus one at night for maintenance we get an average consumption of ( 9 x 65 + 15 x 15) / 24 = 33.75W; giving a yearly bill of £18,700.

In other words we could cut the DICE desktop electricity bill almost in half.
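
These figures are easy to re-derive mechanically from the assumptions stated above (11p/kWh, 50 weeks per year, 600 machines); any small differences from the rounded figures in the text are rounding only.

```shell
#!/bin/sh
# Re-derive the report's fleet electricity costs. Argument: average
# power draw in watts. Prints the annual cost in whole pounds for 600
# machines at 11p/kWh over 50 weeks.
fleet_cost() {
    awk -v w="$1" 'BEGIN {
        kwh = w * 24 * 7 * 50 / 1000       # annual kWh per machine
        printf "%d\n", kwh * 0.11 * 600    # pounds for the fleet
    }'
}
# fleet_cost 65     -> about 36000 (always on)
# fleet_cost 33.75  -> about 18700 (8h use + 1h maintenance, sleep otherwise)
```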

However, this doesn't take Condor into account. Condor pools the spare cycles of desktop computers into a high-throughput processing system. In my tests a week's membership of the Condor pool increased a computer's power consumption by about 20%. When busy running Condor jobs the computer typically used about 90W, which is only a 38% increase on the resting level; since the observed 20% average increase is just over half of that 38% maximum, the test machine must have been running Condor jobs for a bit over half the time, say 13 hours per day. My test computer in the Condor pool would cost about £72 per year to run instead of £60, and 600 of these would cost £43,200 per year.

Let's once again add in automated sleeping during idle periods. If we assume a system whereby the computer only sleeps when there is no Condor job available for it to run, the Condor figures will be much as before. The commodity computing use will also be the same as before, so on an average day the machine will spend 8 hours being used directly by a user (65W) plus 13 hours running Condor jobs (90W), leaving only 3 hours sleeping (15W). That gives an average power consumption of 72W and an annual cost of £66.50 per machine, or £39,900 for all 600. Sleeping in this case would only save us just over £3000 per year.
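The same arithmetic for this Condor day (8 + 13 hours awake leaves 3 of the 24 for sleep) can be written out per machine; note that the exact average comes out a fraction above the rounded 72W used in the text:

```python
# Per-machine cost for the Condor day described above: 8h direct use at 65W,
# 13h of Condor jobs at 90W, and the remaining 3h asleep at 15W.
PRICE_PER_KWH = 0.11           # pounds, the report's assumed tariff
HOURS_PER_YEAR = 50 * 7 * 24   # 50 weeks of operation

daily_wh = 8 * 65 + 13 * 90 + 3 * 15   # = 1735 Wh per day
avg_watts = daily_wh / 24.0            # just over 72W
annual_kwh = avg_watts * HOURS_PER_YEAR / 1000.0
print("average %.1fW, about %.2f pounds per year" % (avg_watts, annual_kwh * PRICE_PER_KWH))
```

Using the exact 72.3W average rather than the rounded 72W gives about £66.80 a year instead of £66.50; the difference disappears in the overall rounding.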

Is this too pessimistic though? The above calculations assume that every day is a working day. Suppose instead that the average computer is largely idle for 2 days out of every 7, but with Condor use unchanged from day to day, and with universal machine membership of the Condor pool. The average weekend-day power consumption for a DICE desktop would then be 65Wh (an hour of automated maintenance) plus 13 x 90 = 1170Wh (Condor) plus 150Wh (10 hours of sleep), totalling 1385Wh, or an average consumption of 57.7W; weekdays stay at the 72W calculated above. Averaging consumption out over the year, 57.7W x 24 x 2 x 50 plus 72W x 24 x 5 x 50 gives 138,480 + 432,000 = 570,480Wh, say 570kWh, or £62.75 per machine. Totalled over 600 machines this gives about £37,650.

This would mean a saving of about £5,550 per year compared with the £43,200 for a pool that never sleeps. The saving is much smaller than in the non-Condor case, because Condor keeps the machines awake for most of the time; but it comes with all the desktop machines in a well-used and useful Condor pool.

This does depend on having a system whereby machines go to sleep when idle, and sleeping machines briefly wake up from time to time to check whether Condor has any jobs for them. This currently seems achievable.

The calculations above are full of averages and roundings and should of course be taken only as a very rough illustration of possible costs and savings.

-- ChrisCooke - 31 Mar 2008

Post-project Reflections

While working on the project there was no sense of steady progress being made. Progress, understanding and knowledge tended to come in large discrete lumps. Going through long intervals without any apparent progress was alarming and discouraging; but as long as the effort continued to be put in regularly, meaningful results arrived from time to time, however unevenly. My advice to anyone else doing a largely research-based project would be: don't be discouraged.

The experience of starting the project with little knowledge of the area and trying to build up a critical mass of understanding was also somewhat alarming. Again, persistence paid off; widespread and sometimes random reading around the subject eventually built up a joined-up picture of the subject area and allowed work to proceed.

Most of the progress in the project came via conversations and contact with other people. If you're doing research, don't hesitate to talk to other people, to put ideas to them and to ask for knowledge or opinions. It's not just valuable, it's absolutely vital. Above all talk to external people as much as possible; find relevant mailing lists and internet forums and put your questions there, introduce your project, communicate as much as possible with as many people as you can. (If you're worried about displaying your ignorance, don't be. You won't look or seem daft; plenty of people far stupider than you ask questions on mailing lists every day, and get useful answers back. It's what mailing lists are for.)

-- ChrisCooke - 04 Apr 2008

Topic revision: r5 - 07 Apr 2008 - 16:37:54 - ChrisCooke
 