Chris's Power Diary

I've started reporting progress on this project to my new blog instead of here. Look for the power management tag.

This is my diary for the "Investigate power management options for DICE desktops" project. More recent entries appear above older entries, blog-style. Feel free to add comments - the more comments and ideas, the better the project - as long as you include your WikiName too. ChrisCooke.

31 Jan 2008

Nigel Cunningham (a.k.a. Mr. TuxOnIce) has been in touch with a really helpful message, which he says I'm welcome to post here:
Hi.

I just discovered your wiki entries regarding power management.
Interesting stuff! I noticed your comment that I hadn't documented the
wake alarm support in TuxOnIce very well. I'll seek to fix that. In the
meantime, here's a brief description:

/sys/power/tuxonice has five files that are relevant to this: lid_file,
wake_alarm_dir, wake_delay, powerdown_method and post_wake_state.

Wake_alarm_dir says which rtc alarm to use. If you want to use
/sys/class/rtc/rtc0/wakealarm, put rtc0 in this file.

Wake_delay is the delay (in seconds) after going to sleep until we
should wake again.

Powerdown_method says what we should do after writing the image and
setting the alarm (if we set one). The numbers are based on ACPI state
numbers, so 0 = non acpi poweroff, 3 = suspend to ram, 4 = enter ACPI
platform suspend-to-disk state and 5 = acpi power off.

Post_wake_state is like powerdown_method, but describes what to do after
we wake.

Lid_file says which file to use to check the lid state
(/proc/acpi/button/$Lid_file/state). If you want to use
/proc/acpi/button/lid/LID/state, you'd put "lid/LID" in here.

If the lid is open when we resume, the post-wake-state is ignored.

Using this combination of files (and assuming the wake events work on
your computer), you can:

* Write a hibernation image, then suspend to ram. Wake (say) 20 minutes
later and power off completely, unless the lid is opened and the
computer woken in the meantime (in which case we just resume).

* Write a hibernation image and powerdown completely. Wake at 6am (set
an absolute time by setting the wake_alarm prior to starting the cycle
and not using the wake_alarm_dir/wake_delay entries), resume (ie reload
the hibernation image) and suspend to ram until you're ready to use the
computer.

Hope that helps!

Nigel
Thanks a lot, much appreciated! (And I'm slightly alarmed and quite pleased to find that people are actually reading this page!) In a subsequent email Nigel also said:
All I'd really ask is that if you have some feature you'd like to see,
let me know please. I know from your comments that you guys want to
minimise the diff against vanilla kernels. That said, I'm far more
focussed on providing hibernation features than the guys who work on
the mainline kernel, so I'll be more responsive to issues and feature
requests. (Of course if you end up not using TuxOnIce, I won't mind
either!)

30 Jan 2008

I've at last got round to trying out a pm hook script to get us round the problem of the amd automounter crashing when the machine resumes from a sleep state. As noted before, the hook scripts go in /etc/pm/hooks. At our site the amd automounter is stopped, started and configured using the corresponding LCFG component, and to restart the daemon we run om amd restart. So here's the hook script /etc/pm/hooks/25amd:
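#!/bin/bash
# /etc/pm/hooks/25amd: restart the amd automounter (via its LCFG
# component) when the machine comes back from suspend or hibernate.

. /etc/pm/functions

case "$1" in
        hibernate|suspend)
                # nothing to do on the way down
                ;;
        thaw|resume)
                /usr/bin/om amd restart
                ;;
        *)
                ;;
esac

exit $?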
25 was just an arbitrary choice, putting it somewhere in the middle of the running order for the scripts. Anyway, I've tried suspending the machine and leaving it for several hours, then resuming, which is normally enough to make amd give up completely at resume time. This time when the machine resumed amd simply restarted. Magic. Yes, it was a new process with a different pid, and amd was doing its job properly.
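To see where a hook like 25amd lands in the running order, something like the following lists the hooks in the order they would run. It assumes, as I understand pm-utils of this vintage to work, that hooks run in ascending name order on the way down (suspend, hibernate) and in descending order on the way back up (resume, thaw); check /etc/pm/functions on the machine to confirm. The function name is my own invention.

```shell
#!/bin/bash
# Sketch: list pm hook scripts in the order they would (probably) run.
# Assumes ascending name order for suspend/hibernate and descending order
# for resume/thaw; LC_ALL=C makes the sort byte-wise and predictable.
# $1: action name; $2: hooks directory (default /etc/pm/hooks)
hook_run_order() {
    local action="$1" dir="${2:-/etc/pm/hooks}"
    case "$action" in
        hibernate|suspend) ls "$dir" | LC_ALL=C sort ;;
        thaw|resume)       ls "$dir" | LC_ALL=C sort -r ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
}
```

For example hook_run_order resume. A hook can also be exercised by hand without sleeping the machine at all: /etc/pm/hooks/25amd resume simply restarts amd.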

Actually I had better go back and check the amd crashing situation. Confession time: since we're thinking of possibly moving our desktops from FC6 to SL5 in the summer, I've tried out SL5 on my power management test machine to see what differences I could find. (I couldn't see any - the software seems to look and behave pretty much identically - apart from the SL5 version of gnome power manager putting a prettier icon on your menu bar...) So I'd better go back and double-check that long suspends and hibernates do indeed provoke an amd crash on resume on SL5 just as they do on FC6.

21 Jan 2008

I've just checked that the timed wakeup works with hibernate as well as with suspend and shutdown. It does.

  • set machine to wake in 5 mins
  • hibernated
  • machine woke up successfully

  • set machine to wake in 5 mins
  • suspended
  • machine woke up successfully

I then got curious about what would happen if the power plug was pulled from the machine while it was hibernating.

  • set machine to wake in 5 mins
  • hibernated
  • pulled power from machine for a couple of seconds
  • restored power
  • machine woke up successfully

  • set machine to wake in 2 mins
  • hibernated
  • pulled power from machine for 5 minutes
  • restored power
  • machine woke up successfully when the power was restored.

  • set machine to wake in 2 mins 30 secs
  • hibernated
  • unplugged for one minute
  • plugged in again with a minute to go before the scheduled wake time
  • the machine failed to wake up again

Kind of the opposite of what you'd expect. The machine remembered its wakeup time even after being unplugged for several minutes. But it only managed to thaw/resume when the power was restored if the scheduled resume time had already passed.

19 Jan 2008

Here's the transcript of a chat I had recently with Simon. It's lightly edited, mostly just to remove my most witless remarks. In it we discover how to suspend or shut down a machine with a wake-up alarm, neatly taking care of our desire to wake machines during the night to perform their routine sys admin maintenance tasks; Simon reckons that we shouldn't bother to revive LCFG's old APM-era suspend and resume methods for LCFG-controlled software, and that any software which needs to know about suspends and resumes should use the OS's mechanisms directly; and we discover where to put hook scripts to be run when a machine wakes up.
 Simon : That HAL stuff you mention in your Power Management investigations is a DBUS interface, so you should be able to script it.
 Simon : Hang on - might be able to give you a command example ...
 Chris : ok
 Simon : dbus-send --system --print-reply --dest=org.freedesktop.Hal /org/freedesktop/Hal/devices/computer org.freedesktop.Hal.Device.SystemPowerManagement.Suspend int32:60
 Simon : (all as one line)
 Simon : might send the computer to sleep for 60 seconds. Completely untested.
 Chris : I shall try it now!
 Chris : Error org.freedesktop.Hal.Device.SystemPowerManagement.AlarmNotSupported: Waking the system up is not supported
 Chris : :-(
 Simon : Boo. Well - the interface is there, it's just that the implementation isn't.
 Simon : At least that lets you rule out that approach ...
 Chris : Yes indeed
 Chris : One of the big problems is being able to rule things out
 Chris : I just can't seem to find concrete information on things
 Simon : I think what we're trying to do is quite unusual. At the moment.
 Chris : Yes.  Which is odd.  You'd think people would want to save power on their desktops.  But every Linux power management site seems to be solely about laptops!
 Simon : Which OS are you trying all of this with?
 Chris : fc6
 Simon : Might be worth getting hold of fc8 and just checking that things like HAL don't do what you'd want there.
 Chris : I gather that HAL has definitely moved on a bit since fc6 so I think I'll do that.  Thanks.
 Simon : I'm not sure if the problem is HAL, or the acpid that's implementing the power management stuff.
 Simon : Hang on. I've got an FC8 VM here - I'll just give that a go on this.
 Chris : ah brilliant ta
 Simon : Doesn't work on FC8, unfortunately. The script is hardcoded to not do it - and the pm-suspend command that's called doesn't work.
 Simon : ... with wakeup alarms, that is.
 Chris : *frustration*
 Chris : oh well thanks!
 Simon : Have you tried /proc/acpi/alarm?
 Chris : no I haven't?
 Simon : What kernel version is on your test machine?
 Chris : 2.6.22.9-61_FC6_dice_1.2
 Simon : Right, you'll need /sys/class/rtc instead.
 Simon : Does the BIOS have a RTC function?
 Chris : pass?
 Chris : I'll boot up the bios if you like?
 Simon : Right, try this - do you have a /sys/class/rtc/rtc0/wakealarm file?
 Chris : nope, not even a /sys/class/rtc directory
 Simon : Looks like our kernels don't have the necessary magic enabled.
 Simon : Take a look at : http://www.mythtv.org/wiki/index.php/ACPI_Wakeup
 Chris : I do have a /proc/acpi/alarm for what it's worth
 Simon : Hmmm. Let's try /proc/acpi/alarm.
 Simon : # echo "+00-00-00 00:05:00" > /proc/acpi/alarm 
 Simon : And the machine _should_ wake up 5 minutes later.
 Chris : OK - have tried that
 (Chris shuts down the machine with "shutdown -h now")
 Simon : Yes - I think this will bring it back from halt, rather than back from suspend.
 Chris : well it's a good start
 Simon : If it works ...
 Chris : I shall let you know!!
 Chris : I'll save a transcript of this :-)
 Simon : Please do. I know someone who's got this going with their MythTV box, but they're on a different BIOS.
 Chris : ah right does it wake up to record things?
 Simon : If it doesn't work, it might be worth checking to see if the Real Time Clock stuff is enabled in the BIOS.
 Simon : Yup.
 Chris : OK I'll check that
 Chris : or Stephen will...
 Simon : http://acpi.sourceforge.net/documentation/alarm.html
 Simon : Might be useful, too. It suggests that you can use the alarm to resume from sleep, too.
 Chris : Very useful if it works
 Chris : ah - signs of life!  It just woke up :-)
 Chris : "It's alive!"
 Simon : Well - I guess that means something works ... :)
 Chris : Excellent
 Simon : Might be worth trying exactly the same thing with /proc/acpi/alarm but suspend the machine (with pm-suspend) rather than shutting it down completely.
 Chris : Yes I'll try that now then...
 Chris : it's suspended...
 Simon : I wonder if any of this will work with newer kernels. My FC8 box has neither /proc/acpi/alarm, nor /sys/class/rtc
 Chris : Oh.  There must be a new equivalent surely...?
 Simon : Yes, the /sys/class/rtc stuff is - but the implementation of that does seem to be a little in flux at the moment.
 Chris : nothing yet from sleeping beauty... another couple of minutes...
 Simon : Ah - but my VM doesn't have an ACPI capable BIOS, so that might be why it's not working there.
 Chris : Ah OK
 Simon : In theory it's got 2 minutes more before it should wake up ...
 Chris : we have life!
 Simon : Excellent. Looks like that might work, then.
 Chris : Yes!  :-)
 Simon : Cool.
 Chris : very!!
 Chris : Now to make LCFG aware of suspends and resumes I suppose...
 Simon : I'm not convinced that LCFG should know anything about them. That's up to the OS of the machine to handle, using its own mechanisms - I don't think that LCFG should be getting in the way.
 Chris : Oh.  it used to be aware of APM suspends and resumes.
 Simon : Of course, you might want to make sure that the client knows about them, to do new profile fetches - but I think the (old) idea of every component having a suspend and resume method is wrong. Daemons that need to know about that should register with the OS's power management framework, rather than relying on LCFG.
 Simon : Fedora has a mechanism for things knowing about power management events - it's what a "normal" Fedora laptop uses.
 Simon : I just think we should use that mechanism, rather than building our own one.
 Chris : I was hoping to for instance keep amd wheezing along for a bit longer by getting it restarted by the amd component after a resume...
 Simon : So, in general, if a service needs to know about a suspend/resume it will probably already register for it.
 Simon : Amd may be a special case. Does amd need restarting after a resume? Is it just broken otherwise?
 Chris : it breaks horribly when the machine resumes.  Crashes and exits.
 Chris : Not that this is necessarily an insoluble problem.
 Chris : And Stephen doesn't think much of it and reckons we should use the other automount thing instead (I forget its name) as it's far more widely used and better supported
 Simon : I'm with Stephen. I don't think that the other automounter can handle our current map formats though. NFS for homedirectories has to die first, I think.
 Chris : OK - I'll craft a message to the amd mailing list about the crashes then... 
 Simon : Or, just register something that restarts amd with the current ACPI resume notification system.
 Chris : Ah - OK
 Simon : I'm not opposed to that, but I can't see the point of developing an entire LCFG infrastructure that mirrors something already in the OS.
 Chris : acpid?
 Chris : stick a rule in /etc/acpi/events?
 Simon : Possibly, yes. I think there might be a cleaner way of doing that, but I can't find it at the moment (something must be handling dhcp leases on a resume, for example, and it's not there)
 (Chris wonders how the various bits fit together)
 Simon : Basically, you have things like gnome-power-manager, which talk, over DBus to HAL.
 Simon : HAL is an abstraction layer - it doesn't actually do anything, it just sits there and interfaces between the desktop applications, and the utilities that control the hardware.
 Simon : On Red Hat, these utilities are the pm-* set of tools, which instruct the kernel when to start/stop/suspend etc.
 Simon : The ACPI code in the kernel then takes these commands from userspace, and passes them on to the BIOS.
 Chris : And the business of writing values to various magic /proc and /sys files sits at the same level as the pm-* tools?  Or maybe that's how the pm-* tools ask the kernel to do things?
 Simon : Ish. 
 Simon : At the end of the day /proc and /sys are just ways of communicating between userspace and kernel. The pm-* utilities achieve their suspending by using a different mechanism (the ioctl) for talking to the kernel.
 Simon : Ultimately it's a set of shell scripts that calls pm-pmu to do the actual suspending operation - the shell scripts take care of starting and stopping any necessary services before and after the suspend.
 Chris : And this is exactly where we could tie in things too I suppose... like anything that needed to be done to amd (although really it should just cope properly with suspend/resume)
 Simon : Yes - you just need to drop scripts into the /etc/pm/hooks directory.
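The two wake-alarm interfaces that came up in the chat could be wrapped so that a script uses whichever one the running kernel provides. A minimal sketch, assuming that /sys/class/rtc/rtc0/wakealarm takes an absolute time in seconds since the epoch (writing 0 first to clear any existing alarm) and that /proc/acpi/alarm takes the relative "+00-00-00 HH:MM:SS" form Simon used. I've only seen that relative form used with hours, minutes and seconds, so the sketch refuses delays of a day or more. The function names and path overrides are hypothetical.

```shell
#!/bin/bash
# Sketch: schedule a wake-up some seconds from now via whichever alarm
# interface the kernel offers. Both paths can be overridden for testing.
RTC_WAKEALARM="${RTC_WAKEALARM:-/sys/class/rtc/rtc0/wakealarm}"
ACPI_ALARM="${ACPI_ALARM:-/proc/acpi/alarm}"

# Format seconds (under one day) as the relative string /proc/acpi/alarm
# accepted in the chat, e.g. 300 -> "+00-00-00 00:05:00".
acpi_relative_alarm() {
    local secs="$1"
    if [ "$secs" -lt 1 ] || [ "$secs" -ge 86400 ]; then
        echo "delay must be between 1 and 86399 seconds" >&2
        return 1
    fi
    printf '+00-00-00 %02d:%02d:%02d\n' \
        $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# $1: seconds from now; $2: current epoch time (defaults to date +%s)
set_wake_alarm() {
    local secs="$1" now="${2:-$(date +%s)}" when
    if [ -w "$RTC_WAKEALARM" ]; then
        echo 0 > "$RTC_WAKEALARM"                 # clear any old alarm
        echo $(( now + secs )) > "$RTC_WAKEALARM"
    elif [ -w "$ACPI_ALARM" ]; then
        when=$(acpi_relative_alarm "$secs") || return 1
        echo "$when" > "$ACPI_ALARM"
    else
        echo "no wake alarm interface found" >&2
        return 1
    fi
}
```

For example, set_wake_alarm 300 && pm-suspend would mirror what we tried in the chat. The sketch doesn't check whether the BIOS honours the alarm, which, as Simon pointed out, may depend on BIOS settings.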

15 Jan 2008

This is interesting... I've located the HAL 0.5.10 Specification, specifically the org.freedesktop.Hal.Device.SystemPowerManagement interface, which lists the methods that can be called to accomplish power management tasks. Not only does it have methods for Suspend, Hibernate and Hybrid (suspend and hibernate simultaneously, like a Mac), it also lets you specify an optional number of seconds after which to wake up, so you can tell it how long you want to sleep for! If we could use this there would be no need to bother with all this troublesome wake-on-LAN stuff: when sending a machine to sleep we could work out when it next needs to wake up for its routine sys admin maintenance tasks, compare that with the current time, and ask it to sleep for the difference.
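The calculation itself, how long to sleep so as to wake in time for the night's maintenance, is short enough to sketch. This assumes GNU date (for date -d @EPOCH), works in local time and ignores daylight-saving changes, so on the night the clocks change the wake-up could be an hour out. The function name and the 03:00 example time are illustrative only.

```shell
#!/bin/bash
# Sketch: seconds from "now" until the next occurrence of HH:MM local time.
# $1: target time as HH:MM; $2: current epoch time (defaults to now).
# Ignores DST transitions. Requires GNU date for -d @EPOCH.
seconds_until() {
    local target="$1" now="${2:-$(date +%s)}"
    local hh="${target%%:*}" mm="${target##*:}"
    local now_h now_m now_s now_sod target_sod diff
    now_h=$(date -d "@$now" +%H)
    now_m=$(date -d "@$now" +%M)
    now_s=$(date -d "@$now" +%S)
    # 10# forces base 10 so that "08" and "09" aren't read as octal.
    now_sod=$(( 10#$now_h * 3600 + 10#$now_m * 60 + 10#$now_s ))
    target_sod=$(( 10#$hh * 3600 + 10#$mm * 60 ))
    diff=$(( (target_sod - now_sod + 86400) % 86400 ))
    # If the target is this very second, wait a full day rather than zero.
    [ "$diff" -eq 0 ] && diff=86400
    echo "$diff"
}
```

Combined with the dbus-send command from the 19 Jan chat this might become dbus-send --system --print-reply --dest=org.freedesktop.Hal /org/freedesktop/Hal/devices/computer org.freedesktop.Hal.Device.SystemPowerManagement.Suspend int32:$(seconds_until 03:00), provided the local HAL actually implements the wake-up argument (ours, it turned out, doesn't).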

How to use this? No idea yet. acpid is for notifying user processes of ACPI events, says its man page, so I can't use it to request a timed sleep. Gnome power manager's help pages make no mention of a timed sleep either. Where else to look...?

14 Jan 2008

A wee detail I was curious about while I was envisaging a shutdown and wake-up service. I remember reading somewhere about wake on LAN "Magic Packets" having to be sent from the same subnet as the machine to be woken as they're UDP not TCP and UDP packets aren't routable. Or something like that; I can't remember where I read this so can't check the exact wording.

I've just checked: I can power on a machine which has had wake on LAN activated (with ethtool as described earlier) and then been powered down, by sending a Magic Packet with ether-wake. I cannot power on the machine using ether-wake from a machine on a different subnet. But then, ether-wake doesn't let me associate the MAC address I give it with any sort of IP address, so that's not very surprising: it sends the Magic Packet as a raw Ethernet frame rather than as a UDP datagram, so it can't be routed off the local network at all.

However, the Wikipedia wake on LAN page gives links to a number of internet-based wake on LAN services. One of these is http://www.remotewakeup.com/ - just enter your machine's IP details and its MAC address and the site sends a Magic Packet. It does warn you that your firewall should let through port 9/udp (or another port if you prefer). It doesn't work for my test machine, though.

Anyway, I'm guessing from all this that the "UDP isn't routable" stuff is rubbish; it appears that UDP is perfectly routable, or at least lots of people seem to think it is. I can't say whether or not we could get Magic Packets from one of our subnets to another with the routing in its current configuration; what I can say is that ether-wake isn't up to that particular job. It works perfectly well when run on the same subnet, however.
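For reference, a Magic Packet sent over UDP is just a datagram whose payload is six 0xFF bytes followed by the target's MAC address repeated 16 times; port 9 is the usual choice, as the remotewakeup.com page suggests. Whether one gets from another subnet depends on the routers: a packet aimed at the target subnet's broadcast address only arrives if directed broadcasts are forwarded, which many routers disable by default. A minimal sketch, assuming socat is installed (its broadcast option sets SO_BROADCAST on the socket); the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: build and send a wake-on-LAN Magic Packet as a UDP datagram.
# Payload: 6 bytes of 0xff, then the target MAC address repeated 16 times.

# Print the payload as a hex string (204 hex digits = 102 bytes).
magic_packet_hex() {
    local mac="${1//[:-]/}" payload="ffffffffffff" i
    mac=$(echo "$mac" | tr 'A-F' 'a-f')
    if ! [[ "$mac" =~ ^[0-9a-f]{12}$ ]]; then
        echo "bad MAC address: $1" >&2
        return 1
    fi
    for (( i = 0; i < 16; i++ )); do
        payload+="$mac"
    done
    echo "$payload"
}

# $1: MAC address; $2: destination IP or broadcast address; $3: UDP port.
send_magic_packet() {
    local hex escaped
    hex=$(magic_packet_hex "$1") || return 1
    # Turn "ff00..." into printf escapes "\xff\x00..." and emit raw bytes.
    escaped=$(echo "$hex" | sed 's/../\\x&/g')
    printf "$escaped" | socat - "UDP-DATAGRAM:$2:${3:-9},broadcast"
}
```

Usage would be something like send_magic_packet 00:11:22:33:44:55 192.0.2.255, with a made-up MAC address and a documentation-range broadcast address standing in for real ones.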

9 Jan 2008

More on hal-info: http://hughsient.livejournal.com/6702.html

When configuring the latest hal-info downloaded from http://hal.freedesktop.org/releases/ I get this:

        ================ !!! WARNING !!! ========================
             You use a too old HAL version! You need at least
               hal >= 0.5.10 to use this hal-info version!
        =========================================================

I just want to try hal-info to see what it does so I'll look for one compatible with the hal I have on fc6.

Had to go back to hal-info-20070831 to find a version which doesn't blankly refuse to configure for my version of hal. This one instead says

configure: WARNING: hal 0.5.10 or later is required for ipw killswitch. Disabling this feature.
... which is fine by me: ipw is the Intel PRO/Wireless driver, so I can live without it; I'm just interested in ACPI.

(Diversion: an interesting blog entry on ACPI suspend/resume on Linux: http://mihai.bazon.net/blog/acpi-suspend-resume-your-linux)

So anyway, I then ran make. It seems to do a load of stuff, but it doesn't actually build anything and there's no hal-info script afterwards. I expect I've missed something.

4 Jan 2008 (2)

I put this in a file in /etc/acpi/events/ and did a kill -HUP on the acpid daemon:
# ACPID config to find out what events look like to acpid
event=.*
action=/bin/echo "%e" >> /tmp/acpid.chris
It took a bit of tweaking before /var/log/acpid stopped reporting
[Fri Jan  4 14:03:21 2008] reloading configuration
[Fri Jan  4 14:03:21 2008] ERR: regcomp(): Invalid preceding regular expression
[Fri Jan  4 14:03:21 2008] 2 rules loaded
and started saying
[Fri Jan  4 14:04:05 2008] reloading configuration
[Fri Jan  4 14:04:05 2008] 3 rules loaded
instead. When I press the power button - which triggers a shutdown - my temporary file gets this:
button/power PWRF 00000080 00000001
button/power PWRF 00000080 00000001
Did I press it twice? Not sure. Anyway, once I've booted the machine up again and logged back in, and made sure that my new acpid config file is still there and still loaded, I try suspending the machine using the gnome power management gui control. The machine suspends; I resume it a few seconds later. And my /tmp/acpid.chris file has no extra contents at all. The acpid daemon isn't reporting anything.

Maybe I need to use the currently empty /etc/acpi/actions/ directory instead of /etc/acpi/events/? I don't know what the difference is. Of the two directories, the acpid man page only mentions /etc/acpi/events/. Putting a similar rule in a file in /etc/acpi/actions/ and hupping acpid again results in no error messages (in the log file) but no extra rules being reported either; still just the same "3 rules loaded".

A further suspend still doesn't make anything show up in acpid's new output files. A hibernate and resume of the machine also doesn't produce anything in the acpid output files.

Just to settle the "did I press the button twice" question I've pressed it again, exactly once. This time, after the machine's booted again, the acpid output file contains

button/power PWRF 00000080 00000001
button/power PWRF 00000080 00000001
button/power PWRF 00000080 00000001
button/power PWRF 00000080 00000001
So that settles that, one button press produces two lines of output.
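As far as I can tell the split between the two directories is a convention rather than something acpid itself knows about: acpid only reads rule files from its configuration directory (/etc/acpi/events unless told otherwise with -c), and /etc/acpi/actions is just the customary home for the scripts that those rules' action= lines run. That would explain why a rule placed in actions/ was never loaded. Here's a sketch of the same catch-all logger done that way; the script name and log path are my own choices.

```shell
#!/bin/bash
# Sketch of a hypothetical /etc/acpi/actions/log-event.sh that logs every
# ACPI event acpid sees. It would be called from a rule file such as
# /etc/acpi/events/log-all containing:
#   event=.*
#   action=/etc/acpi/actions/log-event.sh %e
# acpid replaces %e with the event text, so the event's fields arrive as
# separate arguments. LOG can be overridden, e.g. for testing.
LOG="${LOG:-/tmp/acpid.chris}"

log_event() {
    # A timestamp (to the second) on each line shows which events arrived
    # together, e.g. the two lines produced by one power-button press.
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG"
}

# Only log when acpid actually passed us an event.
if [ "$#" -gt 0 ]; then
    log_event "$@"
fi
```

The script needs to be executable (chmod +x), and acpid needs a HUP after the rule file is added, as before.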

When the machine comes back from suspend or hibernate a screensaver password box appears. Presumably this is something to do with gnome-power-manager. I wonder how it knows that the machine has resumed.

OK, looking at gnome-power-manager's source (from http://ftp.gnome.org/pub/GNOME/sources/gnome-power-manager/2.16/) I see this in the README:

"GNOME Power Manager is a GNOME session daemon that acts as a policy agent on top of the Project Utopia stack, which includes the kernel, hotplug, udev, and HAL. GNOME Power Manager listens for HAL events and responds with user-configurable reactions. Currently it supports UPS's, laptop batteries and AC adapters. Its goal is to be architecture neutral and free of polling and other hacks.

Most of the code is actually in HAL for abstracting various power aware devices (UPS's) and frameworks (ACPI, PMU, APM etc.) - so the desktop parts are fairly lightweight and straightforward to write."

Great, so I have to get LCFG to listen for HAL events too?

Introduction to HAL: http://en.wikipedia.org/wiki/HAL_%28software%29

HAL homepage: http://freedesktop.org/wiki/Software/hal

HAL quirks and power management: http://people.freedesktop.org/~hughsient/quirk/quirk-suspend-index.html

Apparently the latter concerns the optional extra package hal-info, which isn't installed on our machines. It has model-specific information and scripts called quirks, which implement wee tweaks to help, for instance, a resume to work properly. The site says that the Dell OptiPlex 745 isn't covered - as usual it seems to be orientated towards laptops - but it also says that you can simply use dmidecode to generate a hal-info entry for your machine.
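Collecting the identifying strings such an entry would presumably need is easy with dmidecode's -s option. A sketch only: which strings a hal-info entry actually matches on is my assumption, so check the quirk site before relying on it. The real dmidecode needs root; DMIDECODE can point at a stub for testing, and the function name is my own.

```shell
#!/bin/bash
# Sketch: print the DMI strings that identify this model, one per line,
# as "keyword: value". Which keywords hal-info needs is an assumption.
DMIDECODE="${DMIDECODE:-dmidecode}"

dmi_identity() {
    local key
    for key in system-manufacturer system-product-name system-version \
               bios-vendor bios-version; do
        echo "$key: $("$DMIDECODE" -s "$key")"
    done
}
```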

4 Jan 2008

I wondered if there was a way of changing BIOS settings from Linux. A net search suggests that there isn't. You can read some information from the BIOS with the dmidecode utility, but its documentation suggests that the information returned can't really be trusted and that the quality of the results is heavily model-dependent. I was thinking that maybe LCFG could set the times of the computer's automatic switch-on and switch-off BIOS settings. But perhaps not.

I see from the ngeneric component code that the suspend and resume methods "get called by APM". This appears to have happened via apmd which was configured, started and stopped by lcfg-apm.

Presumably an ACPI equivalent would use acpid. The acpid man page starts: "acpid is designed to notify user-space programs of ACPI events." Which sounds promising.

I just tried running acpi_listen, then suspending and resuming the machine; acpi_listen is meant to connect to acpid and listen for events, printing each event it detects on stdout. In the event, acpi_listen printed nothing at all.

/var/log/acpid registers a connection though: when I run acpi_listen these lines appear in /var/log/acpid:

[Fri Jan  4 11:58:38 2008] client connected from 8087[28267:10000]
[Fri Jan  4 11:58:38 2008] 1 client rule loaded
When I suspend the machine this is what /var/log/acpid says:
[Fri Jan  4 12:02:24 2008] client connected from 6890[0:0]
[Fri Jan  4 12:02:24 2008] 1 client rule loaded
Nothing handy like "suspend detected" or "resuming now" then.

What sort of events are meant to go into the acpid config file anyway? Maybe I could put in a catch-all event with an action which logs the event name, then try a few suspends and hibernates to see what gets logged.

21 December 2007 (3)

Note to self: the only way to keep the diary properly comprehensive and up to date seems to be to keep a diary window open on the desktop at all times, and to document everything to it at the time. Then copy the lot to a wiki edit once or twice a day.

21 December 2007 (2)

some hints here:

http://archive.netbsd.se/?ml=xen-users&a=2007-07&t=4797852

try bnx driver instead of tg3: http://www.broadcom.com/docs/driver_download/NXII/linux-1.5.10c.zip

Also try the more recent tg3 driver from the Broadcom website.

https://docs.astro.columbia.edu/ticket/646

Struggling to get Ubuntu working with the NIC in a 745 http://www.broadcom.com/support/ethernet_nic/netxtreme_desktop.php

check above url...

Struggling to get SuSE working with the NIC in a 745 http://www.math.ucla.edu/~jimc/documents/optiplex-745.html

21 December 2007

Still trying to get the beast to respond to wake on lan while suspended or hibernated. It looks as if it can't be done with the current network card (unless a firmware upgrade is possible? No idea. But doing a firmware upgrade on all our machines is going way outside the realms of easy-peasy LCFGeasy remote controlled admin, anyway).
[gothenburg]root: ethtool eth0
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x000000ff (255)
        Link detected: yes
In other words it only supports MagicPacket wakeups. On an Ubuntu wiki page, which I carelessly forgot to bookmark, I found a hint that waking up from suspend or hibernate mode requires a PHY packet rather than a MagicPacket. Is that rubbish? Anyway, according to the ethtool man page, that corresponds to wol mode p rather than g. However, all attempts at setting a wol mode to anything other than just g result in errors since, as you can see above, ethtool says that this interface only supports g:
[gothenburg]root: ethtool -s eth0 wol pg
Cannot set new wake-on-lan settings: Invalid argument
  not setting wol
[gothenburg]root: ethtool -s eth0 wol bumpag
Cannot set new wake-on-lan settings: Invalid argument
  not setting wol
[gothenburg]root: ethtool -s eth0 wol p
Cannot set new wake-on-lan settings: Invalid argument
  not setting wol
[gothenburg]root: ethtool -s eth0 wol g
[gothenburg]root: 
and so on. The situation is the same after upgrading from ethtool version 3 to version 6, the latest and greatest. More data:
[gothenburg]root: lspci | grep -i net
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express (rev 02)
[gothenburg]root: ethtool -i eth0
driver: tg3
version: 3.77
firmware-version: 5754-v3.15
bus-info: 0000:02:00.0
[gothenburg]root: 
Sigh. Anyway, as I noted recently, with this we could at least shut down machines when we wanted and have them boot later at an LCFG-controlled time, so that's something I suppose; not what was envisaged, but it sounds potentially useful. Condor permitting.
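Since the tg3 driver turns wake-on-LAN off by default (see 19 December below), something has to run ethtool at every boot for even the shutdown-and-wake approach to work. Here's a sketch of a check that reads the "Supports Wake-on" and "Wake-on" lines from ethtool output like the one above, and enables Magic Packet mode only if the card supports it and it isn't already on. The function names are hypothetical, and ETHTOOL can be pointed at a stub for testing.

```shell
#!/bin/bash
# Sketch: make sure Magic Packet (g) wake-on-LAN is enabled on an interface.
ETHTOOL="${ETHTOOL:-ethtool}"

# Print the value of an ethtool field, e.g. "Supports Wake-on" -> "g".
ethtool_field() {
    local iface="$1" field="$2"
    "$ETHTOOL" "$iface" | sed -n "s/^[[:space:]]*$field: *//p"
}

# $1: interface (default eth0)
enable_magic_packet_wol() {
    local iface="${1:-eth0}" supported current
    supported=$(ethtool_field "$iface" "Supports Wake-on")
    current=$(ethtool_field "$iface" "Wake-on")
    case "$supported" in
        *g*) ;;
        *) echo "$iface does not support Magic Packet wake-on-LAN" >&2
           return 1 ;;
    esac
    case "$current" in
        *g*) echo "$iface: Magic Packet wake-on-LAN already enabled" ;;
        *)   "$ETHTOOL" -s "$iface" wol g ;;
    esac
}
```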

NB: these pages don't help to enable wake on lan for a sleeping machine but they would have saved a lot of time earlier:

http://www.wlug.org.nz/WakeOnLanNotes

http://ralien.nytka.org/index.php/2007/02/20/wake_on_lan_in_fedora_core

19 December 2007

At today's MPU meeting Alastair and Stephen contributed some helpful advice (thanks):
  1. By all means try out Tux On Ice, but bear in mind that we strive to avoid the need for local kernel patching wherever possible.
  2. To track down the cause of the Wake On LAN oddness, try looking at the source (and docs if any) for the machine's ethernet driver. Also see what ACPI settings there are and if they can be changed, and see what XP does on the same hardware.

gothenburg uses the tg3 driver. Looking in its source (drivers/net/tg3.c in the kernel sources) I found this helpful comment:

   /* By default, disable wake-on-lan.  User can change this
    * using ETHTOOL_SWOL.
    */
This led me to try the command ethtool -s eth0 wol g on gothenburg. This enables MagicPacket wake-on-lan. When I try running this then shutting down the machine in various ways I do get more success than before:
  • software-controlled shutdown (e.g. using the poweroff command) - machine successfully responds to wake on LAN packet afterwards (hooray!)
  • suspend - machine still doesn't wake to wake-on-lan
  • hibernate - machine still doesn't wake to wake-on-lan
So, definite and encouraging progress, if not quite the progress I wanted yet. Even without a reliable sleep solution one could now envisage e.g. student lab machines shutting down overnight automatically then being woken automatically. Mind you, the same could be done from BIOS settings without the wake-on-lan, but that's beside the point as the wake-on-lan solution could be LCFG-controlled.

18 December 2007

Matthias Hensler's "Software Suspend on Linux / Fedora Core / RHEL" pages tell you how to get TuxOnIce up and running on Fedora and offer some kernel packages with TuxOnIce compiled in: http://mhensler.de/swsusp/

Our systems use initrd to boot, so TuxOnIce tweaks would need adding to the initrd image before we could use TuxOnIce. Here's something about initrd: http://en.wikipedia.org/wiki/Initrd

And here's the pivot_root man page: http://man.linuxquestions.org/index.php?query=pivot_root&type=2&section=8

17 December 2007

Kernel rebuild howto: http://www.digitalhermit.com/linux/Kernel-Build-HOWTO.html (check for a more canonical version)

Tux On Ice howto: http://www.tuxonice.net/HOWTO

14 December 2007

Shutdowns with shutdown -h now and poweroff also leave the machine in a state in which it's unresponsive to wake on LAN packets.

The swsusp documentation can be found in the kernel source Documentation directory, but it doesn't mention wake on LAN anywhere and it was last revised on "2003-10-20".

Tux on Ice can be applied to the kernel as a patch. Its kernel source Documentation file doesn't mention wake on LAN either. It can run in or out of "swsusp replacement mode". The Tux on Ice website "Features" page mentions "wake alarms" as a feature, but this isn't mentioned or explained anywhere on the site or in the documentation or anywhere in the entire kernel patch. What's a wake alarm? The Tux on Ice website also has extensive lists of supported hardware, which makes me think that unsupported hardware must be a distinct possibility too; if we do end up using this we'll need to check our hardware against the list.

To do: find and check mailing lists for swsusp and Tux on Ice looking particularly for mentions of wake on LAN.

13 December 2007 (2)

Interesting: my test Dell 745 responds to a wake-on-LAN packet sent from a neighbouring machine (using ether-wake) but only sometimes. It doesn't respond to wake on lan when it's been suspended with pm-suspend or hibernated with pm-hibernate. It responds to wake on lan when it's been shut down with a press of the power key if the press happened before Linux had booted (that is if I'd only just switched the machine on!). It doesn't respond to wake on lan when it's been shut down from the Gnome system menu or from the gdm login screen. Wake on lan is enabled in the BIOS, and as I say it does work occasionally. When it's in a not-going-to-wake-on-lan state (e.g. when it's been shut down from the gdm login screen), pulling the power cable out then pushing it in again doesn't put it into a responsive state. When it is in a responsive state (when it's been shut down seconds after being powered on), pulling the power cable out then putting it back in a few seconds later doesn't change that; it still responds to wake-on-lan packets afterwards.

13 December 2007

Stuff I've been reading lately (well, last week):

Building an FC6 kernel: http://docs.fedoraproject.org/release-notes/fc6/en_US/sn-Kernel.html

Hardware Abstraction Layer: http://en.wikipedia.org/wiki/HAL_%28software%29 and http://freedesktop.org/wiki/Software/hal

Kernel Korner - Extending Battery Life with Laptop Mode - http://www.linuxjournal.com/article/7539

http://www.samwel.tk/laptop_mode/

http://www.thinkwiki.org/wiki/Laptop-mode

http://www.earth.org.uk/low-power-laptop.html

http://www.gentoo.org/doc/en/power-management-guide.xml

Linux: Reducing Power Consumption - http://kerneltrap.org/node/11700

http://www.ma.utexas.edu/users/stirling/computergeek/powersaving.html

6 Dec 2007 (2)

Examining /etc/init.d/cpuspeed, it does seem to have run at boot time. Compared to florence which doesn't have SpeedStep enabled, gothenburg has extra cpufreq information in /sys (e.g. /sys/devices/system/cpu/cpu1/cpufreq/) and ps shows an extra /usr/libexec/hald-addon-cpufreq process. cpufreq-info gives this:
[gothenburg+]cc: cpufreq-info
cpufrequtils 002: cpufreq-info (C) Dominik Brodowski 2004-2006
Report errors and bugs to linux@brodo.de, please.
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which need to switch frequency at the same time: 0
  hardware limits: 1.60 GHz - 1.87 GHz
  available frequency steps: 1.87 GHz, 1.60 GHz
  available cpufreq governors: ondemand, userspace, performance
  current policy: frequency should be within 1.60 GHz and 1.87 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.60 GHz.
analyzing CPU 1:
  driver: acpi-cpufreq
  CPUs which need to switch frequency at the same time: 1
  hardware limits: 1.60 GHz - 1.87 GHz
  available frequency steps: 1.87 GHz, 1.60 GHz
  available cpufreq governors: ondemand, userspace, performance
  current policy: frequency should be within 1.60 GHz and 1.87 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.60 GHz.
The power meter was registering mostly 62.66W a minute or two ago; now it's up to a steady 63.7W.
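Most of what cpufreq-info reports comes from files under /sys/devices/system/cpu/cpuN/cpufreq/. Here is a minimal sketch that reads just the governor and current frequency for each CPU. It assumes the standard cpufreq sysfs file names, scaling_governor and scaling_cur_freq (the latter in kHz). read_cpufreq is a hypothetical helper. Its sysfs_root argument exists only so it can be pointed at a scratch copy of the tree for testing.

```python
import glob
import os


def read_cpufreq(sysfs_root: str = "/sys") -> dict:
    """Map CPU name -> (governor, current frequency in MHz) from cpufreq sysfs files."""
    result = {}
    # cpu[0-9]* matches cpu0, cpu1, ... but not the sibling cpufreq/cpuidle dirs.
    pattern = os.path.join(sysfs_root, "devices/system/cpu/cpu[0-9]*/cpufreq")
    for cpufreq_dir in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(cpufreq_dir))
        with open(os.path.join(cpufreq_dir, "scaling_governor")) as f:
            governor = f.read().strip()
        with open(os.path.join(cpufreq_dir, "scaling_cur_freq")) as f:
            khz = int(f.read().strip())  # sysfs reports kHz
        result[cpu] = (governor, khz / 1000)
    return result
```

Calling read_cpufreq() with no arguments reads the live values. These could be logged alongside the power meter readings to see whether frequency changes track the fluctuations.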

6 Dec 2007

  • I gave a short presentation yesterday about the project (PDF slides).
  • Ian suggested powering down discs when idle. Solaris does this on some Suns, and Mac laptops have done it for years. I don't know whether Linux can do this.
  • Ian also wondered about the breakdown of the power consumption by hardware component. That's also worth looking at.
  • Stephen suggested enabling Enhanced SpeedStep in the 745's BIOS; he observed that it's disabled by default. I enabled it yesterday, but if anything the machine's idle power consumption has gone slightly up, from 63.5-64W to 64.something. The power does now seem to fluctuate a lot more, though. I've just had another look at the power meter and the consumption is changing every second or so between values in the range 62.1W to 64.8W. I've never seen it go below 63.5W before, so that's new. Stephen also pointed out /etc/init.d/cpuspeed, which seems to determine what is used to control the CPU state.

22 Nov 2007

  • Thought: wouldn't it be nice if Condor and power management could co-exist? We could have machines suspending or hibernating when not needed, but when the Condor queue got too big we could start waking machines up a few at a time until the queue had gone down enough. It'd need some work of course. I can't remember what facilities Condor might have for signalling a big queue; and I'd imagine that it wouldn't be too clever to have a machine suddenly hibernate from under a Condor daemon. I wonder if Condor could be told to quit the machine by a pre-hibernate hook of some kind? Anyway, an interesting thought for the future.
  • Slightly late with the non-Condor week's power reading: at 15:28 today the meter read 10.99kWh. Compared to that the power consumed in the Condor week, 13.2kWh, was almost exactly a 20% increase. Without Condor the power averaged 65.4W; with Condor, 78.6W.
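Here is the arithmetic behind those figures as a couple of small helpers. Dividing the energy used over a period by the period's length gives the average power. The sketch assumes both periods were exactly 168 hours. The non-Condor reading was actually taken 26 minutes late, which lowers its average by only about 0.2W.

```python
def mean_power_watts(energy_kwh: float, hours: float) -> float:
    """Average power over a metering period: kWh / h gives kW; x1000 gives W."""
    return energy_kwh / hours * 1000


def percent_increase(baseline: float, new: float) -> float:
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


WEEK_HOURS = 7 * 24  # assumes each metering period was exactly one week

condor = mean_power_watts(13.2, WEEK_HOURS)      # ~78.6 W
no_condor = mean_power_watts(10.99, WEEK_HOURS)  # ~65.4 W
extra = percent_increase(10.99, 13.2)            # ~20.1 %
print(f"Condor {condor:.1f} W, no Condor {no_condor:.1f} W, +{extra:.1f}%")
```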

21 Nov 2007

I'm trying to understand the relationships between pm-utils, Powersave, swsusp, uswsusp, suspend, suspend2, TuxOnIce, gnome-power-manager, acpid, which ones are descendants or relatives of which other ones, which ones use which other ones, which ones are in the kernel and at which version numbers, which aren't, which need to be, which don't, which might be relevant, which are installed on our systems and which aren't, and how to install them, and how to use them all, and which ones respond to Wake on LAN signals and which don't, and how LCFG might or might not or could or could not be tied in to any of them. It's a jigsaw puzzle with an unknown number of pieces.

I forgot to suspend florence last night. I've now set up a cron job which runs pm-suspend every night just before I go home; that should do it.

19 Nov 2007

With George's help florence is now running a new amd am-utils-6.1.6-0.inf.1 which has debugging enabled. Debugging has been turned on thuswise:
!amd.gvariables              mADD(debug_options)
!amd.gvar_debug_options      mSETQ("debug_options = full,info,readdir,trace")
!amd.gvar_log_options        mSET(log_options = all)
If I suspend it tonight we should have more data tomorrow on amd dying.

16 Nov 2007 (3)

  • I take it back about TuxOnIce looking tidy. The install process looks like a complete mess. Or at least, it won't just drop in as an RPM or a patch to an RPM: you need to tell it exactly which block of the disk to resume from, so it needs a custom install that has to be changed each time you remake or otherwise muck about with the filesystem. This is from reading through the kernel patch at http://www.tuxonice.net/downloads/all/tuxonice-3.0-rc2-for-2.6.22.11.patch.bz2.

16 Nov 2007 (2)

dbus-send --session --dest=org.gnome.PowerManager --type=method_call --print-reply --reply-timeout=2000 /org/gnome/PowerManager org.gnome.PowerManager.Hibernate
No, really.
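To drive this from a script (the nightly cron job, say), the same call can be wrapped. This minimal sketch builds exactly the dbus-send command line above and runs it with subprocess. power_manager_command and call_power_manager are hypothetical helpers. The sketch assumes dbus-send can reach the logged-in user's session bus; from cron, that means DBUS_SESSION_BUS_ADDRESS has to be set. Any method name other than Hibernate is an untested assumption about gnome-power-manager's interface.

```python
import subprocess


def power_manager_command(method: str = "Hibernate") -> list:
    """Build the dbus-send invocation used in the entry above."""
    return [
        "dbus-send",
        "--session",
        "--dest=org.gnome.PowerManager",
        "--type=method_call",
        "--print-reply",
        "--reply-timeout=2000",
        "/org/gnome/PowerManager",
        f"org.gnome.PowerManager.{method}",
    ]


def call_power_manager(method: str = "Hibernate") -> str:
    """Run the command and return dbus-send's printed reply.

    Only "Hibernate" is confirmed above; other names (e.g. "Suspend")
    are an assumption about gnome-power-manager's D-Bus interface.
    """
    done = subprocess.run(power_manager_command(method),
                          capture_output=True, text=True, check=True)
    return done.stdout
```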

16 Nov 2007

  • Did some reading yesterday on TuxOnIce. So far it looks pretty well put together, but its documentation says it's orientated mainly towards hibernation, not suspension. I have a feeling that hibernation could be too slow and clunky to be very usable for us, but it's certainly worth trying; I'd like to get TuxOnIce up and running and give it a go.
  • I hibernated my main DICE machine florence last night. This morning when I woke it up, amd had disappeared, as it did when I suspended gothenburg earlier. om amd restart restored my X session to life after maybe ten seconds. This will have used TuxOnIce's relative swsusp, which was forked off it two name changes ago and is now merged into the 2.6 kernel. I've read about some frustration that TuxOnIce isn't merged into the 2.6 kernel, and I came across a reference to its main developer not wanting it to be, since he reckons it should be a userspace thing instead; nevertheless, TuxOnIce is distributed in the form of a kernel patch. Mysterious are the ways of developers.

15 Nov 2007 (2)

After exactly one week, the power reading for gothenburg taken at 15:02 today was 13.2kWh. Now I'll have a week without Condor. I've reset the power reading to zero.

15 Nov 2007

AlastairScobie - An article in Bits April 2006 reports that we are now paying commercial prices for our electricity and "The bill for 2006-2007 is estimated to be 3.4m, and after that, 6m -- more than double this year's bill". Compare this with the electricity bill for 2002/2003 - 1.5m!!

ChrisCooke - Thanks for that. And yikes! So maybe my guess of 10p per kWh wasn't so unreasonable after all!

13 Nov 2007

  • This morning when I came in the power meter was showing 64W. So, gothenburg isn't running any Condor jobs? Sure enough, it's not; condor_status shows the Condor pool to be totally quiet just now, no machines claimed, no jobs running.
  • I may be giving a short talk on the project so far at the next development meeting. So, what's the state of the project? Maybe it's time to stop randomly flittering about the subject and get some thoughts organised.

I'm looking into the possibility of saving power on our DICE Linux machines. I want to find out how easy it might be and how much money we might save. I also recognise that we might find that some kinds of power saving aren't compatible with running a Condor network, so I'd like to find out how much extra power that might use and how much it might be costing us. I need to get us some real figures so we can decide the best course of action - for instance if we decide to go for Condor it'll be helpful to be able to demonstrate roughly how much extra we might be spending by doing so.

I particularly want to explore the possibility of sending the machines to sleep at night, either by suspending or hibernating.

A couple of definitions might be in order here:

Suspend
save the machine's state to memory, then power the machine down except for the memory and for bits which might be used to provide a cue to wake the machine back up (e.g.: the power button; the keyboard or mouse, so jiggling the mouse could wake the machine up; or maybe the network interface, so that the machine could wake up on the command of another machine). On cue, the machine will power back up and restore its state from the saved copy in memory.
Hibernate
save the machine's state to disk, then power the machine down. When the machine is powered back on, instead of booting the OS as normal it boots using a special restore-the-saved-state-from-disk program, which gets the machine right back to where it was before you hibernated.

There are lots of different terms for these two things depending on which OS or context you're talking about them in but these are the ones which Gnome uses. The ACPI specification defines five "sleep states", S1 to S5; of these, Gnome's suspend corresponds to S3 and hibernate to S4.
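Underneath Gnome, the kernel exposes these states through /sys/power/state. The file lists the keywords the kernel supports. Writing "mem" requests suspend to RAM (S3), and writing "disk" requests hibernation (S4). "standby" (S1) appears only where the hardware supports it. As far as I can tell, the pm-suspend and pm-hibernate scripts end up doing this after running their hooks. This is a minimal sketch with hypothetical helper names. On a real machine it has to run as root; the sysfs_root argument is only there for testing against a scratch directory.

```python
import os

# Gnome's names for the two sleep states -> (ACPI state, /sys/power/state keyword).
SLEEP_STATES = {
    "suspend": ("S3", "mem"),     # suspend to RAM
    "hibernate": ("S4", "disk"),  # suspend to disk
}


def available_states(sysfs_root: str = "/sys") -> list:
    """Keywords this kernel accepts, as listed in /sys/power/state."""
    with open(os.path.join(sysfs_root, "power", "state")) as f:
        return f.read().split()


def enter_sleep(name: str, sysfs_root: str = "/sys") -> None:
    """Ask the kernel to suspend or hibernate (needs root on a real system)."""
    _acpi_state, keyword = SLEEP_STATES[name]
    if keyword not in available_states(sysfs_root):
        raise RuntimeError(f"kernel does not offer {keyword!r}")
    with open(os.path.join(sysfs_root, "power", "state"), "w") as f:
        f.write(keyword)
```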

Anyway, where was I? Oh yes, organising my thoughts and not being distracted by side issues. Easier said than done.

So, I want to explore the following issues:

Sleeping
Can we make the machines sleep on command? Can we make the machines sleep and wake up again without breaking? What would need to be done? (For instance, when I tried a suspend recently it broke "amd".) How much power and money might sleeping save us? When would we make the machines sleep anyway - whenever they're idle? At night? How might we do the things which normally take place at night, for instance updating the software packages?
Saving power on running machines
There appear to be ways in which we could save power on machines when they're running - for instance automatically switching components such as the CPU to low power states whenever they're not being asked to work flat out. There are a number of possible things which can be done here; most of them involve changing the Linux kernel; some of them have been implemented already in various different kernel versions; some may save significant amounts of power and some may not. It has to be said here that this is less important than sleeping because new kernels come along from time to time anyway so we'll be getting some of these benefits for free in time. However, it'd be interesting to know what sort of power savings may be in store for us, so I want to list the methods I can find out about and for each one find out:
  • what kernel version will it come in?
  • [when] can we use that kernel version?
  • can we safely patch an earlier version of the kernel instead?
  • assuming I can get any of this working - I'd like to measure the resulting power saving to see if it would make a significant difference to us.
Condor
I want to measure the power use of Condor and non-Condor machines so I can see how much extra power a machine uses as a result of being in the Condor pool. Also, if I manage to get any of the kernel-level power saving techniques running, it'd be interesting to see how much they cut down the Condor and non-Condor power use.
Recommendations
Once I've found enough out I'll hopefully be able to make some.

12 Nov 2007 (3)

I looked at the crontabs on gothenburg the other day, curious to know what usually happened at night and what we might have to make special arrangements for, or shift to another time, should we start suspending or hibernating machines at night. My eye was caught by anacron, which according to its man page "runs commands periodically" but "does not assume that the machine is running continuously". This naturally caught my interest. As far as I could gather from poking around various system files (details later when I dig them up again, perhaps), anacron is set up to run any script that appears in any of the directories /etc/cron.daily, /etc/cron.weekly and /etc/cron.monthly, though it doesn't touch /etc/cron.hourly or any of the crontab or /var/spool/cron files. anacron runs these scripts as appropriate whenever it itself is run. As far as I could see from system logs, anacron is not being run at the moment. Instead, these scripts are being run by cron, which looks in those directories as well as in the ones anacron doesn't.
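anacron's central idea can be sketched like this: remember when each job last ran, and whenever anacron itself is started, run every job whose period has elapsed since then. The real anacron keeps a timestamp file per job under /var/spool/anacron and adds start-up delays and locking. This is a simplified model with a hypothetical jobs_due helper, and it takes "monthly" to mean 30 days, as a typical /etc/anacrontab does. It shows why a machine that sleeps every night would still get its daily, weekly and monthly jobs run eventually.

```python
from datetime import date

# Period in days for each directory anacron handles (monthly taken as 30 days).
PERIODS = {"cron.daily": 1, "cron.weekly": 7, "cron.monthly": 30}


def jobs_due(last_run: dict, today: date) -> list:
    """Return the jobs whose period has elapsed since they last ran.

    last_run maps job name -> date it last ran; a missing entry means "never run",
    so that job is due immediately.
    """
    due = []
    for job, period in PERIODS.items():
        previous = last_run.get(job)
        if previous is None or (today - previous).days >= period:
            due.append(job)
    return due
```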

Back in the days of APM and of supported DICE laptops, LCFG used to have the concepts of suspend and resume and components could be made to do things at either eventuality. I don't know if any of this still works or there's any ACPI support there.

12 Nov 2007 (2)

  • The Uni web search gives David Somervell's department as Estates & Buildings. A poke around there turns up the Energy & Sustainability Office, which has some useful information on energy use in the University. According to the most recent Utilities Report, for 02/03, the University used a total of roughly 60 million kWh of electricity and paid about £1.5 million for it, substantially less than the cost in previous years. I make that 2.5p per kWh, which is about a quarter of Scottish Power's average domestic tariff. (Blimey.)
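The unit price follows from dividing the bill by the units used, and the same figure gives a feel for what one idle desktop costs. The 64W idle figure is gothenburg's from the 6 Nov entry. Leaving it on for all 8,760 hours of a year is an illustrative assumption. The 02/03 price may well be out of date, so treat the result as illustrative only.

```python
def pence_per_kwh(total_pounds: float, total_kwh: float) -> float:
    """Average unit price in pence per kWh."""
    return total_pounds * 100 / total_kwh


def annual_cost_pounds(watts: float, pence_per_unit: float, hours: float = 24 * 365) -> float:
    """Cost in pounds of a constant power draw over `hours` (default: one year)."""
    kwh = watts * hours / 1000
    return kwh * pence_per_unit / 100


price = pence_per_kwh(1.5e6, 60e6)         # 02/03 figures: 2.5p per kWh
idle_year = annual_cost_pounds(64, price)  # one idle 745 left on all year
print(f"{price:.1f}p/kWh; idle desktop for a year: £{idle_year:.2f}")
```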

12 Nov 2007

  • It's spelled Somervell. I've mailed him to ask for electricity prices.
  • I've been monitoring the power meter again. Different Condor processes take different amounts of power, which I suppose makes sense. This morning when I came in the machine was running two "Simics" processes and was using 96.4W. Logging in to the machine makes Condor suspend one of the processes, bringing the power consumption down pretty quickly to 86.76W. Thus I can tell from a glance at the current power consumption how many Condor jobs the machine is currently running. I cross-check this by logging in to another machine in the Condor pool and running condor_status.
  • At 10:02 today (the 12th) I noted the total energy consumed since 15:02 on Thursday 8th: 7.83kWh. Next weekend I'll try to repeat this without membership of the Condor pool but also without any suspend or hibernate running. I expect the machine to use 64W x 89 hours = 5.696kWh.
  • Another thing I did over the weekend was to suspend my other FC6 machine florence using the Suspend option in Gnome's System menu, which I think really means gnome-power-manager. When I pressed the power button to unsuspend the machine this morning, this is what happened:
    • X didn't come back.
    • text and remote logins were available, but both hung.
    • I could interrupt a text login with Ctrl-C and get a command prompt, but this also produced an error message: nfs: server pid4632@florence: /amd/legacy not responding, still trying.
    • klist was not found.
    • PATH was /usr/local/bin:/bin:/usr/bin
    • which klist said /usr/kerberos/bin/klist and running this gave what looked like a normal collection of tickets, just issued or renewed by the new text login.
    • I made a copy of the whole of /var for posterity to /disk/scratch/.
    • ps axuww | grep -i amd showed that no amd process was running.
    • X still wasn't working - still black and unresponsive.
    • om amd restart succeeded.
    • A login attempt on another text console then succeeded, and two messages appeared there, both saying nfs: server pid4632@florence: /amd/legacy OK.
    • X popped up a password prompt.
    • Entering my password to the prompt brought back Friday's X session, apparently fully functional and unharmed.
    • AFS is fine too.

8 Nov 2007 (4)

Ask David Somerville how much our electricity is likely to cost in the new building...

8 Nov 2007 (3)

Half an hour later. Condor seems to have pounced on gothenburg more or less immediately! Both of its CPU cores are "Claimed" and "Busy". It's used 0.03kWh in the last half hour.

8 Nov 2007 (2)

I think it'd be interesting to measure the power consumption of a machine that's in the Condor pool. So I've just joined gothenburg to the KB condor pool...
#include <dice/options/condor-kbpool.h>
...and set the power meter to measure the total energy used in kWh, and set the total back to zero. It's just after 3pm on Thursday; I'll let it run until Monday morning. Then next weekend I'll do the same but without Condor.

8 Nov 2007

I mailed the Uni's COs lists yesterday with a plea for any Linux power management experience. Nobody came back with Linux power management tips but I did get some interesting and helpful replies:
Graeme Wood
There are moves afoot to start using open access labs and other PCs around the place in condor clusters when they are not otherwise engaged (which is interesting, I hadn't known that. I wonder if this is Linux Condor or Windows Condor or both? I'll ask...)
Donald Grigor
He offered to send me the measured power consumption for Select PC candidates, both active and hibernating. Very useful, thanks!
James Jarvis
James told me how IS gets round the problem of open access lab machines being dual boot but only having power management set up for Windows: they've set up Linux to logout and reboot after a period of idleness, and when the machine boots the default OS is Windows; so idle PCs end up participating in the Windows power management scheme in the end whatever OS they start out running. Here it is in slightly more detail:
Idlerebootd handles shutdown if nobody is logged in.

http://pie.ucs.ed.ac.uk/lcfg/rpms/mdp/idlerebootd-1.0-6.noarch.rpm

Contains the following text files

/etc/init.d/idlereboot
/etc/sysconfig/idlereboot.conf
/sbin/idlerebootd

On the lcfg box we add rc_idlerebootd to the boot.services...

I am happy to have it improved!

The idle logout uses a patched version of gnome-screensaver and a custom gconf setting (managed through the lcfg-gconf component). The rpm, src rpm for sl5 and my patch are available at

http://homepages.ed.ac.uk/jjarvis/gnome-screensaver-2.16.1-6.mdp.i386.rpm
http://homepages.ed.ac.uk/jjarvis/gnome-screensaver-2.16.1-6.mdp.src.rpm
http://homepages.ed.ac.uk/jjarvis/gnome-screensaver-2.16.1-enable-auto-logout.patch

7 Nov 2007 (3)

Gah! Whatever I try in the way of updated kernel packages, I get some sort of package conflict and/or a missing homedir. My latest attempt screwed the machine with a kernel panic. I'll give up on this for now, reinstall the machine with standard FC6 DICE, and hope for a working 2.6.22 (or so) kernel some time soon. Meanwhile I'll get on with other stuff.

7 Nov 2007 (2)

I should point out that the last thing I managed yesterday was to get the machine running with a 2.6.22 kernel - I didn't have time to try logging in... when I tried logging in today I found that the lack of a home directory (because of the lack of AFS) meant that Gnome and KDE both refused to start, leaving me without easy access to pointy-clicky power management tools. Hence the desire to go back and get AFS after all.

Hopefully sometime soon we'll have the new kernel version and AFS and AMD and NFS all working together.

7 Nov 2007

I've had an explore of our kernel and openafs packages then had a chat with Stephen. He's furnished me with some good ideas.
  • Instead of chucking out afs-client.h because of its incorrect version of openafs-kernel I could simply have minused out that version in lcfg/gothenburg then added in the version I wanted. Much easier.
  • There are two versions of 2.6.22 kernels for FC6 in our packages repository. AMD and NFS access are horribly broken in both of them. The earlier one 2.6.22.2-42_FC6_dice_1.1 has a matching openafs-kernel package but some of the required kernel-module packages aren't there. The later one 2.6.22.9-61_FC6_dice_1.1 has all the necessary kernel-module packages but no openafs-kernel package. I can live without AMD and NFS if I have to, as long as my AFS homedir is there; it just means that all the new kernel packages will have to be installed in one big updaterpms -f wave before a reboot.
  • The openafs-kernel package needs the machine to have a kernel package with a matching number.
  • Stephen first suggested building a matching openafs-kernel package and talked me through the instructions in OpenAFSKernelBuilding.
  • He then suggested a much easier alternative - go to http://www.openafs.org and download one. We looked and there is a matching one there. The standard package is very similar to ours - all you lose is some tweaks we made for Condor's benefit. I make it a practice never to allow Condor on a machine that's not bog standard - it's not fair on the job submitter for their job to encounter a machine with a non-standard environment - so that's fine by me.

6 Nov 2007

  • I've set up a new 745 gothenburg for the project. Standard DICE setup, nothing fancy.
  • Done some measurements on its screen. It's a "Dell 1708FP" and looks different from the one I tested before (mostly black like the other one, but this one has a silver strip at top and bottom). In powersave mode this monitor uses 12W rather than 2W. At a brightness of 0% it uses 26W; at 30%, 29-31W; at 88%, 40W; at 100%, 43W.
  • The computer takes 64W when up and running and not doing anything in particular.
  • Doing a long "M-x hanoi" in xemacs makes it run at a steady 80W.
  • Have played around with gnome preferences (mainly "Power Management" in the "More Preferences" menu on the "Preferences" menu). Suspend and Hibernate are both available.
  • Suspend turns out to be Suspend to RAM.
  • Hibernate turns out to be Suspend to disk.
  • Suspend takes a second or so to turn things off, after which the PC sits there with its power light flashing green. It takes 14.5W while suspended. It won't wake up unless you hit the power button; it doesn't listen to the keyboard or mouse.
  • Hibernate took about 12 seconds to save the state to disk. It flashes a number of boot-style text messages on the screen while it's doing it. Once that's done the power goes off completely, and you can't tell the difference between a powered-down computer and a hibernating one. Pressing the power button turns the machine on and it goes to Grub as normal. The difference comes just before a booting computer would say "Welcome to Fedora Core": instead, the de-hibernating computer flashes up some swsusp text messages and then goes back to the state it was in before hibernation.
  • While hibernating, the machine takes about 12W.
  • When "powered down" the machine also takes about 12W.
  • I tried running a "M-x hanoi" in xemacs on gothenburg but displaying on another machine, then suspending the machine. The hanoi window froze on the other machine. When gothenburg was unsuspended again, the hanoi process just started exactly where it had left off. I did this several times.
  • Gnome's Power Management preferences let you tell the machine to Suspend (or Hibernate) after a specified period of inactivity. I set it to suspend after 2 minutes' inactivity, which seemed to be about the lowest it would go. Once again I was running a hanoi game but displaying it on another machine. The hanoi process once again froze and unfroze perfectly cleanly.
  • I've done numerous suspends and hibernates and have seen no trouble, not even any X or graphics trouble. All of them have been of short duration though; I don't know if that would make a difference.
  • Gnome power management settings are only in effect while the user is logged in; once the user logs out, the power management settings revert to whatever the machine had before. Or at least, after I'd logged out and had stopped running processes, the machine ran for more than 2 minutes without suspending itself.
  • When logged in to the machine, and also running a "hanoi" on that machine but displaying on another machine, the machine suspended due to inactivity when there had been no activity on its own keyboard or mouse - even with the "hanoi" process still running on it and displaying on the other machine.
  • I've quickly bludgeoned the machine's LCFG file into upgrading from our current standard 2.6.20 kernel to a 2.6.22 one so I can investigate the effect of the tickless kernel and the bunched-up interrupts. I had to disable openafs to get the kernel to install (I was just being hasty, no doubt there's a better method!). To change the kernel version you put
#define KERNELVER 2.6.22.9-61_FC6_dice_1.1
before the headers in the lcfg file. However, dice/options/afs-client.h also thinks it knows the desired kernel version as it's what says which version of the openafs-kernel package the machine gets. Rather than spending time wrestling with this to achieve a better fix I just put
#define DICE_OPTIONS_AFSCLIENT
before the headers to turn off AFS altogether. A couple of reboots later the machine was running with a 2.6.22 kernel.
  • /boot/config-2.6.22.9-61_FC6_dice_1.1 has CONFIG_NO_HZ (for tickless) and CONFIG_HIGH_RES_TIMERS defined, so it should now be capable of saving some CPU power during idle periods.
  • The dramatic first measurement: while just sitting around not doing anything in particular, the power meter says that the power consumption has dropped from 64W (64.1W I think it really said) to 63.75W. Impressive! If we roll this out across the whole School we might be able to buy a packet of biscuits with the savings.
  • With a 2.6.22 kernel I might now be able to install and use powertop, and use it to find out what it is that's stopping the CPU from being idle on this machine. Something must be, otherwise we'd surely see a difference of several watts in the power consumption.
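Putting these measurements together gives a rough figure for what a night's sleep would save per machine. This sketch uses the numbers above: 64W idle, 14.5W suspended and about 12W hibernated. The 14-hour night is a hypothetical window, not a measurement. The calculation also ignores the short bursts of higher power while suspending, hibernating and resuming.

```python
IDLE_W = 64.0        # gothenburg up but idle
SUSPENDED_W = 14.5   # suspended to RAM
HIBERNATED_W = 12.0  # hibernated (about the same as powered down)


def overnight_saving_kwh(sleep_watts: float, hours: float, idle_watts: float = IDLE_W) -> float:
    """Energy saved, in kWh, by sleeping instead of idling for `hours`."""
    return (idle_watts - sleep_watts) * hours / 1000


NIGHT_HOURS = 14  # hypothetical window, e.g. 19:00 to 09:00
for label, watts in [("suspend", SUSPENDED_W), ("hibernate", HIBERNATED_W)]:
    saved = overnight_saving_kwh(watts, NIGHT_HOURS)
    print(f"{label}: {saved:.3f} kWh saved per night")
```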

2 Nov 2007 (4)

http://kerneltrap.org/node/11700 - Linux: Reducing Power Consumption

2 Nov 2007 (3)

I'm now the proud keeper of a Brennenstuhl PM 230 electricity meter. It cost the Uni all of £14 or so from CPC.

My Dell LCD screen takes 33W when in use and set to a brightness that I find comfortable (about 30% on its adjustment scale). With brightness set to the minimum ("0%") the backlight is still on but power use goes down to 19W. Set to the maximum ("100%") it takes 49W. Let the display time out and go into power-saving mode (so the green LED on the front turns orange, and the display goes black) and the power use goes down to 2W. A big difference. When switched off with the button on the front of the display, the unit still uses the same 2W.

I configured my screen to go off after 10 minutes using Gnome preferences. I see you can also do it using xset if you like - for example xset dpms 0 0 10 powers down the display after ten seconds of inactivity - useful for brief tests but not for normal life.

  • You can do one-off forced settings using xset dpms force standby, for example, to save messing with your normal timeouts - neilb
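Both xset forms are easy to wrap, for switching between test and normal timeouts from a script. In xset dpms standby suspend off, the three numbers are timeouts in seconds, and a 0 disables that stage. xset dpms force takes standby, suspend, off or on. This is a minimal sketch with hypothetical helper names. It assumes xset is on the PATH and that DISPLAY points at the screen being tested.

```python
import subprocess

FORCE_MODES = {"standby", "suspend", "off", "on"}


def dpms_timeouts_command(standby: int, suspend: int, off: int) -> list:
    """xset arguments setting the DPMS timeouts in seconds (0 disables a stage)."""
    return ["xset", "dpms", str(standby), str(suspend), str(off)]


def dpms_force_command(mode: str) -> list:
    """xset arguments forcing the display into a DPMS mode immediately."""
    if mode not in FORCE_MODES:
        raise ValueError(f"unknown DPMS mode: {mode!r}")
    return ["xset", "dpms", "force", mode]


def run(args: list) -> None:
    """Run an xset command, raising CalledProcessError if it fails."""
    subprocess.run(args, check=True)
```

For example, run(dpms_timeouts_command(0, 0, 10)) reproduces the ten-second test above, and run(dpms_force_command("standby")) is neilb's one-off.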

My Dell Optiplex GX745 uses 62-63W when doing light work such as editing this page. Building a big RPM gets it up to 80W or so. When booting it takes anywhere from 65W to 99W, the latter when first switched on. When powered down and switched off it still uses 17W!

2 Nov 2007 (2)

http://kerneltrap.org/taxonomy/term/194 - A selection of links to tickless articles on kerneltrap. This one says that the config options to look out for are CONFIG_NO_HZ (tickless kernels) and CONFIG_HIGH_RES_TIMERS (higher resolution timers, microseconds instead of milliseconds).

I just checked our current FC6 kernel and it has neither of these in its /boot config file.
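This check is easy to repeat on other machines by scanning the kernel's /boot/config-<version> file. This minimal sketch counts an option as enabled only when the file says CONFIG_...=y; a disabled option appears as a "# ... is not set" comment. enabled_options and check_running_kernel are hypothetical helpers. The latter assumes the running kernel's config file is installed in /boot, as it is on our FC6 machines.

```python
import platform

WANTED = ("CONFIG_NO_HZ", "CONFIG_HIGH_RES_TIMERS")


def enabled_options(config_text: str, wanted=WANTED) -> dict:
    """Map each wanted option to True if the kernel config sets it to =y."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skips comments such as "# CONFIG_NO_HZ is not set".
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            settings[name] = value
    return {option: settings.get(option) == "y" for option in wanted}


def check_running_kernel() -> dict:
    """Read /boot/config-<uname -r> for the kernel that is running now."""
    with open(f"/boot/config-{platform.release()}") as f:
        return enabled_options(f.read())
```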

2 Nov 2007

http://www.pps.ed.ac.uk/for/staff/computer_ordering/SelectPC.htm - the jumping-off page for the Dell Premier pages for the Uni - they contain some tech details about the Dell Optiplex 740 (or 745, if its config changes?) systems that will become our next standard desktop computer.

Wikipedia on AMD Cool'n'Quiet, AMD Cool'n'Quiet - this turns out to be a desktop equivalent to the laptop-only PowerNow! AMD power-saving tech programme I mentioned yesterday, so I needn't have worried. I've just checked our Dell pages and the 740s we'll probably be buying in quantity do appear to have AMD Athlon 64 chips with Cool'n'Quiet.

http://en.wikipedia.org/wiki/Power_management

http://en.wikipedia.org/wiki/Green_computing#Approaches_to_green_computing

Some interesting Wikipedia articles...

http://saf.bio.caltech.edu/saving_power.html - someone's practical power-saving project, full of basic recipes and measurements.

http://en.wikipedia.org/wiki/TuxOnIce - Yes, it's a Linux power-saving technology, and yes, the logo features a penguin and some ice. He's not ice-skating, he's frozen into the middle of an ice cube, poor thing. He has the same blank happy look on his face as always, though, so presumably he doesn't mind. Anyway. Previous articles have mentioned competing Linux kernel power management implementations. swsusp is in the kernel; uswsusp is a user-space version of swsusp; and suspend2 is said to work more reliably than either of them but not to be in the kernel yet. That might have changed by now; there were plans to merge it into a later kernel version. Anyway, the point of this is that suspend2 was renamed TuxOnIce, much to the lead developer's pained disapproval, or so I gathered from his blog, which I seem to have failed to bookmark. And this is the official page. OK?

http://en.wikipedia.org/wiki/Hibernate_%28OS_feature%29#Linux

More Wikipedia. Its material on power management seems to be spread across a large number of articles, not all of which link to each other. This page turns out to be where I found out about some of what I said in the previous paragraph.

http://lwn.net/Articles/243404/ - a detailed status report from summer 2007 on suspend and hibernation facilities in the Linux kernel. I shall need to go through this one carefully.

http://lwn.net/Articles/240253/ - summaries of three talks by Len Brown, the maintainer of the Linux ACPI subsystem. The first covers the tickless kernel; the second tries to counter "ACPI myths" (i.e. criticisms); and the third covers heat-saving on Linux-powered mobile devices (such as phones). All are brief, clear and informative.

http://lwn.net/ - the Linux Weekly News, once again

http://www.google.com/search?hl=en&q=lwn.net%3A+%22power+management%22&btnG=Search - google-searching the LWN for power management

http://www.techworld.com/green-it/features/index.cfm?featureid=3589&pagtype=samechan - a chatty press article on the PowerTop tool.

http://linux-ata.org/ - of unknown value yet. Some information on Serial ATA for Linux from (apparently) one of the driver developers.

1 Nov 2007 (3)

It's getting pretty clear that I'm going to have to understand more about the Linux kernel, so I just asked Stephen and Iain if they had any handy kernel books I could take a look at. They say no, that the kernel changes too quickly for books to stay up to date for very long, but try websites. These are the ones that Stephen recommends:

http://kernelnewbies.org/ - a good starting point

http://kerneltrap.org/ - a bit more scary. (I agree, the article I found particularly scary in an earlier entry below was on kerneltrap.org! Though I doubt that's the sort of scary that Stephen meant.)

http://lwn.net/ - I found this myself yesterday actually and I agree again, it looks like a really useful site to keep up with. (Wonder if it has RSS menus for my browser bar? Will check)

1 Nov 2007 (2)

http://acpi.sourceforge.net/documentation/ - I thought at first that this was the central site for ACPI support under Linux, but now I see that everything in it is rather old, so maybe not.

Anyway - ACPI is the current dominant power management technology in the PC industry, replacing APM. APM got to the point where it pretty much worked reliably, but it mostly happened in the BIOS, so it was limited and inflexible, and as far as the OS was concerned pretty mysterious and invisible. So ACPI came along. It brings a lot of the functionality up to the OS level and lets the OS specify the power management behaviour it wants, instead of having it just mysteriously happen and having to change BIOS settings to change the behaviour.

Another ACPI link is the ACPI wikipedia page - lots of acronyms explained there and some links to official ACPI pages (and, interestingly, to a Leaked Antitrust Memo: Bill Gates on Making ACPI Windows-specific).

Next up, some articles I found on advogato.org.

http://www.advogato.org/article/913.html - How Linux suspend and resume works in the ACPI age. Roughly what goes on when you try and do this on Linux, and what goes wrong too.

By the same guy (Matthew Garrett of Ubuntu), http://www.advogato.org/article/932.html - Runtime power saving on Linux - not all CPU use is equal. A very short article, but nevertheless a really useful intro to Linux power saving: it gives a brief, clear, simple explanation of tickless kernels and C states and mentions Intel's PowerTop tool (details at lesswatts.org).
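To get a feel for what C states look like from userspace, here's a minimal Python sketch that reads how long the first CPU has spent in each idle state, similar in spirit to the C-state residency figures PowerTop shows. It assumes a kernel that exposes the cpuidle sysfs interface under /sys/devices/system/cpu/cpu0/cpuidle/, where each stateN directory has a name file and a time file (cumulative microseconds). Older kernels may not have this at all, the function names are my own, and it's untested on our machines.

```python
import os

# Assumed location of the cpuidle sysfs directory for the first CPU.
CPUIDLE_DIR = "/sys/devices/system/cpu/cpu0/cpuidle"


def read_value(path):
    """Return the stripped contents of a small sysfs file."""
    with open(path) as f:
        return f.read().strip()


def cstate_residency(cpuidle_dir=CPUIDLE_DIR):
    """Map each idle state's name to the total microseconds spent in it."""
    residency = {}
    for entry in os.listdir(cpuidle_dir):
        if not entry.startswith("state"):
            continue  # skip anything that isn't a stateN directory
        state_dir = os.path.join(cpuidle_dir, entry)
        name = read_value(os.path.join(state_dir, "name"))
        usec = int(read_value(os.path.join(state_dir, "time")))
        residency[name] = usec
    return residency


def residency_percentages(residency):
    """Convert raw microsecond totals into percentages of total idle time."""
    total = sum(residency.values())
    if total == 0:
        return {name: 0.0 for name in residency}
    return {name: 100.0 * usec / total for name, usec in residency.items()}
```

Running print(residency_percentages(cstate_residency())) would show the split. These are totals since boot, whereas PowerTop samples over an interval, so to mimic it more closely you'd take two readings a few seconds apart and compare the differences.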

And here's Matthew Garrett's blog: http://mjg59.livejournal.com/ - some interesting power management stuff there, mostly Ubuntu-specific as far as I can see so far.

http://www.advogato.org/proj/Perfd/ - a bit of a dead end, this one. It was a project aiming to solve a problem for power management software, which constantly has to react to events and sudden demands, balancing between keeping a resource ready so that it can respond quickly to any sudden demand and shutting the resource down to save power. The idea was that processes would tell the system their likely resource needs, so that it could predict which resources were safe to shut down, and for how long. However, nothing here seems to be dated later than about 2002 (!) and there's precious little content.

http://www.advogato.org/person/apenwarr/diary.html?start=167 - a very enjoyable and detailed flame on just why ACPI is absolutely awful and a huge step backwards from APM, which eventually just about half-worked on Linux in a limited kind of way. I love this article. Here's a sample:

ACPI stands for Advanced Control and Power Interface. Now, I have two things to say about that. First of all, there was no "Delightfully Simple and Straightforward Control and Power Interface," although ACPI actually makes APM seem that way in retrospect. And secondly, ACPI has nothing at all to do with APIC, the Advanced Programmable Interrupt Controller. The only comparable thing between the two is that they both have Linux kernel boot-time options to disable them because they both have buggy Linux drivers that cause your computer to crash a lot.

Now where was I? Oh, right, ACPI. So, the idea of ACPI was to get the BIOS developers out of the way on a normally-running system by turning around the power interface: instead of the BIOS running things and just occasionally notifying the kernel when something happened, the kernel would run things and just ask the BIOS to do stuff occasionally, like power down various hardware and blink the lights and so on. That would mean BIOS bugs wouldn't be so harmful. Oh! And while we're here, because we're insane, why not implement the whole thing using Forth-like bytecode instead of real assembly language, so it can also run on that new (doomed) 64-bit processor we've been working on? Forth-like bytecode is super simple and can be implemented in a couple of kbytes, so it won't cause much overhead, and suddenly everything will be portable. It'll be great!

Because I like foreshadowing, I'll give you the quick version of what I'm about to say. To my total amazement, they managed to fail totally on all counts. How's that for consistency?

http://www.linuxpowertop.org/ - the official URL for the PowerTop tool. Actually it just points back to the PowerTop page on lesswatts.org.

http://en.wikipedia.org/wiki/AMD_powernow - just a brief stub article about AMD's power-saving technology PowerNow!. Following the link from there to AMD's own page on it, PowerNow! seems to be available only on processors intended for laptops, so it'll be a fat lot of use to us, as I imagine the AMD processors in our Dell 740s aren't laptop processors. Better check that, though.

1 Nov 2007

http://www.lesswatts.org looks like it'll be the most important site of the project. It's Intel's power-saving Linux community initiative. It draws together loads of useful information on current power saving ideas and future developments, and it documents and promotes a number of new projects that tackle various power management needs. Packed with information. I'm expecting a lot of the project to consist of testing the effectiveness and current viability of each of the lesswatts ideas, one by one.

31 Oct 2007 (2)

http://www.linux.com/articles/114220?tid=47&tid=121. This was my first positive hit when searching for Linux power management material. It's a howto from Feb 07 on suspending and/or hibernating a Linux laptop. It mentions ACPI and acpid, swsusp, uswsusp and suspend2, and presents scripts which can be used to perform suspends and hibernates. It points out that the scripts do the same job as commands which come with various power management technologies but that the latter commands don't always work properly, so the scripts work around the problems. It's the first place I came across a complaint about the ACPI specification being "largely misused by manufacturers". The article also briefly mentions kernel versions and recent processors and generally seems like a great source of pointers to useful things to read up on.
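Scripts of this kind typically end by asking the kernel to sleep, and on 2.6 kernels the simplest way to do that is to write a state name to /sys/power/state: "mem" suspends to RAM and "disk" hibernates using the in-kernel swsusp. Here's a minimal Python sketch of just that final step, assuming the standard /sys/power/state interface. It needs root, the function names are my own, and it leaves out all the pre- and post-sleep workarounds that make the article's scripts worth having.

```python
# Standard kernel sysfs file that lists, and accepts, sleep states.
POWER_STATE_FILE = "/sys/power/state"


def supported_sleep_states(state_file=POWER_STATE_FILE):
    """Return the sleep states the kernel advertises, e.g. ['standby', 'mem', 'disk']."""
    with open(state_file) as f:
        return f.read().split()


def request_sleep(state, state_file=POWER_STATE_FILE):
    """Ask the kernel to enter `state` ('mem' = suspend to RAM, 'disk' = hibernate).

    Must run as root on a real system, where the write does not complete
    until the machine has resumed.
    """
    states = supported_sleep_states(state_file)
    if state not in states:
        raise ValueError(f"{state!r} not offered by this kernel; it offers {states}")
    with open(state_file, "w") as f:
        f.write(state)
```

Checking the advertised states first means a box that can't do suspend-to-RAM gets a clear error instead of a write the kernel rejects.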

http://tuxmobil.org/ - "The Linux-Mobile-Guide is a guide for users of Linux and laptops, notebooks, PDAs and other mobile computers." No obvious power management content, though I would have thought there ought to be.

http://kerneltrap.org/node/8267 - a scary and unpleasant Linus Torvalds throws his toys out of the pram as he rants about suspending and resuming and kernel threads. I don't understand kernel threads. There are some interesting comments near the bottom of the page about suspend-to-ram and suspend-to-disk.

http://www.google.com/univ/ed?hl=en&ie=ISO-8859-1&q=power+management&btnG=Search - a google search of ed.ac.uk for power management. (My project page is coming up second on the first page of the results, eek.) I wanted to know what the rest of the University was doing in the area. The number one result is a short article in the August 2006 BITs:

Power Management rolls out to the labs

As indicated in the February issue of BITs, Desktop Services has successfully rolled out a campus-wide power management scheme to all the PCs in the open-access labs. The College of Medicine and Veterinary Medicine will be able to use the same scheme in its Student Labs by the start of the next academic session.

Power management on individual PCs is quite easy to set up, but deploying it to large numbers of PCs in a controlled manner has taken time to develop.

Through a combination of logon scripts, scheduled tasks and utilities, lab computers that have been idle for a set time period (currently configured to be 45 minutes) will automatically enter 'Standby' mode. Users only need to press a key or move the mouse to reactivate the login screen. Standby mode uses considerably less power than normal operation, but still allows EUCS to maintain and update PCs with antivirus definitions and security updates at set times during the night using configurable scheduled tasks.

It is anticipated that, following successful trials in the open-access labs, this technology can be extended to Computing Officers in Information Services and Schools as part of a range of utilities developed by Desktop Services to assist in the management of the Supported Desktop. This would allow them to apply this power-saving technology based on agreed policies within their School or College.

Vladimir Zirojevic (EUCS)

It sounds like a Windows-based initiative, so it's good to know about (and I wonder to what extent we could use it for our own Windows machines?) but maybe not so relevant for DICE. Nevertheless it'll be interesting to find out more.
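The lab scheme boils down to two rules: go into standby after 45 minutes idle, and come back at set times during the night for maintenance. As a thought experiment for DICE, here's a minimal Python sketch of that policy logic only. The 45-minute threshold is from the article; the 03:00 maintenance time and the function names are my own hypothetical choices, and actually detecting idleness or entering standby is left out.

```python
from datetime import datetime, timedelta

# Idle threshold quoted in the BITs article.
IDLE_THRESHOLD = timedelta(minutes=45)

# Hypothetical nightly maintenance time (hour, minute); not from the article.
MAINTENANCE_TIME = (3, 0)


def should_standby(idle_for, threshold=IDLE_THRESHOLD):
    """True once the machine has been idle for at least the threshold."""
    return idle_for >= threshold


def next_maintenance_wake(now, at=MAINTENANCE_TIME):
    """Return the next occurrence of the maintenance time strictly after `now`."""
    hour, minute = at
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate
```

On a Linux box, int(next_maintenance_wake(datetime.now()).timestamp()) gives seconds since the epoch, which is the form an RTC wakealarm file such as /sys/class/rtc/rtc0/wakealarm accepts, assuming the hardware honours the alarm.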

Also in the results were some messages in the EdLUG mail archive. The particular messages appearing were from 2005 so maybe they're a bit out of date for my purposes now but nevertheless the mail archive is an interesting discovery. The EdLUG site doesn't seem to offer a search of the mail archive but perhaps a google search targeted on http://www.edlug.ed.ac.uk/archive/ would throw up some more goodies.

http://www.maplin.co.uk/module.aspx?ModuleNo=38343&doy=26m10 - A wee plug-in mains electricity monitor that might have been handy for the project. In the end I got something from CPC instead - they're generally cheaper apparently and they're the supplier that the techs use nowadays.

31 Oct 2007

The first entry. I've created a diary for this project because my last project's diary proved to be so useful.

I'm hoping that this one will be good for:

  • bookmarking: instead of just bookmarking a useful-looking site, I should really be describing what looks interesting about it.
  • describing what I understand: attempting to describe something is generally a good way of finding out what I do and don't understand, and of clarifying my thinking.
  • describing what I don't understand: writing down what I haven't a clue about seems to help a lot with figuring out what to do next. It also gives other people the chance to jump in and helpfully explain things, which is great.

Unfortunately I've already accumulated several dozen bookmarks, so I'll have to go through them in subsequent entries and say something useful and/or interesting about them.

Topic revision: r49 - 23 Sep 2008 - 08:33:59 - ChrisCooke
 
This site is powered by the TWiki collaboration platform. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.