MPU Meeting Tuesday 6th October 2009
Power Management Project
All of the milestones are complete. However, more work has been added at various meetings. In general we should resist such enlargement of a project's scope: subsequent enhancements should go on separate development prioritisation lists rather than being added to the project. These are the enhancements asked for, and what we thought of them at the meeting:
- Add a command to inhibit sleeping
- this might be suitable for staff machines but not student lab ones so isn't a priority just now.
- Enable wake on mouse or key press (as well as on power button press)
- this would be useful as it's the expected way to wake a sleeping machine. Can HP dc7900s actually be woken in this way, though? Chris could ask Lindsey how sleeping MDP 7900s behave.
- Enable Wake On LAN and provide a cosigned Wake On LAN web service
- this would be nice but should be prioritised separately from the project.
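For reference, the core of any Wake On LAN service is the "magic packet": six 0xFF bytes followed by the target's MAC address repeated sixteen times, usually sent as a UDP broadcast. The following is only a minimal sketch of that packet. The function names, the broadcast address and the choice of UDP port 9 are assumptions for illustration; nothing about the proposed web service itself was decided at the meeting.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake On LAN magic packet for a MAC address."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the MAC address repeated sixteen times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

A web service would wrap something like `send_magic_packet` behind authentication; the target machine's firmware and network card must also have Wake On LAN enabled.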
rpmsubmit
The new system went live on Wednesday then fell over on Sunday, after working properly for months! It was probably knocked out by a stray extra Apache process. The transition to the new system happens in several stages; the structure for the clients still needs to be changed, and yum support needs to be added.
AFS Component
Stephen has been doing more testing. He needs to do even more detailed testing.
TiBS
The deployment of the current lcfg-tibs component has hit a snag: TiBS needs to be quiescent before its configuration files are changed, but these change as a result of LCFG resource value changes which could happen at any time. Most of the proposed solution to this is set out in this post on Chris's blog. In addition to this, Stephen suggests that a successful configure run could touch a file somewhere, and that the date/time of the file could be checked earlier in the configure process and any unduly long period since the last successful configure - a week, say - could be flagged up by the component.
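Stephen's stamp-file suggestion can be sketched as follows. This is an illustration of the idea only, not the lcfg-tibs component's code: the function names are hypothetical, the stamp file location is left to the caller, and the one-week threshold is simply the example figure mentioned above.

```python
import time
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # "a week, say"


def record_successful_configure(stamp: Path) -> None:
    """Touch the stamp file at the end of a successful configure run."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.touch()  # creates the file, or updates its modification time


def configure_is_overdue(stamp: Path,
                         max_age: float = MAX_AGE_SECONDS,
                         now: Optional[float] = None) -> bool:
    """Return True if the last successful configure is missing or too old."""
    if not stamp.exists():
        return True  # no successful configure has ever been recorded
    if now is None:
        now = time.time()
    return now - stamp.stat().st_mtime > max_age
```

The check would run early in configure, before the quiescence test, so that a component which has been deferring its configuration for too long gets flagged even if it never reaches the point of writing new files.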
LCFG Server Refactoring
Nothing happened.
Pandemic Planning
Alastair has written all he can think of on both the package service and server virtualisation topics. Until and unless changes come out of the subsequent meetings they can be regarded as finished.
Chris was unsure quite what aspects of a service were covered by pandemic preparedness measures - it's really just meant to cover getting dead services back to life and keeping them ticking over. You don't need to know how to develop a service, for instance adding a new web site to Plone.
Stephen's master LCFG recovery document is also done.
Miscellaneous Development
- om
- Stephen has a solution which seems to work, but he hasn't yet satisfied himself that it works for the right reasons and in all the right cases. He'll carry on examining the problem.
- PXE
- Stephen has added support to PXE for the Broadcom cards. While he was there he added some handy PXE tools:
- memtest
- you can now run memtest from PXE.
- HDT
- the Hardware Detection Tool can also be run from PXE. This lets you browse the hardware on your system. It can do all sorts of useful things such as tell you what kernel modules you'll need for your hardware (useful for new types of machine) and what the machine's MAC address is.
- Updaterpms
- The code to help delete particularly awkward packages is written but needs more testing. By default updaterpms will behave as it did before, but if the -z flag is added to updaterpms.flags any failed package removals will be retried with the noscripts option. Stephen asked that any such removals should be flagged as a success of some kind rather than a failure: Alastair will check this.
- Bug in ngeneric
- Stephen fixed LCFG bug 180: Overrides sigdie handler but doesn't cope with eval.
- Nagios network monitoring
- Alastair has written a nagios module which (currently) tests the status of bonded network interfaces. Get it by including dice/options/network_nagios.h. At the moment it's a bit chatty.
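The updaterpms retry behaviour described under Updaterpms above can be sketched like this. It is not the updaterpms code: the names are hypothetical, and the actual removal is passed in as a function so that the sketch only shows the control flow, including Stephen's request that a removal which only succeeds without scripts is still reported as a kind of success.

```python
from enum import Enum
from typing import Callable


class RemovalResult(Enum):
    REMOVED = "removed"
    REMOVED_NOSCRIPTS = "removed without running package scripts"
    FAILED = "failed"


def remove_package(name: str,
                   erase: Callable[[str, bool], bool],
                   retry_noscripts: bool = False) -> RemovalResult:
    """Remove a package, optionally retrying without its scripts.

    `erase(name, noscripts)` is a caller-supplied function (hypothetical)
    that returns True if the removal succeeded.
    """
    if erase(name, False):
        return RemovalResult.REMOVED
    # Default behaviour is unchanged: only retry when explicitly enabled.
    if retry_noscripts and erase(name, True):
        return RemovalResult.REMOVED_NOSCRIPTS
    return RemovalResult.FAILED
```

Keeping a separate result for the scripts-skipped case lets the caller count it as a success while still logging that the package's removal scripts never ran.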
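As an illustration of what a bonded-interface check for Nagios might look like, here is a minimal sketch. It is not Alastair's module, whose workings were not described at the meeting. It assumes the status text comes from the Linux bonding driver (for example /proc/net/bonding/bond0) and uses the standard Nagios plugin return codes; the function name and message wording are made up.

```python
from typing import Dict, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_bond(status_text: str) -> Tuple[int, str]:
    """Return (nagios_code, message) from bonding driver status text."""
    slaves: Dict[str, str] = {}
    current = None
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # Keep the first MII Status line that follows each slave.
            slaves.setdefault(current, value)
    if not slaves:
        return UNKNOWN, "BOND UNKNOWN - no slave interfaces found"
    down = sorted(name for name, state in slaves.items() if state != "up")
    if not down:
        return OK, "BOND OK - all slaves up: " + ", ".join(sorted(slaves))
    if len(down) == len(slaves):
        return CRITICAL, "BOND CRITICAL - all slaves down: " + ", ".join(down)
    return WARNING, "BOND WARNING - slaves down: " + ", ".join(down)
```

A plugin wrapper would read the status file, print the message on a single line and exit with the returned code. Printing one line only when the state is not OK would be one way to make a chatty check quieter.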
Operational
- central reboot
- Chris and Alastair did this early one morning and it took close on two hours, and that was with guest suspensions rather than reboots. Alastair may order those faster disks for the array after all.
- MPU backup summary
- Stephen has compiled a full list of backups of MPU data.
- Backup restore guide
- Stephen has written a practical guide to restoring from backups.
- Bugzilla security fix
- There is one. Chris will install it.
- TWiki security fix
- There is one. Stephen will install it.
This Week
Alastair will:
- think about personal development topics
- Repackage scli
- Finish the updaterpms fix
- Check that the updaterpms fix flags the successful removal of a crap package as a success
- Tidy rpmsubmit
Carol will:
- Set up LCFG/inf level VM to monitor LCFG level
Chris will:
- think about personal development topics
- install the bugzilla security fix
- upgrade bugzilla.inf to version 3.
- TiBS component deployment
Stephen will:
- think about personal development topics
- install the twiki security fix
- do more AFS component testing
- work on server refactoring.
-- ChrisCooke 6 October 2009
Topic revision: r3 - 08 Oct 2009 - 09:24:53 - ChrisCooke