MPU Meeting Tuesday 6th October 2009

Power Management Project

All of the milestones are complete. However, further requirements have been added at various meetings. In general we should resist this kind of scope enlargement: subsequent enhancements should go on separate development prioritisation lists rather than being added to the project. These are the enhancements asked for and what we thought of them at the meeting:

Add a command to inhibit sleeping
this might be suitable for staff machines but not student lab ones so isn't a priority just now.
Enable wake on mouse or key press (as well as on power button press)
this would be useful as it's the expected way to wake a sleeping machine. Can HP dc7900s actually be woken in this way though? Chris could check with Lindsey the behaviour of sleeping MDP 7900s.
Enable Wake On LAN and provide a cosigned Wake On LAN web service
this would be nice but should be prioritised separately from the project.

rpmsubmit

The new system went live on Wednesday then fell over on Sunday, after working properly for months! It was probably knocked out by a stray extra Apache process. The transition to the new system happens in several stages; the structure for the clients still needs to be changed, and yum support needs to be added.

AFS Component

Stephen has been doing more testing. He needs to do even more detailed testing.

TiBS

The deployment of the current lcfg-tibs component has hit a snag: TiBS needs to be quiescent before its configuration files are changed, but these change as a result of LCFG resource value changes, which could happen at any time. Most of the proposed solution to this is set out in this post on Chris's blog. In addition, Stephen suggests that a successful configure run could touch a file somewhere. The timestamp of that file could then be checked early in the configure process, and the component could flag up any unduly long period since the last successful configure (a week, say).
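Stephen's stamp-file suggestion could be sketched roughly as follows. This is only an illustration in Python, not the lcfg-tibs component's code: the function names, the stamp path passed in and the one-week threshold are all assumptions made for the example.

```python
import os
import time
from typing import Optional

# Assumed threshold: the "a week, say" suggested at the meeting.
MAX_AGE_SECONDS = 7 * 24 * 60 * 60


def record_success(stamp_path: str) -> None:
    """Touch the stamp file at the end of a successful configure run."""
    with open(stamp_path, "a"):
        pass  # create the file if it does not exist yet
    os.utime(stamp_path, None)  # set access and modification times to now


def stale_warning(stamp_path: str, now: Optional[float] = None,
                  max_age: float = MAX_AGE_SECONDS) -> Optional[str]:
    """Return a warning if the last success is too old, otherwise None."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(stamp_path)
    except FileNotFoundError:
        return "no successful configure has been recorded"
    if age > max_age:
        return "last successful configure was %.1f days ago" % (age / 86400)
    return None
```

A configure run would call stale_warning() near the start, report any message it returns, and call record_success() only once the run has finished cleanly.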

LCFG Server Refactoring

Nothing happened.

Pandemic Planning

Alastair has written all he can think of on both the package service and server virtualisation topics. Until and unless changes come out of the subsequent meetings they can be regarded as finished.

Chris was unsure quite what aspects of a service were covered by pandemic preparedness measures - it's really just meant to cover getting dead services back to life and keeping them ticking over. You don't need to know how to develop a service, for instance adding a new web site to Plone.

Stephen's master LCFG recovery document is also done.

Miscellaneous Development

om
Stephen has a solution which seems to work, but he hasn't yet satisfied himself that it works for the right reasons and in all the right cases. He'll carry on examining the problem.

PXE
Stephen has added support to PXE for the Broadcom cards. While he was there he added some handy PXE tools:
memtest
you can now run memtest from PXE.
HDT
the Hardware Detection Tool can also be run from PXE. This lets you browse the hardware on your system. It can do all sorts of useful things such as tell you what kernel modules you'll need for your hardware (useful for new types of machine) and what the machine's MAC address is.

Updaterpms
The code to help delete particularly awkward packages is written but needs more testing. By default updaterpms will behave as it did before, but if the -z flag is added to updaterpms.flags any failed package removals will be retried with the noscripts option. Stephen asked that any such removals should be flagged as a success of some kind rather than a failure: Alastair will check this.
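The retry behaviour described above, including Stephen's request that a retried removal be reported as a success of some kind, might look something like this sketch. It is not updaterpms code: the status names and the try_remove callable are hypothetical, standing in for whatever actually performs the removal (rpm's erase mode does have a --noscripts option, but nothing here assumes how updaterpms invokes it).

```python
from typing import Callable, Dict, Iterable

# Hypothetical status labels for the outcome of each removal.
REMOVED = "removed"
REMOVED_NOSCRIPTS = "removed-noscripts"  # still a success, but worth noting
FAILED = "failed"


def remove_packages(names: Iterable[str],
                    try_remove: Callable[[str, bool], bool],
                    retry_noscripts: bool = False) -> Dict[str, str]:
    """Remove each package, optionally retrying failures without scripts.

    try_remove(name, noscripts) must return True when the removal worked.
    With retry_noscripts=False the behaviour is the old one: one attempt only.
    """
    results = {}
    for name in names:
        if try_remove(name, False):
            results[name] = REMOVED
        elif retry_noscripts and try_remove(name, True):
            results[name] = REMOVED_NOSCRIPTS
        else:
            results[name] = FAILED
    return results
```

Keeping a separate status for the retried case lets the caller count it as a success while still logging which packages needed the fallback.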

Bug in ngeneric
Stephen fixed LCFG bug 180: Overrides sigdie handler but doesn't cope with eval.

Nagios network monitoring
Alastair has written a nagios module which (currently) tests the status of bonded network interfaces. Get it by including dice/options/network_nagios.h. At the moment it's a bit chatty.
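For readers unfamiliar with this kind of check, here is a minimal sketch of how a bonded-interface test can work. It is not Alastair's module. It assumes the text comes from the Linux bonding driver's status file (such as /proc/net/bonding/bond0) and uses the standard Nagios plugin exit codes; the function names and the choice of when to warn are assumptions.

```python
from typing import Dict, Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_bond(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return the bond's MII status and a map of slave name to MII status."""
    bond_status = None
    slaves: Dict[str, str] = {}
    current = None  # slave whose section we are in, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            slaves[current] = "unknown"
        elif key == "MII Status":
            if current is None:
                bond_status = value  # appears before any slave section
            else:
                slaves[current] = value
    return bond_status, slaves


def check_bond(text: str) -> Tuple[int, str]:
    """Map the parsed state to a Nagios exit code and a one-line message."""
    bond_status, slaves = parse_bond(text)
    down = sorted(name for name, status in slaves.items() if status != "up")
    if bond_status != "up" or not slaves or len(down) == len(slaves):
        return CRITICAL, "BOND CRITICAL - bond status is %s" % bond_status
    if down:
        return WARNING, "BOND WARNING - slaves down: %s" % ", ".join(down)
    return OK, "BOND OK - %d slaves up" % len(slaves)
```

A plugin built on this would read the status file, print the message and exit with the returned code. Emitting a single line per check is also one way to keep the output from being chatty.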

Operational

central reboot
Chris and Alastair did this early one morning and it took close on two hours, and that was with guest suspensions rather than reboots. Alastair may order those faster disks for the array after all.

MPU backup summary
Stephen has compiled a full list of backups of MPU data.

Backup restore guide
Stephen has written a practical guide to restoring from backups.

Bugzilla security fix
There is one. Chris will install it.

TWiki security fix
There is one. Stephen will install it.

This Week

Alastair will:

  • Think about personal development topics
  • Repackage scli
  • Finish the updaterpms fix
  • Check that the updaterpms fix flags the successful removal of an awkward package as a success
  • Tidy rpmsubmit

Carol will:

  • Set up LCFG/inf level VM to monitor LCFG level

Chris will:

  • Think about personal development topics
  • Install the Bugzilla security fix (DONE)
  • Upgrade bugzilla.inf to version 3
  • Deploy the TiBS component

Stephen will:

  • Think about personal development topics
  • Install the TWiki security fix
  • Do more AFS component testing
  • Work on server refactoring

-- ChrisCooke 6 October 2009

Topic revision: r3 - 08 Oct 2009 - 09:24:53 - ChrisCooke
 