MPU Meeting Tuesday 26th May 2009

Power Management Project

gnome-power-manager doesn't support timed wake-up - there's a stub in the code for adding it in the future sometime but it's not there yet.

gnome-power-manager can send a machine to sleep successfully, and lcfg-sleep can then wake it up successfully - but after wake-up gnome-power-manager doesn't do anything further to signal or decide that it is still idle - so the machine simply stays awake. So machines with someone logged in will sleep only once per period of idleness. This is not important and can be tackled in the future.

lcfg-sleep is now in the develop release, but heavily limited. It had a header inclusion order problem - dice/options/video_intel.h was included after dice/options/sleep.h - but that's been solved: the component now has resources for the current graphics driver (which defaults to the value in the xfree.device_main resource) and a list of approved drivers. It uses these to compare the video driver in use with the list of approved drivers as part of its sleep evaluation process.
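As an illustration only, the driver check described above could be sketched in Python as follows. This is not the lcfg-sleep code: the function names, the "driver" key and the example driver values are assumptions made for the sketch. Only the fallback to xfree.device_main and the comparison against a list of approved drivers come from the notes.

```python
def effective_driver(sleep_resources, xfree_resources):
    """Pick the graphics driver to evaluate.

    The "driver" key is a hypothetical resource name. When it is unset
    we fall back to the xfree.device_main value, as the notes describe.
    """
    return sleep_resources.get("driver") or xfree_resources["device_main"]


def driver_allows_sleep(current_driver, approved_drivers):
    """Return True only if the driver in use is on the approved list."""
    return current_driver in set(approved_drivers)


if __name__ == "__main__":
    # Example values only; the real approved list lives in the component's resources.
    driver = effective_driver({"driver": ""}, {"device_main": "i810"})
    print(driver, driver_allows_sleep(driver, ["intel", "i810"]))
```

A machine whose driver is not on the list would simply fail this part of the sleep evaluation and stay awake.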

Chris has been reviewing the limitations placed on lcfg-sleep and thinks it may be possible to safely remove one: when the i810 driver is used in conjunction with 5.3, the machine suspends and resumes perfectly well except for a gnome warning that appears at a subsequent login. This warning can be suppressed! So we could perhaps spread lcfg-sleep to the i810-using 5.3 745s and 755s, making it a much better test. We'd still have to warn those people using VGA cables on their machines to disable sleep - probably this is largely COs using KVM switches.

Once SL5.3 has been rolled out we can start testing the sleep component in the student labs.


The coding is complete; Stephen will review it.

Architecture documentation is still required.

Alastair was waiting for hardware for the package server but this has now arrived so he should be able to get on with the testing fairly soon.

AFS Component

The higher-level documentation still needs to be written.

The notes in the MPU minutes for 13/5/09 about migrating to the new DB servers were not totally correct. We need to have DB servers with the names and IP addresses which are in the cell information. We would get the new machine installed, the keyfile in place and the component configured and ready to go but without the component (and the AFS services) running. We would then drop the interface on one host and bring it up on the new machine before starting the component which will bring up the DB services. Users will see a ~22 second lag if they happen to be talking to the DB server that has been pulled, so this should all be done during an at-risk period. To be cautious it might be best to do one DB server per week to ensure the new component is working as expected.

The AFS nagios monitoring translator has been finished. It now puts the DB servers and file servers into appropriate clusters based on the cell with which they are associated.
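A minimal sketch of the grouping idea, assuming each server is described by a host name, a cell and a role ("db" or "file"). The function name, the tuple layout and the example hosts are hypothetical and are not taken from the translator itself.

```python
from collections import defaultdict


def build_clusters(servers):
    """Group hosts into clusters keyed by (cell, role).

    `servers` is an iterable of (host, cell, role) tuples; this layout
    is an assumption made for the sketch.
    """
    clusters = defaultdict(list)
    for host, cell, role in servers:
        clusters[(cell, role)].append(host)
    # Sort the host lists so the generated configuration is stable between runs.
    return {key: sorted(hosts) for key, hosts in clusters.items()}


if __name__ == "__main__":
    example = [
        ("afsdb1", "cell.example", "db"),
        ("afsdb0", "cell.example", "db"),
        ("fs1", "cell.example", "file"),
    ]
    for (cell, role), hosts in sorted(build_clusters(example).items()):
        print(cell, role, hosts)
```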

Support has been added for configuring the server NetInfo and NetRestrict lists so that the AFS services can be hosted on machines already providing other services.

Support for the AFS client on 64-bit machines will have to be done purely through configuring files until the OpenAFS project get the Perl AFS module building on that platform. This is not a big problem as even without that module the new component will provide better configuration support than the old one.

Alastair will use the new dice/options/openafs-fileserver.h on his package server. He will be looking into tweaking the various parameters to get the best performance.


Nothing more from Craig so far, so no progress.

LCFG Server Refactoring

Alastair needs to send the project proposal to COs today for consideration at the June Development Meeting.

Miscellaneous Development

Chris had a problem with calling the Run method of the sleep component from cron when the sleep component had not been started. This revealed a bug in the Run() method in both the shell and Perl code in the lcfg-ngeneric package. If the component had never been started then a call was made to Configure which would always fail in a rather unhelpful manner. After a bit of debate with COs it was agreed that the most desirable behaviour would be for the Run method to fail if the component had not been started. The new version (1.2.32) is in the testing release for this week.
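The agreed behaviour can be illustrated with a short Python sketch. The real lcfg-ngeneric code is shell and Perl, so the class, method and exception names below are hypothetical and only model the rule that Run fails early, with a clear message, when the component has never been started.

```python
class ComponentNotStarted(Exception):
    """Raised when Run is called on a component that was never started."""


class Component:
    """Toy stand-in for an LCFG component; not the ngeneric framework."""

    def __init__(self, name):
        self.name = name
        self.started = False

    def start(self):
        self.started = True

    def run(self):
        # Fail with a clear message rather than falling through to a
        # Configure call that cannot succeed.
        if not self.started:
            raise ComponentNotStarted(f"{self.name}: component has not been started")
        return f"{self.name}: run completed"


if __name__ == "__main__":
    sleep = Component("sleep")
    try:
        sleep.run()
    except ComponentNotStarted as err:
        print("error:", err)
    sleep.start()
    print(sleep.run())
```

Under this rule, a cron job that calls Run on a component which was never started gets an immediate, explicit failure.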

We will need to hold one lab on SL5.2 until the end of September for the MSc students who have already started their projects on that platform. Stephen will talk to Alison about identifying which lab that will be. We would very much like to avoid having all the servers held at SL5.2 in the way we did for the SL5.1 to SL5.2 transition, as there doesn't appear to be any good technical reason to do this. We would be able to hold individual machines if any problems arise. Alastair will take the issue to CEG.

Catalyst work
Stephen spent some personal development time playing with Catalyst - the Perl web framework. He proposes a mini-project to create a search interface to our package repositories to allow searching for packages on file name and RPM tag contents. The package metadata would also be viewable. This would be an excellent way to get more experience with Catalyst and could, with a few days' work, provide a very useful tool for COs.

Package repository management
Stephen mentioned that he would like a large chunk of SAN space (at least 0.5TB) so he could mirror the entire SL5 repository and start playing with ideas for improved package update management tools. Alastair suggested getting some space on the atabeast since the satablade was being unreliable. He will use telford for the trials; if it turns out that lots of work is required he will produce a project proposal.


New hardware
We now have two new server models: a Dell R710, which is a replacement for the 2950, and a Dell R200, which is a replacement for the 860. There are some problems with IPMI and serial console support. Alastair is investigating.

LCFG level VM
Carol has started looking through the LCFG book to get up to speed. Stephen suggested that if she finds any bits particularly hard going she should let him know so that the documentation can be enhanced for beginners.

Moving dresden data
Chris will mount the new space from the evo array onto dresden and sync all the data across from the satablade.

om improvements
Stephen sent around a proposal for the changes but no-one has replied so far.

Stephen plans to do the upgrade this week, he just needs to get it installed onto a test machine again to be sure he has the correct configuration this time.

Mini-projects list
The mini-projects list has now been tidied.

cmirror dependency problem
The problem is in the LCFG level installbase. Stephen will look into resolving the problem. Possibly it will just need cmirror removing for that stage of the install process.

AFS cachesize change
There does not seem to be any reason to hold back on rolling out the change to the AFS cachesize. Stephen will make the change this week. It will only have an effect at reboot time.

FC5 is dead
We are all agreed that FC5 is dead and has been for quite a while. Chris wants to send out a reminder that we no longer do any testing for this platform nor do we build install CDs (the current ones will continue to work).

Testing releases
Carol has started shadowing Chris to learn about making the testing releases each Monday. This revealed that not all aspects are thoroughly documented, so Chris will update the wiki page. Stephen moved all the scripts into the LCFG subversion repository and packaged them. They are installed on tobermory; anyone doing testing or stable builds should also install them onto their machine using the live/release-scripts.h header. Stephen also added the start of a testing framework which could be used to improve the current release testing.

Brainstorming notes
Alastair has written a progress report on the LCFG brainstorm.

central upgrade
The VM server central has been upgraded and rebooted, but not without a few unexpected difficulties. bakerloo will be next and should be easier.

This Week

Alastair will:

  • Talk to Carol about the LCFG level work.
  • Think about a virtualisation mini-talk.
  • Remind Craig about moving hawthorn to KB.
  • Work on rpmsubmit project.
  • Take SL5.3 on servers issue to CEG.
  • Propose the start of the LCFG core refactoring project.

Carol will:

  • Set up LCFG/inf level VM to monitor LCFG level.

Chris will:

  • Help Carol with the testing release.
  • Disable gnome's suspend/resume warning.
  • Enable sleep for i810 too on supported machines.
  • Move dresden data to evo array.

Stephen will:

  • Fix cmirror dependencies.
  • Switch to a bigger afs cachesize.
  • Upgrade lcfg twiki.
  • Tackle 'om' improvements.
  • Create DICE-level SL5.2 package lists.
  • Add instructions to the LCFG wiki on introducing new components.
  • Document the changes to the release scripts.

-- StephenQuinney - 26 May 2009

Topic revision: r2 - 29 May 2009 - 09:01:06 - StephenQuinney
