MPU Meeting Monday 16th March 2009
Buildtools Project
Support for Subversion has been added to the buildtools suite. Following on from that, all MPU components and other source code have been moved to a new Subversion repository. This brings the buildtools project to a close. There are a few ongoing issues with MacOSX support to solve but otherwise it is all done. Stephen will write up a summary and then request that the project be closed at the next development meeting.
Power Management Project
Chris has finished the development phase for this project. He is now testing the code and fixing any bugs he finds. The next step will be to ask for COs to volunteer to test the system. Chris would like to analyse the performance to see how often the machines sleep and for how long. Stephen suggested getting the component to log whenever the machine sleeps or wakes up; the log files could then be harvested and analysed later.
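As a rough illustration of the kind of analysis Stephen suggested, the sketch below assumes a hypothetical log format of one event per line: an ISO timestamp followed by the word sleep or wake. The real component's log format has not been decided here, and the function name is made up for the example. It pairs each sleep with the next wake and reports how many times the machine slept and the total time spent asleep.

```python
from datetime import datetime, timedelta

def summarise_sleep(lines):
    """Return (number of completed sleeps, total time asleep).

    Assumes a hypothetical format: '<ISO timestamp> <sleep|wake>' per line.
    """
    count = 0
    total = timedelta(0)
    asleep_since = None
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        stamp, event = parts
        when = datetime.fromisoformat(stamp)
        if event == "sleep":
            asleep_since = when
        elif event == "wake" and asleep_since is not None:
            # Close the current sleep period and add its duration.
            total += when - asleep_since
            count += 1
            asleep_since = None
    return count, total

# Made-up sample data, purely for illustration.
sample = [
    "2009-03-16T22:00:00 sleep",
    "2009-03-17T07:30:00 wake",
    "2009-03-17T12:00:00 sleep",
    "2009-03-17T13:00:00 wake",
]
count, total = summarise_sleep(sample)
print(count, total)
```

A wake with no preceding sleep is ignored, so a log that starts mid-sleep does not distort the totals.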
RPM Submission Project
Alastair has started work on this again and has been familiarising himself with the Perl AFS module and exploring possibilities. He still needs to change the milestones on the project page.
AFS Project
TiBS Project
Miscellaneous Development
- Packaging Guidelines
- Stephen has reviewed them and thinks they're fine, except that the RPM numbering policy could be more explicit, especially for when we make local adaptations to RPMs from elsewhere. Chris will make this change then publicise the changes and invite comments from COs.
- DIY DICE under static VMware
- Carol has tested this out and it works. She found a flaw in the instructions caused by the VMware configuration interface having been changed in a minor point release: Alastair will change the instructions.
- DIY DICE under roaming VMware
- Carol has tried this out and it didn't work out of the box. Alastair's theory is that he must have done a step manually; he will go back and look for it.
- Hardware Component
- Thanks to Stephen it now has support for creating modprobe.d files using new resources.
- Virtual DICE Documentation
- Alastair has checked it and it seems OK.
- Cron Component on Roaming Machines
- It doesn't get run often enough to do daily or more frequent cron jobs in a satisfactory way. Current roaming DIY DICE machines are using the cron component's manual method to get round this. If DIY DICE on roaming machines takes off we'll take another look at this. One possibility is to use anacron. It would also be good to automatically nag a user who hasn't updated in a while.
The rewritten lcfg-cron component has been shipped as part of the stable release.
Alastair has written up some notes on creating VMware hosts; he just needs to go through them and check that the instructions are correct.
Alastair still needs to document roaming support; he will do that this week.
Operational
- Split
- Following the move from KB, split needs to be reinstalled on the AT wire in FH. It's in the rack but probably needs wiring up. Chris will do it.
- Moving disk blob on central
- Only centaur is still to move. Unfortunately it's a Beowulf head node so it has to wait for Ian to arrange some maintenance time for the Beowulf cluster. Alastair will talk to Ian.
- KB Servers
- The old servers have been decommissioned. hvar has been moved to room 4.09 in the Forum. split will come to the Forum and then needs to be moved to the FH machine room and attached to the AT wire. We need to do another rsync of the lcfg.org data on the morning of the move so that dresden can be easily relocated. As dresden has multiple interfaces it will need some manual effort to bring it back up in the Forum; Stephen will deal with this when it arrives. We need to inform external LCFG users that lcfg.org will be unavailable for a day.
- Dual path/controller
- There is not currently enough spare kit to do the testing. The plan is to set up the latest storage array so that testing can be done later without causing trouble for live services. We might get another storage array anyway in which case we could use that for testing.
- iFriend access for bugs.lcfg.org
- We now have iFriend access on bugs.lcfg.org. It is done with a slightly hacky mod_perl module.
- updaterpms problems
- An update to matlab (which is a 600MB package) caused updaterpms to segfault on some machines. It also resulted in the updaterpms component thinking that a reboot was required. There were really 3 separate issues involved here.
- The cache size limit for an individual file on the rpmcache servers was too low. It was set to 400MB and has now been raised to 1GB. The matlab package is currently the largest in the repository so this should be sufficient for quite a while to come.
- The LCFG updaterpms component did not check that the exit code was a valid sum of the possible code values. The segfault appears to result in updaterpms exiting with the value of 134 (we're not sure why). It is clear why this results in a reboot request: the reboot code is 4, and 134 (128 + 4 + 2) includes that value. The component will be modified to check that the exit code is no greater than 31 (the highest expected single code is 16).
- Log rotation on the RPM master
- As part of the investigation into the updaterpms problem it was noticed that the apache logs on the RPM master were not being rotated. This was due to them being missed when the server was converted from the apache to the apacheconf component. They have been converted to the relevant apacheconf resources. This highlighted a bug in the apacheconf logrotate template which has now been fixed.
- lcfg-mysql bug
- Whilst working on bugs.lcfg.org a problem was found with the LCFG mysql component and how it tried to set the initial password for the database root user. This is being tracked in the LCFG bug tracker.
- qlogic driver
- This has been switched back to the stock driver for SL5.
- New kernel
- This has been successfully tested with Fibre Channel and VMware.
- dresden data
- This has been copied to a satablade volume in the Forum server room and is currently attached to telford. We will need to do another rsync on the morning of the move.
- Repository maintenance scripts
- These have been packaged as pkgrepo-scripts and the package is installed on telford and figgy. They still need some code changes so that they use a config file rather than having the list of platforms hard-coded.
- Split lcfg-utils
- The split of lcfg-utils has been completed and the new packages will go into the stable release this week.
- PXE NFS root
- This has been opened up to all Informatics machines.
- bakerloo in service
- The second VMware server has been put into live service. The CPU load seems to be fairly light, so Alastair is thinking of buying more memory to double what is available, which will allow more VMs per physical host. It might also be good to configure the system so that memory is not shared between virtual machines.
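The exit-code check described under "updaterpms problems" above can be sketched as follows. This is only an illustration in Python, not the component's actual code. The minutes give the reboot code as 4 and the highest single code as 16, so the sketch assumes the codes are the bit flags 1, 2, 4, 8 and 16; all names are hypothetical.

```python
# Assumption: exit codes are bit flags 1, 2, 4, 8 and 16. Only the
# reboot code (4) and the highest single code (16) come from the minutes.
REBOOT_FLAG = 4
MAX_VALID = 31  # 1 + 2 + 4 + 8 + 16

def is_valid_exit_code(code):
    """True if the code could be a sum of the known flags."""
    return 0 <= code <= MAX_VALID

def reboot_requested(code):
    """Only trust the reboot flag when the whole code is valid."""
    return is_valid_exit_code(code) and bool(code & REBOOT_FLAG)

print(reboot_requested(4))    # a genuine reboot request
print(reboot_requested(134))  # the value seen after the segfault
```

Without the range check, testing 134 against the reboot flag alone would succeed, which matches the spurious reboot request that was observed.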
This Week
Alastair will:
- Unstall rpmsubmit project
- rsync dresden data
- Move _central_'s disk blob from satabeast to new EVO array (just centaur to do)
- Check the virtual DICE documentation
- Finish the roaming documentation
- updaterpms fixes
Chris will:
- Test/Fix the sleep component
- RT Duty
Stephen will:
- Review the packaging guidelines
- Look at the hardware component
- Finish the buildtools project
- Help with the move from KB
-- ChrisCooke - 25 Mar 2009