MPU Meeting Thursday 4th February 2010
AFS Component
Hopefully the AFS DB server in KB will be moved soon. Craig has confirmed that the AFS fileservers are ready to be switched to the new component.
LCFG Server Refactoring
All the code has been passed through perltidy to make the layout consistent. The major causes of warnings (mainly uses of undefined values) have been fixed. The code improvements needed to reach perl-critic level 4 are almost complete. The release level name handling was rewritten. The next step is to finish the move to building with Module::Build. It has become apparent that the command-line option handling is pretty bad and we need to switch to Getopt::Long to make it more readable. We need to properly integrate the new test system and also add code-level tests. Beyond this we need to investigate possibilities for object serialisation/storage; one option is KiokuDB.
Server Hardware
Chris has written a Perl script which reads the ambient temperature sensor using IPMI and shuts down the server at the upper non-critical threshold. Stephen suggested running this from cron rather than having the script running permanently and sleeping between temperature checks. Not all servers (including some new ones) have the ambient sensor, but they do have a planar sensor. Could we use that as a fallback option? Presumably it will give a different temperature, so Chris will repeat his review of machine temperatures, this time including that sensor, to see how much it varies. Chris noted that IPMI hangs on some old machines, so we need to ensure we don't run the script on those machines.
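As an illustration only (this is not Chris's script, which is written in Perl), the one-shot check suggested for cron could be sketched in Python as below. It assumes the usual pipe-separated layout of `ipmitool sensor` output, where the eighth column is the upper non-critical threshold. The sensor names `Ambient Temp` and `Planar Temp`, the 30 second timeout and the function names are all hypothetical and would need checking per machine.

```python
import subprocess

# Hypothetical sensor names, in order of preference; real names vary by vendor.
PREFERRED_SENSORS = ("Ambient Temp", "Planar Temp")


def parse_sensors(text):
    """Parse `ipmitool sensor` style output into {name: (reading, upper_non_critical)}."""
    sensors = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 10:
            continue  # not a sensor row
        name, reading, upper_nc = fields[0], fields[1], fields[7]
        try:
            sensors[name] = (float(reading), float(upper_nc))
        except ValueError:
            # Reading or threshold is "na", so this sensor cannot be used.
            continue
    return sensors


def should_shut_down(sensors, preferred=PREFERRED_SENSORS):
    """Return (sensor_name, decision) using the first preferred sensor that exists."""
    for name in preferred:
        if name in sensors:
            reading, threshold = sensors[name]
            return name, reading >= threshold
    return None, False


def read_ipmi(timeout=30):
    """Run ipmitool once; the timeout is an assumed guard against IPMI hanging."""
    result = subprocess.run(
        ["ipmitool", "sensor"],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout


def check_once():
    """Single check, intended to be called from a cron job rather than a loop."""
    name, too_hot = should_shut_down(parse_sensors(read_ipmi()))
    if too_hot:
        subprocess.run(["shutdown", "-h", "now"], check=False)
    return name, too_hot
```

Each sensor is compared against its own upper non-critical threshold, so the planar fallback does not reuse the ambient limit. Whether the planar threshold is a sensible shutdown point is exactly what the temperature review would need to establish.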
Installroot
Alastair has checked that the prototype works under SL5. The only package update required is for upstart where we need the version from F12.
There are some problems with installing SL5 under VMWare related to not seeing dhcp responses when using a bridged network configuration. The current CD installer occasionally has similar issues. There don't seem to be any issues with installing F12.
Alastair wants to talk to IS to see if they have plans to redevelop PIE and if so whether we could share technologies.
F12
As a first step we need to set up a mirror of F12 and get mock configured. We should then be able to build most of the LCFG packages from the SL5 SRPMs.
Miscellaneous Development
We should all review the small projects list before the next MPU meeting.
The changes to om were discussed. Alastair asked if there was any way to avoid the warnings it now generates during the install. Stephen noted that this only affects DICE machines which require the Om::Environment::NewAFSPAG Perl module. Normally a missing AFS module should be considered a problem, so a warning is useful in that case. Unfortunately there is no way to set the om_defaults.environment resource for the install context in such a way that the context information gets passed to the per-component om_environment resources. The only option would be to add an install-context override of the resource for each affected component.
Operational
- refreshpkgs backup server
- Once telford has become an AFS fileserver it will be used as the backup location for the refreshpkgs script.
- LCFG component namespace
- Stephen has added a bug report about how it would be good to run the LCFG components with better process names.
- VMWare kernel problems
- There are big problems with the various VMWare products and the latest RHEL5 kernel which has altered an API such that the kernel module doesn't compile. On the guests this could probably be worked around by using the open-source management tools instead of VMwareTools. There is no obvious fix for the host servers though.
- SL5.4
- A few problems came to light after the develop machines were switched to SL5.4. The filesystem and glibc packages needed to be marked with the reboot flag. The update to perl caused a conflict as it attempted to obsolete a local version of perl-Storable; this could only be fixed by renaming the package to perl-Storable-Local. The new KVM packages which are now available for x86_64 were in the wrong package list. We need to decide on the best home for these, which might be worth asking about at the LCFG Deployers Meeting.
- SL5 minor releases
- Stephen mentioned the idea of supporting SL5 minor releases. There are a few possible options: we could parameterise the dice/options/sl5.h header (or more likely do it at the lcfg level), we could add OS headers for each minor release, or we could do a combination. This should be discussed at the LCFG Deployers Meeting.
- dice-orders
- The dice-orders package on tobermory has been broken since the switch to the new Informatics database. We cannot just remove the package as it provides the ordershost web interface. Alastair will take a look.
- dresden disk space
- The lcfg.org server, dresden, has been seriously lacking in disk space since the storage array crash. Stephen will ask Craig to sort out some new space.
- telford
- We are planning to add some new disk space to telford and use this as an AFS file server for the various RPM repository mirrors. Once we have the space we will add a mirror of F12.
- FH machine moves
- Stephen and Chris will organise the move of mousa and split from FH to AT.
- VMWare servers
- We need to move guests away from central and bakerloo blob1.
- Space on bpbeast
- Alastair has 28 disks on the bpbeast to do RAID configuration testing. Currently it is a bit confused but hopefully that will be resolved soon.
- updaterpms
- Alastair mentioned that it should be possible to disable the updaterpms run method. Stephen asked if the updaterpms component could be changed to not send mail on errors if the test flag was set.
Next Meeting
The next meeting will be held on Tuesday 16th February.
This Week
Alastair will:
- Review small projects list
- Think about MPU logging requirements
- Finish repository restructure
- Talk to George about routing problems
- VMWare server hosting
- pkgwrite access for AFS pkgs tree
- dice-orders
- RAID testing
Chris will:
- Review small projects list
- Think about MPU logging requirements
- FH move
- Temperature shutdown
- F12
Stephen will:
- Review small projects list
- Think about MPU logging requirements
- FH move
- Server refactoring
- dresden disk space
- telford disk space
- F12 mirror
-- StephenQuinney - 08 Feb 2010