MPU Meeting Tuesday 25th January 2011

Software Build Farm

The LCFG pkgforge component can now configure everything needed by the daemons as well as the client. Support for the website configuration still needs to be added.

Init scripts have been added which take care of starting and stopping the daemons and also managing k5start. This took a lot of effort to get right, mainly due to a lack of documentation (on the web as well as locally). It turns out that when a daemon backgrounds itself and detaches completely, k5start must be run as a separate daemon but in the same PAG. This is done by running the init script through /usr/bin/pagsh rather than /bin/bash.

Support has been added for building on SL5 from source packages which were generated on F13. This doesn't normally work due to the differences in the version of rpmlib used. The "fix" is a bit of a hack involving installing the source using rpm with the --nomd5 option and then using rpmbuild to generate a new SRPM which can be passed into mock.
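As a rough illustration of the three steps described above, the following Python sketch only assembles the command lines and does not run anything. The function name, the private _topdir, the spec file name, the name of the regenerated SRPM and the mock configuration name are all assumptions made for the example; this is not the PkgForge code itself.

```python
from pathlib import Path


def srpm_rebuild_commands(srpm, spec_name, new_srpm_name, topdir, mock_config):
    """Return the three commands, in order, as argv lists (nothing is executed).

    All arguments are illustrative: topdir is a scratch rpm build tree,
    spec_name is the spec file unpacked from the SRPM, and new_srpm_name is
    the file name rpmbuild is expected to write under SRPMS/.
    """
    topdir = Path(topdir)
    define_topdir = ["--define", f"_topdir {topdir}"]

    # 1. Unpack the F13-built source package, skipping the file digest
    #    check that the older rpm on SL5 cannot perform.
    install = ["rpm", "--nomd5", *define_topdir, "-i", str(srpm)]

    # 2. Generate a fresh source package with the local rpmbuild.
    rebuild = ["rpmbuild", *define_topdir, "-bs", str(topdir / "SPECS" / spec_name)]

    # 3. Pass the regenerated SRPM to mock for the actual build.
    build = ["mock", "-r", mock_config, "--rebuild",
             str(topdir / "SRPMS" / new_srpm_name)]

    return [install, rebuild, build]
```

Each list could be handed to subprocess.run() in turn, stopping at the first failure.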

There are a few outstanding issues before the service can be announced to COs for testing. In particular, the website needs some improvements such as navigation menus. The build daemons need support for sending reports via email. The generic daemon infrastructure needs support for log rotation: currently the daemons remain attached to the old log file after it has been rotated.
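The log rotation problem is a general one for long-running daemons: if the log file is renamed, a process that keeps its original file handle carries on writing to the renamed file. The following is a minimal Python sketch of one common remedy, reopening the log when the path no longer refers to the open file. It is not how the PkgForge daemons are implemented, and the logger name and file names are made up for the example.

```python
import logging
import logging.handlers
import os
import tempfile


def make_logger(path):
    """Return a logger whose handler reopens `path` if it is moved or replaced."""
    logger = logging.getLogger("daemon-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    # WatchedFileHandler checks the file's device and inode before each write
    # and reopens the path if the file has been renamed or removed.
    logger.addHandler(logging.handlers.WatchedFileHandler(path))
    return logger


def demo():
    """Write, rotate by renaming, write again; return (old, new) contents."""
    logdir = tempfile.mkdtemp()
    path = os.path.join(logdir, "daemon.log")
    logger = make_logger(path)
    logger.info("before rotation")
    os.rename(path, path + ".1")  # what a rename-based rotation does
    logger.info("after rotation")
    with open(path + ".1") as old, open(path) as new:
        return old.read(), new.read()
```

An alternative with the same effect is for the daemon to reopen its log on a signal sent by the rotation tool.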

We need to think about what hardware will be required for the service. The master is now in a VM, ardbeg, which is handling the database, website and incoming queue processor. Ideally, we need at least one machine per platform (e.g. SL5 and F13) so when SL6 comes along we will want 3 machines. This should give us reasonable performance (CPU/memory/disk IO) without costing too much. For the testing process we will look into using some spare desktop machines (probably Dell 745).

We also need to find a permanent home for the AFS directory where the jobs and results are stored. Currently this is in the MPU group space which is probably not ideal. It will be cleaned out regularly so it is not going to become huge but even so we will need a reasonable amount of holding space in case of big jobs (such as porting to SL6).

Hopefully the beta-release will be announced to COs before the end of the week.

Replacement for VMware Server

Alastair has sent the virtualisation report out to COs and has had some further comments back. Once he has made any necessary changes based on those responses he plans to also send it to IS for comments.

The poor support for KVM in SL5 means we cannot really start on producing a KVM-based local service until SL6 is released. In the meantime Alastair will do some testing using an F13 machine.

Miscellaneous Development

A new LCFG component has been written for configuring the pkgsubmit command. A while ago support was added to pkgsubmit for using alternative configuration files, so this component can configure multiple platforms. This is normally only needed for the new PkgForge service.

The LCFG mock component has been updated to add a global repository list. Previously each chroot had a separate list of repositories which meant that, although we have a fixed list of repositories available, they had to be specified separately for each mock chroot. It is still possible to override and extend the repository list for each chroot. These changes make it much simpler to use the mock component. This was done as part of the PkgForge project.
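The relationship between the global list and the per-chroot settings can be sketched as below. This is only an illustration in Python of the override and extend behaviour described above; the key names "repos" and "extra_repos" and the repository names are hypothetical and are not the component's actual resource names.

```python
def effective_repos(global_repos, chroot_cfg):
    """Resolve the repository list for a single chroot.

    chroot_cfg is a dict that may contain (hypothetical keys):
      "repos"        replaces the global list for this chroot
      "extra_repos"  is appended to whichever list is in effect
    """
    # Start from the chroot's own list if it overrides, else the global one.
    result = list(chroot_cfg.get("repos", global_repos))
    # Extend, skipping anything already present so order stays stable.
    for repo in chroot_cfg.get("extra_repos", []):
        if repo not in result:
            result.append(repo)
    return result
```

With a global list of ["base", "updates"], a chroot with no settings gets both repositories, one with extra_repos ["testing"] gets all three, and one with repos ["custom"] gets only "custom".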

Kenny has submitted a patch for the LCFG installroot to add kernel command line option parsing. Alastair will look at applying the patch.


Nagios monitoring of DR packages server
Alastair checked and the Nagios monitoring of the disaster-recovery packages server, sauce, was already in place. There is also a VM using it as the package server to ensure it all works.

VMs on bakerloo
There is still one VM, epping, on the server bakerloo. It remained there because of insufficient space on metropolitan; that has now been fixed, so we should move this VM as soon as possible to free up bakerloo for development work.

HP model and "toohot"
Chris has added support for the HP server model to the "toohot" check script.

Multipath FC
Alastair has added some notes to the Fibre Attach documentation on the LCFG wiki. This explains how to get multipath to detect new partitions without needing to reboot the machine.

VMware and network devices
Alastair has found some notes on how to add new ethernet devices and get VMware to use them without being restarted. They are linked from the MPU Internal Procedures page. It was noted that we need to add the OSPF support to central at some point.

MPU kit requirements
What new servers are MPU going to need this year? Stephen suggested replacements for the LCFG slaves and also noted that the PkgForge service will need some hardware. There doesn't seem to be much else we need this year.

VM server utilisation
Alastair asked whether we could cope with just one backup VM server at KB. He suggested that, to spread the load more evenly, we could move the various test VMs and lightly used servers, such as DIYDICE, to one of the VM servers at KB.

Pandemic documents
We need to review the MPU pandemic documentation before 9th March. Stephen will circulate a list.

Component schema versions
We need to document the procedures for using LCFG component schema versions which were agreed at the Operational Meeting held on 12th January 2011.

svn branching for components
Stephen will document how to do a branch in the LCFG svn for a component project directory and how to use the build tools to set appropriate versions.

This Week

  • Alastair
    • All to look at kit requirements
    • Moose learning + PkgForge review
    • Submit a package to PkgForge

  • Chris
    • All to look at kit requirements
    • Chase RAT re epping DONE
    • Submit a package to PkgForge
    • practical submit project

  • Gordon
    • Modifying boot component for cron related bug
    • Submit a package to PkgForge

  • Stephen
    • All to look at kit requirements
    • Produce list of MPU pandemic documentation for allocation at next meeting *

