MPU Meeting Wednesday 2nd May 2012

LCFG Server Refactoring

There haven't been any further comments on the report.

Simple KVM Service

We now have piccadilly running with the stable release. We should move the more important VMs to it from northern. The obvious ones are porto and ikiw. We will move porto and we will suggest to Services that ikiw be moved.

Alastair has been testing a detecttimeslew script for a couple of weeks now on some VMs. It hit a glitch on 1 May due to a slight problem with the date format but that's been fixed. The idea is that with this script we should be able to pause guests before a move rather than having to shut them down completely as at present.
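
As an illustration only (these minutes do not describe how detecttimeslew works), a slew check could compare a timestamp taken on the guest with one taken from a trusted reference clock. In the Python sketch below the function names, the ISO 8601 timestamp format and the 2-second threshold are all assumptions, not details of Alastair's script.

```python
from datetime import datetime, timezone

# Hypothetical threshold: seconds of disagreement that count as slew.
SLEW_THRESHOLD_SECONDS = 2.0


def parse_utc(stamp):
    """Parse an ISO 8601 timestamp and normalise it to UTC."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def slew_seconds(guest_stamp, reference_stamp):
    """Return guest time minus reference time, in seconds."""
    return (parse_utc(guest_stamp) - parse_utc(reference_stamp)).total_seconds()


def has_slewed(guest_stamp, reference_stamp, threshold=SLEW_THRESHOLD_SECONDS):
    """True if the guest clock differs from the reference by more than threshold."""
    return abs(slew_seconds(guest_stamp, reference_stamp)) > threshold
```

Normalising both timestamps to UTC with an explicit format before subtracting avoids the kind of date-format mismatch that can make two correct clocks appear to disagree.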

Alastair will develop a procedure for cold-migrating KVM guests from one host to another.

Alastair will also arrange SAN access for northern.

Alastair won't submit the project for sign-off until there's a fully supported KVM server up and running in the Forum.

Server Upgrades

The packages export service has moved from cochin to porto. Chris will delete cochin.

Stephen has been putting more work into the OpenAFS build server. It now builds packages and signs them with no problems.

Stephen has now completely moved PackageForge master to pinemarten, an SL6 VM. He will delete the old VM ardbeg.

Now that the main LCFG master is on SL6 we should upgrade the LCFG DR server sauce to SL6 too. Chris will do this. It's important to remember to preserve the current contents of the sauce disk space as they would take a long time to regenerate.

Server Hardware

Chris's blog post attracted a helpful reply from Dell. It seems that the official way to handle Linux firmware and BIOS upgrades is to do them from a controlling Windows machine. We'll give this a go but are unenthusiastic as we don't use Windows in our sysadmin work.

Chris hit problems getting firmware-tools installed on test servers because of RPM conflicts with existing packages: the firmware-tools RPMs seem to require other packages by RPM name rather than by capability name, which produces unnecessary conflicts.

He'll look for a more lightweight way to find both the currently installed BIOS and firmware versions and the most recent available versions.
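
One lightweight possibility, sketched here in Python as an assumption rather than the approach Chris will take, is to read the installed BIOS details from the Linux DMI entries under /sys/class/dmi/id and compare them with a latest version number obtained separately. The helper names are hypothetical, and the comparison assumes purely numeric dotted versions such as 1.10.0; lettered versions such as A05 would need different handling. The sketch covers the BIOS only, not other firmware.

```python
from pathlib import Path

# Standard sysfs location for DMI/SMBIOS data on Linux.
DMI_DIR = Path("/sys/class/dmi/id")


def read_bios_info(dmi_dir=DMI_DIR):
    """Return the BIOS vendor, version and date, or None where unreadable."""
    info = {}
    for field in ("bios_vendor", "bios_version", "bios_date"):
        try:
            info[field] = (Path(dmi_dir) / field).read_text().strip()
        except OSError:
            # Missing file or insufficient permissions.
            info[field] = None
    return info


def version_key(version):
    """Turn a dotted numeric version such as '2.10.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_update(installed, latest):
    """True if the installed version is older than the latest one."""
    return version_key(installed) < version_key(latest)
```

Comparing versions as tuples of integers rather than as strings means that 1.10.0 is correctly treated as newer than 1.9.0.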

Miscellaneous Development

Authen-Krb5-Admin on CPAN
We have been using local patches of this perl module for a while, but Stephen has now become its official maintainer on CPAN. He has now reworked our patches for upstream to make them support SL5 as well as SL6. Toby has tested them with Prometheus. Stephen will upload the new version to CPAN.
LCFG Build Tools on Mac OS X
Kenny and Stephen have done some work on this. The lcfg-reltool commands osxpkg and devosxpkg now do something useful: they use PackageBuild to build an OS X package. Previously the commands did nothing and Mac LCFG packages were built with CPack. To support the use of PackageBuild, lcfg.yml has a new attribute called orgident. The pkgident attribute adds this to the package name. Stephen has also updated the LCFG Build Tools modules on CPAN as installation from CPAN is the easiest way to get them on to Macs.
Stephen has created this. It works in the same way as the _options list. The same package name can be used in both lists now in order to sort dependencies from mixed repositories.


Speed of LCFG Slaves
We talked about what happened when the online exam infrastructure was tested. The test was held up by LCFG slave delays: a mistaken SVN commit shortly after 8am, and its correction committed shortly afterwards, didn't finish working their way through the slaves until half past eleven that morning. The test slaves on virtual machines managed the rebuilds surprisingly quickly (~25 minutes as opposed to several hours). Subsequent CO debate turned up a useful observation: we shouldn't need to use RAID on the LCFG slaves as they don't hold master data, and it seems to be the RAID hardware that slows down the I/O on the existing slaves. After talking about it at MPU we're open to the idea of running slaves on VMs, perhaps one on each VM server. However, this will not be feasible until the new and faster VM servers arrive in (hopefully) a couple of weeks. In the meantime Chris will set up a test LCFG slave VM on bakerloo to find out how much slower it is at making profiles than circlevm12. Stephen also suggested a way of short-circuiting the profile rebuild process, strictly for use in emergencies, which could have neatly avoided most of Monday's problem.
CEG has decided that the User Support Unit should take it over.
SL 6.2 Server Upgrade
We need to reboot our SL 6.1 servers to upgrade them to SL 6.2. Chris will find out which servers need doing and circulate a list. Before committing a machine to the irreversible upgrade, we can check for RPM conflicts by first commenting out DICE_STICK_WITH_SL61 and then doing a test run of updaterpms.
Bugzilla cookie warning
Chris will add one to the LCFG bugzilla.
Blog cookie warning
Alastair will take a look.
Not a nelson
We need to review the entire contents of nelson and decide, for each of the not-services, whether to support it, move it to SL6, or discontinue it.
KVM blog article
There's scope for a Systems Blog article on the KVM service.

This Week

  • Alastair
    • Develop procedure for cold-migrating KVM guests from host to host - try diydice (to piccadilly SAN volume)
    • SAN access for northern - requires a visit to KB to reattach to main SAN - now on fibre and vgs created - need reboot to finish off - can do once suspend on stable KVMs (16th May)
    • finish off perl-dbix.h header
    • apply pressure on services to remove Nexsan kit from IF/AT SAN
    • Start inventory project
    • Package up Neil's whererpms
    • Review outstanding LCFG bugs
    • network component in perl
    • Look at wrt cookie usage

  • Chris
    • power off cochin
    • upgrade sauce to SL6
    • server hardware project
    • Add guest to bakerloo to act as LCFG slave to compare performance with circlevm12 lambethnorth
    • Review outstanding LCFG bugs
    • Coordinate SL6.2 upgrades
    • Cookie warning on

  • Stephen
    • power off ardbeg
    • Process other units' responses about their perl-AFS module usage (which functions etc)
    • Finish Theon work
    • Cookie warning on
    • LISA paper
    • Take Security project to May dev meeting for start

-- AlastairScobie - 02 May 2012

