MPU Meeting Tuesday 20th November 2009

Power Management Project

Chris wrote a newsletter article. He'll now write a final report and put the project forward for closure at the December development meeting.


Alastair has successfully tested apache reload on brendel and he's added yum support on new repositories.

He'll get refreshpkgs ready to run on another AFS server in case brendel dies. He'll discuss this with George and Simon.

We need to change to the new repository structure. This has to be coordinated carefully with Kenny; Alastair will speak to him about it. The general approach would be to override at both the DICE and IS levels, then change the LCFG level.

AFS Component

Simon has reviewed the code and found bits and pieces to tighten up. In particular, the Nagios monitoring script was wrong and needs rewriting. It didn't help that the documentation he was working to was out of date - Simon has promised to improve the OpenAFS docs!

Running the AFS component on the KDCs has been problematic because of George's iptables rules: some of the AFS components can't talk to each other. The problem would disappear if all of them were on KDCs, but at least one is on a test machine which isn't on the right wire to be trusted by the KDC rules. George has now turned off some firewalling on the KDC. Stephen is going to document this.


Chris has deployed the tibs component on alexandria. So far so good. It still doesn't manage to "stoptibs", but it does start it OK and configure it correctly.

Stephen suggests that Chris update lcfg-ngeneric to 1.2.34-1 on alexandria to get its bug fixes for Perl components. Chris is going to schedule some TiBS downtime and test stopping tibs with the fixed ngeneric.

The TiBS server crashed. We suspect the QLogic driver. Chris will ask Craig to send MPU the crash report.

LCFG Server Refactoring

Nothing much has happened.

Pandemic Planning

Stephen still needs to write up some notes on the PXE service.

Miscellaneous Development

repackage scli
This will not now be done - it's too much trouble for the limited benefit it would bring us.

Monitoring the LCFG level
A virtual machine (ashkenazy) is on the inf level, configured as a server and LCFG server and tracking the stable release. Several actions were agreed:
  • Alastair will make it follow the testing release instead.
  • Alastair will let Chris know how best to test the LCFG level using this machine. (I already have: check that its profile compiles; check that updaterpms runs on it; check the status of the LCFG compilations it itself does - Chris.)
  • Alastair will check the functioning of the LCFG server web status pages on ashkenazy.
  • Chris will add test descriptions to the release procedures.

New version of updaterpms
Alastair reckons that it will flag the successful removal of a package with broken scripts as a success. Stephen has already added it to the develop release and will roll it out.

gbios & 915resolution
Stephen thinks that the best fix for the gbios component would be a separate component to run 915resolution appropriately. However, if this is only ever going to be used on SL5, it won't be worth doing, as we hope to upgrade to SL6 in the summer. Alastair will try Fedora 12 to check that this won't be needed there.

Chris upgraded bugzilla.inf to the latest security release of the 3.0 series.

Stephen has changed it to do a reload rather than a stop and start.

Stephen fixed a bug in the cron component. The crontab command fails to delete the crontabs of user accounts which no longer exist. The component now copes with this failure elegantly and deletes such crontabs itself.


Chris has written a sleep article.

Alastair has written a DIY DICE article.


Phil Wadler's machine
Stephen has set it up.

A third VM server
The first two VM servers are full but Alastair isn't confident that our existing disk setup would perform adequately with a third one added. We'd need faster disks. Alastair will discuss this with Craig.

New servers
We have three new servers, two of which are to be sited at KB. The KB wiring is currently not ideal - understandably, as it had to be done in a hurry. Once Ian has persuaded people to tidy it up, Chris will see about installing our two new servers there. Tim wants to be involved in this install for pandemic training. For info, the SAN at KB is the old-style type, so there'll be no multipath. The infrastructure to support Ethernet bonding is also not yet complete. However, neither of these setbacks should delay the deployment of the two new servers.

Operational figures
In the last three quarters we've spent 30% of our time on operational work. This is too much. For next time we'll all consider possible time allocation buckets which might expose just what's swallowing all that time.

New Operational Arrangements
We've agreed a new way of allocating the operational work: Stephen will tackle it on Mondays and Tuesdays, Chris on Thursdays and Fridays, and on Wednesdays we'll only tackle emergency operational work. We'll still cover for each other when we're off, obviously.

OS Updates
Chris wants a refresher course from Stephen on doing the OS updates. Stephen has agreed but adds that he doesn't do the updates every week anyway - he does them either when something crucial comes along or when a lot of updates have mounted up.

Stephen will look at the SELinux problem.

ISS RT needs a patch applying to make it do something clever involving templates. Stephen turns out to have been an RT hacker in a past life, so he'll tackle this. (Most of RAT is about to go on paternity leave.) Tim and Graham will know what needs doing.

Project Allocation

CEG agreed these project allocations for the next few months for MPU:

  • boot component rewrite: Alastair - should be a small project
  • Linux install redevelop: Alastair - there's not a pressing need for this for SL6 but it does need to be done
  • RHEL6-based LCFG port: Chris - if RHEL6 is in beta by 1 Feb it'll be that; otherwise it'll be whichever Fedora is then rumoured to be the closest to RHEL6.
  • Fedora 11/12 LCFG port: Chris and Iain - with Iain concentrating on the DICE level bits. 8 weeks.
  • LCFG Core Refactor: Stephen and Simon. 6 weeks, 5 for Stephen.
  • AFS component: Stephen - 1 week
  • TiBS component: Chris - 1 week
  • Server hardware interaction: Chris - concentrating first on Nagios monitoring of RAID disk state; 2 weeks.
This leaves Stephen a bit short of things to do; Alastair will revisit the list and sort out 3-4 more weeks' project work for Stephen.

This Week

Alastair will:

  • consider new operational buckets
  • look at Fedora 12
  • discuss the repository change with Kenny
  • recalculate Stephen's project hours/work

Chris will:

  • go on a Perl course
  • work on the TiBS project
  • go over the OS updates process
  • install servers at KB, wiring permitting
  • open / flesh out the server hardware project
  • consider new operational buckets

Stephen will:

  • do the TiBS code review
  • push out the new updaterpms
  • make the om changes
  • patch ISS RT
  • work on the AFS project
  • consider new operational buckets

-- ChrisCooke - 24 Nov 2009

Topic revision: r1 - 24 Nov 2009 - 17:22:04 - ChrisCooke