MPU Meeting Tuesday 24th March 2009

Buildtools Project

Stephen has been writing up the closing project report.

The report suggests that Chris and Stephen move from taking weekly turns to cover RT to dividing the week between them, switching on Wednesdays, so each gets 2½ days in which they can concentrate on their project. Each handles RT during their half of the week, dealing with what they can and leaving what needs to go to the other. We thought this sounded like a great idea.

Power Management Project

Chris has been testing for, discovering and fixing bugs. In particular:

  • With the i810 driver under SL5.2 the GDM login screen crashes, just as it does on a virtual console switch. However, SL5.3 solves the problem.
  • Now that the GDM login screen is usable after suspend/resume, a resume problem has come to light: the pm-utils code pops up a warning about errors on suspend or resume when a user logs in after a resume. Stephen suggests checking the exit status of each of the pm-utils scripts. Chris will investigate.
  • The code deals with a machine not having anyone logged in, and with someone logged in and running interactive shells. But what if someone is busy with something other than interactive shells, something which doesn't raise the load average? The machine may be told to go to sleep while it isn't really idle. Alastair and Stephen remembered James Jarvis facing this situation and coming up with something clever, so Chris will ask James. The Condor kernel hack may also help.
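
Stephen's suggestion of checking the exit status of each script could be sketched roughly as follows. This is only an illustration in Python, not how pm-utils itself runs its scripts: `run_hooks` and `failed_hooks` are hypothetical helper names, and treating every non-zero status as a failure is an assumption (some scripts may use particular non-zero codes for other purposes).

```python
import os
import subprocess


def run_hooks(hook_dir, action):
    """Run every executable file in hook_dir with `action` as its only
    argument. Returns a list of (name, exit_status) pairs in sorted order."""
    results = []
    for name in sorted(os.listdir(hook_dir)):
        path = os.path.join(hook_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # skip directories and non-executable files
        completed = subprocess.run([path, action])
        results.append((name, completed.returncode))
    return results


def failed_hooks(results):
    """Keep only the scripts whose exit status was non-zero (assumed to
    mean failure; this is an assumption, not pm-utils policy)."""
    return [(name, status) for name, status in results if status != 0]
```

Running `failed_hooks(run_hooks(some_dir, "suspend"))` against a copy of the scripts would then name the ones reporting an error, which may help narrow down where the warning comes from.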

RPM Submission Project

Alastair has started work on this again and has been familiarising himself with the Perl AFS module and exploring possibilities. He still needs to change the milestones on the project page.

AFS Project

Nothing much yet.

TiBS Project

Nothing much yet.

Miscellaneous Development

Packaging Guidelines
Stephen has reviewed them and thinks they're fine, except that the RPM numbering policy could be more explicit, especially for when we make local adaptations to RPMs from elsewhere. Chris will make this change then publicise the changes and invite comments from COs.

DIY DICE under static VMware
Carol has tested this out and it works. She found a flaw in the instructions caused by the VMware configuration interface having been changed in a minor point release: Alastair will change the instructions.

DIY DICE under roaming VMware
Carol has tried this out and it didn't work out of the box. Alastair's theory is that he must have done a step manually; he will go back and look for it.

Hardware Component
Thanks to Stephen it now has support for creating modprobe.d files using new resources.

Virtual DICE Documentation
Alastair has checked it and it seems OK.

Cron Component on Roaming Machines
On roaming machines the cron component doesn't get run often enough to handle daily or more frequent cron jobs satisfactorily. Current roaming DIY DICE machines use the cron component's manual method to get round this. If DIY DICE on roaming machines takes off we'll take another look at this. One possibility is to use anacron. It would also be good to automatically nag a user who hasn't updated in a while.
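
The "nag a user who hasn't updated in a while" idea might look something like this minimal sketch. Everything specific here is an assumption made for illustration: the seven-day threshold, the message wording, and the function names `needs_nag` and `nag_message`. How the time of the last update would be recorded on a roaming machine is left open.

```python
from datetime import datetime, timedelta


def needs_nag(last_update, now, max_age=timedelta(days=7)):
    """Return True if the last update is older than max_age.
    The seven-day default is an assumed threshold, not an agreed policy."""
    return now - last_update > max_age


def nag_message(last_update, now):
    """Build a short reminder giving the number of whole days elapsed."""
    days = (now - last_update).days
    return "This machine was last updated %d days ago; please update it." % days
```

A job started by anacron, or at login, could call `needs_nag` and show `nag_message` only when it returns True.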

updaterpms
Alastair fixed the updaterpms bug.

5.3
There's been a change in RPM, something to do with signatures, and it's causing trouble. There's also a problem with mock: something in the first hundred or so packages to install launches a process which doesn't exit cleanly, so mock kills it, and this kills routing! The problem is dodging Stephen's efforts to hunt it down.

Virtual machine hosting
  • district is another new virtual machine host server. It'll be the hot spare. We'll also use it for testing.
  • Stuff we need to do: we need to get suspends of virtual machines working. We also want to be able to configure with LCFG the order in which virtual machines are powered down and up.
  • We're going to buy a fourth virtual machine host server. This one will be based at KB and will host virtual servers providing the skeleton off-site service which will take over in the event of a Forum disaster.

Operational

Split
Following the move from KB, split needs to be reinstalled on the AT wire in FH. It's in the rack but probably needs wiring up. Chris will do it.

Moving disk blob on central
Only centaur is still to move. Unfortunately it's a Beowulf head node so it has to wait for Ian to arrange some maintenance time for the Beowulf cluster. Alastair will talk to Ian.

autoreboot
Graham found that the autoreboot component wasn't sending mail as advertised. Stephen has fixed the problem. RAT is planning to use it on the RAT servers, running it in "nag and mail" mode instead of rebooting mode. We think this is a splendid idea and may copy it.

cron
There were a few problems with the new version of the LCFG cron component. These were related to the manual method used on laptops and also the cron allow/deny files. The problems have been fixed and a new version shipped out to all stable machines. See LCFG bug#123 and LCFG bug#124 for full details.

ARCH and VirtualBox
Tim was having problems building VirtualBox. The problem was tracked down to a rogue old ARCH variable which should really be eliminated.

Prague
prague was stuffed with RPM processes and updaterpms hadn't run since sometime in February. Stephen fixed it and Chris rebooted it. Neither Nagios nor RT had given any indication of a problem. Suitable Nagios passive checks would have shown up the problem far earlier; we should really get remctl working properly. Stephen thinks this should be just a few days' work. We'll add it to the wee projects list.
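
As a rough illustration of what a passive check for this might submit, the sketch below turns the age of the last updaterpms run into a Nagios state and formats it as a `PROCESS_SERVICE_CHECK_RESULT` external command line. The thresholds, the service name "updaterpms" and the function names are assumptions made for the example; how the age is measured, and how the line would reach the Nagios server (for instance via remctl), is not covered.

```python
# Standard Nagios service states.
OK, WARNING, CRITICAL = 0, 1, 2


def updaterpms_state(age_hours, warn_hours=36, crit_hours=72):
    """Map hours since the last updaterpms run to a Nagios state.
    Both thresholds are assumed values chosen for illustration."""
    if age_hours >= crit_hours:
        return CRITICAL
    if age_hours >= warn_hours:
        return WARNING
    return OK


def passive_result_line(host, service, state, output, timestamp):
    """Format a passive service check result as a Nagios external command."""
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, state, output)
```

With thresholds like these, a machine where updaterpms had not run for several weeks would have been reported as critical within about three days.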

This Week

Alastair will:

  • Unstall rpmsubmit project
  • Move the central disk blob from satabeast to new EVO array (just centaur to do)
  • Fix the roaming documentation

Chris will:

  • Test/Fix the sleep component
  • Reinstall split DONE
  • Start thinking about TiBS

Stephen will:

  • Finish the buildtools report DONE
  • Look at the sleep code DONE
  • Test SL5.3
  • Start thinking about the AFS project - NewAfsComponent

-- ChrisCooke - 25 Mar 2009

Topic revision: r6 - 30 Mar 2009 - 09:43:43 - StephenQuinney
 