MPU Meeting Tuesday 14th July 2009

Power Management Project

This is waiting for an SL5.3 lab to be available.


Simon's cache manager code gives a 50% performance increase. There's a new RPM server master called brendel. The new rpmsubmit is now almost ready to ship.

AFS Component

There's now a fileserver called mermaid. Simon is going to use it for his performance testing.


Chris has packaged TiBS into an RPM and has developed the lcfg-tibs component to the point that it reproduces some of the main TiBS configuration files from LCFG resources when the component gets a Configure call. He blogged about this progress on Monday and Tuesday.

Chris's tibs RPM installs the software into /opt, the idea being that the component will then install it into its final resting place. Alastair and Stephen suggested ways of reformulating the package to make it "own" the tibs files in their final destination instead.

LCFG Server Refactoring

Stephen has been working on the test suite.

Miscellaneous Development

Multipath Nagios Module
Alastair has developed a multipath Nagios module. It's now in use on the MPU's virtual server hosts. He'll publicise it.

Project Prioritisation
From now on we'll be feeding our priorities into CEG, who will manage the prioritisation with CSG. Our next prioritisation will be in August.

Alastair's virtualisation mini-talk will happen tomorrow, 15 July.

om improvements
The resources and code are now all in place. Currently everything is functionally as it was before, but things are done in a better way. For the stable release we'll halt at this stage for a few weeks to let things bed in and let problems show up. For the develop release Stephen will change the resource to bring the AFS module into play, and let that run for a few weeks to expose possible problems. The AFS module does a setpag to divorce the om session from the calling user's environment.

This machine has been deployed for something else so can no longer be used by Alastair for temporary testing. However we've ordered another EVO array so the testing can be done using that instead. We're hoping that we'll be able to keep the EVO array(s).


The inf level
Carol encountered problems with her work on inf/lcfg level virtual machines because the inf level was using too much local infrastructure from the DICE level. Alastair has pared the inf level back somewhat.

Alastair has upgraded its memory.

Server moves
Stephen and Carol have moved some servers from the fibrechannel racks to make space there. Three MPU servers are still to be moved. It's Chris's turn to help.

HP 7900
Alastair has tested these. For DICE we'll have to buy them with ATI graphics cards, as the onboard graphics uses DisplayPort, support for which has only just made it into the Linux development kernel, never mind any Red Hat kernel. This rules out buying the smallest 7900 form factor for DICE, although we're buying these for admin MDP use. Chris will now need to test a 7900 for power management support.

PXE Upgrade
Stephen will update PXE later this week so that it supports newer models.

Client / NFS Lock
Stephen will move nfslock to later in the boot process so that it won't grab the client component's ports.

Routing problem
Alastair will talk to George about the routing problem. The routing component throws away the DHCP-derived static route and starts a route discovery daemon in the background, letting other components start in the meantime, often before a satisfactory route has been found. During installs, for instance, updaterpms is often delayed for several minutes, and on virtual machines it can even fail entirely when it can't find the RPM repository.
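
One possible way round the ordering problem is for dependent components to wait until a usable default route exists before they start. The sketch below is hypothetical and is not the routing component's code. It assumes a Linux host that exposes /proc/net/route; the function names, timeout and polling interval are illustrative only.

```python
import time

RTF_UP = 0x0001       # route is usable
RTF_GATEWAY = 0x0002  # destination is reached via a gateway


def has_default_route(route_table: str) -> bool:
    """Return True if text in /proc/net/route format lists an active default route."""
    lines = route_table.strip().splitlines()
    for line in lines[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        destination = fields[1]      # hex-encoded destination address
        flags = int(fields[3], 16)   # hex-encoded route flags
        if destination == "00000000" and flags & RTF_UP and flags & RTF_GATEWAY:
            return True
    return False


def read_proc_route() -> str:
    """Read the kernel's IPv4 routing table (Linux only)."""
    with open("/proc/net/route") as handle:
        return handle.read()


def wait_for_default_route(timeout=60.0, interval=1.0, read_table=read_proc_route) -> bool:
    """Poll until a default route appears; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        if has_default_route(read_table()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Something like updaterpms could call wait_for_default_route() before contacting the RPM repository and report a clear error if it returns False, rather than failing on the first connection attempt.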

Moose update and subsequent problems
Stephen updated Moose to help Toby and Prometheus. The more recent Moose is somewhat more strict and rather noisily flagged some hitherto unnoticed bugs in LCFG buildtools. Stephen has fixed the problems.

Stephen has stripped out fc6 support from the LCFG level. For the moment fc5 is still there: we still have three fc5 machines.

SL5.3 goes into stable this week.

Cosign v3
Stephen has tweaked the LCFG slaves and websvn appropriately. Dresden's setup is somewhat more complex. It would be simplified considerably if the DIY DICE functions were moved to a separate (virtual?) server. We might ultimately also split the LCFG export functionality onto multiple virtual machines, the minor extra hardware and running costs being outweighed by improved management simplicity.

Kernel rebuild request
RT 42837 asks for an increase in the DICE kernel's MAX_ARG_PAGES. Stephen is going to do this as part of the next kernel rebuild.
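
For context, in Linux kernels before 2.6.23 MAX_ARG_PAGES is a compile-time constant (32 in the stock source), and it caps the combined size of the argument and environment strings passed to a new program. The sketch below works through the arithmetic. The 4 KiB page size and the doubled value of 64 are assumptions for illustration; the ticket's requested value is not recorded in these minutes.

```python
STOCK_MAX_ARG_PAGES = 32   # stock value in pre-2.6.23 kernels
ASSUMED_PAGE_SIZE = 4096   # assumption: 4 KiB pages (typical on x86)


def arg_limit_bytes(max_arg_pages: int, page_size: int = ASSUMED_PAGE_SIZE) -> int:
    """Upper bound on the bytes available for argv and environment strings."""
    return max_arg_pages * page_size


def strings_size(argv, environ) -> int:
    """Approximate space needed: each string plus its terminating NUL byte."""
    arg_bytes = sum(len(arg.encode()) + 1 for arg in argv)
    # Each environment entry is stored as "KEY=VALUE" followed by a NUL.
    env_bytes = sum(len(f"{key}={value}".encode()) + 1 for key, value in environ.items())
    return arg_bytes + env_bytes


stock_limit = arg_limit_bytes(STOCK_MAX_ARG_PAGES)  # 131072 bytes (128 KiB)
doubled_limit = arg_limit_bytes(64)                 # hypothetical rebuilt value: 262144 bytes
```

A command line that expands a large file glob can exceed the stock limit, in which case the exec fails with "Argument list too long".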

DIY DICE and VirtualBox
We need a VirtualBox version of the DIY DICE install instructions. Alastair will coordinate this with Tim.

Read access to DICE headers
Corin would like this to help him with DIY DICE. Stephen will adjust the webdav and websvn access for him.

Cron manual method problem
The cron component has a problem with the "manual" method. Stephen will investigate.

Next Meeting

The next meeting will be on Tuesday 28 July.

This Week

Alastair will:

  • Review time management categories.
  • Package scli.
  • Talk to Carol about the LCFG level work.
  • Work on the rpmsubmit project.
  • Propose sign-off of the virtual server project.
  • Talk to George about the routing problem.
  • Publicise the multipath Nagios module.
  • Talk to Tim about a VirtualBox version of the DIY DICE install instructions.

Carol will:

  • Set up LCFG/inf level VM to monitor LCFG level.
  • Move servers (with Chris). DONE

Chris will:

Stephen will:

  • Review time management categories
  • Cron manual method DONE
  • Move nfslock DONE
  • DICE headers read access DONE
  • PXE update
  • AFS test server
  • LCFG refactoring

-- ChrisCooke - 14 July 2009

Topic revision: r5 - 23 Jul 2009 - 10:48:12 - ChrisCooke