MPU Meeting Tuesday 25th May 2010

LCFG Server Refactoring

  • The macro code has been much improved. It was generating bad Perl but it's now been made consistent and safe. The mutation code still needs to be rewritten to be more like the validation code, which is in good shape.
  • Much of the new stuff for the June release is still in gerrit. Final clean-up, finishing-off work and validation still need to be done. Simon has seen most of the work and is happy with it.
  • A lot of $_ use has been cleaned up - relying on the implicit $_ variable isn't recommended, as later changes to the code can have unclear and disastrous consequences.
  • Buildtools really needs git support - this would make it far easier to build the new server code on F12+.
  • It will of course be necessary (and not hard) to make the server work on F12 or above; however there will then be more work involved in making it take advantage of new facilities offered by the newer Perl there.

Server Hardware

Chris is working on a passive nagios check of RAID status. By great good luck Alastair (and Stephen and Simon) have done much of the hard work already: Chris reckons that he more or less just needs to copy lcfg-multipath, whip out the multipath status check and replace it with some MegaCli information gathering. The plan is to check the status of each physical and virtual disk, to associate each possible status with a nagios status (OK, warn or critical) then contact nagios accordingly. Alastair and Stephen both suggest making it a more general hardware monitor facility to which more checks can be bolted on later.
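The status mapping described above could be sketched roughly as follows. This is only an illustration in Python, not Chris's component: the state names in the table are assumptions about what the MegaCli output might contain, the mapping of each state to OK, warning or critical is a guess that would need agreeing, and the helper names are hypothetical. The return codes 0 to 3 and the PROCESS_SERVICE_CHECK_RESULT line are the standard Nagios conventions for plugin results and passive checks.

```python
import time

# Standard Nagios return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Severity order used to pick the "worst" result (least to most severe).
SEVERITY = [OK, UNKNOWN, WARNING, CRITICAL]

# ASSUMPTION: illustrative state names and severities, not a verified
# list of everything MegaCli can report.
STATE_TO_NAGIOS = {
    "online": OK,
    "optimal": OK,
    "hotspare": OK,
    "rebuild": WARNING,
    "degraded": CRITICAL,
    "failed": CRITICAL,
    "offline": CRITICAL,
}


def classify(state):
    """Map one disk state string to a Nagios code; unknown states map to UNKNOWN."""
    return STATE_TO_NAGIOS.get(state.strip().lower(), UNKNOWN)


def summarise(disk_states):
    """Reduce {disk name: state} to a single (code, message) pair."""
    if not disk_states:
        return UNKNOWN, "RAID UNKNOWN - no disks reported"
    codes = {name: classify(state) for name, state in disk_states.items()}
    worst = max(codes.values(), key=SEVERITY.index)
    bad = sorted(f"{name}={disk_states[name]}"
                 for name, code in codes.items() if code != OK)
    detail = ", ".join(bad) if bad else f"{len(codes)} disk(s) healthy"
    return worst, f"RAID {LABELS[worst]} - {detail}"


def passive_result_line(host, service, code, message, now=None):
    """Format a passive service check result as a Nagios external command."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{message}"


if __name__ == "__main__":
    # Hypothetical disk names and states, for illustration only.
    code, message = summarise({"pd0": "Online", "pd1": "Rebuild", "vd0": "Degraded"})
    print(passive_result_line("somehost", "raid", code, message))
```

Keeping the state table separate from the reporting code is one way to leave room for the more general hardware monitor Alastair and Stephen suggest: further checks would only need to produce their own name-to-state dictionary.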

Installroot

It's working for f12_64. Some warts still need to be fixed. We plan to use the new installroot for SL5.5 (since we'll have to build a new one for 5.5 anyway). Alastair has documented his work. Stephen will kill two birds with one stone by using Alastair's new docs to make an SL5.5 installroot. A prerequisite for this is Alastair rebuilding the installroot SRPMs for SL5.

F12

  • The Kerberos host key creation problem has been sorted out with Toby's help. It was failing because the hostname was being set to the short name at install time rather than to the fully qualified name. Once the hostname has been set at install time it can't be changed afterwards. Now that the install process has been fixed, the kerberos component does the right thing.
  • The F12 package lists were sorted out.
  • We need to go through and sort out the "extra" MPU components and set them up for DICE level.
  • Stephen went through and sorted out most of the i386 packages we need for 64 bit.
  • We need to get versions in sync across platforms - they're drifting a bit.
  • MPU and RAT met and sorted out the ed and dice package lists for F12+. The standard package lists now get updates applied. The RAT ones don't.
  • Stephen will give RAT some code to generate package dependency lists without wildcards. RAT can then run it every week to keep those wildcard-free lists up to date.
  • We need to define rules to generate the package lists. We could possibly use yum groups more than we do at present: for instance the desktop list could simply consist of the entire membership of various yum groups (for example for KDE).
  • Chris's Fedora 12 blog entries
  • Stephen's Fedora 12 blog entries
  • LCFG bug tracker top-level bug to simplify tracking the F12 project progress.
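As a rough illustration of what a wildcard-free package list involves, the sketch below expands shell-style wildcard patterns against a list of known package names. It is not Stephen's code: the function name, the pattern style and the sample package names are all assumptions made for the example, and a real version would take its input from the actual package lists and repositories.

```python
from fnmatch import fnmatchcase


def expand_wildcards(patterns, available):
    """Expand wildcard patterns into explicit package names.

    Returns (sorted explicit names, patterns that matched nothing).
    """
    expanded = set()
    unmatched = []
    for pattern in patterns:
        matches = [name for name in available if fnmatchcase(name, pattern)]
        if matches:
            expanded.update(matches)
        else:
            # Keep track of patterns that match nothing so they can be reviewed.
            unmatched.append(pattern)
    return sorted(expanded), unmatched


if __name__ == "__main__":
    # Hypothetical package names, for illustration only.
    available = ["kdelibs-4.4.2", "kdebase-4.4.2", "gnome-panel-2.28", "perl-5.10.0"]
    names, missing = expand_wildcards(["kde*", "perl-5.10.0", "emacs*"], available)
    print(names)
    print(missing)
```

Reporting the patterns that matched nothing, rather than silently dropping them, would make a weekly run a convenient place to spot stale entries.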

F13

Alastair has started work on the F13 LCFG port.

  • Client is done.
  • Updaterpms is done.
  • Alastair is working on the boot component. How to hook it in to Upstart? Upstart underwent a major release change from F12 to F13, and documentation is scant. Alastair is experimenting.
  • Most other things should hopefully be simple, and easily adaptable from the F12 work.
  • Alastair made the F13 repositories. While doing so he encountered a difficulty which seems to stem from the updated Perl AFS module.

Miscellaneous Development

Stephen's lcfg-network patch
Alastair has incorporated, tested and deployed this.
Criticality resource
Stephen has added a resource with which to define machine criticality. It's called sysinfo.criticality and it can be set to low, medium or high.
New SL5 kernels
The new SL5 kernel has gone to stable, along with openafs 1.4.12. This openafs update is important and we should reboot machines for it sooner rather than later. In addition a newer kernel, the one from SL5.5, is now on develop machines and will be rolled out with SL5.5.

Operational

telford disk space
Alastair and Stephen sorted this out but now we need space for the F13 mirror too, so it needs sorting again :-)
bpbeast
It's now in service, providing VM disk space for central. However bpbeast has no dual path fibre channel; Alastair will add this. We should also move metropolitan's VM disk space to bpbeast.
Package volume release
Alastair checked with Craig: the AFS package volumes are not released in the grand AFS nightly release. (Which is good.)
PXE move
Stephen has moved PXE and NFS from tummy to schiff.
Dormant servers
Stephen turned off the test machines blurt, bottle and budapest since we weren't using them.
Power buttons
Stephen investigated the power button handling for self-managed machines and George incorporated his findings on the self-managed server room page.
AMD
Stephen turned off AMD on all MPU servers except the build hosts, with no apparent ill effects.
Goodbye FC5 and FC6
Stephen deleted all remaining FC5 and FC6 LCFG support, giving the largest ever release diff.
SL5.5
SL5.5 is here. Include dice/options/test-updates.h to test it out.
CPP
The package slave headers for apacheconf were wrong. It's not clear exactly when/where a CPP macro will work. Stephen fixed this case by introducing an intermediate file component variable and referring to that rather than directly to a CPP macro.
VM server storage tender
Alastair has gone to tender for storage for the Forum-based VM server hosts. By about the end of June we should have much more and faster storage.
New Servers
We need to discuss our server requirements before putting in an order for new servers. Procurement Scotland HP 2U servers can be bought without a tender; we'll evaluate one and consider how well it would do us for various jobs. Stephen commented that one possible use would be a proper not-a-service server, which we could really do with, to get the current not-a-service machine off his desk.

This Week

Alastair will:

  • work on F13
  • DONE create a develop bucket for SL5 and SL5_64
  • think about new server requirements
  • dual path fibre for bpbeast
  • sort out SRPM web access (new scheme)
  • DONE update installroot related RPMs for SL5

Chris will:

  • work on RAID status monitoring
  • tidy up DICE level MPU stuff on F12
  • move Forumtracker from prague to a VM guest.

Stephen will:

  • Reboot MPU machines
  • Sort out disk space on telford
  • Move tummy to the junk room
  • Look at MPU DICE level components

-- ChrisCooke - 25 May 2010

Topic revision: r5 - 04 Jun 2010 - 13:13:34 - AlastairScobie
 