MPU Meeting Tuesday 16th November 2010

LCFG Server Refactoring

Stalled

Software Build Farm

Stephen's been making header files.

For now budapest will be the master server (the one hosting the database and processing submitted packages) but this won't be the final location.

Also just for now, the AFS tree for the service is in MPU group space.

The master runs as the account pkgforge. Each build daemon will run as a pkgbuild account. This makes it possible to control which bits of the system have access to which bits of the database and AFS. Stephen has worked out some suitable ACLs.

Currently pgident is used to map the build daemons' system user IDs to database IDs. For real users (especially admins), however, Graham's pgluser software will be used. It integrates with roles & capabilities, so it will be easy and convenient to manage.
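As a rough illustration of this kind of ident-style mapping, the sketch below parses lines in the PostgreSQL pg_ident.conf layout (map name, system user name, database user name) and looks up the database user for a system account. The map name buildfarm and both database role names are hypothetical; only the account names pkgforge and pkgbuild come from these notes. Real ident maps also allow quoting and regular expressions, which this sketch ignores.

```python
# Hypothetical map contents. Columns: MAPNAME SYSTEM-USERNAME PG-USERNAME
IDENT_MAP = """
# map name   system user   database user
buildfarm    pkgforge      pkgforge_admin
buildfarm    pkgbuild      pkgforge_builder
"""


def parse_ident_map(text):
    """Return (map_name, system_user, db_user) tuples, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        map_name, system_user, db_user = line.split()
        entries.append((map_name, system_user, db_user))
    return entries


def database_user(entries, map_name, system_user):
    """Return the database user mapped to system_user, or None if unmapped."""
    for name, sys_user, db_user in entries:
        if name == map_name and sys_user == system_user:
            return db_user
    return None
```

An account with no entry in the map simply gets None back, which mirrors the idea that only the listed daemon accounts are allowed a database identity.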

Server Virtualisation

Alastair has applied for a virtual server. The service is still experimental so it may take several days to come through. This made us realise that, once the service is properly up and running, new virtual servers will need to be commissioned within perhaps 24 hours or less for the service to be usable.

Alastair will circulate his suggested virtualisation requirement list to COs and Chris will see if he can add to it.

Alastair has been playing with KVM and has got some support for it into LCFG fairly straightforwardly. The only noticeable problem so far has been that the virt-manager GUI crashes.

There are issues with bridging support so we'll need Panos's patches to the network component.

Disks are presented as vd rather than sd. This puts us under extra pressure to reform the aged fstab LCFG infrastructure sooner rather than later. Grub also doesn't understand vd disks.
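To make the naming difference concrete, here is a minimal sketch of what rewriting sd-style device names to their vd equivalents in fstab-like text might look like. It is only an illustration of the problem, not how the LCFG fstab infrastructure works, and the function names are made up for this example.

```python
import re

# Matches whole-disk and partition names such as /dev/sda or /dev/sdb2.
_SD_DEVICE = re.compile(r"^/dev/sd([a-z]+)(\d*)$")


def to_virtio_name(device):
    """Map /dev/sdXN to /dev/vdXN; leave any other device string unchanged."""
    match = _SD_DEVICE.match(device)
    if match is None:
        return device
    return "/dev/vd{}{}".format(match.group(1), match.group(2))


def rewrite_fstab(text):
    """Rewrite the device field of each fstab entry; comments and blanks pass through.

    Note that rewritten entries are re-joined with single spaces, so column
    alignment in the original text is not preserved.
    """
    out = []
    for line in text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            fields[0] = to_virtio_name(fields[0])
            out.append(" ".join(fields))
        else:
            out.append(line)
    return "\n".join(out)
```

Entries that refer to filesystems by label, UUID or LVM path are untouched by a rewrite like this, which is one reason such references are less fragile than raw device names.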

Alastair and Chris need to meet to apportion the work.

Miscellaneous Development

RHEL 6
Alastair has created a RHEL 6 host. It's called rhel6.
SL6 Port Ideas & Observations
  • SL6 is expected to be released in March 2011.
  • When we port to SL6 (or CentOS 6) we should this time be able to make much more use of external repositories such as EPEL, meaning that we'll need to do a lot less of our own package building.
  • For DICE, we should this time port only what we're asked to, rather than porting everything from the last release.
  • The choice between SL and CentOS may come down to the release date.
  • SL intends to release Alpha and Beta releases as it goes along so we'll have something to work with before the final release.
  • Alastair will raise the issue of choice of RHEL6 clone during his spot at the LCFG Users' Day next month.
Multiple templates for tcpwrappers
Stephen's change to tcpwrappers is working its way slowly through the system. We can now use different templates for hosts.deny and hosts.allow files. This was needed for fail2ban.
Bug in openafs component
Stephen has fixed a long-standing bug in the openafs component. At the end of the install process it would start openafs, then change the cell file to inf.ed.ac.uk, then reboot. However, this reboot is unnecessary if the daemon has not yet been started. The surplus reboot has now been eliminated, so that's one fewer reboot for our DICE installs.
Kenny's installroot patch
Kenny has provided a patch for the installroot (LCFG bug 349). Stephen suggests the possibility of querying the release ID from the sysinfo resources in the installroot profile. In addition, the installer images are really only changed once or twice a year, so this perhaps isn't a big problem.

Operational

Dell memory replacement
Dell came through with the new memory very quickly. Alastair has replaced the memory in metropolitan. It now boots happily. Chris will do the same for northern and piccadilly.
R710 PXE bug
While working on metropolitan Alastair encountered a PXE bug: it's not possible to respond to the PXE prompt on R710s. Stephen says that it's possible to get around this by setting a default PXE response with a pxeclient resource.
KB Sauce
Chris has installed sauce at KB, but enabling bonding broke its networking. The machine is up and running, though with broken networking, if anyone wants to take a look.
Panos's patch
Alastair will take a look at this. He needs to check whether the bridge support will clash with other features such as bonding support.
Old SAN Space
The Services Unit has been told that it can reclaim our old SAN space.
Package Forge Architecture
Stephen has made a start on the Package Forge architecture description.
bakerloo to metropolitan
Chris has moved two buildhosts and lcfgtest from bakerloo to metropolitan. Two MPU machines still need to be moved.
ashkenazy
Its passive nagios monitoring should be removed. This should fix its broken package list.
bakerloo ping failures
It keeps failing its nagios ping test then succeeding a few seconds later. Chris remembers something similar happening with at least one RAT machine; he'll find out which one.
exam branches
We just wanted to note that when we make branches for lab machines, especially for exams, we base them on a stable release then alter them as we need to. We try never to use the develop release for user-facing machines when we can avoid it.
f13 without desktop header
Stephen has made a start on further tidying of the package lists to make f13 work without the desktop header.
refreshpkgs fixed
Stephen has fixed it: it now starts after AFS.
Bugs list
Gordon has been looking at outstanding LCFG bugs. Stephen and Alastair gave him some ideas on how to sort them into categories and check them.
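One possible way to do this kind of sorting is to tag each bug with a category and write the result out as CSV. The sketch below assumes each bug is a small record with a component field; the category names, the component-to-category table and the sample bug are all hypothetical and are not taken from the real bug list or package lists.

```python
import csv
import io

# Hypothetical mapping from bug component to category.
CATEGORY_BY_COMPONENT = {
    "openafs": "filesystems",
    "network": "networking",
}


def add_categories(bugs, categories, default="uncategorised"):
    """Return copies of the bug records with an added 'category' field."""
    return [
        dict(bug, category=categories.get(bug["component"], default))
        for bug in bugs
    ]


def to_csv(rows, fieldnames):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Bugs whose component is not in the table fall into the default category, which gives a ready-made list of items still needing a manual look.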
LCFG bugzilla defaults
Chris will alter the default severity of LCFG bugs to something more sensible (less severe).

This Week

  • Alastair
    • Circulate virtualisation requirements to COs
    • Experiment with guest on IS virtualisation service (stalled on IS providing VM)
    • Meet with Chris to discuss virtualisation project
    • pkgsubmit patch
    • Panos's network component patch (for bridge interfaces) - integrate, test and ship
    • Apply Kenny's installroot patch - integrate, test and ship
    • Set default pxeclient option for metropolitan
    • Respond to RT 50730
    • Close off old RT tickets

  • Chris
    • Memory and IPMI for northern and piccadilly
    • Fix sauce
    • Move guests from district to metropolitan
    • Check virtualisation requirements
    • With Stephen, move central to AT
    • Meet with Alastair to discuss virtualisation project
    • RAT stuff

  • Gordon
    • Produce CSV file with outstanding bugs and add column with LCFG categories (as in @lcfg_f13_lcfg.rpms) and circulate.

  • Stephen
    • Change ownership of Paul bugzilla tickets to nobody
    • With Chris, move central to AT
    • Tidy nagios monitoring of ashkenazy
    • Look at bakerloo nagios monitoring issue
    • Buildfarm project

-- AlastairScobie - 16 Nov 2010
-- ChrisCooke - 19 Nov 2010

Topic revision: r10 - 20 Nov 2010 - 18:48:37 - ChrisCooke
 