MPU Meeting Tuesday 14th September 2010
LCFG Server Refactoring
Stalled.
Software Build Farm
The project has started.
Stephen had a prototype already running so the work consists of steadily finalising various bits of it.
He now has a working basic client tool which lets you submit build jobs.
The architecture: one machine runs a daemon or cron job (yet to be decided) which looks at the queue of submitted build jobs, validates them, and moves the valid ones to an accepted queue, registering those jobs as being required. Daemons on each build machine then look for jobs to run.
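As a rough illustration of the two-queue flow described above, here is a minimal Python sketch. It is not Stephen's implementation: the BuildJob fields, the validation rule, the platform names and the function names are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Set


@dataclass
class BuildJob:
    # Hypothetical job record; the real submission format is not described in these notes.
    job_id: int
    package: str
    platform: str


def is_valid(job: BuildJob, known_platforms: Set[str]) -> bool:
    # Assumed validation rule: a job needs a package name and a known platform.
    return bool(job.package) and job.platform in known_platforms


def validate_submitted(submitted: Deque[BuildJob],
                       accepted: List[BuildJob],
                       known_platforms: Set[str]) -> List[BuildJob]:
    """Central daemon or cron job: drain the submitted queue, moving valid
    jobs to the accepted queue. Returns the jobs that were rejected."""
    rejected = []
    while submitted:
        job = submitted.popleft()
        if is_valid(job, known_platforms):
            accepted.append(job)
        else:
            rejected.append(job)
    return rejected


def claim_next(accepted: List[BuildJob], platform: str) -> Optional[BuildJob]:
    """Build machine daemon: take the oldest accepted job for this platform,
    or return None if there is nothing to run."""
    for index, job in enumerate(accepted):
        if job.platform == platform:
            return accepted.pop(index)
    return None
```

In this sketch the queues live in memory; a real deployment would need shared, persistent storage so that daemons on several build machines cannot claim the same job.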
Installroot
Alastair has deployed the F13 installer built on F13. The release scripts have been modified. He needs to tidy the installroot and PXE documentation to reflect changes.
F13
We'll need f13_64 at some point. Stephen reckons that the core of this would be two days' work. The RAT packages would also need effort on top of this, but thanks to Iain's work most of them will just need to be built and submitted.
Alastair will mark priorities against each thing still to be done so we can order them appropriately.
Miscellaneous Development
- New storage array
- There are two problems:
- Management
- Alastair couldn't get the management application to send emails or SNMP traps on critical events such as power failure. The management application runs on Linux. It fails on DICE SL5 and on native SL5. He's not too worried though, as we can also monitor the array through scripting, initially with a cron/email combination and then with proper Nagios checks.
- Multipathing
- Alastair tested the multipathing and found that you need a proprietary additional driver to do failover to an additional controller. This driver needs to be loaded in the initrd. Alastair tried the vendor's initrd and couldn't get it to work. However, simple multipathing does work, and if a controller fails we can manually move to another controller. Things can be set up so that to do this you just need to enable extra switch ports.
Operational
- DL180s
- Alastair and Ian have checked them out.
- Ian has IPMI Serial Over LAN working (although it needs tidying and documenting).
- Alastair has bought an extra Ethernet card for (HP) sauce. Since eth0 is being used for the (IPMI SOL) serial console, Ethernet bonding is done over eth1 and eth2.
- Alastair will try an install with SOL to make sure it works.
- Alastair has rolled out a new lcfg-fstab which supports cciss controllers. He's changed the scsiroot header so you don't have to reference cciss directly. You instead use sda in your fstab resources as normal and the component translates these for cciss devices.
- The HPs are better than Dells in some ways:
- You can do firmware upgrades (e.g. RAID, BIOS) on a running machine without disturbing it, then just reboot to use the new firmware.
- There's also a better and more Linux-friendly RAID controller application.
- The HPs have health monitoring (small, tidy RPMs - not like OMSA) which would be easy to connect to Nagios.
- We'll now deploy sauce at KB. It's to be the MPU's DR machine there.
- We'll need to add Nagios support for the HP monitoring. (This is now in the Wee Projects list.)
- Nagios RAID monitoring time interval
- Chris pointed out that the Nagios passive check of RAID status happens every minute. It could be every 15 minutes, lessening the load on the Nagios server. Chris will make the change.
- Installroot DHCP problems
- The new installroot works with our DHCP server but not with some others. Alastair will produce a suitable fix.
- Thunderbird
- Alastair and Iain have sorted something out and Iain's putting it in service this week.
- Metropolitan VM storage
- Chris tidied up the mess in metropolitan's blob.
- Proper login screens
- Alastair will tackle this. The most important thing is perhaps to see what RAT need for the exam environment.
- Stable release
- We had package conflicts when *-* accidentally escaped into a stable release. We need to tweak the release testing procedures to outlaw *-* in package lists. RAT noticed that it also occasionally appears in headers. This can be allowable in exceptional circumstances but is discouraged, except when removing software.
- RAT package list troubles
- RAT reported to us that it has been having trouble with its package lists:
- fixed vs floating
- some packages need to have fixed versions but others don't, yet the current package list arrangement forces all packages to be fixed. Stephen suggests splitting the RAT lists into separate "fixed" and "floating" lists and loading the "floating" ones after package updates have been applied. Perhaps the teaching packages would tend to be in the "fixed" category.
- devel
- Packages needed purely for building other packages - e.g. in BuildRequires in a spec file - can be put into dice_f13_devel. This will make them available on COs' and build machines but not elsewhere. This should cut down the conflicts and complication we sometimes experience with these packages.
- perl test modules
- dice_f13_devel is also good for perl test modules: they're useful to have available but not on every machine.
- Gordon
- Gordon will join the MPU for the rest of 2010.
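The release-testing tweak noted under "Stable release" above (outlawing *-* in package lists except when removing software) could be sketched roughly as follows. This is only an illustration under assumptions: it supposes one package entry per line, that a leading "-" marks a removal, and that lines starting with "#" or "/*" are preprocessor directives or comments. The function name find_wildcards is hypothetical and is not part of any existing release script.

```python
from typing import Iterable, List, Tuple


def find_wildcards(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, entry) for each entry that uses a *-*
    version-release wildcard and is not a removal."""
    hits = []
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        # Assumption: skip blank lines, cpp-style directives and comments.
        if not entry or entry.startswith(("#", "/*")):
            continue
        # Assumption: a leading "-" means the package is being removed,
        # which is the one case where *-* is tolerated.
        if "-*-*" in entry and not entry.startswith("-"):
            hits.append((number, entry))
    return hits
```

A release test could call this on each package list and refuse to promote the release if it returns anything.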
Projects for the last third of 2010
The project list has been finalised.
On the subject of the VMware Server replacement, Alastair saw a presentation from Graeme Wood about IS's virtual server service. The service looks quite good and it may well be possible for us to use it. Points:
- You need to use Windows for access to the console and for management. IS are willing to look at providing a Windows terminal service.
- They'll carry our VLANs.
- Live migration will be available.
- They'll have live replication available within the year - disk and live memory - for instant switchover.
- The initial price seems high but they're talking of reducing it. For what you get perhaps it's not so high.
- The primary service will be at KB with replication infrastructure at AT. This may have network implications, but only for virtual servers which use the network an awful lot.
This Week
Alastair will:
- Finish off the HP DL180 support
- Finish off the IBM storage array support
- Assign priorities to F13 tasks
- Tidy the installroot and PXE documentation.
Chris will:
Stephen will:
- work on the Software Build Farm
- finish putting F13 onto the LCFG website
- finish the Package list test scripts
-- ChrisCooke - 17 Sep 2010