MPU Meeting Tuesday 23rd October 2012

Server Upgrades

Chris had a few problems with the Bugzilla support for multiple instances on a single host. To avoid spending a lot of extra time on new development he has focussed on getting a single Bugzilla instance configured for LCFG. Various configuration changes need to be merged back into the LCFG headers. We need to devise a plan for the upgrade and schedule a day for it to be carried out. We agreed that the simplest way forward was to put it onto a separate VM, at least for now.

Stephen will look at upgrading budapest which hosts and in November.

Server Hardware

There is now an rfe map named goodfirmware which maps server models to the available firmware update versions and files.

There is also a web interface on the ordershost server which lists the machines that require firmware updates; this needs packaging. Much more information will be available once the package is installed on all servers as part of the next stable release.

Chris has now worked out how to detect the DRAC using IPMI but not how to detect the particular version of its firmware.

Security Enhancements

The scanning for "suspicious files" in temporary directories was disabled. This feature conflicts with tmpwatch which is used to regularly clear out old files based on the access time. Each daily scan of the temporary directory updated the file access time so nothing was ever being removed.
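One way a scanner could avoid this conflict is to put the timestamps back after reading each file. The following is only a minimal Python sketch of that idea, not the scanner that was in use here; the function name read_preserving_atime is hypothetical.

```python
import os


def read_preserving_atime(path):
    """Read a file's bytes, then restore its original access and modification times."""
    before = os.stat(path)  # stat() itself does not update the access time
    with open(path, "rb") as handle:
        data = handle.read()
    # Reading may have bumped the access time; put both timestamps back so
    # that age-based cleaners such as tmpwatch still see the old values.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return data
```

Note that restoring the times with utime changes the inode change time (ctime), so this only helps when the cleaner is working from the access time. On Linux, opening the file with os.O_NOATIME is an alternative, but it only works for files owned by the calling user or when running with suitable privilege.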

Stephen has been working on adding daily reports which give details of SSH authentication failures so that we can try to spot attacks. He also found an issue with copying data from syslog entries into PostgreSQL varchar fields which have fixed sizes. The data was not being truncated when it was too long which caused the data import scripts to fail.
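For illustration, the truncation step can be as small as the Python sketch below. This is not the actual import script; the function name fit_varchar and the limit of 255 in the usage note are assumptions made for the example.

```python
def fit_varchar(value, limit):
    """Return value cut to at most `limit` characters, the unit varchar(n) counts in."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # Slicing a Python str counts characters rather than bytes, which matches
    # how PostgreSQL measures the length of a varchar(n) value.
    return value[:limit]
```

An import script would call something like fit_varchar(message, 255) before building the INSERT. PostgreSQL rejects an over-length value for a varchar(n) column with an error unless the excess characters are all spaces, which is why the truncation has to happen on the client side (or via an explicit cast to varchar(n), which truncates silently).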

The collection of audit logs on a remote host and the higher frequency monitoring for critical issues is being tested on juice with the data coming from the test SSH server shrew.

Stephen is currently working on a component to configure the AIDE (Advanced Intrusion Detection Environment) software. This is mostly done but needs documentation. It will then take a while to try various configurations so that we can get something suitable for our servers.


Nothing happened.

Miscellaneous Development

IBM storage array
The firmware has now been upgraded. This gives us the option of enabling "active-active" mode on the controller and having load-sharing and auto-failover. This results in there being 4 paths, two of which are "ghost" paths which are understood by multipath but not by everything else. In particular it is necessary to filter the devices list for LVM so that it ignores them. When we are only using fibre-channel based storage this is simple, but when local storage is also being used we need to know the UUID. There is also the problem that when making these changes the initramfs needs to be rebuilt using dracut. Alastair attempted to add support for this to the LCFG kernel component but hit problems which appear to be related to separate usage of the RPM2 Perl module. None of this fixes the issue of udev doing unfiltered probes at boot time. We now get a lot of error messages at boot time; the machine actually boots correctly, but the noise makes it much harder to spot any real issues. This raises the question of whether we could use a modified udev configuration on our servers, although we have no idea how easy this would be.
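As a reminder of how the LVM devices filter behaves, the Python sketch below models the rule from lvm.conf: each pattern is a regular expression prefixed with "a" (accept) or "r" (reject), the first pattern that matches a device decides, and a device that matches nothing is accepted. It is an approximation only (Python's regular expressions are not identical to LVM's), the function name lvm_filter_accepts is hypothetical, and the device paths in the usage note are made-up examples rather than our real configuration.

```python
import re


def lvm_filter_accepts(device, patterns):
    """Model LVM's filter rule: the first matching pattern decides; no match means accept."""
    for pattern in patterns:
        # A pattern looks like "a|regex|" or "r|regex|"; the delimiter can be any character.
        if len(pattern) < 3 or pattern[0] not in ("a", "r") or pattern[-1] != pattern[1]:
            raise ValueError("malformed filter pattern: %r" % pattern)
        regex = pattern[2:-1]
        if re.search(regex, device):
            return pattern[0] == "a"
    # No pattern matched, so the device is accepted by default.
    return True
```

With the hypothetical list ["a|^/dev/mapper/mpath|", "r|^/dev/sd|"], a multipath device such as /dev/mapper/mpatha is accepted and the underlying /dev/sdX paths (including the ghost paths) are rejected. Because the order matters, putting a catch-all reject first would hide every device.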

network component
Alastair fixed an issue with the network component templating which Stephen had introduced when he altered the schema some time ago to add boolean validation to a resource.

KVM and conserver
Ian Durkacz has set up conserver so that we can use it for KVM domains. This relies on each console server machine being able to access the KVM servers via SSH as the conserver user.

LCFG logserver
Whilst doing some work on the security project Stephen noticed that the LCFG logserver hangs on to deleted files. When checking the code he noticed that most of the temporary file handling was quite nasty and in some cases unnecessary. He has improved all the temporary file handling code but this has not fixed the deleted file bug.
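For anyone wanting to check for this kind of problem, the Python sketch below lists the deleted files a process still has open by reading the symlinks under /proc/<pid>/fd, where Linux appends " (deleted)" to the target of a removed file. It is a diagnostic sketch only, it is Linux-specific, it is unrelated to the logserver code itself, and the function name deleted_open_files is hypothetical.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid):
    """List (fd, path) pairs for files a process holds open after deletion (Linux /proc only)."""
    fd_dir = "/proc/%d/fd" % pid
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.endswith(DELETED_SUFFIX):
            found.append((int(name), target[: -len(DELETED_SUFFIX)]))
    return sorted(found)
```

Inspecting another user's process this way needs root. The disk space used by such files is only released once the process closes the descriptor or exits.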

LCFG ngeneric
The Perl version of the ngeneric code (LCFG::Component) now provides access to the _METHOD information in the same way as the shell code does.

updaterpms and redirects
Thanks to a patch from Kenny MacDonald updaterpms will now correctly follow http redirects.


The KVM server now has fibre-channel access for testing so that we can deploy metropolitan.

KVM server configs
We now have 8 KVM servers and lots of common configuration is in the LCFG source profiles. This should be moved into an MPU KVM header (i.e. not the general config header).

dice installbases
It has become clear that when we are doing a transition from one minor release to another (e.g. SL6.2 to SL6.3) we need to provide dice installbase profiles for both minor releases to allow easy installs of either.

New LCFG slaves
The LCFG slaves have moved to KVM and are named bol and metsu.

KVM domain criticality
Can we use the sysinfo.criticality resource to show which guests need to be live-migrated to another server before a hardware reboot and which can just be suspended for the duration? We should take this to the next Operational Meeting.

This Week

  • Alastair
    • kernel component - try using require rpmlib instead of use in each function using rpmlib calls
    • Document ssh keys mgmt - home page and windows
    • Check with Dave Aspinall and Henry Thomson re openid - Neither using it
    • Meet with Chris
    • Systems blog article on the KVM Service
    • do something about KVM server for end-users
    • Report libvirt empty LVM group issue to Redhat, unless fixed in 6.3
    • create an MPU KVM server header
    • rename metropolitan default bridge to br0
    • chase wrt T2 project paragraphs - Chased
    • merge release with platform bucket, create new virtualisation bucket

  • Chris
    • Schedule LCFG bugzilla upgrade (on a VM)
    • Package up server hardware project
    • Empty bakerloo
    • Document ssh keys mgmt - macos
    • Personal development topics
    • Meet with Alastair
    • Gently nag remaining vmware users

  • Stephen
    • Take perl-AFS issue to operational meeting
    • Finish Aide component
    • Document ssh keys mgmt - linux
    • Tidy out testing defaults
    • LCFG master -> schiff
    • Hogwood -> SL6.3
    • Office desktops -> SL6.3
    • LCFG layer -> SL6.3

-- AlastairScobie - 23 Oct 2012

Topic revision: r12 - 30 Oct 2012 - 09:32:48 - StephenQuinney