MPU Meeting Tuesday 24th January 2012

AFS automation project

Going for Sign-Off.

LCFG Server Refactoring

The final phase of testing using the MDP LCFG server has now been completed. This brought to light a couple more bugs (bug#521, bug#522) which have been resolved. The name of the package for the server code has been changed to LCFG-Compiler to avoid confusion between the old name (LCFG-Server) and the LCFG component package (lcfg-server). Hopefully this doesn't break anything but we should check on the new DIY DICE server.

Wake-On-Lan

Chris has been working on the project report in preparation for requesting sign-off. He has added a service catalogue entry for the web interface.

Stephen suggested that it would be interesting to check the apache logs to see how many users are accessing the web interface each day to assess the popularity of the new service.
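A minimal sketch of how the daily user count could be pulled from the apache logs. It assumes the access log is in the standard Common/Combined Log Format, and it identifies a "user" by the authenticated user name where one is logged, falling back to the client address otherwise. The function names are hypothetical, and the log location would need to be checked on the actual server.

```python
import re
from collections import defaultdict

# Common Log Format prefix: host ident authuser [day/month/year:time zone] ...
LOG_LINE = re.compile(r'^(\S+) \S+ (\S+) \[(\d{2}/\w{3}/\d{4}):')


def daily_users(lines):
    """Map each date (as written in the log) to the set of distinct users seen."""
    users = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        host, authuser, day = match.groups()
        # Prefer the authenticated user name; fall back to the client address.
        users[day].add(authuser if authuser != "-" else host)
    return users


def daily_user_counts(lines):
    """Return {date: number of distinct users} for the given log lines."""
    return {day: len(names) for day, names in daily_users(lines).items()}
```

Passing an open log file to daily_user_counts gives one count per day, which can then be watched over a few weeks to see whether usage of the web interface is growing.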

Alastair noted that he can wake up his HP DC7900 which has the latest SL6 kernel installed. It has BIOS revision 1.16 which is not the latest available. He will try it again without the add-on ATI graphics card installed to see if that makes any difference.

Simple KVM Service

We now have a KVM server in the Forum (bakerloo) which has the old bioboy storage array directly attached. Alastair needed to make some multipath configuration changes to get this working, so he finished off the deployment of the LCFG mpath component written by Gordon. He noted that the multipath and mpath components should be merged. The multipath failover still needs to be tested. Because the array is directly attached and neither the bioboy nor the FC card supports disabling a port, he will have to physically pull out a fibre to see what happens.

We will now recommend bakerloo to other COs as the best KVM server for important VMs and keep northern for the less important testing and development VMs.

Chris will move the wake.inf service to a new SL6 VM running on bakerloo to remove another VMWare VM.

Server Upgrades

The DIY DICE service has been upgraded to SL6 and is now on sernabatim which is a VM running on the KVM server northern. Alastair mentioned that it should be added to the list of VMs for northern on the MPU wiki.

We can also upgrade the inf-level LCFG server (currently on ashkenazy), which is used for weekly testing, to SL6. Chris will install a new VM running SL6 and Stephen will then check that the two servers are generating identical profiles before we completely switch to SL6.
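One way the profile comparison could be done is sketched below. It assumes the profiles from each server have been copied into two local directories and that a byte-for-byte comparison is meaningful; if the generated profiles embed timestamps or server-specific fields, those would need to be normalised first. The function names are hypothetical and this is not the actual procedure used.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Return {relative path: SHA-256 hex digest} for every file under root."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_profiles(old_root, new_root):
    """Report profiles missing on either side or whose contents differ."""
    old, new = digest_tree(old_root), digest_tree(new_root)
    common = old.keys() & new.keys()
    return {
        "only_old": sorted(old.keys() - new.keys()),
        "only_new": sorted(new.keys() - old.keys()),
        "different": sorted(name for name in common if old[name] != new[name]),
    }
```

If all three lists in the result are empty, the two servers produced the same set of profiles with identical contents.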

We can now begin the upgrades of our slave servers to SL6. Chris will do mousa first, at some point after the stable release for this week. We will then wait a couple of weeks and do more profile comparison checks to ensure all is well before upgrading trondra.

Alastair has upgraded the VM used for monitoring the IBM storage array to SL6. It previously had to be a self-managed machine because it was not easy to install the huge Java-based software onto a DICE SL5 machine. With the much smaller SL6 server installations it is now simple to get the monitoring software installed. The new VM is named giz (the previous one was zig) and is running on northern. It now benefits from all the standard DICE nagios monitoring so we will know if there are any problems.

Miscellaneous Development

check_multipath
The nagios multipath check script has now been updated to support the new --stdout option. Whilst making the change Chris also added in a random delay before sending OK messages to make it work in the same way as the other passive check scripts. Hopefully this will help smooth out the load on the nagios server.
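The random delay idea can be illustrated with a short sketch. This is not the check_multipath code itself: the function names, the 60 second maximum and the choice to send non-OK results without any delay are all assumptions made for illustration.

```python
import random
import time


def splay_delay(status, max_delay=60.0, rng=random):
    """Return how many seconds to wait before submitting a passive result."""
    # Assumption: only OK results are delayed; problems are reported at once.
    if status != "OK":
        return 0.0
    return rng.uniform(0.0, max_delay)


def submit_with_splay(status, send, max_delay=60.0, sleep=time.sleep, rng=random):
    """Wait a random time (for OK results) and then hand the result to send()."""
    delay = splay_delay(status, max_delay, rng)
    if delay > 0:
        sleep(delay)
    send(status)
    return delay
```

Because each client picks its own delay, OK results from many machines running the check at the same scheduled time arrive spread over the delay window rather than all at once.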

pam_console
It seems that the pam_console support has never worked on SL6. This is because a vital piece of configuration, which was the default on SL5, was not set. The LCFG auth component has been updated to manage the /etc/security/console.handlers file to fix this issue.

USB device permissions
When some USB devices are connected the user requires certain permissions to be set on the device file. This has to be done using udev but we only want certain permissions to be set on certain devices and we would like them to be reset when a user logs out. The best solution appears to be to have an action triggered by udev to run the pam_console_apply script for the new device when it is connected.

Operational

SL6 PXE installer
We need to check if we are using the latest SL6 PXE installer.

CO clinic
There will be a CO clinic on Thursday 26th January, 10 until 11. Stephen does not usually work on Thursday mornings so Chris will be covering.

localhome for servers
Chris has made some changes to the localhome header to make it more useful. He has also added a wiki page to collect the list of servers which will use localhome. We can start switching the MPU servers after the next stable release.

SL6 kernel bug
The SL6 kernel on develop machines has a rather nasty local root exploit (CVE-2012-0056). See this blog article for full details. As it is a local exploit we are not concerned about it being on develop machines which are only accessible by COs but we should not use it on any multi-user machines. Roll-back to the previous kernel version is possible if necessary.

dice/options/virtual_hardware.h
We should look at whether it is a good idea to move the contents of the dice/options/virtual_hardware.h header down into the lcfg level as most of it is generally applicable. Stephen will raise the issue at the LCFG Deployers Meeting on Thursday.

This Week

  • Alastair
    • kvm project - virsh wrapper
    • Try dc7900 wol with ATI card removed
    • Test bakerloo path failure
    • Merge mpath with multipath
    • Arrange figgy (with replacement RAID card) to go to KB
    • Ask Alison to sort out the spare memory - antistatic bags etc
    • Discuss with George - how to rack desktop servers in AT and IF

  • Chris
    • CO Clinic on Thursday
    • Upgrade mousa to SL6 with new compiler
    • Move openldap-bdb-locks-increase from LCFG server profiles to LCFG slave header
    • Replace ashkenazy with new KVM based SL6 server (on northern) - and compare profile generated with that generated by ashkenazy
    • Finish WOL writeup
    • Continue work on localhome dirs for servers
    • PD - local url shortener implementation (for not-a-service)

  • Stephen
    • Update the PXE installroot to use the latest kernel too.
    • Raise lcfg/options/virtual-hardware at LCFG deployer
    • Consider PD - concrete task
    • RAT

-- AlastairScobie - 24 Jan 2012

Topic revision: r8 - 06 Feb 2012 - 16:42:36 - AlastairScobie
 