MPU Meeting Tuesday 6th September 2011

AFS automation project

Nothing happened.

LCFG Server Refactoring

Simon reviewed the changes which add some basic tests. Stephen has written a patch to add support for a CPP macro which specifies the LCFG server API version. This will make it easier to create additional macros which work with both the new and old versions of the mutators.

Install Scripts

Chris has been reviewing the responses to his message to Mac users. There is a complete spectrum of responses, from some people who would like us to manage the whole thing through to others who prefer to do it all themselves. Some users liked the idea of us providing configuration scripts as long as they could read them first, so they knew what was being done and, if necessary, would be able to reverse the changes. Two topics which were mentioned a lot were printing and AFS/Kerberos. A recurring theme was the request for better documentation, preferably discoverable through a single location.

Chris has also been investigating how to automate various tasks. He will now begin writing up the report for this project as the aim was just to do an evaluation. Once the report is complete Chris will blog about it so that users can see the results of the project.

Simple KVM Service

Alastair has tested running many concurrent KVM instances on circle. He has got it up to 20 instances, with the load at only 0.03 and memory usage of about 14GB. We need a basic machine-creation command-line script which can take a template, select a UUID and MAC address and generate a VM. For more complicated configurations it would be possible to use the graphical virt-manager tool. We also need some local documentation which pulls together all the information required by COs to start installing their own VMs.
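As a starting point for discussion, the core of such a machine-creation script might look like the following Python sketch. It only fills in a template with a fresh UUID and MAC address; it does not talk to libvirt. The template contents, the bridge name br0 and the function names are all assumptions made for illustration, not part of any existing tool. The 52:54:00 MAC prefix is the one conventionally used for QEMU/KVM guests.

```python
import random
import uuid
from string import Template

# Hypothetical, deliberately incomplete libvirt-style domain template.
# A real template would also describe memory, CPUs, disks and so on.
DOMAIN_TEMPLATE = Template("""\
<domain type='kvm'>
  <name>$name</name>
  <uuid>$uuid</uuid>
  <devices>
    <interface type='bridge'>
      <source bridge='br0'/>
      <mac address='$mac'/>
    </interface>
  </devices>
</domain>
""")


def random_mac(rng=random):
    """Return a random MAC address within the 52:54:00 QEMU/KVM prefix."""
    tail = [rng.randint(0x00, 0xFF) for _ in range(3)]
    return "52:54:00:" + ":".join(f"{octet:02x}" for octet in tail)


def new_uuid():
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def render_domain(name, template=DOMAIN_TEMPLATE):
    """Fill the template with the VM name plus a fresh UUID and MAC address."""
    return template.substitute(name=name, uuid=new_uuid(), mac=random_mac())


# Example use (the VM name is made up):
#   print(render_domain("testvm1"))
```

A real script would also need to check that the chosen MAC address and UUID are not already in use, which this sketch does not attempt.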

Work has been done on the LCFG network component to add support for specifying a verbatim configuration. This has not yet been deployed to the develop machines. To allow maximum testing time before a stable release is made, the new version will be added after the next testing release has been made on Monday.

Miscellaneous Development

Light-weight install problem with DNS
Alastair has discovered a problem on light-weight SL6 server installations that results in the LCFG client not getting updates. It is caused by the DNS zone transfer not having completed before the installation of the packages has finished. With a full desktop install this is not an issue, as it takes a long time to install all the packages. We now need to be able to wait for the zone transfer to complete before we can safely reboot. The issue is particularly bad for the LCFG client because, once it has failed to look up the aliases lcfg2 and lcfg4, it never recovers. The problem could also affect other daemons which need to do DNS lookups at start-time.
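One possible approach, sketched below in Python, is to poll until the required names resolve before allowing the reboot. Note that this checks name resolution as a stand-in for "zone transfer complete" rather than querying the transfer state directly. The function name, timeout and polling interval are assumptions chosen for illustration; only the aliases lcfg2 and lcfg4 come from the problem described above.

```python
import socket
import time


def wait_for_dns(names, timeout=300.0, interval=5.0,
                 resolve=socket.gethostbyname,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until every name resolves; return True on success, False on timeout.

    The resolve, sleep and clock parameters exist so the function can be
    exercised without touching the real resolver.
    """
    deadline = clock() + timeout
    pending = list(names)
    while True:
        unresolved = []
        for name in pending:
            try:
                resolve(name)
            except OSError:  # socket.gaierror is a subclass of OSError
                unresolved.append(name)
        pending = unresolved
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example use before rebooting at the end of an install:
#   if not wait_for_dns(["lcfg2", "lcfg4"]):
#       raise SystemExit("DNS not ready, refusing to reboot")
```

Names which have already resolved are not retried, so a slow zone transfer only costs repeated lookups for the names still missing.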

webots
We need webots to run on SL6 but it does not work at all on machines using the Intel graphics chip; only the 7900s with ATI cards can cope. Even on ATI the webots software doesn't always work properly, so we need to look into whether the closed-source proprietary version of the driver will work any better. This is very urgent as we need a solution before the first semester begins.

LCFG kdm
The LCFG kdm component now has an option to restrict who may do a shutdown; this will be useful for the meeting room machines. The component has also been removed from the lcfg/defaults.h header and added to lcfg/options/desktop, which will avoid it being installed, but left in a stopped state, on minimal/server installs.

SL6.1
All DICE and MDP SL6 machines have now been upgraded to SL6.1 and the PXE installer has been updated. To finish the transition, all the SL6.0 package lists and headers have been deleted to avoid confusion. There was some discussion at the LCFG Deployers meeting on how to make these transitions smoother. Next time we plan to allow access to both the current and previous minor-release package repositories for updaterpms. We also plan to switch ed_sl6_env.rpms to a minor-release dependent version. There is also some hope that GeoSciences will produce a script which uses yum to generate package lists which have the dependencies automatically solved. This could help a lot, particularly with the options lists.

lcfg-nagios problem
There is a problem with using the RemctlSend module on SL6. Stephen has taken a look at the code and it appears to be wrong, but it previously worked on SL5 because Moose was less strict in how it handled attributes. The documentation for the module was also misleading. A patch which fixes both issues has been reviewed by Simon and submitted.

LCFG fstab
All code changes needed to support GPT and mount-by-UUID have been made. We now need these to be the default on SL6.

Operational

openafs 1.6.0
The first stable release of openafs 1.6.0 has now been made. Simon has built packages so it should be fairly easy to get it installed everywhere. Before we can have 1.6.0 as the default version we need to focus on getting the Perl AFS module building and working on x86_64 machines. Simon has some patches which should help get it building. We also need to review all our local patches.

apache update
There is an important update for apache. This has now been installed almost everywhere and the daemon restarted.

circle memory
It looks like circle has the same issue with dodgy memory that has affected some of our other servers. Alastair will report the problem to Dell.

bakerloo RAID battery
bakerloo is reporting that it has a duff RAID battery. Alastair will investigate.

sleep custom setup
Chris has created a custom sleep configuration for a user. We agreed that it is useful to find out how users want or expect the sleep component to work, but we don't want to create custom configurations too often. In most cases a user should be able to use om to disable and enable the sleep component as necessary for long-running jobs and suchlike. They can also use the cron example to deactivate the sleep component during their normal working hours.

This Week

  • Alastair:
    • Talk to Graham Newton about BIOS settings
    • Set SL6 to use GPT and mount-by-UUID as default
    • Read DR docs
    • Update network component after next testing release
    • KVM project - machine creator script and docs
    • circle - Report memory problem
    • bakerloo - Investigate RAID battery issue
    • Discuss DNS problem with George

  • Chris:
    • Finish off the submit project
    • Time Figures
    • Start report on install scripts project
    • Upgrade northern BIOS

  • Gordon:
    • alternatives component

  • Stephen:
    • Time Figures
    • Look at ATI proprietary graphics driver
    • LCFG refactor project

-- StephenQuinney - 06 Sep 2011

Topic revision: r5 - 12 Sep 2011 - 15:52:41 - AlastairScobie
 