MPU Meeting Tuesday 29th May 2012

Simple KVM Service

No activity.

Server Upgrades

Chris tested sauce's various functions and found two problems with the RPM slave functionality:
  1. The web address redirects to a different address, which is unhelpful as that address doesn't serve packages! Stephen suggested looking for a missing chunk of configuration.
  2. The other address does serve packages, but the sites directory is inside the rpms directory instead of at the same level. Stephen said that this was intentional. Since our Disaster Recovery instructions say simply to replace cache.pkgs with dr.pkgs, something needs to change; Chris will try planting a symlink so that the current structure works with the documentation.
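A minimal Python sketch of the symlink workaround. The layout is an assumption: it supposes a package tree whose (hypothetical) root contains `rpms/sites`, and presents that directory as a sibling `sites` directory through a relative link. The function name `link_sites_alongside_rpms` is hypothetical, and the real paths on sauce may well differ.

```python
import os
import tempfile


def link_sites_alongside_rpms(repo_root: str) -> str:
    """Present <repo_root>/rpms/sites as <repo_root>/sites using a relative symlink.

    Hypothetical layout; the real directory names on the RPM slave may differ.
    Returns the path of the symlink, creating it only if it is missing.
    """
    target = os.path.join("rpms", "sites")  # relative, so the link survives moving repo_root
    link_path = os.path.join(repo_root, "sites")
    if not os.path.isdir(os.path.join(repo_root, target)):
        raise FileNotFoundError(os.path.join(repo_root, target))
    if os.path.islink(link_path):
        # Already present: accept it only if it points where we expect.
        if os.readlink(link_path) != target:
            raise FileExistsError(f"{link_path} -> {os.readlink(link_path)}")
        return link_path
    if os.path.exists(link_path):
        # A real file or directory is in the way; never clobber it.
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)
    return link_path


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a live package tree.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "rpms", "sites"))
        print(link_sites_alongside_rpms(root))
```

Making the link relative means it keeps working if the whole tree is copied or mounted elsewhere, which seems to suit a disaster-recovery mirror; an absolute link would not.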

Server Hardware

Not much activity.

Security Enhancements

Stephen has been finishing off the audit component. It now tidies up the auditd plugins directory, deleting files where needed. This is important as auditd blindly assumes that any file in its plugins directory is a plugin, which leads to errors when it comes across backup files and suchlike.
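A rough Python sketch of the tidy-up step, assuming the component knows the set of plugin files it manages and removes any other regular file it finds. The names `tidy_plugins_dir` and `managed` are hypothetical, and the real component's rules for what to delete may differ.

```python
import os
import tempfile


def tidy_plugins_dir(plugins_dir: str, managed: set[str]) -> list[str]:
    """Delete every regular file in plugins_dir whose name is not in `managed`.

    Per the minutes, the audit daemon treats any file in this directory as a
    plugin, so leftovers such as editor backups or .rpmsave files cause errors.
    Returns the sorted names of the files removed.
    """
    removed = []
    for name in sorted(os.listdir(plugins_dir)):
        path = os.path.join(plugins_dir, name)
        # Only touch plain files; subdirectories are left alone.
        if name not in managed and os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Demonstrate on a scratch directory, not the live plugins directory.
    with tempfile.TemporaryDirectory() as d:
        for n in ("syslog.conf", "syslog.conf~"):
            open(os.path.join(d, n), "w").close()
        print(tidy_plugins_dir(d, {"syslog.conf"}))
```

Working from an explicit list of managed files, rather than guessing at backup-file patterns, means unexpected leftovers of any name get cleared.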

Stephen's main current priority has been to generate reports from the audit logs. The tool which searches the auditd logs is called "ausearch", and it has a programming interface in the form of a C library. Stephen tried to map this library into Perl using XS, but the library doesn't map well onto the Perl or Python way of doing things. For now he is running ausearch from the command line instead. This brings its own problems: the audit logs have a structure which is difficult to parse, which leads to very long and fragile regular expressions. A robust solution will have to use the programming interface somehow, but only once we have some working report capability. He may try writing his own C wrapper library which exposes just the required functionality to Perl.
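For illustration, a Python sketch of a somewhat less fragile way to handle the text: split each raw record into its header and its `key=value` fields, rather than matching whole lines with one long expression. This is a plain tokenizer, not the C library's interface. It assumes records in the raw form written to `/var/log/audit/audit.log` (e.g. `type=SYSCALL msg=audit(1338280000.123:456): ...`), and the names `parse_record`, `HEADER` and `FIELD` are hypothetical.

```python
import re

# Header of a raw audit record, e.g. "type=SYSCALL msg=audit(1338280000.123:456): "
HEADER = re.compile(r"^type=(?P<type>\S+) msg=audit\((?P<ts>\d+\.\d+):(?P<serial>\d+)\):\s*")
# One key=value pair; the value is either a double-quoted string or a run of non-spaces.
FIELD = re.compile(r'(?P<key>[\w-]+)=(?P<val>"[^"]*"|\S+)')


def parse_record(line: str) -> dict:
    """Split one raw audit record into its header parts and a dict of fields."""
    m = HEADER.match(line)
    if m is None:
        raise ValueError(f"not a raw audit record: {line!r}")
    fields = {}
    # Scan for key=value pairs starting just after the header.
    for f in FIELD.finditer(line, m.end()):
        val = f.group("val")
        # Strip the quotes from string values; leave unquoted values untouched.
        fields[f.group("key")] = val[1:-1] if val.startswith('"') else val
    return {
        "type": m.group("type"),
        "time": float(m.group("ts")),
        "serial": int(m.group("serial")),
        "fields": fields,
    }
```

It is still a heuristic: hex-encoded values are not decoded, and user-space records that nest a single-quoted `msg='...'` payload are only split naively. That is part of why the library interface remains the robust option.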

Miscellaneous Development

boot and client components deadlock
If a machine is switched off (not just sleeping) when a new profile is made for it, the new profile requires the boot component to be reconfigured, and the boot component meanwhile needs to reboot the machine, then a deadlock is reached. The client component asks the boot component to configure and waits until it has done so; the boot component, meanwhile, decides to reboot the machine, asks all the components (including client) to stop, and waits until they have done so. This happens pretty rarely, but it happened this week. We need to fix it as a mini-project, perhaps by introducing emergency timeouts so that no component hangs forever. The immediate fix for a deadlocked machine is a forced reboot using the magic SysRq key: Alt-SysRq-S (sync), Alt-SysRq-U (remount filesystems read-only), then Alt-SysRq-B (reboot).
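A minimal Python sketch of the emergency-timeout idea, modelling the two components as threads that each wait on an event the other would set. With unbounded waits (`timeout=None`) neither event is ever set and both threads block forever, which is the deadlock described above; with bounded waits the cycle breaks. The function name, messages and timeout values are illustrative assumptions, not LCFG code.

```python
import threading


def simulate(client_timeout: float, boot_timeout: float) -> list[str]:
    """Replay the client/boot wait cycle with bounded waits instead of infinite ones."""
    log: list[str] = []
    configured = threading.Event()      # would be set by boot once it has configured
    client_stopped = threading.Event()  # set by client once it has stopped

    def client() -> None:
        # client has asked boot to configure and waits for that to finish...
        if configured.wait(client_timeout):
            log.append("client: boot configured")
        else:
            log.append("client: gave up waiting for boot")
        # ...and only then honours the pending stop request.
        client_stopped.set()

    def boot() -> None:
        # boot has decided to reboot and waits for every component to stop
        # before doing anything else, so it never sets `configured`.
        if client_stopped.wait(boot_timeout):
            log.append("boot: client stopped, rebooting")
        else:
            log.append("boot: gave up waiting for client, rebooting anyway")

    threads = [threading.Thread(target=client), threading.Thread(target=boot)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

`simulate(0.1, 5.0)` returns `['client: gave up waiting for boot', 'boot: client stopped, rebooting']`. Giving the client the shorter timeout lets the reboot proceed cleanly; with equal timeouts both sides could give up, which still avoids hanging but is less tidy.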
Dell PowerEdge R720
Alastair has installed a new R720, jubilee. His next job is to get its IPMI set up properly.
Chris added support for the R720 to the toohot script. He has also made a toohot LCFG component to replace the script; the component will be installed over the coming weeks.


The LCFG wiki now has a cookies warning. Stephen will make it a little more prominent.
Although it hasn't found any serious security problems, it has alerted us to real issues, so it is earning its keep. At the weekend it found what it thought might be a trojan: thousands of processes appearing and disappearing quickly. This turned out to be caused by someone doing research into filesystems. Having found out what was going on, Stephen was able to pass on some tips on how to run the research without causing problems.

This Week

  • Alastair
    • Produce T1 report
    • Document hot-migration and migrate back northern guests
    • Document cold-migration
    • Investigate CPU pinning: can this be done on the fly without guest reboots? (seemingly so)
    • Add instructions on how to increase memory (from the CLI)
    • Check and deploy Stephen's boot component fixes - pinned to develop release
    • Reboot all spare circle VMs
    • Directly attach atabeast to central
    • R720 IPMI
  • Chris
    • Fix sauce functionality wrt RPM slave
    • Time LCFG slave with CPU pinning to dedicated core (with Alastair)
    • Server hardware project

  • Stephen
    • Provide figures for T1
    • Migrate telford data from IBM array to EVO
    • Continue with security project
    • Produce report of LCFG bugs for MPU
    • Reboot hogwood for SL6.2 kernel
    • Process other units' responses about their perl-AFS module usage (which functions they use, etc.)
    • Speak to Graham about Theon work

-- AlastairScobie, ChrisCooke - 29 May 2012

Topic revision: r11 - 07 Jun 2012 - 11:02:21 - AlastairScobie