MPU Meeting Tuesday 29th May 2012
Simple KVM Service
No activity.
Server Upgrades
Chris tested sauce's various functions and found two problems with the RPM slave functionality:
- The web address dr.pkgs.inf.ed.ac.uk redirects to backup.lcfg.org. This is unhelpful as backup.lcfg.org doesn't serve packages! Stephen suggested looking for a missing chunk of configuration.
- sauce.inf.ed.ac.uk/rpms serves packages, but the sites directory is inside the rpms directory instead of being presented at the same level. Stephen said that this was intentional. Since our Disaster Recovery instructions say to simply mutate cache.pkgs to dr.pkgs, something needs to be changed; Chris will try just planting a symlink to make the current structure work with the documentation.
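The symlink workaround could be sketched roughly as follows. This is only an illustration: the web root location is not given in these minutes, so web_root is a hypothetical parameter, and the relative layout (a sites directory living under rpms) is taken from the description above.

```python
import os


def plant_sites_symlink(web_root, target_rel="rpms/sites", link_name="sites"):
    """Create web_root/sites as a relative symlink to rpms/sites.

    web_root is a hypothetical path; the real document root is not
    stated in the minutes. Calling this twice is harmless.
    """
    link_path = os.path.join(web_root, link_name)
    if os.path.islink(link_path):
        return link_path  # already planted, nothing to do
    if os.path.exists(link_path):
        # A real file or directory is in the way; do not clobber it.
        raise FileExistsError(link_path)
    # The target is relative to the directory containing the link.
    os.symlink(target_rel, link_path)
    return link_path
```

A relative target keeps the link valid if the whole tree is copied or mounted elsewhere, which seems useful for a disaster recovery copy.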
Server Hardware
Not much activity.
Security Enhancements
Stephen has been finishing off the audit component. It now tidies up the auditd plugins directory, deleting files where needed. This is important as auditd blindly assumes that any file in its plugins directory is a plugin, which leads to errors when it comes across backup files and suchlike.
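As an illustration of the tidying step (not the audit component's actual code, which is not shown in these minutes), a minimal sketch might look like this. The list of filename patterns treated as stray files is an assumption, and the plugins directory is passed in rather than hard-coded.

```python
import fnmatch
import os

# Assumed patterns for editor backups and package-manager leftovers.
STRAY_PATTERNS = ("*~", "*.bak", "*.orig", "*.rpmsave", "*.rpmnew", ".*.swp")


def find_stray_files(plugin_dir, patterns=STRAY_PATTERNS):
    """Return paths of regular files in plugin_dir matching a stray pattern."""
    stray = []
    for name in sorted(os.listdir(plugin_dir)):
        path = os.path.join(plugin_dir, name)
        if os.path.isfile(path) and any(fnmatch.fnmatch(name, p) for p in patterns):
            stray.append(path)
    return stray


def tidy_plugin_dir(plugin_dir, dry_run=True):
    """Delete stray files; with dry_run=True only report what would go."""
    stray = find_stray_files(plugin_dir)
    if not dry_run:
        for path in stray:
            os.remove(path)
    return stray
```

Defaulting to a dry run means a first pass only lists the candidates, so the pattern list can be checked before anything is deleted.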
Stephen's main current priority has been to get some reports generated from the audit logs. The software which searches the auditd logs is called "ausearch". This has a programming interface in the shape of a C library. Stephen has tried to map this library into Perl using XS but the library doesn't map well into the Perl or Python way of doing things. For now he is resorting to running ausearch from the command line. This brings its own problems since the audit logs have a structure which is difficult to parse, leading to the use of very long and fragile regular expressions. A robust solution must involve the programming interface somehow, but that can wait until we have some working report capability. He may try writing his own C library wrapper which exposes the required functionality to Perl.
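To give a feel for the parsing problem, here is a small sketch that splits one raw audit record into fields. It is not Stephen's code and it is written in Python rather than Perl. It assumes records of the common "type=... msg=audit(time:serial): key=value ..." shape, and the sample line in the usage note is made up. It uses one short regular expression for the header only and leaves the key=value body to shlex, which copes with quoted values.

```python
import re
import shlex

# Header of a raw audit record: type, timestamp and serial number.
HEADER = re.compile(
    r"^type=(?P<type>\S+)\s+msg=audit\((?P<time>\d+\.\d+):(?P<serial>\d+)\):\s*(?P<body>.*)$"
)


def parse_audit_line(line):
    """Parse one raw audit record; return a dict, or None if it does not match."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    try:
        tokens = shlex.split(m.group("body"))  # honours "..." and '...' quoting
    except ValueError:
        tokens = m.group("body").split()  # unbalanced quotes: fall back to whitespace
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return {
        "type": m.group("type"),
        "time": float(m.group("time")),
        "serial": int(m.group("serial")),
        "fields": fields,
    }
```

For example, parse_audit_line('type=SYSCALL msg=audit(1338290000.123:456): syscall=2 success=yes exe="/bin/cat"') yields a record whose "fields" entry maps exe to /bin/cat. Records with hex-encoded or nested values would need more work, which is part of why the library interface remains the more robust route.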
Miscellaneous Development
- boot and client components deadlock
- If a machine is switched off (not just sleeping) when a new profile is made for it, and the introduction of the new profile requires a configure of the boot component, but the boot component meanwhile needs to do a reboot, a deadlock situation is reached. The client component asks the boot component to configure, and waits until it has done so; the boot component meanwhile decides to reboot the machine, and asks all the components (including client) to stop, and waits until they have done so. This happens pretty rarely but it happened this week. We need to fix it as a mini-project, perhaps by introducing emergency timeouts so that no component hangs for ever. The immediate fix for a deadlocked machine is a forced reboot courtesy of Alt-SysRq S U B.
- Dell PowerEdge R720
- Alastair has installed a new R720, jubilee. His next job is to get its IPMI set up properly.
- toohot
- Chris added support for the R720 to the toohot script. He also made a toohot LCFG component to replace the script. The component is being installed over the coming weeks.
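The boot/client deadlock described earlier in this section, and the emergency-timeout idea for breaking it, can be modelled with a toy example. This is not LCFG code: the two "components" are just threads, and all names and timeout values are hypothetical.

```python
import threading


def run_demo(client_timeout=0.2, boot_timeout=1.0):
    """Model client waiting on boot while boot waits on client."""
    boot_configured = threading.Event()  # would be set when boot finishes configuring
    client_stopped = threading.Event()   # set when client has stopped
    outcome = {}

    def client():
        # Client has asked boot to configure and waits for it to finish.
        if boot_configured.wait(client_timeout):
            outcome["client"] = "configured"
        else:
            outcome["client"] = "timed out"  # emergency timeout fired
        client_stopped.set()  # client can now honour the stop request

    def boot():
        # Boot has decided to reboot, so it never configures; it waits
        # for client to stop instead.
        if client_stopped.wait(boot_timeout):
            outcome["boot"] = "rebooting"
        else:
            outcome["boot"] = "timed out"

    threads = [threading.Thread(target=client), threading.Thread(target=boot)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome
```

Without the timeout on the client's wait, both threads would block for ever, which mirrors the situation seen this week. With it, the client gives up, stops, and the reboot proceeds.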
Operational
- Cookies
- The LCFG wiki now has a cookies warning. Stephen will make it a little more prominent.
- chkrootkit
- Although it hasn't found serious security problems it has alerted us to real issues so is earning its keep. At the weekend it found what it thought might be a trojan - thousands of processes appearing and disappearing quickly. It turned out to be caused by someone doing research into filesystems. Having found out what was going on Stephen was able to pass on some tips on how to run the research without causing problems.
This Week
- Alastair
- Produce T1 report
- Document hot-migration and migrate back northern guests
- Document cold-migration
- Investigate CPU pinning - can this be done on the fly without guest reboots? - seemingly so
- Add instructions on how to increase memory (from cli)
- Check and deploy Stephen's boot component fixes - pinned to develop release
- Reboot all spare circle VMs
- Directly attach atabeast to central
- R720 IPMI
- Chris
- Fix sauce functionality wrt RPM slave
- Time LCFG slave with CPU pinning to dedicated core (with Alastair)
- Server hardware project
- Stephen
- Provide figures for T1
- Migrate telford data from IBM array to EVO
- Continue with security project
- Produce report of LCFG bugs for MPU
- Reboot hogwood for SL6.2 kernel
- Process other units' responses about their perl-AFS module usage (which functions etc)
- Speak to Graham about Theon work
-- AlastairScobie, ChrisCooke - 29 May 2012