MPU Meeting Thursday 19th November 2015


Alastair has written a script for importing the JSON data from the clients. The headless clients were showing errors from the monitor detection module so Alastair has disabled that check on servers. He will now check for errors from any other modules.

Alastair has begun looking at how to process inbound supplier reports from HP which come in via email. The standard approach using the callremctl script will not work due to the size of the emails. One idea is to stash the email in AFS and then use remctl to call a script on the inventory server. Stephen suggested using curl to submit it to a CGI script over https using GSSAPI authentication. Alastair has written a Perl script to parse the MIME structure of the email.
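As an illustration of the MIME parsing step only, here is a minimal sketch in Python (Alastair's actual script is in Perl and is not reproduced here). It walks the part tree of a message and lists each part's content type and attachment filename. The sample message, its addresses and the attachment name are all made up for the example; real HP reports may be structured quite differently.

```python
from email import message_from_string
from email.message import Message
from typing import List, Optional, Tuple


def mime_outline(msg: Message, depth: int = 0) -> List[Tuple[int, str, Optional[str]]]:
    """Return (depth, content type, filename) for every part of the message."""
    outline = [(depth, msg.get_content_type(), msg.get_filename())]
    if msg.is_multipart():
        # For multipart messages get_payload() returns the list of sub-parts.
        for part in msg.get_payload():
            outline.extend(mime_outline(part, depth + 1))
    return outline


# Hypothetical message, invented purely to exercise the function.
SAMPLE = """\
From: reports@supplier.example
To: inventory@site.example
Subject: Hypothetical supplier report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

Report attached.
--XYZ
Content-Type: text/csv
Content-Disposition: attachment; filename="report.csv"

serial,model
ABC123,DL180
--XYZ--
"""

if __name__ == "__main__":
    for depth, ctype, filename in mime_outline(message_from_string(SAMPLE)):
        print("  " * depth + ctype + (f" ({filename})" if filename else ""))
```

Listing the structure first makes it easy to see which part carries the report before deciding how to extract and submit it.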

LCFG Client Refactoring

Nothing happened.

SL7 Server Base

Chris has been blogging.

Chris has spent some time trying to install SL7 onto the HP DL180 sauce without much luck. He will send the details of the problems to the mpu list so that we can all take a look.

Miscellaneous Development

OSX El Capitan
Stephen has been working with Kenny MacDonald to add support to LCFG for the changes in OSX El Capitan. In particular, local software can now only be installed beneath /usr/local. The build tools now use a different directory locations map and that is reflected in the sysinfo resources for that platform. The om environment initialisation module was tweaked to set a PATH variable which better suits both OSX and SL7. See bug#907, bug#908, bug#909, bug#911, bug#912, bug#913, bug#914, bug#915, bug#916, bug#918, bug#919.

LCFG booting context
It was found that the booting context was not being cleared at the end of the boot process and instead was being cleared 10 minutes later when the client did its regular profile check. This turned up a number of issues:
  1) The lcfg-clearbootctx service was calling setctx instead of the context tool, which meant it set a pending context but did not prod the client to apply the change.
  2) It appears that the context command was being called too early, before the client component had started. This still needs more investigation though, as it should have been working correctly.
  3) The clearing of the booting context is not scheduled for the end of the boot process; it happens some time after the client component has started but before all components have finished. Can we move it closer to the end?
  4) The client context method did not always return zero when successful, which systemd logged as a failure.
See bug#917 for some details.

systemd component
A bug was found in the way the saved resources were loaded from a YAML file. If this process failed then the component would fail. It is now done in an eval with error handling. There is some debate about whether the stop_on_remove facility is working correctly. We need a good example of the problem (see bug#910).
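The pattern behind the fix (trap any failure while loading the saved state and carry on with an empty set) can be sketched in Python, where try/except plays the role of the Perl eval. This is not the component's code: the function name, the "units" key and the file names are hypothetical, and json.loads stands in for a YAML parser only so that the sketch needs nothing beyond the standard library.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict


def load_saved_resources(path: str, parse: Callable[[str], Any] = json.loads) -> Dict[str, Any]:
    """Load saved resources, returning {} instead of raising if anything fails."""
    try:
        with open(path, encoding="utf-8") as handle:
            data = parse(handle.read())
    except Exception as err:  # deliberately broad, like a Perl eval block
        print(f"warning: ignoring saved resources in {path}: {err}")
        return {}
    # A file that parses but does not hold a mapping is treated as empty too.
    return data if isinstance(data, dict) else {}


def demo() -> Dict[str, Dict[str, Any]]:
    """Exercise the loader on a good file, a corrupt file and a missing file."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.json")
        bad = os.path.join(tmp, "bad.json")
        with open(good, "w", encoding="utf-8") as handle:
            handle.write('{"units": ["example.service"]}')
        with open(bad, "w", encoding="utf-8") as handle:
            handle.write("{not valid")
        results["good"] = load_saved_resources(good)
        results["bad"] = load_saved_resources(bad)
        results["missing"] = load_saved_resources(os.path.join(tmp, "absent.json"))
    return results


if __name__ == "__main__":
    print(demo())
```

The caller always gets a dictionary back, so a corrupt or missing state file degrades to "no saved state" rather than stopping the component.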

lcfg check scripts
We found that the weekly check scripts were only looking at profiles on the stable release. This was a problem because the lab machines were on an exam-specific branch and were thus ignored. The scripts have now been modified to just ignore profiles on the develop release which avoids adding noise to the results.
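The change in selection logic, from "only stable" to "everything except develop", can be sketched as follows. This is not the check scripts' code: the host names, the "exam" release name and the dictionary layout are assumptions made for the example.

```python
from typing import Dict, FrozenSet, List


def profiles_to_check(
    releases: Dict[str, str],
    ignored: FrozenSet[str] = frozenset({"develop"}),
) -> List[str]:
    """Return the hosts whose profiles should be checked, sorted by name."""
    # Exclude only the ignored releases instead of whitelisting "stable",
    # so hosts on other branches are no longer skipped silently.
    return sorted(host for host, release in releases.items() if release not in ignored)


# Hypothetical mapping of host name to the release its profile is built from.
SAMPLE_RELEASES = {"lab01": "exam", "desk02": "stable", "dev03": "develop"}

if __name__ == "__main__":
    print(profiles_to_check(SAMPLE_RELEASES))
```

Using an exclusion list means any new branch is checked by default, which is the behaviour that was missing for the lab machines.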


We hosted the 2015 LDAP Conference in the Forum. It seemed to be very successful.

The rrdtool header was tidied to fix conflicts with various headers which pulled it in as a dependency for some perl modules.

This perl module was added for RT#74850

This old minor release has been dropped.

wire headers
Stephen has reviewed George's proposed changes for the wire headers and has sent a couple of comments. In particular, we use the NEEDS_GATEWAY macro for our "inf" profiles when porting to new platforms.

Stephen has added some internal documentation for the SL7 SwitchDesk facility.

nvidia drivers
The nvidia drivers have been updated. There is no new default series this time.

epel updates
SL7 epel updates were applied. A problem with alpine was discovered and it has been reverted to the previous version.

rdxprof hanging
We have long known about problems with the LCFG client becoming stuck with two processes running. This is caused by components written in shell calling applications such as om which end up with file descriptors not being closed. Bugs have been filed against various components. We would like to be able to detect these problems, possibly using the client report script. Stephen suggested a few strategies which might work but we need an example of a machine with the problem so we can verify the best way to spot it. Stephen will ask COs to tell us when they find one.
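One possible detection strategy, offered only as a sketch and not necessarily one of those Stephen suggested, is to count the processes named rdxprof in a process listing and flag a machine when there are two or more. The threshold, the function names and the sample listing are assumptions. Whether two processes always indicate the stuck state would need checking against a real affected machine.

```python
import subprocess
from typing import List


def pids_named(ps_output: str, name: str) -> List[int]:
    """Return the PIDs whose command name equals `name` in 'PID COMMAND' lines."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip() == name:
            pids.append(int(fields[0]))
    return pids


def looks_stuck(ps_output: str, name: str = "rdxprof", threshold: int = 2) -> bool:
    """Flag the listing if at least `threshold` processes share the name."""
    return len(pids_named(ps_output, name)) >= threshold


def current_ps_output() -> str:
    """Ask ps for every process as 'PID COMMAND' with no header line."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical listing, invented to exercise the parser.
SAMPLE_PS = """\
    1 systemd
 2210 rdxprof
 2384 sshd
 9917 rdxprof
"""

if __name__ == "__main__":
    print(pids_named(SAMPLE_PS, "rdxprof"), looks_stuck(SAMPLE_PS))
```

A check like this could run from a periodic report, with the result verified on the first real example that COs report.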

maven 3 and git 1.9
The software collections for maven 3 and git 1.9 have been added to SL6 machines.

SL6 build hosts
Both SL6 build hosts have been reinstalled so that they have larger disks and more memory. This should save us from having to regularly chase COs to clear out old stuff.

SL6 sleep problems
We have discovered that SL6 machines have recently started waking up with no network access for the first few minutes. For now sleep has been disabled on all stable SL6 machines. We should try to identify the cause of the problem even if we then decide just to leave them all permanently awake. There are messages from rdiscd in syslog about failed ioctl calls. Can we use these to find when the problem first started?
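A rough way to answer that question is to scan the syslog files for lines mentioning both rdiscd and ioctl and report the earliest timestamp. The sketch below assumes traditional syslog timestamps ("Nov  5 07:58:12", with no year, so the year is passed in) and uses invented sample lines; the real rdiscd message text is not reproduced here and the match may need tightening.

```python
from datetime import datetime
from typing import Iterable, Optional


def first_rdiscd_ioctl_failure(lines: Iterable[str], year: int) -> Optional[datetime]:
    """Return the earliest timestamp of a line mentioning rdiscd and ioctl."""
    earliest = None
    for line in lines:
        if "rdiscd" not in line or "ioctl" not in line:
            continue
        try:
            # Traditional syslog stamps occupy the first 15 characters.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if earliest is None or stamp < earliest:
            earliest = stamp
    return earliest


# Hypothetical log lines; the message text is made up.
SAMPLE_LOG = [
    "Nov  2 09:14:07 hostA sshd[812]: Accepted publickey for user",
    "Nov  5 07:58:12 hostA rdiscd[401]: ioctl failed (made-up message text)",
    "Nov  3 22:40:31 hostA rdiscd[401]: ioctl failed (made-up message text)",
]

if __name__ == "__main__":
    print(first_rdiscd_ioctl_failure(SAMPLE_LOG, 2015))
```

Rotated log files would need to be fed in as well, since the first occurrence may predate the current file.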

This week

  • Alastair
    • Inventory project
      • continue working through InvProjectWorkFlow
      • consider what next can be integrated into existing system, if anything
      • Check for systemic errors from clientreport
        • Look now that servers don't check monitors
      • Document clientreport
      • Document order sync code
      • Continue work on hpreport processing script
    • @home - look at using rsync from site.pkgs instead of mirroring from upstream
    • Remove default pool if ops meeting agrees
    • Experiment with different window managers under VNC (making the assumption that performance under NX will be similar)
    • Think of a use for 'atom'
    • Understand how NetworkManager works wrt init scripts
    • Deploy encrypted /tmp and swap
      • Continue work on script to modify existing machines
        • modify to be an installroot script
        • modify to wipe swap and /tmp
    • SL7 base server
      • Localhome - once /home is a symlink once more
      • check metropolitan USB and CD
      • Continue work with FC and LVM
        • investigate interaction between multipath and UDEV
        • check nagios notices if FC cable removed
    • look at Activities List

  • Chris
    • Inventory project
      • continue working through InvProjectWorkFlow
      • Look at clientreport modules for replacing firmwarereport
    • pkgsearch for SL7
      • reimplement as a yum web front end (a yum search for a keyword produces an HTML file of links to a CGI script which runs yum info)
      • Needs to support multiple platforms
    • Liaise with George over iDRAC documentation
    • SL7 -
      • hwmon (HP raid and H200 raid)
      • Continue testing DL180 (try otaka)
      • Finish off looking at R620 (belter)
    • Close RT tickets
    • Create an MPU blog
      • create a couple of SL7 server blog articles
    • Continue investigating SL6 sleep problem
    • look at Activities List
    • Look at USB device ownership (learn about udev)

  • Stephen
    • LCFG client refactor stage 1
      • schedule debrief meeting
    • LCFG client refactor stage 2
      • document API
      • complete combining packages
      • blog article
    • Think about PD - Interested in ZeroMQ
    • Investigate kernel component pipe moan by using shell commands instead of RPM module
    • Look at George's lcfg-dns proposal
    • Look at Reminders
    • Look at whether we can compare release strings on client so that clientreport can report back (will show whether profile has been successfully received and stored) - with Alastair
    • look at Activities List

-- AlastairScobie - 19 Nov 2015

