MPU Meeting Tuesday 12th January 2016

Inventory

Nothing happened.

LCFG Client Refactoring

Stephen has written more of the API documentation; when it's finished, he'll blog about it. He has also fixed more bugs.

The code could already generate status files, and now it can also read them back in. This has proved a handy way to provide data for tests, and has already exposed a few more bugs.

SL7 Server Base

The hwmon component now reliably checks HP RAID status and read-only volume mounts on SL7. This leaves only one hwmon check untested: Dell PERC H200 RAID status. We can't test this because we lack test equipment: we have a handful of Dell servers with H200 RAID, but none of them can be spared. However, the underlying sas2ircu software is available for SL7 and appears to use the same syntax as on SL6 (albeit with some extra commands), so the H200 RAID check should work.

We will abandon the eth0 style of NIC names in favour of the Red Hat convention. Stephen has written an explanatory article on the MPU blog.

Alastair has rewritten (and is testing) the localhome component to use autofs.

Stephen is now starting work on the apacheconf component (MPU blog post, ideas wiki page) and would welcome any last-minute ideas now, please.

Miscellaneous Development

updaterpms check
The weekly check script runs fine from cron at 9am, but not at 8am. It's not clear why! For now we'll shrug, accept this and run it at 9am.
encrypted partitions
Stephen has provided a couple of macros (FSTAB_ENCRYPT_TMP and FSTAB_ENCRYPT_SWAP) to handle encryption while hiding the disk name (e.g. vda or sda). Alastair has written an fstab conversion script for existing machines; it's designed to run from cron, and Chris will check it before we deploy it. It will wipe the swap partition if it can, and will temporarily fill almost all of the tmp partition to overwrite any deleted data left on it.
systemd accounts
The latest version of the systemd package needs two extra user accounts. Stephen has added them to the list of accounts which are created automatically.
SL7 kernel downgrade
The latest upstream SL7 kernel (3.10.0-327.3.1.el7) has had to be backed out because VirtualBox didn't work with it, apparently because of a change to an undocumented internal kernel API. SL7 is back on the 3.10.0-229.20.1.el7 kernel for the time being. The latest VirtualBox does work with 3.10.0-327.3.1.el7, but it can't be installed for two reasons: firstly, our VirtualBox version is fixed until the end of teaching, and secondly, the new version requires software from SL7.2, which isn't out yet.
Lots of software updates
Now that a new SL release (7.2) is almost out, much of it is finding its way into the package upgrades for the current release (7.1).
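The encrypted partitions item describes wiping deleted data from /tmp by filling the partition almost entirely and then freeing the space again. Alastair's conversion script isn't reproduced in these minutes; what follows is only a minimal Python 3 sketch of that fill-and-delete idea. The function name `fill_free_space` and its `reserve_bytes`, `chunk_bytes` and `max_bytes` parameters are hypothetical. Note that it can only overwrite blocks the filesystem actually hands out to the new file.

```python
import os
import shutil
import tempfile


def fill_free_space(directory, reserve_bytes=64 * 1024 * 1024,
                    chunk_bytes=1024 * 1024, max_bytes=None):
    """Fill the filesystem holding `directory` with zeros, then delete the file.

    Hypothetical helper, not the MPU conversion script. Writing stops once
    free space would drop below `reserve_bytes`, or once `max_bytes` bytes
    have been written if a limit is given. Returns the number of bytes written.
    """
    zeros = b"\0" * chunk_bytes
    written = 0
    # Hidden temporary file on the same filesystem as `directory`.
    fd, path = tempfile.mkstemp(prefix=".wipe-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            while True:
                # Space we may still use without eating into the reserve.
                room = shutil.disk_usage(directory).free - reserve_bytes
                if max_bytes is not None:
                    room = min(room, max_bytes - written)
                if room <= 0:
                    break
                n = min(chunk_bytes, room)
                f.write(zeros[:n])
                written += n
                # Hand buffered data to the kernel before re-reading free space.
                f.flush()
            # Make sure the zeros reach the disk before the file is removed.
            os.fsync(f.fileno())
    finally:
        # Always release the space again, even if writing failed.
        os.unlink(path)
    return written
```

For example, `fill_free_space("/tmp")` would stop with 64 MiB still free. Keeping a reserve rather than writing until the disk is completely full means other processes using /tmp are less likely to fail while the wipe is running.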

Operational

AMD GPU and Catalyst driver on SL7
Stephen has been handling a request for an AMD GPU card for an SL7 box. It'll need the Catalyst driver, which we haven't tried to use on SL7 until now. The configuration/installation process seems the same as before: a script generates a custom Catalyst RPM. However, we won't know whether the card and driver work with SL7 DICE until the card arrives and we get a chance to try it out.
NX
NX continues to be somewhat fragile. Chris rebooted nx.inf.ed.ac.uk last week to fix a mystery resource shortage.

This Week

  • Alastair
    • Inventory project
      • continue working through InvProjectWorkFlow
      • consider what next can be integrated into existing system, if anything
      • Check for systemic errors from clientreport
        • Look now that servers don't check monitors
      • Document clientreport
      • Document order sync code
      • Continue work on hpreport processing script
    • @home - look at using rsync from site.pkgs instead of mirroring from upstream
    • Remove default pool if ops meeting agrees
    • Experiment with different window managers under VNC (making the assumption that performance under NX will be similar)
    • Think of a use for 'atom'
    • Deploy encrypted /tmp and swap
      • (after Chris checks the script) Chris has been pointed at the script
    • SL7 base server
      • Localhome functionality - use mkhome_dir instead?
      • check metropolitan USB and CD
      • Continue work with FC and LVM
        • investigate interaction between multipath and udev
        • check nagios notices if FC cable removed
      • Fix the bonding nagios script to scream if fewer than 2 slaves active for each bonded group
      • Look at defining a macro to set real device names for eth0 and eth1 (parameterised)
        • use Stephen's new disable old-style naming scheme
        • double check bonding still working on metropolitan
        • try on sauce
        • more experimenting required (and documenting)
    • Why is the updaterpms check script returning 0 on the weekly test? Does it always do this when run from crontab? No: it works fine if rescheduled for 4pm. Try at the same time (8am), but on a different day.
      • Works at 9am so fix permanently at 9am!
    • Schedule MPU meeting to discuss systemd ordering
    • Continue building computing.help honeypot
    • Rotate drupal logfile on computing.help and devproj

  • Chris
    • Inventory project
      • continue working through InvProjectWorkFlow
      • Look at clientreport modules for replacing firmwarereport
    • pkgsearch for SL7
      • reimplement as a yum web front end (a yum search for a keyword produces an HTML file of links to a CGI script that does yum info)
      • Needs to support multiple platforms
    • Liaise with George over iDRAC documentation (look through ops reports to remind)
    • SL7
      • Mark up which servers we can't check 'hwmon' on (as no spare kit)
      • diskfull
      • test out rsync / rmirror (both client and server ends) - liaise with Neil
      • Schedule SL7 MPU server upgrade project meeting
      • Blog about hwmon and hwraid
    • Close RT tickets
    • Continue investigating SL6 sleep problem
    • Schedule MPU stargazing meeting
    • Check fstab convert_encrypt script
    • December figures

  • Stephen
    • LCFG client refactor stage 1
      • schedule debrief meeting
    • LCFG client refactor stage 2
      • document API
      • blog article (once documentation complete)
    • Think about PD - Interested in ZeroMQ
    • Investigate kernel component pipe moan by using shell commands instead of RPM module => waiting on 7.2 => activities list
    • continue thinking about apacheconf
      • blog
    • SL7 server
      • blog about legacy network naming
    • Fixup reminders
    • rkhunter config needs fixing
    • December figures
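Alastair's SL7 item asks for the bonding Nagios script to complain if fewer than two slaves are active in any bonded group. The existing script isn't shown in these minutes, so the following is only a hypothetical Python 3 sketch of such a check. It assumes the Linux bonding driver's `/proc/net/bonding/<bond>` status files, in which each `Slave Interface:` line is followed by that slave's `MII Status:` line, and it uses the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL, 3 for UNKNOWN). The function names are made up for illustration.

```python
import sys
from pathlib import Path

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def count_active_slaves(status_text):
    """Count slaves whose MII Status is 'up' in one /proc/net/bonding file."""
    active = 0
    in_slave = False
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("Slave Interface:"):
            in_slave = True
        elif in_slave and line.startswith("MII Status:"):
            # The bond's own MII Status line comes before any slave section,
            # so only lines inside a slave section are counted here.
            if line.split(":", 1)[1].strip() == "up":
                active += 1
            in_slave = False
    return active


def check_bonds(bonds, minimum=2):
    """Return (exit code, message) for a {bond name: status text} mapping."""
    if not bonds:
        return UNKNOWN, "UNKNOWN - no bonded interfaces found"
    problems = []
    for name, text in sorted(bonds.items()):
        active = count_active_slaves(text)
        if active < minimum:
            problems.append(f"{name} has {active} active slave(s)")
    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    return OK, f"OK - every bond has at least {minimum} active slaves"


def main():
    bond_dir = Path("/proc/net/bonding")
    bonds = {}
    if bond_dir.is_dir():
        bonds = {p.name: p.read_text() for p in bond_dir.iterdir()}
    code, message = check_bonds(bonds)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main())
```

Run as a Nagios plugin, the script prints one status line and exits with the matching code. Counting per bond rather than in total means a host with two bonds can't hide a degraded bond behind a healthy one.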

-- AlastairScobie - 12 Jan 2016

Topic revision: r8 - 19 Jan 2016 - 13:59:05 - ChrisCooke