MPU Meeting Tuesday 13th June 2017

Inventory

The wandering time report is now sent from the mp-unit address to cos.

Alastair has now set up backups for Tartarus.

Alastair has been working on integrating KVM guests with the inventory. These commands have been added (though they may be renamed at some point):

  • ii registerguest
  • ii unregisterguest
  • ii moveguest

He still has to add a script on the KVM servers which updates the inventory, and to add support to kvmtool.

Chris has been working on a dsu module for clientreport. We decided that the module should just take all the information which dsu gives out, even if some of that information is already found by other means (for example BIOS version, NIC firmware).

LCFG client refactoring

Stephen has been continuing his work on qxprof and sxprof. Merging data now happens far more neatly. One of qxprof's more elaborate jobs is to output results in a format suitable for being evaluated by the shell. When using this, people often want two sets of shell variables, for old resource values and new resource values. The new qxprof should now support this properly. The next stage will be to look at how sxprof should best pass resource values into templates.
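
As an illustration of the "two sets of shell variables" idea, here is a minimal sketch in Python. It is not the qxprof implementation: the function names, the `OLD_`/`NEW_` prefixes and the sample resource names are all assumptions made for the example, and it assumes resource names are already valid shell identifiers.

```python
import shlex


def shell_assignments(resources, prefix):
    """Return one shell assignment line per resource, values safely quoted.

    Assumption: resource names are already valid shell identifiers.
    """
    lines = []
    for name in sorted(resources):
        value = shlex.quote(str(resources[name]))
        lines.append(f"{prefix}{name}={value}")
    return lines


def old_and_new(old, new, old_prefix="OLD_", new_prefix="NEW_"):
    """Emit old and new resource values as two prefixed sets of variables.

    The prefixes are hypothetical; they are not taken from qxprof.
    """
    lines = shell_assignments(old, old_prefix) + shell_assignments(new, new_prefix)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical resource values, purely for illustration.
    print(old_and_new({"server": "ntp0"}, {"server": "ntp1 ntp2"}))
```

Text produced this way could be passed to the shell's `eval`, leaving a script free to compare, say, `$OLD_server` with `$NEW_server`.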

MPU SL7 Server Upgrades

They're all done.

Additional disk encryption

Chris is setting up a RHEL 7.4β VM so we can have a look at NBDE (Network-Bound Disk Encryption).

Miscellaneous development

Nothing this week.

Operational

  • bruegel (aka student.ssh.inf.ed.ac.uk and ssh.inf.ed.ac.uk) has been rebooted and is now running SL7.3. Everything seems to work properly, including NFS and autofs.
  • We intend to reboot hammersmith (nx.inf.ed.ac.uk) this Friday at about 8am.
  • staff.login.inf.ed.ac.uk is only lightly used and the hardware has reached the end of its life, so we're going to merge this service into staff.nx.inf.ed.ac.uk. This will happen in conjunction with the reboot of staff.nx (jubilee) which we'll schedule for 9am on Tuesday 20th June.
  • The stable release's Informatics kernel has been updated to the most recent version (3.10.0-514.21.1.el7). This version has been on test in the develop release for a week or two and has seemed fine. This change in kernel version will provoke a widespread reboot of DICE desktops, late next week in the student labs and early the following week for other DICE desktops. To take advantage of this, Stephen has updated a number of other components:
    • OpenAFS will be updated to 1.6.20.2, the most recent stable version. The next stable version 1.6.21 is almost here, but we've looked at its changelog and decided that we don't need to deploy it right away.
    • The NVIDIA modules have been updated to the latest releases.
    • We've mirrored the latest AMD Catalyst Pro driver. This only affects one machine.
    • VirtualBox will be upgraded to 5.1.22.
    • The VirtualBox Additions will be upgraded to 5.1.22 as well. Previously DICE was on version 4 so that's a significant change.
  • Former server schiff has been moved to the junk room.
  • Alastair's "wandering time" report has prompted a review of how NTP behaves on sleeping desktops. NTP has not been managing well on SL7 desktops which sleep, so we've reconfigured the behaviour slightly to help it. After their upcoming reboot, SL7 DICE desktops will refrain from sleeping for a while after each boot to give NTP time to settle, and will suspend and resume the ntp component appropriately at sleep time.
  • A couple of machines recently wedged up because their serial consoles were overwhelmed with error messages. See item 7 of The Infrastructure Unit report to the 14/06/2017 Operating meeting for details. We're going to take a look at logging levels and perhaps also other factors to see what we might be able to do about this.
  • SL7 installs now come with lots of messages from the LCFG systemd component about unit files not existing - see Bugzilla:3298 for a screenshot. There's nothing to worry about - the messages are being produced in an early stage of the install before the unit files have been created. The behaviour will be logged as a minor bug.
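
The "refrain from sleeping for a while after each boot" behaviour described in the NTP item above can be pictured as a simple uptime check. This is only a sketch and not the actual DICE configuration: the function names and the 30 minute grace period are assumptions, since the notes do not say how long desktops will wait.

```python
def read_uptime(path="/proc/uptime"):
    """Return seconds since boot, read from the Linux uptime file."""
    with open(path) as handle:
        return float(handle.read().split()[0])


def may_sleep(uptime_seconds, grace_seconds=1800):
    """Allow sleep only once the post-boot grace period has passed.

    grace_seconds is a hypothetical value chosen for illustration.
    """
    return uptime_seconds >= grace_seconds


if __name__ == "__main__":
    # Only meaningful on a Linux host where /proc/uptime exists.
    print(may_sleep(read_uptime()))
```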

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Continue work on RESTful API - TartarusRESTAPI
      • Document REST API
      • Write more of the ii commands, documenting them as they are written.
      • Start work on final report!
      • Decide how to represent VMs
      • Continue with REST API testing framework
      • Consider what else needs done other than docs and tidying and backups
      • Blog something (perhaps drawing on the dev meeting talks)
    • Deploy encrypted /tmp and swap conversion script
      • Need to warn users that Gnome3 may pop up a window about /tmp being full (when script is run)
      • Do in early July
    • Schedule MPU meeting to discuss systemd ordering
    • Check sysmans (et al) have 'nograce'.
    • Take a look at RT #78875
    • Look at /etc/hosts - DNS issue (IPv6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Deploy disable-module header on all computing.help servers
      • Defer until return from hols in July in case of problems
    • Upgrade student.nx (hammersmith) to RH7.3 on Friday 16th June 08.00
    • look at console vs journal-or-kmsg for systemd

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Discuss with Iain support for Clusters etc
    • Look at 7.4 beta - tang/clevis - to see if this technology does what we want
    • Upgrade qemu version on KVM servers once they're on 7.3

  • Stephen
    • LCFG client refactor stage 2
      • testing and documentation
      • qxprof / sxprof re-implementation
    • LCFG server symlink to exam branches - produce reporting script and discuss with Graham
    • Submit polkit bug to Red Hat - with Alastair (still exists under 7.3)
    • Investigate George's multiple network interfaces SL7 issue (eg consoles server)
      • waiting on George breaking metropolitan
    • Draft a position note on shell components under SL8 and possible ways forward
    • Produce some text for systemd mount bug (to submit to RH)
    • RT actions (as per agreed list) once 7.3 fully deployed
    • Take the issue of disabling per-user journald logs on certain servers to OPS
    • Schedule jubilee downtime to move to SOL and upgrade to 7.3
    • Consider PD work for after LCFG client
    • Check IPv6 ssh connectivity to NX servers
    • Have a look to see if there is any way of modifying printk behaviour so that it can drop stuff if a serial console is blocking
    • File bug against lcfg-systemd - spurious warnings about missing targets at first boot.
    • look at console vs journal-or-kmsg for systemd

-- AlastairScobie - 13 Jun 2017

Topic revision: r8 - 24 Sep 2019 - 13:50:24 - AlastairScobie
 