MPU Meeting Tuesday 7th February 2017

Inventory

Alastair has been working on MAC address support for the API.

We had a discussion about how to deal with the locations and MAC addresses of KVM guests. Support for KVM guests will almost certainly not appear in the first release of the inventory system, but it has to be considered. Stephen suggested that the problem might become simpler if the two issues of keeping track of MAC addresses (the inventory system will be the master source of those) and of the locations of KVM guests could be dealt with separately.

MPU SL7

Stephen has upgraded the LCFG master. There was a bit of a hiccup: all the pre-upgrade tests of access to infdb happened over IPv4, but after the upgrade, access was attempted via IPv6. This didn't work, so the inventory headers were missing and the first attempt to rebuild all profiles failed. Stephen bootstrapped the system by restoring the inventory data from backup and then zapping the slaves' server caches, forcing another complete profile rebuild. This got LCFG back up and running, after which the IPv6 access problem could be addressed.
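The IPv4/IPv6 mismatch above is the kind of thing a pre-upgrade test could catch by trying each address family separately. What follows is only a minimal sketch of that idea, not the test that was actually run: the function name check_families is made up for illustration, and the host and port of the infdb service are not given in these minutes, so the caller has to supply them.

```python
import socket


def check_families(host, port, timeout=3.0):
    """Try a TCP connection to host:port separately over IPv4 and IPv6.

    Returns a dict mapping "IPv4" and "IPv6" to one of
    "ok", "no address" or "unreachable".
    (Hypothetical helper; not part of LCFG or the inventory system.)
    """
    results = {}
    for name, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            # Restrict name resolution to a single address family.
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            results[name] = "no address"
            continue
        status = "unreachable"
        for fam, socktype, proto, _canon, sockaddr in infos:
            try:
                with socket.socket(fam, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(sockaddr)
                status = "ok"
                break
            except OSError:
                # Try the next address for this family, if any.
                continue
        results[name] = status
    return results
```

Running this against the database host before and after an upgrade would show whether both families can connect, whichever one the client happens to prefer.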

Chris is still working on Bugzilla.

computing.help.inf.ed.ac.uk is still on SL6 because we don't yet understand why the SL7 version, lagun, and some other SL7 hosts don't have a /var/log/journal - and then sometimes do, after a reboot.

Miscellaneous development

Support for SL 7.3 is pretty much complete. However, 7.3 suffers from serious bugs. We shouldn't use it, except for testing, until those have been fixed.
  • We're definitely seeing sssd RedHatBug:1396912. We have therefore had to turn off enumeration in sssd. This means that a complete list of user accounts cannot be obtained from sssd, which breaks some common user-visible software facilities.
  • We're concerned about sssd RedHatBug:1392444.
  • We're also seeing bugs with dracut. One is that /dev/shm shared memory is inaccessible to normal users. Stephen has patched dracut to fix this.
  • Some good news - 7.3 seems to offer far better support for SkyLake machines such as the HP G2. Certainly sleep now works properly on a test G2, with its screen managing to resume afterwards. We haven't yet tested 7.3 sleep on a Lenovo P310 to see if that's now working too.

Stephen fixed the check_network script. It now looks for the files which indicate bonding status in both the old and the new places.
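The "look in both places" approach can be illustrated as below. This is a sketch only and is not the check_network code: the function name find_status_file is hypothetical, and the real old and new locations are not recorded in these minutes, so the candidate directories are passed in by the caller.

```python
from pathlib import Path


def find_status_file(interface, candidate_dirs):
    """Return the first existing status file for the interface, or None.

    candidate_dirs is an ordered list of directories to search,
    e.g. the old location followed by the new one (both assumed here).
    """
    for directory in candidate_dirs:
        path = Path(directory) / interface
        if path.is_file():
            return path
    return None
```

Keeping the list of locations as data means a further move of the files would only need another entry in the list.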

Stephen fixed the ruby22 software collection. It didn't appear to have worked on SL7 before.

Operational

The NX servers have been replaced.
  • The old servers northern and piccadilly are now off.
  • The console on jubilee has stopped working - we'll need to fix it.
  • The process limit on the NX servers has been increased so that thunderbird can have hundreds of threads.

Chris successfully swapped the /var and /var/cache/afs partitions on circle. It was a slightly dodgier experience than he had expected because, with the machine being booted via PXE and displaying via the serial console, the terminal emulation didn't seem to be reliable enough to allow typo-free editing of fstab files. Stephen suggested that the process might be scripted. Chris will publish the procedure he followed and will think about automating it.
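As a starting point for that automation, the fstab edit itself could be done by a small script rather than by hand over the serial console. The sketch below is not the procedure Chris followed. It assumes a conventional whitespace-separated fstab with the mount point in the second field, and the function name swap_mount_points is made up for illustration.

```python
def swap_mount_points(fstab_text, first, second):
    """Return fstab text with the mount points `first` and `second` exchanged.

    Comment lines, blank lines and unrelated entries are passed through
    unchanged. Edited lines are rewritten with tab-separated fields.
    """
    out = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        fields = stripped.split()
        if not stripped or stripped.startswith("#") or len(fields) < 2:
            out.append(line)
            continue
        if fields[1] == first:
            fields[1] = second
        elif fields[1] == second:
            fields[1] = first
        else:
            out.append(line)
            continue
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

This covers only the fstab change; copying the data between the two partitions would still be a separate step. Writing the result to a new file and comparing it with the original before installing it would avoid the typo problem entirely.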

Stephen will draft a document about support for LCFG components written in shell.

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Continue work on RESTful API - TartarusRESTAPI
      • Document REST API
      • Further encourage people to use API and ii commands
      • Write more of the ii commands and document them as they are written
      • Speak to George about macaddr/space feed
      • Start work on final report!
      • Chase Tim about theon access credential for feed; use new credential
      • Convert from mod-auth_kerb to mod-auth_gssapi (See Stephen for details)
      • How to represent VMs
    • Deploy encrypted /tmp and swap conversion script
      • Do during Festival of Creative Learning week (w.b. 20th Feb)
      • Need to warn users that Gnome3 may pop up a window about /tmp being full (when script is run)
    • Schedule MPU meeting to discuss systemd ordering
    • submit polkit bug to redhat - with Stephen (check with 7.3)
    • Think how to regularly report on machines with no /var/log/journal
    • MPU SL7
      • Upgrade computing.help servers
        • Make lagun live
          • Remember proper certs for computing.help master
      • Decommission old 'hilfe' server
    • Check sysmans (et al) have 'nograce'.
    • Take a look at RT #78875
    • Look at RT and SL7RT
    • Look at differences for new 7.3 libvirt (2.0.0)
    • Try 7.3 on P310

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • MPU SL7
      • Continue with bugzilla
      • Look at wake backend (running on Inf servers)
    • DICE encryption
      • Continue thinking and researching
    • Roll out fixed sleep code
    • Reschedule MPU futures meeting
    • Update PackagesSiteMirror
    • Schedule gaivota downtime to investigate LVM/IBM VG issue
    • Look at differences for new 7.3 libvirt (2.0.0)
    • Look at RT and SL7RT
    • Think about whether we can use NX service for staff.login/student.login
    • Double check KB KVM servers have live/jcmb-server-room.h included with appropriate JCMB_RACK define
    • Reinstall circle with improved partitioning scheme (larger /var, mount partition on /var/lib/libvirt)

  • Stephen
    • LCFG client refactor stage 1
      • schedule debrief meeting
    • LCFG client refactor stage 2
      • testing and documentation
      • blog article (once documentation complete)
    • LCFG server symlink to exam branches - produce reporting script and discuss with Graham
    • submit polkit bug to redhat - with Alastair (check under 7.3)
    • Investigate George's multiple network interfaces SL7 issue (eg consoles server)
      • waiting on George breaking metropolitan
    • Look at differences for new 7.3 libvirt (2.0.0)
    • Look at RT and SL7RT
    • Reboot badger (to get /var/log/journal)
    • Think about whether we can use NX service for staff.login/student.login
    • SL7 PXE installroot will need updating after stable release on 08/02/2017
    • Draft a position note on shell components under SL8 and possible ways forward

-- AlastairScobie - 07 Feb 2017

Topic revision: r8 - 24 Sep 2019 - 13:50:24 - AlastairScobie
 