MPU Meeting Wednesday 2nd August 2017

Inventory

Alastair has improved the API for the web interface. He has now started working on finishing support for KVM guests.

LCFG Client refactoring

Stephen has extended the package list tests. This led to some work on improving support for error messages - ensuring that the message content is relevant and meaningful wherever the error occurs, and that it is passed correctly right through to where it needs to be.

MPU SL7

This project has gone for closure.

Miscellaneous development

Chris has been experimenting with virtual hosts, adding a couple of them to respond with an informative message to requests to a service which no longer exists. When a number of virtual hosts are on the same physical host, with the various separate names of that host given with simple CNAMEs, it's possible for the wrong vhost to match the request and respond. You have to be careful of the order in which you put the vhosts: the most general, widely applicable vhosts should come before the other more specialised ones, and in general those with a particular port number (for instance SSL on port 443) should precede those which don't specify a port number.
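To see why the order matters, here is a minimal Python sketch of name-based virtual host selection. It is a simplified model, assuming the rule documented for Apache httpd: among the vhosts defined for the address and port a request arrived on, the first whose ServerName or ServerAlias matches the Host header wins, and if none matches, the first vhost listed acts as the default. These minutes don't name the web server, so that is an assumption, and the host names and the `VHost` and `select_vhost` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VHost:
    label: str                      # hypothetical label, for illustration only
    server_name: str
    aliases: List[str] = field(default_factory=list)


def select_vhost(vhosts: List[VHost], host_header: str) -> Optional[VHost]:
    """Return the vhost that would answer, given the order in the config.

    Simplified model: first ServerName/ServerAlias match wins; if nothing
    matches, the first vhost listed is used as the default.
    """
    if not vhosts:
        return None
    wanted = host_header.lower()
    for vh in vhosts:
        names = [vh.server_name] + vh.aliases
        if wanted in (n.lower() for n in names):
            return vh
    return vhosts[0]


general = VHost("general", "www.example.org")
retired = VHost("retired-service-notice", "oldservice.example.org")

# "alias.example.org" stands for a CNAME that no vhost lists by name.
print(select_vhost([general, retired], "alias.example.org").label)
print(select_vhost([retired, general], "alias.example.org").label)
```

In this model the unlisted CNAME is answered by whichever vhost comes first, so swapping the order changes which site responds. Listing every CNAME as an alias on the intended vhost removes the dependence on order.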

We haven't forgotten that we're working towards having our disks totally encrypted. With this in mind Chris has been experimenting with an interesting new encryption technology, Network Bound Disk Encryption, a.k.a. tang and clevis. It's in RHEL 7.4 and in recent Fedoras. You can see his report so far at MPUTangAndClevisTrial.

Stephen has added a 30 second timeout to Nagios remctl calls. It seems to work well, and should solve the problem of hundreds of nagios calls queueing up and swamping a machine.
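The general idea of bounding a check with a timeout can be sketched in Python as below. This is not Stephen's change, which these minutes don't describe in detail; it is a minimal illustration assuming the check is run as an external command and that a timed-out check should be reported with the Nagios UNKNOWN status. The `run_check` name is hypothetical.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3


def run_check(command, timeout_seconds=30):
    """Run a check command and return (nagios_status, output).

    If the command has not finished within timeout_seconds,
    subprocess.run kills it and we report UNKNOWN, so stalled
    checks cannot pile up behind one another.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return NAGIOS_UNKNOWN, f"check timed out after {timeout_seconds}s"
    status = result.returncode
    if status not in (NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN):
        status = NAGIOS_UNKNOWN
    return status, result.stdout.strip()
```

A call might look like `run_check(["remctl", "somehost", "nagios", "check"])`, where the host, command and subcommand are placeholders rather than the real DICE configuration.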

Alastair has looked into marking a VM as disabled via libvirt, so that it wouldn't be started by default after a host reboot. This is difficult; only horribly hacky solutions have presented themselves so far.
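One avenue that might be worth a look is libvirt's `<metadata>` element, which the domain XML format provides so that applications can store their own namespaced data. The Python sketch below only shows how a local script could read such a marker from a domain's XML (for example as printed by `virsh dumpxml`). The `dice:disabled` element, its namespace URI and the `is_marked_disabled` name are all hypothetical, and nothing in these minutes says this approach has been tried or adopted.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for a site-specific marker.
DICE_NS = "http://example.org/xmlns/dice/1.0"


def is_marked_disabled(domain_xml: str) -> bool:
    """Return True if the domain XML carries the hypothetical marker."""
    root = ET.fromstring(domain_xml)
    marker = root.find(f"./metadata/{{{DICE_NS}}}disabled")
    return marker is not None


SAMPLE = """<domain type='kvm'>
  <name>examplevm</name>
  <metadata>
    <dice:disabled xmlns:dice='http://example.org/xmlns/dice/1.0'/>
  </metadata>
</domain>"""

print(is_marked_disabled(SAMPLE))
```

Whatever starts VMs after a host reboot would still have to consult the marker, so this covers only the reading half of the problem.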

Operational

There's another new version of VirtualBox. Version 5.1.26 fixes a severe problem with host-only networking. In addition the X problem in VBoxAdditions seems to have been fixed so the latest version of that has been listed for installation.

Stephen has been looking at the kernel printk log level. On DICE desktops we include "quiet" by default in the kernel command line, and this gives a console log level of 4 (print only messages more severe than WARNING). However on DICE servers we don't add "quiet" to the kernel command line, so servers have a log level of 7 (report everything more severe than DEBUG). While looking into this he discovered that it's reckoned to be a very, very bad idea to direct journald output to the console. Don't do this except in dire, dire need. It slows everything down horribly.
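To check what a given machine is using, the current values can be read from `/proc/sys/kernel/printk`, whose four numbers are the console log level, the default message log level, the minimum console log level and the default console log level. The Python sketch below parses that file and lists which severities reach the console; the function names are hypothetical. It relies on the kernel's rule that a message is printed when its numeric level is lower than the console log level.

```python
from pathlib import Path

# Kernel severities, index 0 (most severe) to 7 (least severe).
LEVEL_NAMES = ["EMERG", "ALERT", "CRIT", "ERR",
               "WARNING", "NOTICE", "INFO", "DEBUG"]


def parse_printk(text: str) -> dict:
    """Parse the four whitespace-separated values of /proc/sys/kernel/printk."""
    console, default_msg, minimum, default_console = (int(v) for v in text.split())
    return {
        "console_loglevel": console,
        "default_message_loglevel": default_msg,
        "minimum_console_loglevel": minimum,
        "default_console_loglevel": default_console,
    }


def shown_on_console(console_loglevel: int) -> list:
    """Severities printed to the console: those numerically below the level."""
    return LEVEL_NAMES[:console_loglevel]


if __name__ == "__main__":
    path = Path("/proc/sys/kernel/printk")
    if path.exists():
        levels = parse_printk(path.read_text())
        print(levels)
        print(shown_on_console(levels["console_loglevel"]))
```

With a console log level of 4 this lists EMERG through ERR, and with 7 it lists EMERG through INFO, matching the desktop and server behaviour described above.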

Stephen has done lots of MPU server reboots. The reboots we still need to do are of a few ssh, NX or login servers and of almost all of the KVM servers. Rebooting the KVM servers always means a lot of work, either in announcing the downtime of multiple services or in migrating lots of VMs. In this case we must reboot all KVM servers before the end of August, so we intend to get them rebooted with the minimum of work - by default we'll shut VMs down for the reboot unless it would be clearly less hassle to migrate them. Chris will organise the timetable for this and we'll all help with the reboots.

With an eye on the KVM server reboots and on the upgrade to the qemu packages which 7.3 brings, Chris has tested VM migration between 7.2 and 7.3 KVM servers. It seems fine in both directions.

Alastair has almost completed the introduction of disk encryption (for swap and /tmp) to DICE desktops. There are only about ten machines to go, half of which are allocated to computing staff, so he'll be contacting the relevant folk about that.

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Continue work on RESTful API - TartarusRESTAPI
      • Document REST API
      • Write more of the ii commands and document as writing.
      • Start work on final report!
      • How to represent VMs
      • Continue with REST API testing framework
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
    • Deploy encrypted /tmp and swap conversion script
      • Need to warn users that Gnome3 may pop up a window about /tmp being full (when script is run)
      • Do by 26th July
      • Now down to 10 desktops, mainly COs
    • Schedule MPU meeting to discuss systemd ordering
    • Check sysmans (et al) have 'nograce'.
    • Take a look at RT #78875
    • Look at /etc/hosts - dns issue (IPV6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Deploy disable-module header on all computing.help servers
      • Defer until return from hols in July in case of problems
    • Is there a route via libvirt to mark a VM as being disabled ?
      • Only option looks like adding something like DISABLED to the name of the first disk image
      • Is it possible to add additional fields into the XML file which a local script could interpret?
    • Look at Stephen's 'Thoughts on shell components'
    • Reboot jubilee to get latest kernel

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Decommission warwick (pkgforge) (need holding page somewhere)
    • Start timetabling KVM upgrades to 7.3
      • with Stephen and Alastair assisting

  • Stephen
    • LCFG client refactor stage 2
      • testing and documentation
    • LCFG server symlink to exam branches - produce reporting script and discuss with Graham
    • submit polkit bug to redhat - with Alastair (still exists under 7.3)
    • Investigate George's multiple network interfaces SL7 issue (eg consoles server)
      • waiting on George breaking metropolitan
    • Draft a position note on shell components under SL8 and possible ways forward
    • Produce some text for systemd mount bug (to submit to RH)
    • RT actions (as per agreed list) once 7.3 fully deployed
    • Take issue of disabling per-user journald logs on certain servers to OPS
    • Schedule jubilee downtime to move to SOL
    • Consider PD work for after LCFG client
    • File bug against lcfg-systemd - spurious warnings about missing targets at first boot.
    • Reboot ssh servers to get latest kernel

-- AlastairScobie - 02 Aug 2017

Topic revision: r9 - 24 Sep 2019 - 13:50:24 - AlastairScobie
 