MPU Meeting Tuesday 18th July 2017

Inventory

Alastair has been working on adding a simple web interface for querying the inventory. It uses the same Catalyst code as the REST API but with a different view for generating HTML. The design of the REST API has made it a little awkward to design some parts of the web interface where multiple values may be specified for query parameters but it's good enough for now.
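The multi-valued query parameter issue can be illustrated with a minimal sketch. It is written in Python rather than the Catalyst code the inventory actually uses, and the parameter names (model, location) are invented for the example. The idea is to treat every parameter as a list, so that single and repeated values go through the same code path.

```python
from urllib.parse import parse_qs


def query_values(query_string):
    """Return a dict mapping each parameter name to a list of its values."""
    # parse_qs always returns lists, so a parameter given once and a
    # parameter given several times look the same to the caller.
    return parse_qs(query_string)


# Hypothetical query string: "model" is given twice, "location" once.
params = query_values("model=R530&model=R730&location=forum")
print(params["model"])
print(params["location"])
```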

LCFG Client Refactoring

Stephen has been doing more testing.

MPU SL7

devproj
We still need to archive the devproj web pages which are on birman. Alastair will talk to Neil about where the files can be stored.

pkgsearch
We should add a holding page for this service on another machine so that warwick can be turned off.

Disk Encryption

Chris has been using Fedora to try a client/server pair. It was suggested that to keep things simple the VMs should just use local accounts and not worry about Kerberos/LDAP configuration.

Miscellaneous Development

Wandering time
The report no longer shows out-of-date results.

Starting VMs
The recent KVM server outage has raised a couple of issues related to starting the VMs. Firstly, how do we know if a VM is meant to be running? Is there any way to mark a VM as "disabled" such that it should not be restarted? Also, is there any way we can manage the order in which VMs are restarted? For simplicity, we just did them in alphabetical order last time.
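One possible shape for an answer to both questions, as a rough sketch only: keep a small list of VMs with a "disabled" flag and a numeric start priority, skip the disabled ones, and sort the rest by priority with alphabetical order as the tie-breaker. The field names, the default priority of 100 and the VM names are all assumptions made for illustration; nothing here talks to libvirt.

```python
def start_order(vms):
    """Return the names of the VMs to start, in the order to start them.

    vms is a list of dicts with a 'name' key and optional 'disabled'
    (bool) and 'priority' (int, lower starts first) keys.
    """
    # Drop anything explicitly marked as disabled.
    enabled = [vm for vm in vms if not vm.get("disabled", False)]
    # Lower priority number first; ties fall back to alphabetical order,
    # which matches what was done by hand after the outage.
    ordered = sorted(enabled, key=lambda vm: (vm.get("priority", 100), vm["name"]))
    return [vm["name"] for vm in ordered]


# Hypothetical VM list for illustration only.
example = [
    {"name": "webhost"},
    {"name": "dnshost", "priority": 10},
    {"name": "oldtest", "disabled": True},
    {"name": "buildhost"},
]
print(start_order(example))
```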

nagios remctl
We have noticed that many servers have hung remctl processes. The code which uses remctl to send nagios passive check results needs to include a timeout. Stephen will take a look; it should be fairly easy to do.
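As an illustration of the kind of timeout needed, here is a minimal Python sketch that runs an external command and gives up if it has not finished within a fixed time. It is not the actual nagios code; the function name is hypothetical, and in real use the command list would be the remctl invocation.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; return (exit_code, stdout), or (None, "") on timeout."""
    try:
        # subprocess.run kills the child and waits for it if the
        # timeout expires, so no hung process is left behind.
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return result.returncode, result.stdout


# Stand-in for a hung command: a child process that sleeps for 5 seconds.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow, 0.5))
```

Returning None for the exit code lets the caller tell a timeout apart from a command that ran and failed.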

Operational

vbox video
The latest vbox additions (5.1.24) have issues with X when running a DICE VM on a DICE host. Alastair will investigate.

Dell R530
Chris has added a hardware header for the new Dell R530 server.

amarela disk
A failing disk on amarela has been replaced.

HP G2 sleep
This has been re-enabled now the desktops are all using SL7.3.

asus servers
Some of the RAT asus servers have serial console issues. It seems that this is related to the BMC firmware version. There is an updater which uses freedos. Chris is looking at the problem.

dsu
There is a new version of dsu. We will warn COs that it is available but it has not yet been tested.

Kernel and openafs updates
The kernel and openafs packages will be updated for all DICE machines after the stable release on 26th August.

SL7.3 update
We need to agree on a date by which all machines will be upgraded from 7.2 to 7.3.

SL6 EOL
We plan to drop access to SL6 for users at the end of September.

qemu update
Can we reliably migrate VMs back and forth between KVM servers running SL7.2 and SL7.3?

printk log levels
What should be the standard log level for printk on servers? Stephen will raise the issue at the next Operational meeting.
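For reference when comparing servers, the current settings can be read from /proc/sys/kernel/printk, which holds four numbers: the console log level, the default message log level, the minimum console log level and the default console log level. The sketch below parses that format; the function and tuple names are made up for this example.

```python
from collections import namedtuple

PrintkLevels = namedtuple(
    "PrintkLevels",
    ["console", "default_message", "minimum_console", "default_console"],
)


def parse_printk(text):
    """Parse the contents of /proc/sys/kernel/printk into named fields."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError("expected four log level values, got %d" % len(fields))
    return PrintkLevels(*(int(field) for field in fields))


# Sample content in the same whitespace-separated format as the proc file.
levels = parse_printk("4\t4\t1\t7\n")
print(levels.console)
```

On a real machine the text would come from reading /proc/sys/kernel/printk; messages with a level numerically lower than the console value are the ones printed to the console.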

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Continue work on RESTful API - TartarusRESTAPI
      • Document REST API
      • Write more of the ii commands and document as writing.
      • Start work on final report!
      • How to represent VMs
      • Continue with REST API testing framework
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
    • Deploy encrypted /tmp and swap conversion script
      • Need to warn users that Gnome3 may pop up a window about /tmp being full (when script is run)
      • Do by 26th July
      • Now down to 10 desktops, mainly COs
    • Schedule MPU meeting to discuss systemd ordering
    • Check sysmans (et al) have 'nograce'.
    • Take a look at RT #78875
    • Look at /etc/hosts - dns issue (IPV6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Deploy disable-module header on all computing.help servers
      • Defer until return from hols in July in case of problems
    • look at console vs journal-or-kmsg for systemd
    • Take MPU SL7 server upgrade for closure
    • Sort out devproj static dump and host on groups.inf.ed.ac.uk (speak to Neil)
      • Has been running on groups.inf since November
      • Delete birman vm
    • Is there a route via libvirt to mark a VM as being disabled ?
      • Only option looks like adding something like DISABLED to the name of the first disk image
    • Temporarily pull vboxvideodrv from DICE xfree config until VirtualBox vboxvideo module installs correctly.
      • look to see if there's a bug report we can follow - Virtualbox Bug 16907
      • Decided to downgrade to 5.1.10 vboxadditions - this works fine

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at 7.4 beta - tang/clevis - to see if this technology does what we want
    • Test a VM migrates ok from 7.2 (eg on waterloo) to 7.3 (circle)
    • Decommission warwick (pkgforge) (need holding page somewhere)

  • Stephen
    • LCFG client refactor stage 2
      • testing and documentation
    • LCFG server symlink to exam branches - produce reporting script and discuss with Graham
    • submit polkit bug to redhat - with Alastair (still exists under 7.3)
    • Investigate George's multiple network interfaces SL7 issue (eg consoles server)
      • waiting on George breaking metropolitan
    • Draft a position note on shell components under SL8 and possible ways forward
    • Produce some text for systemd mount bug (to submit to RH)
    • RT actions (as per agreed list) once 7.3 fully deployed
    • Take issue of disable per user journald logs on certain servers to OPS
    • Schedule jubilee downtime to move to SOL
    • Consider PD work for after LCFG client
    • Check IPV6 ssh connectivity to NX servers
    • Have a look to see if there is any way of modifying printk behaviour so that it can drop stuff if a serial console is blocking - not with current EL7 kernel
    • File bug against lcfg-systemd - spurious warnings about missing targets at first boot.
    • look at console vs journal-or-kmsg for systemd
    • Nagios RemctlSend needs a timeout for its call to remctl

-- AlastairScobie - 18 Jul 2017

Topic revision: r11 - 24 Sep 2019 - 13:50:24 - AlastairScobie
 