MPU Meeting Wednesday 6th February 2019

Inventory

The handling of the location information has been changed. When a manual location is set the time will also now be recorded. If at some later point a switch reports a different location, and that is more than 90 minutes newer, then the automatic location will be used.
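
The precedence rule above can be sketched as follows. This is a minimal illustration only, not the inventory code: the `LocationRecord` type, the `effective_location` function and the example room names are all hypothetical, and only the 90 minute threshold comes from the minutes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Threshold from the minutes: a switch report must be more than
# 90 minutes newer than the manual entry before it takes precedence.
OVERRIDE_AFTER = timedelta(minutes=90)


@dataclass
class LocationRecord:
    """Hypothetical record: a location plus the time it was recorded."""
    location: str
    recorded_at: datetime


def effective_location(manual: Optional[LocationRecord],
                       automatic: Optional[LocationRecord]) -> Optional[str]:
    """Return the location to use, preferring the manual entry unless a
    different switch-reported location is more than 90 minutes newer."""
    if manual is None:
        return automatic.location if automatic else None
    if automatic is None:
        return manual.location
    differs = automatic.location != manual.location
    newer = automatic.recorded_at - manual.recorded_at > OVERRIDE_AFTER
    return automatic.location if differs and newer else manual.location
```

Whether a report that is exactly 90 minutes newer should win is not stated in the minutes; the sketch assumes it does not.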

Alastair held a meeting with the rest of the computing team to discuss the new inventory. It was interrupted by Forum power issues, so it will continue this week. A few things that came up:

  • Need a way to search for the history of a hostname.
  • Barcode stickers will be attached to all machines.
  • Self-managed machines with dynamically allocated IP addresses will not need hostnames.

LCFG profile security

An overview of the new directory permissions has been published on the LCFG wiki - ProfileSecurity. Resources for the dns component had to be tweaked so that the querylogs are stored in /var/log since the directory is owned by the named user.

There is a new lcfg group which contains all the computing team. That ensures that the computing team retains read-access to the LCFG data after the code changes are made. At some point we should consider if we really want that or whether we should only allow access for root (maybe depends on the risk model for each service?).

Installs have been successfully tested with the new client code included in the installroot, installbase and client profiles.

Alternative DICE desktop

The lcfg-reltool command now has support for building Debian packages using the deb and devdeb commands which work in a similar way to the rpm and devrpm commands.

An initial attempt has been made at packaging the LCFG core, client and dependencies. rdxprof runs, although there are no components yet so it can't do anything useful.

There are now some documentation pages on the LCFG wiki - BuildToolsDebian and DebianPackaging. They are intended to be a "cookbook" rather than a complete reference; more details will be added as we go along.

Miscellaneous Development

dice-check
The script for checking the lab machines was improved to make it easier to map hardware model information to the appropriate LCFG header.

pathfix
This has been modified to remove all references to /home which should help fix the getcwd problem that is caused by the combination of afs and autofs. There are further changes to be made but they need further consideration.

Operational

SL7 kernel
There is a new INF_TEST kernel for testing on develop machines. This is the first of the 957 series from SL7.6.

Software collections
New support for php 7.2 and nodejs 8. Tried to do devtoolset-8 but it needs lots of packages from fastbugs so we will leave it until SL7.6 is ready.

golang
The Go language support has been updated to 1.11.4.

nvidia 410 driver
Still having problems with this driver. It does now build but seems to segfault when used by X. Maybe it needs the latest kernel from SL7.6? We need to try that...

HP G3 mini
There were problems with PXE booting this model for installs. It turns out that it does NOT support UEFI for PXE booting so that has been reverted to legacy mode.

SSH root access
root access is now allowed with the prohibit-password option.

xrdp
Two new servers have been ordered at great expense. Graham has been testing TLS 1.3 support for xrdp on mizar.

desktop root partition size
We discussed whether this should be increased and, if so, to what size. Stephen will investigate our current usage with a new df clientreport module.
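
As a rough illustration of the kind of data such a module might gather, the sketch below reports usage for a single mount point using only the Python standard library. It is not the clientreport module and does not use its interface; the `partition_usage` function and the shape of the returned dictionary are assumptions made for illustration.

```python
import shutil


def partition_usage(path: str = "/") -> dict:
    """Summarise disk usage for the filesystem containing `path`.

    Hypothetical helper: the field names are illustrative only.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    gib = 1024 ** 3
    percent = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_gib": round(usage.total / gib, 1),
        "used_gib": round(usage.used / gib, 1),
        "used_percent": round(percent, 1),
    }


if __name__ == "__main__":
    print(partition_usage("/"))
```

The percentage here is used space over total size, so it may differ slightly from the Use% column printed by df, which takes reserved blocks into account.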

drupal
There was a security update for drupal.

virtual dice
Inf unit want to close all external access to our LDAP service. This means that virtual dice either needs to require the use of the Informatics VPN or switch to guest mode only.

Pandemic docs
Progress has been made on updating the MPU pandemic docs.

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Start work on final report!
      • Give Ian D details on how Tartarus tables are accessed, for inclusion in his privileged access discussion paper
      • Look at postgresql replication (do after shipping)
      • Add tartarus info to SwitchToSelfManaged
      • Need tests for API /orders and need new tests to check for correct authorisation
      • Make lcfg header generation live (need to check what will be deleted when we do this - big discrepancy between old inventory and new)
      • Look at user support form - how does that look up the hostname?
      • Look at whether there is an easy library way for Chris to grab the macaddr of a machine given the hostname
    • Schedule MPU meeting to discuss systemd ordering
    • Take a look at RT #78875
    • Look at /etc/hosts - dns issue (IPV6?)
      • work out what we need to fix current problem
    • Implement change to kvmtool to allow KVMs to be marked as disabled
      • looked at this - looks like the metadata tag isn't passed through libvirt (prior to 4.0.0), so can't be read/written by kvmtool
      • put on activities list to do once upgrade to libvirt-4.0.0
    • Look at Stephen's 'Thoughts on shell components'
    • Start looking at https and computing.help (remove assumption that https means want cosign login)
      • wait on Neil's efforts with EdWeb
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • drupal username collection re GDPR
      • configure live server to run the user expiry script
      • Fixup email domains for existing accounts and check fix for domain setting to inf.ed.ac.uk is in place on live service
      • need to ship fixed cosign module on live service
    • Inventory stuff re GDPR
    • Check with Tim / George about capability for login to student machines - where are we
      • Tim says that we should create a capability that is given to the base cohort and set that capability to no-grace
    • Useful? - a script which checks how fast a machine's console log is growing (eg huge number of dbus problems on hammersmith)
      • suggest to Ian D
    • Blog on projects
    • KVM pcid
      • Investigate spectre / meltdown wrt VMs
      • Which CPU is needed for each group?
      • The following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" wrt migrations.
          <cpu mode='host-model' match='exact'>
            <model fallback='allow'>IvyBridge</model>
            <vendor>Intel</vendor>
            <feature policy='require' name='pcid' />
          </cpu>
      • Update: looked at this. We should be safe to set the CPU model to host-model on clusters where the CPU is identical across the cluster (KB and AT). However, we can't where the CPUs aren't identical (IF) - here we should be able to set a base minimum machine (SandyBridge?). We'd need to check that migration works. Recent versions of virsh allow you to specify the hosts in the cluster and ask for a CPU model description which will work across the whole cluster. Setting the base minimum to SandyBridge on 'oyster' fixed one of the Spectre flaws, but not all. It looks like we need a more up-to-date qemu-kvm to fix the remaining flaws.
        • Wait until 7.6ish is settled re KVM software versions and try the above again
    • Remove IBM disk array from stack
    • Produce some notes from OSS
    • Read George's mail of 8th November wrt DPIA
    • Try latest VDICE on Windows 10 machine at home (research guest login delays)
    • Update Pandemic pages - computing.help
    • Read final report NX replacement project
    • Review the three encryption computing.help pages

  • Stephen
    • submit polkit bug to redhat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take issue of disable per user journald logs on certain servers to OPS
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at where we're using ALL in access.conf
    • Finish off NX replacement project (#389)
    • Continue with RT ticket clearout as discussed in October
    • Read George's mail of 8th November wrt DPIA
    • Firmware update - deneb and steen
    • Reboot staff.ssh (hare)
    • Complete tartarus clientreport module errors report
    • Update Pandemic pages - Security, LCFG,
    • Add a 'df' module to clientreport
    • Review https://computing.help.inf.ed.ac.uk/sl7
    • Move afsbuild server (juice) from Forum to AT

-- AlastairScobie - 06 Feb 2019

Topic revision: r5 - 23 Sep 2019 - 13:33:41 - AlastairScobie