MPU Meeting Wednesday 6th March 2019

Inventory

No progress, except that Alastair has been thinking about doing an initial clientreport run on new machines, including self-managed ones. The clientreport run could be included in the LCFG installroot. We'll want PXE to work on self-managed subnets anyway so that we can offer Ubuntu self-install.

LCFG Profile Security

Nothing this week.

SL7.6 Update

Nothing much this week. The installer testing still needs to be done.

Miscellaneous Development

  • Stephen has written new locking code for ngeneric. It's in a new module, LCFG::Lock. It's object-oriented, and by default it unlocks when the lock object goes out of scope. It should be less buggy and far more secure than the old method. It's being tested, and it'll appear in the develop release in due course. (A sketch of the scope-based locking idea follows this list.)
  • Stephen has fixed a few minor problems with the new LCFG build tools code.
    • When used for CentOS packages, buildtools has no access to the LCFG repository. This is a problem because podselect isn't available in the base distribution; it's something we add ourselves.
    • If a generated documentation file is empty, it's now deleted immediately after having been generated.
    • Having a component script is once again optional when using the lcfg_add_component CMake macro.
  • There's a new BOOT_PARTITION cpp macro for use in disk partitioning. It takes the partition number as an argument (usually 2) and expands to the correct type of boot partition for either legacy boot (bios_grub) or UEFI boot (/boot/efi).
  • Kenny MacDonald has submitted a patch to updaterpms which adds https support for the PreFetch function. See Bug:1120 for details.
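
As a rough illustration of the scope-based locking approach: this is only a sketch of the general idea, not the actual LCFG::Lock API (the ScopeLock package name, lock file location and option handling are made up for the example).

  #!/usr/bin/perl
  # Sketch of scope-based locking only - not the real LCFG::Lock interface.
  use strict;
  use warnings;

  package ScopeLock;    # hypothetical name, for illustration only

  use Fcntl qw(:flock);
  use IO::File;

  sub new {
      my ( $class, $file ) = @_;
      my $fh = IO::File->new( $file, '>>' )
          or die "Cannot open lock file $file: $!";
      flock( $fh, LOCK_EX ) or die "Cannot lock $file: $!";
      return bless { file => $file, fh => $fh }, $class;
  }

  # Releasing the lock in DESTROY means it is dropped automatically when
  # the object goes out of scope, even if the caller dies part-way through.
  sub DESTROY {
      my ($self) = @_;
      if ( $self->{fh} ) {
          flock( $self->{fh}, LOCK_UN );
          $self->{fh}->close;
      }
  }

  package main;

  {
      my $lock = ScopeLock->new('/tmp/example.lock');
      # ... critical section ...
  }    # $lock goes out of scope here and the lock is released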

Operational

  • We've taken delivery of two new machines which will take over the xrdp.inf.ed.ac.uk service. Chris is doing the initial installation. They'll be called lute and archlute.
  • Once xrdp has moved to lute and archlute we may redeploy hammersmith as an additional remote compute server for xrdp users.
  • We discussed remote desktop provision more generally. We still intend to make separate remote desktop provision for distance learning students. We also raised the possibility of using a Citrix desktop virtualisation solution to provide a more performant service for staff, whose usage patterns can stress the xrdp servers more than students' do.
  • We also discussed the possibility of providing more desktops with local GPUs, so that software development could be done on them rather than on a Slurm-controlled cluster. (See also Help:cluster-computing.)
  • The default root partition size on desktops has been raised from 80GB to 100GB.
  • The xrdp server hammersmith has been having performance problems. We're continuing to investigate them and to experiment with possible solutions. One is to reduce logging to the console by adopting the current Red Hat standard, where most messages go only to the journal. We'll look into setting this by default, perhaps for SL7.6, but with LCFG component systemd services still set to log to the console. (A sketch of the relevant settings follows this list.)
  • Stephen will make his regular DICE lab status reports accessible as a web page.
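
For reference, a minimal sketch of the kind of settings involved, assuming the standard systemd configuration files (the exact resources the LCFG systemd component would use to manage them aren't shown here):

  # /etc/systemd/system.conf - route service stdout/stderr to the journal only
  [Manager]
  DefaultStandardOutput=journal

  # Per-unit override for an LCFG component service which should still
  # log to the console as well:
  [Service]
  StandardOutput=journal+console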

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Start work on final report!
      • and give Ian D details of how the Tartarus tables are accessed, for inclusion in his privileged access discussion paper
      • Look at postgresql replication (do after shipping)
      • Add tartarus info to SwitchToSelfManaged
      • Tests are needed for the /orders API, plus new tests to check for correct authorisation
      • Make lcfg header generation live (need to check what will be deleted when we do this; there's a big discrepancy between the old inventory and the new)
      • Look at the user support form: how does it look up the hostname?
      • Look at whether there is an easy library way for Chris to grab the macaddr of a machine given the hostname
    • Schedule MPU meeting to discuss systemd ordering
    • Take a look at RT #78875
    • Look at the /etc/hosts DNS issue (IPv6?)
      • work out what we need to fix current problem
    • Implement change to kvmtool to allow KVMs to be marked as disabled
      • looked at this; it looks like the metadata tag isn't passed through by libvirt prior to 4.0.0, so it can't be read or written by kvmtool
      • put on the activities list, to be done once we upgrade to libvirt 4.0.0
    • Look at Stephen's 'Thoughts on shell components'
    • Start looking at https and computing.help (remove the assumption that https implies a cosign login is wanted)
      • wait on Neil's efforts with EdWeb
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • drupal username collection re GDPR
      • configure live server to run the user expiry script
      • Fix up email domains for existing accounts and check that the fix for setting the domain to inf.ed.ac.uk is in place on the live service
      • need to ship fixed cosign module on live service
    • Inventory stuff re GDPR
    • Check with Tim / George about the capability for login to student machines: where are we?
      • Tim says that we should create a capability that is given to the base cohort and set that capability to no-grace
    • Possibly useful: a script which checks how fast a machine's console log is growing (e.g. the huge number of dbus problems on hammersmith)
      • suggest to Ian D
    • Blog on projects
    • KVM pcid
      • Investigate Spectre / Meltdown wrt VMs
      • Which CPU model is needed for each group?
      • The following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" wrt migrations:
          <cpu mode='host-model' match='exact'>
            <model fallback='allow'>IvyBridge</model>
            <vendor>Intel</vendor>
            <feature policy='require' name='pcid' />
          </cpu>
      • Update: looked at this. We should be safe to set the CPU model to host-model on clusters where the CPU is identical across the cluster (KB and AT). However we can't do that where the CPUs aren't identical (IF); there we should be able to set a base minimum model (SandyBridge?). We'd need to check that migration still works. Recent versions of virsh allow you to specify the hosts in a cluster and ask for a CPU model description which will work across all of them (see the Sys::Virt sketch at the end of this list). Setting the base minimum to SandyBridge on 'oyster' fixed one of the Spectre flaws, but not all of them. It looks like we need a more up-to-date qemu-kvm to fix the remaining flaws.
        • Wait until 7.6ish is settled re KVM software versions and try the above again
        • Need to disable hyperthreading on all KVM servers
    • Move IBM disk array to B.03 and mark as junk
    • Produce some notes from OSS
    • Read George's mail of 8th November wrt DPIA
    • Try latest VDICE on Windows 10 machine at home (research guest login delays)
    • Review the three encryption computing.help pages
    • Produce a Legitimate Interest Declaration and Privacy Statement for tartarus
      • consider what can be removed once a user has left the University
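
The CPU baseline facility mentioned in the KVM pcid item above can be driven from Perl via the libvirt Sys::Virt bindings. A rough sketch follows; the host names are placeholders, error handling is minimal, and this isn't kvmtool code.

  #!/usr/bin/perl
  # Sketch: compute a baseline CPU model usable across a set of KVM servers.
  use strict;
  use warnings;
  use Sys::Virt;
  use XML::LibXML;

  # Placeholder host names - substitute the real cluster members.
  my @hosts = qw(kvmhost1.example.org kvmhost2.example.org);

  my @cpu_xml;
  for my $host (@hosts) {
      my $conn = Sys::Virt->new( uri => "qemu+ssh://$host/system" );
      my $caps = XML::LibXML->load_xml( string => $conn->get_capabilities() );
      my ($cpu) = $caps->findnodes('/capabilities/host/cpu');
      push @cpu_xml, $cpu->toString();
  }

  # Equivalent to feeding the <cpu> descriptions to 'virsh cpu-baseline'.
  my $local = Sys::Virt->new( uri => 'qemu:///system' );
  print $local->baseline_cpu( \@cpu_xml, 0 ), "\n";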

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report (currently pending other units completing)
    • User training materials project #403
    • Continue with RT ticket clearout as discussed in October
    • Initial install of the XRDP servers (one at AT, one at the Forum)
      • disable hyperthreading
    • Produce a 'guest only' version of Virtual DICE

  • Stephen
    • Submit polkit bug to Red Hat, with Alastair (still exists under 7.3)
    • Produce some text for the systemd mount bug (to submit to RH)
    • Take the issue of disabling per-user journald logs on certain servers to OPS
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at where we're using ALL in access.conf
    • Finish off NX replacement project (#389)
    • Continue with RT ticket clearout as discussed in October
    • Read George's mail of 8th November wrt DPIA
    • Firmware update - deneb and steen
    • Reboot staff.ssh (hare)
    • Complete tartarus clientreport module errors report
    • Update Pandemic pages - Security, LCFG
    • Add a 'df' module to clientreport
    • Move afsbuild server (juice) from Forum to AT
    • Produce report based on clientreport 'old locks'
    • Report on data access problem (need a nickname)
    • Discuss how to deploy two general-purpose XRDP servers (with the LCFG community)
    • Produce a Legitimate Interest Declaration and Privacy Statement for svn history and LCFG profile history
    • Increase standard desktop root partition to 100GB
    • Manage change to systemd.defaultstdout being journal
      • including raising at LCFG deployer meeting
    • Produce quick and dirty lab report web page on LCFG master (instead of email)

-- AlastairScobie - 06 Mar 2019
