MPU Meeting Wednesday 17th October 2018

Inventory

The live tartarus service has been moved to the Forum; this was a useful test of the process of restoring from backup. At the same time Alastair took the opportunity to tidy the LCFG profile and headers. Alastair noted that the x509 component does not show an error when it fails to create the certificate files; he will log a bug.

The handling of the modify flag for orphaned entries has been fixed.

Alastair has written some more reports to replace those from the old inventory along with some useful scripts for fixing problems.

We decided that all clientreport information which is older than 3 months should be culled along with any that doesn't apply to DICE systems (most likely because a machine has been switched to self-managed). This avoids holding onto out-of-date/irrelevant data.
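The culling rule above can be sketched as follows. This is a minimal illustration, not the Tartarus code: the `ClientReport` record, its `is_dice` flag and the 90-day approximation of "3 months" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "3 months" is approximated as 90 days.
MAX_AGE = timedelta(days=90)


@dataclass
class ClientReport:
    """Hypothetical record; the real clientreport schema is not shown in the minutes."""
    hostname: str
    reported_at: datetime
    is_dice: bool  # hypothetical flag: False once a machine is self-managed


def should_cull(report, now):
    """A report is culled if it is too old or no longer applies to a DICE system."""
    too_old = now - report.reported_at > MAX_AGE
    return too_old or not report.is_dice


def cull(reports, now):
    """Return only the reports that should be kept."""
    return [r for r in reports if not should_cull(r, now)]
```

In practice the same condition would more likely be expressed as a delete query against the database; the function form is used here only to make the two criteria explicit.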

Need to decide what should still be done before the project is completed.

Virtual Desktop

We continued in our efforts to produce a stable service for undergraduates.

Stephen applied the polkit policy to block suspend/hibernate so that these options are not displayed for users (they didn't work).

Misc Development

LCFG core
Stephen has been working on the LCFG core in an attempt to make it faster for components with huge numbers of resources (~50000). He has implemented a number of small changes to the reading of resources from Berkeley DB and status files. He has also sped up the process of finding the differences between two profiles. These changes are being tried on the console servers. He has also started work on a more substantial reworking of the data structures used to hold the resources for a component, which will change the approach from a singly linked list to a hash.
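To illustrate why a hash helps here: with a linked list, each lookup by resource name walks the list, so comparing two components with n resources each can take on the order of n² steps, whereas keyed lookups take roughly constant time on average. The sketch below is not the LCFG core implementation; it assumes, purely for illustration, that a component's resources can be modelled as a name-to-value mapping, and `diff_resources` is a hypothetical helper.

```python
def diff_resources(old, new):
    """Compare two name->value mappings of resources.

    Returns three sorted lists of resource names: added, removed, changed.
    Each membership test is a hash lookup, so the whole comparison is
    roughly linear in the number of resources.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

For example, comparing `{"a": "1", "b": "2"}` with `{"b": "3", "c": "4"}` reports `c` as added, `a` as removed and `b` as changed.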

lateupdates report
Stephen has continued to improve the usefulness of the lateupdates report.

Operational

drupal
Alastair cleared out the drupal users list using a drush script. He also fixed the issue with the wrong email domain being applied for new users. He encountered a bug in the cosign module which is related to a change in the behaviour of PHP. This is only seen by users on first login. Still need to ship the patched version.

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Start work on final report!
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
      • and give details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
      • Look at postgresql replication (do after shipping)
      • Add tartarus info to SwitchToSelfManaged
      • Complete removal of non authenticated access to API and web
      • Need tests for API /orders and need new tests to check for correct authorisation
      • Make lcfg header generation live (need to check what will be deleted when we do this - big discrepancy between old inventory and new)
      • Look at user support form - how does that lookup hostname?
      • Produce a python library to provide people with a programmatic equivalent of ii query
      • Look at whether there is an easy library way for Chris to grab the macaddr of a machine given the hostname
    • Schedule MPU meeting to discuss systemd ordering
    • Take a look at RT #78875
    • Look at /etc/hosts - DNS issue (IPv6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Implement change to kvmtool to allow KVMs to be marked as disabled
      • looked at this - looks like the metadata tag isn't passed through libvirt (prior to 4.0.0), so can't be read/written by kvmtool
      • put on activities list to do once upgrade to libvirt-4.0.0
    • Look at Stephen's 'Thoughts on shell components'
    • Look at MPUActivitiesList
    • Start looking at https and computing.help (remove assumption that https means want cosign login)
      • wait on Neil's efforts with EdWeb
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • drupal username collection re GDPR
      • configure live server to run the user expiry script
      • Fixup email domains for existing accounts and check fix for domain setting to inf.ed.ac.uk is in place on live service
      • need to ship fixed cosign module on live service
    • Inventory stuff re GDPR
    • Check with Tim / George about capability for login to student machines - where are we
      • Tim says that we should create a capability that is given to the base cohort and set that capability to no-grace
    • Useful? - a script which checks how fast a machine's console log is growing (eg huge number of dbus problems on hammersmith)
      • suggest to Ian D
    • Blog on projects
    • KVM pcid
      • Created MPUSpectreMeltdown
      • Put detection script somewhere for people to use
      • Which CPU is needed for each group..
The following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" with respect to migrations.
<cpu mode='host-model' match='exact'>
<model fallback='allow'>IvyBridge</model>
<vendor>Intel</vendor>
<feature policy='require' name='pcid' />
</cpu>
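The detection script mentioned in the list above is not reproduced in these minutes. As a rough sketch of the kind of check involved, the following looks for the `pcid` flag in the text of `/proc/cpuinfo` on a Linux host or guest; the function names are hypothetical and this is not the script referred to above.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_pcid(cpuinfo_text):
    """True if the 'pcid' feature flag is listed for the CPU."""
    return "pcid" in cpu_flags(cpuinfo_text)
```

Run inside a guest, `has_pcid(open("/proc/cpuinfo").read())` would indicate whether the feature has been passed through by a CPU definition such as the one above.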

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report (currently pending other units completing)
    • User training materials project #403
    • fix up perl-AFS-command in package lists
    • Continue with RT ticket clearout as discussed in October
    • Document the existence of the new pools on gaivota and girassol

  • Stephen
    • submit polkit bug to redhat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take issue of disable per user journald logs on certain servers to OPS
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at MPUActivitiesList
    • Look at where we're using ALL in access.conf
    • Finish off NX replacement project (#389)
      • produce final report
    • Continue with RT ticket clearout as discussed in October
    • Produce plan for upgrading Forum KVM servers to SL7.5 (Stephen and Alastair to do)

-- AlastairScobie - 17 Oct 2018

Topic revision: r7 - 23 Sep 2019 - 13:33:41 - AlastairScobie
 