MPU Meeting Wednesday 21st February 2018

Inventory

With GDPR in mind, should we be making it possible for anyone to find out about machine allocations and purchase histories? Probably not. Alastair has therefore been looking into getting rid of the unauthenticated open interface. It had been kept because the authentication made a query about 50% slower - but we can live with that.

The authorisation config for the API was hard-coded, but Tartarus now loads its configuration from a file using Catalyst's ConfigLoader plugin.

User Security Training

Nothing this week.

Virtual Desktop

Stephen has been writing up his work on this at https://blog.inf.ed.ac.uk/squinney/tag/xrdp/.

Chris has written user documentation for several platforms and will add a page for Windows 10 users once he gets Windows 10 up and running.

Miscellaneous Development

  • The .so libraries in the LCFG Core packages had hard-wired version strings, which was misleading. Stephen has fixed things so that CMake gives the libraries version numbers identical to the package version. He also changed the docs package to noarch so that it can be installed anywhere. This means it can be installed on the website and the docs browsed over HTTP.
  • The lab check script is now mailing its results to the MP unit.
  • rfe now has an extra ACL so that a local user can read lcfg files without their being granted sysman capabilities.

Operational

  • lcfg/jubilee has been tidied up. lcfg/hammersmith will be tidied in conjunction with RT:87475 and the regularisation of the printing configuration on login servers.
  • The new LCFG client is now on all MPU machines - except for the LCFG slaves whose component versions script currently uses the old API. It'll be ported to the new one. The slaves are individually pinned to the v3 client in their profiles.
  • Chris has moved two "600GB" disks from metropolitan to circle and has configured them as an additional storage pool for VMs. However "circlepool2" cannot yet be used because circle is testing the latest libvirt (version 4.0.0) and kvmtool doesn't yet work with this. Alastair has been trying to make a matching version 4.0.0 perl-Sys-Virt package and will look at the kvmtool / libvirt 4.0.0 problem.
  • The Tartarus IPv6 changes are now permanent.
  • sysmans nograce <--- sorry, minute-taking failure; no idea what this was about
  • Alastair has been looking into how one might disable a VM using kvmtool. At the moment it's tricky (you can add the necessary to the XML using virsh or kvmtool edit but you can't get the metadata) but it'll be easy once we're on libvirt 4.0.0.
  • Alastair has added an LCFG_SYSTEMD_MAXRETENTIONSEC macro which sets the maximum retention period for journald. For DICE it defaults to one month. Also, systemd's journald SplitMode will be the default unless LCFG_SYSTEMD_DISABLE_SPLITMODE is defined.
  • The configuration of the computing.help servers is now compatible with IPv6.
  • amarela and vermelha seem to have been fine with IPv6 so we can now enable it on the other KVM servers too.
  • computing.help is using the latest Drupal 7, which came out in June last year. We're wondering when Drupal 8 will be out and when there will be a Drupal 8 EdWeb available. (After the meeting a new version of Drupal 7 was announced.) Chris will make a project to port computing.help to Drupal 8, or preferably EdWeb Drupal 8.
  • MPUManualEntitlements lists services which depend on the manual configuration of entitlements.
  • Alastair has been looking into whether our KVM guests need the PCID kernel feature in order to perform adequately while operating with Meltdown mitigation microcode, and how to give them that feature if so. He has devised snippets of XML config which seem to do the job. It's suggested that the INVPCID kernel feature should also be used if available. All of our production KVM servers support PCID but only three also support INVPCID. Alastair will carry on looking for solutions. Another question he's looking at is whether our VMs' VCPUs should all be configured as virtual IvyBridge processors, or something else. MPUSpectreMeltdown collects together our current wisdom on these problems.
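The PCID / INVPCID check described above can be sketched roughly as follows. This is only an illustration, not the detection script referred to in these minutes: the function names `cpu_flags` and `pcid_support` are made up for the example, and it assumes a Linux machine where /proc/cpuinfo lists CPU features on a "flags" line. It can be run on a KVM server or inside a guest to see which of the two features is visible there.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (e.g. non-x86 or empty input).
    return set()


def pcid_support(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether the pcid and invpcid flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"pcid": "pcid" in flags, "invpcid": "invpcid" in flags}


if __name__ == "__main__":
    # /proc/cpuinfo is Linux-specific, so skip quietly elsewhere.
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for feature, present in pcid_support(cpuinfo.read_text()).items():
            print(f"{feature}: {'yes' if present else 'no'}")
```

A guest that reports "pcid: no" while its host reports "pcid: yes" would be a candidate for the kind of CPU config change discussed above.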

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Start work on final report!
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
      • and give details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
      • Look at postgresql replication (do after shipping)
      • Add tartarus info to SwitchToSelfManaged
      • Complete removal of non authenticated access to API and web
      • Need tests for API /orders and need new tests to check for correct authorisation
    • Schedule MPU meeting to discuss systemd ordering
    • Take a look at RT #78875
    • Look at /etc/hosts - dns issue (IPV6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Implement change to kvmtool to allow KVMs to be marked as disabled
      • looked at this - looks like the metadata tag isn't passed through libvirt (prior to 4.0.0), so can't be read/written by kvmtool
      • put on activities list to do once upgrade to libvirt-4.0.0
    • Look at Stephen's 'Thoughts on shell components'
    • Look at MPUActivitiesList
    • Start looking at https and computing.help (remove assumption that https means want cosign login)
      • wait on Neil's efforts with EdWeb
    • Chase Alison about LCFG check monitoring ( start doing again )
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • Report at next ops meeting that we have changed the journald configuration (MPU report)
    • Discuss with Neil - drupal username collection re GDPR
    • Inventory stuff re GDPR
    • Check with Tim / George about capability for login to student machines - where are we
    • Add %slaac to hulp and lagun after 21/02/18
    • Useful? - a script which checks how fast a machine's console log is growing (eg huge number of dbus problems on hammersmith)
      • suggest to Ian D
    • Blog on projects
    • KVM pcid
      • Created MPUSpectreMeltdown
      • Put detection script somewhere for people to use
      • Which CPU is needed for each group..
        Following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" wrt migrations.
        <cpu mode='host-model' match='exact'>
          <model fallback='allow'>IvyBridge</model>
          <vendor>Intel</vendor>
          <feature policy='require' name='pcid' />
        </cpu>
    • Look at why kvmtool doesn't work on circle (running libvirt 4.0.0)
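As a rough sketch of how the <cpu> fragment listed under "KVM pcid" might be generated per host, the following builds the same element and adds invpcid only when the host advertises it. The helper names `required_features` and `build_cpu_xml` are hypothetical, the attribute values simply mirror the fragment that worked on 'brent', and nothing here is taken from kvmtool itself.

```python
import xml.etree.ElementTree as ET


def required_features(host_flags: set[str]) -> list[str]:
    """Always require pcid; add invpcid only if the host has it (assumption)."""
    features = ["pcid"]
    if "invpcid" in host_flags:
        features.append("invpcid")
    return features


def build_cpu_xml(model: str, vendor: str, features: list[str],
                  mode: str = "host-model", match: str = "exact") -> str:
    """Build a <cpu> element shaped like the fragment in these minutes."""
    cpu = ET.Element("cpu", {"mode": mode, "match": match})
    ET.SubElement(cpu, "model", {"fallback": "allow"}).text = model
    ET.SubElement(cpu, "vendor").text = vendor
    for name in features:
        ET.SubElement(cpu, "feature", {"policy": "require", "name": name})
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    # Example host flags; in practice these would come from the KVM server.
    print(build_cpu_xml("IvyBridge", "Intel", required_features({"pcid"})))
```

Whether 'exact' matching and the IvyBridge model are the right choices is still an open question in the minutes, so both are left as parameters here.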

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report (currently pending other units completing)
    • libvirt - test for memory leaks (wrt console servers). Ian will test it for memory leaks after the 17 January stable release.
    • User training materials project #403
    • Blog on projects
    • Check whether Neil's printing fix (as was on hammersmith/jubilee) is applied generally (eg, so will be available on xrdp). The fix was just to provide a clean way of disabling printing - which we don't want to do on these machines. Currently printing seems fine on Forum-based servers but not on Tower-based ones (RT:87985).
    • Produce project proposal to upgrade computing.help to Drupal 8 (hopefully using EdWeb)
  • Stephen
    • LCFG client refactor stage 2
      • Bring LCFG v4 client project to closure
    • RT actions (as agreed)
    • submit polkit bug to redhat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take the issue of disabling per-user journald logs on certain servers to OPS
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at MPUActivitiesList
    • On metropolitan, find the fastest baud rate at which we can drive the real physical consoles. (This is so we can decide whether to use physical consoles for KVM servers.)
    • Look at where we're using ALL in access.conf
    • If in Forum server room, review MPU rack usage
    • Agree with RAT how software package requests are handled - waiting on Graham documenting
    • Start off NX replacement project (#389)
      • Complete Documentation
      • Introduce test service for staff users on metropolitan
    • Upgrading MPU servers to 7.4
      • NX servers - jubilee (and move to SOL)
    • Decommission DL180s in AT previously used for Ceph testing
    • Check whether websites are still using Allow/Deny configuration
      • Check individual .htaccess files
    • Look at LCFG entitlements - SVN, rfe

-- AlastairScobie - 21 Feb 2018

Topic revision: r10 - 23 Sep 2019 - 13:33:41 - AlastairScobie
 