MPU Meeting Tuesday 30th October 2018

Inventory

The main task before project completion will be to finish the documentation. Alastair will also organise a meeting with the front line computing support team. In particular it should cover best practices and should iron out misunderstandings. Chris has to finish his firmware reports, too.

Virtual Desktop

There have been problems with the new remote desktop service, at least on xrdp.inf.ed.ac.uk.
  1. It has often not been possible to start a new session. There's a bug: when old sessions are expired, not all of their session files are deleted, and this stops the daemon from re-using the session slot for a new user. Stephen has made a script which deletes these neglected files, freeing up expired sessions for re-use.
  2. Inexperienced students have been running very demanding software practicals directly on xrdp.inf.ed.ac.uk instead of on a more appropriate machine such as student.compute.inf.ed.ac.uk. This has kept most of the machine's CPU cores fully occupied and thus greatly slowed down its response time to interactive sessions.
    • To counteract this we're experimenting with applying resource control measures. We've had some success imposing maximum resource quotas on user sessions using systemd control groups. We're going to look at how these might be imposed on all user sessions automatically, for example using PAM.
    • We'll also tackle the user education angle - for instance we'll look into displaying motd-type messages to those starting or connecting to remote desktop sessions.
    • We hope to buy new hardware soon.
    • Once we have new hardware, we may reserve the current machine for distance learning students and the new hardware for other students.
  3. Apart from these specific problems, the current xrdp.inf hardware is six years old and already retired from the role it was bought for, and while still reasonably specified, it's not particularly powerful. It was always intended to be only a temporary xrdp.inf until more capable hardware was purchased.
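
As a rough illustration of the resource control idea in item 2, the sketch below builds a `systemctl set-property` command that caps the per-user slice (`user-UID.slice`) of one user. The function name and the quota values are hypothetical and are not the settings we have been testing; applying limits to every session automatically (for example from PAM) is still to be investigated.

```python
def quota_command(uid, cpu_quota_percent, memory_limit, runtime_only=True):
    """Build a systemctl command that caps one user's systemd slice.

    cpu_quota_percent: CPU time allowed, as a percentage of one core
                       (200 means two full cores).
    memory_limit:      value for MemoryLimit=, e.g. "4G".
    runtime_only:      if True, add --runtime so the change is lost on reboot.
    """
    if cpu_quota_percent <= 0:
        raise ValueError("cpu_quota_percent must be positive")
    cmd = ["systemctl", "set-property"]
    if runtime_only:
        cmd.append("--runtime")
    cmd += [
        f"user-{uid}.slice",              # per-user slice created at login
        f"CPUQuota={cpu_quota_percent}%",
        f"MemoryLimit={memory_limit}",
    ]
    return cmd
```

The function only builds the argument list; running it (as root) could be done with `subprocess.run(quota_command(1000, 200, "4G"), check=True)`.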

Misc Development

Prompted by the difficulties encountered on the console servers, Stephen has comprehensively gone through the LCFG Core code to make it faster. A component's resources are now stored in a hash rather than in a linked list. When a component has 30,000 to 40,000 resources, this results in far faster processing - in the worst case the code is now eighty times faster (!). The next task will be to test the code on a console server VM and assess its performance there. If that goes well, the code will be deployed on all machines on the develop release.
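
To show why the change of data structure matters at this scale, here is a minimal Python sketch (not the LCFG Core code, which is not reproduced here) comparing a lookup by name in a linked list with a lookup in a hash. The resource names and the `Node` class are invented for illustration.

```python
class Node:
    """One entry in a singly linked list of (name, value) resources."""

    def __init__(self, name, value, next_node=None):
        self.name = name
        self.value = value
        self.next = next_node


def build_list(pairs):
    """Build a linked list that preserves the order of `pairs` (a list)."""
    head = None
    for name, value in reversed(pairs):
        head = Node(name, value, head)
    return head


def list_lookup(head, name):
    """Walk the list; return (value, nodes_visited), value None if absent."""
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if node.name == name:
            return node.value, steps
        node = node.next
    return None, steps


# 30,000 made-up resources, matching the scale mentioned above.
pairs = [(f"res{i}", i) for i in range(30000)]
as_list = build_list(pairs)
as_hash = dict(pairs)  # Python's dict is a hash table
```

Finding the last resource in the list visits all 30,000 nodes, whereas `as_hash["res29999"]` is a single hash lookup on average. Code that looks up many resources by name therefore does far less work with the hash.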

Operational

  • Stephen will switch the remaining SL 7.4 machines to 7.5 on Monday via the usual removal of the DICE_STICK_WITH_SL74 macro from their profiles.
  • Chris will help along the MPU's own 7.4 upgrades by tackling one of the three remaining KVM servers next week.
  • Must resolve more RT tickets!
  • Stephen will find a suitable way to run the Spectre/Meltdown susceptibility script on all of our hosts, with the aim of producing a basic boolean result for each host.
  • Slides are available from OSS Europe 2018. Alastair and Stephen's favourite talks included Effective Virtual CPU Configuration with QEMU and libvirt; other libvirt-related talks; the Kernel Report; nemu (the replacement for qemu).
  • Perhaps we should find some way to use the standard Linux kernel rather than Red Hat's offering? One to ponder.
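
On the Spectre/Meltdown item above, one possible way to reduce a host's state to a single boolean is sketched below. This is a stand-in, not the susceptibility script referred to in the minutes: it assumes the kernel exposes status files under /sys/devices/system/cpu/vulnerabilities, and it treats a host with no such files as susceptible, which is a deliberately cautious assumption.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_statuses(vuln_dir=VULN_DIR):
    """Return {file name: status text}; empty if the directory is missing."""
    if not vuln_dir.is_dir():
        return {}
    return {
        p.name: p.read_text().strip()
        for p in sorted(vuln_dir.iterdir())
        if p.is_file()
    }


def is_susceptible(statuses):
    """Reduce per-vulnerability status text to one boolean.

    True if any entry starts with "Vulnerable", or if there is no data
    at all (an old kernel cannot be assumed safe).
    """
    if not statuses:
        return True
    return any(text.strip().startswith("Vulnerable")
               for text in statuses.values())
```

On a host, `is_susceptible(read_statuses())` gives the per-host boolean.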

Next meeting

Wednesday 14th November at 2:15pm.

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Start work on final report!
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
      • Give Ian D details of how Tartarus tables are accessed, for inclusion in his privileged access discussion paper
      • Look at postgresql replication (do after shipping)
      • Add tartarus info to SwitchToSelfManaged
      • Complete removal of non authenticated access to API and web
      • Need tests for API /orders and need new tests to check for correct authorisation
      • Make lcfg header generation live (need to check what will be deleted when we do this - big discrepancy between old inventory and new)
      • Look at user support form - how does that look up the hostname?
      • Produce a python library to provide people with a programmatic equivalent of ii query
      • Look at whether there is an easy library way for Chris to grab the macaddr of a machine given the hostname
    • Schedule MPU meeting to discuss systemd ordering
    • Take a look at RT #78875
    • Look at /etc/hosts - dns issue (IPV6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Implement change to kvmtool to allow KVMs to be marked as disabled
      • looked at this - looks like the metadata tag isn't passed through libvirt (prior to 4.0.0), so can't be read/written by kvmtool
      • put on activities list to do once we upgrade to libvirt-4.0.0
    • Look at Stephen's 'Thoughts on shell components'
    • Look at MPUActivitiesList
    • Start looking at https and computing.help (remove the assumption that https means the user wants a cosign login)
      • wait on Neil's efforts with EdWeb
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • drupal username collection re GDPR
      • configure live server to run the user expiry script
      • Fixup email domains for existing accounts and check fix for domain setting to inf.ed.ac.uk is in place on live service
      • need to ship fixed cosign module on live service
    • Inventory stuff re GDPR
    • Check with Tim / George about capability for login to student machines - where are we
      • Tim says that we should create a capability that is given to the base cohort and set that capability to no-grace
    • Useful? - a script which checks how fast a machine's console log is growing (eg huge number of dbus problems on hammersmith)
      • suggest to Ian D
    • Blog on projects
    • KVM pcid
      • Created MPUSpectreMeltdown
      • Put detection script somewhere for people to use
      • Which CPU is needed for each group..
      • The following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" with respect to migrations.
        <cpu mode='host-model' match='exact'>
          <model fallback='allow'>IvyBridge</model>
          <vendor>Intel</vendor>
          <feature policy='require' name='pcid' />
        </cpu>
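
To confirm from inside a guest that a feature such as pcid has actually been passed through, the CPU flags in /proc/cpuinfo can be checked. The sketch below is an assumption about how that check might look, not the detection script mentioned above; it only inspects the first "flags" line, so it assumes all virtual CPUs report the same flags.

```python
def cpu_has_flag(cpuinfo_text, flag):
    """Return True if `flag` appears in the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Flags are whitespace-separated; match whole words only,
            # so "pcid" does not match "invpcid".
            return flag in value.split()
    return False
```

On a guest this could be called as `cpu_has_flag(open("/proc/cpuinfo").read(), "pcid")`.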

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report (currently pending other units completing)
    • User training materials project #403
    • Continue with RT ticket clearout as discussed in October
    • Upgrade one of the Forum KVM servers to SL7.5
    • Investigate whether we can add a login banner for the XRDP servers (for message of the day)

  • Stephen
    • Submit polkit bug to Red Hat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take the issue of disabling per-user journald logs on certain servers to OPS
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at MPUActivitiesList
    • Look at where we're using ALL in access.conf
    • Finish off NX replacement project (#389)
      • produce final report
    • Continue with RT ticket clearout as discussed in October
    • Produce plan for upgrading Forum KVM servers to SL7.5 (Stephen and Alastair to do)
    • Create tartarus client report module for spectre/meltdown (or modify 'os' module)

-- AlastairScobie - 30 Oct 2018


Topic revision: r7 - 23 Sep 2019 - 13:33:41 - AlastairScobie
 