MPU Meeting Thursday 27th September 2018

Inventory

IS appears to have better data on HP computers than we do. Alastair plans to speak to someone in IS to find out why.

Stephen added a clientreport module which checks for user crontabs in /var/spool/cron. Nothing alarming was found. However, this made us realise that we could do with a report on crontab files owned by people who have left.
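A report on crontabs owned by people who have left could work roughly as below. This is a minimal sketch, not the clientreport module itself. It assumes Red Hat-style spool files (one file per user in /var/spool/cron, owned by that user), and `current_users` is a hypothetical set of active account names that would in practice come from the account database.

```python
import pwd
from pathlib import Path


def orphaned_crontabs(spool_dir, current_users):
    """Return (filename, owner) pairs for crontabs whose owner is not a current user.

    The owner is taken from the file's uid. If that uid no longer maps to an
    account, the owner is reported as None, which is a strong hint that the
    user has left.
    """
    results = []
    for entry in sorted(Path(spool_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            owner = pwd.getpwuid(entry.stat().st_uid).pw_name
        except KeyError:
            # uid has no passwd entry any more
            owner = None
        if owner is None or owner not in current_users:
            results.append((entry.name, owner))
    return results
```

Usage would be something like `orphaned_crontabs("/var/spool/cron", active_users)`, where `active_users` is the hypothetical set described above.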

Stephen has enhanced the updaterpms late updates report. Now in addition to the delay since the last successful run, it also mentions error and warning states. It's linked from https://tartarus.inf.ed.ac.uk. It will be further tweaked so that host names in the report link to their client report.
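The planned tweak (delay plus error/warning state, with each hostname linking to its client report) might be sketched as follows. The URL pattern `CLIENTREPORT_URL`, the `UpdateStatus` record and the state values "ok"/"warning"/"error" are all hypothetical; the real tartarus report may be structured quite differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from html import escape

# Hypothetical URL layout for a host's client report on tartarus.
CLIENTREPORT_URL = "https://tartarus.inf.ed.ac.uk/clientreport/{host}"


@dataclass
class UpdateStatus:
    host: str
    last_success: datetime  # time of last successful updaterpms run
    state: str              # assumed values: "ok", "warning", "error"


def report_row(status, now, threshold=timedelta(days=1)):
    """Render one HTML table row, or None if the host is recent and healthy."""
    delay = now - status.last_success
    if delay <= threshold and status.state == "ok":
        return None
    link = '<a href="{}">{}</a>'.format(
        escape(CLIENTREPORT_URL.format(host=status.host)), escape(status.host))
    hours = int(delay.total_seconds() // 3600)
    return "<tr><td>{}</td><td>{}h</td><td>{}</td></tr>".format(
        link, hours, escape(status.state))
```

A host is listed if it is either late or in a warning/error state, which matches the report now covering both conditions.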

Virtual Desktop

NX has been retired. Both remote desktop servers now offer XRDP. Stephen has started writing the technical documentation for this service.

Stephen has increased the permitted number of threads so that thread-heavy applications such as Firefox can run successfully.

Some people are still getting a US keyboard layout on login.

SL7.5

It turns out that when a major R upgrade appears, every single R module has to be rebuilt - the modules in different major releases are not binary compatible. Thankfully package forge made this job far easier than it might otherwise have been - Stephen just threw the modules at it and almost all of them built automatically, over a period of twelve hours or so.

Stephen has written the final report for this project.

At the last Operational meeting it was agreed that any remaining 7.4 machines would be upgraded automatically to 7.5 at the end of October, by removal of their DICE_STICK_WITH_SL74 macros. This would exclude particular known problem cases such as the console servers and the Forum-based KVM servers, if these were still problems at that time.

Misc Development

critical-shutdown
The console servers are now caching inventory location data. This will help the critical-shutdown script to keep operating even when it can't contact tartarus.
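The fall-back-to-cache idea can be illustrated with the sketch below. It is not the console servers' actual code: `fetch` is a hypothetical callable that queries tartarus and raises `OSError` (for example `ConnectionError`) when tartarus is unreachable, and the JSON cache file is an assumed format.

```python
import json
from pathlib import Path


def load_locations(fetch, cache_path):
    """Return location data, preferring a fresh fetch but falling back to the cache.

    `fetch` is a hypothetical callable that queries tartarus and raises
    OSError when it cannot be contacted.
    """
    cache = Path(cache_path)
    try:
        data = fetch()
    except OSError:
        # tartarus unreachable: use the last good copy so shutdown can proceed
        return json.loads(cache.read_text())
    # write to a temporary file and rename, so a crash mid-write never
    # leaves a truncated cache behind
    tmp = cache.with_suffix(cache.suffix + ".tmp")
    tmp.write_text(json.dumps(data))
    tmp.replace(cache)
    return data
```

Refreshing the cache on every successful fetch keeps it as current as possible for the occasion when it is actually needed.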
logrotate configuration
Thanks to Neil for pointing out a problem with the logrotate configuration. Stephen has now reworked the ngeneric logrotate template so that it tries even harder to avoid duplicate definitions, since logrotate balks when it encounters these. It now merges its configuration and then removes duplicate entries (uniq).
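The merge-then-uniq step amounts to something like the sketch below. It is illustrative only and is not the ngeneric template; the two function names are hypothetical. One design choice differs from a plain `sort | uniq`: first-seen order is kept, so the generated file stays stable between runs.

```python
def merge_logrotate_paths(*sources):
    """Merge log-file path lists from several sources, dropping duplicates.

    Order of first appearance is kept so the generated file is stable.
    """
    seen = set()
    merged = []
    for source in sources:
        for path in source:
            path = path.strip()
            if path and path not in seen:
                seen.add(path)
                merged.append(path)
    return merged


def render_stanza(paths, options):
    """Emit a single logrotate stanza covering all the given paths."""
    lines = [" ".join(paths) + " {"]
    lines += ["    " + opt for opt in options]
    lines.append("}")
    return "\n".join(lines)
```

For example, `merge_logrotate_paths(["/var/log/a", "/var/log/b"], ["/var/log/b"])` yields each path only once, so logrotate never sees the same log defined twice.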
cron component
Stephen noticed that when an account was added to cron.allow, all other non-root accounts were blocked from owning cron jobs. This is a feature of cron, but it's unhelpful in the context of an LCFG component, so he has modified the component so that in these circumstances it also adds the owners of all other cron jobs. Similarly for cron.deny. We're working towards having a default-deny cron policy on certain servers.
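The cron.allow behaviour described above can be sketched as follows. This covers only the allow case and is not the LCFG cron component's code. It assumes that crontab files in the spool directory are named after their owning account (the usual Red Hat layout), and `effective_allow_list` is a hypothetical name.

```python
from pathlib import Path


def effective_allow_list(requested, spool_dir):
    """Users to write to cron.allow: those requested plus existing crontab owners.

    Once cron.allow exists, only users listed in it are permitted, so any
    account that already owns a crontab would otherwise be locked out.
    Assumes each spool file is named after its owner.
    """
    owners = {p.name for p in Path(spool_dir).iterdir() if p.is_file()}
    return sorted(set(requested) | owners)
```

Sorting the result keeps the generated cron.allow deterministic, so the file only changes when the set of users actually changes.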

Operational

  • Chris has upgraded the OS and firmware of the MPU KVM servers at Appleton Tower and King's Buildings. Alastair and Stephen will do the same for those in the Informatics Forum. Apart from these we have very few MPU servers still left on 7.4 - just tartarus and computing.help.
  • Stephen reported that getting the LCFG config right for the HP G3 desktop mini is proving tricky because, although most of them are equipped with NVMe drives, some have spinning discs instead.
  • The G4 desktop mini and the G4 desktop also need support.
  • Stephen suggests (and will create) a page on the LCFG wiki for new models, on which we can collect technical information about them - BIOS settings we've had to change, that kind of thing.
  • The G4 will be available imminently and is expected to default to 16GB of memory. (This was our default amount of memory already for desktops in Informatics.)
  • Some new CDT students have been bought Lenovo P320 computers. This has been a problem for us because their hardware differs from the Lenovo P310 models we've seen hitherto - for instance the graphics card outputs are all mini DisplayPort, and we had no cables of this type. Computing staff weren't involved in the purchasing of these machines; we need to ensure that we will be consulted about such purchases in future.
  • The facebook server is still not working. This has been taking a great deal of Stephen's time.
  • There are security updates for OpenAFS.
  • The kernel security update goes out tonight.
  • Chris pointed out that the packages server needs an experimental version of the perl-AFS-Command package. Stephen suggested that Chris tell Toby that his experimental new version is fine and ask him to submit it to the lcfg bucket.
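On the HP G3 desktop mini item above (mixed NVMe and spinning discs), one way for a machine to tell at runtime which kind of disk it has is to read sysfs, as in the sketch below. It is only an illustration of that check and does not show how the LCFG profile would choose its config. `classify_disks` and its skip list are hypothetical. The sysfs file `queue/rotational` is 1 for rotating media and 0 otherwise.

```python
from pathlib import Path

# Virtual/optical devices we don't want to classify (assumed list).
SKIP_PREFIXES = ("loop", "ram", "zram", "dm-", "sr")


def classify_disks(sys_block="/sys/block"):
    """Map each block device to 'nvme', 'ssd' or 'hdd' using sysfs."""
    result = {}
    for dev in sorted(Path(sys_block).iterdir()):
        rot_file = dev / "queue" / "rotational"
        if dev.name.startswith(SKIP_PREFIXES) or not rot_file.exists():
            continue
        if dev.name.startswith("nvme"):
            result[dev.name] = "nvme"
        elif rot_file.read_text().strip() == "1":
            result[dev.name] = "hdd"
        else:
            result[dev.name] = "ssd"
    return result
```

On a G3 with an NVMe drive this would be expected to report an `nvme...` device, and on one with a spinning disc an `sd...` device classed as `hdd`.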

Next Week

RT tickets!

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Start work on final report!
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
      • and give details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
      • Look at postgresql replication (do after shipping)
      • Add tartarus info to SwitchToSelfManaged
      • Complete removal of non authenticated access to API and web
      • Need tests for API /orders and need new tests to check for correct authorisation
      • Make lcfg header generation live (need to check what will be deleted when we do this - big discrepancy between old inventory and new)
      • Look at user support form - how does that lookup hostname?
      • Produce a python library to provide people with a programmatic equivalent of ii query
      • Look at whether there is an easy library way for Chris to grab the macaddr of a machine given the hostname
    • Schedule MPU meeting to discuss systemd ordering
    • Take a look at RT #78875
    • Look at /etc/hosts - DNS issue (IPv6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Implement change to kvmtool to allow KVMs to be marked as disabled
      • looked at this - looks like the metadata tag isn't passed through libvirt (prior to 4.0.0), so can't be read/written by kvmtool
      • put on activities list to do once upgrade to libvirt-4.0.0
    • Look at Stephen's 'Thoughts on shell components'
    • Look at MPUActivitiesList
    • Start looking at https and computing.help (remove assumption that https means want cosign login)
      • wait on Neil's efforts with EdWeb
    • Chase Alison about LCFG check monitoring (start doing again)
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • Report on this at next ops meeting that have changed journald configuration (MPU report)
    • Discuss with Neil - drupal username collection re GDPR
      • write a script to remove users who haven't used computing.help in, say, 90 days (except COs)
        • Proposed solution - script which reads SQL to decide which accounts to remove -> using 'drush user_cancel' to do the account removal. This removes the account quietly (no expiry email to user) and changes ownership of any content to 'anonymous'
          • script written. waiting for stable release of 10th October to install drush on live server so can run script on live server
      • fix the email address issue (currently defaults to umich.edu). This just required an undocumented variable to be configured appropriately. Will need to go back and change existing accounts.
      • Fix bug in cosign.module where it complains "Creating default object from empty value in cosign_route() (line 182".
        • need to ship fixed cosign module on live service
    • Inventory stuff re GDPR
    • Check with Tim / George about capability for login to student machines - where are we
      • Tim says that we should create a capability that is given to the base cohort and set that capability to no-grace
    • Add %slaac to hulp and lagun after 21/02/18
      • Done - need to check no ill effect
    • Useful? - a script which checks how fast a machine's console log is growing (eg huge number of dbus problems on hammersmith)
      • suggest to Ian D
    • Blog on projects
    • KVM pcid
      • Created MPUSpectreMeltdown
      • Put detection script somewhere for people to use
      • Which CPU is needed for each group?
      The following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" with respect to migrations.
        <cpu mode='host-model' match='exact'>
          <model fallback='allow'>IvyBridge</model>
          <vendor>Intel</vendor>
          <feature policy='require' name='pcid' />
        </cpu>
    • Look at why kvmtool doesn't work on circle (running libvirt 4.0.0)
      • oyster now running libvirt-4.0.0, but can't get kvmtool to misbehave
    • Read and comment on Stephen's notes on the LCFG security project
      • Don't understand why second kinit fails (and is allowed to fail)
    • Remove IBM disk array from stack
    • Read Chris's blog on ThoughtsOn403
      • unpublished?
    • Look at moving stuff from the immediate todo back to the main Todo list and then we can prioritise that list
    • Look through the entitlements / no grace period issue
      • look through access.conf and work out how the entitlements are constructed
    • Move tartarus server to Forum/AT from KB
    • Upgrade live versions of computing.help and tartarus to SL7.5
      • scheduled for Sunday 30th evening
    • Order up an HP 800 G4 SFF once available
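The selection step of the proposed computing.help clean-up script (read the user table via SQL, pick accounts unused for about 90 days, never touch COs) might look like this. It is a sketch under stated assumptions, not the script Alastair has written. The row shape `(uid, name, last_access)` mirrors Drupal's users table (uid 0 is anonymous, uid 1 the site admin), `last_access` is assumed to be a Unix timestamp with 0 meaning "never", and `computing_officers` is a hypothetical set of CO usernames.

```python
from datetime import datetime, timedelta


def accounts_to_cancel(users, computing_officers, now, max_idle=timedelta(days=90)):
    """Pick Drupal accounts idle for longer than max_idle, never selecting COs.

    `users` is an iterable of (uid, name, last_access) rows, where
    last_access is a Unix timestamp (0 meaning never used).
    uid 0 (anonymous) and uid 1 (site admin) are always skipped.
    """
    cutoff = (now - max_idle).timestamp()
    selected = []
    for uid, name, last_access in users:
        if uid <= 1 or name in computing_officers:
            continue
        if last_access < cutoff:
            selected.append(name)
    return selected
```

Each selected name would then be passed to the drush account-cancellation command noted above, which removes the account quietly and reassigns its content to 'anonymous'.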

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report (currently pending other units completing)
    • User training materials project #403
    • Complete SL7.5 Virtual DICE
    • Ask Toby to move afs-command from the develop bucket to a stable bucket so stable machines can make use of it

  • Stephen
    • submit polkit bug to Red Hat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take the issue of disabling per-user journald logs on certain servers to OPS
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at MPUActivitiesList
    • Look at where we're using ALL in access.conf
    • Finish off NX replacement project (#389)
      • produce final report
    • Modify tartarus lateupdates report so that hostnames are shown as links to their clientreport
    • Produce plan for upgrading Forum KVM servers to SL7.5 (Stephen and Alastair to do)
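For the "put detection script somewhere for people to use" item under Alastair's KVM pcid work, a minimal check could look like this. It is a hypothetical sketch, not the script referred to. PCID reduces the performance cost of the Meltdown page-table isolation mitigation, which is why guests want it passed through (as in the `<feature policy='require' name='pcid' />` config above). Run inside a guest, the check shows whether the flag actually reached it.

```python
def has_pcid(cpuinfo_text):
    """Return True if every CPU listed in /proc/cpuinfo advertises the pcid flag."""
    flag_sets = [
        set(line.split(":", 1)[1].split())
        for line in cpuinfo_text.splitlines()
        if line.startswith("flags")
    ]
    return bool(flag_sets) and all("pcid" in flags for flags in flag_sets)
```

Typical use on a host or guest would be `has_pcid(open("/proc/cpuinfo").read())`.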

-- AlastairScobie - 27 Sep 2018

Topic revision: r14 - 23 Sep 2019 - 13:33:41 - AlastairScobie