MPU Meeting Wednesday 12th June 2019

Inventory

Nothing to report.

LCFG Profile Security

Machines on the develop release are now getting their profiles via https. There's no authentication yet - that'll be added later.

Next week (the week of Monday 17th June) we'll be upgrading DICE desktops to SL 7.6. The week after that we'll switch those desktops to getting their profiles via https.

After that Stephen will look into introducing GSSAPI authentication as well.

SL7.6 Update

Final testing of 7.6 will be done on Monday on student lab machines, before rolling it out to all DICE desktops.

There was a problem with the LCFG client upgrades - some machines were ending up with a non-functional LCFG client. The lcfg-client package uses RPM triggers, whereby it can restart the client component whenever certain other packages are changed by RPM. It had been assumed that these triggers would run once at the end of a batch of RPM upgrades, but it turns out that they're executed immediately after each relevant package upgrade. With smaller upgrades this is fine, but when the entire LCFG client is upgraded, the immediacy of the triggers results in the client being restarted by the upgrade of a library, before the client code itself has been upgraded - and since the old client code can't work with the new library, it's left in a broken state, which persists with subsequent trigger calls. Stephen has fixed this for now by ensuring that the trigger stops the client, kills any rdxprof processes that still exist, then starts the client again.
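
The ordering problem can be illustrated with a small simulation. This is only a sketch: the package names "library" and "client" and the "old"/"new" version labels are placeholders, not real package names, and nothing here calls RPM. It records which versions a trigger-driven restart would see if triggers run immediately after each package, compared with once at the end of the batch.

```python
def restart_states(upgrade_order, immediate):
    """Return the (library, client) versions seen at each trigger-driven restart.

    upgrade_order: placeholder package names, in the order they are upgraded.
    immediate: True if the trigger fires after every package upgrade,
               False if it fires once after the whole batch.
    """
    versions = {"library": "old", "client": "old"}
    seen = []
    for package in upgrade_order:
        versions[package] = "new"
        if immediate:
            # The restart happens straight away, whatever else is still pending.
            seen.append((versions["library"], versions["client"]))
    if not immediate:
        # A single restart once everything has been upgraded.
        seen.append((versions["library"], versions["client"]))
    return seen


def mismatched(states):
    """Pick out restarts where library and client versions differ."""
    return [state for state in states if state[0] != state[1]]


if __name__ == "__main__":
    order = ["library", "client"]
    print(restart_states(order, immediate=True))
    print(restart_states(order, immediate=False))
```

With immediate triggers the first restart sees a new library alongside old client code, which is the broken combination described above. A single restart at the end of the batch would never see it.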

Alternate Desktop Platform

The next big part of this project is package management. Stephen has created a component to manage apt. This just configures repositories - it doesn't do the package management. It operates in either of two modes: you can use it simply to add repositories or to impose a complete repository configuration.

There will be a separate component or tool to actually update the deb packages. Like the apt component it will operate in either of two modes: either it'll just add packages which aren't already present, or it'll impose a given set of packages on the machine (like updaterpms does), removing any installed packages which aren't in the list.
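
The difference between the two modes comes down to set arithmetic on package names. The following is a minimal sketch, assuming packages can be compared by name alone (versions are ignored). The function name `plan_packages` and its `impose` flag are hypothetical and are not taken from any existing tool.

```python
def plan_packages(installed, wanted, impose=False):
    """Work out which packages to install and which to remove.

    installed: names of packages currently on the machine.
    wanted:    names of packages listed in the configuration.
    impose:    False -> only add what is missing;
               True  -> also remove anything not in the wanted list.
    Returns (to_install, to_remove) as sorted lists.
    """
    installed = set(installed)
    wanted = set(wanted)
    to_install = sorted(wanted - installed)
    to_remove = sorted(installed - wanted) if impose else []
    return to_install, to_remove


if __name__ == "__main__":
    current = {"vim", "emacs"}
    listed = {"vim", "git"}
    print(plan_packages(current, listed))               # additive mode
    print(plan_packages(current, listed, impose=True))  # imposed mode
```

In additive mode nothing is ever removed. In imposed mode the machine ends up with exactly the listed set.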

Stephen has also made Ubuntu OS headers and macros, and repository definitions.

Miscellaneous Development

Stephen has changed the lcfg-ngeneric schema to add a new ng_umask resource to specify a default umask to be used when a component is called. At the moment it defaults to nothing, meaning no change of umask.
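
The "empty means no change" behaviour could look something like the sketch below. It assumes the resource value arrives as an octal string such as "022". The real value format and the way ngeneric applies it are not described here, so the function name and the parsing are hypothetical.

```python
import os


def apply_default_umask(value):
    """Apply a default umask given as an octal string (assumed format).

    An empty value means "leave the umask alone", matching the current
    default. Returns the previous umask, or None if nothing was changed.
    """
    if not value:
        return None
    return os.umask(int(value, 8))
```

For example, `apply_default_umask("")` leaves the process umask untouched, while `apply_default_umask("027")` sets it and returns whatever was in force before.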

The lcfg-ngeneric schema also has a new ng_tmpldir resource. This specifies a list of directories which the template processor will search for templates. Since Template Toolkit templates can include other templates, our templates had been full of OS conditionals - but the new resource means that these conditionals can be removed from the templates, simplifying them considerably. It also means that templates, or template fragments, can now be more widely applicable, and can be shared between components.
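
A search path of this kind is typically resolved by trying each directory in turn and taking the first match. The sketch below assumes that is how the list is used. The directory layout and the template names in the usage note are made up for illustration.

```python
import os


def find_template(name, search_dirs):
    """Return the path of the first file called `name` found in search_dirs.

    Directories are tried in the order given, so an OS-specific directory
    placed first overrides a shared directory placed later.
    Returns None if no directory contains the template.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With a search list such as an OS-specific directory followed by a common one, a template can say "include service.tt" without any OS conditional. The OS-specific copy is picked up where one exists, and the common copy is used otherwise.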

Chris has tried out KSM (Kernel Samepage Merging) on our test KVM server. It might be suitable for the server running the Hadoop cluster VMs?
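
One way to judge whether KSM is worthwhile on a given server is to read the counters the kernel exposes under /sys/kernel/mm/ksm. The sketch below reads `run`, `pages_shared` and `pages_sharing` and estimates the memory saved as `pages_sharing` times the page size. The function names are our own, and the estimate is only a rough guide.

```python
import os

KSM_DIR = "/sys/kernel/mm/ksm"


def read_ksm_counters(ksm_dir=KSM_DIR,
                      names=("run", "pages_shared", "pages_sharing")):
    """Read the named KSM counters from sysfs and return them as integers."""
    counters = {}
    for name in names:
        with open(os.path.join(ksm_dir, name)) as handle:
            counters[name] = int(handle.read().strip())
    return counters


def estimated_saving_bytes(counters, page_size=None):
    """Rough saving: each 'sharing' page is one that no longer needs its own copy."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return counters["pages_sharing"] * page_size
```

Running `estimated_saving_bytes(read_ksm_counters())` on the test KVM server before and after starting a set of similar VMs would give a first idea of the benefit for the Hadoop cluster VMs.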

Operational

Stephen has been looking into BIOS cloning of the latest G4 desktop model. Some interesting points:
  • This BIOS seems to be far less buggy than previous ones.
  • When cloning BIOS settings you have to be careful to only clone a cut-down set of essential settings, rather than all of them - otherwise you risk overwriting things like the asset tag, which is unique to each machine.
  • He's done the G4 SFF and the G4 SSD SFF. This means that he's had the chance to compare the performance of a G4 with and without SSD - and it turns out that an SSD makes a big difference to the performance of AFS. In normal use, AFS file access is about ten times as fast when the cache is on SSD.
  • He'll do the G4 Tower next.

This round of KVM server upgrades is now complete. However we've been left with some non-functioning serial consoles, and some machines which still need hyperthreading disabled, so more downtime will be needed soon. On the subject of the non-functioning serial consoles, there are two schools of thought. One speculates that a BIOS or firmware update has cleared one of the configuration settings needed by our IPMI serial console system. The other says that when this has been seen on other servers, a power-cycle (one triggered via IPMI is sufficient) clears the problem. Either way, fixing the consoles means more KVM downtime.

We're expecting the replacements for the KB-based KVM servers to arrive any week now.

Stephen and Graham have been looking into eliminating any ssh access capability from the staff role. The necessary changes have been identified and will be implemented after the end of the exam board.

This Week

  • Alastair
    • Inventory project
      • Documentation - end user
      • Documentation - code
        • clientreport (eg how to add modules)
        • order sync code
        • HPreport processing script
        • link in from MPU top page
      • Start work on final report!
      • Provide details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
      • Add requirement to computing.help project stuff to reimplement new computing help form using REST API
      • Produce a Legitimate Interest Declaration and Privacy Statement
        • records machine to user allocation (with their UUN, cname, sname, user category)
        • records who requests which order (usually just uun, but can be cname+sname)
        • records who makes a change in inventory (just uun)
        • consider what can be removed once a user has left the University
          • any rows in the 'person' table where 'upstream' is false and where there isn't an 'item' row with a matching 'allocated_to' field should be deleted by a periodic script. Arguably 'category' should be set to NULL where 'upstream' is false?
      • Decommission ordershost
        • need to replicate kvmreport mechanism on Tartarus (or somewhere)
          • submit data via clientreport mechanism
        • take snapshot of files (no need to take snapshot of SQL as this is automatically recreated from orders files)
        • power off for 3 months prior to deleting to see if anything breaks
      • Document Tim's theon old inv snapshot and what its purpose now is. Also modify invquery to remark that data is historical only.
    • Take a look at RT #78875
      • WON'T LOOK UNLESS A BIG ISSUE (Ask Tom)
    • Look at Stephen's 'Thoughts on shell components'
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • drupal username collection re GDPR
      • Run the user expiry script monthly until August 2019 and, if there are no problems, configure it to run automatically
    • Check with Tim / George about capability for login to student machines - where are we
      • Tim says that we should create a capability that is given to the base cohort and set that capability to no-grace
    • Meet Tim with Chris to review RAT involvement
    • Look at what needs ticked off for XRDP project to close
    • Look at using php-5.6 on computing.help
    • Think about a separate XRDP server for Distance learning students

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report
    • User training materials project #403
    • Produce a 'guest only' version of Virtual DICE based on SL7.6
    • With Stephen remove hammersmith
    • Meet Tim with Alastair to review RAT involvement
    • Investigate whether small Virtual DICE image is sufficient for 1st and 2nd year teaching
      • Check whether yum is configured to use all our repositories
      • investigate whether we could use yum groups to install additional software for each class (at least for the big classes)

  • Stephen
    • submit polkit bug to redhat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take issue of disabling per-user journald logs on certain servers to OPS
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at where we're using ALL in access.conf
    • Finish off NX replacement project (#389)
    • Continue with RT ticket clearout as discussed in October
    • Read George's mail of 8th November wrt DPIA
    • Firmware update - steen
    • Reboot staff.ssh (hare)
    • clientreport
      • Complete module errors report
      • Add an 'old locks' report
    • Move afsbuild server (juice) from Forum to AT
    • Produce a Legitimate Interest Declaration and Privacy Statement for svn history and LCFG profile history
    • With Chris remove hammersmith
    • SL7.6 blog article and details of differences + announce Python 3 upgrades
-- AlastairScobie - 12 Jun 2019
Topic revision: r4 - 23 Sep 2019 - 13:33:42 - AlastairScobie
 