MPU Meeting Wednesday 24th July 2019

Inventory

  • The tartarus role field is now populated, via clientreport, from the LCFG resources profile.group and profile.comment concatenated together.
  • Next, the plan is to populate the manager field from LCFG's sysinfo.manager, again via clientreport.
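
As a rough illustration of the role-field rule above, the sketch below joins the two resource values. The function name, the single-space separator and the skipping of empty values are all assumptions; the minutes only say that profile.group and profile.comment are concatenated.

```python
def build_role(group: str, comment: str, sep: str = " ") -> str:
    """Join profile.group and profile.comment into one role string.

    Assumption: empty or whitespace-only values are skipped, so the
    result never starts or ends with a stray separator.
    """
    parts = [p.strip() for p in (group, comment) if p and p.strip()]
    return sep.join(parts)
```

The real clientreport module may well use a different separator or keep empty values.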

LCFG Profile Security

Once the 29 July testing release has been made, 7.6 machines will be set to retrieve their LCFG profiles via HTTPS.

There was an issue with the Nagios servers not being able to get machine profiles via GSSAPI authentication, but that's now been solved, so the next stage will be to turn on authentication for the LCFG client.

SL7.6 Update

A few package conflicts have been resolved.

Alternate Desktop Platform

Lots of file paths in Ubuntu are slightly different to their equivalents in RH/SL, so many manual changes were made to configuration files. These changes for Ubuntu have now been merged into the appropriate LCFG headers.

The package building service is working well.

A new ngeneric resource ng_service holds the name of the service being managed (since this often differs slightly between platforms - for example the Ubuntu equivalent to SL's sshd systemd service is ssh). Having the service name in an invariant place across platforms makes it easier to, for instance, code a restart of a service.
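
To show why a single, predictable place for the service name helps, here is a minimal sketch of platform-independent restart code. The lookup table, the platform keys and the function names are hypothetical stand-ins for reading ng_service from a component's resources; only the sshd/ssh pair comes from the text above.

```python
import subprocess
from typing import List

# Hypothetical stand-in for the ng_service resource as it might resolve
# on each platform. Only the sshd/ssh pair is taken from the minutes.
NG_SERVICE = {
    "sl7": {"openssh": "sshd"},
    "ubuntu": {"openssh": "ssh"},
}


def restart_command(platform: str, component: str) -> List[str]:
    """Build the systemctl command that restarts a component's service."""
    service = NG_SERVICE[platform][component]
    return ["systemctl", "restart", f"{service}.service"]


def restart(platform: str, component: str) -> None:
    """Run the restart; raises CalledProcessError if systemctl fails."""
    subprocess.run(restart_command(platform, component), check=True)
```

The calling code never has to know which platform it is on; it only looks up the one resource.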

There's now an actual packages service for Ubuntu. It's set up so that apt will be able to fetch local packages from it. The HTTPS version of the service will use Let's Encrypt certificates, so that Ubuntu will trust the service right away, without having to have the University of Edinburgh root certificate installed.

apt prefers you to sign metadata. Stephen has been setting up the GPG signing to be automatic.

Miscellaneous Development

Stephen fixed Bug:1146 in the dconf component, so that it now creates the directories it needs if they don't already exist.

Operational

Richard Tobin's new server seems to be working properly. Its encryption setup isn't entirely automatic, but the machine's profile has instructions for us in comments. It uses a USB serial adapter, and this took a while to get working with DICE.

Alastair had some feedback from his work powering down and up the MPU machines for the great Saturday shutdown of the Forum server rooms for UPS replacement:

  • Chris's checklists mostly worked well.
  • One notable surprise was the length of time the KVM servers took to suspend a full load of VMs: girassol took ~30 minutes. (This made us realise that whenever we've suspended girassol previously we've already migrated off a good proportion of its VMs, leaving far fewer to be suspended.)
  • All three of the MPU's KVM servers in the Forum used the same switches (sr02 and sr03). This seems like an unnecessary point of failure. Could they be connected to a variety of different switches please?
  • Alastair took the opportunity to remove all of the SAN cables formerly used by MPU servers.
  • We forgot about jubilee so it wasn't shut down before the power work.
  • The cable details in lcfg/azul seemed wrong; could they be checked please?
  • All fibrechannel configuration has been removed.
  • We still need to turn off hyperthreading on certain machines.
  • The consoles for girassol and azul started functioning again after the power-down (as Neil had predicted).

Chris reported that oyster (the test KVM server in AT) had fallen over, and its subsequent boot attempt had halted at a console message like "Fatal error in last boot, press F1 to continue or F2 to enter Setup". We've never seen this before. The logs on the machine itself don't show any sign of trouble; entries seem entirely normal up to and including the final pre-outage entry at 15:40 on Saturday.

Because the underlying hardware is old, Stephen has pointed the student.login alias at lute, the less loaded of the two machines serving the xrdp.inf.ed.ac.uk alias. This echoes the previous integration of staff.login with staff.xrdp.

Chris reported that KSM seemed to be working well on girassol, and that before the shutdown the memory sharing had peaked at about 19GB. We agreed to spread KSM to the new KB-based KVM servers. While we're at it we'll upgrade them to SL 7.6.
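
One common way to estimate how much memory KSM is saving is to multiply the kernel's pages_sharing counter by the page size. The sketch below does that; it is not necessarily how the 19GB figure above was obtained, and the function names are our own.

```python
import os
from pathlib import Path

# Standard sysfs location of the KSM counters on Linux.
KSM_DIR = Path("/sys/kernel/mm/ksm")


def ksm_saved_bytes(pages_sharing: int, page_size: int) -> int:
    """Approximate bytes saved: each 'sharing' page is a duplicate avoided."""
    return pages_sharing * page_size


def read_ksm_saved_gib(ksm_dir: Path = KSM_DIR) -> float:
    """Read the live counter on a KSM-enabled host and return GiB saved."""
    pages_sharing = int((ksm_dir / "pages_sharing").read_text())
    page_size = os.sysconf("SC_PAGE_SIZE")
    return ksm_saved_bytes(pages_sharing, page_size) / 2**30
```

Calling read_ksm_saved_gib() on a KVM server gives a figure that can be compared before and after enabling KSM.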

Our latest idea for the old KVM servers at KB (amarela and vermelha) is that one will remain a KVM server but be dedicated to student VMs, and the other will become a dedicated XRDP server for Distance Learning students.

The replacement hardware for the LCFG master has arrived; Stephen will install it.

Next Week

Instead of a normal unit meeting we'll finalise our spending plan for the next three years.

This Week

  • Alastair
    • Inventory project
      • Documentation - end user
      • Documentation - code
        • clientreport (eg how to add modules)
        • order sync code
        • HPreport processing script
        • link in from MPU top page
      • Start work on final report!
      • Provide details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
      • Add requirement to computing.help project stuff to reimplement new computing help form using REST API
      • Produce a Legitimate Interest Declaration and Privacy Statement
        • records machine to user allocation (with their UUN, cname, sname, user category)
        • records who requests which order (usually just uun, but can be cname+sname)
        • records who makes a change in inventory (just uun)
        • consider what can be removed once a user has left the University
          • any rows in the 'person' table where 'upstream' is false and where there isn't an 'item' row with a matching 'allocated_to' field should be deleted by a periodic script. Arguably 'category' should be set to NULL where 'upstream' is false?
      • Decommission ordershost
        • need to replicate kvmreport mechanism on Tartarus (or somewhere)
          • submit data via clientreport mechanism
        • take snapshot of files (no need to take snapshot of SQL as this is automatically recreated from orders files)
        • power off for 3 months prior to deleting to see if anything breaks
      • Document Tim's theon old inv snapshot and what its purpose now is. Also modify invquery to remark that data is historical only.
      • client report to take 'sysinfo.manager' and populate item.manager from this
    • Take a look at RT #78875
      • WON'T LOOK UNLESS A BIG ISSUE (Ask Tom)
    • Look at Stephen's 'Thoughts on shell components'
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • drupal username collection re GDPR
      • Periodically run user expiry script every month until August 2019 and if no problems configure to run automatically
    • Check with Tim / George about capability for login to student machines - where are we
      • Tim says that we should create a capability that is given to the base cohort and set that capability to no-grace
    • Meet Tim with Chris to review RAT involvement
    • Look at using php-5.6 on computing.help
    • Think about a separate XRDP server for Distance learning students
      • use one of the old KB KVM servers
    • Check with Tim whether we still need service catalogue entry (eg for XRDP service) as part of project deliverables
    • Read SL7 coordination project final report
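
The clean-up rule for the 'person' table noted under the Privacy Statement item above can be sketched as follows. This is only an illustration: it uses an in-memory SQLite database, and the 'uun' key column, the 'id' column and the sample rows are assumptions. Only the table names and the 'upstream', 'allocated_to' and 'category' fields come from the notes.

```python
import sqlite3

# Delete people who are not from the upstream feed and who no longer
# have any inventory item allocated to them.
DELETE_ORPHANS = """
DELETE FROM person
WHERE upstream = 0
  AND NOT EXISTS (
      SELECT 1 FROM item WHERE item.allocated_to = person.uun
  )
"""


def purge_unreferenced_people(conn: sqlite3.Connection) -> int:
    """Run the clean-up and return the number of rows deleted."""
    cur = conn.execute(DELETE_ORPHANS)
    conn.commit()
    return cur.rowcount


def demo() -> list:
    """Build a toy schema (hypothetical columns) and apply the rule."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (uun TEXT PRIMARY KEY, upstream INTEGER, category TEXT);
        CREATE TABLE item (id INTEGER PRIMARY KEY, allocated_to TEXT);
        INSERT INTO person VALUES ('a', 1, 'staff'), ('b', 0, 'visitor'), ('c', 0, 'visitor');
        INSERT INTO item VALUES (1, 'b');
    """)
    purge_unreferenced_people(conn)
    return [row[0] for row in conn.execute("SELECT uun FROM person ORDER BY uun")]
```

In the toy data, 'a' survives because it is upstream, 'b' survives because an item is still allocated to it, and 'c' is removed. A periodic job would run the same statement against the real database.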

  • Chris
    • Look at RT
    • User training materials project #403
    • Produce a 'guest only' version of Virtual DICE based on SL7.6
    • With Stephen remove hammersmith
    • Meet Tim with Alastair to review RAT involvement
    • Investigate whether small Virtual DICE image is sufficient for 1st and 2nd year teaching
      • investigate whether we could use yum groups to install additional software for each class (at least for the big classes)
    • Improve Forum KVM server dependencies on server room switches (and check LCFG profiles match reality)
    • Enable KSM on KB KVM servers
    • Upgrade KB KVM servers to 7.6

  • Stephen
    • Take issue of disabling per-user journald logs on certain servers to OPS
    • Look at where we're using ALL in access.conf
    • Continue with RT ticket clearout as discussed in October
    • Read George's mail of 8th November wrt DPIA
    • clientreport
      • Complete module errors report
      • Add an 'old locks' report
      • 'Old kernels' report
      • Report on core files in / directory
    • Move afsbuild server (juice) from Forum to AT
    • Produce a Legitimate Interest Declaration and Privacy Statement for svn history and LCFG profile history
    • With Chris remove hammersmith
    • All 7.6 machines to use https for profiles from stable 7th August
    • Have a quick look at harpsichord and clavichord config
    • Point student.login to lesser loaded RDP server
    • Commission new LCFG master

-- AlastairScobie - 24 Jul 2019

Topic revision: r9 - 23 Sep 2019 - 13:33:42 - AlastairScobie