MPU Meeting Wednesday 16th May 2018

Inventory

Tartarus has gone live and the old system is now in read-only mode. We still need to update the location of the LCFG data feed.

Alastair noticed that he still needs to add an ii dispose command for when machines are disposed of. He doesn't think it will take too much effort.

There is also a rarely occurring issue with the reuse of MAC addresses. In particular, we have a VM which is using a MAC that was originally allocated to a desktop machine which still appears in the supplier report. Can we work around this using the fact that the machine has been disposed of?

Alastair needs to have a meeting with Support once they've had some experience of using the ii tools. The documentation also needs to be improved. Stephen needs to add a link to the client reports on the front page of the web interface and link to them from the ii detailed view.

Virtual Desktop

Nothing happened. Stephen plans to find some time next week to get this finished.

LCFG Profile Security

The support for Kerberos authentication in the installer has been reworked to kinit twice, the first time for the usual TGT and the second time for the kadmin/admin service principal. To support this approach the kdcregister tool has been modified to take a new command line option with which the path to a credentials cache can be specified (see bug#1068 for details).
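The two-step kinit can be sketched as follows. This is only an illustration, assuming MIT Kerberos kinit (its -c and -S options) and assuming each ticket goes into its own credentials cache. The principal name and cache paths are hypothetical, and the new kdcregister option is not shown because its name is not given here.

```python
from typing import List


def kinit_commands(principal: str, tgt_cache: str, kadmin_cache: str) -> List[List[str]]:
    """Build the two kinit invocations: one for the usual TGT and one for
    an initial ticket for the kadmin/admin service principal.

    The cache paths are assumptions for illustration only.
    """
    return [
        # First kinit: the ordinary ticket-granting ticket.
        ["kinit", "-c", tgt_cache, principal],
        # Second kinit: -S asks for an initial ticket for kadmin/admin instead.
        ["kinit", "-c", kadmin_cache, "-S", "kadmin/admin", principal],
    ]


if __name__ == "__main__":
    # Hypothetical principal and cache locations.
    for argv in kinit_commands("someone/admin@EXAMPLE.ORG",
                               "/tmp/krb5cc_install_tgt",
                               "/tmp/krb5cc_install_kadmin"):
        print(" ".join(argv))
```

Separate caches are used in the sketch because a second kinit into the same cache would replace the first ticket.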

A change has been made to the new sysconfig file which is used when managing rdxprof through systemd so that it is now a bit quieter and doesn't write info messages to the console. The boot order for the dns and client components has also been tweaked so that when there is a local named service it should be operational before the client is started and attempts to fetch profiles. When a machine is only using the local named this should help avoid a wait of 10 minutes before the first successful profile fetch after boot.

Misc development

Operational

R230 and toohot
The IPMI sensor numbers are different in the latest batch of Dell R230 servers. Might this be a change in the firmware? Chris has made it possible to override the sensor number in the hardware headers using a CPP macro. Could the resource be changed so that it takes either a number or a string which can be used to look up the number using ipmitool?
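A rough sketch of how a "number or name" value might be resolved. It assumes the sensor list is available as text in the style of "ipmitool sdr elist" output (name, hex sensor number, status, entity, reading). The sample text, sensor names and function names are made up for illustration and are not taken from the toohot code.

```python
import re
from typing import Dict, Union

# Illustrative sample in the style of `ipmitool sdr elist` output; not real data.
SAMPLE_SDR_ELIST = """\
Inlet Temp       | 04h | ok  |  7.1 | 21 degrees C
Exhaust Temp     | 01h | ok  |  7.1 | 30 degrees C
Fan1 RPM         | 30h | ok  |  7.1 | 5400 RPM
"""


def parse_sensor_numbers(sdr_text: str) -> Dict[str, int]:
    """Map each sensor name to its sensor number (second column, hex + 'h')."""
    sensors: Dict[str, int] = {}
    for line in sdr_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2:
            continue
        match = re.fullmatch(r"([0-9A-Fa-f]{1,2})h", fields[1])
        if match:
            sensors[fields[0]] = int(match.group(1), 16)
    return sensors


def resolve_sensor(spec: Union[int, str], sensors: Dict[str, int]) -> int:
    """Accept a sensor number directly, or look a sensor name up in the map."""
    if isinstance(spec, int):
        return spec
    text = spec.strip()
    if text.isdigit():
        # A purely numeric string is treated as a decimal sensor number.
        return int(text)
    if text not in sensors:
        raise ValueError(f"unknown sensor name: {text!r}")
    return sensors[text]
```

With a lookup like this the hardware headers could name the sensor once, rather than carrying a per-batch number override.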

Spending plan
Chris has updated the MPU spending plan for 2018 to 2012

PXE / rpmaccel service
The IF service is now on regulus and the AT service will soon be switched to maia. Once we're sure that hare is free, Stephen will move the staff SSH service on to it and junk brendel. Stephen has added rsync access to the /export/linux/installroot directory so that it can be easily duplicated between servers. The pxeserver component now starts after the "stable" target is reached so that reboots for kernel upgrades are speedier.

dice-check kernel
The expected kernel version for lab machines was updated.

AMD GPU Pro
Stephen updated the amdgpu pro driver to the 18.10 release to support the backported security updates for SL7.4.

logrotate of wtmp
Stephen noticed that wtmp was never rotated. He tracked this down to a logrotate config option which set the minimum size to 1MB, so the file was skipped whenever it was smaller than that. With that option removed the file will now be rotated monthly with 2 old logs retained.
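The effect of the minimum size option can be illustrated with a small model of the rotation decision. This is a simplified sketch of how logrotate combines "monthly" with "minsize", not its actual code, and the function name and the sizes used are made up.

```python
from datetime import date
from typing import Optional


def should_rotate(size_bytes: int, last_rotated: date, today: date,
                  minsize_bytes: Optional[int] = None) -> bool:
    """Simplified model: rotate monthly, but only if the log has reached minsize."""
    # "monthly": due once the calendar month differs from the last rotation.
    interval_due = (today.year, today.month) != (last_rotated.year, last_rotated.month)
    if not interval_due:
        return False
    # With minsize set, a log smaller than the threshold is skipped even when due.
    if minsize_bytes is not None and size_bytes < minsize_bytes:
        return False
    return True


if __name__ == "__main__":
    one_mb = 1024 * 1024
    small_wtmp = 200 * 1024  # hypothetical wtmp that never reaches 1MB
    print(should_rotate(small_wtmp, date(2018, 4, 1), date(2018, 5, 1), one_mb))
    print(should_rotate(small_wtmp, date(2018, 4, 1), date(2018, 5, 1)))
```

A wtmp file that stays under the threshold is never rotated in this model, which matches what was seen.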

openafs build
EL7.5 and F28 packages are now being built.

.htaccess
Stephen ran a search for .htaccess files on MPU machines and checked them for options which are not supported in Apache 2.4. We're now ready for the compatibility module to be disabled.
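A search of this kind could look something like the sketch below. It assumes the options of interest are the old access control directives (Order, Allow, Deny, Satisfy) that Apache 2.4 only accepts through mod_access_compat. The function name is made up and this is not the script Stephen used.

```python
import os
import re
from typing import Iterator, Tuple

# Directives which Apache 2.4 only supports via mod_access_compat.
LEGACY_DIRECTIVE = re.compile(r"^\s*(Order|Allow|Deny|Satisfy)\b", re.IGNORECASE)


def find_legacy_directives(root: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for each legacy directive in .htaccess files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if ".htaccess" not in filenames:
            continue
        path = os.path.join(dirpath, ".htaccess")
        with open(path, encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                # Commented-out lines start with '#' and so never match.
                if LEGACY_DIRECTIVE.match(line):
                    yield path, lineno, line.strip()


if __name__ == "__main__":
    # Hypothetical starting directory.
    for path, lineno, text in find_legacy_directives("/var/www"):
        print(f"{path}:{lineno}: {text}")
```

Any hits would need converting to the Require syntax before the compatibility module is disabled.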

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Start work on final report!
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
      • and give details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
      • Look at postgresql replication (do after shipping)
      • Add tartarus info to SwitchToSelfManaged
      • Complete removal of non authenticated access to API and web
      • Need tests for API /orders and need new tests to check for correct authorisation
    • Schedule MPU meeting to discuss systemd ordering
    • Take a look at RT #78875
    • Look at /etc/hosts - dns issue (IPV6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Implement change to kvmtool to allow KVMs to be marked as disabled
      • looked at this - looks like the metadata tag isn't passed through libvirt (prior to 4.0.0), so can't be read/written by kvmtool
      • put on activities list to do once upgrade to libvirt-4.0.0
    • Look at Stephen's 'Thoughts on shell components'
    • Look at MPUActivitiesList
    • Start looking at https and computing.help (remove assumption that https means want cosign login)
      • wait on Neil's efforts with EdWeb
    • Chase Alison about LCFG check monitoring ( start doing again )
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • Report at the next ops meeting that we have changed the journald configuration (MPU report)
    • Discuss with Neil - drupal username collection re GDPR
      • write a script to remove users who haven't used computing.help in, say, 30 days (except COs) - and fix the email address issue (currently defaults to umich.edu)
    • Inventory stuff re GDPR
    • Check with Tim / George about capability for login to student machines - where are we
    • Add %slaac to hulp and lagun after 21/02/18
    • Useful? - a script which checks how fast a machine's console log is growing (eg huge number of dbus problems on hammersmith)
      • suggest to Ian D
    • Blog on projects
    • KVM pcid
      • Created MPUSpectreMeltdown
      • Put detection script somewhere for people to use
      • Which CPU is needed for each group..
        The following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" wrt migrations.
          <cpu mode='host-model' match='exact'>
            <model fallback='allow'>IvyBridge</model>
            <vendor>Intel</vendor>
            <feature policy='require' name='pcid' />
          </cpu>
    • Look at why kvmtool doesn't work on circle (running libvirt 4.0.0)
    • Read and comment on Stephen's notes on the LCFG security project
    • Remove IBM disk array from stack
      • First ask RAT whether they might find the array useful
    • Read Chris's blog on ThoughtsOn403
    • Chase Tim about starting SL7.5 project
    • Bring forward the AT KVM server replacements
    • Look at moving stuff from the immediate todo back to the main Todo list and then we can prioritise that list

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report (currently pending other units completing)
    • libvirt - test for memory leaks (wrt console servers). Ian will test it for memory leaks after the 17 January stable release.
    • User training materials project #403

  • Stephen
    • RT actions (as agreed)
    • submit polkit bug to redhat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take issue of disable per user journald logs on certain servers to OPS
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at MPUActivitiesList
    • On metropolitan, find fastest baud rate we can drive the real physical consoles. (This so we can decide whether to use physical consoles for KVM servers).
    • Look at where we're using ALL in access.conf
    • Agree with RAT how software package requests are handled - waiting on Graham documenting
    • Start off NX replacement project (#389)
      • Complete Documentation
      • Introduce test service for staff users on _metropolitan_

-- AlastairScobie - 16 May 2018

Topic revision: r6 - 23 Sep 2019 - 13:33:41 - AlastairScobie
 