MPU Meeting Wednesday 16th May 2018
Inventory
Tartarus has gone live and the old system is now in read-only mode. We still need to update the location of the LCFG data feed.
Alastair noticed that he still needs to add an ii dispose command for when machines are disposed of. He doesn't think it will take much effort.
There is also a rarely occurring issue with the reuse of MAC addresses. In particular, we have a VM using a MAC address that was originally allocated to a desktop machine which still appears in the supplier report. Can we work around this using the fact that the machine has been disposed of?
Alastair needs to have a meeting with Support once they've had some experience of using the ii tools. The documentation also needs to be improved. Stephen needs to add a link to the client reports on the front page of the web interface and a link from the ii detailed view.
Virtual Desktop
Nothing happened. Stephen plans to find some time next week to get this finished.
LCFG Profile Security
The support for Kerberos authentication in the installer has been reworked to kinit twice, the first time for the usual TGT and the second time for the kadmin/admin service principal. To support this approach the kdcregister tool has been modified to take a new command line option with which the path to a credentials cache can be specified (see bug#1068 for details).
A change has been made to the new sysconfig file which is used when managing rdxprof through systemd so that it is now a bit quieter and doesn't write info messages to the console. The boot order for the dns and client components has also been tweaked so that when there is a local named service it should be operational before the client is started and attempts to fetch profiles. When a machine is only using the local named this should help avoid a wait of 10 minutes before the first successful profile fetch after boot.
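As a rough illustration of the two-step kinit described above, the sketch below only builds the two command lines. It assumes MIT kinit, where -c selects the credentials cache and -S requests a ticket for an alternate service. The function name and the use of a separate cache for the kadmin/admin ticket are assumptions made for illustration (kinit replaces the contents of the cache it writes to). The new kdcregister option is not named in these notes, so it is not shown.

```python
from typing import List

# Service principal named in the notes for the second kinit.
KADMIN_SERVICE = "kadmin/admin"


def build_kinit_commands(principal: str, tgt_ccache: str,
                         kadmin_ccache: str) -> List[List[str]]:
    """Return the two kinit command lines as argument lists.

    Hypothetical helper: the installer's real code is not shown in the
    notes. Separate caches are assumed so that the second kinit does not
    replace the TGT obtained by the first.
    """
    # First kinit: the usual ticket-granting ticket.
    tgt_cmd = ["kinit", "-c", tgt_ccache, principal]
    # Second kinit: an initial ticket for the kadmin/admin service.
    kadmin_cmd = ["kinit", "-c", kadmin_ccache, "-S", KADMIN_SERVICE, principal]
    return [tgt_cmd, kadmin_cmd]


if __name__ == "__main__":
    # Example values only; the principal and paths are made up.
    for cmd in build_kinit_commands("installer@EXAMPLE.ORG",
                                    "/tmp/krb5cc_tgt",
                                    "/tmp/krb5cc_kadmin"):
        print(" ".join(cmd))
```

Each argument list could be passed to subprocess.run, and the second cache path is what a tool like kdcregister would then be pointed at.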
Misc development
Operational
- R230 and toohot
- The IPMI sensor numbers are different in the latest batch of Dell R230 servers. Might this be a change in the firmware? Chris has made it possible to override the sensor number in the hardware headers using a CPP macro. Could the resource be changed so that it takes either a number or a string which can be used to look up the number using ipmitool?
- Spending plan
- Chris has updated the MPU spending plan for 2018 to 2012
- PXE / rpmaccel service
- The IF service is now on regulus and the AT service will soon be switched to maia. Once we're sure that hare is free Stephen will move the staff SSH service on to it and junk brendel. Stephen has added rsync access to the /export/linux/installroot directory so that it can be easily duplicated between servers. The pxeserver component now starts after the "stable" target is reached so that reboots for kernel upgrades are speedier.
- dice-check kernel
- The expected kernel version for lab machines was updated.
- AMD GPU Pro
- Stephen updated the AMD GPU Pro driver to the 18.10 release to support the backported security updates for SL7.4.
- logrotate of wtmp
- Stephen noticed that wtmp was never rotated and tracked this down to a logrotate config option which set the minimum size to 1MB. With that option removed the file will now be rotated monthly with 2 old logs retained.
- openafs build
- EL7.5 and F28 packages are now being built.
- .htaccess
- Stephen ran a search for .htaccess files on MPU machines and checked them for options which are not supported in Apache 2.4. We're now ready for the compatibility module to be disabled.
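The .htaccess check above could be automated along the following lines. This is a minimal sketch, not the search Stephen actually ran: it assumes the compatibility module in question is mod_access_compat, which supplies the 2.2-style Order, Allow, Deny and Satisfy directives in Apache 2.4. The function names are made up for illustration.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# 2.2-style access directives that Apache 2.4 only accepts through
# mod_access_compat (assumed to be the "compatibility module").
LEGACY_DIRECTIVES = ("Order", "Allow", "Deny", "Satisfy")

# Directive names are case-insensitive; \b stops "Allow" from matching
# "AllowOverride".
_LEGACY_RE = re.compile(
    r"^\s*(" + "|".join(LEGACY_DIRECTIVES) + r")\b", re.IGNORECASE
)


def find_legacy_directives(text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for each legacy directive found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comments
        if _LEGACY_RE.match(line):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root: str) -> Dict[str, List[Tuple[int, str]]]:
    """Scan every .htaccess file under root and report files with hits."""
    report = {}
    for path in Path(root).rglob(".htaccess"):
        hits = find_legacy_directives(path.read_text(errors="replace"))
        if hits:
            report[str(path)] = hits
    return report
```

An empty result from scan_tree for a document root would suggest that nothing under it still relies on the old access-control syntax.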
This Week
- Alastair
- Inventory project
- continue working through TartarusWorkFlow
- Document clientreport (eg how to add modules)
- Document order sync code
- Document hpreport processing script
- Start work on final report!
- Consider what else needs done other than docs and tidying and backups
- Blog something....take dev meeting talks
- and give details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
- Look at postgresql replication (do after shipping)
- Add tartarus info to SwitchToSelfManaged
- Complete removal of non authenticated access to API and web
- Need tests for API /orders and need new tests to check for correct authorisation
- Schedule MPU meeting to discuss systemd ordering
- Take a look at RT #78875
- Look at /etc/hosts - dns issue (IPV6?)
- work out what we need to fix current problem
- Circulate info on RH7.3 systemd changes we may wish to consider
- RT actions (as agreed)
- Implement change to kvmtool to allow KVMs to be marked as disabled
- looked at this - looks like the metadata tag isn't passed through libvirt (prior to 4.0.0), so can't be read/written by kvmtool
- put on activities list to do once upgrade to libvirt-4.0.0
- Look at Stephen's 'Thoughts on shell components'
- Look at MPUActivitiesList
- Start looking at https and computing.help (remove assumption that https means want cosign login)
- wait on Neil's efforts with EdWeb
- Chase Alison about LCFG check monitoring (start doing again)
- Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
- Report at the next ops meeting that we have changed the journald configuration (MPU report)
- Discuss with Neil - drupal username collection re GDPR
- write a script to remove users who haven't used computing.help in, say, 30 days (except COs) and fix the email address issue (currently defaults to umich.edu)
- Inventory stuff re GDPR
- Check with Tim / George about capability for login to student machines - where are we
- Add %slaac to hulp and lagun after 21/02/18
- Useful? - a script which checks how fast a machine's console log is growing (eg huge number of dbus problems on hammersmith)
- Blog on projects
- KVM pcid
- Created MPUSpectreMeltdown
- Put detection script somewhere for people to use
- Which CPU is needed for each group..
The following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" wrt migrations.
<cpu mode='host-model' match='exact'>
  <model fallback='allow'>IvyBridge</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='pcid' />
</cpu>
- Look at why kvmtool doesn't work on circle (running libvirt 4.0.0)
- Read and comment on Stephen's notes on the LCFG security project
- Remove IBM disk array from stack
- First ask RAT whether they might find the array useful
- Read Chris's blog on ThoughtsOn403
- Chase Tim about starting SL7.5 project
- Bring forward the AT KVM server replacements
- Look at moving stuff from the immediate todo back to the main Todo list and then we can prioritise that list
- Chris
- Inventory project
- Continue work on clientreport modules for replacing firmwarereport
- Look at MPUActivitiesList
- Look at RT
- Continue work on SL7 coordination final project report (currently pending other units completing)
- libvirt - test for memory leaks (wrt console servers). Ian will test it after the 17 January stable release.
- User training materials project #403
- Stephen
- RT actions (as agreed)
- submit polkit bug to Red Hat - with Alastair (still exists under 7.3)
- Produce some text for systemd mount bug (to submit to RH)
- Take the issue of disabling per-user journald logs on certain servers to OPS
- Consider PD work for after LCFG client ...
- Look at MPUActivitiesList
- On metropolitan, find the fastest baud rate at which we can drive the real physical consoles. (This is so we can decide whether to use physical consoles for KVM servers.)
- Look at where we're using ALL in access.conf
- Agree with RAT how software package requests are handled - waiting on Graham documenting
- Start off NX replacement project (#389)
- Complete Documentation
- Introduce test service for staff users on _metropolitan_
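For the KVM pcid item in the list above ("Put detection script somewhere for people to use"), a check might look like the sketch below. It is not the script referred to in the notes. It assumes detection is done by looking for the pcid flag in /proc/cpuinfo on an x86 Linux host or guest, and the function names are made up.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found


def has_pcid(cpuinfo_text: str) -> bool:
    """True if the pcid flag is present (invpcid alone does not count)."""
    return "pcid" in cpu_flags(cpuinfo_text)
```

On a Linux machine it could be called as has_pcid(open("/proc/cpuinfo").read()), for example inside a guest to confirm that the feature has been passed through.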
-- AlastairScobie - 16 May 2018