MPU Meeting Wednesday 18th April 2018

Inventory

Nothing.

User Security Training

Nothing.

Virtual Desktop

Not much progress (it's mostly done).

LCFG Profile Security

Stephen summarised his progress in a blog post as follows:

Having completed the work to add support for GSSAPI auth to the client for fetching profiles, I’ve now moved on to the LCFG installer. Currently the installer attempts to fetch the LCFG profile for the machine just prior to the (I)nstall, (D)ebug, (S)hell, (P)atchup, (R)eboot prompt. That fetching is done by calling the client component's install method, which in turn calls rdxprof in one-shot mode. Having previously ported the client component to the Perl LCFG::Component framework, I had hoped this would “just work”, but it turned out that a number of bootstrapping issues were only being avoided previously due to many hardwired paths in the shell ngeneric code. The Perl framework takes a different approach and prefers to use the LCFG sysinfo resources wherever possible; this improves platform independence and maintainability but presents a bootstrapping problem at the first stage of the install, when we have not yet downloaded any profile and thus have no sysinfo resources… I wasn’t keen on performing major surgery on the Perl component framework, so I decided that the simplest solution to this problem was to get the installer to call rdxprof directly. With this change the installer worked again but still required support for Kerberos authentication.

Adding support for Kerberos authentication has been done in a fairly simple way. I’ve added support for two new install kernel command line options: lcfg.kauth and lcfg.realm. When the lcfg.kauth option is specified, the user is prompted to enter their principal name and the kinit program is run to do the authentication. The user may specify the full principal name; if the realm is not specified, then either the value of the lcfg.realm option or, failing that, the upper-cased domain name is used (e.g. @LCFG.ORG). If the authentication fails, the user is prompted to re-enter the principal name (which defaults to the previously entered string) and password. Once the Kerberos authentication has succeeded, the credentials will be used automatically by rdxprof when required for fetching the LCFG profile.
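
(Editorial illustration, not part of Stephen's post.) The realm-defaulting rule described above can be sketched roughly as follows. This is only a minimal sketch of the rule as stated: the function name resolve_principal and its parameters are hypothetical, and the real installer code is not shown in these minutes and may be structured quite differently.

```python
def resolve_principal(entered, domain, realm_option=None):
    """Return a full Kerberos principal name for the string the user typed.

    entered      -- what the user typed at the prompt (hypothetical input)
    domain       -- the machine's DNS domain, e.g. "lcfg.org" (assumed known)
    realm_option -- value of the lcfg.realm kernel option, if it was given
    """
    # A principal that already carries a realm is used unchanged.
    if "@" in entered:
        return entered
    # Otherwise prefer lcfg.realm, falling back to the upper-cased domain.
    realm = realm_option if realm_option else domain.upper()
    return f"{entered}@{realm}"
```

For example, under these assumptions resolve_principal("alice", "lcfg.org") gives "alice@LCFG.ORG", while passing realm_option="EXAMPLE.ORG" gives "alice@EXAMPLE.ORG".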

Miscellaneous Development

  • Stephen has fixed his dice-check lab machine status script so that it copes with the changes which have been made in the Tartarus API.
  • Stephen has fixed a bug in pkgsubmit.
  • Stephen has fixed a bug in qxprof and sxprof: short options didn't work when they were bundled together (e.g. sxprof -a -b -c expressed as sxprof -abc).
  • Stephen cleaned up the build scripts for the LCFG Core. Its list of libraries had got out of date so now it uses pkgconfig to list the libraries which are actually required.
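
For readers unfamiliar with option bundling, the behaviour mentioned in the qxprof/sxprof item can be illustrated with a short sketch. This is not the actual qxprof or sxprof code; the function name expand_bundled is hypothetical, and the sketch assumes that none of the short options takes a value.

```python
def expand_bundled(args):
    """Expand bundled short options, e.g. ["-abc"] -> ["-a", "-b", "-c"].

    Assumes no short option takes an argument (an assumption made for
    this illustration only). Everything from "--" onwards is left alone.
    """
    expanded = []
    passthrough = False
    for arg in args:
        if passthrough or arg == "--":
            # "--" conventionally ends option processing.
            passthrough = True
            expanded.append(arg)
        elif arg.startswith("-") and not arg.startswith("--") and len(arg) > 2:
            # A single dash followed by several letters is a bundle.
            expanded.extend("-" + ch for ch in arg[1:])
        else:
            expanded.append(arg)
    return expanded
```

With this sketch, expand_bundled(["-abc", "host"]) and expand_bundled(["-a", "-b", "-c", "host"]) yield the same list, which is the equivalence the bug fix restores.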

Operational

Iain has been getting failures with dsu. Perhaps one in five of his attempts to run it on cluster machines errors out with an inventory download failure. In one case Iain couldn't successfully run dsu on a machine while Chris could. A couple of possible causes were suggested:
  • multipath was confusing the software's attempts to find the temporary virtual disks which it mounts.
  • the software was trying to find dell.com on a private non-routed cluster wire.
Unfortunately neither of these seems likely to be the case with Iain's machines.

Chris has been getting failures with migrations of KVM guests between Forum servers. Most guests will migrate, but a few fail within a minute or so of starting the migration with a "keepalive timeout" error. Disabling keepalive timeout (virsh -k 0) just changes this to an I/O Error.

One of the pkgforge servers had less build space than the other, so attempts to build a series of large packages have been intermittently failing. We're increasing the amount of space available.

RT:88303 - DBAN doesn't work on the G3s. Our version is ancient. We're going to look for a new version or for something better. We'll also look to see if (for servers) Dell's PERC RAID controllers have a disk wiping facility built in.

OpenAFS 1.8 packages are being built. Stephen's impression is that the new version contains lots of server-side changes but very little change at the client end. He plans to get the client side working on DICE over the summer. Another interesting point is that once both the server and client are on 1.8.0, we'll be able to use a better form of encryption.

RHEL 7.5 is out. SL has imported the RPMs but hasn't yet started deluging 7.4 with back-ported fixes.

The jubilee console is now at last on the SOL wire.

The new KVM server disks are now in service on girassol and gaivota. Both machines now have an extra 3+TB storage pool made of four 1.8TB 10k SAS disks.

The IBM SAN is now out of service. We'll wipe its disks then unrack it.

With GDPR in mind, Stephen is looking into restricting the output of commands such as last, w and ps so as not to intrude on users' privacy.

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Start work on final report!
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
      • and give details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
      • Look at postgresql replication (do after shipping)
      • Add tartarus info to SwitchToSelfManaged
      • Complete removal of non authenticated access to API and web
      • Need tests for API /orders and need new tests to check for correct authorisation
    • Schedule MPU meeting to discuss systemd ordering
    • Take a look at RT #78875
    • Look at /etc/hosts - DNS issue (IPv6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Implement change to kvmtool to allow KVMs to be marked as disabled
      • looked at this - looks like the metadata tag isn't passed through libvirt (prior to 4.0.0), so can't be read/written by kvmtool
      • put on activities list to do once upgrade to libvirt-4.0.0
    • Look at Stephen's 'Thoughts on shell components'
    • Look at MPUActivitiesList
    • Start looking at https and computing.help (remove assumption that https means want cosign login)
      • wait on Neil's efforts with EdWeb
    • Chase Alison about LCFG check monitoring ( start doing again )
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • Report at next ops meeting that we have changed the journald configuration (MPU report)
    • Discuss with Neil - drupal username collection re GDPR
    • Inventory stuff re GDPR
    • Check with Tim / George about capability for login to student machines - where are we
    • Add %slaac to hulp and lagun after 21/02/18
    • Useful? - a script which checks how fast a machine's console log is growing (eg huge number of dbus problems on hammersmith)
      • suggest to Ian D
    • Blog on projects
    • KVM pcid
      • Created MPUSpectreMeltdown
      • Put detection script somewhere for people to use
      • Which CPU is needed for each group..
The following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" wrt migrations.
<cpu mode='host-model' match='exact'>
  <model fallback='allow'>IvyBridge</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='pcid' />
</cpu>
    • Look at why kvmtool doesn't work on circle (running libvirt 4.0.0)
    • Read and comment on Stephen's notes on the LCFG security project
    • Look at how to wipe the IBM disk array disks
    • Read Chris's blog on ThoughtsOn403

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report (currently pending other units completing)
    • libvirt - test for memory leaks (wrt console servers). Ian will test it for memory leaks after the 17 January stable release
    • User training materials project #403
    • Read and comment on Stephen's notes on the LCFG security project
    • Check to see if any of the MegaRAID controllers support disk wipe from the BIOS

  • Stephen
    • LCFG client refactor stage 2
      • Bring LCFG v4 client project to closure
    • RT actions (as agreed)
    • Submit polkit bug to Red Hat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take issue of disabling per-user journald logs on certain servers to OPS
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at MPUActivitiesList
    • On metropolitan, find the fastest baud rate at which we can drive the real physical consoles. (This is so we can decide whether to use physical consoles for KVM servers.)
    • Look at where we're using ALL in access.conf
    • Agree with RAT how software package requests are handled - waiting on Graham documenting
    • Start off NX replacement project (#389)
      • Complete Documentation
      • Introduce test service for staff users on metropolitan
    • Check whether websites are still using Allow/Deny configuration
      • Check individual .htaccess files
    • Add cron job to run journalctl vacuum
    • Secure 'last' and 'w'
    • Look at hiding 'ps' on NX servers as trial

-- AlastairScobie - 18 Apr 2018

Topic revision: r7 - 23 Sep 2019 - 13:33:41 - AlastairScobie
 