MPU Meeting Wednesday 13th December 2017

Inventory

Chris has contributed a clientreport Management module which detects BMCs and DRACs and their firmware versions.

LCFG Client

  • Stephen has fixed Bug:1022 reported by George Ross: the behaviour of qxprof -p with a null argument had changed with the v4 client.
  • There's a new "check" function which will be rolled out after testing is complete.
  • Stephen has run the code through the Coverity code scanner, and this flagged up a few issues. The code scanner hadn't been used for a few months because it needs to run on Debian Stable.
  • The Programmer's Guide has been expanded.

Miscellaneous Development

  • The latest dsu is now available for servers on the develop release. At the same time we're dropping dsu from SL6.
  • The latest qemu-kvm-ev packages have been added to kvm-rhev.h for servers running SL 7.4.
  • The CentOS Virtualization SIG is now packaging the latest libvirt RPMs. We'll make them available as options. We'll then test the latest libvirt client with Ian to see if it fixes the conserver memory leak problem.
  • Stephen has fixed Bug:1023 - the upgrade to SL 7.4 wasn't triggering a final reboot because the lcfg-release-el7 package hadn't been updated.

Operational

  • There's a patch for the OpenAFS getcwd() bug. Stephen will build new OpenAFS packages this week.
  • The LCFG Annual Review of 2017 was held on 8 December.
  • The LCFG chatroom has moved to Slack. If you have an ed.ac.uk email address you can sign up via this address. If you don't have an ed.ac.uk address, contact us for an invitation. Anyone who is interested in discussing topics related to LCFG is very welcome to join the workspace.
  • There's now a mirror of our Inf package bucket on the IS LCFG service, as a backup.
  • We could virtualise our LCFG DR server on IS infrastructure. We'll consider this idea.
  • (From the Annual Review page) We would like to move the LCFG source code repository from svn.lcfg.org onto a separate host. This might provide us with an opportunity to switch revision-control system to something like git. SEE are using gitea so we could provide an interface like that. We would also like to make the LCFG code more accessible/searchable, so a mirror on a hosted service such as gitlab.com would be nice; this might help raise the profile of the project. It would also be somewhere we could upload source tar files to make bootstrapping new platforms simpler.
    • Chris adds: Tim Colles' blog post Migrating from svn to git while splitting repository might be of interest here as it covers a number of techniques they had to explore in order to properly conserve a rather convoluted revision history when splitting a repository and migrating to git.
    • Splitting the LCFG svn repository should remove one of the obstacles to adoption of the recommendations of Ian Durkacz' recent security paper.
  • As part of this year's spending round, Alastair asked if the package cache servers could be virtualised. We discussed a number of points:
    • They were made physical so that (after, say, a power outage at a site) they could be up and running before the KVM servers, in order to minimise the number of machines encountering long updaterpms timeouts while booting. So, the updaterpms timeouts are a problem. We came up with a couple of ideas to tackle them:
      • When updaterpms fails to contact a package server it could fail over to another server instead of (as currently) simply failing. This would be easier to do if the package servers' address configuration was moved from DNS to a new updaterpms configuration file. This could specify the names of package servers for updaterpms to use, and these servers could be given weightings.
      • The timeout periods were set years ago. We should review them - we could perhaps massively cut the timeout period.
    • The package cache servers currently also act as our PXE servers. It would make more sense to house the PXE service on the network servers, where the DHCP service lives.
    • We should check the network usage of our package servers. Are any of them network bound? Should we upgrade to 10G network interfaces?
  • Another spending idea was to provide more disk space for the KVM servers. In particular we should bear in mind that our IBM SAN is now 7 years old and is out of warranty and no longer on maintenance, so we should consider providing enough extra space for the Forum-based KVM servers that we could take the IBM out of service. We could then remove Fibre Channel from those servers.
    • Chris will check the KVM servers for any unnecessary unused storage left over from migrations.
    • Chris will warn the owners of VMs using IBM-based pools about the IBM's age and status.
  • Chris will separate out our list of HTTP sites with text entry boxes so that it can be linked into the Operational meeting minutes.
  • Alastair has helped out with the US Unit SL7 server upgrades:
    • SL6 release testing is no longer needed so bigglesvb2 has been turned off.
    • The student discussion forum service isn't currently in use so forteviotvb2 has been disabled. *_hp1_ has been turned off, but it needs to be physically decommissioned.
    • The ANC web server nimrodkvm was getting traffic only from web crawlers, so apacheconf and ipfilter have been disabled. We'll wait for any reaction before going further. Neil has the contents of it backed up.
  • We need some written procedures for turning a DICE machine into a self-managed machine. Stephen will do this.
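
The updaterpms failover idea discussed above (try another package server rather than simply failing, with servers given weightings) could be sketched roughly as follows. This is only an illustration in Python, not the updaterpms implementation: the function names, the weights mapping, and the fetch callback are all hypothetical, and it assumes every weight is a positive number.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


def failover_order(weights: Dict[str, float],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Return every server exactly once, in weighted-random order.

    Servers with a larger weight are more likely to be tried first.
    Assumes all weights are positive (hypothetical config format).
    """
    rng = rng or random.Random()
    remaining = dict(weights)
    order: List[str] = []
    while remaining:
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        order.append(pick)
        del remaining[pick]
    return order


def fetch_with_failover(weights: Dict[str, float],
                        fetch: Callable[[str], str],
                        rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Try each server in turn; return (server, result) from the first success.

    `fetch` is a caller-supplied function that contacts one server and
    raises OSError (refused connection, timeout, ...) when it cannot.
    """
    errors = {}
    for server in failover_order(weights, rng):
        try:
            return server, fetch(server)
        except OSError as exc:
            errors[server] = exc  # remember the failure and move on
    raise RuntimeError(f"all package servers failed: {errors}")
```

A caller would pass something like {"pkg-a.example.org": 10, "pkg-b.example.org": 1} (made-up names) together with a fetch function that applies a short per-server timeout. A short timeout fits the other suggestion above, since failing over quickly only helps if each attempt gives up promptly.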

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Continue work on RESTful API - TartarusRESTAPI
      • Document REST API
      • Write more of the ii commands and document as writing.
      • Start work on final report!
      • Continue with REST API testing framework
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
      • and give details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
      • Look at postgresql replication
    • Deploy encrypted /tmp and swap conversion script
      • Need to warn users that Gnome3 may pop up a window about /tmp being full (when script is run)
      • Now down to 3 user desktops
    • Schedule MPU meeting to discuss systemd ordering
    • Check sysmans (et al) have 'nograce'.
    • Take a look at RT #78875
    • Look at /etc/hosts - dns issue (IPV6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Implement change to kvmtool to allow KVMs to be marked as disabled
    • Look at Stephen's 'Thoughts on shell components'
    • Look at MPUActivitiesList
    • Start looking at https and computing.help (remove assumption that https means want cosign login)
      • wait on Neil's efforts with EdWeb
    • Chase Alison about LCFG check monitoring ( start doing again )
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • Try latest vdice.ova (sensa) and steno and record problems in detail
    • If in Forum server room, review MPU rack usage
    • Review 'ssh on a mac'
    • Start upgrading MPU servers to 7.4
    • Get costings for increasing storage space for Forum KVM servers (and get assertive in new year about tidying up old VMs)

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report (currently pending other units completing)
    • Ship latest Virtual DICE (once Alastair double checked at home on Windoze)
    • If in Forum server room, review MPU rack usage
    • Update MPUPackageRepository
    • Start upgrading MPU servers to 7.4
    • Make latest libvirt available and test for memory leaks (wrt console servers)
    • Some KVM housekeeping re old VMs (migrated?) - migrated VM storage now all deleted. girassol still has some storage for superseded or deleted VMs. They may have been preserved deliberately - we can clear this up in 2018.
    • Extract out http findings and add to Ops meeting action list - MPunitHttpList.

  • Stephen
    • LCFG client refactor stage 2
    • RT actions (as agreed)
    • Submit polkit bug to Red Hat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take the issue of disabling per-user journald logs on certain servers to OPS
    • Schedule jubilee downtime to move to SOL
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at MPUActivitiesList
    • On metropolitan, find the fastest baud rate at which we can drive the real physical consoles. (This is so we can decide whether to use physical consoles for KVM servers.)
    • Look at where we're using ALL in access.conf
    • If in Forum server room, review MPU rack usage
    • Agree with RAT how software package requests are handled - waiting on Graham documenting
    • Start upgrading MPU servers to 7.4 - needs to wait until stable release on 13th December
    • Rebuild openafs with 'getcwd' patch and test
    • Produce page on converting a DICE machine to a self-managed machine - SwitchToSelfManaged

-- AlastairScobie - 13 Dec 2017

Topic revision: r9 - 24 Sep 2019 - 13:50:25 - AlastairScobie
 