MPU Meeting Wednesday 27th September 2017

Inventory

Alastair has been tidying code and adding pod documentation.

He's added a new script which updates hostnames from clientreport data where serial numbers match. This is a temporary arrangement which will be abandoned when the DHCP integration is done.
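
The serial-number matching can be pictured with a short Python sketch. This is not Alastair's script. The record layout (dictionaries with `serial` and `hostname` keys) and the function name `update_hostnames` are hypothetical. They are chosen only to show the idea of joining clientreport data to inventory records on serial number.

```python
"""Hypothetical sketch: copy hostnames from clientreport data onto
inventory records whose serial numbers match. The record layout is an
assumption for illustration, not the real inventory schema."""

from typing import Dict, List


def update_hostnames(inventory: List[Dict[str, str]],
                     clientreport: List[Dict[str, str]]) -> List[str]:
    """Update inventory hostnames in place; return the serials changed."""
    # Index clientreport hostnames by serial, ignoring blank serials.
    reported = {r["serial"]: r["hostname"]
                for r in clientreport if r.get("serial")}
    changed = []
    for machine in inventory:
        new_name = reported.get(machine.get("serial", ""))
        # Only touch records with a matching serial and a different name.
        if new_name and machine.get("hostname") != new_name:
            machine["hostname"] = new_name
            changed.append(machine["serial"])
    return changed


if __name__ == "__main__":
    inventory = [{"serial": "ABC123", "hostname": "oldname"},
                 {"serial": "XYZ789", "hostname": "unchanged"}]
    report = [{"serial": "ABC123", "hostname": "newname"}]
    print(update_hostnames(inventory, report))  # ['ABC123']
```

Returning the list of changed serials makes it easy to log what a run altered. A second run with the same data changes nothing.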

Another new script creates each machine's infdb_data.h headers (which set the machine's sysinfo resources). This brings a change in behaviour: old headers will be deleted, rather than being left around forever as they were in the old inventory system. However, it will be possible to make this deletion optional.
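
The regenerate-then-prune behaviour can be shown as a minimal sketch. It assumes a hypothetical layout of one `<hostname>/infdb_data.h` per machine. The `#define` content is only a placeholder: the real headers set sysinfo resources using LCFG's own syntax, which is not reproduced here. The `delete_stale` flag stands in for the optional deletion.

```python
"""Hypothetical sketch: regenerate per-machine infdb_data.h headers and,
optionally, delete headers for machines no longer in the inventory.
The <outdir>/<hostname>/infdb_data.h layout and the #define content are
placeholders; real LCFG headers use LCFG's own resource syntax."""

from pathlib import Path
from typing import Dict, List


def write_headers(outdir: Path, machines: Dict[str, Dict[str, str]],
                  delete_stale: bool = True) -> List[str]:
    """Write one header per machine; return hosts whose header was removed."""
    outdir.mkdir(parents=True, exist_ok=True)
    for host, resources in machines.items():
        hostdir = outdir / host
        hostdir.mkdir(exist_ok=True)
        lines = ["/* Generated file: do not edit by hand. */"]
        for key, value in sorted(resources.items()):
            lines.append(f'#define INFDB_{key.upper()} "{value}"')
        (hostdir / "infdb_data.h").write_text("\n".join(lines) + "\n")

    removed: List[str] = []
    if delete_stale:
        # Unlike the old inventory system, stale headers are not left around.
        for hostdir in outdir.iterdir():
            header = hostdir / "infdb_data.h"
            if hostdir.is_dir() and hostdir.name not in machines and header.exists():
                header.unlink()
                removed.append(hostdir.name)
    return sorted(removed)
```

Calling `write_headers(outdir, machines, delete_stale=False)` would reproduce the old behaviour of leaving stale headers in place. The sketch removes only the header file, not the whole directory, so nothing else stored alongside it is lost.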

LCFG Client Refactoring

The next stage will be the testing of the new code against the University's LCFG profiles.

Miscellaneous Development

  • Stephen has rewritten pkgsubmit in Python. It accepts the same configuration files and command line options as the current version. The new version compares the checksums of files before and after copying them into the repository, so that it can detect file truncation or damage. If the copy fails, it attempts to back out the files already submitted. Also, an attempt to submit a file which is already in the repository will no longer be regarded as an error, though a warning message may be produced. This means that if a submit fails because of (for instance) a quota problem, the same submit can simply be repeated once the problem has been fixed.
  • Stephen has been looking at making LCFG work over https:
    • In normal use, switching the client to https just works.
    • Installs don't work, because they use the LCFG server names lcfg and lcfghost, which aren't listed in the server's certificate. To fix this we need Subject Alternative Name support in lcfg-x509 (see Bug:1010).
    • During the install process some scripts call rdxprof, but rdxprof doesn't have a default URL to fetch. It would be better if, when not given a specific URL to fetch, it was able to fall back to the contents of client.url.
    • This could make it possible to provide a higher-level "Fetch XML" abstraction, at an estimated five days' work. With extra effort, the client could also verify the server's certificate.
  • Chris has provided a script to monitor package bucket quotas. It gives a summary of the usage of each package bucket, and an hourly cron job will alert us when a bucket or the host partition is filling up.
  • Chris has made a new Virtual DICE VM, but X fails to start, citing a missing vboxvideo module. Alastair has had a similar problem and will take a look.
  • Chris has updated DIY DICE to SL 7.3 and fixed the DIY DICE server's Apache configuration to properly separate the virtual hosts serving diy.rpms.inf.ed.ac.uk and diydice.inf.ed.ac.uk over http.
  • Alastair has fixed lcfg-systemd so that it handles rsync failures more defensively and reports them more clearly, and so that it is more tolerant of punctuation in service names or IDs (Bug:1007).
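
The checksum-and-back-out behaviour described for the new pkgsubmit can be sketched as follows. This is a hypothetical illustration, not pkgsubmit's code. The following are all assumptions:

- the function names `checksum` and `submit`;
- the use of SHA-256;
- the flat repository directory;
- the choice to refuse a same-named file with different content.

```python
"""Hypothetical sketch of a checksum-verified submit with back-out, in the
spirit of the new pkgsubmit. Names, SHA-256 and the flat repository
layout are assumptions, not pkgsubmit's actual implementation."""

import hashlib
import shutil
import warnings
from pathlib import Path
from typing import Iterable, List


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def submit(files: Iterable[Path], repo: Path) -> List[Path]:
    """Copy files into repo, verifying each copy; back out on any failure."""
    copied: List[Path] = []
    try:
        for src in files:
            dest = repo / src.name
            before = checksum(src)
            if dest.exists():
                if checksum(dest) == before:
                    # Already present and identical: warn rather than fail,
                    # so a failed submit can simply be repeated.
                    warnings.warn(f"{dest.name} is already in the repository")
                    continue
                # Assumption: a same-named file with different content is an error.
                raise FileExistsError(f"{dest.name} exists with different content")
            copied.append(dest)  # record first so a partial copy is backed out
            shutil.copyfile(src, dest)
            if checksum(dest) != before:
                raise OSError(f"checksum mismatch after copying {src.name}")
        return copied
    except Exception:
        # Back out only the files this run wrote, then re-raise.
        for dest in copied:
            dest.unlink(missing_ok=True)
        raise
```

The sketch records each destination before copying. As a result, a copy that dies part-way (for example on a quota error) still leaves a file that gets removed in the back-out. Files that were already in the repository are never touched, which is what makes simply repeating the submit safe.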

Operational

  • Chris has surveyed the MPU web servers, looking especially for those which offer text input boxes over http, since Chrome 62 will mark such pages as "Not secure". The affected hosts fall into three categories:
    • ordershost - this will be going away with the old inventory system. The Tartarus equivalent is https only.
    • LCFG slaves - Stephen has been looking into this, see above.
    • computing.help hosts - Alastair will look at this problem.
  • Stephen has upgraded the AT-based KVM servers oyster and waterloo to SL 7.3 and updated their firmware. This went smoothly except for the OS Driver Pack firmware update on oyster. That update seemed to hang for minutes at a time, updated its progress report only sporadically, and eventually failed after running for 24 minutes. The OS Driver Pack update is marked "optional" and is not necessary for our systems, since it "contains applicable storage, NIC, and video drivers to support OS installation using Unified Server Configurator - Lifecycle Controller Enabled" (Dell OS Driver Pack at dell.com).
  • gaivota is the only MPU KVM server yet to be upgraded to 7.3.
  • All other SL 7.2 machines have now been upgraded to 7.3. As usual, this involved Stephen upgrading package lists. Putting long lists of packages directly in LCFG profiles makes this unnecessarily difficult. Please use package lists wherever you can: these are far easier to upgrade, since they can simply be fed to yummy.
  • We consider that we do not have any functional accounts - in the sense of accounts which use IS services, for example mail or logins to IS sites.
  • Emacs is currently broken (RT:84608): the toolbar buttons are still there, but their icons are not visible. This seems to have been caused by a broken GTK update. Emacs has been patched on Debian and Ubuntu, but the RHEL version (24.3) is too old to be patched. We therefore plan to take a more recent (24.5) Emacs package from Fedora and combine it with the current RHEL-sourced package. We hope the resulting version will work and show its icons once again.
  • We discussed the possible use of physical serial consoles on our KVM servers, but were unsure of the speed they could handle. Stephen will experiment with metropolitan to see how high the baud rate could go.

Next meeting

The next meeting is expected to be at 2:15pm on Wednesday 11 October.

This Week

  • Alastair
    • Inventory project
      • Continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Continue work on RESTful API - TartarusRESTAPI
      • Document REST API
      • Write more of the ii commands, documenting them while writing
      • Start work on final report!
      • Work out how to represent VMs
      • Continue with REST API testing framework
      • Consider what else needs done other than docs and tidying and backups
      • Blog something (perhaps based on the dev meeting talks)
    • Deploy encrypted /tmp and swap conversion script
      • Need to warn users that Gnome3 may pop up a window about /tmp being full (when script is run)
      • Now down to 5 desktops, 3 users, 2 COs
    • Schedule MPU meeting to discuss systemd ordering
    • Check that sysmans (et al.) have 'nograce'.
    • Take a look at RT #78875
    • Look at /etc/hosts - DNS issue (IPv6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Deploy disable-module header on all computing.help servers
      • Added to the computing-help-server header, but after 20/9 we will need to check the configuration of the help servers
    • Implement change to kvmtool to allow KVMs to be marked as disabled
    • Look at Stephen's 'Thoughts on shell components'
    • Upgrade gaivota to 7.3
      • remember to replace FC card with a 25xx series card (alexandria?)
      • remember to run the libvirt_guests script manually to shut down guests (don't use systemd, as it will time out because of how long the script takes)
      • remember to have a page where units can sign off that they understand the server is being upgraded, listing machines and whether they will be suspended, shut down or migrated
    • Look at MPUActivitiesList
    • Start looking at https and computing.help (remove assumption that https means want cosign login)
    • Look at virtualbox X problem that Chris has
    • Check that tartarus clientreport reports on all that the old clientreport did (eg updaterpms success)
    • Chase Alison about LCFG check monitoring (start doing again)
    • Look at RT

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Virtual DICE
    • Look at RT

  • Stephen
    • LCFG client refactor stage 2
      • testing and documentation
    • LCFG server symlink to exam branches - produce reporting script and discuss with Graham
    • Submit polkit bug to Red Hat, with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take the issue of disabling per-user journald logs on certain servers to OPS
    • Schedule jubilee downtime to move to SOL
    • Consider PD work for after LCFG client ...
    • Look at MPUActivitiesList
    • Fix emacs
    • On metropolitan, find the fastest baud rate at which we can drive the real physical consoles. (This is so we can decide whether to use physical consoles for KVM servers.)
    • Look at RT

-- AlastairScobie - 27 Sep 2017

Topic revision: r6 - 24 Sep 2019 - 13:50:24 - AlastairScobie