MPU Meeting Wednesday 13th September 2017

Inventory

No progress this week.

LCFG Client Refactoring

The scripts for testing the client against a large set of XML profiles are now complete. Once these have made it into the stable release, Stephen will contact Kenny about running them on MDP profiles.

Two small bugs were found in the way the rpmcfg files were generated. Profiles without any associated packages should not have a (mostly) empty rpmcfg file generated, and the list of inactive packages was not being written out correctly.

The qxprof and sxprof scripts have been moved from utils to the client package to resolve bootstrapping issues. An update for ngeneric on SL6 had to be reverted due to problems with logrotate templates in the openldap and kerberos packages. These problems have now been fixed, and another attempt will be made to upgrade ngeneric in a couple of weeks' time.

Miscellaneous Development

Alastair has looked at adding metadata to the libvirt guest XML. This appears to be possible; the open question is whether kvmtool can easily be extended to manage the data.

<metadata> <kvmtool:instance xmlns:kvmtool="http://"><enabled/></kvmtool:instance></metadata>
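As a rough illustration of how a local script could interpret this element, the Python sketch below checks a libvirt domain XML document for the kvmtool "enabled" marker. It is only a sketch: kvmtool itself is Perl and uses XML::LibXML, and the helper name and sample guest are hypothetical. The namespace URI is kept as the placeholder "http://" recorded above, since the real URI is not given. Note that <enabled/> carries no prefix, so it is in no namespace and is matched unqualified.

```python
import xml.etree.ElementTree as ET

# Placeholder: the real kvmtool namespace URI is not given in these minutes.
KVMTOOL_NS = "http://"


def kvmtool_enabled(domain_xml: str) -> bool:
    """Hypothetical helper: True if the domain XML contains
    <metadata><kvmtool:instance><enabled/></kvmtool:instance></metadata>."""
    root = ET.fromstring(domain_xml)
    # <metadata> is a direct child of <domain>; <enabled/> has no prefix,
    # so it is matched without a namespace.
    path = "./metadata/kvmtool:instance/enabled"
    return root.find(path, {"kvmtool": KVMTOOL_NS}) is not None


if __name__ == "__main__":
    guest = (
        '<domain type="kvm"><name>example-guest</name>'
        '<metadata><kvmtool:instance xmlns:kvmtool="http://">'
        "<enabled/></kvmtool:instance></metadata></domain>"
    )
    print(kvmtool_enabled(guest))  # True
    print(kvmtool_enabled('<domain type="kvm"><name>other</name></domain>'))  # False
```

In kvmtool the same lookup would be an XPath query with a registered namespace prefix.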

Chris has created a script to monitor the package volumes to ensure we don't run out of space or quota. He will add the script to the pkglist-tools package. He will also review the existing quotas for all buckets and ensure that each is no more than 50% full and has a capacity of at least 5GB.
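The quota rule above (no bucket more than 50% full, capacity of at least 5GB) reduces to a small calculation. A minimal Python sketch, assuming used and quota figures in GB have already been collected for each bucket (for example with OpenAFS `fs listquota`); the bucket names and function names are hypothetical and are not taken from pkglist-tools.

```python
from __future__ import annotations

MIN_QUOTA_GB = 5.0  # minimum capacity from the minutes
MAX_FILL = 0.5      # no bucket should be more than 50% full


def required_quota_gb(used_gb: float) -> float:
    """Smallest quota keeping usage <= 50% and capacity >= 5GB."""
    return max(MIN_QUOTA_GB, used_gb / MAX_FILL)


def buckets_needing_more_quota(
    usage: dict[str, tuple[float, float]],
) -> dict[str, float]:
    """Map each under-provisioned bucket to the quota (GB) it should have.

    `usage` maps a bucket name to (used_gb, quota_gb).
    """
    return {
        name: required_quota_gb(used)
        for name, (used, quota) in sorted(usage.items())
        if quota < required_quota_gb(used)
    }


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    sample = {
        "bucket-a": (1.0, 5.0),  # 20% full, already fine
        "bucket-b": (3.0, 5.0),  # 60% full, needs 6GB
        "bucket-c": (0.5, 2.0),  # only 25% full, but below the 5GB minimum
    }
    print(buckets_needing_more_quota(sample))  # {'bucket-b': 6.0, 'bucket-c': 5.0}
```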

Operational

amdgpu pro
Updated to 17.30. It still has problems with the latest SL7.3 updates on theia, so theia has been pinned to an older release, from the week before the release that contains all the updates. Hopefully AMD will fix the problem soon.

devtoolset
gcc 6 has been updated to version 6.3.1.

Wandering time
As things seem to be working much better, the report is now weekly.

KVM server disk upgrades
Alastair has investigated the cost of upgrading the disks in gaivota and girassol. Each server has space for 6 drives (600GB 10K), giving 1.8TB of usable space per server, at a maximum cost of 2400 per server.
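The minutes do not say how the drives would be arranged. 1.8TB usable from 6 x 600GB is consistent with mirrored pairs (RAID 10), so this small Python check assumes that layout; the function name is hypothetical.

```python
def raid10_usable_gb(drives: int, drive_gb: int) -> int:
    """Usable capacity of a RAID 10 array (assumed layout): half the raw capacity."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drives * drive_gb // 2


if __name__ == "__main__":
    usable = raid10_usable_gb(6, 600)
    print(usable)         # 1800 (GB), i.e. the 1.8TB quoted per server
    print(2400 / usable)  # maximum cost per usable GB, per server
```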

KVM server memory upgrades
Alastair also looked at the memory for azul. It has space for 16 DIMMs and currently holds 8 x 16GB. Another 8 x 16GB (DDR4-2133 dual-rank RDIMM, 1.2V) would cost at most 3000 from Dell or 2000 from Crucial.
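A quick Python check of the azul figures: filling the remaining 8 of the 16 slots with 16GB DIMMs doubles the memory from 128GB to the 256GB target in the action list below, and the two maximum quotes work out per DIMM as shown. The variable names are illustrative only.

```python
DIMM_GB = 16
SLOTS = 16
INSTALLED = 8
QUOTES = {"Dell": 3000, "Crucial": 2000}  # maximum quotes for the extra DIMMs

current_gb = INSTALLED * DIMM_GB  # memory fitted today
upgraded_gb = SLOTS * DIMM_GB     # memory with every slot filled
extra = SLOTS - INSTALLED         # number of DIMMs to buy
per_dimm = {vendor: price / extra for vendor, price in QUOTES.items()}

print(current_gb, upgraded_gb)  # 128 256
print(per_dimm)                 # {'Dell': 375.0, 'Crucial': 250.0}
```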

girassol
This appears to be stable, so all guests have been migrated back.

KVM server downtime
We now have a ScheduledKVMDowntime wiki page to manage all KVM server downtime.

Extra virtual CPUs
Chris has looked at the issue with adding extra virtual CPUs and has updated the FAQ to note that it appears to require booting the VM twice for the change to be applied.

computing.help
Kernel module loading is now disabled on these machines.

systemd component
There is an odd bug where rsync fails whilst copying the new config and the system ends up in a reboot loop; Alastair is investigating.

non-https web sites
We need to review the MPU web sites to find those that have forms but are not served over https, as these will soon have problems with Chrome.
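One way to start the review is to flag pages that are served over plain http and contain a <form>, which is the criterion given above. The Python sketch below does this with the standard library's html.parser, assuming the page HTML has already been fetched (fetching, and the actual list of MPU sites, are left out); the class, function and example URLs are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urlparse


class _FormDetector(HTMLParser):
    """Records whether a <form> start tag appears in the fed HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.has_form = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "form":
            self.has_form = True


def insecure_form_pages(pages: dict[str, str]) -> list[str]:
    """Return the URLs served over plain http whose HTML contains a <form>."""
    flagged = []
    for url, html in pages.items():
        if urlparse(url).scheme != "http":
            continue
        detector = _FormDetector()
        detector.feed(html)
        detector.close()
        if detector.has_form:
            flagged.append(url)
    return sorted(flagged)


if __name__ == "__main__":
    sample = {
        "http://a.example/": "<html><FORM action='x'></FORM></html>",
        "https://b.example/": "<form></form>",
        "http://c.example/": "<p>no forms here</p>",
    }
    print(insecure_form_pages(sample))  # ['http://a.example/']
```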

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Continue work on RESTful API - TartarusRESTAPI
      • Document REST API
      • Write more of the ii commands and document them while writing them.
      • Start work on final report!
      • How to represent VMs
      • Continue with REST API testing framework
      • Consider what else needs to be done other than docs, tidying and backups
      • Blog something (e.g. based on the dev meeting talks)
    • Deploy encrypted /tmp and swap conversion script
      • Need to warn users that Gnome3 may pop up a window about /tmp being full (when script is run)
      • Now down to 5 desktops, 3 users, 2 COs
    • Schedule MPU meeting to discuss systemd ordering
    • Check that sysmans (et al.) have 'nograce'.
    • Take a look at RT #78875
    • Look at /etc/hosts DNS issue (IPv6?)
      • work out what we need to fix the current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Deploy disable-module header on all computing.help servers
      • Added to computing-help-server header, but after 20/9 will need to check config of help servers
    • Is there a route via libvirt to mark a VM as being disabled?
      • Is it possible to add additional fields into the XML file which a local script could interpret?
        • Yes, via the metadata element:
          <metadata> <kvmtool:instance xmlns:kvmtool="http://"><enabled/></kvmtool:instance></metadata>
        • Now check whether kvmtool can easily add this data
          • Should be as kvmtool just uses XML::LibXML
    • Look at Stephen's 'Thoughts on shell components'
    • Upgrade gaivota to 7.3
      • remember to replace FC card with a 25xx series card (alexandria?)
      • remember to run the libvirt_guests script manually to shut down guests (don't use systemd, as it will time out because the script takes so long)
      • remember to have a page where units can sign off that they understand the server is being upgraded, listing the machines and whether each will be suspended, shut down or migrated
    • Buy more memory for azul (upgrade to 256GB)
      • Space for 16 DIMMs. Currently 8 x 16GB. Another 8 x 16GB (DDR4-2133 dual rank RDIMM 1.2V) max 3000 from Dell, 2000 from Crucial
    • Look at buying more disks for gaivota and girassol
      • Space for 6 drives (600GB 10K) giving 1.8TB usable space (each server). Cost max 2400 each server
    • Look at MPUActivitiesList
    • Fix lcfg-systemd to be more tolerant of rsync failures (file an LCFG bug). Also look for any other LCFG bugs that could be fixed at the same time, and produce more informative errors (e.g. from rsync), using Error() rather than Fail().

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Produce script to monitor package volume usage
      • run on deneb, using contents of /etc/buckets.conf to find which AFS volumes to check
      • package the script
      • Review the existing quotas
    • Look at MPUActivitiesList
    • Review MPU web services (particularly with regard to the upcoming Chrome https problem)

  • Stephen
    • LCFG client refactor stage 2
      • testing and documentation
    • LCFG server symlink to exam branches - produce reporting script and discuss with Graham
    • Submit polkit bug to Red Hat with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • RT actions (as per agreed list) once 7.3 fully deployed
    • Take the issue of disabling per-user journald logs on certain servers to OPS
    • Schedule jubilee downtime to move to SOL
    • Consider PD work for after the LCFG client, e.g. rewriting pkgsubmit in Python
    • File a bug against lcfg-systemd for spurious warnings about missing targets at first boot (bug #1009)
    • Upgrade waterloo (scheduled for 20th September) and oyster to 7.3 (scheduled for 26th September)
    • Look at MPUActivitiesList

-- AlastairScobie - 13 Sep 2017

Topic revision: r13 - 24 Sep 2019 - 13:50:24 - AlastairScobie
 