MPU Meeting Tuesday 17 January 2017

Inventory

No activity.

MPU SL7

computing.help.inf.ed.ac.uk
Alastair has an SL7 server (lagun) just about ready to go. He has also upgraded Drupal on all of the computing.help servers.
wake.inf.ed.ac.uk
is on SL7. (However the backend is still on SL6. Chris will check its functionality on SL7.) This resolves SL7RT:47.
bugs.lcfg.org
Ongoing. The packages and package list for Bugzilla 5 have been made.
LCFG master
Ongoing.
  • Stephen has been writing notes for this upgrade at MPUpgradingMasterLCFGServerSL7.
  • He's been doing detailed testing too. This revealed that the access controls were not correct for committing changes to the restricted-access header files in the LCFG subversion repository. Given the configuration that was in place, in theory this should never have worked, but somehow it did anyway on SL6. To get it working on SL7 Stephen removed truly anonymous access to the repository - we didn't need that anyway. This has reinforced our view that Thorough Testing Is A Very Good Idea.
  • Before the LCFG master can be upgraded, ordershost has to move elsewhere. (We did this for the SL6 upgrade too.) Alastair will move it to its new SL7 host nerano.
Packages mirror server
It's now running SL7. This resolves SL7RT:6.
Log Cabin
now running SL7. This resolves SL7RT:57. Log Cabin is temporarily using local certificates until we get the proper ones restored.
Loghost issue
The new loghost has an IPv6 address, so clients now prefer to talk to it over IPv6. Our access controls therefore need to cover IPv6 as well as IPv4.
Inf level login host for computing staff
we need one which runs SL7. We forgot to add this to our upgrade plan.
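
A rough illustration of the loghost issue noted above: an access check written only for IPv4 will reject (or never match) clients that connect over IPv6, so the allow-list needs a prefix for each address family. The sketch below is not our actual loghost configuration. The function name is_allowed and the two prefixes are hypothetical, and the prefixes are taken from the reserved documentation ranges rather than from any real network.

```python
import ipaddress

# Hypothetical allow-list, for illustration only. Both prefixes come from
# the ranges reserved for documentation, not from a real site network.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),   # IPv4 clients
    ipaddress.ip_network("2001:db8::/32"),  # IPv6 clients
]


def is_allowed(client: str) -> bool:
    """Return True if the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(client)

    # A dual-stack listener may report IPv4 clients as IPv4-mapped IPv6
    # addresses (::ffff:a.b.c.d); compare those against the IPv4 rules.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped

    # Only compare an address with networks of the same family.
    return any(
        addr.version == net.version and addr in net
        for net in ALLOWED_NETWORKS
    )


if __name__ == "__main__":
    for client in ("192.0.2.10", "2001:db8::1", "198.51.100.5"):
        print(client, is_allowed(client))
```

With only the IPv4 entry in the list, the second address would be refused even though it belongs to the same (hypothetical) site, which is the kind of gap the new loghost exposed.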

Miscellaneous development

Stephen has made a start on LCFG support for SL 7.3 by making the lcfg-level package lists. The rest (lcfg level headers, dice level) will be done later. He found that yummy now crashes on certain package lists (Bug:986).

Operational

  • The recent Appleton Tower server room power-down (see e.g. the MPU power-down To Do list) went without major incident.
  • Alastair has fixed the LVM configuration and made the LVM component happy on azul.
  • LVM on gaivota is still broken, so Chris will schedule downtime for it.
  • Stephen updated VirtualBox to 5.1.12 and OpenAFS to 1.6.20.1, so DICE is up to date with both.
  • Stephen checked SL7 DICE functionality on a couple of server models (R715 and R815) for Richard. We only have one of each of these, so we had been unable to check them before this. This resolves tickets SL7RT:268 and SL7RT:270.
  • The local packages master server bruegel had been reinstalled with a tiny AFS cache by mistake. Stephen reinstalled it with a far larger cache, and AFS is currently using 25GB of it!
  • There's been some fallout from the recent large batch of SL7 updates (see for instance Kenny MacDonald's list):
    • The major update to NetworkManager brought configuration which made /home read-only. Our autofs config replaces the /home directory with a symlink, so this new behaviour was inappropriate for us. See Bugzilla:3273.
    • Also thanks to the updated NetworkManager, some GNOME applications started wrongly deciding that the machine was not online. Stephen's post to the lcfg-discuss list summarises the problem and a solution.
    • The Dell firmware updater dsu has stopped leaving temporary partitions dangling, but some of the firmware updaters no longer work properly - see Chris's post to the linux-poweredge list.
  • One of (KVM server) vermelha's data disks warned of imminent failure, so was replaced. See RT:80898.
  • The staff NX service will shortly move to newer and better hardware in the shape of former KVM server jubilee. The main NX service will then move to former KVM server hammersmith. The moves will give each service a 50% increase in CPU cores and quadruple the memory. We therefore plan to double the per-process memory limit on the general NX server and abolish it altogether on the staff NX server.

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Continue work on RESTful API - TartarusRESTAPI
      • Document REST API
      • Further encourage people to use API and ii commands
      • Write more of the ii commands and document them as they are written.
      • Speak to George about macaddr/space feed
      • Start work on final report!
      • Chase Tim about theon access credentials for feed
      • Convert from mod-auth_kerb to mod-auth_gssapi (See Stephen for details)
      • How to represent VMs
    • Deploy encrypted /tmp and swap conversion script
      • Do during Festival of Creative Learning week (w.b. 20th Feb)
      • Need to warn users that GNOME 3 may pop up a window about /tmp being full (when script is run)
    • Schedule MPU meeting to discuss systemd ordering
    • Test new staff.nx server (jubilee)
    • submit polkit bug to redhat - with Stephen (check with 7.3)
    • MPU SL7
      • Chase Toby again about testing latest perl-Moose under prometheus (and then make live) after October 1
        • Toby reckons now fine - will update immediately after Xmas
      • Upgrade computing.help servers
        • Kill off hjaelpe and brent (now powered off)
        • Make 'lagun' live.
        • Look at rootmail to check for apacheconf.drupal problems
          • Problem is on lagun - for some reason rsyslogd isn't listening on various facilities. Also, journalctl isn't logging anything - for some reason /var/log/journal doesn't exist
        • Remember proper certs for computing.help master
      • Move 'ordershost' to 'nerano' ASAP (blocking LCFG master upgrade)
    • Check sysmans (et al) have 'nograce'.
    • Take a look at RT #78875
    • Look at RT and SL7RT

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • MPU SL7
      • Continue with bugzilla
      • Look at wake backend (running on Inf servers)
    • Roll out fixed sleep code
    • Reschedule MPU futures meeting
    • Update PackagesSiteMirror
    • Schedule gaivota downtime to investigate LVM/IBM VG issue
    • Look at RT and SL7RT

  • Stephen
    • LCFG client refactor stage 1
      • schedule debrief meeting
    • LCFG client refactor stage 2
      • testing and documentation
      • blog article (once documentation complete)
    • LCFG server symlink to exam branches - produce reporting script and discuss with Graham
    • submit polkit bug to redhat - with Alastair (check under 7.3)
    • SL7 MPU
      • Schedule LCFG master server
    • SL7.3
    • Investigate George's multiple network interfaces SL7 issue (eg consoles server)
      • waiting on George breaking metropolitan
    • LCFG annual review - produce minutes
    • Replace piccadilly with hammersmith (NX service)
    • Look at RT and SL7RT

-- AlastairScobie - 17 Jan 2017

Topic revision: r8 - 24 Sep 2019 - 13:50:24 - AlastairScobie
 