MPU Meeting Tuesday 15th December 2015

Inventory

To process incoming email reports from suppliers - containing lists of MAC addresses, serial numbers and suchlike - Alastair has created a new prototype script, krbcurl, on the mail server. It reads a mail message on stdin and sends it, using curl, over a Kerberos-authenticated connection to an HTTPS CGI script.
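The stdin-to-CGI flow can be sketched roughly as follows. This is an illustration only, not Alastair's script: the endpoint URL and the function names are hypothetical, and it assumes a valid Kerberos ticket is already available to the calling user. The curl options shown (--negotiate with -u : for SPNEGO authentication, --data-binary @- to post stdin unchanged) are standard curl options.

```python
import subprocess

# Hypothetical endpoint; the real CGI URL is not given in these minutes.
CGI_URL = "https://inventory.example.org/cgi-bin/mailreport"


def build_curl_command(url):
    """Return the curl argv that posts stdin to url using Kerberos (SPNEGO)."""
    return [
        "curl",
        "--silent", "--show-error", "--fail",  # quiet, but report HTTP errors
        "--negotiate", "-u", ":",              # authenticate with the current Kerberos ticket
        "--header", "Content-Type: message/rfc822",
        "--data-binary", "@-",                 # send stdin byte-for-byte
        url,
    ]


def forward_message(message, url=CGI_URL):
    """Pipe one raw mail message (bytes) into curl and return curl's exit status."""
    result = subprocess.run(build_curl_command(url), input=message)
    return result.returncode
```

A mail alias could then pipe each message into a small wrapper that calls forward_message(sys.stdin.buffer.read()) and exits with the returned status.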

He has also stopped clientreport from trying to check monitor details on servers (by removing the relevant module). This has cut down the number of clientreport errors.

LCFG Client Refactoring

Stephen has been pulling the context handling into the C library, so that contexts can be evaluated as soon as they are encountered. The code which sets contexts is complete, as is the code for handling simple context checks. More sophisticated context expressions - multiple context checks linked with "and" or "or" - are not yet there. Stephen may look at yacc and lex to help with this.
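To illustrate what evaluating such compound expressions involves, here is a small recursive-descent sketch in Python. It is not the LCFG implementation (which is in C), and the syntax is an assumption made for the example: checks are written name=value, joined by the keywords "and" and "or", with "and" binding more tightly. Parentheses and negation are not handled.

```python
import re

# Tokens: 'name=value' checks and the keywords 'and' / 'or' (hypothetical syntax).
TOKEN = re.compile(r"\s*(\w+=\w+|and\b|or\b)")


def tokenize(expr):
    """Split an expression into check and keyword tokens."""
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected text at position {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr, contexts):
    """Evaluate expr against a dict mapping context names to current values."""
    tokens = tokenize(expr)
    pos = 0

    def check():
        nonlocal pos
        if pos >= len(tokens) or "=" not in tokens[pos]:
            raise ValueError("expected a name=value check")
        name, value = tokens[pos].split("=", 1)
        pos += 1
        return contexts.get(name) == value

    def conjunction():
        nonlocal pos
        result = check()
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            rhs = check()
            result = result and rhs
        return result

    def disjunction():
        nonlocal pos
        result = conjunction()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            rhs = conjunction()
            result = result or rhs
        return result

    result = disjunction()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

A generated parser (yacc and lex) would replace the hand-written functions here, but the precedence structure would be the same: one rule per operator level.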

SL7 Server Base

Chris got SL7 up and running on sauce, our test DL180. We found on sauce and on metropolitan that we couldn't bond over any interfaces other than NIC1 and NIC2. Also, we need to fix the Nagios bond check. It currently asks the bonding code if the bonds are happy. Even if there is somehow only one bonded interface, this check succeeds! The check should really count the number of bonded interfaces and flag up a fault if there are fewer than two of them.
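The counting check proposed above might look something like this. It is a sketch, not the existing Nagios plugin: the function names are hypothetical, and it assumes the status text has the layout the Linux bonding driver writes under /proc/net/bonding/, where each "Slave Interface:" line is followed by that slave's "MII Status:" line.

```python
def active_slaves(bonding_status):
    """Return the names of slave interfaces whose MII status is 'up'."""
    slaves = []
    current = None  # slave section being read; None while in the bond-level header
    for line in bonding_status.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            if value == "up":
                slaves.append(current)
            current = None  # ignore any further status lines until the next slave
    return slaves


def check_bond(name, bonding_status, minimum=2):
    """Return a (Nagios exit code, message) pair for one bond."""
    up = active_slaves(bonding_status)
    if len(up) < minimum:
        return 2, f"CRITICAL: {name} has {len(up)} active slave(s), need {minimum}"
    return 0, f"OK: {name} has {len(up)} active slaves ({', '.join(up)})"
```

The bond-level "MII Status:" line in the header is deliberately skipped, since that is the status which can read "up" with only one working slave.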

Stephen has been collecting ideas for a rewrite of the LCFG apacheconf component for SL7. The ideas so far are listed at LCFG:ApacheConfIdeas. Please read and contribute. This will be done as a separate project.

Alastair intends to get local home dirs working using pam-mkhomedir instead of the old automounter-based method. The future of the LCFG localhome component is therefore in question.

Misc Development

Bug:922 details a problem in the ngeneric component. Bad things could happen when two or more components tried to create the LCFG lock dir simultaneously. The problem is fixed in lcfg-ngeneric 1.15.6, which is now in stable.
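These minutes do not describe the fix itself. As a generic illustration of how this kind of race is commonly avoided, the sketch below creates a directory without a separate existence check, so that losing the race to another process is harmless. The function name is hypothetical, and this is Python rather than the component's own code.

```python
import os


def ensure_lock_dir(path, mode=0o755):
    """Create path if needed; return True if this call created it.

    A check-then-create sequence (os.path.isdir followed by os.mkdir) can
    fail when another process creates the directory in between. Calling
    mkdir directly and tolerating "already exists" closes that window.
    """
    try:
        os.mkdir(path, mode)
        return True
    except FileExistsError:
        if not os.path.isdir(path):
            raise  # something that is not a directory is in the way
        return False
```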

Alastair has changed the updaterpms check script. It used to include all failures, then was scaled back to list only failures from the last 7 days. It now lists only failures from the last 2 days. However, Alastair is now investigating a problem whereby the script does not generate output when run from cron.

Operational

Chris is working on additions to the NX help pages which will explain how to log out and how to control and kill suspended sessions. (Now created at Help:nx-sessions.)

The devproj VM ran out of disk space, so Alastair made a bigger one. devproj has therefore moved from korat to birman.

We intend to host a discussion, sometime in 2016, on revising the order of the SL7 systemd startup.

Alastair is investigating Drupal modules on the help server following some interesting scan results.

We should start compressing log files, but we should use the delaycompress directive to ensure that it's done cleanly and only once the log is no longer being written to.
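In logrotate this behaviour comes from using the compress and delaycompress directives together. The Python sketch below only illustrates the effect, under simplified assumptions (numbered generations, gzip, a hypothetical rotate function); it is not how logrotate is implemented. The most recently rotated file stays uncompressed for one cycle, in case a daemon still has it open, and is compressed at the following rotation.

```python
import gzip
import os
import shutil


def rotate(log_path, keep=4):
    """Rotate log_path once, compressing with a one-cycle delay.

    After a call, log_path.1 is the previous log, still uncompressed;
    log_path.2.gz and older generations are compressed.
    """
    # Drop the oldest compressed generation, then shift the rest up by one.
    oldest = f"{log_path}.{keep}.gz"
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(keep - 1, 1, -1):
        src = f"{log_path}.{n}.gz"
        if os.path.exists(src):
            os.rename(src, f"{log_path}.{n + 1}.gz")

    # Compress the log rotated in the previous cycle; nothing writes to it now.
    previous = f"{log_path}.1"
    if os.path.exists(previous):
        with open(previous, "rb") as src, gzip.open(f"{log_path}.2.gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(previous)

    # Move the live log aside uncompressed and start a fresh one.
    if os.path.exists(log_path):
        os.rename(log_path, previous)
    open(log_path, "a").close()
```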

Stephen will look into creating an fstab partition encrypt macro.

We've rebooted most of our servers. We won't reboot the KVM servers this time round, given the lack of a really critical update.

Chris will upgrade subversion on the NX servers to 1.7.

Chris will tackle RT:75237.

This week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • consider what next can be integrated into existing system, if anything
      • Check for systemic errors from clientreport
        • Look now that servers don't check monitors
      • Document clientreport
      • Document order sync code
      • Continue work on hpreport processing script
    • @home - look at using rsync from site.pkgs instead of mirroring from upstream
    • Remove default pool if ops meeting agrees
    • Experiment with different window managers under VNC (making the assumption that performance under NX will be similar)
    • Think of a use for 'atom'
    • Understand how NetworkManager works wrt init scripts
    • Deploy encrypted /tmp and swap
      • Continue work on script to modify existing machines
        • modify to be an installroot script
        • modify to wipe swap and /tmp
    • SL7 base server
      • Localhome functionality - use pam-mkhomedir instead?
      • check metropolitan USB and CD
      • Continue work with FC and LVM
        • investigate interaction between multipath and UDEV
        • check nagios notices if FC cable removed
      • Fix the bonding nagios script to scream if fewer than 2 slaves active for each bonded group
      • Understand how the network interface naming works and how we're going to bond over eth0 and eth2
    • Why is the updaterpms check script returning 0 on the weekly test? Does it always do this when run from crontab? No, it works fine if rescheduled for 4pm
    • look at Activities List
    • Schedule MPU meeting to discuss systemd ordering
    • Build computing.help honeypot
    • Rotate drupal logfile on computing.help and devproj
    • Schedule MPU meeting to discuss activities list

  • Chris
    • Inventory project
      • continue working through TartarusWorkFlow
      • Look at clientreport modules for replacing firmwarereport
    • pkgsearch for SL7
      • reimplement as a yum web front end (yum search for keyword produce an html file of links to cgi to do yum info)
      • Needs to support multiple platforms
    • Liaise with George over iDRAC documentation (look through ops reports to remind)
    • SL7 -
      • hwmon (HP raid and H200 raid)
      • hwmon on DL180 (try otaka)
      • Finish off looking at R620 (belter)
    • Close RT tickets
    • Continue investigating SL6 sleep problem
    • look at Activities List
    • Look at USB device ownership (learn about udev) - RT75237
    • Improve documentation re suspended sessions on NX http://computing.help.inf.ed.ac.uk/nx-sessions
    • Schedule MPU stargazing meeting
    • Fixup reminders
    • Upgrade to svn 1.7 on ssh and NX servers

  • Stephen
    • LCFG client refactor stage 1
      • schedule debrief meeting
    • LCFG client refactor stage 2
      • document API
      • complete combining packages
      • blog article
    • Think about PD - Interested in ZeroMQ
    • Investigate kernel component pipe moan by using shell commands instead of RPM module => waiting on 7.2 => activities list
    • look at Activities List
    • Create project for lcfg-apacheconf rewrite
    • Need a FSTAB _PARTITION_ENCRYPT macro
    • Fixup reminders

-- AlastairScobie - 15 Dec 2015

Topic revision: r5 - 23 Sep 2019 - 13:33:38 - AlastairScobie
 