MPU Meeting Tuesday 2nd December 2014

Virtual DICE

The final boxes were ticked during the meeting so this is now finished.

Systemd

Nothing happened.

SL7

Chris has started building all the Perl modules for the sleep component. He had a few problems with the latest versions of some packages from koji. He will send Stephen the details so we can avoid having to go back to the older versions.

A lot of work has been done on getting lightdm to work in place of gdm. There is now an LCFG component and all the necessary headers have been moved to the lcfg layer. The PAM configuration was also done; this raised an important issue with the gdm PAM configuration which has now been fixed.

Alastair has discovered there are some serious problems with using lightdm. Firstly, it does not support locking the desktop. This used to be done by the screensaver but is now the job of the login manager. Ubuntu uses xlockmore but this is not currently available in epel. There is also an extension for lightdm to do the locking but it only works with newer versions. The second issue is that lightdm does not handle power management so, for example, we will not get screen blanking. Does this mean we have to revert to gdm? This could become a greater issue during the lifetime of EL7: can we sustain effort on making lightdm work? Having said that, lightdm works fine with Systemd (via the pam_systemd module) and supports all the features we used in kdm on SL6.

Alastair has documented the dconf component and shipped the package. It still lacks support for locking mandatory options (which we did use in gconf).

Toby's problem with console interaction from LCFG components at boot time has been solved. It is documented in the Systemd cookbook on the LCFG wiki.

Kenny MacDonald has noted a problem with updaterpms at boot time and the initial output (typically about 15 package installs) being buffered. Stephen noted this is particularly confusing when you reboot for a kernel upgrade and get no output for several minutes until the upgrade has finished and the automatic reboot occurs. We need to ensure users don't think their machine has hung.

Stephen fixed #802, a problem with the grub2 component template which prevented use of the linux16/initrd16 option.

Stephen has been focussing on adding support for EL7 to the various package list management tools. This has required changes to lcfg-pkgtools and the associated LCFG::PkgTools perl modules. The pkglist-tools scripts can now retrieve information via http as well as from the local filesystem. Rather than searching for RPM files the tools now use the rpmlist files for efficiency. In the process he took the time to fix issues with the regular expression which is used to parse the LCFG package specifications and this has been documented on the LCFG wiki.
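As a rough illustration of why parsing package specifications needs care, here is a minimal Python sketch. It assumes a simplified specification of the form [prefix]name-version-release[/arch][:flags]; the prefix character set, the helper name parse_spec and the example package strings are assumptions for illustration only, and the expression documented on the LCFG wiki remains the authoritative one. The main difficulty shown is that package names may contain hyphens, so the version and release have to be taken as the last two hyphen-separated fields.

```python
import re

# Assumed, simplified grammar: [prefix]name-version-release[/arch][:flags]
# This is NOT the expression used by LCFG::PkgTools, only an illustration.
SPEC_RE = re.compile(
    r"""
    ^(?P<prefix>[+?-])?          # optional prefix character (assumed set)
    (?P<name>[^/:]+)             # name, which may itself contain hyphens
    -(?P<version>[^-/:]+)        # version: must not contain a hyphen
    -(?P<release>[^-/:]+)        # release: must not contain a hyphen
    (?:/(?P<arch>[^/:]+))?       # optional /arch
    (?::(?P<flags>[A-Za-z]+))?   # optional :flags
    $
    """,
    re.VERBOSE,
)


def parse_spec(spec):
    """Return a dict of fields for a package spec, or None if it does not match."""
    match = SPEC_RE.match(spec)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical example specification
    print(parse_spec("perl-AFS-Command-1.99-1/x86_64:b"))
```

Because the name pattern is greedy and the version and release patterns exclude hyphens, the regex engine backtracks until exactly two hyphen-separated fields remain after the name.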

Work has also been done on the refreshpkgs tools. Firstly this needed to be updated to use the Perl AFS::Command module for all functions. Secondly it needed a lot of code from the various scripts to be merged so that it is easier to modify the behaviour. For EL7 we want to alter the paths to the local buckets to incorporate the architecture but that can only be done when there is a single function which can work that out for all scripts. Stephen also noted the long-standing issue with the scripts occasionally missing new packages which are submitted whilst a volume release is in progress. He has some ideas on how to avoid that problem entirely.

Stephen and Graham had a chat about the DICE package lists for EL7. They also spent a while discussing the yummy tool. This can do a lot of what we want but for it to be truly useful we will need to add some enhancements. The aim is to fully automate the management of our package lists. For example, this would allow us to easily keep up-to-date with epel packages. MPU will look at providing a server where COs can run a tool which wraps up all the necessary stages for creating a package list from a yummy template. This would include ensuring there is an appropriate installroot available (maybe update this each night?). We might be able to do this on the current buildhosts if they had a bit more disk space.

Kenny MacDonald reported a problem with logrotate failures due to the /var/lcfg/log directory being group writable and the group being lcfg rather than root. This is a new "feature" of logrotate. There is a configuration option to change the behaviour and Kenny is going to try to come up with a patch. He spotted this in email from cron, which raised the issue that we should be checking rootmail and syslog from EL7 machines to see if there are any other problems.
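To help find other directories which would trip the same check, the condition can be sketched in Python as below. This is only an approximation of the rule as logrotate reports it (parent directory world writable, or writable by a group which is not root), not logrotate's own code; the function names are hypothetical. The configuration option referred to above is presumably logrotate's "su" directive, but that is an assumption until Kenny's patch confirms it.

```python
import os
import stat


def is_insecure_for_logrotate(mode, gid):
    """Approximate logrotate's parent-directory check.

    True if the mode is world writable, or group writable while the
    owning group is not root (gid 0).
    """
    if mode & stat.S_IWOTH:
        return True
    return bool(mode & stat.S_IWGRP) and gid != 0


def check_directory(path):
    """Apply the check to a real directory on disk."""
    st = os.stat(path)
    return is_insecure_for_logrotate(st.st_mode, st.st_gid)


if __name__ == "__main__":
    # /var/lcfg/log is the directory mentioned in the minutes
    path = "/var/lcfg/log"
    if os.path.isdir(path):
        print(path, "insecure:", check_directory(path))
```

Keeping the mode and gid test separate from the os.stat call makes it easy to run the same check over a list of directories gathered from rootmail or syslog reports.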

Miscellaneous Devel

Operational

nomachine NX client
Chris has documented how to use the nomachine NX client with our NX service.

drupal
Alastair looked at why it was not possible to upload files to computing.help. This turned out to be an issue with the deny frame option which had been enabled as part of the "sensible" apache configuration. It appears that mod_security does work and it is running on the backup/test sites. This needs packaging up properly and an lcfg header adding. The logging has been changed to just use syslog rather than the DB, which should help avoid filling the local disk again. Alastair has found out how to "vacuum" the DB to reclaim space and this has been documented.

bakerloo
Chris has decommissioned this machine but it is currently still in the rack because there is no space in B.03 until the next uplift has occurred.

jubilee
We need to get the spare disks added as a new storage pool.

waterloo
We need to install the new memory. Once that is done could we move some VMs from the Forum to AT? Chris will look at what MPU have that could be easily moved and doesn't require migration whenever there is downtime.

hare and wildcat hwmon
We have long-term acked nagios errors for hwmon on these two machines. Can we do something about this?

northern
The new disks have been ordered, see RT#69749 for details.

openafs
1.6.10 has been rolled out to develop machines. Stephen has started looking at building the latest 1.6.11pre1 release.

Intro to AFS talk
Stephen gave his usual Intro to AFS talk for new staff and postgrads. There wasn't a huge turnout but he got some good feedback from the attendees.

LCFG Annual Review
Stephen has been preparing for the LCFG annual review. He chatted about some of this at the Technical Meeting.

Last stable release
The final stable release of 2014 will be on Thursday 18th December as we have exams on the Wednesday afternoon.

This Week

  • Alastair
    • systemd project
      • Consider how components will work with systemd
      • Continue work on documentation - guidance for other COs on how to use
    • EL7 project
      • what sort of level of space is required by systemd journald logging (for desktop /var sizing)
        • (By default journald logs to /run/log. Have to mkdir /var/log/journal to keep data). Have enabled on one machine
        • identify default retention policy. Default retention is to use up to 10% of the partition. Can use either space or time as a constraint on retention. Logs are per user + system, so users can read their own data. Each log file starts at 8MB, so a popular machine will have lots of log data.
        • Blog about journald retention policy - and document how to set...
        • Blog about decision to keep journald and /var/lcfg/log/syslog duplication - and resulting configuration change.
      • check installroot stuff same version across SL6 and EL7
        • and pull out old SL5 stuff
      • Look at whether we need anything better than existing network component for desktops
      • Look at lightdm issues
        • locking (dm-lock doesn't actually lock, just respawns greeter on different VT - can still switch back to original VT)
        • power management
      • Blog about dconf and lightdm components
      • Look at LCFG bug #799 (systemd buffered output)
    • RT 65774 - try two identical monitors on my machine
    • Need to remove default bridge from kvmtool create
    • Think about disk partition policy
    • Review last reviewed date for documentation
    • Consider more cores as default for KVM guests
    • Is there a way of disabling debugging information being displayed by drupal when there are problems? Can't see how to do this safely (needs backtrace disabled in /etc/php.ini?) - Ask David Marsh in Physics?
    • Read LISA notes
    • Look at KVM server loading
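
For the desktop /var sizing question under the EL7 project above, a back-of-envelope calculation can be sketched in Python. It uses only the figures noted in the minutes (up to 10% of the partition, log files starting at 8MB) and ignores any other limits journald may apply; the function names and the 20GB example partition are assumptions for illustration.

```python
import shutil

JOURNAL_FILE_START_BYTES = 8 * 1024 * 1024  # 8MB initial file size, from the minutes


def journal_budget_bytes(partition_bytes, percent=10):
    """Space journald may use, assuming the 10% default noted in the minutes."""
    return partition_bytes * percent // 100


def journal_files_in_budget(partition_bytes, percent=10):
    """How many minimum-size (8MB) journal files fit within that budget."""
    return journal_budget_bytes(partition_bytes, percent) // JOURNAL_FILE_START_BYTES


def budget_for_path(path="/var"):
    """Apply the estimate to the filesystem holding `path` on this machine."""
    return journal_budget_bytes(shutil.disk_usage(path).total)


if __name__ == "__main__":
    size = 20 * 1024 ** 3  # hypothetical 20GB /var partition
    print(journal_budget_bytes(size), journal_files_in_budget(size))
```

Since logs are kept per user as well as for the system, the number of minimum-size files gives a rough feel for how many active users a given /var size could hold logs for before old data is rotated out.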

  • Chris
    • Virtual DICE
      • Put snapshot in a new shared area made for the purpose
    • EL7
      • Sleep component
    • url shortener (once gdm solved)
    • Create Project entries - for KVM refinement project
    • Experiment rename br0 as br33 on metropolitan
    • Think about disk partition policy
    • Review last reviewed date for documentation
    • Commission jubilee's extra disks as an extra pool
    • Identify VMs to move to waterloo
    • Investigate monitoring hare and wildcat problem

  • Stephen
    • LCFG client refactor stage 1
      • schedule debrief meeting
    • EL7
      • Continue thinking about boot.run functionality
      • Complete porting MPU managed resources to the DICE level
      • Sort out mock configuration
      • Unify scripts re package paths (freshenrpms etc)
    • Think about PD - Interested in ZeroMQ
    • Deploy northern as staff.nx (first open up holes and test from home)
    • Think about disk partition policy
    • Review last reviewed date for documentation
    • Create LCFG level apache mod_security header
    • Add extra memory to waterloo (and if those work, order up more memory for hammersmith)

-- AlastairScobie - 02 Dec 2014


Topic revision: r6 - 08 Dec 2014 - 12:42:27 - StephenQuinney
 