MPU Meeting Tuesday 30th September

Virtual DICE

Chris produced new releases for SL6 32-bit and 64-bit. However, they haven't yet replaced the existing releases because, once the image files were copied to AFS, the md5sum checksums generated for the new image files were inconsistent: running md5sum on crustan against the same image file in /afs/inf.ed.ac.uk/group/mp-unit/virtual_dice/images produces a variety of checksums. Chris will generate the checksums from another machine and will look into the crustan problem further (RT 68813). We're concerned about the apparent failure of AFS to cope with this network problem.
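The symptom described above (the same file giving different checksums on different runs) can be checked for with a small script. This is only a sketch: the function names are made up for illustration, and it is not the procedure Chris used. Note that repeated reads on one machine may be served from a local cache, so a stable result here does not prove the copy in AFS is good.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to cope with large images."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_checksums(path, runs=5):
    """Hash the same file several times and return the set of distinct digests seen."""
    return {md5_of_file(path) for _ in range(runs)}
```

If `repeated_checksums(path)` returns a set with more than one element, the reads are unstable on that machine, which is the behaviour reported on crustan.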

systemd

The systemd component now has support for stopping a service if it is removed at configure time. Alastair also substantially updated the EL7/systemd status of LCFG components table. This week he will start producing documentation.

SL7 LCFG port

Alastair generated a graph of dependencies for LCFG components on EL7.

Alastair found that journald only logs to /run by default. However, if the directory /var/log/journal exists, it will log there instead. While looking at this he noticed that the inf level was not logging to our network log host, and fixed the problem. We will need to look at journald log size and figure out a suitable retention policy.
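As a rough aid for the sizing question, the sketch below mirrors the two behaviours noted in these minutes: journald writes under /var/log/journal only if that directory exists (otherwise under /run/log), and the default retention uses up to 10% of the partition. The function names and the `fraction` parameter are illustrative assumptions, not part of journald itself, and any limits set in the local journald configuration are ignored.

```python
import os
import shutil


def journal_location(persistent_dir="/var/log/journal",
                     volatile_dir="/run/log/journal"):
    """Return the directory journald would be expected to write to by default."""
    # Persistent storage is only used when the directory already exists.
    return persistent_dir if os.path.isdir(persistent_dir) else volatile_dir


def default_journal_cap_bytes(path, fraction=0.10):
    """Estimate the default journal size cap as a fraction of the filesystem holding path."""
    total = shutil.disk_usage(path).total
    return int(total * fraction)
```

For desktop /var sizing, `default_journal_cap_bytes("/var")` gives an upper bound on what the journal would take under the default policy.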

Stephen finished his desktop disk usage analysis.

Chris got GDM working. The final piece of the jigsaw was that pam_systemd was missing from the EL7 PAM files. Stephen will add it to our system-auth PAM file. We will now try configuring GDM, which is now done using dconf.
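A quick check for the missing module could look like the sketch below. It takes the text of a PAM file and reports whether any session line references pam_systemd.so. The function name is hypothetical, and the sample line in the usage note reflects the form commonly shipped in stock system-auth files, not necessarily what our generated file will contain.

```python
def has_pam_systemd(pam_text):
    """Return True if any session line in the PAM config text uses pam_systemd.so."""
    for raw_line in pam_text.splitlines():
        # Drop comments and surrounding whitespace
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # A leading "-" on the type tells PAM to skip quietly if the module is absent
        if fields[0].lstrip("-") != "session":
            continue
        for field in fields[1:]:
            if field == "pam_systemd.so" or field.endswith("/pam_systemd.so"):
                return True
    return False
```

For example, `has_pam_systemd("-session optional pam_systemd.so")` is true, while a file containing only auth and account lines gives false.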

Miscellaneous Development

Stephen has rewritten the cyrussasl component in Perl. It now uses Perl Template Toolkit templates.

Operational

Prompted by Ian's problems building conserver packages, Stephen discovered that the mock buildroots on PackageForge still needed to be updated to use SL6.5. They've now been updated.

Stephen has removed the last F13 and F20 packages, and has finished preparing the SL5 packages for archiving. Metadata and duplicate files have been entirely removed, as have Maple and Matlab. The remainder amounts to about 200GB which should fit onto one of our tapes.

Thanks to Stephen and Neil, northern and piccadilly are now in the Forum. northern is racked, upgraded and ready to become staff.nx.inf.ed.ac.uk. Its 600GB disks will go into jubilee to provide extra space for VMs. piccadilly will be installed in AT and will take over nx.inf.ed.ac.uk duties.

Quite a lot of effort went into ShellShock mitigation this week. At some point we should spend time looking at mod_security in more detail.

Chris updated the install profiles on DIY DICE to SL6.5.

Alastair found that the netgroup we use to control access to the staff NX server could not be used to set the memory limits, as a Unix group has to be used - so members of the staff group will continue to get a higher memory limit on the NX servers.

The outgoing NX servers bakerloo and central are power-hungry, slow, old and generally outdated, and will be scrapped.

Chris added a br202 network bridge to circle, then tried powering off a test VM, switching it from br0 to br202, and powering it on again. The test VM did not get a network connection after the switch. He'll try creating a VM using br202 from the start.
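For reference, the bridge switch can be expressed as an edit to the VM's definition. The sketch below assumes the VMs are described by libvirt-style domain XML with `<interface type='bridge'>` elements; the minutes do not say how the switch was actually made, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def switch_bridge(domain_xml, old_bridge, new_bridge):
    """Point every bridged interface using old_bridge at new_bridge.

    Returns the updated XML text and the number of interfaces changed.
    """
    root = ET.fromstring(domain_xml)
    changed = 0
    # Only bridged interfaces directly under <devices> are considered
    for source in root.findall("./devices/interface[@type='bridge']/source"):
        if source.get("bridge") == old_bridge:
            source.set("bridge", new_bridge)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

A change count of zero would indicate the VM was not attached to the old bridge in the first place, which is worth ruling out before looking at the bridge itself.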

This Week

  • Alastair
    • systemd project
      • document the debugging including stuff about disabling graphical boot
      • Email list of LCFG components to COs. ASAP
      • Consider how components will work with systemd
      • Start producing some documentation - guidance for other COs on how to use - started on this
    • EL7 project
      • continue considering dependencies between components and ordering
      • Publish dependency graph, emphasising that it shows things as they currently are.
      • blog about dependency issues
        • client time before updaterpms
        • starting stuff before network
      • what sort of level of space is required by systemd journald logging (for desktop /var sizing)
        • (By default journald logs to /run/log. Have to mkdir /var/log/journal to keep data). Have enabled on one machine
        • identify default retention policy: default retention is to use up to 10% of the partition. Either space or time can be used as the constraint. Logs are per user + system, so users can read their own data. Each log file starts at 8MB, so a popular machine will have lots of log data.
        • are we duplicating stuff in journald and /var/lcfg/log/syslog (if so, do we want to kill off /var/lcfg/log/syslog?): yes, we are.
      • check installroot stuff same version across SL6 and EL7
        • and pull out old SL5 stuff
      • Look at pam_systemd module - what does it do?
      • Look at dconf or amd
        • amd is more heavily used than I had imagined. It's used for /group which is commonly used.
      • Look at whether we need anything better than existing network component for desktops
    • Add APACHECONF_SENSIBLE to computing.help (and upgrade to 6.5)
    • Upgrade jubilee to 6.5
    • RT 65774
    • Need to remove default bridge from kvmtool create
    • Think about disk partition policy
    • Test northern NX (from home)
    • Ship XML for ISO boot to Chris

  • Chris
    • Virtual DICE
      • Distribute new images
      • publish poster
      • school announcement
    • EL7
      • continue looking at gdm, including pam config + dconf configuration
      • produce a report on EL7 progress to main systems blog
    • url shortener (once gdm solved)
    • Projects EL7 blog - start populating
    • Upgrade amarela, vermelha to 6.5
    • Create Project entries - for KVM refinement project
    • Experiment with adding an extra bridge to circle with same number as br0
    • Add section in KVM docs to describe how to add a boot from ISO section using XML
    • Think about disk partition policy

  • Stephen
    • LCFG client refactor stage 1
      • schedule debrief meeting
    • EL7
      • Continue thinking about boot.run functionality
      • Start EL7 component writing FAQ
      • Add pam_systemd to system-auth
      • Re-base on SL7 RC
    • Reboot hammersmith
    • Think about PD - Interested in ZeroMQ
    • Take RT 68269
    • Arrange for piccadilly to move to Forum/AT
    • Deploy northern as staff.nx (first open up holes and test from home)
    • Have the SL5 RPMs archived (release space on sauce)
    • Think about disk partition policy

-- AlastairScobie - 30 Sep 2014

Topic revision: r14 - 05 Dec 2014 - 16:45:23 - AlastairScobie
 