MPU Meeting 17th July 2012

Simple KVM Service

  • There's a problem with kvmreport (bug 592) - the 'pool usage' section has out-of-date figures. Alastair will modify it to get its disk usage figures from LVM instead of libvirt.
  • We'd like to log what people are doing with the KVM commands so we can trace the causes of problems; Alastair will investigate libvirt logging.
  • There was a migration problem (RT 58480) - after kensal was migrated from bakerloo to hammersmith its 32-bit licence managers started crashing. The 64-bit ones were unaffected. The licence managers were apparently disturbed by the domain having been moved from an AMD-based server to an Intel-based one. A reboot sorted them out but this shouldn't have happened.
  • We have done no more migrations from circle because of the problems caused by br0 being on different subnets on other KVM hosts (most of circle's domains use br0). Could we have multiple bridge names for the same wire? That could ease the migration process.
  • We still need to update the (ancient) firmware on the IBM array before we use it for jubilee and hammersmith SAN pools. The array's last user is metropolitan so we'll try to get that emptied of important VMs (so we can stop it for long enough to do the firmware upgrade) or otherwise prise it loose from the array (e.g. buy some local disks to host the metropolitan VMs). As a first step Chris will update the metropolitan and central pages.
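
For the kvmreport change in the first item above, one way to take pool usage figures straight from LVM is to parse the output of vgs. This is only a sketch and not the actual kvmreport code: the function names and the sample volume group names are made up for illustration, and it assumes vgs is called with --noheadings, --nosuffix, --units g, a ":" separator and the fields vg_name,vg_size,vg_free.

```python
import subprocess

# Assumed invocation: colon-separated name, size and free space in GiB.
VGS_CMD = ["vgs", "--noheadings", "--nosuffix", "--units", "g",
           "--separator", ":", "-o", "vg_name,vg_size,vg_free"]


def parse_vgs(text):
    """Turn 'name:size:free' lines into {name: {'size', 'free', 'used'}}."""
    usage = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in the command output
        name, size, free = line.split(":")
        size, free = float(size), float(free)
        usage[name] = {"size": size, "free": free, "used": size - free}
    return usage


def read_vg_usage():
    """Run vgs (needs suitable privileges) and return the parsed usage."""
    result = subprocess.run(VGS_CMD, check=True,
                            capture_output=True, text=True)
    return parse_vgs(result.stdout)
```

Keeping the parsing separate from the command call means the report logic can be checked against canned output on a machine with no volume groups.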

The server summary and detailed wiki pages for the KVM service have been fully updated.

SL6 Upgrades

Stephen is going to upgrade telford on Friday.

We talked about bugzilla - it needs updating as well as moving to an SL6 host. We wondered whether there might be an alternative which is easier to maintain and use. Chris will ask around. EPEL has a bugzilla 3.4 RPM - is this up-to-date with security releases? Chris will find out. Is there a newer one in the pipeline?

Server Hardware

Chris has cleaned up his script which finds out firmware/BIOS details, to make it save a standard set of data across all machines (serial numbers, models or description, all available revision/version numbers for all disks, RAID and FC cards, BIOS) to a sensible data structure. He's been learning about databases. The next step is to write the data to a test database, then to the orders database.
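
As an illustration of what a standard per-machine record and a test database might look like, here is a minimal sketch. It is not Chris's script: the class names, field names, table layout and the use of SQLite are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Component:
    kind: str         # hypothetical categories: "disk", "raid", "fc", "bios"
    description: str
    version: str      # firmware or BIOS revision as reported


@dataclass
class MachineRecord:
    hostname: str
    serial: str
    model: str
    components: list = field(default_factory=list)


def save(conn, rec):
    """Store one machine and its components, replacing any earlier entry."""
    conn.execute("CREATE TABLE IF NOT EXISTS machine "
                 "(hostname TEXT PRIMARY KEY, serial TEXT, model TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS component "
                 "(hostname TEXT, kind TEXT, description TEXT, version TEXT)")
    conn.execute("INSERT OR REPLACE INTO machine VALUES (?, ?, ?)",
                 (rec.hostname, rec.serial, rec.model))
    # Drop stale component rows so a re-run reflects the current hardware.
    conn.execute("DELETE FROM component WHERE hostname = ?", (rec.hostname,))
    conn.executemany(
        "INSERT INTO component VALUES (?, ?, ?, ?)",
        [(rec.hostname, c.kind, c.description, c.version)
         for c in rec.components])
    conn.commit()
```

A test run could use sqlite3.connect(":memory:") so nothing touches the orders database until the record layout has settled.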

Security Enhancements

Stephen has been finishing his software and documenting it. The API is documented. The software itself is ready to use - it processes the logs, finds interesting events and puts them in a database. User-level documentation still needs to be written. The software takes an hour or so to process 24 hours' worth of logs so we don't envisage processing the several months' worth we have on the log host.

The reporting system is also working, but as yet there's no documentation for it.

Miscellaneous Developments

  • Stephen has changed the subversion component for Graham, to make it optionally not manage the webdav auth file.
  • At the same time he made it into a proper perl module. This has enabled Iain to successfully subclass it for svnsync.
  • Stephen looked into the automatic loading of the mptctl kernel module. It can't be done in the modprobe conf file as the mpt modules are loaded too early. For now he's using the hardware component to load it.
  • Following a discussion at the LCFG Deployers' Meeting Chris has added a new LCFG_RELEASE_NUMBER macro to the releases. It's basically the existing LCFG_RELEASE_VERSION but without any of the letters on the end; this lets people use it for CPP arithmetic comparisons so we can do "if before a certain release" or "if after a certain release".
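
The idea behind LCFG_RELEASE_NUMBER in the last item can be sketched as follows: drop any trailing letters from the version string so that what remains can be compared as an integer. The version format and the sample values are assumptions made for illustration, and the function names are hypothetical. In the LCFG source files the comparison itself would be a cpp #if on the macro rather than Python.

```python
import re


def release_number(version):
    """Strip trailing letters from a release version and return an int."""
    return int(re.sub(r"[A-Za-z]+$", "", version))


def before_release(version, cutoff):
    """True if the given release is numerically earlier than the cutoff."""
    return release_number(version) < cutoff
```

With the letters removed, "if before a certain release" becomes a plain integer comparison, which is all that cpp arithmetic can do.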

Operational

  • Stephen rebooted hogwood and Chris rebooted bakerloo and jubilee. The desktops have now all rebooted to pick up the new kernel. Having a new kernel every three months seems about the right frequency - not too annoying yet not too obsolete.
  • The broken R720 hammersmith was fixed then installed and put into service.
  • We have two new package slaves, wildcat in AT and hare in the Forum. Stephen will move the PXE service to hare on Friday, after which we'll be able to free up schiff to become the new LCFG master.
  • Stephen reckons it is practical to do some surgery on gnome-disk-utility so that we can install it again (to enable people to use it for encrypted USB keys) without alarming folk with its pop-up SMART warnings, so he'll do that this week. However we should understand why we're getting the SMART problems in the first place - perhaps smartd needs some configuration, for instance?
  • Once we have the SAN pools we'll make new KVM-based LCFG slaves.

This Week

  • Alastair
    • Modify KVM monitor script to use LVM to report VG usage rather than libvirt
    • Investigate libvirt logging so we can log who does what on which KVM server - libvirt logging is for debugging. We want libvirt auditing, but that needs SL6.3.
    • Work through LCFG bugs
    • Check whether we can have multiple bridges to one VLAN, so we can migrate from using br0 to br for default wire. Experimentation suggests not (at least not with Red Hat network configuration files)
    • Document CPU pinning
    • Auto guest location for kvmtool and rvirsh
    • Document cold migration
    • Report libvirt empty LVM group issue to Red Hat - wait until SL6.3
    • Personal development topics
    • Review project list and prioritise

  • Chris
    • Update metropolitan and central guest list and encourage people to migrate off metropolitan
    • Server hardware project
    • File bug about the fault in the KVM reporting script (bug 592)
    • Find out status of support for bugzilla 3.4 and ask around for alternatives to bugzilla. Look in epel devel/pending for a later version of bugzilla - or even look at Fedora...
    • Personal development topics
    • Review project list and prioritise

  • Stephen
    • System security project
    • Look at removing smartd applet from gnome-disk-utility (palimpsest) so can use gnome-disk-utility's disk encryption functionality
    • Upgrade telford to SL6
    • Speak to Graham about Theon work
    • Personal development topics
    • Review project list and prioritise

-- AlastairScobie - 17 Jul 2012

Topic revision: r11 - 25 Jul 2012 - 14:47:06 - AlastairScobie
 