MPU Meeting Tuesday 19th June 2012

Simple KVM

Stephen reported that we would need about 4 x 50GB of space (200GB in total) for four new guests to replace Richard's release-testing clients, which are currently hosted on VirtualBox. This is a significant chunk of space. We'll put two on northern and two on piccadilly, both using the local disk pool.

Alastair will enhance his daily KVM guests list mail to include some information about pool use and possibly also warnings of pools becoming full, so we have some idea of how much space is being used and where.
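As an illustration of the kind of pool summary the mail could carry, here is a minimal Python sketch. It assumes pool sizes are available as text in the style printed by `virsh pool-info` (Capacity, Allocation and Available lines with a unit suffix). The function names and the 10% warning threshold are hypothetical; the meeting did not fix a threshold.

```python
import re

# Unit multipliers for sizes in the assumed `virsh pool-info` style output.
_UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

_SIZE_RE = re.compile(r"^(Capacity|Allocation|Available):\s+([\d.]+)\s+(\w+)$")


def parse_pool_info(text):
    """Return the capacity, allocation and available sizes in bytes."""
    sizes = {}
    for line in text.splitlines():
        match = _SIZE_RE.match(line.strip())
        if match:
            field, number, unit = match.groups()
            sizes[field.lower()] = float(number) * _UNITS[unit]
    return sizes


def pool_report_line(name, sizes, warn_free_percent=10.0):
    """Format one line of the mail; the threshold default is a placeholder."""
    gib = float(2**30)
    free_pct = 100.0 * sizes["available"] / sizes["capacity"]
    line = "%-12s %8.1f GiB free of %8.1f GiB (%.1f%% free)" % (
        name, sizes["available"] / gib, sizes["capacity"] / gib, free_pct)
    if free_pct < warn_free_percent:
        line += "  WARNING: pool nearly full"
    return line
```

Grouping the guest list by pool would then be a matter of printing one such line as a header for each pool before its guests.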

There's no news yet on the replacement for our damaged R720.

We debated the question of migration between servers - how we should ensure that there was enough disk space on the receiving server to host the migrating guests, and whether servers should be regarded as equal partners or as principal server and backup server. For northern and piccadilly there are enough MPU sacrificial guests that enough space can be made for other guests when necessary. For jubilee and its future sibling we won't have sacrificial guests, but we can simply keep enough space aside unused and use that to host migrated guests. In both cases we think it will be simplest to regard each server as equal in status to its sibling.
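The space question can be stated as a small check. This is a sketch only: the guest names and sizes in the usage note are invented, and removing the largest sacrificial guests first is just one possible policy, not something agreed at the meeting.

```python
def plan_migration(incoming, free_gb, sacrificial):
    """Decide whether the receiving server can host the incoming guests.

    incoming    -- dict of guest name -> disk size in GB to be migrated in
    free_gb     -- unused space currently in the receiving pool, in GB
    sacrificial -- dict of guest name -> disk size in GB that may be removed

    Returns (fits, to_remove): whether the guests fit, and which
    sacrificial guests would have to be removed to make room.
    """
    shortfall = sum(incoming.values()) - free_gb
    to_remove = []
    # Hypothetical policy: free the largest sacrificial guests first.
    by_size = sorted(sacrificial.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        if shortfall <= 0:
            break
        to_remove.append(name)
        shortfall -= size
    return shortfall <= 0, to_remove
```

For a pair like northern and piccadilly the `sacrificial` dict would list the MPU guests that can be dropped. For jubilee and its future sibling it would be empty, and the check reduces to whether the space kept aside is at least the total size of the incoming guests.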

Chris has prepared a new version of the SimpleKVMDocs page. It should reduce CO confusion.

Following a suggestion of Neil's, Alastair will consider how to have rvirsh and kvmtool automatically know which server a guest is located on.
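One simple way to find a guest's server is to ask each server in turn. The sketch below does this with `virsh list --all --name` over a `qemu+ssh` connection. How rvirsh and kvmtool actually reach the servers is not described here, so the connection URI, the function names and the guest names in the usage note are all assumptions. A netgroup or other static lookup would avoid the per-server query.

```python
import subprocess


def list_guests(server):
    """List all guest names defined on one KVM server (assumes SSH access)."""
    result = subprocess.run(
        ["virsh", "-c", "qemu+ssh://%s/system" % server,
         "list", "--all", "--name"],
        capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def find_server(guest, servers, lister=list_guests):
    """Return the first server hosting the guest, or None if none does."""
    for server in servers:
        if guest in lister(server):
            return server
    return None
```

The `lister` argument exists so the lookup can be tried without touching real servers, for example `find_server("guest1", ["northern", "piccadilly"], lister=fake.get)` with a plain dict `fake` mapping server names to guest lists.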

We talked about possible ways to move guests off circle now that its support status is clearly less than that of other KVM servers. Alastair will check with Infrastructure whether it would be possible to add the AT 202 wire to any Forum-based KVM servers to ease the migration.

SL6 Upgrades

No activity.

Server Hardware

Chris has written a script which finds out lots of BIOS and firmware levels from the server on which it runs. It currently outputs data on the machine BIOS and on RAID controllers and disks. He'll add FC cards to the script. Ideas on other hardware to query would be welcome.
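Chris's script is not reproduced here. As a rough idea of the BIOS part only, the following Python sketch reads three standard `dmidecode -s` keywords. It assumes dmidecode is installed and that the script runs as root. RAID controller, disk and FC card queries are vendor-specific and are left out.

```python
import subprocess

# Keywords accepted by `dmidecode -s`.
BIOS_KEYWORDS = ("bios-vendor", "bios-version", "bios-release-date")


def run_dmidecode(keyword):
    """Return the value of one DMI string keyword (normally needs root)."""
    result = subprocess.run(["dmidecode", "-s", keyword],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def bios_summary(query=run_dmidecode):
    """Collect the BIOS keywords into a dict suitable for reporting."""
    return {keyword: query(keyword) for keyword in BIOS_KEYWORDS}
```

Returning a plain dict keeps the output easy to print locally or to upload to a central store later.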

Some machines with the sas5iR.h header do not load the mptctl kernel module so their controller and disks cannot be queried; Chris will look into fixing this.

It would be useful to upload all this data to the Orders database so that reports can be done centrally. Alastair and Chris will look at this together.

Security Enhancements

Stephen has got the report scripts running. They report on files and directories added, modified or removed in the parts of the filesystem we're interested in. They also report the running of suid scripts and attempts to install or remove kernel modules. These are pretty useful things to know. One problem is amd, which can cause /etc/mtab to be changed fifty times a day; we can reconfigure it to stop the /etc/mtab changes.

The report scripts only report files which are not owned by packages, so they need to be complemented by regular RPM verify runs. These will be a little pointless until we have signed RPMs, which in turn will be a little pointless until we have a PGP signing infrastructure in place. Another very worthwhile security measure would be to keep a copy of the RPM database off-machine, to guard against an attacker tampering with the database to hide an intrusion.
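To show what an added/modified/removed report amounts to, here is a small Python sketch that compares two snapshots of a directory tree. It is not Stephen's implementation, which is not described in these minutes. It uses a plain filesystem walk with mode, size and modification time as the change signal, which is an assumption made for illustration.

```python
import os


def snapshot(root):
    """Map every path under root to (mode, size, mtime in nanoseconds)."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # the entry vanished between the walk and the stat
            state[path] = (st.st_mode, st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(old, new):
    """Report paths added, modified or removed between two snapshots."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "modified": sorted(p for p in old_paths & new_paths if old[p] != new[p]),
        "removed": sorted(old_paths - new_paths),
    }
```

A frequently rewritten file such as /etc/mtab would show up under "modified" on every run, which is why it is better to stop the changes at source than to filter them from the report.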

Stephen is now moving on to monitoring logs. The plan is to go through the files on the loghost once a day to look for new entries. These will be parsed and split; those we are interested in can then be copied out into a database, which can then be used in the analysis stage.
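The parse, split and store steps might look something like the Python sketch below. It assumes entries in the traditional syslog layout ("Mon DD HH:MM:SS host program[pid]: message") and uses SQLite as a stand-in database. The list of interesting programs, the table layout and the host name in the usage note are hypothetical.

```python
import re
import sqlite3

# Traditional syslog line layout (an assumption about the loghost files).
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$")

# Hypothetical selection of entries worth keeping for analysis.
INTERESTING = ("sshd", "sudo")


def parse_line(line):
    """Split one log line into named fields, or return None if it is odd."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None


def store_interesting(lines, conn):
    """Copy the interesting entries into the database; return how many."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(timestamp TEXT, host TEXT, program TEXT, pid TEXT, message TEXT)")
    kept = 0
    for line in lines:
        entry = parse_line(line.rstrip("\n"))
        if entry and entry["program"] in INTERESTING:
            conn.execute(
                "INSERT INTO events VALUES "
                "(:timestamp, :host, :program, :pid, :message)", entry)
            kept += 1
    conn.commit()
    return kept
```

With `conn = sqlite3.connect(":memory:")`, a line such as "Jun 19 10:00:01 examplehost sshd[1234]: Failed password for root" is stored and a cron line is skipped. Finding only the new entries each day would need a separate record of how far into each file the previous run got.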

Misc Development

Toohot
The component is at last running everywhere. The old script has been replaced. Most of the component's configuration has been moved to the lcfg level.

Boot
The new version seems to be fine so Stephen will roll it out everywhere. This will help with the rollout of the auditd component.

LCFG Slaves
Our experiments have shown that the fastest solution we can implement without more development time (which rules out the shared memory filesystem for the moment) is to run the LCFG slaves in VMs on the two Dell R720 KVM servers. Each VM should have several GB of memory and use the SAN-based pool. (Not explicitly mentioned at the meeting, but worth trying: configure CPU pinning to give the LCFG slave VMs exclusive access to all the threads on a CPU core.)
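For the CPU pinning idea, libvirt expresses pinning with `vcpupin` elements inside `cputune` in the domain XML. The Python sketch below builds that fragment from a Linux sysfs thread sibling list (the format of `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`). The helper names are hypothetical. Pinning on its own does not make the core exclusive; other guests would also have to be kept off those threads.

```python
def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '2,14' or '0-3' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cputune_xml(host_cpus):
    """Pin guest vCPU n to the n-th host CPU thread in the list."""
    lines = ["<cputune>"]
    for vcpu, cpu in enumerate(host_cpus):
        lines.append("  <vcpupin vcpu='%d' cpuset='%d'/>" % (vcpu, cpu))
    lines.append("</cputune>")
    return "\n".join(lines)
```

A guest pinned this way would be given as many vCPUs as the core has threads, so that each vCPU gets a hardware thread of its own.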

Operational

We need to rethink our kit allocation now that we have two new KVM servers.

Next Meeting

The next MPU meeting will be at 2pm on Wednesday 27th June.

This Week

  • Alastair
    • Mods to kvm mail - group by pool, warn < %
    • kvmtool - add log of who created kvm
    • Document cold-migration
    • Document CPU pinning
    • Consider how to give rvirsh and kvmtool auto-guest-location functionality (netgroup?)
    • Test hot migration with Chris
    • Check with George/IanD whether we can add AT-202 wire to Forum KVM server(s) to make it easier to migrate (update: already available in Forum)
    • Chase Dell re dead R720 (update: have replacement backplane, hopefully fitting on Thursday)
    • Chris and Alastair to look at how we can store BIOS revisions in orders database
    • Work through LCFG bugs

  • Chris
    • Test hot migration with Alastair
    • Add sas header to our PE860 machines' headers
    • Chris and Alastair to look at how we can store BIOS revisions in orders database
    • Server hardware project

  • Stephen
    • Richard's release-testing desktops (4 of) - 2 -> northern, 2 -> piccadilly (local storage)
    • Upgrade telford to SL6
    • Release new lcfg-boot component to testing/stable
    • Reboot hogwood for SL6.2 kernel
    • Process other units' responses about their perl-AFS module usage (which functions etc)
    • Speak to Graham about Theon work

-- AlastairScobie - 19 Jun 2012

Topic revision: r8 - 27 Jun 2012 - 12:29:45 - ChrisCooke
 