Managing a KVM host

Expected audience

These instructions are intended for the manager of a KVM host. It is not expected that people will perform any of the tasks described below on machines not managed by their Unit (other than in serious emergencies).

How to configure a machine to provide KVM hosting

Add the following headers to the machine's profile :-

#include <dice/options/kvm-server.h>
#include <dice/options/kvm-client.h>

You will need to add a bridge interface and configure it on top of your primary network interface. Number the bridge interface after the subnet used by the KVM server. For instance, for a server on wire s33, make a bridge called br33 like this:

!network.interfaces   mADD(br33)
network.hostname_br33  <%profile.node%>
network.netmask_br33   255.255.255.0
network.type_br33      Bridge
network.delay_br33     0
network.stp_br33       off

network.bridge_eth0   br33

If you are using network bonding, you should replace the last line above with

network.bridge_bond0  br33

Configuration to support multiple wires is more complicated - contact MPU for assistance.

Disable some default behaviour

virsh net-autostart default --disable
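You can check that this has taken effect with, for example:

virsh net-list --all

The Autostart column for the default network should now show "no".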

Create a storage pool

There is a local convention of naming storage pools as <host>pool<number>, where <number> starts from 1 - so, for example, the first pool on circle is named circlepool1.

We use LVM for storage pools. The LVM component can be used to create volume groups. Each LVM storage pool maps to an LVM volume group. We use abbreviated names for the LVM volume groups - so the pool circlepool1 would be mapped to LVM volume group cp1. virsh/virt-manager/kvmtool will create the LVM logical volumes.

It is important that you use device names in the form /dev/disk/by-id and not short device names such as /dev/sdd1 as the latter can change between system boots.
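For example, to find the persistent /dev/disk/by-id name corresponding to a short device name (the sdd1 here is purely illustrative):

ls -l /dev/disk/by-id/ | grep sdd1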

The following will create an LVM volume group called kp1 on /dev/disk/by-id/scsi-36d4ae5209c31cd00174555f61d1f3cfb-part1 :-

#include <dice/options/lvm.h>

!lvm.vgs        mADD(kp1)
!lvm.pvs_kp1    mADD(/dev/disk/by-id/scsi-36d4ae5209c31cd00174555f61d1f3cfb-part1)

The following will create a KVM storage pool kelvinpool1 on the LVM volume group kp1:-

[kelvin]root: virsh  pool-define-as kelvinpool1 logical --source-name kp1 --target /dev/kp1
Pool kelvinpool1 defined

[kelvin]root: virsh pool-start kelvinpool1
Pool kelvinpool1 started

[kelvin]root: virsh pool-autostart kelvinpool1
Pool kelvinpool1 marked as autostarted
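You can check the state of the new pool with, for example:

[kelvin]root: virsh pool-info kelvinpool1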

Add another disk to an existing LVM volume group

The following will add the device /dev/sde to the LVM volume group vg1

[kelvin]root: pvcreate /dev/sde
[kelvin]root: vgextend vg1 /dev/sde

You can use pvscan to check that the volume group vg1 has grown as expected.
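For example:

[kelvin]root: pvscan
[kelvin]root: vgs vg1

pvscan lists each physical volume together with the volume group it belongs to, and vgs shows the total size of the volume group.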

Add support for extra network wires

Adding support for extra network wires is done in two steps. First, add the appropriate extra vlans to the switch configurations for that host. Second, add a VLAN and Bridge for each new wire - CPP macros are provided to do this for you. The host itself does not need an IP address on the additional wires (though arguably that may make debugging simpler).

Eg, to add support for the DICE wire (129.215.24.0), add the following macro invocations to the host's profile.

LCFG_NETWORK_ADD_VLAN(24,bond0)
LCFG_NETWORK_ADD_BRIDGE(24)

You can then bring up the interfaces using ifup (or reboot):

[bakerloo]root: ifup vlan24
[bakerloo]root: ifup br24
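You can check that the new interfaces exist with, for example:

[bakerloo]root: ip link show vlan24
[bakerloo]root: ip link show br24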

Hot migrating a guest from one server to another

  • Warning: it is not expected that non-MPU members will be migrating guests between hosts managed by MPU.
  • Warning: aborting a migration will kill the running guest on the original host.
  • When hot migrating a guest, the target volume group must have the same name as the original volume group. The workaround is to create a temporary softlink in /dev.

Assume we are moving guest fibble from host apple (volume group /dev/ap1) to host grape (volume group /dev/gp1). Fibble's volume size is 15GiB.

Determine the size (in GiB) of fibble's volume on apple
[apple]: lvs --units g /dev/ap1/fibble

Create an LVM volume on grape (in its pool /dev/gp1) with the same size as the existing LVM volume on apple
[grape]: lvcreate -L15G gp1 -n fibble
Remember that some VMs may have more than one disk volume. If so, create a new volume for each one.

Create a softlink /dev/ap1 on grape, to pretend that the volume group /dev/ap1 is available there, by adding this to lcfg/grape
LIVE_MPU_KVM_POOL_LINK(gp1,ap1)
(This makes a soft link which will persist across grape reboots.)
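Under the hood this amounts to a symbolic link, roughly as below (shown only to illustrate the idea - this is our assumption about the macro's effect; the macro also recreates the link after a reboot, which a bare ln will not):

[grape]: ln -s /dev/gp1 /dev/ap1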

Migrate fibble - this may take some time
rvirsh apple migrate --live fibble qemu+ssh://grape.inf.ed.ac.uk/system --copy-storage-all --verbose --persistent --undefinesource

If the migration fails with this message
error: Unsafe migration: Migration may lead to data corruption if disks use cache not equal to none
then change the caching mode on the guest's disk volumes
Run kvmtool edit --name fibble and add cache='none' to the disk volume's driver line; then restart the guest with kvmtool shutdown --name fibble followed by kvmtool start --name fibble to bring the configuration change into use.
Then try the migration again.
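For reference, after that edit the disk's driver line in the guest XML should look something like this (the name and type attributes are illustrative; the important part is cache='none'):

<driver name='qemu' type='raw' cache='none'/>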

Fix fibble's xml so that it refers to the new volume group /dev/gp1
kvmtool edit --name fibble

Once you're happy that fibble has been successfully migrated, delete fibble's original volume on server apple
[apple]: lvremove ap1/fibble

Hot migrating a guest from one site to another

This is a mashup of "Hot migrating a guest from one server to another" (above) and "Moving a machine to a different wire without reinstalling".

In this example we move guest skippy from host east (volume group /dev/ep1) to host west (volume group /dev/wp1). Skippy's volume size is 20GiB.

Determine the size (in GiB) of skippy's volume on east
[east]: lvs --units g /dev/ep1/skippy

Create an LVM volume on west (in its pool /dev/wp1) with the same size
[west]: lvcreate -L20G wp1 -n skippy
Remember that some VMs may have more than one disk volume. If so, create a new volume for each one.

Create a softlink /dev/ep1 on west, to pretend that the volume group /dev/ep1 is available there, by adding this to lcfg/west
LIVE_MPU_KVM_POOL_LINK(wp1,ep1)

Open a login window on skippy
ssh skippy

Edit skippy's profile to make two changes. Firstly, add (just above the mac address) the line
!network.ipaddr_eth0 mSET(DHCP)
Secondly, change the live/wire header to the new wire.

Check that the changes have reached skippy
[skippy]: qxprof network.ipaddr_eth0

Change skippy's IP address
rfe dns/inf
(Use the new server's default subnet if you can. If you don't know this, consult the server's own wiki page which you can find at SimpleKVMDocs.)

Update the DHCP server for the destination site (the aliases are atdhcp, ifdhcp and kbdhcp)
om kbdhcp.dhcpd configure

Ensure that it has the new address
[skippy]: om dns update

Check that skippy has the new IP
[skippy]: nslookup skippy

Ensure that skippy's /etc/hosts does not contain its old address
[skippy]: om network restart

Migrate skippy. This may take some time
rvirsh east migrate --live skippy qemu+ssh://west.inf.ed.ac.uk/system --copy-storage-all --verbose --persistent --undefinesource

If the migration fails with this message
error: Unsafe migration: Migration may lead to data corruption if disks use cache not equal to none
then change the caching mode on the guest's disk volumes: run kvmtool edit --name skippy and add cache='none' to the disk volume's driver line; then restart the guest with kvmtool shutdown --name skippy followed by kvmtool start --name skippy to bring the configuration change into use.

If the migration fails with this message
error: cannot get interface MTU on 'br32' (or some other br number)
then fix the guest's bridge device with
kvmtool edit --name skippy
and change the value of source bridge to br0. (This assumes that skippy uses the destination server's default subnet.) To bring this into effect you may have to shut skippy down (e.g. with kvmtool shutdown --name skippy), wait for it to power off, then start it again.

Then try the migration again.

Fix skippy's xml to refer to the new volume group /dev/wp1
kvmtool edit --name skippy

Once skippy has been migrated, it won't be able to talk to the network. To fix this, reboot it. Because you can't currently talk to it, this must be done indirectly and in several stages.

Shut it down
kvmtool --host west --name skippy shutdown

Wait until
kvmtool --host west --name skippy info
says
Guest is not currently running

Start it up again
kvmtool --host west --name skippy start

Shift its console to the new site in live/console_server.h

Move its KVM server wiki entry from the old server's page to the new one. (They're all linked from SimpleKVMDocs.)

Remove the network.ipaddr_eth0 line you added earlier
rfe lcfg/skippy

Once you're happy that skippy has been successfully migrated, delete its original volume on server east
[east]: lvremove ep1/skippy

Cold migrating a guest from one server to another

We can find no way to cold migrate a guest using standard kvm/libvirt tools. The following manual process using netcat can be used :-

Assume we are moving guest fibble from host apple (volume group /dev/ap1) to host grape (volume group /dev/gp1). Fibble's volume size is 15GiB.

Create the new LVM volume on grape
as for hot migration. The /dev softlink hack is not required.

Choose a free TCP port on grape
use netstat -lntu to check which ports are free. In this example we've chosen port 4000.
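For example, to confirm that nothing is already listening on port 4000 (no output means the port is free):

[grape]: netstat -lntu | grep 4000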

On grape, prepare to receive the data
nc -l 4000 | dd of=/dev/gp1/fibble bs=64k

On apple, initiate the data transfer
dd if=/dev/ap1/fibble bs=64k | nc grape 4000

For the paranoid
compare the results of running md5sum on the original and copied LVM volumes (it doesn't really take long, honest, so it's worth doing)
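For example (the checksums will only match if the two logical volumes are exactly the same size):

[apple]: md5sum /dev/ap1/fibble
[grape]: md5sum /dev/gp1/fibble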

Copy fibble's XML file across from apple to grape
this lives in /etc/libvirt/qemu/fibble.xml

Register fibble on grape
rvirsh grape define /etc/libvirt/qemu/fibble.xml

Unregister fibble on apple
rvirsh apple undefine fibble

Fix fibble's xml on grape and remove original volume on apple
as for hot migration

Cold migrating a guest from one volume group to another (on same host)

Assume we are moving guest fibble from volume group /dev/np1 to /dev/np2.

Create the new LVM volume in volume group /dev/np2
as for hot migration

Use dd to copy across the volume data
dd if=/dev/np1/fibble of=/dev/np2/fibble bs=64k

Fix fibble's xml to refer to /dev/np2/fibble and remove the original volume /dev/np1/fibble
as for hot migration

CPU pinning (also known as processor affinity)

The default KVM policy is to run guests on any available CPU core. This is fine for non-NUMA machines (typically desktops and single-socket servers), but sub-optimal for NUMA machines (typically multi-socket servers), where the default policy can result in guests accessing memory across cross-node links. KVM allows the pinning of individual guests to specific processor cores - this can be used to ensure that a guest will only ever use the CPU and memory of one physical processor. The NUMA servers in Informatics are the Dell R710 and Dell R720.

Unfortunately the Linux kernel doesn't necessarily order CPUs in a nice sequential fashion. For example, currently for a Dell R720 with a 6-core E5-2650 processor, the processors on the first physical socket are numbered 0,2,4,6,8,10,12,14,16,18,20,22 with those on the second socket being numbered 1,3,5,... You should use the virsh capabilities command (see the <topology> section) to see how the processors are numbered.
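For example, the following prints just the <topology> section of the capabilities output, which lists each CPU together with its socket (a minimal sketch using sed):

virsh capabilities | sed -n '/<topology>/,/<\/topology>/p'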

You can implement CPU pinning for a guest either by using the virt-manager command or by editing the guest's XML config file. Eg, the following

<vcpu placement='static' cpuset='0,2,4,6,8,10,12,14,16,18,20,22'>1</vcpu>

will pin a guest to the first physical processor (on a Dell R720).

Note: it is expected that SL6.3 will allow automatic NUMA-friendly CPU pinning without the need to work out which CPUs belong to which physical socket.

For further details see RHEL6 Virt Admin Guide Ch9.

Reinstalling a KVM server and its guests

This procedure assumes that the guest image files are still intact - either by being on a SAN volume or on hot-plug disks. No backup of guest image files is taken.

Warning: the below has not yet been tested.

  • Reinstall server
  • Reinstate LCFG configuration for KVM and network resources
  • Create the appropriate LVM configuration for the KVM storage pool(s) - see the above section on creating a new storage pool
  • Restore the guest config files from rmirror backup into /etc/libvirt/qemu. Be careful not to blast away any existing files in that directory.
  • Start the guests - a rough sketch of these last two steps is given below
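A minimal sketch of the last two steps, assuming the backup has been restored to a staging area; the /path/to/restore path and the guest name fibble are purely illustrative, and cp -n refuses to overwrite any files already in /etc/libvirt/qemu:

cp -n /path/to/restore/etc/libvirt/qemu/*.xml /etc/libvirt/qemu/
virsh define /etc/libvirt/qemu/fibble.xml
virsh start fibble

Repeat the define and start for each guest.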

Remove a KVM server from the pool

If a KVM server is broken or otherwise abnormal you may want to remove it from the HOST_kvmserver netgroup used by kvmtool. Do that by putting the following line at the top of the host's lcfg file:
#define MPU_NOT_A_KVMSERVER

When you can't connect to a KVM server

Sometimes libvirtd dies. When that happens the VMs carry on happily but you can't connect to the server using kvmtool, virt-manager, virsh and so on. The solution is simply to login to the server and restart libvirtd. Please ask the MPU to do this for you unless you're facing a pandemic-type situation. This is how to restart libvirtd:
$ ssh <the affected KVM server>
$ nsu
# systemctl restart libvirtd
(On older servers without systemd, use /etc/init.d/libvirtd restart instead.)
It's safe to do this with VMs running since restarting libvirtd does not start or stop VMs. (That's done by a separate service, libvirt-guests.)

To upgrade or reboot a KVM server

See the KVM server upgrade page.

Swapping /var and /var/cache/afs without reinstalling

We realised that we'd made /var too small and /var/cache/afs unnecessarily large, so we tried swapping them on circle. We did it as follows:
  • Migrate the guests elsewhere.
  • Pacify nagios.
  • Record the output of 'df -t ext4' - e.g.
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       9.5G  2.5G  6.6G  28% /
/dev/sda4       1.9G  1.2G  690M  63% /var
/dev/sda5       7.6G  1.1G  6.2G  15% /var/cache/afs
/dev/sda6       113G   61M  107G   1% /var/lib/libvirt/qemu
  • Take the machine down.
  • Edit its profile, find the partition definitions, and swap the partition names and mount options.
  • Check that the profile compiles.
  • PXE-boot the machine.
  • mkdir -p /move/root
  • mount /dev/sda1 /move/root
  • edit /move/root/etc/fstab and swap the /var and /var/cache/afs names, mount options and check order (fields 2, 4 and 6).
  • mkdir /move/sda4 /move/sda5
  • mount /dev/sda4 /move/sda4
  • mount /dev/sda5 /move/sda5
  • cd /move/sda5
  • /bin/rm -rf C* D* VolumeItems (i.e. delete everything except lost+found)
  • mkdir tmpvar
  • rsync --dry-run -avHApESx /move/sda4/ tmpvar (note the trailing slash - we want the contents of the directory, not the directory itself)
  • rsync -avHApESx /move/sda4/ tmpvar
  • rmdir tmpvar/lost+found
  • mv tmpvar/* .
  • rmdir tmpvar
  • emacs lcfg/conf/fstab/fstab/fstab.sda and swap the definitions of /var and /var/cache/afs here too (fields 2, 4 and 6).
  • cd /move/sda4
  • /bin/rm -rf [a-k]* lcfg lib local lock log [m-z]* (i.e. delete everything except lost+found)
  • cd /
  • umount /dev/sda1
  • umount /dev/sda4
  • umount /dev/sda5
  • Take a deep breath
  • reboot