Release Management Procedures

This page documents how, why and when to make and install the managed DICE releases. It is purely for the use of the people whose responsibility it is to make releases - normally the ManagedPlatformUnit. Everyone else should instead read the ReleasesFAQ. There is also a project page (MPUProjectReleaseManagement) which was prepared for the DevelopmentMeeting.

Release Scripts

Releases are manipulated using a suite of scripts. These should be in the /usr/lib/lcfg/release-scripts/scripts/ directory on your machine (these notes work best if you have /usr/lib/lcfg/release-scripts/scripts in your path). If they are not present then you will need to install the lcfg-release-scripts package. This is done by including the live/release-scripts.h header in the LCFG source profile of your machine and running updaterpms.

ALL RELEASE SCRIPTS SHOULD BE RUN FROM YOUR OWN ACCOUNT, NOT AS ROOT. Do not attempt to run any release script as root. If you do, the script will complain then immediately exit. When a release script needs root permission it will arrange it automatically without your help.


Permission to run release scripts

Permission to run the release scripts - more accurately, permission to write to the releases section of the LCFG subversion repository - can most easily be granted by adding a name to the subversion.authzmembers_lcfg_mpu resource in the live/lcfg-subversion-authz.h header.

Permission to write to the MPU AFS space

You'll need to be able to write to the MPU AFS space, which is /afs/inf.ed.ac.uk/group/mp-unit. More specifically, you will need to be able to write new files to the cdroms and cdroms/devel directories there, and also to delete old files from those directories. (Both of these operations will be done for you automatically by release scripts.) The man pages for fs_listacl and fs_setacl (man fs_listacl, man fs_setacl) explain how to inspect and change the AFS ACLs.

Changing a Release

Once a testing release is on the LCFG Servers it should NEVER be changed. If a change is needed in a testing release, a copy should be made of the release, and the copy changed, then released.

Types of Release

There are two main types of release: testing and stable. These will be described separately.

Current Releases

To find out the numbers of the releases that are currently installed on the LCFG servers, run currentstablerelease or currenttestingrelease.

Listing the Releases

listtestingreleases lists the numbers of all the testing releases in the repository. liststablereleases does the same for the stable releases.

The Testing Release

The testing release is for the testing of weekly releases; if a release passes all tests it will later become the next stable release.

Testing releases come in two varieties.

New Testing Release

A new testing release is normally made every Monday morning, or as soon as possible afterwards. It is a copy of whatever is in lcfg/core at the time of the release's creation.

To make a new testing release, in your own account NOT as root run the script makenewtestingrelease. This numbers the release and copies it to the correct new directory in the Subversion repository.

Once this has been done, the release should NOT be changed in any way. If a change is needed, the procedure is to make a copy of this release and change the copy; read the next section to find out how to do this. Otherwise, skip straight to Installing a Testing Release (of either kind).

Copied Testing Release

A copied testing release is made when you need to make a change in another testing release - probably as a result of the release failing a test. No release is ever changed once it has been installed on the LCFG servers - instead you copy it to a new release and make the changes in the copied release; then install that on the LCFG servers for further testing. If more changes are subsequently needed, copy the copied release; and so on.

To make a copied testing release:

  1. Find out the number of the release you want to copy. The script currenttestingrelease tells you the number of the testing release currently installed on the LCFG servers.
  2. Run copytestingrelease giving the number of the release to be copied as an argument.
  3. Note the release number and the checkout command that copytestingrelease prints, and use that command to check out a copy of the release.
  4. Make your required changes then check them in to the repository.
  5. Remember that any changes you make in the testing release will almost certainly need to be made in the core too!

Installing a Testing Release (of either kind)

  1. To install a testing release, log in to lcfg-master and (in your own account, not as root) run installtestingrelease giving the number of the release to be installed as an argument. For example: installtestingrelease 2016102401
  2. Remember to mail COs with details of the new release. Your mail should include a list of the differences between the new testing release and the last stable release. To generate this run generatediff.
  3. Then build and install this week's new CD images: buildtestingcd
  4. Lastly, mail Richard (or whoever is doing the release testing) to tell him about the new release and to ask him to test it. See the next section for more on testing the release.

Testing a Testing Release

The installed testing release has to pass all of its tests before it can be made the next stable release. This is how to do the tests.

Richard usually does the release testing, but whoever makes the testing release (usually Carol) should tell him when a testing release is ready for him to start testing.

Tests Summary

For SL7 we do an install test and an upgrade/downgrade test, both for an office desktop configuration and for a studentlab configuration.

We also check a DICE minimal server and an Inf-level (ultra-basic) server. This can be summarised like this:

SL7 office desktop upgrade/downgrade
SL7 studentlab upgrade/downgrade
SL74 office desktop upgrade/downgrade
SL7 studentlab install
SL7 office desktop install
SL7 Inf-level server upgrade
SL7 DICE minimal server upgrade

Strictly speaking this doesn't test every possible combination, but there is a limit to the time and effort available for testing, and this is about as much as we need to do to make problems show up.

At times when we are transitioning from one platform to another (e.g. SL7.4 to SL7.5) it is necessary to duplicate the office desktop and studentlab tests so they are done on both platforms. This should be done by creating more LCFG profiles so that the tests can be run in parallel.

The test machines

Richard or Carol run the release tests on 3 KVM virtual clients on mandolin:

bigglesvb3 SL7 (x86_64) routine weekly testing
whinfell SL7 (x86_64) routine weekly testing
rosehearty SL74 (x86_64) platform transition testing

If you need to make new test machines, they should be configured to be standard DICE desktops as far as possible, but with the addition of the following two lines to their LCFG files:

#include <live/release-scripts.h>
!profile.release mSET(testing)

Your own DICE desktop will also need #include <live/release-scripts.h> in order to run lcfg-check-release.

Test 1: Upgrade with a Reboot

  1. Take a machine that's using the stable release.
  2. Run updaterpms (om updaterpms run) to make sure that its software is fully up to date. (If it isn't, abandon the machine and start again with another, or run updaterpms then reboot before continuing.)
  3. Configure the machine to use the testing release (!profile.release mSET(testing) and check that the profile compiles).
  4. Wait until the profile has been downloaded by the machine (monitor its /var/lcfg/log/client).
  5. Reboot the machine, and check that it boots properly. In particular, check that updaterpms does not hang.
  6. Login and check you can see your home directory.
  7. Check the LCFG status page for the machine and check that all components are active (only green squares in the first column, there should not normally be any blue squares).
  8. Run /usr/bin/lcfg-check-release machine_name for each client and check for "success!"

Test 2: Upgrade then Reboot

  1. Take a machine that's using the stable release.
  2. Run updaterpms (om updaterpms run) to make sure that its software is fully up to date. (If it isn't, abandon the machine and start again with another, or run updaterpms then reboot before continuing.)
  3. Configure the machine to use the testing release (!profile.release mSET(testing) and check that the profile compiles).
  4. Wait until the profile has been downloaded by the machine (monitor its /var/lcfg/log/client).
  5. Run om updaterpms run on the machine, and check that it succeeds. Look in /var/lcfg/log/updaterpms for errors or warnings at the end, and check the machine's LCFG status page as in the first test.
  6. Reboot the machine.
  7. Login and check the machine as in the first test.
  8. Check the LCFG status page for the machine and check that all components are active (only green squares in the first column, there should not normally be any blue squares).
  9. Run /usr/bin/lcfg-check-release machine_name for each client and check for "success!"
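The check in step 5 (looking at the end of /var/lcfg/log/updaterpms for errors or warnings) can be given a first pass by a small shell function. This is a minimal sketch, not one of the release scripts: the name scan_log_tail is hypothetical, and it assumes problem lines contain the word "error" or "warn" in some case, which is only a crude guess at the log format. Still read the log yourself.

```shell
#!/bin/bash
# Hypothetical helper, not part of lcfg-release-scripts.
# Print any lines near the end of an updaterpms log that look like errors
# or warnings. ASSUMPTION: such lines contain "error" or "warn" (any case).
# Returns 0 if the tail looks clean, 1 if suspicious lines were found,
# 2 if the log can't be read.
scan_log_tail() {
    local log="${1:-/var/lcfg/log/updaterpms}"
    local lines="${2:-50}"    # how many lines from the end to examine

    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi

    # grep exits 0 when it finds (and prints) at least one matching line
    if tail -n "$lines" "$log" | grep -E -i 'error|warn'; then
        return 1
    fi
    return 0
}
```

On the test machine, scan_log_tail /var/lcfg/log/updaterpms 100 would print any suspicious lines from the last 100 lines of the log; a zero return only means nothing matched the crude pattern.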

Test 3: Downgrade back to "stable"

  1. Upgrade a machine's packages using any of the previous tests.
  2. Set the machine back to the stable release (!profile.release mSET(stable) and check that the profile compiles).
  3. Wait until the profile has been downloaded by the machine (monitor its /var/lcfg/log/client).
  4. Run om updaterpms run on the machine, and check that it succeeds cleanly with no errors or warnings.
  5. Check the LCFG status page for the machine and check that all components are active (only green squares in the first column, there should not normally be any blue squares).
  6. Run /usr/bin/lcfg-check-release machine_name for each client and check for "success!"

Test 4: Install

  1. Configure a machine to use the testing release (!profile.release mSET(testing) and check that the profile compiles).
  2. Find the latest lcfginstall-arch-testing CD image in /afs/inf.ed.ac.uk/group/mp-unit/cdroms/devel/
  3. SSH to mandolin (the host machine for the testing clients), cd /var/lib/libvirt/images and (as root) delete the existing images there
  4. Copy the latest lcfginstall-arch-testing image into /var/lib/libvirt/images
  5. As yourself (not root), run virt-manager on mandolin and double click on the virtual client you want to use
  6. Click on the small black down-arrow icon at the top of the window and select 'force off'
  7. Click on the 'information' icon, then 'IDE CDROM 1'. Click 'Disconnect' then 'Remove'
  8. Click 'Add Hardware', 'Select managed or other existing storage', 'Browse'
  9. Click 'Browse local' and navigate to /var/lib/libvirt/images if not there already, select the image you want to use
  10. Set 'Device type' to 'IDE CDROM' and 'Storage Format' to 'raw'. Leave all other settings and click on 'Finish'
  11. Click on the monitor icon at the top of the screen, followed by the play icon
  12. As the virtual client starts, press Escape, then 4 to boot from DVD/CD. From here on the process is as for a standard DICE machine, although the display will appear to freeze during the installation: the output has switched over to the serial console, which can be accessed in the normal way (for example ssh atconsoles, then console bigglesvb3).
  13. When prompted on the serial console, enter your admin principal. On SL7 this happens early in the install; on older releases it happens at the end.
  14. Once the installation has finished, the graphical login screen will be usable. Log in to the machine and check that you can alter files in your home directory and use the network (e.g. view a web page or ssh to another machine). If you use GNOME you'll probably need to reboot the SL7 machines (or switch to MATE) before you can log in. It has been this way for some time, and isn't considered a big issue.
  15. Run /usr/lib/lcfg/release-scripts/scripts/lcfg-release-tests on the test host or lcfg-check-release --warnings --verbose testhost from your desktop machine.
  16. Check the LCFG status page for the machine and check that all components are active (only green squares in the first column, there should not normally be any blue squares).
  17. Run /usr/bin/lcfg-check-release machine_name on each client and check for "success!"
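Step 2 (finding the latest testing image) can be scripted. This is a sketch only: latest_testing_image is a hypothetical helper, it assumes the image filenames start with lcfginstall- and contain -testing, and it takes "latest" to mean most recently modified. Check the actual filenames in cdroms/devel before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: print the most recently modified testing install
# image in a directory. ASSUMPTION: names look like lcfginstall-ARCH-testing*
latest_testing_image() {
    local dir="${1:-/afs/inf.ed.ac.uk/group/mp-unit/cdroms/devel}"
    local newest="" f

    for f in "$dir"/lcfginstall-*-testing*; do
        [ -e "$f" ] || continue    # pattern matched no files
        # -nt is true if $f was modified more recently than $newest
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done

    [ -n "$newest" ] || return 1   # no testing image found
    printf '%s\n' "$newest"
}
```

The printed path is the image to copy into /var/lib/libvirt/images in step 4.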

Test 5: deleted

Test 6: Inf level profile

We do a few tests at the "inf" level to check whether LCFG users outside Informatics are likely to have any problems with this release. The inf level is as near as we can get to using just the lcfg level, which is deliberately not complete and usable in its own right (e.g. you may need to enter a password to log in to polecat; see https://wiki.inf.ed.ac.uk/DICE/MPUInfLayer).

Look at the profile status page for the inf-level test machine polecat (SL7) on either of the main LCFG servers. Are there any compilation errors or warnings?

http://lcfg2.inf.ed.ac.uk/cgi/status.cgi/inf.ed.ac.uk/polecat.html
http://lcfg1.inf.ed.ac.uk/cgi/status.cgi/inf.ed.ac.uk/polecat.html

Test 7: Inf level updaterpms

  1. Log in to the inf-level test machine, polecat.
  2. Type nsu
  3. Type om server run
  4. Type om updaterpms run
  5. Are there any problems in the updaterpms log?

Test 8: Inf level LCFG server

The inf-level test machine polecat runs an LCFG server which compiles a grand total of one test profile. Check that this profile has compiled with no errors or warnings.

http://polecat.inf.ed.ac.uk/cgi/

Test 9: DICE-level server profile

Look at the profile status page for the dice-level test LCFG slave (lcfgtest) on either of the main LCFG servers. Are there any compilation errors or warnings?

http://lcfg2.inf.ed.ac.uk/cgi/status.cgi/inf.ed.ac.uk/vole.html
http://lcfg1.inf.ed.ac.uk/cgi/status.cgi/inf.ed.ac.uk/vole.html

Test 10: DICE-level server updaterpms

This should give us early warning of a problem which could affect our own DICE LCFG servers.
  1. Log in to the DICE-level test slave lcfgtest.
  2. Type om server run
  3. Type om updaterpms run
  4. Are there any problems in the updaterpms log?

How to Report Test Results

If the tests don't show up any problems, email MPU to say all tests have been passed. If a test does show up a problem, please report it in bugzilla.

To report a release testing problem:

  1. Either do this from a DICE machine, or FIRST visit https://authportal.inf.ed.ac.uk/login/
  2. Visit https://bugzilla.inf.ed.ac.uk/
  3. Choose "File a bug"
  4. Choose "Managed Platforms Unit"
  5. Choose Component: "Release Testing"
  6. Fill in the details - the architecture being tested, an extract copied from the updaterpms log, that sort of thing.

Stephen, Richard and Chris will get mail automatically. To ensure that you are mailed about the release testing bugs - and if you are doing release testing, this is a good idea - do this:

  1. Go to https://bugzilla.inf.ed.ac.uk/
  2. Click "Preferences" (at the bottom).
  3. Click "Email Preferences".
  4. Find the "User Watching" section near the bottom of the page.
  5. Watch this user: release-testing@bugzilla.bugs
  6. Submit Changes.

The Stable release

A stable release is a copy of a testing release which has passed all tests and been copied to the lcfg/releases/stable tree in the repository. A stable release is only ever made by making an exact copy of a testing release. If you need to change a stable release, make a copy of the testing release on which it is based, and change that - then if that passes its tests it can be made into a new stable release.

A stable release is only ever made from a testing release which has passed all of its tests.

To make the current testing release into a new stable release, and install that stable release on the LCFG servers, log in to lcfg-master and, as yourself and not "root", run makeinstallstablerelease.

Don't forget to warn the COs by mail in advance about the LCFG profile rebuild that will result; it took around 35 minutes in March 2017 and around 53 minutes in January 2019. The stable release is normally made and installed on a Wednesday (Monday and Tuesday are given over to testing the testing release), starting at 2.30pm: early enough that COs will have time to see the changes that result, yet late enough that machines rebooted during most lab sessions won't pick up the new stable release and start installing loads of new packages.

Once the installation process has completely finished - it is important that you wait for it to finish - you should then build a new install CD image. Do this by running the script buildstablecd.

Release Numbers

Each release is given a unique number. These numbers are concocted and allocated by the scripts which make the releases.

Each release that is a direct copy of lcfg/core is numbered YYYYMMDDRR, where YYYY is a four-figure year (e.g. 2006), MM a two-figure month (e.g. 07), DD a two-figure day of the month (e.g. 05) and RR a two-figure number representing the number of releases made so far that day - the first release of the day getting the number 01, the second release in the same day 02, and so on.

The first eight figures of the number correspond to the output of the command date +%Y%m%d
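As an illustration of the YYYYMMDDRR rule (the release scripts allocate real numbers themselves, so you should never need to), here is a sketch of a hypothetical next_release_number function. It assumes existing release numbers arrive one per line on stdin, and it takes RR from the highest number already used that day rather than from a simple count.

```shell
#!/bin/bash
# Hypothetical illustration of the YYYYMMDDRR rule; the release scripts
# allocate the real numbers.
# Reads existing release numbers (one per line) on stdin and prints the
# next number for the given day (default: today, via date +%Y%m%d).
next_release_number() {
    local day="${1:-$(date +%Y%m%d)}"
    local last rr

    # Highest plain YYYYMMDDRR number already used that day; copies with a
    # letter suffix (e.g. 2016102401a) are ignored.
    last=$(grep -E "^${day}[0-9]{2}\$" | LC_ALL=C sort | tail -n 1)

    if [ -z "$last" ]; then
        rr=1
    else
        # 10# forces base 10, so "08" and "09" aren't read as octal
        rr=$((10#${last:8:2} + 1))
    fi

    if [ "$rr" -gt 99 ]; then
        echo "more than 99 releases on $day" >&2
        return 1
    fi
    printf '%s%02d\n' "$day" "$rr"
}
```

For example, printf '2016102401\n' | next_release_number 20161024 prints 2016102402.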

A release that is an altered version of an existing release has the existing release's number with the letter a added to the end. If a letter is already on the end then increment it by one, so a would become b and c would become d. Examples would be 2006070501a and 2006070501b.
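The letter-suffix rule can be sketched the same way. next_copy_number is hypothetical, and since this page doesn't say what comes after z, the sketch refuses rather than guessing.

```shell
#!/bin/bash
# Hypothetical illustration of the letter-suffix rule for copied releases:
# 2006070501 -> 2006070501a, 2006070501a -> 2006070501b, and so on.
next_copy_number() {
    local rel="$1" base letter
    local re='^([0-9]{10})([a-z]?)$'   # YYYYMMDDRR plus optional letter

    if [[ $rel =~ $re ]]; then
        base="${BASH_REMATCH[1]}"
        letter="${BASH_REMATCH[2]}"
    else
        echo "not a release number: $rel" >&2
        return 2
    fi

    if [ -z "$letter" ]; then
        printf '%sa\n' "$base"
    elif [ "$letter" = z ]; then
        # Not covered by the numbering rules, so stop here.
        echo "no letter after z for $rel" >&2
        return 1
    else
        # tr maps each letter a..y to the next one along, b..z
        printf '%s%s\n' "$base" "$(printf '%s' "$letter" | tr 'a-y' 'b-z')"
    fi
}
```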

Each stable release is simply a direct copy of a testing release and uses the same number.

Note that the alphabetical order of the release numbers must always reflect their release dates: the last item in an alphabetically sorted list of release numbers should always be the number of the most recent release. LCFG distribution software relies on this.
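Because the ordering rule is plain byte-wise string ordering, it can be checked with the standard sort command. A minimal sketch, with hypothetical function names; LC_ALL=C is set so that locale collation rules can't change the order.

```shell
#!/bin/bash
# Hypothetical helpers relying on the rule that byte-wise (C locale)
# sorting of release numbers matches their release order.

# Print the most recent release from a list on stdin.
latest_release() {
    LC_ALL=C sort | tail -n 1
}

# Succeed only if the releases on stdin, listed in the order they were
# made, are already in sorted order (sort -c checks without sorting).
release_order_ok() {
    LC_ALL=C sort -c 2>/dev/null
}
```

Note that 2006070501 sorts before 2006070501a (a prefix comes first), which in turn sorts before 2006070502, as the rule requires. Assuming listtestingreleases prints one number per line, listtestingreleases | latest_release would print the newest testing release.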

Each numbered release is kept in the repository indefinitely. Although this quickly results in lots of releases all being stored in the repository, we shouldn't quickly run out of space for the repository as a result, because Subversion keeps multiple versions of files as links and sets of differences rather than as entire discrete copies. However it will mean that we should no longer check out the entire repository as a working copy, because checked out versions will be discrete copies! Anyone wanting to edit the normal include and package files should just check out the lcfg/core and lcfg/live directories rather than checking out the whole of lcfg.

Branches

It's assumed that lcfg/core will always be left in a usable state, suitable for all machines. If you want to make big changes or do radical experiments on headers or packages, you're encouraged to do it in a separate branch copy of lcfg/core.

It might also be useful to have machines follow a particular release for longer than the usual weekly cycle, for example, to allow preparation and testing for lab exams.

If you are preparing for an exam please use the makeexambranch script which helps ensure the branch name matches the standard naming scheme.

/usr/lib/lcfg/release-scripts/scripts/makeexambranch

Otherwise there is a makenewbranch script which will do most of the work for you, e.g.:

/usr/lib/lcfg/release-scripts/scripts/makenewbranch 2009090701 sjqtest

The first argument is the name of the release you want to copy and the second is the name of the branch you wish to create. Do not give the branch the same name as the release (the script won't let you anyway), as this could thoroughly confuse the LCFG server. You would be much better off using a descriptive name such as "exam20091210".

Once you have your branch prepared you need to do a little bit of work on the LCFG slave servers. We do not make branches available automatically to avoid bad mistakes. You will need to edit the live/lcfg-slave-server.h header file and add something like this:

!file.files         mADD(sjqtestbranch)
file.file_sjqtestbranch      LCFGCONF/server/releases/sjqtest
file.type_sjqtestbranch      link
file.tmpl_sjqtestbranch      LCFGCONF/server/svn/branches/sjqtest

As soon as this is processed by the file component on the various LCFG slave servers you should see a big load of information about new headers and package lists appear in the LCFG server logs. Once this has happened on all the slaves you are ready to use the new branch. You can set any LCFG profile to this release in the normal way:

!profile.release      mSET(sjqtest)

You can now checkout the branch, edit it and commit changes in the usual way. Note that this works like the develop branch in that any changes submitted will result in an immediate recompilation of all dependent LCFG profiles. Please be considerate of other COs if your changes are likely to cause large recompilations, particularly as any errors could result in profiles "bouncing" between develop and your branch which can make the server run very slowly.

A Typical Week

  • On Monday morning Carol makes and installs a new testing release.
  • When she's done this she tells Richard and he starts testing it.
  • Richard finds that one of his test machines won't install properly. He reports the problem.
  • Stephen and Chris (and anyone else watching the release testing bugs) automatically get a copy of the problem report. One of them tracks down the package or file responsible for the problem and asks its owner to fix it.
  • The solution to some problems will involve making changes to headers or package list files in the testing release. When this is necessary Stephen or Chris will:
    • make a copy of the testing release
    • check it out (the script which makes the release tells you exactly how to do this)
    • in this new copy of the testing release, make the exact change needed. It helps to ask whoever boobed to provide an exact diff of their fix, as misunderstandings occur easily and can be disastrous. Typing svn diff before committing a change is an easy way to make a diff.
    • svn commit the changes.
    • Install this testing release.
    • ask Richard to run the tests again. He'll want to know whether he can use the same testing CD install images as for this week's original testing release or whether new CD images will be needed. It's rarely necessary to make a new set of CD images for a copy of a testing release, since the CD images contain only a few hundred core packages used for bootstrapping the system, and these core packages are rarely to blame when a release fails a test.
  • Eventually Richard reports success on all the tests.
  • Once all the tests have been passed, everyone relaxes and waits for Wednesday.
  • On Wednesday afternoon at half two Richard or Carol makes and installs the stable release and makes that week's install CD images.

In Dire Emergency Only

Very rarely, a really nasty fault is discovered in the stable release. If this can't be patched up using a live header file, and a change to the current stable release is really necessary, this is what to do:

  • Never ever ever directly change the stable release! No! Just don't! This is NEVER done!
  • Instead, copy the identically-numbered testing release. To find the number of the current stable release, run currentstablerelease.
  • Check out the testing release you've just made.
  • Make the crucial change.
  • svn commit
  • Install your new testing release. This will obviously disrupt whatever testing is going on at the time, but you're dealing with an emergency here.
  • Make and install a new stable release. A new stable release is only ever a copy of the current testing release. That's why the previous step is necessary.
  • Reinstall the testing release that was installed before you replaced it with your new one.

Adding new tests

It is possible to add test scripts to the weekly release testing.

This is done by adding scripts to the 'tests' directory in the lcfg-release-scripts package (which is in svn.lcfg.org). Please talk to MPU before just chucking them in there though.

It's a very simple framework: the script should exit zero if all is well and non-zero when something is wrong. Preferably, when an error is detected, you should also send some diagnostic output to stderr.

Each script might be passed either (or both) of the --verbose and --warnings flags. The first is fairly self-explanatory; the second is intended to control whether minor issues should be treated the same as errors. For example, the 'check_components' script looks for component files in /var/lcfg/log/ which have the '.err' suffix; with the --warnings flag it also checks for the '.warn' suffix.

You can use any language you like. A Perl script can handle the options like this:

use Getopt::Long ();
my ( $verbose, $warnings );
Getopt::Long::GetOptions( 'v|verbose'  => \$verbose,
                          'w|warnings' => \$warnings )
    or die "Could not parse command line options\n";

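To error out when problems are found, just use 'die', which prints the message to stderr and makes the script exit non-zero, e.g.:

# $ok and $errmsg stand for the results of your own checks
if ( !$ok ) {
   die "$errmsg\n";
}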
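The same kind of test can equally be written in shell. The following is a sketch modelled on the description of check_components above; it is not the real script. The function name and the LCFG_LOG_DIR override (handy for trying it on a scratch directory) are hypothetical.

```shell
#!/bin/bash
# Sketch of a release test in the spirit of check_components (NOT the real
# script). Returns 0 if all is well, non-zero otherwise, with diagnostics
# on stderr. LCFG_LOG_DIR is a hypothetical override for trying it out.
check_component_logs() {
    local verbose=0 warnings=0 arg
    local logdir="${LCFG_LOG_DIR:-/var/lcfg/log}"

    for arg in "$@"; do
        case "$arg" in
            -v|--verbose)  verbose=1 ;;
            -w|--warnings) warnings=1 ;;
            *) echo "unknown option: $arg" >&2; return 2 ;;
        esac
    done

    # .err files always count; .warn files only with --warnings
    local suffixes=(err)
    if [ "$warnings" -eq 1 ]; then
        suffixes+=(warn)
    fi

    local failed=0 suffix f
    for suffix in "${suffixes[@]}"; do
        for f in "$logdir"/*."$suffix"; do
            [ -e "$f" ] || continue    # pattern matched no files
            echo "problem file: $f" >&2
            failed=1
        done
    done

    if [ "$verbose" -eq 1 ] && [ "$failed" -eq 0 ]; then
        echo "no problem files in $logdir"
    fi
    return "$failed"
}

# As a standalone test script this would end with:
#   check_component_logs "$@"
# so that the function's return value becomes the script's exit status.
```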
Topic revision: r129 - 30 Jan 2019 - 15:25:48 - RichardBell
 