Upgrading the Package Master Server

This is a note of the problems encountered when the Packages Master brendel was upgraded from SL5 to SL6 in December 2012.

  1. The installation of the first wave of RPMs failed because the installer tried to fetch them from http.pkgs.inf.ed.ac.uk, which was of course served from brendel itself and so wasn't available. This problem had been anticipated, and updaterpms.rpmpath had been overridden to use dr.pkgs.inf.ed.ac.uk instead; unfortunately the inclusion of pkg-slave.h further down the profile overrode that override! For a while I thought that the updaterpms.rpmpath value was coming from the diceinstallbase-sl6_64-stable profile, but running qxprof -v at the shell prompt which appeared after the install failed revealed that the value was actually coming from brendel's own lcfg file.
  2. I moved the updaterpms.rpmpath override down to the bottom of lcfg/brendel then tried again. The first wave of RPMs then installed and the machine booted.
  3. The (second wave) updaterpms from dr.pkgs.inf.ed.ac.uk then failed. The error from updaterpms was to the effect that http://dr.pkgs.inf.ed.ac.uk/sites/sl6/6.3/x86_64/os/Packages/hdrs/_nfs-acl- (something) could not be found. At first I misinterpreted this error message as meaning that the hdrs directory had somehow not been copied to the mirror, but this was wrong; the message actually meant that updaterpms had first tried the "dot" version of the header file, then the "hdrs" version, and hadn't found either. It turned out that rsync had simply not copied files whose names start with .nfs, on the assumption that they were temporary NFS lock files. Stephen has now altered the affected buckets to use the "hdrs" version of the RPM header files rather than the "dot" version. We're now also using dr.pkgs as the permanent source of RPMs for a couple of minor MPU servers, so that we get quicker warning of such mirroring failures in future.
  4. At the time the first priority was to get the packages master up and running, so the next attempt was to point updaterpms at brouwer, the temporary VM which had been used to test aspects of the package master configuration on SL6, and which was serving the package buckets over HTTP. The attempt failed with HTTP 403 errors. This was progress from 404 errors. Apache permissions proved to be a red herring; the problem turned out to be AFS permissions. The waklog file template was altered to mention brendel's pkgaccess keytab, which had been copied over from brendel to brouwer earlier for testing.
  5. Updaterpms could then obtain RPMs. However, it didn't install any because of a package conflict: the perl-AFS package was missing. This turned out to be because the stable release version of the pkg-master.h header specified the old version, which wasn't available, so the new one didn't get installed. The lcfg file was altered to mention the new perl-AFS version explicitly. The second wave of RPMs then installed cleanly.
  6. When the machine booted, refreshpkgs would not start. This turned out to be because brendel was missing its refreshpkgs keytab file. I made its directory (/var/lcfg/conf/refreshpkgs) and copied in the keytab, which luckily I had earlier copied to the test machine brouwer.
  7. A number of other files were also missing, but running om file configure created them.
  8. At this point a fix for the createrepo bug became available, so a new createrepo RPM was made and installed along with a version of the lcfg-refreshpkgs RPM in which freshenrpms called createrepo with the checkts option. This all worked.
  9. Test runs of updaterpms on other machines worked, and test submits of packages to buckets resulted in the correct metadata being generated for both updaterpms and yum, so the SL6 packages master server was then declared functional.
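
The mirroring failure in step 3 (files whose names start with .nfs never reaching the mirror) is the kind of problem that a simple comparison of file listings could catch before an upgrade. The following is only a rough sketch, not the tooling actually used: it assumes that both the master tree and the mirror tree can be read as local directories (for example over NFS or AFS), and the function names are made up for illustration.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root.

    os.walk reports every file name, so dot-files such as the
    ".nfs-..." header files from step 3 are included.
    """
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.add(os.path.relpath(full_path, root))
    return found


def missing_from_mirror(master_root, mirror_root):
    """Return a sorted list of files present on the master but not on the mirror."""
    return sorted(relative_files(master_root) - relative_files(mirror_root))
```

Calling missing_from_mirror() with the two directory paths returns the relative paths of any files that the mirror lacks; an empty list means that, by file name at least, the mirror is complete. It compares names only and does not check file contents.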

Stephen has made some suggestions:

  • We should test the entire service beforehand; we looked at refreshpkgs but didn't check that apache/waklog would work. Thankfully it did...
  • For future upgrades we might put all temporary overrides in the relevant live header rather than in the machine's lcfg file, to minimise confusion and mistakes when copying them over from a test machine to the real machine.
  • It's probably a good idea to manage the /var/lcfg/conf/refreshpkgs directory with the file component. This reduces the number of manual steps necessary to reinstall the machine.
  • A new keytab can be made using the ktadd command in the kadmin utility.

Chris observes: I had tested apache, but then partially dismantled the test setup so that it didn't work any more, since I thought I wouldn't be needing it again! It would certainly be a good idea in future to do, as a test, a successful complete machine installation from each of the test server and the DR server before going ahead with the reinstallation of the master package server. Another idea would be to work out the necessary settings and AFS permissions for configuring updaterpms to get its packages from AFS instead of HTTP, in case that is ever needed.
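
Short of a complete test installation, a quick check that each alternative package source actually answers HTTP requests would have caught the 404 and 403 problems described above. This is a minimal sketch under assumptions: the function names are invented for illustration, the base URLs and sample path are whatever the caller supplies, and a 200 response only shows that the file can be fetched, not that an installation would succeed.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for a GET of url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return None


def check_sources(base_urls, sample_path, fetch=http_status):
    """Map each base URL to the status obtained for sample_path beneath it."""
    results = {}
    for base in base_urls:
        url = base.rstrip("/") + "/" + sample_path.lstrip("/")
        results[base] = fetch(url)
    return results


def explain(status):
    """Turn a status from check_sources into a short hint."""
    if status is None:
        return "server unreachable"
    if status == 200:
        return "ok"
    if status == 403:
        return "forbidden: check Apache and AFS permissions"
    if status == 404:
        return "not found: check the path and mirror completeness"
    return "unexpected status"
```

The base URLs would be those of the test server and the DR server, and the sample path would be a file known to exist in a bucket on the master. The fetch argument is there so that the logic can be exercised without touching the network.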

-- ChrisCooke - 14 Dec 2012

Topic revision: r4 - 29 May 2014 - 15:13:28 - ChrisCooke
 