Infrastructure Unit SL6 Server Upgrade Plans

"Non-Inf-Unit" Things

  • cosign client-side believed done
  • rfe server-side needs final testing and sign-off (this may block some upgrades)

Network Servers

  • Is multi-home/DHCP/PXE working yet? "march" went OK...
  • Tnm already "ported", but need to build and test pcScan
    • Ideally revisit packaging, particularly handling of 64-bit issues
  • Track rrd-users to see how tcl bindings issues are resolved
    • ... but will want to use our own build anyway, per dice/options/rrdtool.h
    • Our fix probably went into upstream, but unlikely to appear in SL any time soon
  • Can we just run with the existing (32-bit) rrd files, or do we have to dump/restore?
    • NO; we do have to dump/restore
    • Create a tool which does an rrdtool dump of everything, for use in a nightly cron job
      • Done: runs nightly from cron on the netinf machines
      • Nameservers now dump/compress, and the graphing machine(s) rsync the .rrd.xml.gz files and unpack them, rather than copying raw rrd files around. (They even know how to detect which version of the rrd files they're using, and set the graphs up accordingly.)
      • We still have to decide whether to try to add the additional DSes or reinitialise from scratch. Maybe just appending a standard (empty) bunch of xml to the dumped files would be enough? Check the mailing list.
      • (Old rrds have been moved sideways to ../rrd-archive/)
    • Create another tool to re-import en masse as needed.
  • Deprecate fstab entries in netinf.h; partition explicitly in machine profiles now. See march for an example
  • Repartition abbado? Re-RAID??
    • Will need to rsync /disk/home off and back on again, so as to preserve all its contents
    • Essential to preserve /disk/home on all other upgrades (for rrd files and network configuration)
  • Haven't thoroughly tested 64-bit quagga yet, but it seems to run fine on march
    • Now testing 0.99.20.1 (and also on SL5)
  • S1 NTP; and can a GX260 cope with DICE SL6?
  • DHCP
    • Remember to move the leases database when the service is moved
  • Remember to allow time for service changes to propagate
    • DHCP-provided
    • VPN endpoints (via DNS?)
    • move-wait-upgrade-return
  • Move remaining non-wire-B machines to have VLAN 64 untagged and all others tagged
    • AT and KB (Forum already OK?)
    • May need corresponding header changes and generalisations
    • Plan how best to cope with the site configuration machines!
  • OpenVPN appears to run fine as 64-bit client
  • Wake service??
  • rfe server-side hasn't been looked at yet
    • test rfe server itself
    • port ldap sync code to new buildtools (done but not tested)
  • Are we going to get any replacement hardware??
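
The rrd dump and re-import tools mentioned above could look roughly like the following. This is a minimal sketch, not the tool actually deployed on the netinf machines: the function names, the directory layout and the injectable `run` parameter are all assumptions made for illustration. It relies only on the standard `rrdtool dump` and `rrdtool restore` subcommands and on the `.rrd.xml.gz` naming used above.

```python
import gzip
import subprocess
import tempfile
from pathlib import Path


def dump_path_for(rrd: Path, src_root: Path, dump_root: Path) -> Path:
    """Map src_root/a/b.rrd to dump_root/a/b.rrd.xml.gz (hypothetical layout)."""
    rel = rrd.relative_to(src_root)
    return dump_root / rel.parent / (rel.name + ".xml.gz")


def dump_all(src_root: Path, dump_root: Path, run=subprocess.run) -> list[Path]:
    """Dump every .rrd under src_root to a gzip-compressed XML file."""
    written = []
    for rrd in sorted(src_root.rglob("*.rrd")):
        out = dump_path_for(rrd, src_root, dump_root)
        out.parent.mkdir(parents=True, exist_ok=True)
        # "rrdtool dump FILE" writes the portable XML form to stdout.
        result = run(["rrdtool", "dump", str(rrd)], check=True, capture_output=True)
        with gzip.open(out, "wb") as fh:
            fh.write(result.stdout)
        written.append(out)
    return written


def restore_all(dump_root: Path, dest_root: Path, run=subprocess.run) -> list[Path]:
    """Rebuild native .rrd files from every .rrd.xml.gz under dump_root."""
    restored = []
    for dumped in sorted(dump_root.rglob("*.rrd.xml.gz")):
        rel = dumped.relative_to(dump_root)
        target = dest_root / rel.parent / rel.name[: -len(".xml.gz")]
        target.parent.mkdir(parents=True, exist_ok=True)
        with tempfile.NamedTemporaryFile(suffix=".xml") as tmp:
            tmp.write(gzip.decompress(dumped.read_bytes()))
            tmp.flush()
            # "rrdtool restore XML RRD" recreates the file in the local binary format.
            run(["rrdtool", "restore", tmp.name, str(target)], check=True)
        restored.append(target)
    return restored
```

A nightly cron job would call `dump_all()` on the 32-bit side; after the upgrade, `restore_all()` would be run once on the 64-bit side. Note that `rrdtool restore` will not overwrite an existing file unless asked to, so restoring into an empty directory is the simplest approach.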

  • ToDo at Forum:
    • harnoncourt - primary router, timeserver
    • abbado - network configuration and monitoring, secondary router, DNS master, backup site DNS, DHCP (†), timeserver, rfemaps master
    • hickox - site DNS, OpenVPN, secondary router
    • linnaeus - external DNS, external timeserver, secondary router, has faulty memory
    • march - gdmr's desktop/development machine, secondary router
    • Order: linnaeus, harnoncourt, hickox, abbado?

  • ToDo at AT:
    • kubelik - primary router
    • jarvi - network configuration and monitoring, secondary router, backup site DNS, DHCP (†), timeserver
    • ancerl - site DNS, OpenVPN, secondary router
    • darwin - external DNS, external timeserver, secondary router
    • Order: darwin, kubelik, ancerl, jarvi?

  • ToDo at KB:
    • kleiber - primary router, network configuration and monitoring, site DNS, DHCP (†), timeserver
    • wallace - external DNS, secondary router
    • Order: wallace, kleiber
    • Hoping for a KB netServ machine...

(†) DHCP service has been ported and tested on SL6 - see Console Servers below.

  • Who: Ian with George
  • What: external nameservers, primary routers, network services, in that order. Then pause for thought.

Console Servers

  • (Although KB console server still needs to be actually upgraded)
    Component ported; latest version of conserver patched, built and installed for SL6 (and SL6_64). The necessary IPMI software for SL6 has been arranged. Previous experience of OS upgrades suggests that this should now just be a case of actually doing it.
    • ventura (self-managed) and verte (AT) reinstalled as SL6_64 on 13.2.12. The latter showed up problems with the DHCP server on SL6 - fixes now tested and will be in stable release of Wed 22.2.2012.
      marriner (Forum) reinstalled as SL6_64 on 24.2.12. Coincidentally (?), srslc05.f.net.inf.ed.ac.uk failed and was replaced by srslc07.f.net.inf.ed.ac.uk.
    • To-do: roujan (KB). Can this machine run 64-bit?
    • Hoping for a new netServ machine for KB. If we get that, run the KB consoles from it too.

  • Who: Ian
  • What: Nothing for now, pending netServ machine for KB

Monitoring (nagios) Servers

  • (Although secondary server can't actually be upgraded to SL6 until Jabber service is converted)
    Wait for DevProj #125 (https://devproj.inf.ed.ac.uk/project/show/125)

  • Who: Ian
  • What: Look at jabber, with a view to upgrading curlew

Logging

  • loghost (tycho) is already running SL6_64

Authentication (Kerberos, cosign) Servers

  • DevProj #223
  • Also DevProj #168
  • KDCs (INF and iFriend)
  • wallet - noting that this service runs on the master KDC. Main thing is to save entire contents of /var/wallet, and to recreate the controlling database from its dump (sqlite3 .read <dumpfile>) to deal with any 32-bit->64-bit issues.
  • KCAs
  • SIXKTS - and move this service (which runs on a single machine; i.e. which is not duplicated) to a machine in the Informatics Forum
  • cosign
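
The wallet database step above (dump on the old machine, recreate from the dump on the new one) can be sketched with Python's standard `sqlite3` module, as an alternative to the `sqlite3` command-line `.dump` and `.read` commands. This is an illustrative sketch only: the function names and file paths are hypothetical, and nothing here is taken from the wallet software itself.

```python
import sqlite3


def dump_database(db_path: str, dump_path: str) -> None:
    """Write the whole database out as SQL text, independent of machine word size."""
    conn = sqlite3.connect(db_path)
    try:
        with open(dump_path, "w", encoding="utf-8") as fh:
            # iterdump() yields the same kind of SQL the CLI ".dump" command produces.
            for statement in conn.iterdump():
                fh.write(statement + "\n")
    finally:
        conn.close()


def recreate_database(dump_path: str, new_db_path: str) -> None:
    """Build a fresh database file from a SQL dump (the equivalent of ".read")."""
    conn = sqlite3.connect(new_db_path)
    try:
        with open(dump_path, encoding="utf-8") as fh:
            conn.executescript(fh.read())
        conn.commit()
    finally:
        conn.close()
```

The dump would be taken on the old master KDC before the reinstall and kept alongside the saved contents of /var/wallet; `recreate_database()` would then be run on the new SL6 machine against an empty target file.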

  • What should we do about logging?
    • Just carry on with logs on individual machines?
      • Not so easy to trawl later
    • Use the loghost?
      • Do we want auth data to go there?
    • Set up our own auth loghost?
      • Straightforward to set up, but one more thing to bother about later

  • Who: Toby and George
  • What: Bring up VMs to test the various services, noting that the KDC will likely need component enhancements for the changed/additional functionality in 1.9

Directory (OpenLDAP) servers

  • DevProj #181
  • We need to decide:
    • which version of openldap (2.4.30?) => 2.4.30
    • which version of bdb (4.8.30?) => 4.8.30
  • Note that Prometheus DR machine is KB LDAP slave
    • Ideally we shouldn't do anything to prevent us running Prometheus on any of the LDAP slaves if necessary
    • ... hence we must be confident that we can take Prometheus along

  • Who: Ian with Toby
  • What: Upgrade and test one of the slaves (not KB); then another (not KB). Pause for thought.
  • Current state: (26.4.12, idurkacz) mckinley and blackwell (Forum and AT slaves respectively) have been upgraded to SL6_64. (Note: 64-bit.) Pausing for thought! Not sure about the Prometheus implications mentioned above ...

Account Management (Prometheus) Servers

  • Note that Prometheus DR machine is KB LDAP slave
  • Need to have 64-bit issues sorted out - specifically that we require perl-AFS but it can't currently be built on 64-bit platforms
  • We need to decide whether we wait for perl-AFS or install Prometheus as 32-bit SL6
  • Recommendation that we use stock SL6 perl-Moose (1.15) with a view to investigating the latest version (currently 2.004). Moving to the latest version will require some changes in the Prometheus framework, as at least one part of the API which we use has now disappeared.
  • Prometheus tests on SL6_64 all run successfully (with a few minor source changes), apart from AFS ones.

  • Who: Toby with George
  • What: Packaging first?

-- GeorgeRoss - 02 Dec 2011

Topic revision: r41 - 10 Aug 2012 - 15:21:04 - IanDurkacz
 