MPU Meeting Tuesday 16th February 2010

AFS Component

Toby has got the AFS KDC up and running so this week Stephen will transfer the AFS database functionality to it from the wheezing old FH AFS DB server.

We want telford to be an AFS fileserver, but it's 64-bit, which isn't properly supported by the Perl AFS module, and OpenAFS doesn't build 64-bit libraries with the correct option. So Stephen has written a helper module which does things in shell instead. It's very basic and just implements the functions he needed. Now, when his component calls the Perl module and the call fails, the helper module is used instead.
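The try-then-fall-back arrangement described above can be sketched roughly as follows. This is only an illustration in Python, not Stephen's Perl code: the function names, the volume names and the choice of ImportError as the failure mode are all hypothetical stand-ins.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    expected_errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Return primary(); if it raises an expected error, return fallback()."""
    try:
        return primary()
    except expected_errors:
        return fallback()


def list_volumes_via_module() -> list:
    # Hypothetical stand-in for the native binding that fails on 64-bit hosts.
    raise ImportError("native AFS binding unavailable on this platform")


def list_volumes_via_shell() -> list:
    # Hypothetical stand-in for the helper module. A real helper would run a
    # command and parse its output; fixed data keeps the sketch self-contained.
    return ["vol.a", "vol.b"]


volumes = call_with_fallback(list_volumes_via_module, list_volumes_via_shell)
```

Because the fallback only has to cover the calls that are actually made, the helper can stay as small as the text describes.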

Stephen found a bug in his component: it wasn't correctly setting up the configuration on a brand new machine.

LCFG Server Refactoring

Command line option handling has been rewritten.

The code now passes perlcritic level 4.

The server component has been extracted into subversion. It will now be built separately from the mkxprof part. This makes it easier to build each part in the most appropriate way.

Server Hardware

The toohot script has gone out to all appropriate servers. No problems showed up on develop servers but when it hit stable some servers mailed their managers a report on every run of toohot, which wasn't ideal. A redirection to /dev/null has been added.

Installroot

Alastair has talked with IS about a possible joint replacement for PIE and our install technology. IS say that PIE will work as-is on Windows 7, and anyway they wouldn't have time to redevelop it. In the long run they expect to replace PIE with a Microsoft solution and run Linux in virtual machines. They suggest that people who don't want to use the Microsoft solution could either use our technology or could maybe use Kickstart.

Alastair is interested in the Kickstart idea, which has definite possibilities, but there's currently not enough effort available to investigate it. For now (and for the Fedora 12 port) he'll carry on with the current installroot improvements.

Stephen suggested also taking a look at Cobbler, an open source installation management technology which already interfaces with other configuration management solutions, for example Puppet.

There's now an entry in the wee projects list for the Kickstart and Cobbler evaluations.

F12

Chris's Fedora install went wrong in all sorts of ways (blog). Stephen also had trouble (blog); from memory he had these tips on the Fedora installer:

  • Set the LDAP details in the Authorisation tab, not in the Authentication tab.
  • The first time you log in it waits for your AFS homedir for a minute or so, then gives up and makes a local one. It's fine subsequently.

Chris had misunderstood the package building - it should merely be necessary to rebuild existing source RPMs on the new platform. If they don't rebuild & reinstall, file a bug.

Miscellaneous Development

RAID configuration tests
Alastair has tried some performance tests with VMware Server and different RAID configurations. He simulated a heavy load by doing five simultaneous VMware server client installs then five simultaneous upgrades. He got an odd finding: pre-created disks are about 30% slower than on-the-fly disks for installs. For upgrades they're about 30% faster. This is with 3 RAID 1 arrays striping with RAID 0. He's next going to try storing VM disks on physical devices. However this has uncovered a problem with sysinit scripts: Software RAID starts before multipath. Our disks are multipath so software RAID doesn't see them. Another problem: trying this on our ATAbeast, there are 26 useable disks, which would be 13 RAID 1 arrays. Each of these, together with LUN masking and so on, would need to be configured completely by hand via the awkward web interface: both labour-intensive and highly prone to error. There must be a better way.

Bugs fixed in (stable) LCFG Server
Stephen has fixed bug 193 (dumpdeps only runs as root) and bug 115 (mangling of servername resource) in the stable LCFG Server. He has also back-ported the release level changes from the development tree: it should no longer be possible to have a looping release level change.

Upstart
Alastair has looked at this. The configuration seems to be conflated with the code! There doesn't seem to be a nice way to configure Upstart from LCFG. Simon says there are Upstart command line configuration tools so this might be worth investigating. For now though, one way forward might be to let Upstart take care of those things which are already started using the new dependency-based mechanism, then for the remaining old-style rc-based scripts, to throw away Upstart's compatibility mode and instead use the boot component to configure them. It's all quite messy. Alastair will look at how it's done in Ubuntu, and will check out more recent versions of Upstart. This work now seems to be a higher priority than the install redevelopment project.

Network component
Stephen is fed up with, and will fix, the bug whereby the network component doesn't remove configuration for interfaces whose LCFG configuration has been removed. He suggests it should keep track of what interfaces it has at one time or other been asked to configure, and remove configuration for those which are not currently being configured by LCFG. This would be safer than the simpler option of removing the configuration for all interfaces which it is not currently being asked to configure, as it is sometimes necessary to configure an interface by other means, particularly in emergency situations. While he's there he will also apply Panos's patch for the network component.
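The bookkeeping Stephen suggests amounts to a set difference against a remembered history. The following is a minimal sketch of that idea, not the network component's actual code. The function name and the interface names are hypothetical, and how the history would be stored between runs is left open.

```python
from typing import Iterable, Set, Tuple


def plan_cleanup(
    ever_managed: Iterable[str], currently_managed: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Work out which interface configurations are safe to remove.

    ever_managed: interfaces the component has been asked to configure
        at any time in the past (its remembered history).
    currently_managed: interfaces it is being asked to configure now.

    Returns (stale, history): stale holds interfaces that were managed
    once but are not any more; history is the updated record to keep.
    """
    current = set(currently_managed)
    history = set(ever_managed) | current
    stale = history - current
    return stale, history


# eth1 was dropped from the LCFG configuration. eth2 was set up by hand in
# an emergency, so it was never in the history and is never marked stale.
stale, history = plan_cleanup({"eth0", "eth1"}, {"eth0"})
```

Under the simpler option, everything present on the host but absent from the current configuration would be removed, including the hand-configured eth2.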

Operational

DICE level repository configuration
The new arrangement is now in testing.

VMware server
Alastair has tested VMware server with SL 5.4. Clients/guests run well enough, but there's a problem with the web interface: it's less reliable than with SL 5.3. It's bad enough with 5.3, so we're going to have to hold the VMware host servers back at 5.3, and look hard for a replacement for VMware server.

SL 5.4
It hits stable this week, and users should be warned. Stephen will talk this over with Alison.

dresden disk space
Stephen has got this sorted out.

telford disk space
This is in progress.

telford is now an AFS fileserver

Fedora 12 mirror
Stephen has made a local Fedora 12 mirror.

AFS cache size
Stephen will tune down the AFS cache size: it's currently a bit too big. It should be set so that the cache partition is no more than 90% full.
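The 90% rule is simple arithmetic. Here is a small sketch, assuming the cache size is expressed in 1 KiB blocks and the partition size is known in the same unit. The function name and the 1 GiB example partition are illustrative only, not telford's real figures.

```python
def max_cache_blocks(partition_kib: int, fill_percent: int = 90) -> int:
    """Largest cache size (in 1 KiB blocks) that keeps the cache partition
    at or below fill_percent of its capacity."""
    if partition_kib < 0:
        raise ValueError("partition size cannot be negative")
    if not 0 < fill_percent <= 100:
        raise ValueError("fill_percent must be between 1 and 100")
    # Integer arithmetic avoids floating-point rounding on large partitions.
    return partition_kib * fill_percent // 100


# Hypothetical 1 GiB cache partition, i.e. 1048576 KiB.
limit = max_cache_blocks(1024 * 1024)
```

For that example partition the limit works out at 943718 blocks.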

FH to AT move
Stephen did this. mousa and split are now ensconced in the AT basement. Both currently use 5m ethernet cables. These should be switched for 4m cables when some arrive. Stephen has added ethernet bonding to the machines so the cables can simply be swapped one at a time. mousa's old UPS is now in the Forum awaiting deployment elsewhere.

pkgwrite access for AFS pkgs tree
Alastair has done this but hasn't tested it yet.

dice-orders fixed
by Tim.

File configure after updaterpms
We have an action from the Operational meeting to consider the implications of periodically calling the configure method of the file component, such as after running updaterpms. We should all consider this.

Next Meeting

The next meeting will be held on Tuesday 23rd February.

This Week

Alastair will:

  • Think about MPU logging requirements
  • Talk to George about routing problems
  • RAID testing
  • Consider periodic file configures
  • Add refreshpkgs to telford
  • Hold VMware host servers at 5.3
  • Look at upstart in ubuntu

Chris will:

  • Consider periodic file configures
  • Think about MPU logging requirements
  • F12

Stephen will:

  • Set up mirror cron job with keytab
  • Consider periodic file configures
  • Think about MPU logging requirements
  • Server refactoring
  • telford disk space
  • Release testing

-- ChrisCooke 17 Feb 2010

Topic revision: r3 - 19 Feb 2010 - 09:34:52 - ChrisCooke
 