MPU Meeting Tuesday 4th August 2009

Power Management Project

The HP dc7900 has been tested and is now supported by lcfg-sleep. With the right sleep quirk it seems to suspend and resume impeccably.

Preparations are under way for the deployment of lcfg-sleep in one student lab:

  • sleep now cooperates with autoreboot: it refuses to suspend the machine if it detects that the root user is running the autoreboot component's favoured shutdown process.
  • sleep now cooperates with Condor: the maximum sleep period is reduced, and Condor is shut down before sleep and started again after waking (to get round Condor's own faulty response to sleep).
  • sleep now cooperates with exam lockdown: sleep is disabled when exam lockdown is detected.
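The three checks above can be sketched as a single decision function. This is only an illustration, not the lcfg-sleep implementation: the `HostState` and `SuspendPlan` types, the action strings and both sleep limits are hypothetical, and how each condition is actually detected is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed values for illustration only; the minutes do not give the real limits.
DEFAULT_MAX_SLEEP_MINUTES = 480
CONDOR_MAX_SLEEP_MINUTES = 60


@dataclass
class HostState:
    """Hypothetical snapshot of the conditions sleep looks at."""
    autoreboot_shutdown_running: bool = False  # root is running autoreboot's shutdown process
    exam_lockdown: bool = False                # exam lockdown has been detected
    condor_installed: bool = False             # the machine runs Condor


@dataclass
class SuspendPlan:
    """Hypothetical result: whether to suspend, and what to do around it."""
    allowed: bool
    reason: str = ""
    max_sleep_minutes: Optional[int] = None
    before_sleep: List[str] = field(default_factory=list)
    after_wake: List[str] = field(default_factory=list)


def plan_suspend(state: HostState) -> SuspendPlan:
    # Exam lockdown disables sleep altogether.
    if state.exam_lockdown:
        return SuspendPlan(False, "exam lockdown detected")
    # Never suspend while autoreboot is trying to shut the machine down.
    if state.autoreboot_shutdown_running:
        return SuspendPlan(False, "autoreboot shutdown in progress")

    plan = SuspendPlan(True, max_sleep_minutes=DEFAULT_MAX_SLEEP_MINUTES)
    if state.condor_installed:
        # Shorter sleep period, and Condor is stopped and restarted around the sleep.
        plan.max_sleep_minutes = min(plan.max_sleep_minutes, CONDOR_MAX_SLEEP_MINUTES)
        plan.before_sleep.append("stop condor")
        plan.after_wake.append("start condor")
    return plan
```

For example, `plan_suspend(HostState(condor_installed=True))` allows the suspend but with the reduced sleep period and the stop/start actions attached.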

rpmsubmit

No recent progress.

AFS Component

No recent progress. Documentation is pretty much all that's still to be done.

TiBS

Chris, Alison and Craig have been trying to finalise the list of exactly what maintains each config file and how. That's almost done. Once it is, the component can be adapted to maintain the remaining config files as appropriate.

The %config(noreplace) directive will be used as appropriate in the tibs 2402 RPMs once we have the tarball, some time after Alison's holiday.

LCFG Server Refactoring

Simon has imported the LCFG server code into git. The dependencies haven't yet been imported. The idea is that for the duration of the project any dependencies will be formed into a module, so that the server refactoring project can be standalone; afterwards they may get factored out again. Here's Simon's announcement.

Stephen has been working on the test suite, and has made interesting discoveries about LCFG's use of XML. Here's his description of the problem.

Virtual Servers

Alastair's talk will be on 19 August. It will constitute the project's final report.

Miscellaneous Development

Monitor client component
We need to monitor the client component to check for hangs. We could develop a whole suite of tests for this sort of problem - for example one could look for two boot components. Alastair will add this to the small projects list.
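As a rough idea of what one such test might look like, the sketch below flags any component that appears more than once in a list of running component names. It assumes the names have already been extracted from the process table by some other means; the function name and the input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicated_components(running: Iterable[str]) -> Dict[str, int]:
    """Return {component: count} for every component seen more than once.

    `running` holds one component name per running process (assumed input).
    """
    counts = Counter(running)
    return {name: n for name, n in counts.items() if n > 1}
```

With the input `["boot", "client", "boot"]` this reports `{"boot": 2}`, which is the "two boot components" case mentioned above.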

Operational

PXE Upgrade
Stephen has upgraded PXE, so we'll be able to use it with the HP dc7900 and with more server models than before too. Broadcom support still needs to be added though.

Routing problem
Alastair talked to George about this. George thought it might be to do with the routing configuration.

Cosign upgrade for lcfg.org
Not yet done. The current apacheconf nagios test is too dumb. It does SSL tests against the machine's apache "main host" rather than the virtual host in question. Stephen will reorganise the lcfg.org configuration somewhat to make www.lcfg.org the "main host". This will also simplify the Cosign configuration and make the Cosign upgrade easier.

DIY DICE server move
The DIY DICE servers are now on padua.

VirtualBox version of DIY DICE install instructions
Alastair has done this.

Cosign upgrade for Forum Tracker
Done.

Cosign upgrade for DIY DICE servers
Done.

lcfg-cron
The latest fix has been rolled out.

VMware performance
Alastair has been looking at this, in particular at why performance was so bad when the virtual servers were upgraded to SL5.3. He has concluded that the disks in the EVO arrays simply aren't fast enough. The arrays contain standard SATA disks; we should be using faster SAS drives. Fortunately it's possible to add expansion cabinets to the array and load these with SAS drives, to which we can then move the VMware disk blobs. We're getting a quote for this. We talked of ways to stagger updaterpms runs across the virtual servers and then remembered that this is actually a widespread problem, sometimes also affecting the desktops. The simplest solution seems to be to enhance lcfg-cron so that a job can be given a range of times and will run at one random time within that range; updaterpms could then be told to run at a random time outside the working day.
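The random-time idea could work along these lines. This is a sketch only, not lcfg-cron code: the function name and the 18:00 to 08:00 window are assumptions, and the window is allowed to wrap past midnight so that "outside the working day" can be expressed as a single range.

```python
import random
from datetime import time

MINUTES_PER_DAY = 24 * 60


def random_time_in_window(start: time, end: time, rng=random) -> time:
    """Pick one random minute in [start, end); the window may wrap past midnight."""
    start_min = start.hour * 60 + start.minute
    end_min = end.hour * 60 + end.minute
    # Window length in minutes; identical start and end means the whole day.
    length = (end_min - start_min) % MINUTES_PER_DAY or MINUTES_PER_DAY
    minute_of_day = (start_min + rng.randrange(length)) % MINUTES_PER_DAY
    return time(minute_of_day // 60, minute_of_day % 60)


# Hypothetical usage: run updaterpms some time between 18:00 and 08:00.
run_at = random_time_in_window(time(18, 0), time(8, 0))
```

Passing `rng=random.Random(hostname)` would give each machine a stable time of its own, which spreads the load across machines without the time changing from day to day; whether that is preferable to a fresh random time on each run is a design choice the minutes do not settle.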

Install CD default
We should change the default drive from hdc to sr0.

Kernel panic on cameleopard
Craig has reported seeing repeated kernel panics on cameleopard, a Linux AFS fileserver and mirror server. This is the machine which has recently been the most heavily used for rsyncing new user files. The panic seems to be something to do with ext3 filesystem lookups, but the problem could lie at any lower level; for instance, it could stem from a hardware fault.

GDM 745 freeze
Alastair's 745 has been freezing when gdm restarts. Stephen advised using a DVI cable and reporting to him if this didn't solve the problem. (Chris has since tried a DVI cable on his own 745 and it doesn't solve the problem.)

New Nvidia drivers
There are new Nvidia drivers out, including a new 185 series which will become the default. Stephen is bracing himself for the usual problems.

This Week

Alastair will:

  • Review project list and prioritise
  • Add new time-monitoring category to wiki
  • Get rpm master change details to Stephen
  • Get a quote for SAS drives and EVO expansion
  • rpmsubmit project
  • Tidy lcfg/inf level

Carol will:

  • be on support this week so no MPU work

Chris will:

  • Review project list and prioritise
  • TiBS project
  • Sleep project

Stephen will:

  • Review project list and prioritise
  • Server Refactoring project
  • Report on SL5.3 upgrade problems
  • Raise rpm master changes at LCFG Deployers Meeting DONE
  • Change the default install CD drive to sr0 DONE
  • Finish the PXE upgrade DONE
  • Reorganise the lcfg.org web configuration DONE

-- ChrisCooke - 5 August 2009

Topic revision: r2 - 07 Aug 2009 - 09:35:51 - StephenQuinney
 