MPU Meeting Tuesday 4th August 2009
Power Management Project
The HP dc7900 has been tested and is now supported by lcfg-sleep. With the right sleep quirk it seems to suspend and resume impeccably.
Preparations are under way for the deployment of lcfg-sleep in one student lab:
- sleep now cooperates with autoreboot: it refuses to suspend the machine if it detects that the root user is running the autoreboot component's favoured shutdown process.
- sleep now cooperates with Condor: the maximum sleep period is reduced, and Condor is shut down before sleep and restarted after waking (to work around Condor's own faulty response to sleep).
- sleep now cooperates with exam lockdown: sleep is disabled when exam lockdown is detected.
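The three checks above can be summarised as a single decision, sketched here in Python. This is only an illustration of the logic described in these notes, not the lcfg-sleep code: the names `HostState`, `SleepPlan` and `plan_sleep`, and both sleep-period values, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; the real limits are not given in these notes.
DEFAULT_MAX_SLEEP_MINUTES = 480
CONDOR_MAX_SLEEP_MINUTES = 60


@dataclass
class HostState:
    autoreboot_shutdown_running: bool  # root is running autoreboot's shutdown process
    exam_lockdown: bool                # exam lockdown has been detected
    condor_present: bool               # the machine runs Condor


@dataclass
class SleepPlan:
    allow: bool
    max_sleep_minutes: int = 0
    stop_condor_before: bool = False
    start_condor_after: bool = False
    reason: str = ""


def plan_sleep(state: HostState) -> SleepPlan:
    """Decide whether the machine may suspend, and on what terms."""
    # Never suspend while autoreboot is shutting the machine down.
    if state.autoreboot_shutdown_running:
        return SleepPlan(allow=False, reason="autoreboot shutdown in progress")
    # Sleep is disabled for the duration of an exam lockdown.
    if state.exam_lockdown:
        return SleepPlan(allow=False, reason="exam lockdown")
    # With Condor: sleep for a shorter period, stop it first, start it afterwards.
    if state.condor_present:
        return SleepPlan(
            allow=True,
            max_sleep_minutes=CONDOR_MAX_SLEEP_MINUTES,
            stop_condor_before=True,
            start_condor_after=True,
            reason="Condor host",
        )
    return SleepPlan(allow=True, max_sleep_minutes=DEFAULT_MAX_SLEEP_MINUTES, reason="normal")
```

Putting the two refusal checks first means a Condor machine under exam lockdown is never suspended.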
rpmsubmit
No recent progress.
AFS Component
No recent progress. Documentation is pretty much all that's still to be done.
TiBS
Chris, Alison and Craig have been trying to finalise the list of exactly what maintains each config file and how. That's almost done. At that point the component can be adapted to maintain the remaining config files as appropriate.
The %config(noreplace) directive will be used as appropriate in the tibs 2402 RPMs once we have the tarball, some time after Alison's holiday.
LCFG Server Refactoring
Simon has imported the LCFG server code into git. The dependencies haven't yet been imported. The idea is that for the duration of the project any dependencies will be formed into a module, so that the server refactoring project can be standalone; afterwards they may get factored out again.
Here's Simon's announcement.
Stephen has been working on the test suite, and has made interesting discoveries about LCFG's use of XML.
Here's his description of the problem.
Virtual Servers
Alastair's talk will be on 19 August. It will constitute the project's final report.
Miscellaneous Development
- Monitor client component
- We need to monitor the client component to check for hangs. We could develop a whole suite of tests for this sort of problem - for example one could look for two boot components. Alastair will add this to the small projects list.
Operational
- PXE Upgrade
- Stephen has upgraded PXE, so we'll be able to use it with the HP dc7900 and with more server models than before. Broadcom support still needs to be added, though.
- Routing problem
- Alastair talked to George about this. George thought it might be to do with the routing configuration.
- Cosign upgrade for lcfg.org
- Not yet done. The current apacheconf nagios test is too dumb. It does SSL tests against the machine's apache "main host" rather than the virtual host in question. Stephen will reorganise the lcfg.org configuration somewhat to make www.lcfg.org the "main host". This will also simplify the Cosign configuration and make the Cosign upgrade easier.
- DIY DICE server move
- The DIY DICE servers are now on padua.
- VirtualBox version of DIY DICE install instructions
- Alastair has done this.
- Cosign upgrade for Forum Tracker
- Done.
- Cosign upgrade for DIY DICE servers
- Done.
- lcfg-cron
- The latest fix has been rolled out.
- VMware performance
- Alastair has been looking at this, in particular at why performance was so bad when the virtual servers were upgraded to SL5.3. He has concluded that the disks in the EVO arrays simply aren't fast enough: the arrays contain standard SATA disks, and we should be using faster SAS drives. Fortunately it's possible to add expansion cabinets to the array and load these with SAS drives, to which we can then move the VMware disk blobs. We're getting a quote for this. We talked of ways to stagger updaterpms runs across the virtual servers, and then remembered that this is actually a widespread problem, sometimes also affecting the desktops. The simplest solution seems to be to enhance lcfg-cron so that a job can be given a range of times and will run at one random time within that range; updaterpms could then be told to run at a random time outside the working day.
- Install CD default
- We should change the default drive from hdc to sr0.
- Kernel panic on cameleopard
- Craig has reported seeing repeated kernel panics on cameleopard, a Linux AFS fileserver and mirror server. This is the machine which has recently been the most heavily used for rsyncing new user files. The panic seems to be related to ext3 filesystem lookups, but the problem could lie at any lower level; for instance, it could stem from a hardware fault.
- GDM 745 freeze
- Alastair's 745 has been freezing when gdm restarts. Stephen advised using a DVI cable and reporting to him if this didn't solve the problem. (Chris has since tried a DVI cable on his own 745, and it doesn't solve the problem.)
- New nvidia drivers
- There are new Nvidia drivers out, including a new 185 series which will become the default. Stephen is bracing himself for the usual problems.
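The lcfg-cron enhancement discussed under VMware performance (run a job at one random time within a specified range) could be sketched as follows. This is a minimal Python illustration of the idea only, not lcfg-cron code: the function name `random_time_in_window`, the one-minute granularity and the 18:00 to 08:00 window are assumptions.

```python
import random
from datetime import time
from typing import Optional

MINUTES_PER_DAY = 24 * 60


def random_time_in_window(start: time, end: time,
                          rng: Optional[random.Random] = None) -> time:
    """Pick a uniformly random minute in [start, end).

    If end is not later than start, the window wraps past midnight,
    e.g. 18:00 to 08:00 covers the evening and the early morning.
    """
    rng = rng or random.Random()
    start_min = start.hour * 60 + start.minute
    end_min = end.hour * 60 + end.minute
    # Window length in minutes; equal start and end is treated as a full day.
    length = (end_min - start_min) % MINUTES_PER_DAY or MINUTES_PER_DAY
    chosen = (start_min + rng.randrange(length)) % MINUTES_PER_DAY
    return time(chosen // 60, chosen % 60)


if __name__ == "__main__":
    # Assumed window: outside a working day that ends at 18:00 and starts at 08:00.
    print(random_time_in_window(time(18, 0), time(8, 0)))
```

If each machine picks its own time independently, the updaterpms runs are spread across the window rather than all starting together. Seeding the generator per machine (for instance from the hostname) would keep each machine's time stable between runs; that is a design option, not something decided at the meeting.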
This Week
Alastair will:
- Review project list and prioritise
- Add new time-monitoring category to wiki
- Get rpm master change details to Stephen
- Get a quote for SAS drives and EVO expansion
- rpmsubmit project
- Tidy lcfg/inf level
Carol will:
- be on support this week so no MPU work
Chris will:
- Review project list and prioritise
- TiBS project
- Sleep project
Stephen will:
- Review project list and prioritise
- Server Refactoring project
- Report on SL5.3 upgrade problems
- Raise rpm master changes at LCFG Deployers Meeting
- Change the default install CD drive to sr0
- Finish the PXE upgrade
- Reorganise the lcfg.org web configuration
--
ChrisCooke - 5 August 2009