MPU Meeting 12th December 2006

64 bit project

Stephen brought the fc5_64 packages back into line with the current fc5 and tidied up a few loose ends in the inf level headers.

Alastair and Stephen need to review the status of the project data on http://devproj.inf.ed.ac.uk before the Development meeting.

LCFG website

rsync.lcfg.org has been transferred to the upgraded dresden and made publicly readable (previously it was restricted to ed.ac.uk). There is now also a development version of the website, based on dresden, at http://devwww.lcfg.org/

It was discovered that there was not enough disk space on dresden to store all the packages, headers and install cdrom images. We will acquire a fibre channel card and attach dresden to the SAN. Stephen will contact Craig about getting some space allocated. The space does not need to be backed up as all the content for the website is either generated or already stored elsewhere.

A set of the current website templates, css, images and static content has been handed over to EUCS for them to consider what they could do for us.

Solaris improvement project

Alastair has not yet had a chance to look at the Solaris headers. Once he has, everyone will get together for a chat to decide the best plan of action. We think that we will all have a go at tackling the header problem together in January.

Chris looked at jumpstart. We agreed to stick with the current setup for now, and Chris will document how it all works.

FC5 upgrade

New PXE docs
Stephen still needs to do some team-internal documentation for the new PXE code.

bugzilla entries
The remaining fc5 upgrade bugzilla entries for MPU still need checking by all team members.

Operational

Email list of broken machines
Stephen has begun work on a new component - errorlog - which will run on the LCFG servers and log status changes (broken to fixed and fixed to broken). There will also be a cronjob on one machine to do a nightly check of the profiles and email the errors to the managers.
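As an illustration only, the status-change logging and nightly summary described above could be sketched as follows. This is not the errorlog component itself: the function names, the dictionary format (hostname mapped to a "broken" flag) and the report layout are all assumptions made for the example, and a host with no earlier record is assumed to have been working.

```python
from typing import Dict, List, Tuple


def status_changes(previous: Dict[str, bool],
                   current: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return (host, transition) pairs for hosts whose broken flag changed.

    Both arguments map a hostname to True if its profile is broken.
    Assumption: a host missing from `previous` was not broken before.
    """
    changes = []
    for host in sorted(current):
        was_broken = previous.get(host, False)
        is_broken = current[host]
        if was_broken != is_broken:
            transition = "fixed to broken" if is_broken else "broken to fixed"
            changes.append((host, transition))
    return changes


def nightly_report(current: Dict[str, bool]) -> str:
    """Build the text of a nightly summary listing the broken hosts."""
    broken = sorted(host for host, is_broken in current.items() if is_broken)
    if not broken:
        return "No broken machines."
    lines = ["Broken machines (%d):" % len(broken)]
    lines.extend("  " + host for host in broken)
    return "\n".join(lines)
```

The sketch only builds the list of transitions and the report text. Writing the log entries and mailing the report to the managers are left out, since the minutes do not say how either will be done.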

rsync to scunner
Neil noted that an rsync cronjob copying lcfg data from the autocheckout rsync module on achilles was failing as the groups web server had moved from scunner to stoater. Stephen fixed the access controls but was not sure why this cronjob was needed. We can almost certainly get rid of it once the new LCFG website has been launched.

Serial console problems
Alastair noted that the serial console problems on 2850, 860 and 1950 PowerEdge machines were likely to be caused by their IPMI support. Neil had managed a hacky fix to get the serial console working on a 2850, but this is not a good permanent solution.

FC5 Flakiness
There is a USB/UPS problem, another related to H323, another related to ip6table, and separate crashing/freezing problems.

USB/UPS problem
There is no fix for this problem in the kernel. The workaround is to use a USB-to-serial dongle. The only problem with this is that a UPS needs to be power cycled to switch from a USB to a serial connector.

H323 problem
George found that there was a bug related to this; see http://www.thisishull.net/archive/index.php/t-221483.html for details. Using a kernel with CONFIG_IP_NF_CT_ACCT turned off does seem to have made this particular bug go away.

Crashing/freezing machines
One major cause of this was tracked down to the Radeon graphics cards that are used in the Dell GX260s. We have now turned off in-kernel hardware acceleration (DRI) in the /etc/X11/xorg.conf configuration file. The GX260s may need rebooting to fully clear this kernel driver from memory. If problems persist we might need to remove the Radeon cards and use the on-board Intel card instead.

There are reports of some other "freezes" but these do not seem to be the same type of problem. The GX260s lock up so badly that nothing can be done, whereas the other machines still respond to alt-sysrq and, in several cases, allow root login on the console. These freezes appear to be related to some sort of network problem; it was noted that this has never been reported as happening on the KB subnet. We also noted that the recent problems have been exacerbated by the unrealistic resource requirements of certain undergraduate practicals. Many machines were not really frozen but were just failing to cope with the enormous processor and memory requirements.

LCFG Workshop
It was agreed that Alastair, Chris and Stephen would get together after the development meeting and put together some ideas for next week's workshop presentation.

This Week

Alastair will:

  • Look at Solaris headers
  • Test and document the installbase context
  • Upgrade pezenas to fc5

Stephen will:

  • Finish the errorlog component
  • List the achilles upgrade dependencies
  • Prepare the slides for the LCFG workshop presentation

Chris will:

  • Upgrade lcfg1/kipper to FC5
  • LISA reports
  • Complete tests of testing release
  • stable release
  • Internal documentation

-- StephenQuinney - 12 Dec 2006

Topic revision: r1 - 12 Dec 2006 - 16:19:54 - StephenQuinney
 