MPU Meeting Tuesday 7th June 2011
SL6
Not much happening with this project now. All essential packages must
be in the stable package buckets by the end of Friday 10th June so we
can make the first SL6 stable release on Wednesday 15th June.
AFS automation project
The script to promote read-only copies of lost volumes has now been
finished and manually tested. Chris would still like to add some
code-level testing as well.
The next task on the enhancements list is "Script to automate distribution of volumes across servers". There were some thoughts that the script could balance volumes across the servers based on space requirements and speed requirements for busy volumes. This could be a rather complex problem to solve, involving many factors. There is a tool on Russ Allbery's website which can do something like this using a "linear programming optimizer".
We wondered whether it might be much simpler to do the balancing just based on space requirements with some human input regarding what types
of storage arrays were appropriate for certain volumes. For instance, we could manually manage a list of busy volumes which must be on
RAID10 SAS disk arrays. It may also be that certain volumes have to stay on certain servers.
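The simpler space-only approach could look roughly like the sketch below. It is only an illustration under assumptions, not the planned script: the server names, volume names, storage labels and the largest-first greedy heuristic are all hypothetical, and no AFS commands are invoked. Pinned volumes are placed first, busy volumes are restricted to servers with the fast storage label, and everything else goes to whichever server would be left with the largest fraction of free space.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    storage: str            # storage label, e.g. "raid10-sas" (hypothetical)
    capacity_gb: float
    used_gb: float = 0.0
    volumes: list = field(default_factory=list)

    def free_fraction_after(self, size_gb):
        """Fraction of capacity left free if a volume of size_gb were added."""
        return (self.capacity_gb - self.used_gb - size_gb) / self.capacity_gb


def balance(volumes, servers, busy=frozenset(), pinned=None,
            fast_storage="raid10-sas"):
    """Greedy placement by space only.

    volumes: dict of volume name -> size in GB
    busy:    manually managed set of volumes that must be on fast storage
    pinned:  dict of volume name -> server name for volumes that must not move
    Returns a dict of volume name -> server name.
    """
    pinned = pinned or {}
    by_name = {s.name: s for s in servers}
    placement = {}

    def place(vol, server):
        size = volumes[vol]
        if server.used_gb + size > server.capacity_gb:
            raise ValueError(f"{vol} does not fit on {server.name}")
        server.used_gb += size
        server.volumes.append(vol)
        placement[vol] = server.name

    # Pinned volumes go first; the human-maintained list is trusted as-is,
    # so no storage-label check is applied to them.
    for vol, server_name in pinned.items():
        place(vol, by_name[server_name])

    # Place the rest largest first, each on the eligible server that
    # would be left with the most free space (as a fraction of capacity).
    rest = sorted((v for v in volumes if v not in pinned),
                  key=lambda v: volumes[v], reverse=True)
    for vol in rest:
        size = volumes[vol]
        candidates = [s for s in servers
                      if (vol not in busy or s.storage == fast_storage)
                      and s.used_gb + size <= s.capacity_gb]
        if not candidates:
            raise ValueError(f"no suitable server for {vol}")
        place(vol, max(candidates, key=lambda s: s.free_fraction_after(size)))
    return placement


# Made-up example data.
servers = [Server("a", "raid10-sas", 100),
           Server("b", "sata", 100),
           Server("c", "sata", 100)]
volumes = {"home": 40, "busyvol": 30, "pinnedvol": 20, "scratch": 10}
placement = balance(volumes, servers,
                    busy={"busyvol"}, pinned={"pinnedvol": "b"})
```

A greedy pass like this does not guarantee an optimal layout, but it keeps the human input (the busy list and the pinned list) explicit and easy to audit.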
As part of this it would be good to have a database of usage statistics so that we can see how busy a volume is over time and the
rate at which space is being consumed.
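One possible shape for such a statistics store is sketched below using SQLite. The table name, column names and sampling interval are assumptions made for illustration; how the samples would be collected from the file servers is not shown.

```python
import sqlite3

# Hypothetical schema: one row per volume per sample time.
SCHEMA = """
CREATE TABLE IF NOT EXISTS volume_usage (
    volume     TEXT    NOT NULL,
    sampled_at INTEGER NOT NULL,   -- Unix time in seconds
    used_kb    INTEGER NOT NULL,   -- space used at sample time
    accesses   INTEGER NOT NULL,   -- access count for the sample period
    PRIMARY KEY (volume, sampled_at)
)
"""


def record(conn, volume, sampled_at, used_kb, accesses):
    """Store one usage sample for a volume."""
    conn.execute("INSERT INTO volume_usage VALUES (?, ?, ?, ?)",
                 (volume, sampled_at, used_kb, accesses))


def growth_kb_per_day(conn, volume):
    """Average growth between the first and last samples, in KB per day.

    Returns None if there are fewer than two samples.
    """
    rows = conn.execute(
        "SELECT sampled_at, used_kb FROM volume_usage "
        "WHERE volume = ? ORDER BY sampled_at", (volume,)).fetchall()
    if len(rows) < 2:
        return None
    (t0, u0), (t1, u1) = rows[0], rows[-1]
    return (u1 - u0) * 86400 / (t1 - t0)


# Made-up samples, one per day, for a hypothetical volume.
DAY = 86400
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record(conn, "user.example", 0 * DAY, 1000, 12)
record(conn, "user.example", 1 * DAY, 1500, 40)
record(conn, "user.example", 2 * DAY, 2000, 35)
rate = growth_kb_per_day(conn, "user.example")
```

With the sample data above the volume grows by 1000 KB over two days, so `rate` is 500.0 KB per day.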
LCFG Server Refactoring
All the outstanding patches stored in gerrit.inf have been reviewed by
Simon and have now been submitted. Not much needed changing; the main
work was creating a sub-class of the Module::Build class to improve
handling of various file types.
The LCFG server component was split off and is stored in the LCFG
subversion repository. This needs to be updated for the new build tools
and tested.
It will be good to get the server daemon running for a while with live
data to see how well it functions. We might need to limit the number
of profiles being processed to avoid memory leaks.
The next development stage is to work on the patch for the Safe mode
usage. There is a patch but it hasn't been thoroughly tested or
submitted for review yet.
Miscellaneous Development
- sleep
- We have decided to fix a particular version of SL5 and not continue to do updates on that platform. Some of the DICE policy has been moved to the LCFG layer to make the sleep component more useful by default. The blacklist support which was discussed at the LCFG Deployers Meeting has not yet been added. Chris wants to test the SL6 sleep support for the Dell 755 with different "quirks" to see if it can be made to sleep properly.
- fstab
- At the LCFG Deployers meeting we discussed making SCSI disks the default for SL6, as currently it's using the old IDE hda devices, which make no sense with modern kernels. We also need to update the default partition sizes and provide macros for altering the sizes in a standard way.
Operational
- SL5 kernel
- There is a new SL5 kernel (2.6.18-238.12.1.el5). Is it time for us to be upgrading our SL5 DICE machines?
- DL180
- We now have an HP DL180 for MPU usage so fantoosh can go back to the Services Unit.
- Dell Optiplex 790
- We will be buying the Dell Optiplex 790 this year for our standard desktop. There is still a question over which processor we should buy.
- ssh firewall holes
- Stephen suggested that DICE desktop machines which have ssh firewall holes should have a header added to their LCFG profile (e.g. dice/options/desktop-ssh-access.h) which, as well as creating the firewall holes, would add fail2ban and do anything else we feel is necessary to tighten up access restrictions. This would make it much easier to track desktop machines which have ssh firewall holes, and in the event of an emergency we could close them all much more easily.
This Week
- Alastair
- Split off NaturalDocs work from SL6 into mini devel project
- Propose Simple KVM service project
- Start looking at fstab miniproj
- Discuss MPU taking over mailcap and alias components with RAT and services respectively. Agreed.
- Discuss IPV6 under SL6 with RAT - can we put it back?
- purchase disks for northern and metropolitan
- Chris
- Talk to Craig re what is possible in timescales wrt volume balancing tool
- Investigate and rectify "false errors" during install and boot
- More SL6 sleep work
- Gordon
- upgrade to SL6
- mpath component
- Stephen
- Server project work
- Webmark form for TA bidding
- Try RHEL6.1 on circle to see if fibre problem fixed.
--
AlastairScobie - 07 Jun 2011