Proposed AFS Enhancements - devproj 135
Development project 135 deals with the creation of tools and scripts to make the management of the School's AFS file system easier. This page details the ideas we have come up with so far. Additions and comments welcome. Chris Cooke is the person actually working on the project and you may wish to discuss ideas with him.
| Description | Priority | Comments |
| Automate conversion of RO volumes to RW | 1 | For disaster recovery. Should take as argument a single volume, a partition on a server or an entire server that has died, then identify suitable RO replacement volumes elsewhere and promote them to RW. It should create a new RO volume on the same partition as the newly promoted RW volume. It should also remove the old RW volume from the VLDB. |
| Script to automate distribution of volumes across servers | 2 | Should ensure that volumes are evenly distributed across partitions/servers. See the Balancing section below for some thoughts on this. |
| Dynamic quotas | 3 | At the 12th January ops meeting it was agreed that one way to avoid the current situation, in which much of the disk space allocated to users goes unused, would be to give users a fairly small quota and have it raised, either automatically or by the users themselves, when they approach their limit. This would allow us to avoid overloading partitions whilst still making the most effective use of available space. See DynamicAFSQuotas. This depends on the script to automate distribution of volumes across servers, as otherwise partitions could quickly fill up. |
| Mountpoint database | 4 | Stores the mountpoint of each volume in the file system. |
| Mirroring database | 4 | Records which partition is mirrored to which partition. |
| Script to move volumes | 5 | Should take as argument a single volume, a partition on a server or an entire server. Scripts already exist for moving single volumes which could be used as a basis for this. |
| Wrapper for long-running jobs | 5 | Nearly done; see Craig. |
| Script to identify suitable partitions for new volume | 5 | Prometheus related; see Simon/Toby. |
| Load monitoring enhancements | 5 | See Neil. |
| Script for managing ACLs | 5 | Should do things like changing ACLs recursively, checking whether users/groups in ACLs exist, etc. |
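As a starting point for the priority 1 item (converting RO volumes to RW), here is a minimal sketch of the planning step only. It assumes the VLDB contents have already been parsed into simple records; the `Site` and `VolumeEntry` classes and the `plan_promotions` function are hypothetical names invented for illustration, and the choice of "first RO site not on the dead server" is just a placeholder policy. The only real command referenced is `vos convertROtoRW`; its exact behaviour with respect to the VLDB should be checked against the man page before relying on it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Site:
    """One site of a volume as recorded in the VLDB (hypothetical record)."""
    server: str
    partition: str  # e.g. "/vicepa"
    kind: str       # "RW" or "RO"


@dataclass
class VolumeEntry:
    """A volume and all its known sites (hypothetical record)."""
    name: str
    sites: List[Site]


def rw_is_lost(entry: VolumeEntry, dead_server: str,
               dead_partition: Optional[str] = None) -> bool:
    """True if the volume's RW site is on the failed server (and partition, if given)."""
    for site in entry.sites:
        if site.kind == "RW" and site.server == dead_server:
            if dead_partition is None or site.partition == dead_partition:
                return True
    return False


def plan_promotions(entries: List[VolumeEntry], dead_server: str,
                    dead_partition: Optional[str] = None
                    ) -> Tuple[List[List[str]], List[str]]:
    """Return (commands to run, volumes with no surviving RO copy).

    Nothing is executed here; the commands are returned as argument
    lists so that they can be reviewed before being run.
    """
    commands: List[List[str]] = []
    unrecoverable: List[str] = []
    for entry in entries:
        if not rw_is_lost(entry, dead_server, dead_partition):
            continue
        # Placeholder policy: take the first RO site on a surviving server.
        candidates = [s for s in entry.sites
                      if s.kind == "RO" and s.server != dead_server]
        if not candidates:
            unrecoverable.append(entry.name)
            continue
        target = candidates[0]
        commands.append(["vos", "convertROtoRW",
                         "-server", target.server,
                         "-partition", target.partition,
                         "-id", entry.name])
    return commands, unrecoverable
```

To handle the single-volume case, the caller can simply pass a one-element list of entries. Creating the new RO site next to the promoted volume and tidying up the VLDB are deliberately left out of this sketch.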
It is entirely possible that scripts to do some of these tasks already exist in the wider world, in which case all that is needed is to integrate them into the DICE environment.
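The dynamic quota idea in the table above could work roughly as follows. This is only a sketch of the decision rule, not an agreed policy: the function name `next_quota` and every number in it (the 90% trigger, the 25% growth step, the ceiling and the free-space reserve) are assumptions made up for illustration. Sizes are in kilobytes, which is the unit `fs listquota` and `fs setquota` work in, and a quota of 0 is left alone because AFS treats it as unlimited.

```python
def next_quota(used_kb: int, quota_kb: int, partition_free_kb: int,
               threshold: float = 0.9, growth: float = 1.25,
               ceiling_kb: int = 50_000_000,
               reserve_kb: int = 10_000_000) -> int:
    """Return the quota a volume should have after this check.

    All thresholds are illustrative assumptions, not agreed values.
    """
    # Quota 0 means "unlimited" in AFS; nothing to raise.
    if quota_kb <= 0:
        return quota_kb
    # The user is not yet close to the limit: leave the quota as it is.
    if used_kb < threshold * quota_kb:
        return quota_kb
    # Propose a larger quota, but never above the hard ceiling.
    proposed = min(int(quota_kb * growth), ceiling_kb)
    extra = proposed - quota_kb
    # Refuse the raise if it would eat into the partition's reserve;
    # this is where the dependency on volume balancing shows up.
    if extra <= 0 or partition_free_kb - extra < reserve_kb:
        return quota_kb
    return proposed
```

A nightly job could call this for each user volume and apply any change with `fs setquota`. When the function declines a raise because the partition is short of space, that volume is a natural candidate for the balancing script to move.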
--
CraigStrachan - 25 Mar 2011
Balancing
Here are some more thoughts on the script to automate distribution of volumes across servers.
We want to balance volumes across servers both to ensure enough free space everywhere and also to ensure that the load gets spread as evenly as possible across as many spindles as possible.
Ideally a piece of software would run every night and automatically move volumes about as necessary.
We will need:
- Some kind of partition database. It might need the following bits of information:
  - server/partition tuple
  - underlying storage type: RAID5, RAID10, RAID10SAS
  - its mirror partition, if there is one
  - overloading policy
  - content type: group, user, pkgs, other
  - DR type: RW + RO, or just RO
  - size
  - % free
- Information on volumes:
  - Usage figures. (Ross has written a script which collects these.)
  - content type (in the volume name!)
- A tool for moving volumes about. (Neil might know of one.)
- An algorithm for what goes where.
- A script to actually do it!
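To make the shopping list above a bit more concrete, here is a rough sketch of what a partition record and a very simple "what goes where" algorithm might look like. Everything here is an assumption for illustration: the `Partition` class, its field names and the `plan_moves` function are hypothetical, and the greedy rule (repeatedly shift the largest volume that narrows the gap between the fullest and emptiest partition) is only one possible approach. It balances on space alone and ignores load, storage type, mirroring and content type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Partition:
    """One row of the hypothetical partition database."""
    server: str
    name: str                        # e.g. "/vicepa"
    size_kb: int
    storage: str = "RAID10"          # RAID5, RAID10, RAID10SAS
    mirror: Optional[str] = None     # mirror partition, if there is one
    content_type: str = "user"       # group, user, pkgs, other
    dr_type: str = "RW+RO"           # or "RO"
    volumes: Dict[str, int] = field(default_factory=dict)  # name -> used KB

    @property
    def used_kb(self) -> int:
        return sum(self.volumes.values())

    @property
    def fraction_used(self) -> float:
        return self.used_kb / self.size_kb


def plan_moves(partitions: List[Partition], max_moves: int = 10,
               tolerance: float = 0.05) -> List[Tuple[str, str, str]]:
    """Greedily plan moves as (volume, from, to); updates the records in place."""
    moves: List[Tuple[str, str, str]] = []
    for _ in range(max_moves):
        ranked = sorted(partitions, key=lambda p: p.fraction_used)
        emptiest, fullest = ranked[0], ranked[-1]
        if fullest.fraction_used - emptiest.fraction_used <= tolerance:
            break  # balanced enough
        # Pick the largest volume whose move does not overshoot,
        # i.e. the destination must not end up fuller than the source.
        best = None
        for vol, size in fullest.volumes.items():
            new_full = (fullest.used_kb - size) / fullest.size_kb
            new_empty = (emptiest.used_kb + size) / emptiest.size_kb
            if new_empty <= new_full and (best is None or size > best[1]):
                best = (vol, size)
        if best is None:
            break  # no single move would help
        vol, size = best
        del fullest.volumes[vol]
        emptiest.volumes[vol] = size
        moves.append((vol, f"{fullest.server}:{fullest.name}",
                      f"{emptiest.server}:{emptiest.name}"))
    return moves
```

Each planned move could then be handed to whatever tool ends up doing the moving, for example something built around `vos move`. A real version would want to weight the decision by the usage figures as well as by size, and to respect the content type and DR type of each partition.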
Topic revision: r5 - 31 Jul 2013 - 15:15:42 - NeilBrown