Proposed AFS Enhancements - devproj 135

Development project 135 deals with the creation of tools and scripts to make the management of the School's AFS file system easier. This page details the ideas we have come up with so far. Additions and comments welcome. Chris Cooke is the person actually working on the project and you may wish to discuss ideas with him.

| Description | Priority | Comments |
| Automate conversion of RO volumes to RW | 1 | For disaster recovery. Should take as argument a single volume, a partition on a server or an entire server that has died, then identify suitable RO replacement volumes elsewhere and promote them to RW. It should create a new RO volume on the same partition as the newly RW volume. It should also remove the old RW volume from the VLDB. |
| Script to automate distribution of volumes across servers | 2 | Should ensure that volumes are equally distributed across partitions/servers. There are some thoughts on this below. |
| Dynamic quotas | 3 | At the 12th January ops meeting, it was agreed that one way of avoiding the current situation, in which much of the disk space allocated to users goes unused, would be to give users a fairly small quota and raise it, either automatically or at the user's own request, when they approach their limit. This would allow us to avoid overloading partitions whilst still making the most effective use of available space. See DynamicAFSQuotas. This depends on the script to automate distribution of volumes across servers, as otherwise partitions could quickly fill up. |
| Mountpoint database | 4 | Stores the mountpoint of each volume on the file system. |
| Mirroring database | 4 | Records which partition is mirrored to which partition. |
| Script to move volumes | 5 | Should take as argument a single volume, a partition on a server or an entire server. Scripts already exist for moving single volumes which could be used as a basis for this. |
| Wrapper for long running jobs | 5 | Nearly done - see Craig. |
| Script to identify suitable partitions for new volume | 5 | Prometheus related, see Simon/Toby. |
| Load monitoring enhancements | 5 | See Neil. |
| Script for managing ACLs | 5 | Should do things like changing ACLs recursively, checking whether users/groups in ACLs exist etc. |
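
As an illustration of the priority 1 item, here is a minimal sketch of the selection step only: given a dead server (optionally narrowed to one partition), find the volumes whose RW copy lived there and pick an RO copy elsewhere to promote. The `Site` and `Volume` classes and the `plan_promotions` function are hypothetical names invented for this example, and the volume list is an in-memory stand-in for what a real tool would read from the VLDB. The sketch does not perform the promotion, create the new RO volume or clean up the VLDB; those steps would be carried out by the AFS administration commands.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    """One copy of a volume (hypothetical stand-in for a VLDB site entry)."""
    server: str
    partition: str
    kind: str  # "RW" or "RO"


@dataclass
class Volume:
    name: str
    sites: list[Site]


def plan_promotions(volumes: list[Volume], dead_server: str,
                    dead_partition: Optional[str] = None):
    """Return (plan, unrecoverable).

    plan is a list of (volume name, RO site to promote) pairs;
    unrecoverable lists volumes with no RO copy off the dead server.
    """
    plan, unrecoverable = [], []
    for vol in volumes:
        rw = next((s for s in vol.sites if s.kind == "RW"), None)
        # Skip volumes whose RW copy was not on the failed server/partition.
        if rw is None or rw.server != dead_server:
            continue
        if dead_partition is not None and rw.partition != dead_partition:
            continue
        # Any RO copy on a surviving server is a candidate; take the first.
        candidates = [s for s in vol.sites
                      if s.kind == "RO" and s.server != dead_server]
        if candidates:
            plan.append((vol.name, candidates[0]))
        else:
            unrecoverable.append(vol.name)
    return plan, unrecoverable
```

The single-volume case can be handled by passing a one-element list. Taking the first candidate is a placeholder; a real script would probably want to prefer the mirror partition or the server with the most free space.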

It is entirely possible that scripts to do some of these tasks already exist in the wider world, in which case all that is needed is to integrate them into the DICE environment.

-- CraigStrachan - 25 Mar 2011

Balancing

Here are some more thoughts on the script to automate distribution of volumes across servers.

We want to balance volumes across servers both to ensure enough free space everywhere and also to ensure that the load gets spread as evenly as possible across as many spindles as possible. Ideally a piece of software would run every night and automatically move volumes about as necessary.

We will need:

  1. Some kind of partition database. It might need the following bits of information:
    • server/partition tuple
    • underlying storage type - RAID5, RAID10 or RAID10SAS
    • its mirror partition, if there is one
    • overloading policy
    • content type - group, user, pkgs, other
    • DR type - RW + RO, or just RO
    • size
    • % free
  2. Information on volumes:
    • Usage figures. (Ross has written a script which collects these.)
    • content type (in the volume name!)
  3. A tool for moving volumes about. (Neil might know of one.)
  4. An algorithm for what goes where.
  5. A script to actually do it!
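
To make items 1, 4 and 5 more concrete, here is a minimal sketch of one possible approach: a partition record holding a few of the fields listed above, and a greedy planner that repeatedly takes one volume from the fullest partition and gives it to the emptiest. This is only an illustration, not an agreed design. The `Partition` class, the `plan_moves` function and the `tolerance_pct` and `max_moves` parameters are all hypothetical, and the sketch balances on space alone: it ignores storage type, mirroring, overloading policy and load, and assumes the partitions passed in all hold the same content type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Partition:
    """Hypothetical partition-database record (subset of the fields above)."""
    server: str
    name: str
    size_kb: int
    volumes: Dict[str, int] = field(default_factory=dict)  # volume -> used KB

    @property
    def used_kb(self) -> int:
        return sum(self.volumes.values())


def pct(used_kb: int, size_kb: int) -> float:
    return 100.0 * used_kb / size_kb


def plan_moves(partitions: List[Partition], tolerance_pct: float = 5.0,
               max_moves: int = 100) -> List[Tuple[str, tuple, tuple]]:
    """Greedy plan: list of (volume, (from server, partition), (to server, partition)).

    Updates the in-memory model to reflect the plan; moves nothing on disk.
    """
    moves = []
    for _ in range(max_moves):
        fullest = max(partitions, key=lambda p: pct(p.used_kb, p.size_kb))
        emptiest = min(partitions, key=lambda p: pct(p.used_kb, p.size_kb))
        gap = pct(fullest.used_kb, fullest.size_kb) - pct(emptiest.used_kb, emptiest.size_kb)
        if gap <= tolerance_pct:
            break  # balanced enough
        # Pick the volume whose move leaves the smallest gap between the two.
        best = None
        for vol, kb in fullest.volumes.items():
            new_gap = abs(pct(fullest.used_kb - kb, fullest.size_kb)
                          - pct(emptiest.used_kb + kb, emptiest.size_kb))
            if new_gap < gap and (best is None or new_gap < best[2]):
                best = (vol, kb, new_gap)
        if best is None:
            break  # no single move improves matters
        vol, kb, _ = best
        del fullest.volumes[vol]
        emptiest.volumes[vol] = kb
        moves.append((vol, (fullest.server, fullest.name),
                      (emptiest.server, emptiest.name)))
    return moves
```

The resulting list could be handed to whatever volume-moving tool is chosen for item 3. Capping the number of moves per run is one way of keeping a nightly job from shuffling large amounts of data at once.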
Topic revision: r5 - 31 Jul 2013 - 15:15:42 - NeilBrown