Further AFS Development and Automation

This is the final report for devproj.inf project number 135. A little extra work was done on the project in 2013T2; see Project135FinalReportPart2.

The project's aim was to complete as many items from a list of possible enhancements as the time available allowed.

Automate conversion of RO volumes to RW
The most urgent task on the list was to provide a utility which promotes read-only volumes to read-write status, for use only in an emergency. This task has been done. The script, written by Chris, is called promoteRO and is installed on our AFS fileservers. The documentation is provided in the form of a man page.
Wrapper for Long Running Jobs
Craig wrote and tested this script. It's called longjob and is installed on all DICE machines. The documentation is provided in the form of a man page.
Script to automate distribution of volumes across servers
Craig and Chris gave this one a good deal of thought and researched how other AFS sites tackle the problem. The most helpful writeup we found was Russ Allbery's paper on AFS Server Balancing. We concluded, as Russ Allbery had before us, that the problem is not nearly as simple as it first appears. Factors to take into account when moving volumes between servers automatically include size, usage, expected rate of growth and the capability of the hardware. A solution has to weigh all of these factors, correctly identify the volumes most in need of moving, identify the most suitable destinations for them, and then carry out the moves, all while taking up less human time than equivalent manual management of AFS volumes. Any such solution is bound to be extremely complex. Russ Allbery found the system so complex that it seemed expressible only as linear programming problems, which took quite a few hours to solve using Stanford's compute servers and commercial linear programming optimisation software. Even then, the problem had to be simplified to avoid overwhelming the available resources. In view of this, we shelved the problem for further thought at a later date.
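To illustrate why a simple approach falls short, here is a minimal sketch of the most naive balancer one could write: it looks at volume size only and places each volume on whichever server currently holds the least data. This is not the method from Russ Allbery's paper, nor anything the project implemented. The function name, the server names and the sizes are all hypothetical. The sketch ignores usage, expected growth and hardware capability, which are exactly the factors that make the real problem hard.

```python
import heapq
from typing import Dict, List, Tuple


def greedy_balance(volume_sizes: Dict[str, int],
                   servers: List[str]) -> Dict[str, List[str]]:
    """Assign each volume to the server that currently holds the least data.

    Size is the only factor considered. Usage, growth and hardware
    capability are deliberately ignored in this illustration.
    """
    # Min-heap of (amount assigned so far, server name).
    heap: List[Tuple[int, str]] = [(0, s) for s in servers]
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {s: [] for s in servers}

    # Place the largest volumes first; ties are broken by volume name.
    for name, size in sorted(volume_sizes.items(),
                             key=lambda kv: (-kv[1], kv[0])):
        load, server = heapq.heappop(heap)
        placement[server].append(name)
        heapq.heappush(heap, (load + size, server))
    return placement


if __name__ == "__main__":
    # Hypothetical volumes (sizes in arbitrary units) and servers.
    print(greedy_balance({"a": 50, "b": 30, "c": 20, "d": 10},
                         ["fs1", "fs2"]))
```

Even when this yields evenly sized servers, it says nothing about which server takes the heaviest access load or which volumes will grow fastest, and it would happily move every volume in the cell to get there.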
Statistics
It was thought that most of the possible enhancements would benefit from an accurate picture of our cell's AFS usage and how it develops over time. With that in mind, a script called get_afs_info was written and installed on the backup server alexandria. It runs there every night, adding a fresh day's AFS stats to the /usr/afsstats directory. In time, analysis of this data should help us manage our AFS infrastructure, since we'll be able to see how quickly the amount of data changes, how many new volumes we get per year, how evenly the data is distributed by size, number of volumes, number of accesses and so on. For each volume in the cell the script collects the following information:
  • The volume name.
  • The volume type (RW, RO or BK).
  • The current size of the volume.
  • The volume quota.
  • The number of accesses to the volume in the last day.
  • The ID number of the volume.
  • The ID number of its parent volume.
  • The ID number of its backup volume.
  • The ID number of its clone volume.
  • The server on which the volume resides.
  • The partition on which the volume resides.
  • The free space available on the partition.
  • The total space available on the partition.
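The per-volume fields listed above could be modelled as a simple record. The sketch below is an assumption about how such data might be represented for analysis; it does not describe the actual output format of get_afs_info, which this report does not specify. The class name, field names, kilobyte units and helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class VolumeStat:
    """One volume's statistics for one day (field names are hypothetical)."""
    name: str
    vol_type: str            # "RW", "RO" or "BK"
    size_kb: int             # units assumed to be kilobytes
    quota_kb: int
    accesses_last_day: int
    volume_id: int
    parent_id: int
    backup_id: int
    clone_id: int
    server: str
    partition: str
    partition_free_kb: int
    partition_total_kb: int


def quota_used_fraction(v: VolumeStat) -> Optional[float]:
    """Fraction of quota in use, or None when no quota is set.

    A quota of 0 is treated here as "no limit" (an assumption).
    """
    if v.quota_kb <= 0:
        return None
    return v.size_kb / v.quota_kb


def rw_size_by_server(stats: Iterable[VolumeStat]) -> Dict[str, int]:
    """Total size of read-write volumes on each server."""
    totals: Dict[str, int] = {}
    for v in stats:
        if v.vol_type == "RW":
            totals[v.server] = totals.get(v.server, 0) + v.size_kb
    return totals
```

Summaries like these, computed for each night's data, are the kind of input that questions about growth and distribution across servers would need.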
Other enhancements
The other enhancements on the list seemed mostly to depend on having a solution to the automated distribution of volumes across servers, so they were shelved pending a rethink or the availability of enough effort to tackle a large project.

Facilities Used by the Project

A new AFS cell was established for the project. An AFS database server and two fileservers were installed on virtual machines. The test cell proved very useful: AFS commands and volumes could be tried out, and scripts tested, without fear of causing disaster or disruption to any real data or users.

Time Taken

The project took 122 hours, a little under four weeks.

Topic revision: r2 - 03 Dec 2013 - 16:04:26 - NeilBrown
 