Dynamic AFS Quotas

As part of the ServicesUnitAFSEnhancements we plan to implement some form of dynamic quotas to make better use of our disk space.

The plan is to give each class of people a maximum AFS home directory quota, eg 2GB for 1st year undergrads and 20GB for staff. As not all users will use up to their limit, it seems a waste of space to allocate the full quota up front when much of it may never be used. So even though your maximum quota may be 20GB, if you're only using 1GB just now, your quota may actually be set to 2GB. As your usage increases, your quota will also increase automatically, up to the maximum of 20GB.
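As a rough illustration of "classes of people" each having a maximum quota, here is a minimal Python sketch. The role names, the 1GB fallback for users with no listed role, the 1GB of headroom, and the rule that someone holding several roles gets the largest limit are all assumptions for illustration. Only the 2GB (ug1) and 20GB (staff) figures come from the text. Sizes are in MB, treating 1GB as 1000MB.

```python
GB = 1000  # sizes in MB; 1GB treated as 1000MB for simplicity

# Hypothetical role -> maximum quota table; only the ug1 and staff figures
# come from the text, and the real values would live in the roles DB.
MAX_QUOTA_MB = {
    "ug1": 2 * GB,
    "staff": 20 * GB,
}
DEFAULT_MAX_MB = 1 * GB  # hypothetical fallback for users with no listed role


def max_quota_mb(roles: list[str]) -> int:
    """Return the largest maximum quota granted by any of the user's roles."""
    limits = [MAX_QUOTA_MB[r] for r in roles if r in MAX_QUOTA_MB]
    return max(limits, default=DEFAULT_MAX_MB)


def allocated_quota_mb(usage_mb: int, roles: list[str], headroom_mb: int = 1 * GB) -> int:
    """Allocate current usage plus headroom, rounded up to a whole GB, capped at the maximum."""
    wanted = usage_mb + headroom_mb
    rounded = -(-wanted // GB) * GB  # ceiling to the next 1GB boundary
    return min(rounded, max_quota_mb(roles))


print(allocated_quota_mb(1000, ["staff"]))  # 2000: 1GB used, 2GB allocated, 20GB maximum
print(allocated_quota_mb(1500, ["ug1"]))    # 2000: capped at the ug1 maximum
```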

We will then have a better view of how much disk space is actually required.

Possible Problems

1. How quickly would the quota grow? A new user may have no files and so a small quota, but if the first thing they do is copy a large set of files from a previous place of work, they may hit their initial quota before the next automated increase. Depending on the application, hitting the quota can lead to silent data loss when file writes fail.

2. Currently we tend not to "over subscribe" AFS partitions (the disk space that holds multiple AFS volumes): the quotas of the volumes on a partition add up to no more than its size. So if everyone on that partition decides to fill their volume up to its quota limit, there will still be space on the partition for the files. However, if quotas become dynamic, a currently "under subscribed" partition could become over subscribed as user quotas increase. If everyone then did use up their new, enlarged quota, the disk would physically run out of space. This scenario is worse than a single user hitting their quota, because someone who is still under quota may not be able to update files once the disk is full.

3. Anything else?
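To get a feel for problem 1, the sketch below simulates someone writing at a steady rate while the quota script runs every few hours. It uses the 1GB-step / 600MB-trigger rule described under the current implementation. The write rates, run intervals and 1GB starting quota are hypothetical inputs, not measurements.

```python
GB = 1000          # MB per GB; decimal so 2GB - 600MB = 1.4GB as in the text
STEP_MB = 1 * GB   # quota grows by one 1GB step per run...
TRIGGER_MB = 600   # ...when usage is within 600MB of the quota


def run_once(usage_mb, quota_mb, max_mb):
    """One pass of the 1GB-step / 600MB-trigger growth rule."""
    if quota_mb - usage_mb < TRIGGER_MB:
        quota_mb = min(quota_mb + STEP_MB, max_mb)
    return quota_mb


def first_hit_hour(rate_mb_per_hour, run_every_hours, hours,
                   usage_mb=0, quota_mb=1 * GB, max_mb=20 * GB):
    """Return the first hour in which steady writing overruns the quota, else None."""
    for hour in range(hours):
        if hour % run_every_hours == 0:  # the quota script runs at the start of this hour
            quota_mb = run_once(usage_mb, quota_mb, max_mb)
        usage_mb += rate_mb_per_hour     # the user writes for the rest of the hour
        if usage_mb > quota_mb:
            return hour
    return None


print(first_hit_hour(100, 12, 48))  # 10: runs twice a day, quota hit before the next run
print(first_hit_hour(100, 1, 48))   # None: hourly runs keep up with 100MB/hour
print(first_hit_hour(800, 1, 48))   # 3: a bulk copy outpaces one 1GB step per run
```

Running more often helps with steady growth, but a fast bulk copy still overruns a one-step-per-run rule. That is the case an "increase me now" service would have to cover.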

Solutions

For 1, educating users about dynamic quotas will help, and if they know how things work (eg quotas are updated every hour), that might be enough. If not, we could provide an "increase me now" service, perhaps via an authenticated web tool or a remctl function.
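Here is a minimal sketch of an "increase me now" back end. It assumes the back end runs as a remctl command, since remctld passes the authenticated principal in the REMOTE_USER environment variable. It also assumes the service simply raises the caller's quota to their DB maximum with OpenAFS `fs setquota`, whose `-max` value is in kilobyte blocks. `DB_QUOTA_MB` and the `/afs/example.org/user/...` path layout are hypothetical stand-ins for the roles DB and our real home directory paths. The command would also need to run with AFS admin rights.

```python
import os
import subprocess

# Hypothetical stand-ins for the roles/capabilities DB and the home path layout.
DB_QUOTA_MB = {"alice": 20_000}
HOME_TEMPLATE = "/afs/example.org/user/{user}"


def setquota_argv(path: str, quota_mb: int) -> list[str]:
    """Build an OpenAFS `fs setquota` command; -max is in 1K blocks."""
    return ["fs", "setquota", "-path", path, "-max", str(quota_mb * 1024)]


def increase_me_now(principal: str) -> list[str]:
    """Return the command raising the caller's quota straight to their DB maximum.

    The path is derived from the authenticated principal, never from user
    input, so people can only enlarge their own volume.
    """
    user = principal.split("@", 1)[0]  # drop the Kerberos realm
    if user not in DB_QUOTA_MB:
        raise PermissionError(f"no DB quota entry for {user}")
    return setquota_argv(HOME_TEMPLATE.format(user=user), DB_QUOTA_MB[user])


def main() -> None:
    """Entry point for a remctl command; remctld exports the caller as REMOTE_USER."""
    subprocess.run(increase_me_now(os.environ["REMOTE_USER"]), check=True)
```

Because the script only ever increases quotas, a quota raised this way sticks, just as when support sets it by hand.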

For 2, we plan to have tools that automatically move AFS volumes from filling partitions to partitions with uncommitted space. Such a tool will be quite hard to implement intelligently, so in the first instance simple reporting (eg via nagios) requesting human intervention will be required. To avoid partitions being over subscribed in the first place, the dynamic quota mechanism could refuse to increase a quota if doing so would leave the partition over subscribed. At least then only the person approaching their quota is at risk, rather than everyone on that partition.
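The first-instance reporting could take the form of a Nagios plugin. By convention, a Nagios plugin prints a one-line status and exits with 0 (OK), 1 (WARNING) or 2 (CRITICAL). The sketch below assumes the committed quota and capacity of each partition have already been gathered. The 90% warning threshold, the partition names and the naive "most uncommitted space" choice of move target are all hypothetical.

```python
import sys

GB = 1000  # MB per GB

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_partitions(partitions, warn=0.9, crit=1.0):
    """Return (exit_code, message) for partitions mapping name -> (capacity_mb, committed_mb).

    WARNING once committed quota passes `warn` of capacity; CRITICAL once it
    exceeds capacity, ie the partition is over subscribed.
    """
    worst, problems = OK, []
    for name, (capacity, committed) in sorted(partitions.items()):
        ratio = committed / capacity
        if ratio > crit:
            level = CRITICAL
        elif ratio > warn:
            level = WARNING
        else:
            continue
        worst = max(worst, level)
        problems.append(f"{name} {ratio:.0%} committed")
    detail = ", ".join(problems) or "all partitions within limits"
    return worst, f"AFS QUOTA {LABELS[worst]}: {detail}"


def pick_target(partitions, volume_quota_mb, source):
    """Naively suggest the other partition with the most uncommitted space that fits the volume."""
    best = None
    for name, (capacity, committed) in partitions.items():
        free = capacity - committed
        if name != source and free >= volume_quota_mb and (best is None or free > best[1]):
            best = (name, free)
    return None if best is None else best[0]


def main(partitions):
    """Print the status line and exit with the Nagios code."""
    code, message = check_partitions(partitions)
    print(message)
    sys.exit(code)
```

Suggesting a target rather than moving the volume keeps a human in the loop, as planned for the first instance.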

Current Plan

  • Replace the existing "setafsquotas" script with one that implements dynamic quotas.
  • This script would run more frequently than the current twice a day.
  • Use capabilities/entitlements to opt users in or out of dynamic quotas, so that some people get all they are due at once. No longer necessary, as we'll only ever increase quotas: if someone does want all their quota at once, support just needs to give it to them and it will stick.
  • Only increase quotas; we won't shrink quotas back down if users then delete lots of files.
  • Don't increase a user's quota if doing so would over subscribe a partition. The only snag is that working out the current subscription of a partition is time consuming, and so will slow down the script (cf how long it takes to work out https://groups.inf.ed.ac.uk/cos/disk_script/genafspartitionusage.cgi). Perhaps some caching or approximation can help.
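One way to keep the over subscription check from slowing the script down is to compute each partition's committed quota at most once per run (or per hour). The script then adjusts the cached total as it changes quotas itself. In this sketch `scan` is a hypothetical stand-in for the slow calculation behind genafspartitionusage.cgi, and the one-hour TTL is an arbitrary choice.

```python
import time

GB = 1000  # MB per GB


class SubscriptionCache:
    """Cache each partition's committed quota (MB) so the slow scan runs rarely.

    `scan` is a hypothetical callable returning the total quota committed on a
    partition; entries older than `ttl_seconds` are recomputed.
    """

    def __init__(self, scan, ttl_seconds=3600, clock=time.monotonic):
        self._scan, self._ttl, self._clock = scan, ttl_seconds, clock
        self._entries = {}  # partition -> (committed_mb, fetched_at)

    def committed_mb(self, partition):
        hit = self._entries.get(partition)
        if hit is None or self._clock() - hit[1] > self._ttl:
            hit = (self._scan(partition), self._clock())
            self._entries[partition] = hit
        return hit[0]

    def record_change(self, partition, delta_mb):
        """Keep the cached total in step with quota changes made by this script."""
        if partition in self._entries:
            committed, fetched = self._entries[partition]
            self._entries[partition] = (committed + delta_mb, fetched)


def try_increase(cache, partition, capacity_mb, old_quota_mb, new_quota_mb):
    """Return the quota to set: the new one, or the old one if it would over subscribe."""
    delta = new_quota_mb - old_quota_mb
    if cache.committed_mb(partition) + delta > capacity_mb:
        return old_quota_mb  # refuse: only this user is now at risk, not the partition
    cache.record_change(partition, delta)
    return new_quota_mb
```

Changes made outside the script (eg by support) only show up once a cache entry expires, so the TTL trades accuracy for speed.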

What is currently implemented is:

  1. If the DB-decided quota (via roles/capabilities) is less than the user's current quota, the user's current quota is set to the DB-decided one, ie it is shrunk, presumably because someone has lost the staff/student role that entitled them to the larger quota.
  2. Otherwise, even though their DB quota may be 10GB, their current disk usage is checked, and their current quota is set to roughly 1GB more than they are currently using (though never exceeding the DB quota).

For item 2, quotas are actually set on 1GB boundaries: if a user's usage is within 600MB of their current quota, their quota is increased by 1GB. So a user using 1.2GB will have their quota set at 2GB. Once their usage increases past 1.4GB, their quota will increase to 3GB and remain there until their usage increases beyond 2.4GB. It will then be increased to 4GB, and so on until their maximum DB quota is reached.
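The two implemented rules can be written down as a single per-run function. This is a sketch of the behaviour described above, not the actual setafsquotas replacement. Sizes are in MB with 1GB taken as 1000MB, so that the 1.4GB and 2.4GB thresholds come out exactly. The trace reproduces the worked example, starting from a 2GB quota and a 10GB DB quota.

```python
GB = 1000          # MB per GB (decimal, so 2GB - 600MB = 1.4GB as in the text)
STEP_MB = 1 * GB   # quotas move in 1GB steps
TRIGGER_MB = 600   # grow when usage is within 600MB of the quota


def new_quota_mb(db_quota_mb: int, current_quota_mb: int, usage_mb: int) -> int:
    """One run of the current rules.

    1. If the DB quota is below the current quota, shrink to the DB quota
       (this can leave usage above the new quota).
    2. Otherwise, if usage is within TRIGGER_MB of the quota, add one 1GB
       step, never going above the DB quota.
    """
    if db_quota_mb < current_quota_mb:
        return db_quota_mb
    if current_quota_mb - usage_mb < TRIGGER_MB:
        return min(current_quota_mb + STEP_MB, db_quota_mb)
    return current_quota_mb


def trace(db_quota_mb, quota_mb, usages_mb):
    """Apply one run per observed usage value and record the resulting quotas."""
    quotas = []
    for usage in usages_mb:
        quota_mb = new_quota_mb(db_quota_mb, quota_mb, usage)
        quotas.append(quota_mb)
    return quotas


print(trace(10 * GB, 2 * GB, [1200, 1400, 1500, 2400, 2500]))
# [2000, 2000, 3000, 3000, 4000]
```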

Other musings

  • Should we have a different starting quota for different classes of people, eg a minimum quota of 5GB for staff but 500MB for ug1 or visitors?
  • Could the automated increase depend on how close to their current quota the user is? eg if someone is using 3GB of a 5GB quota and by the next run is using 4.9GB, should the quota be increased to 8GB rather than 6GB (assuming the default 1GB increments)?
  • Should we also reduce the dynamic quota if people do start using less disk space?
  • If we were to do reductions, perhaps all automated increases could be large, say +5GB, with subsequent runs reducing the quota gradually (say by 500MB) until it reaches the agreed buffer above usage. A problem with this would be the need to record when the last quota increase was made, and/or to account for how often we run the "dynamic quota" script.
  • Try to draw some diagrams showing how quota and usage relate, to make the planned behaviour easier to visualise.
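Two of the musings above can be sketched as hypothetical variants of the per-run rule. `growth_aware_quota` reads "how close to their current quota" as growth since the previous run. It adds that growth (rounded up to whole GB) on top of the usual 1GB step, which reproduces the 8GB figure. `decayed_quota` shrinks a large increase by 500MB per run down to a floor of usage plus a 1GB buffer. Both rules, and the 1GB buffer, are assumptions rather than agreed policy.

```python
GB = 1000          # MB per GB
STEP_MB = 1 * GB   # default increment
TRIGGER_MB = 600   # grow when usage is within 600MB of quota


def growth_aware_quota(quota_mb, usage_mb, prev_usage_mb, max_mb):
    """Hypothetical: add recent growth (rounded up to whole GB) to the usual step."""
    if quota_mb - usage_mb >= TRIGGER_MB:
        return quota_mb  # not close to the quota yet
    growth = max(0, usage_mb - prev_usage_mb)
    extra = -(-growth // GB) * GB if growth > STEP_MB else 0
    return min(quota_mb + STEP_MB + extra, max_mb)


def decayed_quota(quota_mb, usage_mb, buffer_mb=1 * GB, decay_mb=500):
    """Hypothetical: shrink by decay_mb per run, but never below usage + buffer."""
    floor_mb = usage_mb + buffer_mb
    if quota_mb <= floor_mb:
        return quota_mb
    return max(quota_mb - decay_mb, floor_mb)


print(growth_aware_quota(5 * GB, 4900, 3 * GB, 20 * GB))  # 8000: grew 1.9GB since last run
print(growth_aware_quota(5 * GB, 4900, 4800, 20 * GB))    # 6000: slow growth, default step

# A +5GB jump from 4GB to 9GB for someone using 3GB, then gradual decay.
quota, runs = 9 * GB, 0
while (nxt := decayed_quota(quota, 3 * GB)) != quota:
    quota, runs = nxt, runs + 1
print(quota, runs)  # 4000 10
```

Because `decayed_quota` shrinks per run rather than per unit of time, how fast a quota decays depends directly on how often the script runs. That is the scheduling problem raised in the reductions musing.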

-- NeilBrown - 31 Jul 2013

Topic revision: r6 - 02 Dec 2013 - 16:25:01 - NeilBrown