Development Meeting AFS On Datastore.ed Update 3/9/2014


Project Devproj page - note/blog/diary Project268AFSonECDF

As part of the University's Research Data Management (RDM) service, the University is providing 500GB of storage space per researcher. But being an IS service, the default access mechanism is NFS, CIFS (Samba) or SSHFS. We'd like to see about getting our share of the space via AFS; this project is to investigate (and, if possible, implement) just that.

I may mention ECDF when talking about this; that's because I originally thought the space was coming from that project, but it's actually this RDM thing. However, it's pretty much the same IS staff (Orlando and co.) running both, and both are based on similar technology.

Initial tests

We knew that datastore.ed was based on GPFS, so, as Iain R had a test GPFS cluster, I was able to do some basic tests here while waiting to make contact with the IS people. They seem very under-resourced (staff-wise), so getting time with them can be tricky. Our test showed it did "just work": we had an AFS fileserver in our cell mounting a bit of GPFS space as /vicepg/ and, as GPFS presents a POSIX filesystem, the AFS fileserver process was happy to use it.
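The test setup looked roughly like the following. This is a sketch from memory rather than a recipe: the server name is a placeholder, and it assumes an existing OpenAFS fileserver and an already-mounted GPFS filesystem.

```shell
# On the fileserver, present a directory on the GPFS mount as a vice partition:
mkdir /gpfs/afstest
ln -s /gpfs/afstest /vicepg

# OpenAFS normally only attaches a /vicepXX that is a separate mount point;
# an AlwaysAttach marker file overrides that check:
touch /vicepg/AlwaysAttach

# Restart the fileserver so it attaches the new partition, then create a test volume:
bos restart testfs.inf.ed.ac.uk -instance fs -localauth
vos create testfs.inf.ed.ac.uk vicepg test.gpfs -localauth
```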

More recently - Current state

In June of this year the ECDF people had finished whatever was keeping them busy, were able to concentrate on getting the College of Science and Engineering space on the RDM up and running, and hence had more time to deal with me. I had a meeting with Orlando and Jan (who'd be doing most of the work at their end), and since then we now have a test cell, datastore.inf. It currently has one KVM-based AFSDB server (dsone.inf) and two fileservers: dstwo.inf, another VM holding the structural volumes; and one on the real datastore.ed hardware up at KB (though it is currently running on a development node).

So far we've just done some basic testing and, again, it does seem to "just work". However, there is some management work still to be sorted out, e.g. backups; more on this later.

Issues and decisions to discuss

The cell

The way we have things set up at the moment is that the fileserver (FS) at datastore.ed is a CentOS machine managed by IS. We could have given them our cell keytab, so that the FS running up there was just part of our cell: the space would simply appear under /afs/, and we could manage what we need via the bos and vos commands. However, possession of that keyfile would potentially allow anyone with root access to the IS machine to access any of the files in our AFS cell. Given some of our users and their NDAs, this was not acceptable to us, so a separate cell, with a separate key, was created. This does still mean that anyone with root access to the FS can access any of the files on their hardware, but that would be the case with or without AFS.

Having the separate cell will mean a bit more management overhead, and possible inconvenience to the users, i.e. it won't be seamless with our existing cell. At one of my recent meetings with Jan, he said that only 4 members of staff have sufficient access to the datastore hardware to be able to read the key file, so perhaps we may want to revisit this decision?

The PTS database

Again, a separate cell means a separate PTS (user) database. As a first step I just copied all our existing users (not groups) into the datastore cell; in a production service we'd need to decide how (and whether) we want to keep this in sync with our normal inf cell. The correct way would seem to be to add it to prometheus, so that just as it adds new users to the inf cell, it would add them to the datastore.inf cell too.
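A first cut of keeping the two PTS databases in sync could be a simple wrapper around pts: list the user entries in the inf cell and recreate them, with the same IDs, in the datastore cell. The sketch below is hypothetical (the cell names are assumed, and prometheus would presumably drive this for real); with DRYRUN=1 it only prints the commands it would run rather than touching any cell:

```shell
#!/bin/sh
# Hypothetical sketch: mirror inf-cell PTS users into the datastore cell.
# Input format is that of `pts listentries -users`: "Name ID Owner Creator".
sync_users() {
    while read -r name id _rest; do
        # skip system entries and admin instances
        case "$name" in anonymous|*.admin) continue ;; esac
        cmd="pts createuser -name $name -id $id -cell datastore.inf.ed.ac.uk"
        if [ "${DRYRUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

# For real, something like:
#   pts listentries -users -cell inf.ed.ac.uk | tail -n +2 | sync_users
# Dry-run demonstration with two fake entries:
printf 'neilb 1234 -204 -204\nanonymous 32766 -204 -204\n' | DRYRUN=1 sync_users
```

Note it preserves the numeric PTS IDs, which is what keeps ACL entries meaning the same person in both cells.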

Another possibility is that we could authenticate the cell against EASE. In an earlier test I tried this, and it does just work. It seemed it might be useful if people wanted to grant other Uni people access via AFS without having to worry about cross-realm entries in the PTS (cross-realm users effectively get random IDs in the PTS database, whereas this way they would/could have their Uni-allocated UID). However, we'd then need to be happy that we could create the equivalent of our /admin identities in EASE so we could manage things as we do just now, plus I need to check that AFS can cope with the full range of UIDs that the Uni uses.


Backups

They use IBM's Tivoli to back up the datastore, and can go back 60(?) days to recover deleted files. However, that isn't going to be much use for us. It should be fine for disaster recovery, but not for "undeletes", as they'd have no way to know which bit of an AFS partition relates to a particular AFS file. So you'd be looking at restoring a whole partition to the point in time of interest and then mounting all those volumes on a server to find the one volume and file you are interested in, and this would all need to be done without interfering with the currently running volumes.

Jan's going to see if Tivoli is AFS-aware; if not, we'll have to look at some sort of walk-the-filesystem solution, like we used to do for Networker before we moved to TiBS.
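If Tivoli turns out not to be AFS-aware, the walk could be done at the volume level rather than the file level: ask the fileserver for its volumes and vos dump each one. A rough, hypothetical sketch (the server name and paths are placeholders, and a real incremental scheme would pass the previous dump date rather than -time 0):

```shell
# Dump every RW volume on the datastore fileserver to a staging area.
server=dsthree.datastore.inf.ed.ac.uk    # placeholder name
cell=datastore.inf.ed.ac.uk

for vol in $(vos listvol "$server" -quiet -cell "$cell" | awk '$3 == "RW" {print $1}'); do
    # -time 0 gives a full dump; a date string gives an incremental since that date
    vos dump -id "$vol" -time 0 -file "/backup/dumps/$vol.dump" -cell "$cell"
done
```

The dumps are then ordinary files that Tivoli (or anything else) can back up, and vos restore can bring a single volume back without touching the rest of the partition.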

Their existing service also supports snapshotting, so you can just cd to a hidden path and find files from any day in the last 2 weeks (I think Jan said): a bit like our /Yesterday, but with more history. We could still provide a /Yesterday via local RO volumes or backup volumes.
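The AFS-side equivalent would be nightly .backup clones mounted somewhere convenient, much as /Yesterday works now. A sketch, with the volume name and mount path assumed for illustration:

```shell
# Create/refresh the .backup clone of a user's volume, then mount it:
vos backup user.neilb -cell datastore.inf.ed.ac.uk
fs mkmount -dir /afs/datastore.inf.ed.ac.uk/users/neilb/Yesterday \
           -vol user.neilb.backup
```

That only gives one day of history, of course; the datastore snapshots would still win on depth.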

Number of AFS partitions and servers

From a quick look, Jan seemed to think we had 1000 people entitled to their 500GB in Informatics, which can't be right, can it? Anyway, that would be 500TB of storage. Apparently we're only entitled to one physical server per college, so that's only one AFS server for us, serving our 500TB. The recommended maximum size for a partition is 2TB, so that's 250 partitions (vicepa, vicepb, etc.). The maximum number of partitions a fileserver can have is 255! So, providing we don't have more than 1000 people entitled to use datastore.inf, we'll just be OK.

If we need another server, we'd have to pay for it.

Managing the space

We haven't really got a solution to this yet, but it will probably be us managing volume quotas just as we do now, with ECDF only worrying about the total space we are using.
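In practice that would mean what we do in the inf cell today: per-volume quotas set and checked with fs. For example (path hypothetical, and quota is in KB, so 500GB is roughly 500,000,000):

```shell
# Give a user's datastore volume its 500GB allocation:
fs setquota -path /afs/datastore.inf.ed.ac.uk/users/neilb -max 500000000

# Check usage against quota:
fs listquota /afs/datastore.inf.ed.ac.uk/users/neilb
```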


If for some reason we don't think the AFS route is going to work, then we'll have to see what can be achieved via the datastore's officially supported access mechanisms - see

Certainly smbclient \\\\\\cse -W ED -U neilb works, but I was expecting to see the list of 1000 users I saw when Jan was logged in directly to the machine and browsing the file system. I'm probably just using the wrong path.

But something like sshfs -o intr,large_read,auto_cache,workaround=all -oPort=22222 /tmp/nb does work, and in /tmp/nb/datastore/inf/users/ there are in fact 1355 users! That would be roughly an extra 180TB of space, so another 90 2TB partitions, putting us over the 255-partitions-per-server limit.
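The partition arithmetic is easy to sanity-check: at 500GB per user and a 2TB ceiling per vice partition, the partition count is just a ceiling division (decimal units, as above):

```shell
# Partitions needed for N users at 500 GB each, 2000 GB (2 TB) per partition.
parts_needed() {
    echo $(( ($1 * 500 + 1999) / 2000 ))    # ceiling division
}

echo "1000 users -> $(parts_needed 1000) partitions"    # within the 255 limit
echo "1355 users -> $(parts_needed 1355) partitions"    # well over it
```

That gives 250 and 339 respectively, so with 1355 users we'd be over the 255-partition limit either way you round it.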

Both of the above ask for your AD password.

Next steps

  • Some more concrete benchmarking
  • Make sure backups are possible/feasible.
  • Decision on cell, authentication, integration with prometheus
  • How to manage the space: People can donate their allocation to a larger group.

I've spent about 9 days effort on this, 3 weeks were allocated.

Notes from the Development Meeting

  • Even if there are only 4 IS staff, it will not be acceptable for them to have access to our Cell keyfile.
  • Use prometheus to drive the population of the datastore PTS
  • Purchasing hardware to overcome the /vicep partition limit is not a problem, should the limit become an issue.
  • Investigate RDM's NFS v3 (and v4, if available) support, as a possible fallback solution if AFS isn't going to be workable.
  • Why have I not investigated the hardware-only solution, i.e. we claim our number of spindles and attach them to our own hardware at RDM?
  • Have this project finished by the end of the year.

-- NeilBrown - 02 Sep 2014

Topic revision: r3 - 09 Jan 2015 - 16:38:59 - NeilBrown