AFS Top 5 Questions - last updated 26/2/2019

Before we begin, some background information on our AFS service. For general AFS background, see the Informatics user information, especially the AFS online documentation (though beware that some of this is in need of updating). The Updated local copy of the AFS reference manual is also very useful.

To get an up-to-date list of all file servers in the volume database, run the command vos listaddrs.

As of 26/2/19, the list of servers was:

File Servers

Name        Location  Notes
kraken      Forum
huldra      Forum
nessie      Forum
yeti        Forum
lammasu     Forum     No Firewall Holes
gresley     Forum
peppercorn  Forum
riddles     Forum     Research Group Server
stanier     JCMB
maunsell    JCMB
fairburn    JCMB
churchward  JCMB
ivatt       JCMB
bulleid     JCMB
collett     AT
lemon       AT
keto        AT
ladon       AT

Database Servers

Name       Alias   Location  Notes
afsdbvmkb  afsdb0  KB        VM
afsdbvmat  afsdb1  AT        VM
hanlon     afsdb2  Forum

Partitions

Each partition on a file server contains only one class of data (user space, group space, etc.) and is either a read-write partition (that is, it contains read-write volumes) or a mirror partition (containing only read-only volumes). There is, by and large, a one-to-one mapping between read-write and mirror partitions, with each volume in a given read-write partition being mirrored to the same partition on the mirror server, though there are exceptions. Note that some partitions are not mirrored. A full list of partitions and their purpose can be found at

AFSPartitions

There are two types of RW partition: those on RAID 5 or RAID 10 partitions on the commodity disk arrays, and those on the internal server disks configured as RAID 10. The latter are considered faster, so home volumes form the majority of the data on these partitions. Most partitions are 250GB in size; some are 500GB. To get a list of the partitions on an AFS server, use the command

vos listpart <servername>

To find out the size of the partitions, run df on the server.
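For example (a sketch; kraken is used purely as an illustration, and the df step assumes you can ssh to the server):

vos listpart kraken
ssh kraken 'df -h /vicep*'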

Volumes

Volumes have names of the form

<type>.<name>

Volumes used by the package management system have a more complicated naming scheme beyond the scope of this document.

Possible types are:

Type     Purpose                          Default location in AFS file tree
user     User data, ie home directories   /afs/inf.ed.ac.uk/user
group    group data                       /afs/inf.ed.ac.uk/group
project  project data                     /afs/inf.ed.ac.uk/project
src      packages
bin      packages
udir     file system infrastructure
gdir     file system infrastructure
pkgdir   file system infrastructure
root     special - see AFS documentation

So user.fred would contain the home directory of the user fred, and group.killbots would contain the data owned by the killbots research group.

Volumes are mounted at particular points in the file tree under /afs and may contain files, directories and further volume mount points. Consider the following diagram showing where volumes are mounted in a typical AFS pathname:


/afs/inf.ed.ac.uk/user/f/fred
  |        |        |  |   |
root.afs   |        |  |   |
       root.cell    |  |   |
                  udir |   |
                    udir.f |
                       user.fred

To determine which volume any part of the AFS file system is located in, use the command

fs examine <pathname>

for example:

fs examine /afs/inf.ed.ac.uk/user/c/cms
File /afs/inf.ed.ac.uk/user/c/cms (536871009.1.1) contained in volume 536871009
Volume status for vid = 536871009 named user.cms
Current disk quota is 8000000
Current blocks used are 3622170
The partition has 36623421 blocks available out of 250916820

So not unreasonably, my home directory is in the volume user.cms.

Because of limitations on the number of objects a volume can contain, /afs/inf.ed.ac.uk/user is further divided into a subdirectory for each letter of the alphabet and for each academic year (s08, s09, etc.). We don't do this for the group and project subtrees, as we anticipate that the number of group and project volumes will never approach this limit.

Most volumes are mirrored to a server on a different site overnight. This is done as part of the preparations for the nightly TiBS backup run and is controlled by a script run on the TiBS backup server (currently Pergamon). The same script also updates the backup version of each volume (this is the volume mounted under Yesterday in users' home directories, giving access to the previous day's version of their file space). Everyone on the AFS pandemic team should receive the output from this release script. It's normal for a few volumes with names ending in .restore or .duff to fail to release, but any normal volume failing to release is a cause for concern and should be investigated.

To see which servers a volume resides on, use the vos listvldb command

brunel[~] vos listvldb user.cms

user.cms 
    RWrite: 536871009     ROnly: 536871010     Backup: 536871011 
    number of sites -> 3
       server cetus.inf.ed.ac.uk partition /vicepb RW Site 
       server cetus.inf.ed.ac.uk partition /vicepb RO Site 
       server kelpie.inf.ed.ac.uk partition /vicepq RO Site 

This tells you that the read-write version of my home directory is served from cetus/vicepb and the read-only mirror is on kelpie/vicepq. Note that there's also a read-only copy of the volume on the read-write partition. Though it isn't shown here, the backup version of the volume also resides on the read-write partition.

AFS commands

There are numerous AFS commands, split into several command suites according to their function. Fortunately, the documentation for these commands is for once copious and (relatively) well written. You can get an overview of all AFS commands by looking at the afs manual page. There are individual manual pages for each of the command suites and for each command within a suite. To get a list of the commands within a suite, run the command:

<command suite name> help

and to see the manual page for an individual command:

man <command suite name>_<command name>
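For example, to explore the vos suite:

vos help              # list all commands in the vos suite
vos help examine      # brief syntax summary for a single command
man vos_examine       # the full manual page for vos examine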

Becoming an admin user

Some of the commands detailed below require that you have obtained an AFS token based on your Kerberos admin principal. The most convenient way of doing this is to include the following in your .brc file:

alias asu='pagsh -c "export KRB5CCNAME=$KRB5CCNAME.asu \\
          && kinit $USER/admin \\
          && aklog \\
          && PS1=[\\\\h]\\\\u/admin: PS2=[\\\\h]\\\\u/admin.. /bin/bash --norc \\
          && kdestroy"'

You can then simply type asu and give your admin password to obtain the necessary token. You can check which tokens you have using the tokens command. Should you not be able to access your .brc file for any reason, the following sequence of commands will also allow you to obtain an admin token:

kinit <username>/admin
aklog
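Afterwards, the tokens command should confirm that the token has been obtained (the exact output format varies between OpenAFS versions):

tokens                # look for an AFS token for the inf.ed.ac.uk cell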

Log files

On a server, the log files for the various AFS processes can be found in /usr/afs/logs.
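For example, on a file server (BosLog, FileLog, VolserLog and SalvageLog are the standard OpenAFS log names):

ls -l /usr/afs/logs
tail -f /usr/afs/logs/FileLog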

The Good Stuff

Enough of the background, what can go wrong with the AFS service? There are, generally speaking, 5 areas in which problems can occur:

  • The file server
  • The database server
  • The cache manager
  • The network
  • The user

How to determine where the fault lies? Let us suppose that a user is complaining that they are having trouble accessing part or all of their home directory. I suggest the following course of action:

  • Obtain admin credentials on the same machine as the user is having problems on. Try to access the user's home directory. If you succeed and all seems well, then the problem is user related. If not:
  • Obtain admin credentials on a different machine and try to access the user's home directory. If you succeed and all seems well, then the problem lies with the cache manager on the original machine. You can confirm this by getting the user to try accessing their home directory on the new machine. If you are still having no luck accessing the user's files:
  • Run the command
vos listvldb user.<username>

You should get output similar to the following:

 vos listvldb user.cms

user.cms 
    RWrite: 536871009     ROnly: 536871010     Backup: 536871011 
    number of sites -> 3
       server squonk.inf.ed.ac.uk partition /vicepa RW Site 
       server squonk.inf.ed.ac.uk partition /vicepa RO Site 
       server unicorn.inf.ed.ac.uk partition /vicepb RO Site 

If this succeeds, it tells you two things: that the volume location database is working correctly, and the name of the server on which the user's read-write volume resides. If you do not get output similar to this, then the problem is in some way associated with the database servers. If the DB servers seem fine:

  • Run the command
vos listvol -s <servername> -p <partition>

with servername and partition coming from the output of the above command. You should get back something like:

vos listvol -s squonk -p vicepa
Total number of volumes on server squonk partition /vicepa: 264 
backup.root                       536871305 RW       3178 K On-line
backup.root.backup                536871307 BK       2251 K On-line
gdir.admin                        536880872 RW          2 K On-line
gdir.admin.backup                 536880874 BK          2 K On-line

                               .
                               .
                               .


user.v1screer                     536872219 RW      26301 K On-line
user.v1screer.backup              536872221 BK      26301 K On-line
user.v1screer.readonly            536872220 RO      26301 K On-line
user.v1swils2                     536873534 RW    4207473 K On-line
user.v1swils2.backup              536873536 BK    4207473 K On-line
user.v1swils2.readonly            536873535 RO    4207473 K On-line

Total volumes onLine 264 ; Total volumes offLine 0 ; Total busy 0

Pay close attention to the last line of the output. All volumes should be online. Any offline volumes indicate a file server problem, as does a failure of the command to return (though don't panic too soon, as vos listvol may take some time to run; if you've been waiting five minutes, there's a problem).

It may be possible, using common sense, to take some shortcuts in the above. If you receive reports from 20 different users that they cannot access their home directories and all their home directories reside on the same server, then it's a fairly safe bet that the problem lies with the file server. On the other hand, 20 complaining users with home directories on different servers suggests database or network related problems.

Finally, we come to the

Top 5(ish) AFS questions

What should I do when a volume doesn't release?

If the daily report on volume releases indicates that some volumes which you would normally expect to release successfully have failed to do so, you should investigate. The following only applies when a relatively small number of volumes have failed to release. If a large number of volumes fail to release, it is almost certainly a problem with one or more file or database servers, and you should consult the "What should I do about database server related problems?" and "What should I do about file server related problems?" sections below.

First check that the volume which failed to release actually has read-only volumes associated with it. It sometimes happens that new user or group volumes are created without the associated read-only volumes. Run the command


vos examine <volume name>

The output should look something like:

 vos examine user.cms
user.cms                          536871009 RW    3619758 K  On-line
    squonk.inf.ed.ac.uk /vicepa 
    RWrite  536871009 ROnly  536871010 Backup  536871011 
    MaxQuota    8000000 K 
    Creation    Tue Jul  7 21:46:50 2009
    Copy        Wed Jul  8 18:11:13 2009
    Backup      Wed Oct  7 06:44:13 2009
    Last Update Wed Oct  7 14:49:26 2009
    16246 accesses in the past day (i.e., vnode references)

    RWrite: 536871009     ROnly: 536871010     Backup: 536871011 
    number of sites -> 3
       server squonk.inf.ed.ac.uk partition /vicepa RW Site 
       server squonk.inf.ed.ac.uk partition /vicepa RO Site 
       server unicorn.inf.ed.ac.uk partition /vicepb RO Site 

If the listing after number of sites doesn't contain any read-only sites, then something is clearly wrong, since all volumes should have at least one read-only copy on the same server and partition as the read-write. You can add this with:


vos addsite -server <server name> -partition <partition name> -id <volume name>

Then retry the release with


vos release <volume name>

Most, though not all, volumes will also have a read-only copy on a remote site (Kings Buildings for volumes on Forum and Appleton Tower servers; the Forum and the Tower are regarded as not being sufficiently distant from each other for DR purposes). The exception to this rule is group volumes where the group has decided that a separate backup copy of their data is not required. RW partitions are divided into those which are replicated remotely and those which are not; all volumes on a given partition share a mirror partition on the remote site. If a volume should have a remote RO copy, you should create this as well. From the above, it follows that the simplest way to determine where a volume's remote RO should go is to examine one of the other volumes in the same partition and see where its RO is located. The full list of steps would be:

  1. Determine the server and partition the problem volume resides on with:
vos examine <volume name>
  2. Get the names of the other volumes on this server and partition with:
vos listvldb -s <server name> -p <partition>

  3. Finally, find out where the remote ROs of the other volumes on the same partition are located, using the same vos examine command as above (but substituting the name of another volume on the same partition, obviously).

Once you've found where the remote RO should go, create it using the vos addsite command as above:


vos addsite -server <server name> -partition <partition name> -id <volume name>

and release the volume again.
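Putting the whole procedure together, here is a sketch using the illustrative names from the examples on this page (group.killbots as the problem volume, squonk/vicepa as its RW site and unicorn/vicepb as the remote mirror partition):

vos examine group.killbots                       # note the RW server and partition
vos listvldb -server squonk -partition vicepa    # list the other volumes on that partition
vos examine user.cms                             # see where a neighbouring volume's remote RO lives
vos addsite -server unicorn -partition vicepb -id group.killbots
vos release group.killbots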

There is one set of circumstances where this won't work. Some volumes are so large that they have a partition to themselves. If this is the case, consult AFSPartitions (this may actually be the easier option in all cases) which, as mentioned above, contains information about all the AFS partitions, including the location of any off-site mirror partition.

Another cause of a release failing is that the volume may be locked as part of some other operation on it. Volumes are normally only locked for a few seconds or so but can remain locked if the operation on the volume is interrupted prematurely, for example by a server crashing. If the volume is locked, the last line of output from the vos examine command will be Volume is locked.

To determine whether it is safe to unlock the volume, run the following command substituting the name of the server the read-write volume resides on for server name:


vos status <server name>

If the output from this command is


No active transactions on <server name>

it's safe to unlock the volume with the command:


vos unlock <volume name>

Then retry the release.

If the output from the vos status command indicates that the server is busy and this continues for more than a few minutes, it is probably an indication of problems on the file server.

If the vos examine command returns an error and only prints out the second part of the example output, from the line


   RWrite: 536871009     ROnly: 536871010     Backup: 536871011  

this is an indication that the volume is currently off-line. Follow the instructions below for bringing the volume back on-line and retry the release.

Finally, if none of the above has provided an answer and the volume still fails to release, rerun the release command with the -verbose flag:


vos release <volume name> -verbose

This will produce a large amount of output which may provide some insight into what is going wrong.

What should I do about user related problems?

Have the user run renc to obtain a new AFS token. If that doesn't solve the problem, view the ACLs for the user's home directory (you will need to asu to do this) using the command

fs la ~<username>

and check that a line like

<username> rlidwka

appears in the output. If it doesn't, restore the user's access to their home directory with the command

fs sa ~<username> <username> all

You may also need to check the permissions of the subdirectories in the user's home directory (remember ACLs only apply to directories in AFS and changes aren't recursive).
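Since ACLs are per-directory, one (hedged) way to reset them across the whole home directory is to walk the tree with find; for example, for the illustrative user fred:

# run with admin credentials (asu); this sets fred's rights on every subdirectory
find ~fred -type d -exec fs setacl -dir {} -acl fred all \;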

What should I do about cache manager related problems?

There are three commands available for resetting the cache: fs flush, which flushes individual files or directories from the cache; fs flushvolume, which flushes all entries associated with a given volume; and fs flushmount, which flushes the information associated with a mount point. Pragmatically, you may find it more expedient simply to reboot the affected machine if possible.
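For example, using fred's home directory from the earlier diagram (the exact paths are purely illustrative):

fs flush /afs/inf.ed.ac.uk/user/f/fred/somefile     # a single cached file or directory
fs flushvolume /afs/inf.ed.ac.uk/user/f/fred        # everything cached from the user.fred volume
fs flushmount /afs/inf.ed.ac.uk/user/f/fred         # the cached mount point entry itself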

What should I do about database server related problems?

First ascertain whether the problem lies with an individual database server or with all three. If all three are affected, the problem is more likely to be network related. Check that you can ping all three machines. Use the command

bos status <servername>

to check which AFS processes the server is running. The output should look like:

bos status afsdb0
Instance ptserver, currently running normally.
Instance vlserver, currently running normally.

The ptserver is the protection database server which deals with controlling access to data. The vlserver is the volume location database server which keeps track of where volumes are located. The buserver is the backup server and will not normally concern us. If one or more of these processes doesn't seem to be running, reboot the server and see if that fixes the problem.
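If only one process has stopped, it may be worth asking bos to restart just that instance before resorting to a full reboot (a sketch, using afsdb0 as an example; requires an admin token):

bos restart afsdb0 -instance vlserver
bos status afsdb0        # should now report "currently running normally"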

You can obtain further information about what the database servers are doing using the udebug command. For example:

udebug afsdb0 7003
Host's addresses are: 129.215.64.16 
Host's 129.215.64.16 time is Fri Oct  2 16:07:09 2009
Local time is Fri Oct  2 16:07:09 2009 (time differential 0 secs)
Last yes vote for 16.64.215.129 was 13 secs ago (sync site); 
Last vote started 13 secs ago (at Fri Oct  2 16:06:56 2009)
Local db version is 1254020518.81376
I am sync site until 47 secs from now (at Fri Oct  2 16:07:56 2009) (3 servers)
Recovery state 1f
Sync site's db version is 1254020518.81376
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
         475511 secs ago (at Sun Sep 27 04:01:58 2009)

Server (129.215.64.18): (db 1254020518.81376)
    last vote rcvd 13 secs ago (at Fri Oct  2 16:06:56 2009),
    last beacon sent 13 secs ago (at Fri Oct  2 16:06:56 2009), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

Server (129.215.64.17): (db 1254020518.81376)
    last vote rcvd 13 secs ago (at Fri Oct  2 16:06:56 2009),
    last beacon sent 13 secs ago (at Fri Oct  2 16:06:56 2009), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

This amount of output, and in particular the line beginning I am sync site, shows that afsdb0 is the master server for the volume location database (port 7003; the ptserver is on port 7002). If we look at one of the slave servers, we get:


 udebug afsdb2 7003
Host's addresses are: 129.215.64.18 
Host's 129.215.64.18 time is Fri Oct  2 16:09:02 2009
Local time is Fri Oct  2 16:09:03 2009 (time differential 1 secs)
Last yes vote for 16.64.215.129 was 6 secs ago (sync site); 
Last vote started 6 secs ago (at Fri Oct  2 16:08:57 2009)
Local db version is 1254020518.81376
I am not sync site
Lowest host 129.215.64.16 was set 6 secs ago
Sync host 129.215.64.16 was set 6 secs ago
Sync site's db version is 1254020518.81376
0 locked pages, 0 of them for write
 

What should you do if a database server is down and cannot easily be restored to service? In theory it shouldn't matter, since the remaining two database servers should be quite capable of coping (in fact, even a single server should be OK). Unfortunately, there is a problem. When the AFS client starts up on a host, it selects one database server to communicate with. If that database server goes off-line, the client should switch to using one of the remaining database servers, but a bug in the AFS client means that this does not happen as quickly as it should. If roughly a third of the user base starts complaining that access to the file system has suddenly slowed dramatically, it's a fair bet that one of the database servers has failed. Rebooting a client should make it use a different database server. Alternatively, you can install a new database server by following the instructions at AFSInstallingServer. Remember that the new server MUST have the same IP address as the failed server.
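To see things from a client's point of view, the cache manager can report which servers it believes are down and how it ranks the volume location servers (a sketch; run on the affected client):

fs checkservers                  # lists any servers the cache manager currently believes are down
fs getserverprefs -vlservers     # preference ranks for the volume location (database) servers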

What should I do about file server related problems?

This very much depends on the nature of the problem. Once you have identified the server on which the problem volume resides, check the status of the server using the bos status command. This should normally return something like:


bos status squonk

Instance fs, currently running normally.
    Auxiliary status is: file server running.

If instead the return from bos status says that the file server is Salvaging file system, it means that some sort of problem has been detected with the AFS file space on that machine and that the AFS salvager (which could perhaps be considered the rough equivalent of fsck for AFS file systems) is being run to correct any errors. None of the volumes served from this file server will be available until the salvage process has completed. You can keep track of the progress of this salvage by checking the contents of the log files with names beginning SalvageLog in the AFS log file directory /usr/afs/logs. Our file servers are configured to run five salvage processes in parallel, so there will be five of these files to check. The length of time the salvage takes depends on the number and size of the volumes in the partition; it can take several hours for the salvage of a large partition with many files to complete. Monitor the salvage log to follow progress. Once the salvage has completed, see if this has fixed the problem.

If it hasn't, or if the return from bos status reports that the file server is running normally, check the status of the AFS partition the volume resides in with the vos listvol command:

vos listvol -s <servername> -p <partition name>

for example

vos listvol -s squonk -p vicepa
Total number of volumes on server squonk partition /vicepa: 264 
backup.root                       536871305 RW       3178 K On-line
backup.root.backup                536871307 BK       2251 K On-line
gdir.admin                        536880872 RW          2 K On-line
gdir.admin.backup                 536880874 BK          2 K On-line

                               .
                               .
                               .


user.v1screer                     536872219 RW      26301 K On-line
user.v1screer.backup              536872221 BK      26301 K On-line
user.v1screer.readonly            536872220 RO      26301 K On-line
user.v1swils2                     536873534 RW    4207473 K On-line
user.v1swils2.backup              536873536 BK    4207473 K On-line
user.v1swils2.readonly            536873535 RO    4207473 K On-line

Total volumes onLine 264 ; Total volumes offLine 0 ; Total busy 0

We now run the Demand Attached File Server (DAFS) on all our file servers. In theory, this should mean that you will never need to run the salvager, since the file server should do this automatically when issues such as a volume not being attached, or being corrupted, are detected. It is possible, however, that a volume may become corrupt without the file server detecting it. If this happens, you should follow this procedure.

There are three forms of the command to run the salvager manually:

bos salvage -server <server> -volume <volume name>

bos salvage -server <server> -partition <partition name>

bos salvage -server <server> -all

The first salvages an individual volume, the second a single partition and the third every AFS partition on the server. When salvaging one or more partitions, the file server is shut down until the salvage is complete and none of the volumes served from the server will be available. If only a single volume is being salvaged, the file server continues to run and access to the remaining volumes on the server will be uninterrupted. If the majority of volumes on the server are still available, it's preferable to salvage the off-line volumes individually rather than taking the file server off-line for all the users of that file server.
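For example, to salvage just my home volume on squonk (the preferred approach when most other volumes on the server are healthy):

bos salvage -server squonk -volume user.cms    # requires an admin token; only this volume is taken off-line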

After salvaging has completed, the affected volumes should be back on-line and once more accessible. If you continue to see volumes going off-line on this server, there may be problems with the underlying storage and you may wish to consult the appropriate expert in the field.

The above assumes that only a few volumes on the server are affected, but what if the server itself or the underlying storage device has failed? There is a procedure for converting the read-only copy of a volume to the read-write version (this only makes sense for read-only volumes stored on a different server and storage device, of course), but it is not something to be undertaken lightly, since undoing this action once the original hardware is available again takes an extremely large amount of effort. On the other hand, users are likely to be extremely unhappy if their files are unavailable for more than a couple of days...

Consider the following:

 vos listvldb user.cms

user.cms 
    RWrite: 536871009     ROnly: 536871010     Backup: 536871011 
    number of sites -> 3
       server squonk.inf.ed.ac.uk partition /vicepa RW Site 
       server squonk.inf.ed.ac.uk partition /vicepa RO Site 
       server unicorn.inf.ed.ac.uk partition /vicepb RO Site 

This shows that the read-write version of my home volume is on the file server squonk in the partition vicepa. As explained above, there is a read-only version of my volume in the same place. There is also a read-only copy of my home volume on the file server unicorn in partition vicepb. If squonk has suffered some kind of a meltdown and my home volume is likely to be unavailable for several days, I can turn the read-only copy of my home volume on unicorn into the read write version by running the command


vos convertROtoRW -server unicorn -partition vicepb -id user.cms

In my experience, the newly converted RW volume will be off-line once the command has completed and you will have to salvage the volume (see above) to bring it back on-line. You can get a list of all the volumes in a partition even if the file server is down using

vos listvldb -s <server name> -p <partition name >

To get a list of all the volumes on a server, omit the partition argument in the above command.
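For example, using the server and partition names from the listing above:

vos listvldb -server unicorn -partition vicepb    # volumes with a site on unicorn /vicepb
vos listvldb -server unicorn                      # every volume with a site on unicorn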

Once you have converted a read-only volume to read-write, the original read-write volume cannot be allowed to come on-line again or file system confusion will reign. Sorting out the after effects of running convertROtoRW is something that should only be undertaken by experts after much thoughtful consideration.

What should I do about network related problems?

That, alas, is beyond the scope of this document. Try fixing Network Things.

-- CraigStrachan - 26/3/2019
