AFS Top 5 Questions - last updated 4/2/2011
Before we begin, some background information on our AFS service. For general AFS background, see the Informatics user information, especially the AFS online documentation (though beware that some of this is in need of updating). The Updated local copy of the AFS reference manual is also very useful.
To get an up-to-date list of all file servers in the volume database, run the command
vos listaddrs.
As of 26/2/19, the list of servers was:
File Servers
| Name | Location | Notes |
| kraken | Forum | |
| huldra | Forum | |
| nessie | Forum | |
| yeti | Forum | |
| lammasu | Forum | No Firewall Holes |
| gresley | Forum | |
| peppercorn | Forum | |
| riddles | Forum | Research Group Server |
| stanier | JCMB | |
| maunsell | JCMB | |
| fairburn | JCMB | |
| churchward | JCMB | |
| ivatt | JCMB | |
| bulleid | JCMB | |
| collett | AT | |
| lemon | AT | |
| keto | AT | |
| ladon | AT | |
Database Servers
Partitions
Each partition on a file server contains only one class of data (user space, group space, etc.) and is either a read-write partition (that is, it contains read-write volumes) or a mirror partition (containing only read-only volumes). There is, by and large, a one-to-one mapping between read-write and mirror partitions, with each volume in a given read-write partition being mirrored to the corresponding partition on the mirror server, though there are exceptions. Note that some partitions are not mirrored. A full list of partitions and their purpose can be found at
AFSPartitions
There are two types of RW partition: those on RAID 5 and RAID 10 partitions on the commodity disk arrays, and those on the internal server disks configured as RAID 10. The latter are considered to be faster, so home volumes form the majority of the data on those partitions. Most partitions are 250GB in size; some are 500GB. To get a list of the partitions on an AFS server, use the command
vos listpart <servername>
To find out the size of the partitions, run
df
on the server.
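For example, something along these lines shows the partitions and their sizes (the server name is just one from the table above, and the df must be run on the server itself):
vos listpart kraken
df -h /vicep*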
Volumes
Volumes have names of the form
<type>.<name>
Volumes used by the package management system have a more complicated naming scheme beyond the scope of this document.
Possible types are:
| Type | Purpose | Default location in AFS file tree |
| user | User data, i.e. home directories | /afs/inf.ed.ac.uk/user |
| group | Group data | /afs/inf.ed.ac.uk/group |
| project | Project data | /afs/inf.ed.ac.uk/project |
| src | Packages | |
| bin | Packages | |
| udir | File system infrastructure | |
| gdir | File system infrastructure | |
| pkgdir | File system infrastructure | |
| root | Special - see AFS documentation | |
So user.fred would contain the home directory of the user fred, and group.killbots would contain the data owned by the killbots research group.
Volumes are mounted at particular points in the file tree under /afs and may contain files, directories and further volume mount points. Consider the following diagram showing where volumes are mounted in a typical AFS pathname:
/afs                            -> root.afs
/afs/inf.ed.ac.uk               -> root.cell
/afs/inf.ed.ac.uk/user          -> udir
/afs/inf.ed.ac.uk/user/f        -> udir.f
/afs/inf.ed.ac.uk/user/f/fred   -> user.fred
To determine which volume any part of the AFS file system is located in, use the command
fs examine <pathname>
for example:
fs examine /afs/inf.ed.ac.uk/user/c/cms
File /afs/inf.ed.ac.uk/user/c/cms (536871009.1.1) contained in volume 536871009
Volume status for vid = 536871009 named user.cms
Current disk quota is 8000000
Current blocks used are 3622170
The partition has 36623421 blocks available out of 250916820
So not unreasonably, my home directory is in the volume user.cms.
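You can also go the other way and ask which volume is mounted at a given point in the tree with fs lsmount; for the same path, the output should look something like:
fs lsmount /afs/inf.ed.ac.uk/user/c/cms
'/afs/inf.ed.ac.uk/user/c/cms' is a mount point for volume '#user.cms'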
Because of limitations in the number of objects a volume can contain, /afs/inf.ed.ac.uk/user is further divided into a subdirectory for each letter of the alphabet and academic year (s08, s09 etc). We don't do this for the group and project subtrees as we anticipate that the number of group and project volumes won't ever approach this limit.
Most volumes are mirrored to a server on a different site overnight. This is done as part of the preparations for the nightly TiBS backup run and is controlled by a script run on the TiBS backup server (currently Pergamon). The same script also updates the backup version of each volume (this is the volume which is mounted under Yesterday in users' home directories and gives access to the previous day's version of their file space). Everyone on the AFS pandemic team should receive the output from this release script. It's normal for a few volumes with names ending in .restore or .duff to fail to release, but any normal volume failing to release is a cause for concern and should be investigated.
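Should a single volume's Yesterday snapshot need refreshing outside the nightly run, the backup volume can be regenerated by hand with vos backup (the volume name here is just an example):
vos backup user.fred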
To see which servers a volume resides on, use the vos listvldb command:
brunel[~] vos listvldb user.cms
user.cms
RWrite: 536871009 ROnly: 536871010 Backup: 536871011
number of sites -> 3
server cetus.inf.ed.ac.uk partition /vicepb RW Site
server cetus.inf.ed.ac.uk partition /vicepb RO Site
server kelpie.inf.ed.ac.uk partition /vicepq RO Site
This tells you that the read-write version of my home directory is served from cetus/vicepb and the read-only mirror is on kelpie/vicepq. Note that there's also a read-only copy of the volume on the read-write partition. Though it isn't shown here, the backup version of the volume also resides on the read-write partition.
AFS commands
There are numerous AFS commands split into several command suites according to their functions. Fortunately, the documentation for these commands is for once copious and (relatively) well written. You can get an overview of all AFS commands by looking at the
afs manual page. There are individual manual pages for each of the command suites and manual pages for each command within the command suite. To get a list of the commands within a suite, run the command:
<command suite name> help
and to see the manual page for an individual command:
man <command suite name>_<command name>
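As a concrete example of this convention, for the vos suite:
vos help                # list all commands in the vos suite
vos help release        # brief syntax summary for a single command
man vos_release         # full manual page, assuming the OpenAFS man pages are installed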
Becoming an admin user
Some of the commands detailed below require that you have obtained an AFS token based on your Kerberos admin principal. The most convenient way of doing this is to include the following in your .brc file:
alias asu='pagsh -c "export KRB5CCNAME=$KRB5CCNAME.asu \\
&& kinit $USER/admin \\
&& aklog \\
&& PS1=[\\\\h]\\\\u/admin: PS2=[\\\\h]\\\\u/admin.. /bin/bash --norc \\
&& kdestroy"'
You can then simply type asu and give your admin password to obtain the necessary token. You can check which tokens you have using the tokens command. Should you not be able to access your .brc file for any reason, the following sequence of commands will also allow you to obtain an admin token:
kinit <username>/admin
aklog
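If you want to do by hand everything the asu alias does, the following sketch (assuming the same <username>/admin naming convention) keeps the admin token in its own PAG and credential cache:
pagsh
export KRB5CCNAME=$KRB5CCNAME.asu
kinit $USER/admin
aklog
tokens       # check that the admin token is present
kdestroy     # destroy the admin Kerberos ticket when finished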
Log files
On a server, the log files for the various AFS processes can be found in
/usr/afs/logs.
The Good Stuff
Enough of the background, what can go wrong with the AFS service? There are, generally speaking, 5 areas in which problems can occur:
- The file server
- The database server
- The cache manager
- The network
- The user
How to determine where the fault lies? Let us suppose that a user is complaining that they are having trouble accessing part or all of their home directory. I suggest the following course of action:
- Obtain admin credentials on the same machine as the user is having problems on. Try to access the user's home directory. If you succeed and all seems well, then the problem is user related. If not:
- Obtain admin credentials on a different machine and try to access the user's home directory. If you succeed and all seems well, then the problem lies with the cache manager on the original machine. You can confirm this by getting the user to try accessing their home directory on the new machine. If you are still having no luck accessing the user's files:
- Run the command
vos listvldb user.<username>
You should get output like the following:
vos listvldb user.cms
user.cms
RWrite: 536871009 ROnly: 536871010 Backup: 536871011
number of sites -> 3
server squonk.inf.ed.ac.uk partition /vicepa RW Site
server squonk.inf.ed.ac.uk partition /vicepa RO Site
server unicorn.inf.ed.ac.uk partition /vicepb RO Site
If this succeeds, it tells you two things: that the volume location database is working correctly, and which server the user's read-write volume resides on. If you do not get output similar to this, then the problem is in some way associated with the database servers. If the DB servers seem fine, run:
vos listvol -s <servername> -p <partition>
with servername and partition coming from the output of the above command. You should get back something like:
vos listvol -s squonk -p vicepa
Total number of volumes on server squonk partition /vicepa: 264
backup.root 536871305 RW 3178 K On-line
backup.root.backup 536871307 BK 2251 K On-line
gdir.admin 536880872 RW 2 K On-line
gdir.admin.backup 536880874 BK 2 K On-line
.
.
.
user.v1screer 536872219 RW 26301 K On-line
user.v1screer.backup 536872221 BK 26301 K On-line
user.v1screer.readonly 536872220 RO 26301 K On-line
user.v1swils2 536873534 RW 4207473 K On-line
user.v1swils2.backup 536873536 BK 4207473 K On-line
user.v1swils2.readonly 536873535 RO 4207473 K On-line
Total volumes onLine 264 ; Total volumes offLine 0 ; Total busy 0
Pay close attention to the last line of the output. All volumes should be online. Any offline volumes indicate a file server problem, as does a failure of the command to return (though don't panic too soon, as vos listvol may take some time to run; if you've been waiting 5 minutes, there's a problem).
It may be possible, using common sense, to take some shortcuts in the above. If you receive reports from 20 different users that they cannot access their home directories and all their home directories reside on the same server, then it's probably a fairly safe bet that the problem lies with the file server. On the other hand, 20 complaining users with home directories on different servers suggest database or network related problems.
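Another quick check, which can be run from any client, is fs checkservers; this asks the local cache manager which file servers it currently believes are down and, if all is well, should report something like:
fs checkservers
All servers are running.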
Finally, we come to the
Top 5(ish) AFS questions
- What should I do when a volume doesn't release?
If the daily report on volume releases seems to indicate that some volumes which you would normally expect to release successfully have failed to release, you should investigate. The following only applies to the case where a relatively small number of volumes have failed to release. If a large number of volumes fail to release, it is almost certainly a problem with one or more file or database servers and you should follow the procedures detailed below.
First check that the volume which failed to release actually has read-only volumes associated with it. It sometimes happens that new user or group volumes are created without the associated read-only volumes being created. Run the command
vos examine <volume name>
The output should look something like:
vos examine user.cms
user.cms 536871009 RW 3619758 K On-line
squonk.inf.ed.ac.uk /vicepa
RWrite 536871009 ROnly 536871010 Backup 536871011
MaxQuota 8000000 K
Creation Tue Jul 7 21:46:50 2009
Copy Wed Jul 8 18:11:13 2009
Backup Wed Oct 7 06:44:13 2009
Last Update Wed Oct 7 14:49:26 2009
16246 accesses in the past day (i.e., vnode references)
RWrite: 536871009 ROnly: 536871010 Backup: 536871011
number of sites -> 3
server squonk.inf.ed.ac.uk partition /vicepa RW Site
server squonk.inf.ed.ac.uk partition /vicepa RO Site
server unicorn.inf.ed.ac.uk partition /vicepb RO Site
If this listing doesn't contain any read-only volumes, create the volumes in the appropriate partitions using the command
vos addsite -server <server name> -partition <partition name> -id <volume name>
Then retry the release with
vos release <volume name>
Another cause of a release failing is that the volume may be locked as part of some other operation on it. Volumes are normally only locked for a few seconds or so but can remain locked if the operation on the volume is interrupted prematurely, for example by a server crashing. If the volume is locked, the last line of output from the
vos examine command will be
Volume is locked.
To determine whether it is safe to unlock the volume, run the following command, substituting the name of the server the read-write volume resides on for server name:
vos status <server name>
If the output from this command is
No active transactions on <server name>
it's safe to unlock the volume with the command:
vos unlock <volume name>
Then retry the release.
If the output from the vos status command indicates that the server is busy and this continues for more than a few minutes, it is probably an indication of problems on the file server.
If the vos examine command returns an error and only prints the second part of the example output, from the line
RWrite: 536871009 ROnly: 536871010 Backup: 536871011
onwards, this is an indication that the volume is currently off-line. Follow the instructions below for bringing the volume back on-line and retry the release.
Finally, if none of the above have provided an answer and the volume still fails to release, rerun the release command with the -verbose flag:
vos release <volume name> -verbose
This will produce a large amount of output which may provide some insight into what is going wrong.
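Putting the above together, a typical manual investigation of a single failed release might look like the following sketch (the volume name is hypothetical, and the unlock step should only be run if the volume was locked and vos status showed no active transactions):
vos examine group.killbots
vos status <server holding the RW volume>
vos unlock group.killbots
vos release group.killbots -verbose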
- What should I do about user related problems?
Have the user run renc to obtain a new AFS token. If that doesn't solve the problem, view the ACLs for the user's home directory using the command
fs la ~<username>
and check that a line like
<username> rlidwka
appears in the output. If it doesn't, restore the user's access to their home directory with the command
fs sa ~<username> <username> all
You may also need to check the permissions of the subdirectories in the user's home directory (remember ACLs only apply to directories in AFS and changes aren't recursive).
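Since ACL changes aren't recursive, one way to reapply an ACL to every directory under a user's home directory is to combine find with fs sa; this is only a sketch, and you should be wary of any volumes mounted below the home directory:
find ~<username> -type d -exec fs sa {} <username> all \;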
- What should I do about cache manager related problems?
There are three commands available for resetting the cache: fs flush, which flushes individual files or directories from the cache; fs flushvolume, which flushes all entries associated with a given volume; and fs flushmount, which flushes all information associated with a mount point. Pragmatically, you may find it more expedient to simply reboot the affected machine if possible.
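For example, reusing the path from earlier (the file name is hypothetical):
fs flush /afs/inf.ed.ac.uk/user/c/cms/somefile       # a single file
fs flushvolume /afs/inf.ed.ac.uk/user/c/cms          # everything cached from that volume
fs flushmount /afs/inf.ed.ac.uk/user/c/cms           # the mount point information itself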
- What should I do about database server related problems?
First ascertain whether the problem lies with an individual database server or all three. If all three are affected, the problems are more likely to be network related. Check that you can ping all three of the machines. Use the command
bos status <servername>
to check which AFS processes the server is running. The output should look like:
bos status afsdb0
Instance ptserver, currently running normally.
Instance vlserver, currently running normally.
The ptserver is the protection database server which deals with controlling access to data. The vlserver is the volume location database server which keeps track of where volumes are located. The buserver is the backup server and will not normally concern us. If one or more of these processes doesn't seem to be running, reboot the server and see if that fixes the problem.
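Before resorting to a reboot, it may be worth restarting just the affected process with bos; a sketch, using the server name from the example above:
bos restart -server afsdb0 -instance vlserver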
You can obtain further information about what the database servers are doing using the udebug command. For example:
udebug afsdb0 7003
Host's addresses are: 129.215.64.16
Host's 129.215.64.16 time is Fri Oct 2 16:07:09 2009
Local time is Fri Oct 2 16:07:09 2009 (time differential 0 secs)
Last yes vote for 16.64.215.129 was 13 secs ago (sync site);
Last vote started 13 secs ago (at Fri Oct 2 16:06:56 2009)
Local db version is 1254020518.81376
I am sync site until 47 secs from now (at Fri Oct 2 16:07:56 2009) (3 servers)
Recovery state 1f
Sync site's db version is 1254020518.81376
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
475511 secs ago (at Sun Sep 27 04:01:58 2009)
Server (129.215.64.18): (db 1254020518.81376)
last vote rcvd 13 secs ago (at Fri Oct 2 16:06:56 2009),
last beacon sent 13 secs ago (at Fri Oct 2 16:06:56 2009), last vote was yes
dbcurrent=1, up=1 beaconSince=1
Server (129.215.64.17): (db 1254020518.81376)
last vote rcvd 13 secs ago (at Fri Oct 2 16:06:56 2009),
last beacon sent 13 secs ago (at Fri Oct 2 16:06:56 2009), last vote was yes
dbcurrent=1, up=1 beaconSince=1
This shows that afsdb0 is the master server for the volume location database (port 7003; the ptserver is 7002). If we look at one of the slave servers, we get:
udebug afsdb2 7003
Host's addresses are: 129.215.64.18
Host's 129.215.64.18 time is Fri Oct 2 16:09:02 2009
Local time is Fri Oct 2 16:09:03 2009 (time differential 1 secs)
Last yes vote for 16.64.215.129 was 6 secs ago (sync site);
Last vote started 6 secs ago (at Fri Oct 2 16:08:57 2009)
Local db version is 1254020518.81376
I am not sync site
Lowest host 129.215.64.16 was set 6 secs ago
Sync host 129.215.64.16 was set 6 secs ago
Sync site's db version is 1254020518.81376
0 locked pages, 0 of them for write
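To sweep all of the database servers quickly, a loop along these lines can be used (the server names assume the afsdb0/1/2 naming seen above; 7003 is the vlserver and 7002 the ptserver):
for s in afsdb0 afsdb1 afsdb2; do
  echo "== $s =="
  udebug $s 7003 | grep -E 'sync site|db version'
done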
What to do if a database server is down and cannot easily be restored to service? In theory it shouldn't matter since the remaining two database servers should be quite capable of coping (in fact even a single server should be ok). Unfortunately there is a problem. When the AFS client starts up on a host, it selects one database server to communicate with. If that database server goes off-line, it should switch to using one of the remaining database servers but a bug in the AFS client means that this does not happen as quickly as it should. If roughly a third of the user base starts complaining that access to the file system has suddenly slowed dramatically, it's a fair bet that one of the database servers has failed. Rebooting a client should make it use a different database server. Alternatively, you can install a new database server by following the instructions at
AFSInstallingServer. Remember that the new server MUST have the same IP address as the failed server.
- What should I do about file server related problems?
This very much depends on the nature of the problem. Once you have identified the server on which the problem volume resides, check the status of the server using the
bos status command. This should normally return something like:
bos status squonk
Instance fs, currently running normally.
Auxiliary status is: file server running.
If instead the return from bos status says that the file server is Salvaging file system, it means that some sort of problem has been detected with the AFS file space on that machine and that the AFS salvager (which could perhaps be considered roughly the equivalent of fsck for AFS file systems) is being run to correct any errors. None of the volumes served from this file server will be available until the salvage process has completed. You can keep track of the progress of this salvage by checking the contents of the log files with names beginning SalvageLog in the AFS log file directory. Our file servers are configured to run five salvage processes in parallel, so there will be five of these files to check. Once the salvage has completed, see if this has fixed the problem.
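For example, on the affected server you can follow all of the parallel salvage logs at once (using the AFS log directory mentioned earlier):
ls -lrt /usr/afs/logs/SalvageLog*
tail -f /usr/afs/logs/SalvageLog*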
If it hasn't, or if the return from
bos status reports that the file server is running normally, check the status of the AFS partition the volume resides in with the
vos listvol command:
vos listvol -s <servername> -p <partition name>
for example
vos listvol -s squonk -p vicepa
Total number of volumes on server squonk partition /vicepa: 264
backup.root 536871305 RW 3178 K On-line
backup.root.backup 536871307 BK 2251 K On-line
gdir.admin 536880872 RW 2 K On-line
gdir.admin.backup 536880874 BK 2 K On-line
.
.
.
user.v1screer 536872219 RW 26301 K On-line
user.v1screer.backup 536872221 BK 26301 K On-line
user.v1screer.readonly 536872220 RO 26301 K On-line
user.v1swils2 536873534 RW 4207473 K On-line
user.v1swils2.backup 536873536 BK 4207473 K On-line
user.v1swils2.readonly 536873535 RO 4207473 K On-line
Total volumes onLine 264 ; Total volumes offLine 0 ; Total busy 0
(you may have already carried this out as part of determining where the problem lies). If the partition has any volumes off-line, they need to be brought on-line again before they can be accessed. This is done using the salvager, which can be run manually as well as by the bosserver. A common recent cause of volumes going off-line has been AFS partitions being remounted read-only on the server due to fibre channel glitches. The AFS server then attempts to write to a volume on a read-only partition, finds it cannot do so and takes the volume off-line. Recent changes to the multipath daemon configuration hopefully mean that this will not be a problem in the future, but nonetheless, before attempting to salvage volumes on a partition, you should make sure that the partition is mounted read-write (AFS partitions are normal EXT2 or 3 partitions, so you can use tools such as fsck to reassure yourself that all is well with the partition).
There are three forms of the command to run the salvager manually:
bos salvage -server <server> -volume <volume name>
bos salvage -server <server> -partition <partition name>
bos salvage -server <server> -all
The first salvages an individual volume, the second a single partition and the third every AFS partition on the server. When salvaging one or more partitions, the file server is shut down until the salvage is complete and none of the volumes served from the server will be available. If only a single volume is being salvaged, the file server continues to run and access to the remaining volumes on the server will be uninterrupted. If the majority of volumes on the server are still available, it's preferable to salvage the off-line volumes individually rather than taking the file server off-line for all the users of that file server.
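For example, to salvage just a single off-line volume on the server and partition from the earlier listing, leaving the rest of the file server running (note that some OpenAFS versions also require -partition when salvaging an individual volume):
bos salvage -server squonk -partition vicepa -volume user.cms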
After salvaging has completed, the affected volumes should be back on-line and once more accessible. If you continue to see volumes going off-line on this server, there may be problems with the underlying storage and you may wish to consult the appropriate expert in the field.
The above assumes that only a few volumes on the server are affected but what if the server itself or the underlying storage device has failed? There is a procedure available for converting the read-only copy of a volume to the read-write version (this only makes sense with read-only volumes stored on a different server and underlying storage device of course) but this is not something to be undertaken lightly since undoing this action once the original hardware is once more available takes an extremely large amount of effort. On the other hand, users are likely to be extremely unhappy if their files are unavailable for more than a couple of days...
Consider the following:
vos listvldb user.cms
user.cms
RWrite: 536871009 ROnly: 536871010 Backup: 536871011
number of sites -> 3
server squonk.inf.ed.ac.uk partition /vicepa RW Site
server squonk.inf.ed.ac.uk partition /vicepa RO Site
server unicorn.inf.ed.ac.uk partition /vicepb RO Site
This shows that the read-write version of my home volume is on the file server squonk in the partition vicepa. As explained above, there is a read-only version of my volume in the same place. There is also a read-only copy of my home volume on the file server unicorn in partition vicepb. If squonk has suffered some kind of meltdown and my home volume is likely to be unavailable for several days, I can turn the read-only copy of my home volume on unicorn into the read-write version by running the command
vos convertROtoRW -server unicorn -partition vicepb -id user.cms
In my experience, the newly converted RW volume will be off-line once the command has completed and you will have to salvage the volume (see above) to bring it back on-line. There is now a script written by Chris, called promoteRO, which can be used to promote individual volumes, all volumes on a given partition of a server, or all volumes on a server. This script should be available on all AFS servers. You can get a list of all the volumes in a partition even if the file server is down using
vos listvldb -s <server name> -p <partition name>
To get a list of all the volumes on a server, omit the partition argument in the above command.
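Putting this together, a sketch of promoting a single volume by hand, using the example volume above (the salvage step may or may not be needed, as noted):
vos convertROtoRW -server unicorn -partition vicepb -id user.cms
bos salvage -server unicorn -partition vicepb -volume user.cms
vos examine user.cms        # confirm the volume is back on-line and now RW on unicorn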
Once you have converted a read-only volume to read-write, the original read-write volume cannot be allowed to come on-line again or file system confusion will reign. Sorting out the after effects of running convertROtoRW is something that should only be undertaken by experts after much thoughtful consideration.
- What should I do about network related problems?
That, alas, is beyond the scope of this document. Try
fixing Network Things
--
CraigStrachan - 30/9/2013