---+ AFS Top 5 Questions - last updated 26/2/2019

Before we begin, some background information on our AFS service. For general AFS background, see the [[http://www.inf.ed.ac.uk/systems/AFS/][Informatics user information]], especially the [[http://docs.openafs.org/][AFS online documentation]] (though beware that some of this is in need of updating). The [[http://www.inf.ed.ac.uk/systems/AFS/OpenAFSRefMan/index.html][Updated local copy of the AFS reference manual]] is also very useful.

%INCLUDE{AFSCurrentHardware}%

---++++Partitions

Each partition on a file server contains only one class of data (user space, group space etc) and is either a read-write partition (that is, it contains read-write volumes) or a mirror partition (containing only read-only volumes). There is, by and large, a one-to-one mapping between read-write and mirror partitions, with each volume in a given read-write partition being mirrored to the same partition on the mirror server, though there are exceptions. Note that some partitions are not mirrored. A full list of partitions and their purpose can be found at [[AFSPartitions]].

There are two types of read-write partition: those on RAID 5 and RAID 10 partitions on the commodity disk arrays, and those on the internal server disks configured as RAID 10. The latter are considered to be faster, so home volumes form the majority of the data on those partitions. Most partitions are 250GB in size; some are 500GB.

To get a list of the partitions on an AFS server, use the command

<literal> vos listpart <<i>servername</i>> </literal>

To find out the size of the partitions, run =df= on the server.

---++++Volumes

Volumes have names of the form

<literal> <<i>type</i>>.<<i>name</i>> </literal>

Volumes used by the package management system have a more complicated naming scheme which is beyond the scope of this document. Possible types are:

| *Type* | *Purpose* | *Default location in AFS file tree* |
| user | User data, ie home directories | /afs/inf.ed.ac.uk/user |
| group | Group data | /afs/inf.ed.ac.uk/group |
| project | Project data | /afs/inf.ed.ac.uk/project |
| src | Packages | |
| bin | Packages | |
| udir | File system infrastructure | |
| gdir | File system infrastructure | |
| pkgdir | File system infrastructure | |
| root | Special - see the AFS documentation | |

So user.fred would contain the home directory of the user fred, and group.killbots would contain the data owned by the killbots research group.

Volumes are mounted at particular points in the file tree under /afs and may contain files, directories and further volume mount points.
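For illustration, mount points can be inspected and manipulated with the _fs_ command suite. This is only a sketch - the directory and volume names below are made up:

<verbatim>
# Show which volume is mounted at a directory
fs lsmount /afs/inf.ed.ac.uk/group/killbots

# Create a mount point for an existing volume (needs appropriate admin rights)
fs mkmount -dir /afs/inf.ed.ac.uk/group/killbots -vol group.killbots

# Remove a mount point (the volume itself is untouched)
fs rmmount /afs/inf.ed.ac.uk/group/killbots
</verbatim>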
Consider the following diagram showing which volume is mounted at each element of a typical AFS pathname:

<verbatim>
/afs/inf.ed.ac.uk/user/f/fred
  |       |         |  |  |
  |       |         |  |  user.fred
  |       |         |  udir.f
  |       |         udir
  |       root.cell
  root.afs
</verbatim>

To determine which volume any part of the AFS file system is located in, use the command

<literal> fs examine <pathname> </literal>

For example:

<verbatim>
fs examine /afs/inf.ed.ac.uk/user/c/cms
File /afs/inf.ed.ac.uk/user/c/cms (536871009.1.1) contained in volume 536871009
Volume status for vid = 536871009 named user.cms
Current disk quota is 8000000
Current blocks used are 3622170
The partition has 36623421 blocks available out of 250916820
</verbatim>

So, not unreasonably, my home directory is in the volume user.cms.

Because of limitations on the number of objects a volume can contain, /afs/inf.ed.ac.uk/user is further divided into a subdirectory for each letter of the alphabet and academic year (s08, s09 etc). We don't do this for the group and project subtrees as we anticipate that the number of group and project volumes won't ever approach this limit.

Most volumes are mirrored to a server on a different site overnight. This is done as part of the preparations for the nightly !TiBS backup run and is controlled by a script run on the !TiBS backup server (currently Pergamon). The same script also updates the backup version of each volume (this is the volume which is mounted under Yesterday in users' home directories and gives access to the previous day's version of their file space). Everyone on the AFS pandemic team should receive the output from this release script. It's normal for a few volumes with names ending in .restore or .duff to fail to release, but any normal volume failing to release is a cause for concern and should be investigated.

To see which servers a volume is located on, use the _vos listvldb_ command:

<verbatim>
brunel[~] vos listvldb user.cms

user.cms
    RWrite: 536871009     ROnly: 536871010     Backup: 536871011
    number of sites -> 3
       server cetus.inf.ed.ac.uk partition /vicepb RW Site
       server cetus.inf.ed.ac.uk partition /vicepb RO Site
       server kelpie.inf.ed.ac.uk partition /vicepq RO Site
</verbatim>

This tells you that the read-write version of my home directory is served from cetus/vicepb and the read-only mirror is on kelpie/vicepq. Note that there's also a read-only copy of the volume on the read-write partition. Though it isn't shown here, the backup version of the volume also resides on the read-write partition.

---++++AFS commands

There are numerous AFS commands, split into several command suites according to their functions. Fortunately, the documentation for these commands is for once copious and (relatively) well written. You can get an overview of all AFS commands by looking at the _afs_ manual page. There are individual manual pages for each of the command suites and manual pages for each command within a command suite. To get a list of the commands within a suite, run the command:

<literal> <<i>command suite name</i>> help </literal>

and to see the manual page for an individual command:

<literal> man <<i>command suite name</i>>_<<i>command name</i>> </literal>
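As a concrete illustration of this pattern, using the _vos_ suite (the other suites work the same way):

<verbatim>
# List all commands in the vos suite
vos help

# Brief usage summary for a single command
vos help examine

# Full manual page for that command
man vos_examine
</verbatim>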
---++++Becoming an admin user

Some of the commands detailed below require that you have obtained an AFS token based on your kerberos admin principal. The most convenient way of doing this is to include the following in your _.brc_ file:

<verbatim>
alias asu='pagsh -c "export KRB5CCNAME=$KRB5CCNAME.asu \\
&& kinit $USER/admin \\
&& aklog \\
&& PS1=[\\\\h]\\\\u/admin: PS2=[\\\\h]\\\\u/admin.. /bin/bash --norc \\
&& kdestroy"'
</verbatim>

You can then simply type _asu_ and give your admin password to obtain the necessary token. You can check which tokens you have using the _tokens_ command. Should you not be able to access your _.brc_ file for any reason, the following sequence of commands will also allow you to obtain an admin token:

<pre>
kinit <<i>username</i>>/admin
aklog
</pre>

---++++Log files

On a server, the log files for the various AFS processes can be found in _/usr/afs/logs_.

---++++The Good Stuff

Enough of the background, what can go wrong with the AFS service? There are, generally speaking, five areas in which problems can occur:

   * The file server
   * The database server
   * The cache manager
   * The network
   * The user

How do you determine where the fault lies? Let us suppose that a user is complaining that they are having trouble accessing part or all of their home directory. I suggest the following course of action:

   * Obtain admin credentials on the same machine as the user is having problems on. Try to access the user's home directory. If you succeed and all seems well, then the problem is user related. If not:
   * Obtain admin credentials on a different machine and try to access the user's home directory. If you succeed and all seems well, then the problem lies with the cache manager on the original machine. You can confirm this by getting the user to try accessing their home directory on the new machine.

If you are still having no luck accessing the user's files, run the command <literal><i>vos listvldb user.<username></i></literal>. You should get the following output:

<verbatim>
vos listvldb user.cms

user.cms
    RWrite: 536871009     ROnly: 536871010     Backup: 536871011
    number of sites -> 3
       server squonk.inf.ed.ac.uk partition /vicepa RW Site
       server squonk.inf.ed.ac.uk partition /vicepa RO Site
       server unicorn.inf.ed.ac.uk partition /vicepb RO Site
</verbatim>

If this succeeds, it tells you two things: that the volume location database is working correctly, and the name of the server on which the user's read-write volume resides. If you do not get output similar to this, then the problem is in some way associated with the database servers.

If the DB servers seem fine, run the command <literal> vos listvol -s <servername> -p <partition> </literal> with _servername_ and _partition_ coming from the output of the above command. You should get back something like:

<verbatim>
vos listvol -s squonk -p vicepa
Total number of volumes on server squonk partition /vicepa: 264 
backup.root                       536871305 RW       3178 K On-line
backup.root.backup                536871307 BK       2251 K On-line
gdir.admin                        536880872 RW          2 K On-line
gdir.admin.backup                 536880874 BK          2 K On-line
.
.
.
user.v1screer                     536872219 RW      26301 K On-line
user.v1screer.backup              536872221 BK      26301 K On-line
user.v1screer.readonly            536872220 RO      26301 K On-line
user.v1swils2                     536873534 RW    4207473 K On-line
user.v1swils2.backup              536873536 BK    4207473 K On-line
user.v1swils2.readonly            536873535 RO    4207473 K On-line

Total volumes onLine 264 ; Total volumes offLine 0 ; Total busy 0
</verbatim>

Pay close attention to the last line of the output. All volumes should be on-line. Any off-line volumes indicate a file server problem, as does a failure of the command to return (though don't panic too soon, as _vos listvol_ may take some time to run; if you've been waiting five minutes, there's a problem).

It may be possible, using common sense, to take some shortcuts in the above. If you receive reports from 20 different users that they cannot access their home directories, and all their home directories reside on the same server, then it's a fairly safe bet that the problem lies with that file server. On the other hand, 20 complaining users with home directories on different servers suggests database or network related problems.
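Before moving on, here is a quick client-side sanity check that can be run at any point in the above. This is just a sketch - the username and path are hypothetical:

<verbatim>
# Which AFS tokens do I currently hold?
tokens

# Can I reach the user's home directory at all? (hypothetical user fred)
ls /afs/inf.ed.ac.uk/user/f/fred

# Which file servers does this client's cache manager think are down?
fs checkservers
</verbatim>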
Finally, we come to the

---+++Top 5(ish) AFS questions

---++++ What should I do when a volume doesn't release?

If the daily report on volume releases seems to indicate that some volumes which you would normally expect to release successfully have failed to release, you should investigate. The following only applies when a relatively small number of volumes have failed to release. If a large number of volumes fail to release, it is almost certainly a problem with one or more file or database servers and you should consult the "What should I do about database server related problems?" and "What should I do about file server related problems?" sections below.

First check that the volume which failed to release actually has read-only volumes associated with it. It sometimes happens that new user or group volumes are created without the associated read-only volumes being created. Run the command:

<verbatim>
vos examine <volume name>
</verbatim>

The output should look something like:

<verbatim>
vos examine user.cms
user.cms                          536871009 RW    3619758 K  On-line
    squonk.inf.ed.ac.uk /vicepa 
    RWrite  536871009 ROnly  536871010 Backup  536871011 
    MaxQuota    8000000 K 
    Creation    Tue Jul  7 21:46:50 2009
    Copy        Wed Jul  8 18:11:13 2009
    Backup      Wed Oct  7 06:44:13 2009
    Last Update Wed Oct  7 14:49:26 2009
    16246 accesses in the past day (i.e., vnode references)

    RWrite: 536871009     ROnly: 536871010     Backup: 536871011
    number of sites -> 3
       server squonk.inf.ed.ac.uk partition /vicepa RW Site
       server squonk.inf.ed.ac.uk partition /vicepa RO Site
       server unicorn.inf.ed.ac.uk partition /vicepb RO Site
</verbatim>

If the listing after _number of sites_ doesn't contain any read-only volumes, then something is clearly wrong, since all volumes should have at least one read-only copy on the same server and partition as the read-write. You can add this with:

<verbatim>
vos addsite -server <server name> -partition <partition name> -id <volume name>
</verbatim>

Then retry the release with:

<verbatim>
vos release <volume name>
</verbatim>
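Putting that together, a sketch of the whole sequence for a hypothetical volume group.killbots whose read-write copy lives on squonk /vicepa:

<verbatim>
vos examine group.killbots                                        # confirm no RO site is listed
vos addsite -server squonk -partition vicepa -id group.killbots   # add the local RO site
vos release group.killbots -verbose                               # -verbose helps if it still fails
</verbatim>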
Most, though not all, volumes will also have a read-only volume on a remote site (Kings Buildings for volumes on Forum and Appleton Tower servers; the Forum and the Tower are regarded as not being sufficiently distant from each other for DR purposes). The exception to this rule is for group volumes where the group has decided that a separate backup copy of their data is not required. RW partitions are divided into those which are replicated remotely and those which are not; all volumes on a given partition will share a mirror partition on the remote site. If a volume should have a remote RO copy, you should create this as well. From the above, it will be seen that the simplest way to determine where a volume's remote RO should go is to examine one of the other volumes in the same partition and see where its RO is located. The full list of steps would be:

   1. Determine the server and partition the problem volume resides on with <literal> vos examine <volume name> </literal>
   1. Get the names of the other volumes on this server and partition with <literal> vos listvldb -s <server name> -p <partition> </literal>
   1. Finally, find out where the remote ROs of the other volumes on the same partition are, using the same _vos examine_ command as above (but substituting the name of another volume on the same partition, obviously).

Once you've found where the volume's remote RO should go, create it using the _vos addsite_ command as above:

<verbatim>
vos addsite -server <server name> -partition <partition name> -id <volume name>
</verbatim>

and release the volume again.

There is one set of circumstances where this won't work. Some volumes are so large that they have a partition to themselves. If this is the case, consult [[AFSPartitions]] (this may actually be the easier option in all cases) which, as mentioned above, contains information about all the AFS partitions, including the location of any off-site mirror partition.

Another cause of a release failing is that the volume may be locked as part of some other operation on it. Volumes are normally only locked for a few seconds or so, but they can remain locked if the operation on the volume is interrupted prematurely, for example by a server crashing. If the volume is locked, the last line of output from the _vos examine_ command will be _Volume is locked_. To determine whether it is safe to unlock the volume, run the following command, substituting the name of the server the read-write volume resides on for _server name_:

<verbatim>
vos status <server name>
</verbatim>

If the output from this command is

<verbatim>
No active transactions on <server name>
</verbatim>

it's safe to unlock the volume with the command:

<verbatim>
vos unlock <volume name>
</verbatim>

Then retry the release. If the output from the _vos status_ command indicates that the server is busy and this continues for more than a few minutes, this is probably an indication of problems on the file server.

If the _vos examine_ command returns an error and only prints out the second part of the example output, from the line

<verbatim>
RWrite: 536871009     ROnly: 536871010     Backup: 536871011
</verbatim>

onwards, this is an indication that the volume is currently off-line. Follow the instructions below for bringing the volume back on-line and retry the release.

Finally, if none of the above has provided an answer and the volume still fails to release, rerun the release command with the _-verbose_ flag:

<verbatim>
vos release <volume name> -verbose
</verbatim>

This will produce a large amount of output which may provide some insight into what is going wrong.

---++++ What should I do about user related problems?

Have the user run _renc_ to obtain a new AFS token. If that doesn't solve the problem, view the ACLs for the user's home directory (you will need to _asu_ to do this) using the command

<literal> <i> fs la ~<username> </i> </literal>

and check that a line like

<literal> <i> <username> rlidwka </i> </literal>

appears in the output. If it doesn't, restore the user's access to their home directory with the command

<literal> fs sa ~<username> <username> all </literal>

You may also need to check the permissions of the subdirectories in the user's home directory (remember, ACLs only apply to directories in AFS, and changes aren't recursive).

---++++ What should I do about cache manager related problems?

There are three commands available for resetting the cache: _fs flush_, which flushes individual files or directories from the cache; _fs flushvolume_, which flushes all entries associated with a given volume; and _fs flushmount_, which flushes all information associated with a mount point. Pragmatically, you may find it more expedient to simply reboot the affected machine if possible.
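If a reboot isn't practical, a minimal sketch of the three flush commands (the paths here are hypothetical):

<verbatim>
# Flush a single file or directory from the local cache
fs flush /afs/inf.ed.ac.uk/user/f/fred/somefile

# Flush everything cached from the volume containing the given path
fs flushvolume /afs/inf.ed.ac.uk/user/f/fred

# Flush the cached mount point information for a directory
fs flushmount /afs/inf.ed.ac.uk/user/f/fred
</verbatim>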
---++++ What should I do about database server related problems?

First ascertain whether the problem lies with an individual database server or with all three. If all three are affected, the problems are more likely to be network related. Check that you can ping all three of the machines. Use the command

<literal> <i> bos status <servername> </i> </literal>

to check which AFS processes the server is running. The output should look like:

<verbatim>
bos status afsdb0
Instance ptserver, currently running normally.
Instance vlserver, currently running normally.
</verbatim>

The ptserver is the protection database server, which deals with controlling access to data. The vlserver is the volume location database server, which keeps track of where volumes are located. The buserver is the backup server and will not normally concern us. If one or more of these processes doesn't seem to be running, reboot the server and see if that fixes the problem.

You can obtain further information about what the database servers are doing using the _udebug_ command. For example:

<verbatim>
udebug afsdb0 7003
Host's addresses are: 129.215.64.16
Host's 129.215.64.16 time is Fri Oct  2 16:07:09 2009
Local time is Fri Oct  2 16:07:09 2009 (time differential 0 secs)
Last yes vote for 16.64.215.129 was 13 secs ago (sync site); 
Last vote started 13 secs ago (at Fri Oct  2 16:06:56 2009)
Local db version is 1254020518.81376
I am sync site until 47 secs from now (at Fri Oct  2 16:07:56 2009) (3 servers)
Recovery state 1f
Sync site's db version is 1254020518.81376
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
         475511 secs ago (at Sun Sep 27 04:01:58 2009)

Server (129.215.64.18): (db 1254020518.81376)
    last vote rcvd 13 secs ago (at Fri Oct  2 16:06:56 2009),
    last beacon sent 13 secs ago (at Fri Oct  2 16:06:56 2009), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

Server (129.215.64.17): (db 1254020518.81376)
    last vote rcvd 13 secs ago (at Fri Oct  2 16:06:56 2009),
    last beacon sent 13 secs ago (at Fri Oct  2 16:06:56 2009), last vote was yes
    dbcurrent=1, up=1 beaconSince=1
</verbatim>

This output, and in particular the line beginning _I am sync site_, shows that afsdb0 is the master server for the volume location database (port 7003; the ptserver is port 7002). If we look at one of the slave servers, we get:

<verbatim>
udebug afsdb2 7003
Host's addresses are: 129.215.64.18
Host's 129.215.64.18 time is Fri Oct  2 16:09:02 2009
Local time is Fri Oct  2 16:09:03 2009 (time differential 1 secs)
Last yes vote for 16.64.215.129 was 6 secs ago (sync site); 
Last vote started 6 secs ago (at Fri Oct  2 16:08:57 2009)
Local db version is 1254020518.81376
I am not sync site
Lowest host 129.215.64.16 was set 6 secs ago
Sync host 129.215.64.16 was set 6 secs ago
Sync site's db version is 1254020518.81376
0 locked pages, 0 of them for write
</verbatim>

What should you do if a database server is down and cannot easily be restored to service? In theory it shouldn't matter, since the remaining two database servers should be quite capable of coping (in fact even a single server should be OK). Unfortunately there is a problem. When the AFS client starts up on a host, it selects one database server to communicate with. If that database server goes off-line, the client should switch to using one of the remaining database servers, but a bug in the AFS client means that this does not happen as quickly as it should. If roughly a third of the user base starts complaining that access to the file system has suddenly slowed dramatically, it's a fair bet that one of the database servers has failed. Rebooting a client should make it use a different database server. Alternatively, you can install a new database server by following the instructions at [[AFSInstallingServer][AFSInstallingServer]]. Remember that the new server *MUST* have the same IP address as the failed server.
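Before leaving database servers, a quick sketch for checking all three in one go (this assumes our servers are named afsdb0, afsdb1 and afsdb2, in line with the examples above):

<verbatim>
# Check the AFS server processes on each database server
for h in afsdb0 afsdb1 afsdb2; do
    echo "== $h =="
    bos status $h
done

# Ask the volume location service (port 7003) on one of them who the sync site is
udebug afsdb0 7003 | grep -i "sync site"
</verbatim>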
---++++ What should I do about file server related problems?

This very much depends on the nature of the problem. Once you have identified the server on which the problem volume resides, check the status of the server using the _bos status_ command. This should normally return something like:

<verbatim>
bos status squonk
Instance fs, currently running normally.
    Auxiliary status is: file server running.
</verbatim>

If instead the return from _bos status_ says that the file server is _Salvaging file system_, it means that some sort of problem has been detected with the AFS file space on that machine and that the AFS salvager (which could perhaps be considered roughly the equivalent of _fsck_ for AFS file systems) is being run to correct any errors. None of the volumes served from this file server will be available until the salvage process has completed. You can keep track of the progress of this salvage by checking the contents of the log files with names beginning !SalvageLog in the AFS log file directory _/usr/afs/logs_. Our file servers are configured to run five salvage processes in parallel, so there will be five of these files to check. The length of time the salvage takes depends on the number of volumes in the partition and their size; it can take several hours for the salvage of a large partition with many files to complete. Monitor the salvage logs to follow progress. Once the salvage has completed, see if this has fixed the problem.

If it hasn't, or if the return from _bos status_ reports that the file server is running normally, check the status of the AFS partition the volume resides in with the _vos listvol_ command:

<literal> vos listvol -s <<i>servername</i>> -p <<i>partition name</i>> </literal>

For example:

<verbatim>
vos listvol -s squonk -p vicepa
Total number of volumes on server squonk partition /vicepa: 264 
backup.root                       536871305 RW       3178 K On-line
backup.root.backup                536871307 BK       2251 K On-line
gdir.admin                        536880872 RW          2 K On-line
gdir.admin.backup                 536880874 BK          2 K On-line
.
.
.
user.v1screer                     536872219 RW      26301 K On-line
user.v1screer.backup              536872221 BK      26301 K On-line
user.v1screer.readonly            536872220 RO      26301 K On-line
user.v1swils2                     536873534 RW    4207473 K On-line
user.v1swils2.backup              536873536 BK    4207473 K On-line
user.v1swils2.readonly            536873535 RO    4207473 K On-line

Total volumes onLine 264 ; Total volumes offLine 0 ; Total busy 0
</verbatim>

We now run the Demand Attached File Server (DAFS) on all our file servers. In theory, this should mean that you will never need to run the salvager, since the file server should do this automatically when issues such as a volume not being attached or being corrupted are detected. It is possible, however, that a volume may become corrupt without this being detected automatically.
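A quick way to pick out any volumes that are not on-line from the _vos listvol_ output (a sketch, using the same server and partition names as the example above):

<verbatim>
# Show only off-line volumes, plus the summary line
vos listvol -s squonk -p vicepa | grep -E "Off-line|offLine"
</verbatim>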
If a volume does become corrupt in this way, you should follow this procedure. There are three forms of the command for running the salvager manually:

<literal> bos salvage -server <<i>server</i>> -volume <<i>volume name</i>> </literal>

<literal> bos salvage -server <<i>server</i>> -partition <<i>partition name</i>> </literal>

<literal> bos salvage -server <<i>server</i>> -all </literal>

The first salvages an individual volume, the second a single partition, and the third every AFS partition on the server. When salvaging one or more partitions, the file server is shut down until the salvage is complete and none of the volumes served from the server will be available. If only a single volume is being salvaged, the file server continues to run and access to the remaining volumes on the server is uninterrupted. If the majority of volumes on the server are still available, it's preferable to salvage the off-line volumes individually rather than taking the file server off-line for all of its users. After salvaging has completed, the affected volumes should be back on-line and once more accessible. If you continue to see volumes going off-line on this server, there may be problems with the underlying storage and you may wish to consult the appropriate expert in the field.

The above assumes that only a few volumes on the server are affected, but what if the server itself or the underlying storage device has failed? There is a procedure for converting the read-only copy of a volume into the read-write version (this only makes sense for read-only volumes stored on a different server and underlying storage device, of course), but it is not something to be undertaken lightly, since undoing this action once the original hardware is available again takes an extremely large amount of effort. On the other hand, users are likely to be extremely unhappy if their files are unavailable for more than a couple of days...

Consider the following:

<verbatim>
vos listvldb user.cms

user.cms
    RWrite: 536871009     ROnly: 536871010     Backup: 536871011
    number of sites -> 3
       server squonk.inf.ed.ac.uk partition /vicepa RW Site
       server squonk.inf.ed.ac.uk partition /vicepa RO Site
       server unicorn.inf.ed.ac.uk partition /vicepb RO Site
</verbatim>

This shows that the read-write version of my home volume is on the file server squonk in the partition vicepa. As explained above, there is a read-only version of my volume in the same place. There is also a read-only copy of my home volume on the file server unicorn in partition vicepb. If squonk has suffered some kind of a meltdown and my home volume is likely to be unavailable for several days, I can turn the read-only copy of my home volume on unicorn into the read-write version by running the command:

<verbatim>
vos convertROtoRW -server unicorn -partition vicepb -id user.cms
</verbatim>

In my experience, the newly converted RW volume will be off-line once the command has completed and you will have to salvage the volume (see above) to bring it back on-line.
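Putting those two steps together, a sketch using the same hypothetical situation as above (user.cms, with its remote RO on unicorn /vicepb):

<verbatim>
vos convertROtoRW -server unicorn -partition vicepb -id user.cms
bos salvage -server unicorn -volume user.cms    # converted volume usually comes back off-line
vos examine user.cms                            # confirm it is now RW and On-line on unicorn
</verbatim>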
You can get a list of all the volumes in a partition, even if the file server is down, using

<literal> vos listvldb -s <<i>server name</i>> -p <<i>partition name</i>> </literal>

To get a list of all the volumes on a server, omit the partition argument from the above command.

Once you have converted a read-only volume to read-write, the original read-write volume cannot be allowed to come on-line again or file system confusion will reign. Sorting out the after-effects of running convertROtoRW is something that should only be undertaken by experts after much thoughtful consideration.

---++++ What should I do about network related problems?

That, alas, is beyond the scope of this document. Try [[http://www.dice.inf.ed.ac.uk/units/infrastructure/Documentation/Network/FixingThings.html][Fixing Network Things]].

-- Main.CraigStrachan - 26/3/2019