Solutions to common AFS home directory problems

The issues below have been identified when moving users over to AFS home directories. This page is intended to document, in one place, the solutions to them.

Running Cron jobs

There are actually two issues here. The first is that regardless of the job that is actually run, the latest version of Cron will attempt to access the user's home directory and will fail if this access fails (as will almost certainly be the case with an AFS home directory). There is a simple solution to this. Make the first line of the user's crontab file read something like

    HOME=/tmp
This will make cron access /tmp rather than the user's home directory before starting up the job, hopefully with more success.
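
One way to add this line to an existing crontab without editing it by hand is sketched below. The function name ensure_cron_home is made up for this example and is not part of cron; it only filters text, and the pipeline in the final comment shows how it might be applied.

```shell
#!/bin/sh
# ensure_cron_home: read a crontab on stdin and write it to stdout with
# HOME=/tmp as the first line, unless some HOME= line is already present
# (an existing HOME= line is left exactly as it is).
# Hypothetical helper name; it only filters text.
ensure_cron_home() {
    input=$(cat)
    if printf '%s\n' "$input" | grep -q '^HOME='; then
        printf '%s\n' "$input"
    else
        printf 'HOME=/tmp\n%s\n' "$input"
    fi
}

# Possible use, on the machine that runs the jobs:
#   crontab -l | ensure_cron_home | crontab -
```

If the crontab already sets HOME to a directory in AFS, that line still has to be changed by hand.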

The second issue is if the cronjob itself needs to access the user's home directory. The obvious solution in this case is to copy what cron needs to access to /tmp or some other world readable space but if this is not practicable, the cron job will have to be run wrapped in kstart. See the section on long running jobs for more information on this.

Update Feb 2010

To have cron jobs run and to be able to access AFS space, you need to use the AFS tips for long running jobs.

The basic principle is that on the machine you want to run the cron jobs:

  1. set up your long running Kerberos credentials.
  2. set up one cron job that runs regularly, say every hour, and renews your long running credentials.
  3. set up the cron job that you want to run, but wrap it in krenew, using the long running credentials from steps 1 and 2.

For example, assuming you have requested 30-day renewable credentials from support:

shell> kinit -r 30d -c /tmp/mylongcred
#crontab entries, on the machine you did the kinit.
# the one that keeps the creds renewed
10 * * * * krenew -k /tmp/mylongcred
# the actual job you want to run at some point
40 15 * * * krenew -k /tmp/mylongcred -t /home/user/job 

Note that if your job produces output on standard out that you want to capture and save in your AFS home directory, you can't just add >> /home/user/job.out to the end of the last line. It won't work, because the redirection is opened by the shell that cron starts, before krenew has obtained an AFS token. You need to do the capturing within your script, or have it redirect to some non-AFS space, e.g. >> /tmp/job.out
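
A minimal sketch of doing the capturing inside the job: the job script opens the log file itself, so the file is only opened once krenew -t has started the job. The function name run_logged and the paths in the comments are placeholders, not part of krenew.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND with stdout and stderr appended to LOGFILE. The redirection
# is done here, inside the job, rather than on the crontab line.
run_logged() {
    log=$1
    shift
    "$@" >> "$log" 2>&1
}

# In a job script started from cron as
#   40 15 * * * krenew -k /tmp/mylongcred -t /home/user/job
# the script /home/user/job could then contain, for example:
#   run_logged /home/user/job.out /home/user/bin/real-work   (placeholder path)
```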

Unusual home directory permissions

This issue will have to be tackled on an individual basis and may require helping the user set up new ACLs and creating new user groups. Remember that ACL changes are not recursive!
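
Since a single fs setacl call only affects one directory, applying an ACL entry to a whole tree means visiting each directory in turn. The following is a sketch, assuming the fs command is on the PATH; the function name set_acl_tree and the FS_CMD override (used for a dry run) are illustrative only.

```shell
#!/bin/sh
# set_acl_tree DIR WHO RIGHTS
# Apply one ACL entry to DIR and to every directory beneath it.
# FS_CMD may be set to "echo" to print the calls instead of running them.
set_acl_tree() {
    find "$1" -type d -exec "${FS_CMD:-fs}" setacl -dir {} -acl "$2" "$3" \;
}

# Example (hypothetical directory and group name):
#   set_acl_tree ~/project someuser:friends rl
```

Running it first with FS_CMD=echo shows which directories would be touched before any ACL is changed.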

Use of Secure Shell public keys

The problem: in order to authenticate an incoming SSH connection using public key authentication, the SSH server must be able to read the public portion of your key pair. However, it is running as 'root', without any AFS tokens, and so cannot access your .ssh directory if it is not world readable. You can't simply make .ssh world readable, as it may also contain private key material which must be kept secret.

The solution is to create a subdirectory of .ssh, say 'private', which is only readable by the user, move any private files from .ssh into this directory, and symlink them from their original location. Then make '.ssh' publicly readable, and your home directory publicly listable. You need to be vigilant that no private files are created in .ssh in the future (say, whilst creating a new keypair).

In general, the files authorized_keys and any public key material (usually id_dsa.pub and id_rsa.pub) must be public. All other files in .ssh should be moved to the private directory.

Here's an example. Note that this only handles the 'common' private key files. You should ensure that all private keys have been moved to the private directory before giving access to your .ssh directory.

cd ~/.ssh
mkdir private
fs setacl -clear -dir private -acl system:administrators all $USER all
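for A in identity id_dsa id_rsa config known_hosts; do
  # skip names that are already symlinks, so a rerun cannot overwrite a moved file
  [ -L "$A" ] && continue
  [ -f "$A" ] && mv "$A" private/
  # the link is made whether or not the file exists yet
  ln -s "private/$A" "$A"
done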
fs setacl ~/.ssh system:anyuser rl
fs setacl ~ system:anyuser l

Note that this will leave the top level of the user's home directory world listable. You should make sure that this is acceptable to them.
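
To help with staying vigilant about private files appearing in .ssh later, a check along the following lines could be run from time to time. It is a sketch only: the function name list_exposed_files is invented here, and it treats authorized_keys and *.pub as the only files that are meant to be public, as described above.

```shell
#!/bin/sh
# list_exposed_files [DIR]
# Print regular files directly inside DIR (default ~/.ssh) that are neither
# authorized_keys nor *.pub. Symlinks into private/ are not regular files,
# so they are not reported.
list_exposed_files() {
    find "${1:-$HOME/.ssh}" -maxdepth 1 -type f \
        ! -name authorized_keys ! -name '*.pub'
}
```

Anything it prints is a candidate for moving into the private directory.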

Access to public_html files

The problem: though it is not the approved way of doing things these days, some users still have web content stored in a public_html directory in their home directory. Since apache currently runs without any AFS tokens, it will by default be unable to access these directories.

The 'correct' solution is to have apache run with its own AFS token and set the permissions of the public_html directory appropriately (of course the 'really correct' solution is to get the user to move their web content to homepages.inf or group.inf as appropriate). This is some way off, however, so in the meantime a workaround similar to that for secure shell public keys must be used: the public_html directory must be made world readable. The user may be unhappy with this, for example because they have CGI scripts in the public_html directory which they would not like to be viewable. If this is the case, there is no solution to the problem at the moment and the user should not be moved across to AFS until the 'correct' solution outlined above becomes available.

Assuming that the user is happy for public_html to be world readable, the following procedure should be followed:

First make the user's home directory world listable

fs sa <homedir> system:anyuser l

As noted above, this makes the user's home directory world listable. Make sure they are aware of this.

You now need to change the permissions of the public_html directory and every subdirectory beneath it (since ACL changes don't propagate downwards) to make the contents readable. In the user's home directory use something like

find public_html -noleaf -type d -exec fs sa {} system:anyuser read \;

This will get the user's web pages working again.

If the user is unhappy at having their home directory world listable, there is a workaround. After making the home directory world listable, create a directory within the home directory called private (or something equally apt) and move everything in the user's home directory other than the public_html and the .ssh (if they are using ssh public keys) directories into private. Then from the user's home directory run the command:

    fs sa private system:anyuser none

This will leave only the public_html and .ssh directories listable. The disadvantage of course is that the pathname of practically everything in the user's home directory will have changed.
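
The move into private could be scripted roughly as follows. This is a sketch under assumptions: the function name move_to_private is invented, the directory names are the ones used above, and it is worth trying on a scratch directory first since it renames almost everything it finds.

```shell
#!/bin/sh
# move_to_private HOMEDIR
# Move every entry in HOMEDIR into HOMEDIR/private, except private itself,
# public_html and .ssh. Names starting with ".." are not matched by the
# patterns below and are left alone.
move_to_private() {
    home=$1
    mkdir -p "$home/private" || return 1
    for entry in "$home"/* "$home"/.[!.]*; do
        # Skip patterns that matched nothing.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        case ${entry##*/} in
            private|public_html|.ssh) continue ;;
        esac
        mv "$entry" "$home/private/"
    done
}

# Afterwards, from the home directory:
#   fs sa private system:anyuser none
```

Note that dotfiles are moved too, so anything that expects to find them at the top level of the home directory will need adjusting.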

Long running jobs

The approach taken with long running jobs depends on whether the job is expected to take less than, or longer than, a month to complete.

Jobs less than a month long

Firstly, users need to have their principals set to have a renewable lifetime before any of this will work. Do this with

/usr/kerberos/sbin/kadmin -q "modprinc -maxrenewlife 1month <principal>"

Do not give /admin principals renewable lifetimes.

Then (from Simon) the user has to carry out the following steps:

Get credentials into a separate credentials cache by doing

# export KRB5CCNAME=/path/to/local/credentialscache

e.g. export KRB5CCNAME=/tmp/mykerbcred

# kinit -r30day

(you need to use a different ccache, as we destroy the one created on login when the user logs out)

Run the job with 'krenew'. See the krenew manpage for details.

krenew -k /tmp/mykerbcred -t <jobname>
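
A small wrapper around that last step might look like the sketch below. It simply refuses to start if the credentials cache file is missing, then runs the job under krenew with the same options as above. The function name run_long_job and the KRENEW override (for a dry run) are assumptions made for this example.

```shell
#!/bin/sh
# run_long_job CACHE COMMAND [ARGS...]
# Refuse to start unless the credentials cache file is readable, then run
# COMMAND under krenew with -k CACHE -t. KRENEW may be set to "echo" to
# print the arguments instead of running krenew.
run_long_job() {
    cache=$1
    shift
    if [ ! -r "$cache" ]; then
        echo "no readable credentials cache at $cache" >&2
        return 1
    fi
    "${KRENEW:-krenew}" -k "$cache" -t "$@"
}

# Example, after kinit has filled /tmp/mykerbcred:
#   run_long_job /tmp/mykerbcred ./mysim
```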

Jobs more than a month long

This can be done by creating a new Kerberos identity, a keytab for that identity, and a corresponding AFS ID for the user who wishes to run the long job. The Kerberos identity will normally be called something like <USERNAME>/longjob and the AFS ID <USERNAME>.longjob. The user copies the keytab to the machine they wish to run the job on and runs the job using the longjob script with the --indefinite flag (they will also need to use the -k flag to specify the location of the keytab). The user must make sure that their filespace is accessible by the new AFS identity, using the fs setacl command (see http://computing.help.inf.ed.ac.uk/afs-top-ten-tips#Tip03 for details). The procedure for creating the new Kerberos identity and the corresponding AFS ID is at AFSKerberosAdditionalIdentities.

Running jobs via condor

Here's what Simon wrote in an email:

standard universe jobs aren't a problem, providing the condor_shadow process, running on the machine from which the job
was submitted, always has sufficient permissions. The user can achieve this by using krestart or krenew as appropriate,
in the same way as for a long running job.

vanilla universe jobs are harder, as they use local IO. To get round this, we submit a credentials cache with the job,
and try using krenew to keep that credentials cache current.


Create a job directory, and populate it.
  mkdir -p /tmp/jobs/job1
  cd /tmp/jobs/job1

  ln -s `which mysim` mysim
  ln -s `which krenew` krenew

Create a credentials cache to run the job as:
  KRB5CCNAME=/tmp/jobs/job1/credentials.krb5 kinit -r1month <user>

And create a Condor submit file that looks something like:

  universe = vanilla
  executable = krenew
  arguments = -t ./mysim
  output = output.txt
  error = error.txt
  log = log.txt
  environment=KRB5CCNAME=FILE:./credentials.krb5
  transfer_input_files = credentials.krb5, mysim
  transfer_files = on_exit
  queue
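
If several such jobs are needed, the submit file above can be generated rather than typed each time. The sketch below writes the same lines with the program name filled in. The function name write_submit_file and the default file name job.submit are choices made for this example only.

```shell
#!/bin/sh
# write_submit_file PROGRAM [FILE]
# Write a submit file like the one above for PROGRAM (a name inside the job
# directory) to FILE (default: job.submit, a name chosen for this sketch).
write_submit_file() {
    prog=$1
    out=${2:-job.submit}
    cat > "$out" <<EOF
universe = vanilla
executable = krenew
arguments = -t ./$prog
output = output.txt
error = error.txt
log = log.txt
environment=KRB5CCNAME=FILE:./credentials.krb5
transfer_input_files = credentials.krb5, $prog
transfer_files = on_exit
queue
EOF
}

# Example, run inside /tmp/jobs/job1:
#   write_submit_file mysim
```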

Number of Files in a Directory Limit

There is a limit in AFS on the number of files you can have in a directory. The limit is around 64,000 files, but it is lower if the file names are 16 characters or longer. For example, you could only have about 32,000 files in a directory if all the file names were 40 characters long, and with names of 48 characters or more the limit drops further, to about 21,000 files. See this post to the OpenAFS list: https://lists.openafs.org/pipermail/openafs-info/2002-September/005812.html

There's no real solution other than creating subdirectories to split the contents up or, if your file names are longer than 15 characters, renaming them to something shorter.
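
To see how close a directory is to the limit, something like the following sketch could be used. It counts the entries in a directory and how many of them have names longer than 15 characters, the threshold mentioned above. The function name dir_usage is invented for this example.

```shell
#!/bin/sh
# dir_usage [DIR]
# Count the entries directly inside DIR (default: current directory) and
# how many of them have names longer than 15 characters. Names starting
# with ".." are not matched by the patterns below and are not counted.
dir_usage() {
    dir=${1:-.}
    total=0
    long=0
    for entry in "$dir"/* "$dir"/.[!.]*; do
        # Skip patterns that matched nothing.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name=${entry##*/}
        total=$((total + 1))
        if [ "${#name}" -gt 15 ]; then
            long=$((long + 1))
        fi
    done
    echo "$total entries, $long with names longer than 15 characters"
}
```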

Topic revision: r15 - 02 Nov 2016 - 15:24:20 - CraigStrachan
 