This document attempts to bring together links to all our Cosign and iFriend docs, along with a quick troubleshooting section.

Existing Documentation

Troubleshooting and Items of Note

Cosign server issues

weblogin connection problems

If users can't connect to one or both cosign servers, restarting cosign and apache on both kaplan and hubley (om cosign restart && om apacheconf restart) is a brute-force approach, but will often fix things. The cosign servers, and apache in particular, can be sensitive to DNS changes, e.g. when the weblogin.inf.ed.ac.uk alias is changed during server upgrades.
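As a minimal sketch of the restart described above, the function below runs the two om commands in the order given. The RUN variable is a hypothetical dry-run hook added for illustration, and the use of getent to inspect the alias is an assumption about what is convenient on the servers, not part of the documented procedure.

```shell
# Restart cosign, then apache, on the local machine.
# Run this on each of kaplan and hubley in turn.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands
# are printed instead of executed.
restart_cosign_stack() {
    ${RUN:-} om cosign restart && ${RUN:-} om apacheconf restart
}

# Show what the weblogin alias currently resolves to, which helps
# when a recent DNS change is suspected.
show_weblogin_alias() {
    getent hosts "${1:-weblogin.inf.ed.ac.uk}"
}
```

Calling RUN=echo restart_cosign_stack first shows exactly what would be run before doing it for real.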

configuration problems

The cosign daemon is largely configured via /etc/cosign.conf, which is generated by the LCFG cosign component. This file (on both servers) should contain a line for each service that will authenticate to cosign. This part of the file is generated from an LCFG spanning map. Unfortunately, due to LCFG limitations, the cosign servers are sometimes not notified of changes to this map. If the client profile has compiled correctly and published the contents of cosign.services to the spanning map, but the service is not showing up in cosign.conf, try making trivial changes to the profiles of both kaplan and hubley to force them to recompile.

If it's not a spanning map issue, check the cosign component log (/var/lcfg/log/cosign). The component runs some validation checks on the information it gets from the spanning map - if it doesn't like what it sees, it will exclude that service and log details accordingly.
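The two checks above (is the service in cosign.conf, and if not, what did the component log about it) can be sketched as a single function. The function name and the plain substring match are illustrative assumptions; the exact line format of cosign.conf is not described here, so the sketch only looks for the service name anywhere in the file.

```shell
# check_cosign_service NAME [CONF] [LOG]
# Returns 0 if NAME appears in the cosign configuration, otherwise
# prints recent component log lines mentioning NAME and returns 1.
check_cosign_service() {
    name=$1
    conf=${2:-/etc/cosign.conf}
    log=${3:-/var/lcfg/log/cosign}
    # -F: treat NAME as a fixed string, not a regular expression
    if grep -F -n -- "$name" "$conf"; then
        echo "found: $name in $conf"
        return 0
    fi
    echo "missing: $name not in $conf; component log entries mentioning it:"
    grep -F -- "$name" "$log" 2>/dev/null | tail -n 20
    return 1
}
```

Run it on both cosign servers, for example check_cosign_service cosign-example, where cosign-example stands in for the real service name.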

other problems

Clues to almost all other problems can be found in the cosign and/or apache logs listed below.

Logging information

  • cosignd and monster both log to /var/lcfg/log/syslog
  • The cosign component log - /var/lcfg/log/cosign
  • The web interface part of the cosign servers logs to /var/log/httpd/error_log and /var/log/httpd/access_log
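When chasing a problem across these files, a small helper that searches all of them for one pattern can save time. This is only a sketch: the function name, the variable COSIGN_LOGS and the 20-line limit are arbitrary choices for illustration, and the default paths are simply the ones listed above.

```shell
# Default log locations, taken from the list above.
COSIGN_LOGS="/var/lcfg/log/syslog /var/lcfg/log/cosign /var/log/httpd/error_log /var/log/httpd/access_log"

# recent_cosign_logs PATTERN [FILE...]
# Print the last 20 lines matching PATTERN (case-insensitive) from
# each log file, or a note if the file cannot be read.
recent_cosign_logs() {
    pattern=$1
    shift
    # No files given: fall back to the defaults (word splitting intended).
    [ $# -gt 0 ] || set -- $COSIGN_LOGS
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "== $f"
            grep -i -- "$pattern" "$f" | tail -n 20
        else
            echo "== $f (not readable)"
        fi
    done
}
```

For example, recent_cosign_logs cosignd shows recent daemon messages, and a client hostname can be used as the pattern to follow one service through all the logs.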

Disaster Recovery

Note that, as the two cosign servers are located at IF and AT, if an outage affects the entire central area, we would lose our entire cosign provision. To move the cosign service to an entirely new machine, take the following steps:

  • Bring up the weblogin3 or weblogin4 interface on the new machine with the correct IP address
  • Copy the relevant header(s) from one of the existing cosign servers - this is likely to consist of just dice/options/apacheconf-2.h and dice/options/cosign-server.h
  • Copy across the keytab used by cosign.httpd_keytab. It may be necessary to manually alter this as described in http://www.dice.inf.ed.ac.uk/units/infrastructure/Documentation/cosign-overview.html
  • Consider restoring the ifriend service as described in the iFriend documentation. Note that we are running a (non-advertised) ifriend KDC on hibbert, a VM at KB, purely to aid with disaster recovery.

If the disaster affects only one server, remove it from the weblogin alias and restart the apacheconf and cosign components on the remaining server.

Cosign client issues

If a client is not behaving correctly, first check the configuration:

  • check its cosign.services resource is valid (see lcfg-cosign(8))
  • check the server-side configuration (as detailed above) both for spanning map problems and validation issues
  • check the client-side apache configuration (e.g. start at httpd.conf and follow up all cosign related includes)

If this seems correct, you'll need to delve into log files to see exactly what the issue is (e.g. is it a cosign issue, or an apache issue?). See the "Logging information" section above for details on which log files to investigate on the server. Client-side files are equally important - the cosign apache module logs to the apache error log. The standard location for this on DICE is /var/log/httpd/error_log, but this can be changed in the apache configuration, so it can (and frequently will) be different.
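Since the error log location can be overridden, one way to find where a client is really logging is to search its apache configuration for ErrorLog directives. This is a sketch only: the function name is made up, and /etc/httpd as the configuration root is an assumption, so pass a different directory if the client keeps its configuration elsewhere.

```shell
# find_error_logs [CONFDIR]
# List every active (uncommented) ErrorLog directive under CONFDIR,
# with file name and line number.
find_error_logs() {
    grep -rniE '^[[:space:]]*ErrorLog[[:space:]]' "${1:-/etc/httpd}"
}
```

Virtual hosts often set their own ErrorLog, so expect more than one match on a busy client.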

Some client-side issues which are worth bearing in mind:

  • check the ssl certificates used, check permissions on them (apache needs to be able to read them), check they match the (virtual) hostname
    • openssl x509 -in </path/to/cert> -text -noout is very useful
  • if your service needs kerberos tickets, check that it's configured correctly both in /etc/cosign.conf on the servers and in the local apache configuration
  • check the validation string in the cosign configuration
  • check that any non-cosign redirects aren't meddling with access to /cosign/valid (redirection loops are often a symptom of problems accessing the validation handler)
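The certificate and validation handler checks above can be sketched as two small functions. The function names are illustrative, the -checkhost option assumes OpenSSL 1.0.2 or later, and the readability test applies to whoever runs the function, which is not necessarily the user apache runs as.

```shell
# check_cert CERT HOSTNAME
# Returns 2 if CERT cannot be read by the current user; otherwise
# shows its permissions, subject and expiry date, and reports
# whether it matches HOSTNAME.
check_cert() {
    cert=$1
    host=$2
    if [ ! -r "$cert" ]; then
        echo "cannot read $cert" >&2
        return 2
    fi
    ls -l "$cert"
    openssl x509 -in "$cert" -noout -subject -enddate
    # Prints whether the hostname does or does not match the certificate.
    openssl x509 -in "$cert" -noout -checkhost "$host"
}

# check_valid_handler HOSTNAME
# Print the HTTP status and any redirect target for /cosign/valid,
# without following redirects.
check_valid_handler() {
    curl -sS -o /dev/null -w '%{http_code} %{redirect_url}\n' \
        "https://$1/cosign/valid"
}
```

If check_valid_handler reports a redirect to somewhere other than the weblogin servers, a non-cosign rewrite rule is a likely suspect.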

iFriend issues

Firstly, remember that iFriend is backed with a kerberos KDC, so the kerberos logs (/var/lcfg/log/krb5kdc and /var/lcfg/log/kadmind) on kaplan may be informative.

The iFriend web frontend consists of a series of PHP scripts running on kaplan - these log to the standard apache error_log (/var/log/httpd/error_log).

iFriend performs some validation on the email address provided, which has given problems in the past.

If a site doesn't work for an ifriend, but does work for a regular DICE user, check that CosignRequireFactor INF.ED.AC.UK hasn't been set.
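A quick way to check for this directive is to search the client's apache configuration. As a sketch, assuming the configuration lives under /etc/httpd (pass another directory otherwise) and using a made-up function name:

```shell
# find_require_factor [CONFDIR]
# List active CosignRequireFactor directives under CONFDIR.
# Returns non-zero if none are found.
find_require_factor() {
    grep -rniE '^[[:space:]]*CosignRequireFactor[[:space:]]' "${1:-/etc/httpd}"
}
```

Any match naming INF.ED.AC.UK is worth a closer look when iFriend users are being refused.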

-- TobyBlake - 12 Mar 2019

Topic revision: r9 - 30 Apr 2019 - 10:53:55 - TobyBlake
 