Final project report for "Suspend accounts that have had no recent KDC activity" (351)

Project Description

Add the capability to automatically suspend accounts that have had no KDC activity for more than 'n' months.

Project page

Project updates

Design/Implementation

The design and implementation of this project was an iterative process.

More than 99% of all traffic goes to our master KDC (this is largely due to the way our DNS SRV records are configured). For this reason, we decided to extract authentication data only from the master KDC. This is simpler, as Prometheus talks to the master KDC through the kadmin interface and kadmind does not run on the slaves. If we later decide that we need to accumulate data from the slaves (e.g. if we change our DNS configuration so that slaves are used more), we will need to find another way to gather this data (e.g. regular dump and synchronise).

We collect data from the following KDC fields (kadmin attribute names in parentheses):

  • Last password change (last_pwd_change)
  • Last successful authentication (last_success)
  • Last failed authentication (last_failed)

It is very important that this data is suitably protected by the ACLs in Prometheus. It should only be visible to authenticated sysadmins (currently, /admin principals, although this is likely to change as a result of the two-factor authentication project).

A new conduit - KDCAuthStats - synchronises the authentication data from the KDC to Prometheus. Last successful and failed authentication values are only written to Prometheus if they are greater than the value already held.
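The "only write if greater" rule can be illustrated with a minimal Python sketch. This is not the KDCAuthStats code: the function names and the dictionary layout are hypothetical, and only the field names last_success and last_failed come from the list above. The sketch deliberately leaves last_pwd_change out, since the text does not say how that field is merged.

```python
from datetime import datetime, timezone
from typing import Optional


def merge_auth_time(held: Optional[datetime],
                    incoming: Optional[datetime]) -> Optional[datetime]:
    """Return the timestamp to keep: the incoming one only if strictly later."""
    if incoming is None:
        # The KDC has no value, so keep whatever is already held.
        return held
    if held is None or incoming > held:
        return incoming
    return held


def sync_auth_stats(held: dict, from_kdc: dict) -> dict:
    """Merge last_success and last_failed from a KDC record (hypothetical layout)."""
    merged = dict(held)
    for field in ("last_success", "last_failed"):
        merged[field] = merge_auth_time(held.get(field), from_kdc.get(field))
    return merged


if __name__ == "__main__":
    older = datetime(2018, 1, 1, tzinfo=timezone.utc)
    newer = datetime(2018, 3, 1, tzinfo=timezone.utc)
    held = {"last_success": newer, "last_failed": None}
    kdc = {"last_success": older, "last_failed": older}
    print(sync_auth_stats(held, kdc))
```

With this rule, a KDC value that is older than the held value, or missing altogether, never overwrites what Prometheus already holds.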

It should be noted that the master KDC, and any combination of master and slave KDCs, do not (necessarily) hold accurate authentication data for any user. The last_success and last_failed attributes are not replicated from master to slaves. A new master is usually put into service by promoting a slave, and will therefore not inherit authentication data.

Accounts are "suspended" using the existing disableAccount flag. We considered a separate flag for this purpose, and indeed other methods, but this approach was deemed the most appropriate, mainly because it avoids unnecessary complexity and is already supported in Prometheus code and tools. Setting the disableAccount flag on an identity leads to the corresponding principal having the DISALLOW_ALL_TIX attribute set on the KDC.

The existing LifecycleProcessing conduit was expanded to include the inactivity suspension logic. This seemed a good fit, as it already deals with disabling accounts. Writing a new conduit would increase complexity and the possibility that differing conduits would work against each other. Having all such logic in one place is significantly more maintainable.

Accounts are suspended after 180 days (~6 months) of inactivity. This was reduced to 90 days at CEG, but subsequently changed back to 180 days following user feedback.

Users are notified by email once their account has been suspended.

Re-enabling is done via a simple command-line tool:

theogony enable-account --trigger <user>

The LifecycleProcessing conduit generates a warning if the latest successful authentication date is more than a day ago.

New accounts need to be handled differently to existing accounts - an account that has had an initial password set (either via the password portal or via a printed password letter), but has not yet authenticated, would appear as having never authenticated. For this reason we decided not to consider new accounts for inactivity suspension until 30 days after their password had been changed.

We later extended this policy so that no account is considered for inactivity suspension if its password has been changed in the last 30 days.
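The policy described so far (180 days of inactivity, with an exemption for any password change in the last 30 days) can be sketched as a single predicate. This is an illustration only, not the LifecycleProcessing code: the function name is hypothetical, and the exact boundary handling, along with treating an account that has never authenticated as inactive, are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

INACTIVITY_LIMIT = timedelta(days=180)  # suspension threshold from the text
PASSWORD_GRACE = timedelta(days=30)     # exemption window after a password change


def should_suspend(now: datetime,
                   last_success: Optional[datetime],
                   last_pwd_change: Optional[datetime]) -> bool:
    """Decide whether an account is a candidate for inactivity suspension."""
    # A password change within the grace window exempts the account.
    if last_pwd_change is not None and now - last_pwd_change <= PASSWORD_GRACE:
        return False
    # Assumption: an account that has never authenticated counts as inactive.
    if last_success is None:
        return True
    # Assumption: "after 180 days" means strictly more than 180 days.
    return now - last_success > INACTIVITY_LIMIT


if __name__ == "__main__":
    now = datetime(2018, 5, 24)
    # Dormant for 200 days, but the password was changed 10 days ago.
    print(should_suspend(now, now - timedelta(days=200), now - timedelta(days=10)))
    # Dormant for 200 days, with no recent password change.
    print(should_suspend(now, now - timedelta(days=200), now - timedelta(days=400)))
```

Checking the password grace window first means that a new account with a freshly set password is never suspended merely for having no authentication history.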

The password portal text (on setting a password) and the printed password letter were both changed to reflect the new policy.

At the time of writing, there are 311 accounts suspended for inactivity (approximate percentages per category: staff: 10%, student: 67%, visitor: 22%, visiting-student: 1%).

Documentation

Effort Spent

The total time spent on this project was 240 hours (~34 days).

Issues/Future Considerations

There is an edge case where accounts have been disabled for a long time, haven't authenticated for >180 days, and are subsequently re-enabled. Specifically, authentication statistics are gathered at 10:30 every weekday and the LifecycleProcessing conduit runs at 10:45. If a recently re-enabled account doesn't authenticate by 10:30, it will likely be disabled once more by the next run of the conduit. There is a one-day back-off if the account was disabled due to inactivity (see documentation above) and there is a 30-day window if the password was changed, but an account disabled through normal lifecycle with no recent password change has no such protection. This hasn't proved to be an issue so far, but should be monitored. Any solution would add complexity to an already complex situation. It is worth noting that the Account Tidying project should greatly reduce the number of long-dormant accounts.
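The edge case can be made concrete with a small Python sketch. It rests on several assumptions that the text does not spell out: that the one-day back-off is measured from the time of re-enabling, that it applies only to accounts previously suspended for inactivity, and that the 30-day window is inclusive. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

INACTIVITY_LIMIT = timedelta(days=180)
PASSWORD_GRACE = timedelta(days=30)
REENABLE_BACKOFF = timedelta(days=1)  # assumed to run from the re-enable time


def resuspended_at_next_run(run_time: datetime,
                            last_success: datetime,
                            last_pwd_change: Optional[datetime],
                            reenabled_at: datetime,
                            was_inactivity_suspension: bool) -> bool:
    """Would the next conduit run suspend a recently re-enabled account?"""
    if run_time - last_success <= INACTIVITY_LIMIT:
        return False  # the account authenticated recently enough
    if last_pwd_change is not None and run_time - last_pwd_change <= PASSWORD_GRACE:
        return False  # protected by the 30-day password window
    if was_inactivity_suspension and run_time - reenabled_at < REENABLE_BACKOFF:
        return False  # protected by the one-day back-off
    return True       # no protection applies


if __name__ == "__main__":
    run = datetime(2018, 5, 24, 10, 45)        # next conduit run
    reenabled = datetime(2018, 5, 23, 15, 0)   # re-enabled the previous afternoon
    dormant_since = run - timedelta(days=300)  # last successful authentication
    # Disabled through normal lifecycle, no password change: suspended again.
    print(resuspended_at_next_run(run, dormant_since, None, reenabled, False))
    # Previously suspended for inactivity: the back-off protects it.
    print(resuspended_at_next_run(run, dormant_since, None, reenabled, True))
```

Under these assumptions, only the lifecycle-disabled account with no recent password change falls through every check, which matches the unprotected case described above.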

We should consider disabling the running of the LifecycleProcessing conduit on exam days, depending on the time of the exam.


-- TobyBlake - 24 May 2018

Topic revision: r3 - 07 Jun 2018 - 16:16:15 - TobyBlake
 