Account Tidying: AFS/homepages/groupspace

Project description

Details of this project can be found here: https://computing.projects.inf.ed.ac.uk/478/

Related to: https://computing.projects.inf.ed.ac.uk/finished/#349

From project 349:

For accounts we want to delete completely, we need to consider the following:

Files owned by that user outwith home directories - e.g. group space, home pages. If there are other places, we should note them explicitly. ACLs including these users.

Initial thoughts

AFS/NFS Space

The nature of group space means that, when a user leaves, you can't just delete the group space they requested/own, as others may still be using it. For this reason the deletion of group space can't be fully automated, but we can try to make a decision based on information we hold.

An AFS group space audit was carried out towards the end of last year, so this data could be used to figure out which AFS group spaces we might want to delete when users leave. One of the first things to consider is how we keep this information accurate and up to date.

There are around 850 group space volumes, and we have a record of who the "owner" of each volume is, so the obvious thing to do would be to periodically (annually?) email the owner, asking them to check that the information is correct, and notify us if the space can be deleted/archived, or any of the information held is inaccurate.

It would be nice to automate this process.

My first thoughts are that sharepoint could facilitate this. The AFS audit, currently held in an Excel spreadsheet, can easily be made into a sharepoint list. From there a flow could be created that would annually email the "owner", with a link to the list. When they visit the list page they would be presented only with the rows containing groups owned by them, and they would have permission to update specific fields: whether the information is correct, whether the space is still needed, and if not, whether the data should be archived or deleted. A form could be created using powerapps to make this easier for whoever is making the changes.

When it comes to creating new AFS space, at the moment all tickets are passed to the Services Unit and we create the space and update the spreadsheet with the details. This is partly because the audit was ongoing, and because we were bringing a lot of new AFS space online, so it wasn't obvious what AFS partitions were safe to use. Going forward AFS group space creation could be passed back to frontline support, and new volumes could be added to the sharepoint list when they're created.

sweb and NFS space could also be added to this list.

Things to consider: Is sharepoint capable of taking in data from outside of sharepoint, e.g. a list of accounts that are about to be deleted, so that the owner can be notified?

As things stand, there are a lot of AFS groups that don't have an active account associated with them. At what point should the data be deleted? Do we ever archive group space and if so, what would the process be for this?

Homepages

Previously (circa 2013) homepages were archived in /afs/inf.ed.ac.uk/group/support/archiving/. More recently we've used a script to automatically delete the homepages associated with old accounts that were empty, but any that contained content need to be manually deleted. It seems likely that in the future the existing script will be used to simply delete all old homepages automatically.

ACLs

With regard to removing a user from ACLs, we have a script that trawls AFS group space and collects ACL info, but it's run manually only when needed, and takes a long time to complete. A better option might be to run the command volscan (/usr/afs/bin/volscan -type rw -find mount acl) on each AFS server regularly and aggregate the results. When a user needs to be removed, a search of the volscan results would show which ACLs need to be updated. We would assume that we would remove old users from ACLs, but what to do with their files needs to be decided.

Audit/Tidyup

AFS space/quota

An audit (sharepoint) of AFS group space was carried out. Of the 852 group volumes (allocated 116TB of quota), 198 (10.7TB) of them weren't accessible by any current DICE account. These volumes haven't been deleted yet, but access was completely revoked by removing references to old accounts from the group's ACL. The date that this happened was noted in the spreadsheet.

Many old, static and not-paid-for volumes had a much larger quota than was necessary. By reducing this unused quota, 2.17TB of space was made available. Again, the date this change was made is noted in the spreadsheet.

Homepages

There are currently 4855 homepages directories, of which 1068 have no primary roles, but only two of these are "post-grace". This is because we already run a script /disk/homewikipages/scripts/checkhomepages 3 times a day that checks whether any web/cgi directory needs to be created or removed. If a directory belongs to an account that no longer has the homepages/html capability, and it's empty, then it's deleted automatically. If the directory contains data, then it's manually copied to an archive, where it is deleted after a couple of months.
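The decision described above can be sketched as follows. This is an illustration only, not the contents of checkhomepages: the function name `decide`, the `Action` values and the `has_capability` flag are hypothetical, and looking up the homepages/html capability is left to the caller.

```python
from enum import Enum
from pathlib import Path


class Action(Enum):
    KEEP = "keep"        # account still has the capability
    DELETE = "delete"    # empty directory, safe to remove automatically
    ARCHIVE = "archive"  # has content, needs a manual copy to the archive


def decide(homepage_dir: Path, has_capability: bool) -> Action:
    """Decide what to do with one homepages directory."""
    if has_capability:
        return Action.KEEP
    # any() stops at the first entry, so large directories are cheap to test.
    is_empty = not any(homepage_dir.iterdir())
    return Action.DELETE if is_empty else Action.ARCHIVE
```

Keeping the decision separate from the deletion itself means a run can be reviewed as a report before anything is removed.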

Dev Meeting 15/01/20

A Project Starting talk was given.
Trickier to establish whether to delete group space. Created a s/s on
sharepoint with all 850 groups and owner/contact details, would be a pain to
maintain ourselves so ideally get users to do this. Could automate within
sharepoint, example shown as AFS Audit List. Can use flows based around a
check date column triggered annually which emails users to ask them to
update details. Users will just see their own groups. User changes can
trigger emails to us as well, such as when group space is no longer needed.
We should tie this in with the account closure email, or use data directly
from Prometheus. Alternatively could also use PIP to achieve much the same
thing, would be easier for users and would probably integrate better with
local scripts and Prometheus. Ross to discuss with Tim. The script for
creating group space could also add new entries to a database of group
space. Also need to handle homepages - however these are already automated
to an extent (by email notification), there is a pre-zap stage which moves
the pages so they are not served (we could automate this). Need to change
to no grace entitlement so home pages are removed (or moved out of the way)
immediately - AS will check with web strategy. Also need to do things like
cluster data. At the moment this is similarly semi automated using reports,
in this case ownership is changed to root automatically on entitlement
loss.
Finally ACL data - we can use volscan on server to create lists of ACLs for
groups - we could use this in order to help manage the process of removing
people from ACLs once their account has gone. Some of these changes don't
need to happen immediately, process could queue and batch for example.

Update 11/5/2020

volscan is now running daily on all the AFS servers that hold RW volumes. This logs where in the file system each volume is mounted, as well as all ACLs. I've created a script that searches the logs for a given string. The next step is to use the logs to find ACLs that list old accounts and remove them.

I've written an AFS group space creation script. As well as creating the space it logs the details in a postgres db. The next step is to refine the script and get all the information gathered from the space audit into the postgres db. I need to investigate integrating with PIP.
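A search of this kind might look like the sketch below. It is not the script mentioned above: the function name `search_logs` and the one-directory-of-logs layout are assumptions, and nothing is assumed about the volscan line format beyond it being text. Plain and gzip-compressed files are both handled.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple


def search_logs(log_dir: Path, needle: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line containing needle."""
    for path in sorted(log_dir.iterdir()):
        if not path.is_file():
            continue
        # Pick the opener from the file suffix so compressed logs work too.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield path.name, number, line.rstrip("\n")
```

Reading line by line keeps memory use flat, which matters when a single log runs to several gigabytes.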

Dev meeting 12/5/2020

Met on May 12th 2020 [Now running volscan on the R/W AFS servers 
to create logs with ACL details. This is searchable to check for UUN of
someone gone. Thinking about the removal process, i.e. should we 
auto-remove if e.g. the owners of group space. Writing Perl script for 
walking user through creating some new AFS space which will also 
record additional details in Postgres database (instead of sharepoint) 
- this will also suggest sensible candidate partitions for the 
new space. Plan to then use PIP to help access/maintain data. Have 
freed up 2TB of space - another 10TB could be cleared by dropping 
unused group space. Plan to record sponsor/owner of space when 
created.]

Update 13/7/2020

I've been working on the AFS group creation script. When creating a space it now suggests a suitable server and partition based on the information on https://wiki.inf.ed.ac.uk/DICE/AFSPartitions. Certain server/partition combinations have been marked as the default place ("DEFAULT" or "DEFAULT RO") to be used when creating new group space. The script looks for these partitions, and works out whether there's enough unallocated space (size of the partition - quotas already allocated - 200GB) for the new group. If none of the default partitions have enough unallocated space, then the script exits and suggests contacting the services-unit.

If the group space doesn't need to be backed up, then the script prints out instructions along with the lines you need to paste into tibsconf-server.h.
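The unallocated-space check can be illustrated with a short sketch. The `Partition` fields and the `suggest_partition` name are hypothetical, and choosing the candidate with the most unallocated space is an assumption made here for illustration; the notes above only say that a suitable default partition is suggested.

```python
from dataclasses import dataclass
from typing import List, Optional

RESERVE_GB = 200  # headroom subtracted from every partition, as described above


@dataclass
class Partition:
    server: str
    name: str
    size_gb: int
    allocated_quota_gb: int
    is_default: bool  # marked as a default place for new group space

    @property
    def unallocated_gb(self) -> int:
        # size of the partition - quotas already allocated - 200GB
        return self.size_gb - self.allocated_quota_gb - RESERVE_GB


def suggest_partition(partitions: List[Partition], wanted_gb: int) -> Optional[Partition]:
    """Return a default partition with room for wanted_gb, or None if there is none."""
    candidates = [
        p for p in partitions
        if p.is_default and p.unallocated_gb >= wanted_gb
    ]
    # Assumption: prefer the candidate with the most unallocated space.
    return max(candidates, key=lambda p: p.unallocated_gb, default=None)
```

A `None` result corresponds to the case where the script exits and suggests contacting the services-unit.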

Having recently had to create a few sweb group spaces, I think something similar to the above would be useful. Creating an sweb is quite a convoluted process, so it would be nice if the script could take care of at least the space creation part.

I've run into a problem with volscan: the logs are large (on some AFS servers the daily log can be 16GB) and are filling up /var. So I've stopped the script running, and I'll update it so it writes somewhere else, possibly /disk/scratch.

Dev meeting 14/7/2020

AFS space scripts - added script to suggest partitions, this checks AFS web
page which has manual list of partitions/usage. Excluding partition from
backups needs lines added to tibs.conf so the script now tells you what to
add. Creating sweb sites is a lot of steps so would be good if scripts
could help with that - easy to create AFS space itself but manual steps
needed for getting uid, creating user, principal, and keytab so would be
good to automate some of this. Integration with homepages tidying - if
someone leaves and has an empty homepage directory then it's deleted but if
not the content is moved aside and deleted later (not automatically
though). Might want support in scripts for maintaining PostgreSQL content,
e.g. for updating records and deleting. We have 100+ instances of unowned
group space, access has already been removed but need to decide when to
delete. In future when group space is requested we will be setting a
reclaim date, space is only purchased for a fixed period of time now.

Update 24/08/2020

I've documented the homepages archiving/deletion procedure. I've migrated all the data from the spreadsheet in sharepoint to a postgres db. I've updated the volscan script: it now only keeps 3 days of logs, and it compresses them so they take up less space in /var.
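A compress-and-prune step of the kind described for the volscan logs could be sketched like this. It is not the actual script: the function name, the `*.log` naming, and the assumption that file names sort chronologically (one log per day) are all hypothetical.

```python
import gzip
import shutil
from pathlib import Path

KEEP = 3  # number of compressed logs to retain, per the note above


def compress_and_prune(log_dir: Path, keep: int = KEEP) -> None:
    """Gzip any uncompressed *.log files, then delete all but the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # list() so the directory is not modified while it is being scanned.
    for raw in list(log_dir.glob("*.log")):
        with open(raw, "rb") as src, gzip.open(f"{raw}.gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        raw.unlink()
    # Assumption: names sort chronologically, e.g. a date in the file name.
    compressed = sorted(log_dir.glob("*.log.gz"))
    for old in compressed[:-keep]:
        old.unlink()
```

Streaming with `shutil.copyfileobj` avoids loading a multi-gigabyte log into memory while compressing it.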

Update 25/09/2020

Scripts

  • /usr/bin/afsvolscan: Logs detailed information about AFS volumes. It will allow us to find and ultimately remove old users from ACLs. It runs nightly on the small AFS RW servers - ladon, nuggle, riddles, keto. The log files produced can be quite large, so it only saves 2 days' worth of gzipped logs (under /var/services-unit/volscan). On the larger AFS servers - peppercorn, gresley, lemon, collett - it only runs on a Friday and Tuesday, as the script takes several days to complete - if it ran daily there would be several instances running at any one time.
    • TO-DO: Create a script that regularly goes through all the logs that afsvolscan creates. Do a "pts exam" on each unique AFS ID, and for any that return "User or group does not exist", search the logs for ACLs that contain that ID and remove it.

  • afscreate: Used to create new group and sweb space and add info to the database.
    • TO-DO: Minor updates - If there's a website predict default web url. If creating an sweb create the "data" and "web" directories and add "system:securewebserver read" to the ACL. Update the script to get the required gdir using fs exam (it currently tries to figure out which gdir to release by going up a level).

  • afsdel: Used to delete group and sweb space and mark as deleted in the database.
    • TO-DO: Minor updates - The script can take a long time to complete depending on the size of the volume being deleted, so ask for the comment first, and make it mandatory. Need to check/sanitise the comment input, it doesn't like "'". Also display a message stating "This deletion may take some time". Comments are appended to, could be separated in the db by ":". Implement a -force flag to remove the entry from the db right away.

  • afscheckup: Reports back the volumes that have an owner that is no longer "active", volumes that have passed their "needed_until" date, volumes that had their access removed over a year ago, and volumes that were deleted over a year ago.
    • TO-DO: Minor updates - Email results to services weekly. It's a bit slow, it might be faster to check whether a user is active using prometheus-get-info rather than prometheus-lifecycle. Might be nice to return the relevant dates for needed until, access removed, and deleted results. Check to see whether any groups exist that are not in the db.

  • afsquery: Used to query the database.
    • TO-DO: Make the search case insensitive. Results could show whether space has been deleted. Users could run "afsquery" and see only their results, i.e. the group spaces where they are "owner".

  • afsremoveaccess?: Haven't created this yet. We'll need a script for removing access and updating the db with the date. Could be part of a "*afsupdate*" script which would allow you to update most of the fields in the database.
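
The four checks that afscheckup reports on can be sketched as below. This is an illustration, not the script itself: apart from needed_until, the field names and the `checkup` function are hypothetical, and whether an owner is active is assumed to have been looked up already (the real script uses Prometheus for that).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional

ONE_YEAR = timedelta(days=365)


@dataclass
class Volume:
    group: str
    owner_active: bool
    needed_until: Optional[date] = None
    access_removed: Optional[date] = None
    deleted: Optional[date] = None


def checkup(volumes: List[Volume], today: date) -> Dict[str, List[str]]:
    """Sort group names into the four report categories."""
    report: Dict[str, List[str]] = {
        "inactive_owner": [],
        "past_needed_until": [],
        "access_removed_over_a_year": [],
        "deleted_over_a_year": [],
    }
    for v in volumes:
        if not v.owner_active:
            report["inactive_owner"].append(v.group)
        if v.needed_until is not None and v.needed_until < today:
            report["past_needed_until"].append(v.group)
        if v.access_removed is not None and today - v.access_removed > ONE_YEAR:
            report["access_removed_over_a_year"].append(v.group)
        if v.deleted is not None and today - v.deleted > ONE_YEAR:
            report["deleted_over_a_year"].append(v.group)
    return report
```

A volume can appear in more than one category, which matches a report that lists each condition separately.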

NFS

  • Do (another?) audit of riddles and nuggle. New NFS space is rarely created now, it has become a bit of an edge case. Planning to add NFS groups manually to the group space db. "NFS." can be the group (which is the primary key.)

Scope

  • I've so far concentrated on AFS and ACLs, homepages, NFS. What about clusters, avail files, subversion, git, blog, wordpress, mailing lists?

-- RossArmstrong - 18 Feb 2020

Topic revision: r14 - 05 Oct 2020 - 13:24:42 - RossArmstrong
 