Account Tidying: AFS/homepages/groupspace

Project description

Details of this project can be found here: https://computing.projects.inf.ed.ac.uk/478/

Related to: https://computing.projects.inf.ed.ac.uk/finished/#349

From project 349:

For accounts we want to delete completely, we need to consider the following:

Files owned by that user outwith home directories - e.g. group space, home pages. If there are other places, we should note them explicitly. ACLs including these users.

Initial thoughts

AFS/NFS Space

The nature of group space means that when a user leaves you can't just delete the group space they requested/own, as others may still be using it. For this reason the deletion of group space can't be fully automated, but we can try to make decisions based on information we hold.

An AFS group space audit was carried out towards the end of last year, so this data could be used to figure out which AFS group spaces we might want to delete when users leave. One of the first things to consider is how do we keep this information accurate and up-to-date?

There are around 850 group space volumes, and we have a record of who the "owner" of each volume is, so the obvious thing to do would be to periodically (annually?) email the owner, asking them to check that the information is correct, and notify us if the space can be deleted/archived, or any of the information held is inaccurate.

It would be nice to automate this process.

My first thoughts are that sharepoint could facilitate this. The AFS audit, currently held in an Excel spreadsheet, can easily be made into a sharepoint list. From there a flow could be created that would annually email the "owner", with a link to the list. When they visit the list page they would be presented only with the rows containing groups owned by them, and they would have permission to update specific fields: whether the information is correct, whether the space is still needed, and if not, whether the data should be archived or deleted. A form could be created using powerapps to make this easier for whoever is making the changes.

When it comes to creating new AFS space, at the moment all tickets are passed to the Services Unit and we create the space and update the spreadsheet with the details. This is partly because the audit was ongoing, and because we were bringing a lot of new AFS space online, so it wasn't obvious what AFS partitions were safe to use. Going forward AFS group space creation could be passed back to frontline support, and new volumes could be added to the sharepoint list when they're created.

sweb and NFS space could also be added to this list.

Things to consider: Is sharepoint capable of taking in data from outside of sharepoint, e.g. a list of accounts that are about to be deleted, so that the owner can be notified?

As things stand, there are a lot of AFS groups that don't have an active account associated with them. At what point should the data be deleted? Do we ever archive group space and if so, what would the process be for this?

Homepages

Previously (circa 2013) homepages were archived in /afs/inf.ed.ac.uk/group/support/archiving/. More recently we've used a script to automatically delete the homepages associated with old accounts that were empty, but any that contained content need to be manually deleted. It seems likely that in the future the existing script will be used to simply delete all old homepages automatically.

ACLs

With regard to removing a user from ACLs, we have a script that trawls AFS group space and collects ACL info, but it's run manually only when needed, and takes a long time to complete. A better option might be to run the command volscan (/usr/afs/bin/volscan -type rw -find mount acl) on each AFS server regularly and aggregate the results. When a user needs to be removed, a search of the volscan results would show which ACLs need to be updated. We would assume that we would remove old users from ACLs, but what to do with their files needs to be decided.

Audit/Tidyup

AFS space/quota

An audit (sharepoint) of AFS group space was carried out. Of the 852 group volumes (allocated 116TB of quota), 198 (10.7TB) of them weren't accessible by any current DICE account. These volumes haven't been deleted yet, but access was completely revoked by removing references to old accounts from the group's ACL. The date that this happened was noted in the spreadsheet.

Many old, static and not-paid-for volumes had a much larger quota than was necessary. By reducing this unused quota, 2.17TB of space was made available. Again, the date this change was made is noted in the spreadsheet.

Homepages

There are currently 4855 homepages directories, of which 1068 have no primary roles, but only two of these are "post-grace". This is because we already run a script /disk/homewikipages/scripts/checkhomepages 3 times a day that checks whether any web/cgi directory needs to be created or removed. If a directory belongs to an account that no longer has the homepages/html capability, and it's empty, then it's deleted automatically. If the directory contains data, then it's manually copied to an archive, where it is deleted after a couple of months.
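The decision the paragraph above describes can be sketched in a few lines. This is only an illustration of the rule as stated, not the checkhomepages script itself: the `Action` names and the `has_capability` argument (assumed to come from some account lookup) are hypothetical.

```python
from enum import Enum
from pathlib import Path


class Action(Enum):
    KEEP = "keep"        # account still holds the homepages/html capability
    DELETE = "delete"    # empty directory, safe to remove automatically
    ARCHIVE = "archive"  # has content, so it goes through the manual archive step


def decide_homepage_action(directory: Path, has_capability: bool) -> Action:
    """Decide what to do with one homepages directory.

    has_capability is an assumed input; how it is looked up is outside
    the scope of this sketch.
    """
    if has_capability:
        return Action.KEEP
    # any() stops at the first entry, so large directories are cheap to test.
    is_empty = not any(directory.iterdir())
    return Action.DELETE if is_empty else Action.ARCHIVE
```

Keeping the decision separate from the code that deletes or copies makes it possible to run the check in a report-only mode first.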

Dev Meeting 15/01/20

A Project Starting talk was given.
Trickier to establish whether to delete group space. Created a s/s on
sharepoint with all 850 groups and owner/contact details, would be pain to
maintain ourselves so ideally get users to do this. Could automate within
sharepoint, example shown as AFS Audit List. Can use flows based around a
check date column triggered annually which emails users to ask them to
update details. Users will just see their own groups. User changes can
trigger emails to us as well, such as when group space is no longer needed.
We should tie this in with the account closure email, or use data directly
from Prometheus. Alternatively could also use PIP to achieve much the same
thing, would be easier for users and would probably integrate better with
local scripts and Prometheus. Ross to discuss with Tim. The script for
creating group space could also add new entries to a database of group
space. Also need to handle homepages - however these are already automated
to an extent (by email notification), there is a pre-zap stage which moves
the pages so they are not served (we could automate this). Need to change
to no grace entitlement so home pages are removed (or moved out of the way)
immediately - AS will check with web strategy. Also need to do things like
cluster data. At the moment this is similarly semi automated using reports,
in this case ownership is changed to root automatically on entitlement
loss.
Finally ACL data - we can use volscan on server to create lists of ACLs for
groups - we could use this in order to help manage the process of removing
people from ACLs once their account has gone. Some of these changes don't
need to happen immediately, process could queue and batch for example.

Update 11/5/2020

volscan is now running daily on all the AFS servers that hold RW volumes. This logs where in the file system each volume is mounted, as well as all ACLs. I've created a script that searches the logs for a given string. The next step is to use the logs to find ACLs that list old accounts and remove them.
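A log search of the kind mentioned above might look like the following sketch. It is not the script referred to in this update: it makes no assumption about the volscan line format and simply reports every line containing the search string. The log directory layout (one file per run, optionally gzip-compressed) is an assumption.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple


def search_logs(log_dir: Path, needle: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line containing needle."""
    for log_file in sorted(log_dir.iterdir()):
        if not log_file.is_file():
            continue
        # Read gzip-compressed logs transparently; everything else as plain text.
        opener = gzip.open if log_file.suffix == ".gz" else open
        with opener(log_file, "rt", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield log_file.name, number, line.rstrip("\n")
```

Because the function is a generator, it reads the logs line by line and never holds a whole file in memory, which matters when a single log is very large.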

I've written an AFS group space creation script. As well as creating the space it logs the details in a postgres db. The next step is to refine the script and get all the information gathered from the space audit into the postgres db. I need to investigate integrating with PIP.

Dev meeting 12/5/2020

Met on May 12th 2020 [Now running volscan on the R/W AFS servers 
to create logs with ACL details. This is searchable to check for UUN of
someone gone. Thinking about the removal process, i.e. should we 
auto-remove if e.g. the owners of group space. Writing Perl script for 
walking user through creating some new AFS space which will also 
record additional details in Postgres database (instead of sharepoint). 
The Perl script will also suggest sensible candidate partitions for the 
new space. Plan to then use PIP to help access/maintain data. Have 
freed up 2TB of space - another 10TB could be cleared by dropping 
unused group space. Plan to record sponsor/owner of space when 
created.]

Update 13/7/2020

I've been working on the AFS group creation script. When creating a space it now suggests a suitable server and partition based on the information on https://wiki.inf.ed.ac.uk/DICE/AFSPartitions. Certain server/partition combinations have been marked as the default place ("DEFAULT" or "DEFAULT RO") to be used when creating new group space. The script looks for these partitions, and works out whether there's enough unallocated space (size of the partition - quotas already allocated - 200GB) for the new group. If none of the default partitions have enough unallocated space, then the script exits and suggests contacting the services-unit.
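The unallocated-space rule above can be written out as a short sketch. The `Partition` fields are hypothetical names for the data held on the wiki page, and picking the candidate with the most free space is an assumption of this example, since the notes do not say how the script chooses between several suitable partitions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

RESERVE_GB = 200  # headroom kept free on every partition, per the rule above


@dataclass
class Partition:
    server: str
    name: str           # partition name as listed on the wiki page
    size_gb: int        # total size of the partition
    allocated_gb: int   # sum of quotas already allocated
    is_default: bool    # marked as a default place for new group space


def unallocated_gb(p: Partition) -> int:
    """size of the partition - quotas already allocated - 200GB"""
    return p.size_gb - p.allocated_gb - RESERVE_GB


def suggest_partition(partitions: Iterable[Partition], quota_gb: int) -> Optional[Partition]:
    """Return a default partition with room for quota_gb, or None if there is none."""
    candidates = [p for p in partitions if p.is_default and unallocated_gb(p) >= quota_gb]
    # Choosing the roomiest candidate is an assumption of this sketch.
    return max(candidates, key=unallocated_gb, default=None)
```

A `None` result corresponds to the case where the script exits and suggests contacting the services-unit.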

If the group space doesn't need to be backed up, then the script prints out instructions along with the lines you need to paste into tibsconf-server.h.

Having recently had to create a few sweb group spaces, I think something similar to the above would be useful. Creating an sweb is quite a convoluted process, so it would be nice if the script could take care of at least the space creation part.

I've run into a problem with volscan: the logs are large (on some AFS servers the daily log can be 16GB) and are filling up /var. So I've stopped the script running, and I'll update it so it writes somewhere else, possibly /disk/scratch.

Dev meeting 14/7/2020

AFS space scripts - added script to suggest partitions, this checks AFS web
page which has manual list of partitions/usage. Excluding partition from
backups needs lines added to tibs.conf so the script now tells you what to
add. Creating sweb sites is a lot of steps so would be good if scripts
could help with that - easy to create AFS space itself but manual steps
needed for getting uid, creating user, principal, and keytab so would be
good to automate some of this. Integration with homepages tidying - if
someone leaves and has an empty homepage directory then it's deleted but if
not the content is moved aside and deleted later (not automatically
though). Might want support in scripts for maintaining PostgreSQL content,
e.g. for updating records and deleting. We have 100+ instances of unowned
group space, access has already been removed but need to decide when to
delete. In future when group space is requested we will be setting a
reclaim date, space is only purchased for a fixed period of time now.

Update 24/08/2020

I've documented the homepages archiving/deletion procedure. I've migrated all the data from the spreadsheet in sharepoint to a postgres db. I've updated the volscan script: it now only keeps 3 days of logs, and it compresses them so they take up less space in /var.

Update 25/09/2020

Scripts

  • /usr/bin/afsvolscan: Logs detailed information about AFS volumes. It will allow us to find and ultimately remove old users from ACLs. It runs nightly on the small AFS RW servers - ladon, nuggle, riddles, keto. The log files produced can be quite large, so it only saves 2 days' worth of gzipped logs (under /var/services-unit/volscan). On the larger AFS servers - peppercorn, gresley, lemon, collett - it only runs on a Friday and Tuesday, as the script takes several days to complete - if it ran daily there would be several instances running at any one time.
    • TO-DO: Create a script that regularly goes through all the logs that afsvolscan creates. Do a "pts exam" on each unique AFS ID, and for any that return "User or group does not exist", search the logs for ACLs that contain that ID and remove it.

  • afscreate: Used to create new group and sweb space and add info to the database.
    • TO-DO: Minor updates - If there's a website predict default web url. If creating an sweb create the "data" and "web" directories and add "system:securewebserver read" to the ACL. Update the script to get the required gdir using fs exam (it currently tries to figure out which gdir to release by going up a level).

  • afsdel: Used to delete group and sweb space and mark as deleted in the database.
    • TO-DO: Minor updates - The script can take a long time to complete depending on the size of the volume being deleted, so ask for the comment first, and make it mandatory. Need to check/sanitise the comment input, it doesn't like "'". Also display a message stating "This deletion may take some time". Comments are appended to, could be separated in the db by ":". Implement a -force flag to remove the entry from the db right away.

  • afscheckup: Reports back the volumes that have an owner that is no longer "active", volumes that have passed their "needed_until" date, volumes that had their access removed over a year ago, and volumes that were deleted over a year ago.
    • TO-DO: Minor updates - Email results to services weekly. It's a bit slow, it might be faster to check whether a user is active using prometheus-get-info rather than prometheus-lifecycle. Might be nice to return the relevant dates for the "needed_until", access removed, and deleted results. Check to see whether any groups exist that are not in the db.

  • afsquery: Used to query the database.
    • TO-DO: Make the search case insensitive. Results could show whether space has been deleted. Users could run "afsquery" and see only their results, i.e. the group spaces where they are "owner".

  • afsremoveaccess?: Haven't created this yet. We'll need a script for removing access and updating the db with the date. Could be part of a "*afsupdate*" script which would allow you to update most of the fields in the database.
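On the afsdel TO-DO about comments containing "'": one common way to avoid that class of problem is to pass the comment to the database as a bound parameter rather than building the SQL string by hand. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for the real postgres db; the table and column names (`groupspace`, `deleted_on`, `comments`) are hypothetical. With a PostgreSQL driver such as psycopg2 the placeholder would be `%s` instead of `?`.

```python
import sqlite3
from datetime import date


def mark_deleted(conn: sqlite3.Connection, volume: str, comment: str, when: date) -> None:
    """Mark a volume as deleted and append the comment, ':'-separated."""
    if not comment.strip():
        raise ValueError("a comment is mandatory")
    # The comment is passed as a bound parameter, so quotes in it are
    # never interpreted as SQL.
    conn.execute(
        """
        UPDATE groupspace
           SET deleted_on = ?,
               comments = CASE
                   WHEN comments IS NULL OR comments = '' THEN ?
                   ELSE comments || ':' || ?
               END
         WHERE volume = ?
        """,
        (when.isoformat(), comment, comment, volume),
    )
    conn.commit()
```

Asking for the comment and validating it before the (slow) volume deletion starts would also cover the "ask for the comment first, and make it mandatory" item.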

NFS

  • Do (another?) audit of riddles and nuggle. New NFS space is rarely created now, it has become a bit of an edge case. Planning to add NFS groups manually to the group space db. "NFS." can be the group (which is the primary key.)

Scope

  • I've so far concentrated on AFS and ACLs, homepages, NFS. What about clusters, avail files, subversion, git, blog, wordpress, mailing lists?

Dev meeting 13/10/2020

Main AFS scripts pretty much done:
* "create": asks questions and then creates AFS space plus updates the PostgreSQL database. Does some guesswork following user inputs to make running this faster and less monotonous.
* "delete": user enters volume and path and the script checks in the database for relevant mount points and gets confirmation, as it can take a while to delete the AFS space. A user-entered comment plus deleted date is held in the database - the database entry will be removed a year after this.
* "query": can be used to check if a user has space already, and for charging information.

A report will be emailed to services unit listing group space where owner is no longer active, space gone past its review date, volumes where access was removed a year ago, volumes deleted over a year ago and volumes manually created but not in the database. An "update" script has not been done yet and is needed for when things change with groups. Use cron for clearing old AFS IDs from ACLs; afs-volscan script searches and removes but path is relative to volume so don't know mount point - approach being taken is to additionally mount all volumes at one point we do know. We can use "fs cleanacl" which removes IDs in one go across all volumes - working on this now - it's agnostic about volume so all will be checked for orphaned IDs, we have ~45000 orphaned just now on one server alone so there will be an initial big clear out. Cleaning home pages - reported this was being done already but the process is now documented.
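The report categories listed in these notes could be computed along the following lines. This is a sketch under assumptions, not the afscheckup script: the `Volume` fields are hypothetical stand-ins for database columns, "a year" is taken as 365 days, and each check is applied independently.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional

ONE_YEAR = timedelta(days=365)  # assumption: "a year" means 365 days


@dataclass
class Volume:
    name: str
    owner_active: bool
    needed_until: Optional[date] = None
    access_removed: Optional[date] = None
    deleted: Optional[date] = None


def checkup(volumes: Iterable[Volume], today: date) -> Dict[str, List[str]]:
    """Sort volume names into the report categories."""
    report: Dict[str, List[str]] = {
        "owner_inactive": [],
        "past_review_date": [],
        "access_removed_over_a_year": [],
        "deleted_over_a_year": [],
    }
    for v in volumes:
        if not v.owner_active:
            report["owner_inactive"].append(v.name)
        if v.needed_until is not None and v.needed_until < today:
            report["past_review_date"].append(v.name)
        if v.access_removed is not None and today - v.access_removed > ONE_YEAR:
            report["access_removed_over_a_year"].append(v.name)
        if v.deleted is not None and today - v.deleted > ONE_YEAR:
            report["deleted_over_a_year"].append(v.name)
    return report


def missing_from_db(existing: Iterable[str], recorded: Iterable[str]) -> List[str]:
    """Volumes that exist in AFS but have no database entry."""
    return sorted(set(existing) - set(recorded))
```

Passing `today` in as an argument, rather than reading the clock inside the function, keeps the report reproducible and easy to test.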

Dev meeting 24/11/2020

Working on database schema improvements, e.g. for handling multiple charges each 
with date, comment, RT. Then need to finish scripts for changes in db. 

Dev meeting 26/01/2021

The afscreate, afsquery and afsdelete scripts have been
 updated so that they all use the new database mentioned at the last meeting. 
The scripts also needed to be updated before they would work under Ubuntu. 
The afsupdate script has been completed, meeting the deliverable 
"Create a script for updating the information held in the database." 
The next thing to work on will be the "Implement mechanism for removal 
of old AFSIDs from ACLs. " deliverable.

Update 08/03/2021

I've been working on a script that will remove old AFS IDs from ACLs. It's pretty much finished, and I'll run it once I get a second opinion from Neil as to whether I've gone about it the correct way. I've also written a script that can be used to find where specific AFS IDs are in use (in ACLs). It searches the volscan reports that are produced on each AFS RW server. Next I'm going to revisit the reporting script, which needs to be updated now that the database has changed.
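For illustration only, the "find old AFS IDs" step could be sketched as below. This is not the script described in this update. It assumes that `pts examine` reports a missing entry with a "User or group does not exist" style message (the exact wording may differ between OpenAFS releases), and the function names are hypothetical. The lookup is passed in as a callable so the logic can be tried without access to an AFS cell.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumed wording of the error; check it against the local pts output.
MISSING_MARKERS = ("User or group does not exist", "User or group doesn't exist")


def looks_missing(pts_output: str) -> bool:
    """True if the pts output says there is no such user or group."""
    return any(marker in pts_output for marker in MISSING_MARKERS)


def pts_entry_exists(afs_id: int) -> bool:
    """Ask the protection server whether this AFS ID still has an entry."""
    result = subprocess.run(
        ["pts", "examine", "-nameorid", str(afs_id)],
        capture_output=True, text=True, check=False,
    )
    # Only an explicit "does not exist" message counts as missing, so a
    # network or token failure is not mistaken for an orphaned ID.
    return not looks_missing(result.stdout + result.stderr)


def find_orphans(afs_ids: Iterable[int],
                 exists: Callable[[int], bool] = pts_entry_exists) -> List[int]:
    """Return the unique IDs that no longer have a pts entry, sorted."""
    return sorted(i for i in set(afs_ids) if not exists(i))
```

De-duplicating the IDs before the lookup matters here, because the same ID typically appears in many ACLs and each lookup is a separate call to the protection server.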

-- RossArmstrong - 18 Feb 2020

Topic revision: r15 - 08 Mar 2021 - 15:48:30 - RossArmstrong
 