PAVD

This is the interim process for handling the Undergraduate Post Application Visit Day. This is driven from applicant information held in EUCLID and some locally maintained information. The process is entirely automatic. The source is a PAVD Excel report generated in BOXI from the EUCLID Universe and received locally as an attachment in an email. The report is generated and sent out every weekday morning using a recurrent BOXI schedule. The local synchronization is carried out by a .procmailrc triggered Python script for the daidb user on the database server.

The PAVD structures in the local database consist of five tables.

  • pavddata - the raw data from EUCLID (a representation of the Excel file content held in the database).
  • pavd_person, pavd_course - simplistic representation of the EUCLID Universe structure held in sync with the pavddata content (and hence EUCLID upstream data), these form the read-only local cache of EUCLID UG application data.
  • pavd_ie, pavd_dates - locally maintained information (by the ITO) for the PAVD process complementing the EUCLID data (local records are tied to corresponding EUCLID records by the student code).

See the EUCLIDiE Cluster in the normal database service data model pages for more details. The pavd_person, pavd_course and pavd_ie tables are all accessable and maintainable by the ITO via the PAVD custom form in the normal graphical user interface.

PAVD Sync Process

The synchronization process works as follows.

  • A PAVD report in Excel format is sent as an attachment in email to infdb-upload@inf.ed.ac.uk (aka daidb@inf.ed.ac.uk) from BOXI every week day morning at 8AM. This process is run out of a specific user account at present. Note that the report runs with a fixed range of 15/10/2008 - 14/10/2009 to pick up the correct set of applicants, this will need to be changed for next sessions intake.
  • The daidb users .procmailrc filter on the database server picks up this report by a subject line match and pipes it to the pavd_upload script in the conduits/etc directory.
  • The pavd_upload script:
    • saves a copy of the received email message as a file called BOX_ITO_PAVD_YYYYMMDDHHMM.msg in the conduits/holding directory (where YYYYMMDDHHMM is the current day and time).
    • unpacks the attachment and saves the Excel document as a file called BOX_ITO_PAVD_YYYYMMDDHHMM.xls in the conduits/holding directory.
    • converts the Excel document to an Ingres load file (ascii text percent delimited) for very fast direct table load in a file called BOX_ITO_PAVD_YYYYMMDDHHMM.txt in the conduits/holding directory
    • generates an SQL file to load the pavddata table from the content of this load file in a file called BOX_ITO_PAVD_YYYYMMDDHHMM.sql.
    • runs the sql infdb command on BOX_ITO_PAVD_YYYYMMDDHHMM.sql to load the pavddata table. This completely wipes the old content of the pavddata table and replaces it with the live current content in the Excel file.
    • runs the conduit pavd_sync which works out the differences between the new content of the pavddata table and the old content in the pavd_person and pavd_course tables and produces a set of SQL statements to bring these tables into line in the file pavd_sync.sql in the conduits/outputs directory. In detail the pavd_sync conduit runs queries to compare the content of the pavddata table and the pavd_person and pavd_course tables and creates some SQL to bring the latter in line with the former. It will print output saying what records (applicants and courses) are being created, updated or deleted. Note that no records are actually deleted from the local copies - if an application once appeared in the download and now no longer does it is simply marked locally as deleted with a date. The only case where an application will disappear from the download is when the applicant cancels the application themselves, hence it should not be frequent and more than a few deletes should be cause for concern.
    • runs the sql infdb command on the pavd_sync.sql file to make the table updates, bringing the local cache copy uptodate with the upstream status. Note that it is only at this step that the actual PAVD local copies of EUCLID Universe structures are affected.
    • runs the conduit pavd_sync_report which queries for all the changes made by the sync process and produces a text report in the file pavd_sync_report.txt in conduits/outputs.
    • emails the pavd_sync_report.txt file to the ITO request tracker (only if non-zero content).
    • output and error from all the above is logged in the file BOX_ITO_PAVD_YYYYMMDDHHMM.log in the conduits/holding directory.

The whole process only takes a few seconds.

To run the process manually the simplest approach is to email a compatible Excel document as an attachment to infdb-upload with the correct subject line (see the .procmailrc). Alternatively if the original report was received but the process broke locally at a later point, the received email message will be held in the file called BOX_ITO_PAVD_YYYYMMDDHHMM.msg in the conduits/holding directory. This can simply be piped directly into the upload script as the daidb user, for example cd ~/conduits/etc and then cat ../holding/BOX_ITO_PAVD_YYYYMMDDHHMM.msg | ./pavd_upload, to allow manual testing and debugging. Note that the warnings about file size and OLE2 inconsistency seem when running manually are normal and do not break the process.

-- TimColles - 09 Feb 2009

Topic revision: r6 - 30 Apr 2009 - 14:13:29 - TimColles
 
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback
This Wiki uses Cookies