Trying out the use of eXist (XML database) to store each year's information

  • The server currently runs on
  • The software and data are under /disk/scratch/eXist
  • Copied twice daily (see ~ht/.crontab) to /group/ltg/users/ht/exideDataMirror so it gets backed up
    • Note that the database itself has to do a twice-daily incremental db dump, as the applications info and reviews now live only in the database
  • Launch the server with /disk/scratch/eXist/bin/
    • I've edited this to capture stderr to a date-stamped file in /disk/scratch/eXist/log
  • There are security issues: At the moment there's nothing to stop anyone from launching the eXide dashboard and browsing things from there
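
As an illustration of the date-stamped stderr capture mentioned above, here is a minimal Python sketch. It is not the actual edit made to the launch script: the `stderr-YYYY-MM-DD.log` naming and both helper functions are assumptions, and the command to run is left as a parameter because the launch script's name is not given here.

```python
import datetime
import os
import subprocess


def stamped_log_path(log_dir, now=None):
    """Return a date-stamped log path, e.g. <log_dir>/stderr-2017-11-21.log.

    The file-name pattern is an assumption made for this sketch.
    """
    now = now or datetime.datetime.now()
    return os.path.join(log_dir, "stderr-%s.log" % now.strftime("%Y-%m-%d"))


def launch(command, log_dir="/disk/scratch/eXist/log"):
    """Start `command` (a list of strings) with stderr appended to today's log."""
    with open(stamped_log_path(log_dir), "a") as log:
        # The child process gets its own copy of the file descriptor, so it
        # keeps writing to the log after this `with` block closes our copy.
        return subprocess.Popen(command, stderr=log)
```

Appending rather than truncating means that a restart on the same day does not lose the earlier error output.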

All the code and data are in the database, in an app called phd, so the base URI for access is

For data protection reasons, app.xq is now set up to only respond to requests from three IP addresses corresponding to toaster.inf, a.k.a. web-groups.inf a.k.a. select.inf, so the way in that works is a forwarding stub at

This enforces an access restriction to staff and others listed individually in the acls for /group/admissions[/bin]
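
The IP-based restriction described above amounts to an allow-list check on the address a request comes from. The real check lives in app.xq (XQuery) and is not reproduced here; the following is only a Python sketch of the idea, and the three addresses are placeholders from the documentation range, not the real ones.

```python
import ipaddress

# Placeholder addresses (RFC 5737 documentation range). The three real
# addresses are not listed on this page.
ALLOWED = frozenset(
    ipaddress.ip_address(a) for a in ("192.0.2.10", "192.0.2.11", "192.0.2.12")
)


def is_allowed(remote_addr, allowed=ALLOWED):
    """Return True only if remote_addr is one of the allow-listed addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in allowed
    except ValueError:
        # Malformed address: refuse rather than guess.
        return False
```

Because only the forwarding host passes this check, the per-user restriction has to be enforced by the stub itself, as noted above.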

Data comes in in three ways:

  1. Export ("Export current report as. . .Excel") a current ILCC PGR live applications from BI (Business Intelligence),
    HST PGR applicants
    1. Run, (/group/admissions/bin on DICE, ~ht/bin on luther), which should pump all the applicants one at a time to http://localhost:8080/exist/apps/phd/new-app-maybe.xq via ( /group/admissions/bin on DICE, ~ht/lib/python on luther)
    2. This depends on either being inside the Informatics firewall, or ssh -f -N -L 8080:localhost:8080
    3. New applicants go into .../apps/phd/[year]/apps/[uun].xml
      1. Existing applicants are ignored, except that status changes are implemented and logged to stdout
    4. View all applicants with, or select one with ?uun=S... Earlier years with ?year=20..
    5. Applicants are linked to their application data as scraped from EUCLID, see below.
    6. Use to add attributes to app entries
  2. Get a review
    1. Reviewer completes and submits
    2. Backup email is sent, but also an XML version is sent to
    3. Where it goes into .../apps/phd/[year]/reviews/[uun].xml
      • Either creating it, or adding a new rev inside the doc elt revs
  3. Manually update the status.xlsx spreadsheet (in /z/studentInfo or bus/ilcc_phd/studentInfo)
    1. Copy-paste new/updated entries from the app.xq display into the 'assess' sheet
      1. Easiest is to just grab everything from the first header line ("UUN...") to the end
      2. Paste into an xemacs buffer
      3. Use /group/admissions/bin/cleanup.el:M-x cleanup to produce single lines with no headers
      4. Copy, and paste over the data rows in the 'assess' sheet
      5. Sort the results by UUN, and update the newReviews named region if necessary
    2. For new applicants, add their UUN to column B of a new row, copy down cols A,C and F--I in the 'status' sheet
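
The "either creating it, or adding a new rev inside the doc elt revs" step for reviews can be sketched as below. This is an illustration in Python rather than the XQuery the database actually runs; the element names `revs` and `rev` come from the description above, while both function names and the `base` parameter (standing in for the elided `.../apps/phd` prefix) are assumptions.

```python
import xml.etree.ElementTree as ET


def review_doc_path(base, year, uun):
    """Path of an applicant's review document: [base]/[year]/reviews/[uun].xml."""
    return "%s/%s/reviews/%s.xml" % (base, year, uun)


def add_review(existing_xml, rev):
    """Return the revs document as a string, with `rev` appended.

    existing_xml is the current document text, or None if the applicant
    has no review document yet. rev is an ElementTree element.
    """
    if existing_xml is None:
        revs = ET.Element("revs")           # first review: create the document
    else:
        revs = ET.fromstring(existing_xml)  # later reviews: reuse the root
    revs.append(rev)
    return ET.tostring(revs, encoding="unicode")
```

Keeping all of an applicant's reviews in one document means that a single lookup by UUN returns every review received so far.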

Funding sources are linked from, fees are at At the moment home fees are approx. 4K, overseas 20.5K. See also T. Ironside's guidance.

Webmark form for advancing to SD:

The resulting ISS tickets are accessible via:

Admission status codes

[Superseded] tool for downloading application material from EUCLID to my webspace ( is ~ht/bus/ilcc_phd/studentInfo/ (CVS version 1.5 is last that worked like this. . .)

  1. > cd ~ht/bus/ilcc_phd/studentInfo
  2. > export PYTHONPATH=/group/ltg/projects/lcontrib/lib/python2.6/site-packages
  3. > python Sxxxxxxx Sxxxxxxx ...

Tool for downloading application material from EUCLID now lives at /group/admissions, and, protected by Kerberos/CoSign. is still there, vastly expanded, with as driver, both in bin. Logs in logs, data under 2016/S....... (and S......._[surname]) /[degree pgm code]/

Theon beliefs about current applicants (as from and are dropped into /group/admissions every morning at 0745 (Tim Colles is contact for this)

  1. To scrape a current set of information from EUCLID:
    • > cd /group/admissions
    • > bin/
  2. For more detailed control, unpack this in various ways, from /group/admissions:
    • > bin/ [-d ...] TP199CSV _CDT_applications.csv TP001CSV _Applications.csv TP226CSV _Hold_Gathered.csv TP267CSV _CDT_Hold_Gathered.csv
    • The necessary EUCLID login happens right away, but -d ... will delay the actual operation for ... minutes
    • The best way to manage this is to have a VNC server running on a DICE machine
      • Use HOME=/group/admissions vncserver :10
    • And connect via either vncviewer (local) or NoMachine (remote)
      • nescio4v
    • Periodically check for cases where Theon is no longer reporting, and update them:
      • > bin/
      • > bin/ ht $(cat new_dormant)
  3. To update specific students/applications, use
    • > HOME=/group/admissions bin/ [username] Snnnnnnn:[pgmcode] . . .
  4. The implementation depends on Selenium to scrape data from Firefox -- if they get out of sync you may see early hard errors such as "Shouldn't happen -- exception caught at highest level...The browser appears to exit before we could connect"
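
The `[username] Snnnnnnn:[pgmcode] . . .` argument form used in step 3 can be checked before any browser is started. The sketch below is a guess at that format based only on the usage line above, not the real script's argument handling; the regular expression and both function names are assumptions.

```python
import re

# Assumed shape: 'S' plus seven digits, a colon, then the programme code.
SPEC = re.compile(r"^(S\d{7}):(\S+)$")


def parse_spec(spec):
    """Split one 'Snnnnnnn:pgmcode' argument into (uun, pgmcode)."""
    m = SPEC.match(spec)
    if m is None:
        raise ValueError("expected Snnnnnnn:pgmcode, got %r" % spec)
    return m.group(1), m.group(2)


def parse_args(argv):
    """Split [username] spec spec ... into (username, [(uun, pgmcode), ...])."""
    if len(argv) < 2:
        raise ValueError("usage: [username] Snnnnnnn:[pgmcode] ...")
    return argv[0], [parse_spec(s) for s in argv[1:]]
```

Rejecting malformed arguments up front avoids a EUCLID login and a Selenium session that would fail later anyway.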

Annual PGR Review meeting

To get the necessary supervisor information for this, use the Explorer tool in BOBI, point it at the "Informatics Research Student Programme Infospace", filter on ILCC + ICCS, and export the result to Excel, where you can delete columns, reorder and sort to get what you need. I found that sorting on Supervisor End Date, Year of Programme, UUN, Supervisor Type (Z-->A) and Supervisor Name worked well, and saved the result to TSV.
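
If the export is saved as TSV first, the same multi-key sort can be reproduced outside Excel. The following is a minimal sketch, assuming the column headers match the names above exactly and that dates are in a form that sorts correctly as text (for example YYYY-MM-DD); neither assumption has been checked against a real export.

```python
import csv
import io

FIELDS = ["Supervisor End Date", "Year of Programme", "UUN",
          "Supervisor Type", "Supervisor Name"]


def sort_supervisor_rows(rows):
    """Sort dict rows on the five keys above, Supervisor Type descending."""
    rows = list(rows)
    # Python's sort is stable, so sort on the least significant key first
    # and the most significant key last.
    rows.sort(key=lambda r: r["Supervisor Name"])
    rows.sort(key=lambda r: r["Supervisor Type"], reverse=True)  # Z-->A
    rows.sort(key=lambda r: r["UUN"])
    rows.sort(key=lambda r: r["Year of Programme"])
    rows.sort(key=lambda r: r["Supervisor End Date"])
    return rows


def write_tsv(rows, out, fieldnames=FIELDS):
    """Write rows as tab-separated text with a header line."""
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

Successive stable sorts are used instead of a single tuple key because one of the keys runs in the opposite direction to the others.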

-- HenrySThompson - 15 Jan 2014

Topic revision: r32 - 21 Nov 2017 - 12:35:17 - HenrySThompson