EUCLID Interoperability - Final Report

This is the final report for Project 111 - EUCLID Interoperability.


1. Introduction

The primary aim of this project was to ensure that all our services that relied on data from the current DACS system would continue to work after the transition to EUCLID.

2. Deliverables

  1. Ensure that IS deliver a EUCLID interface that meets our needs. This was achieved. The EUGEX "Generic Schools" Students feed was introduced to replace our old SMS Data Feed, with nearly compatible structure and content. The EUGEX "Generic Schools" Courses feed replaced our equivalent download from the old Wisard/SACS system (WebCT Feed). Unlike the old SMS feed, these new feeds are an official IS service for downstream providers, available to any school and properly change-managed with QA and test procedures in place. They are both part of EUGEX, which is the one-stop central delivery system for providing data from EUCLID to all other downstream IS and local school services.
  2. Update our local systems to work with the new EUCLID interface as seamlessly as feasible. This was achieved. A two-stage approach was adopted. Initially the new feed was retrofitted onto the old processes. This was intended to provide a fallback and a life extension to the old ITO database system, which was looking unlikely to be replaced before Q4 2010 as planned. At the same time an entirely new process was developed, called the "Generic Sync Framework" (GSF), which was introduced in Q4 2010 (running in parallel and used for 3G data only). This would take any source of data and, by simple configuration, automatically produce all the in-database support needed to drive the sync process(es) from that data source. The GSF was derived from work previously done to implement the PAVD and PGT data feeds, which provided data to the old ITO database system from BOXI. One key change was to replace the external code process used in PAVD and PGT with an entirely in-database solution. The GSF provided a new feature called "local override" that would allow data to be maintained locally in extremis when EUCLID and/or the central feed was broken, but would allow the sync process to reclaim modified data silently when it was working again. This feature was never really used for its originally intended purpose, but has been used in new and unexpected ways. The ITO database system moved to Theon in Q4 2011 and also to the GSF for all data (EUGEX as well as supplemental data), and the old sync processes were dropped. The GSF was given a refresh in Q4 2011 to add functionality ("parameterized sync") and to make performance/maintainability improvements. At that time it was renamed the "TheonCoupler".
As a result of being generic and running entirely in-database, with the "local override" and "parameterized sync" features, the TheonCoupler became the de facto replacement for many internal database processes that were originally automated by hand-written functions on a case-by-case basis. The TheonCoupler now underpins a large part of core Theon functionality: all incoming data flows through it and many automated processes have been standardized to work using it. As such the contribution of this project has gone far beyond the original intention and expectation, with considerable fringe benefit.
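
The "local override" behaviour described above can be illustrated with a minimal Python sketch. This is not the TheonCoupler implementation, which runs entirely in-database. The names Row, local_edit and sync are hypothetical, and the rule used to decide that the feed is "working again" (the feed delivers a value different from the one seen when the local edit was made) is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Row:
    value: str
    # Source value seen when a local edit was made; None means no override.
    # (Hypothetical bookkeeping, assumed for this sketch.)
    overridden_from: Optional[str] = None


def local_edit(table: Dict[str, Row], key: str, new_value: str) -> None:
    """Record a local change and remember the source value it replaced."""
    row = table[key]
    if row.overridden_from is None:
        row.overridden_from = row.value
    row.value = new_value


def sync(table: Dict[str, Row], source: Dict[str, str]) -> None:
    """Bring the table in line with the source, honouring local overrides."""
    for key, src_value in source.items():
        row = table.get(key)
        if row is None:
            table[key] = Row(src_value)      # new in source: insert
        elif row.overridden_from is None:
            row.value = src_value            # normal case: source wins
        elif src_value != row.overridden_from:
            row.value = src_value            # feed has moved on: reclaim silently
            row.overridden_from = None
        # Otherwise the feed still delivers the stale value: keep the local edit.
    for key in list(table):
        if key not in source and table[key].overridden_from is None:
            del table[key]                   # gone from source: delete
```

In this sketch a locally edited row survives repeated syncs for as long as the feed keeps delivering the stale value, and is overwritten without any manual step once the feed delivers something new.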

3. Coupler Processes in Theon

This project explicitly only covered the processing of data from EUGEX that replaced our original data feeds. There are now many additional sources of data that flow in (or within) Theon. New ones are easy to add using configuration for TheonCoupler, as all the necessary functionality is built automatically into the master schema. The following lists the current distinct data feeds that are additional to the EUGEX ones.

  • ADMIS which is used for PGR/PGT student admissions data
  • ESSMU which is used for student/course data that is supplemental to EUGEX (i.e. not available in the standard feed)
  • HTBN which is used for contractual data from HR on students on TDM contracts
  • PPIPMI which is used for contractual data from HR on staff, replacing some of the locally maintained InfHR data
  • SPAWN which is an internal-only coupler process used for automated record creation/maintenance (for holding assessment data), replacing (and greatly enhancing/extending) functions that were previously all hand-written on a case-by-case basis
  • STAFF2G which is used to pull staff data from the legacy database into Theon (increasingly now replaced by PPIPMI outside of the legacy systems)
  • UPCOM which is used for handling continuous rollover of data from the current session into the next session, replacing (and greatly enhancing/extending) the old manually executed once-per-session rollover procedure
  • WEBMARK which is used for transferring HTBN bids data entered by users through Webmark forms
  • ENROL which is a recently introduced process used for automated record creation based on user-maintained auto-enrolment rules
  • RT is pending but will transfer data directly from RT4 (using a "remote db call" within Theon) on open/new tickets associated with students

Hence the ubiquity of the small core of functionality introduced by this project is apparent.
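
The configuration-driven approach mentioned at the start of this section can be sketched as follows. This is an illustration only. The real TheonCoupler generates its DDL with Gurgle templates, whereas this sketch swaps in Python's string.Template. The configuration keys (target, source, key, column), the table and view names and the generated SQL are all hypothetical.

```python
from string import Template
from typing import Dict, List

# Hypothetical templates; the real generated DDL is far richer than this.
INSERT_TEMPLATE = Template(
    "INSERT INTO $target ($key, $column) "
    "SELECT $key, $column FROM $source "
    "WHERE $key NOT IN (SELECT $key FROM $target);"
)
UPDATE_TEMPLATE = Template(
    "UPDATE $target SET $column = "
    "(SELECT $column FROM $source WHERE $source.$key = $target.$key) "
    "WHERE $key IN (SELECT $key FROM $source);"
)


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'name = value' lines, ignoring blank lines and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        config[name.strip()] = value.strip()
    return config


def generate_sync_sql(config: Dict[str, str]) -> List[str]:
    """Expand the templates for one coupler target (raises KeyError if a key is missing)."""
    return [
        INSERT_TEMPLATE.substitute(config),
        UPDATE_TEMPLATE.substitute(config),
    ]
```

For example, a four-line configuration naming a target table, a source view, a key and a column expands into one INSERT and one UPDATE statement. Adding a new feed then means writing another short configuration rather than new code.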

4. Codebase

The TheonCoupler is built upon the "conduits" framework and is integrated with the XSD schema and XSLT processing functions used in Theon. Code specific to the TheonCoupler framework amounts to about 500 lines of Gurgle code (primarily a templating process). Each target for a coupler process requires about 10 to 20 lines of key-value pair configuration. There are currently just shy of 100 distinct coupler process targets. Most coupler process targets also require view definitions to adjust the source data; these average about 30 lines each, for a current total of about 2500 lines. Data from outside Theon has to be loaded into Theon before TheonCoupler can perform the sync process. Each external source needs a small configuration script using the "incoming" framework that provides some definitions for a Python module, also about 10 to 20 lines each. We currently have about 20 of these. The Python module provides all the necessary conversion (mail message attachment, XLS, CSV and delimited text processing) and uploading (using SQL) and is 450 lines (most of which is third-party XLS handling code). A small amount of LCFG configuration is added to pass over incoming data and trigger incoming processes. The total auto-generated SQL DDL associated with the TheonCoupler and all coupler process targets is approximately 22000 lines.
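
The conversion and upload step performed for external sources can be sketched roughly as below. This is not the actual "incoming" module. It covers only delimited text, uses an in-memory SQLite database as a stand-in for Theon, and the function names parse_delimited and upload and the staging table layout are assumptions.

```python
import csv
import io
import sqlite3
from typing import List, Tuple


def parse_delimited(text: str, delimiter: str = ",") -> Tuple[List[str], List[List[str]]]:
    """Return (header, rows) from delimited text, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    records = [row for row in reader if row]
    return records[0], records[1:]


def upload(conn: sqlite3.Connection, table: str,
           header: List[str], rows: List[List[str]]) -> None:
    """Replace the contents of a staging table with the parsed rows.

    Table and column names are interpolated directly, so they must come
    from trusted configuration, never from the incoming data itself.
    """
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", rows
    )
    conn.commit()
```

Once the data is in a staging table like this, a view can adjust it and a coupler process can sync it into its target, as described above.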

5. Timeline

The project was started in Q4 of 2008. Most work up until Q2 2010 was sporadic and consisted of liaison meetings with IS and other Schools to ensure that the first deliverable was achieved. This was a feed to replace the existing SMS Data and WebCT Course feed from the DACS system with functionally equivalent feeds from EUCLID. In Q3 2010 a retrofit feed was produced as a fallback for the legacy 2g system. This took the new IS EUGEX data and mapped it back into the old processes that fed the then current 2g database system. In Q4 2010 the new prototypical GSF was introduced to replace the existing processes and drive the new 3g system with data directly. In Q3 2011 the ITO aspects of the legacy 2g system were migrated into 3g along with switching to the GSF and the retrofit feed was discarded. The GSF was refreshed and stabilised in Q4 2011. Since the introduction of the GSF many new feeds have been added.

6. Effort and Analysis

This project was given 8 weeks of effort, which in hindsight seems rather high for its original intended purpose. In practice explicit effort on this project was 2.5 weeks. Related effort (mostly specific coupler processes over and above the core EUGEX ones, but probably also some ancillary/infrastructure work) was 5.5 weeks, resulting, rather surprisingly, in a total of 8 weeks. The project deliverables were all met in Q4 2010 and core project effort also ended then (as shown in the graph below). Effort on other coupler processes and refinement of the coupler framework itself continued after that point but was not attributed to this project (nor should it have been). The graph below shows effort in hours over time (in year/quarter blocks) for the core project effort and related non-core effort.


The graph below shows combined effort (core and non-core) alongside revision commit change over time. This commit change is specific to the TheonCoupler framework itself, not individual coupler process targets. The introduction in 2010 Q4 and the subsequent (non-core) revision in 2011 Q4 are apparent. The effort in 2012 Q4 relates to across-the-board refinement of some coupler processes, not the framework itself, which is why no commit change is shown for it.


The final graph below shows combined effort (core and non-core) alongside revision commit change over time as above, except that the commit change is not for the framework but is broken down by specific coupler process. This shows the emergence of the core EUGEX, ESSMU and ADMIS feeds in Q4 2010 for the IGS use of Theon. The revision of the EUGEX and ESSMU feeds for the migration of the ITO to using Theon in Q3/4 2011, and the introduction of the SPAWN and UPCOM feeds, are also apparent. The SPAWN feed shows continuous revision throughout the 2011/12 session, as its usage is associated with the assessment process, which was in development and constant flux over that time frame. Q4 2011 introduces the HTBN feed for that new business area. Q3/4 of 2012 shows a major revision of SPAWN and UPCOM to address numerous issues and gaps in provision that arose (or were identified anyway) during their first live usage in the 2011/12 session. Finally, Q4 2012 introduces the PPIPMI feed that pushes staff data from HR into Theon. The graph also shows the linear trend in combined effort, which is falling over the defined period.


7. Approach

The development approach taken in this project was interesting, in that by far the greatest result was never expressed as a core aim/deliverable of the project. This would appear to be an aspect of what is called "emergent design" in agile software development. It is in part due to not taking this project out of its surrounding context, as that context was also driving what would actually be needed. It is in part due to not over-specifying the original requirements (or the plan to meet the deliverables) of the project. It is in part due to running the system live from early on and evolving requirements based on actual user and technical demand. It is in part due to allowing a more free-flowing, less targeted and longer-term development cycle. The quote below, taken from Wikipedia, describes this.

Emergent design is a consistent topic in agile software development, as a result of the methodology's focus on delivering small pieces of working code with business value. With emergent design, a development organization starts delivering functionality and lets the design emerge. Development will take a piece of functionality A and implement it using best practices and proper test coverage and then move on to delivering functionality B. Once B is built, or while it is being built, the organization will look at what A and B have in common and refactor out the commonality, allowing the design to emerge. This process continues as the organization continually delivers functionality. At the end of an agile release cycle, development is left with the smallest set of the design needed, as opposed to the design that could have been anticipated in advance. The end result is a smaller code base, which naturally has less room for defects and a lower cost of maintenance.

It is a theme that will be common to all Theon-related work and will be dealt with in more depth in other final reports. The quote above does not, of course, reflect on some of the negative aspects of the approach, and it is definitely not going to be applicable in all scenarios. However, the larger the scope of a project, and the less quantifiable and more changeable its requirements, the more attractive this approach looks.

8. Documentation

Technical documentation for setting up, configuring and running a coupler process using TheonCoupler is available on the Theon Trac Wiki. Note that it assumes a good knowledge of general Theon management, specifically schema change and the conduits framework. It also has more details on individual coupler processes and process targets.

9. Ongoing and Future Work

  • Move older feeds still using the GSF codebase onto the new TheonCoupler codebase. Drop GSF codebase. This is a small action in Trac, #1121, which will be addressed in some future spare cycles.
  • Replace existing configuration files with schema metadata, management desktop and DDL template processing using XSLT. This will also automate some of the remaining manual steps in the process and address a few minor deficiencies in the framework at the same time (such as handling for correlated processes and customisable parameter types). This is already in progress and is part of the Change Management project tracked separately.
  • Re-implement the bulk of the "marker" framework (custom template functions that carry out assessment processing) using enhancements to SPAWN. This is pending.
  • See also the roadmap entries for Consolidation of Coupler Feeds and Consolidation of Coupler which are buckets for smaller issues.

-- TimColles - 29 Jan 2013

Topic revision: r4 - 30 Jan 2013 - 16:15:16 - TimColles